  1. Oct 2025
    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors investigate the role of HSPA2 during mouse preimplantation development. Knocking down HSPA2 in zygotes, the authors describe lower chances of developing into blastocysts, which show a reduced number of inner cell mass cells. They find that HSPA2 mRNA and protein levels show some heterogeneity among blastomeres at the 4-cell stage and propose that HSPA2 could contribute to skewing their relative contribution to embryonic lineages. To test this, the authors try to reduce HSPA2 expression in one of the 2-cell stage blastomeres and propose that it biases their contribution towards extra-embryonic lineages. To explain this, the authors propose that HSPA2 would interact with CARM1, which controls chromatin accessibility around genes regulating differentiation into embryonic lineage.

      Strengths:

      (1) The study offers simple and straightforward experiments with large sample sizes.

      Thanks for your kind recognition.

      (2) Unlike most studies in the field, this research often relies on both mRNA and protein levels to analyse gene expression and differentiation.

      Thanks for your kind recognition.

      Weaknesses:

      (1) Image and statistical analyses are not well described.

      Thanks for your advisable comment. We have redescribed the image and statistical analyses in our revised version (lines 255-257).

      (2) The functionality of the overexpression construct is not validated.

      Thanks for your kind suggestion. We validate the functionality of the overexpression construct in our revised version (Figure S3).

      (3) Tracking of KD cells in embryos injected at the 2-cell stage with GFP is unclear.

      Thanks for your kind suggestion. We randomly co-injected green fluorescent protein (Gfp) mRNA as a lineage tracer with either Hspa2-siRNA or NC-FAM into one blastomere of the 2-cell embryo, and then monitored embryo development to the blastocyst stage (lines 342-344).

      (4) A key rationale of the study relies on measuring small differences in the levels of mRNA and proteins using semi-quantitative methods to compare blastomeres. As such, it is not possible to know whether those subtle differences are biologically meaningful. For example, the lowest HSPA2 level of the embryo with the highest level is much higher than the top cell from the embryo with the lowest level. What does this level mean then? Does this mean that some blastomeres grafted from strong embryos would systematically outcompete all other blastomeres from weaker embryos? That would be very surprising. I think the authors should be more careful and consider the lack of quantitative power of their approach before reaching firm conclusions. Although to be fair, the authors only follow a long trend of studies with the same intrinsic flaw of this approach.

      Thanks for your advisable comment. Indeed, although the approach drew on previous research (Zhou Cell 2018), we were clearly aware that it can only reflect relative comparisons. This means that the relative differences among the blastomeres from the same embryo were detected and compared. We did not compare the absolute levels of mRNA between different embryos. We also offered simple and straightforward experiments with large sample sizes to confirm this conclusion.

      (5) Some of the analyses on immunostaining do not take into account that this technique only allows for semi-quantitative measurements and comparisons.

      a) Some of the microscopy images are shown with an incorrect look-up table.

      b) Some of the schematics are incorrect and misleading.

      Thanks for your advisable comment. We revised microscopy images and schematics in our revised version.

      Reviewer #2 (Public review):

      Summary:

      In this study, Gao et al. use RNA-seq to identify Hspa2 as one of the earliest transcripts heterogeneously distributed between blastomeres. Functional studies are performed using siRNA knockdown showing Hspa2 may bias cells toward the ICM lineage via interaction with the known methyltransferase CARM1.

      Strengths:

      This study tackles an important question regarding the origins of the first cell fate decision in the preimplantation embryo. It provides novelty in its identification of Hspa2 as a heterogeneous transcript in the early embryo and proposes a plausible mechanism showing interactions with Carm1. Multiple approaches are used to validate their functional studies (FISH, WB, development rates, proteomics). Given only 4 other transcripts/RNA have been identified at or before the 4-cell stage (LincGET, CARM1, PRDM14, HMGA1), this would be an important addition to our understanding of how TE vs ICM fate is established.

      Thanks for your kind recognition.

      The RNA-seq results leading the authors to focus on Hspa2 are not included in the manuscript. This dataset would serve as an important resource but is neither included nor discussed. Nor is it mentioned whether Hspa2 was identified in prior RNA-seq embryo studies (for example Deng Science 2014).

      Thanks for your advisable comment. To identify genes that show significantly high variability across blastomeres in the same embryo, we regressed out the embryo effect by establishing a new method, which will be published and uploaded to the database in the future. Thus, the RNA-seq results that led us to focus on Hspa2 are not included in the manuscript.

      In addition, the functional studies are centered on Hspa2 knockdown at the zygote (1-cell) stage, which would largely target maternal transcript. Given the proposed mechanism relies on Hspa2 heterogeneity post-ZGA (late 2-cell stage), the knockdown studies don't necessarily test this and thus don't provide direct support to the authors' conclusions. The relevance of the study would be improved if the authors could show that zygotic knockdown leads to symmetric Hspa2 levels at the late 2-cell and/or 4-cell stage. It may be possible that zygotic knockdown leads to lower global Hspa2 levels, but that asymmetry is still generated at the 4-cell stage.

      Thanks for your advisable comment. We now show the Hspa2 levels at the late 2-cell and 4-cell stages after zygotic knockdown in our revised version (Figure S1 G-H, lines 450-452).

      Furthermore, the authors show that Hspa2 knockdown at the 1-cell stage lowers total Carm1 levels at the 4-cell stage. However, it is unclear how total abundance within the embryo alters lineage specification within blastomeres. The authors go on to propose a plausible mechanism involving Hspa2 and Carm1 interaction, but do not discuss how expression levels may be involved.

      Thanks for your advisable comment. Previous research suggests that heterogeneous activity of the methyltransferase CARM1 results in differential methylation of histone H3R26 to modulate establishment of lineage specification (Zernicka-Goetz Cell 2018). Thus, we did not discuss how total abundance within the embryo alters lineage specification.

      Recommendations for the authors:  

      Reviewer #1 (Recommendations for the authors):

      (1) Major issue with analyses:

      Image analysis needs to be much better explained than simply saying that ImageJ was used. Where are cells measured (at their equatorial plane? What is the size of the ROI?)? Ideally, the ROI and/or raw measurements should be provided.

      Thanks for your advisable comment. We have redescribed the image analysis in our revised version (lines 187-194).

      What are the objective criteria determining whether a cell is counted as GFP positive, CDX2 positive, or OCT4 positive? This is very unclear and key to the interpretation of many experiments.

      Thanks for your advisable comment. Cells containing fluorescence signals above background noise were counted as positive.
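
      The response does not state a numerical cutoff for "above background noise". One way such a criterion could be made explicit is sketched below. This is an illustration only, not the authors' procedure: the function name `count_positive`, the cutoff of background mean plus `k` standard deviations, the value `k = 3`, and all intensity values are hypothetical.

```python
from statistics import mean, stdev


def count_positive(cell_intensities, background_samples, k=3.0):
    """Return the cutoff and the indices of cells above it.

    Assumed rule (hypothetical): a cell is positive when its mean
    intensity exceeds background mean + k * background SD.
    """
    threshold = mean(background_samples) + k * stdev(background_samples)
    positive = [i for i, v in enumerate(cell_intensities) if v > threshold]
    return threshold, positive


# Hypothetical measurements in arbitrary intensity units.
background = [10.0, 12.0, 11.0, 9.0, 13.0]   # cell-free regions of the image
cells = [11.5, 40.2, 95.0, 14.0, 16.2]       # mean intensity per nucleus

threshold, positive = count_positive(cells, background)
print(f"threshold = {threshold:.2f}, positive cells = {positive}")
```

      Reporting the cutoff together with the raw per-cell values would let readers check how sensitive the positive counts are to the choice of `k`.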

      Statistical analyses mention ANOVA in the methods but Student's t-test in the figure legend. Which is which? Most data are heavily normalized, which would unlikely fit the description for Student's t-test analyses.

      Thanks for your advisable comment. We redescribe the statistical analyses in our materials and methods (line 253-260).

      Figure 5H describes a relative fluorescence intensity with control at 1. The legend describes a normalization to "DNA" (I guess the authors meant DAPI), which is unlikely to give 1. This suggests that additional normalization was done and is not described. Is that the case? Also, since the authors propose that HSPA2 would control Histone modification and chromatin packing, I do not think that using DAPI is an appropriate way of normalizing the fluorescence signal.

      Thanks for your advisable comment. We replaced DNA with DAPI in our revised version. Based on previous studies, we used DAPI to normalize the fluorescence signal (Zhou Cell 2018, Zernicka-Goetz Cell 2018).

      Figure 1E shows data normalized to the lowest level while Figure 1H is normalized to the highest level. A consistent representation would be welcome.

      Thanks for your advisable comment. We revised the Figure 1H in our revised version.

      Is Figure 1C showing a t-test between correlations?

      Yes, Figure 1C shows the t-test between correlations.

      (2) Major issue with the interpretation of semi-quantitative methods and measurements:

      qPCR, WB, immunostaining are all semi-quantitative methods that require some kind of normalization due to non-linear bias in the way the molecules are picked up. Such normalization makes it difficult to know whether a detectable difference is meaningful biologically speaking i.e. if a difference of 1 CT between blastomeres can be detected after qPCR, is it meaningful? If that were the case, then embryos with lower CT than others (Figure 1D) would not be able to develop into blastocyst, like siRNA injected embryos, or grafting a blastomere with a high CT onto an embryo with low CT would lead to the systematic differentiation of these strong blastomeres into ICM.

      Thanks for your advisable comment. The CT values represent the relative mRNA levels of Hspa2 between blastomeres, and a higher CT value represents lower expression of Hspa2 at the mRNA level. Figure 1D shows the Hspa2 mRNA levels between blastomeres. The blastomere with low-level expression of Hspa2 mRNA is not biased toward an ICM fate.

      The same goes for fluorescence analyses (Figure 1F). Can the authors also provide the measurements for DAPI as they did for HSPA2? I am sure that with enough measurements, DAPI is variable enough to give a statistical difference among blastomeres with questionable biological meaning.

      I think the reasoning used here (unfortunately following the reasoning that has been used in a series of studies by other groups) of ranking blastomeres after semi-quantitative measurement is fundamentally flawed.

      Thanks for your advisable comment. The DAPI signal was determined at the maximal area using a custom Python script. Based on previous studies, we used DAPI to normalize the fluorescence signal (Zhou Cell 2018). This approach normalizes embryo-to-embryo variance that arises for technical reasons.
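
      The kind of normalization discussed here can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' custom script: the function name `relative_intensity`, the choice to scale to the lowest blastomere within the embryo, and the intensity values are all hypothetical.

```python
def relative_intensity(target, dapi):
    """Divide each blastomere's target signal by its DAPI signal,
    then scale so the lowest blastomere in the embryo equals 1."""
    if len(target) != len(dapi):
        raise ValueError("target and dapi must have the same length")
    if any(d <= 0 for d in dapi):
        raise ValueError("DAPI intensities must be positive")
    ratios = [t / d for t, d in zip(target, dapi)]
    lowest = min(ratios)
    return [r / lowest for r in ratios]


# Hypothetical intensities for the four blastomeres of one embryo.
hspa2 = [120.0, 150.0, 180.0, 240.0]
dapi = [100.0, 100.0, 120.0, 120.0]

rel = relative_intensity(hspa2, dapi)
for i, (r, d) in enumerate(zip(rel, dapi), start=1):
    # Printing DAPI next to the ratio exposes how variable the
    # normalizer itself is across blastomeres.
    print(f"B{i}: relative HSPA2 = {r:.2f}, DAPI = {d:.0f}")
```

      Because the result is a ratio scaled within one embryo, it supports only within-embryo comparisons, and any variation in DAPI itself propagates directly into the normalized values.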

      (3) Major issue with overexpression experiment:

      While the siRNA experiment is partially validated by qPCR and WB measurements of HSPA2 after KD, the overexpression experiment is not. Do the authors have any evidence that the construct they use is produced into protein and functional? Can the authors check by WB? Can the authors rescue the siRNA with their overexpression?

      Thanks for your advisable comment. We verified the overexpression experiment by WB in our revised version (Figure S3, lines 360-361). Considering that siRNA degrades mRNA and prevents the mRNA translation process, we did not co-inject the siRNA with the overexpression construct.

      The lack of effect of HSPA2 overexpression on blastocyst formation is difficult to reconcile with the interpretation from the authors that levels of HSPA2 bias lineages.

      Have the authors tried lower concentrations? Have the authors tried FISH on their half-injected 2-cell embryos? Of course, if the antibody against HSPA2 would work with immunostaining, that would be ideal.

      Thanks for your advisable comment. We chose the concentrations for our study based on previous research (Zernicka-Goetz Cell 2016). To verify that the Hspa2 siRNA was successfully injected into one blastomere at the 2-cell stage, we observed green fluorescence after co-injecting GFP mRNA with either siRNA or NC-FAM into one blastomere of the two-cell embryos. Thus, we did not try FISH on half-injected 2-cell embryos. We tried to perform immunostaining experiments with various HSPA2 antibodies (Proteintech: 12797-1-AP, Abcam: ab108416) but did not obtain good results.

      Author response image 1.

      (4) Major issue with tracking of injected cells:

      It is unclear what counts as a GFP-positive cell. In Figure 3D, most cells appear to have the same level of GFP.

      Thanks for your advisable comment. Cells containing green fluorescence signals above background noise were counted as GFP-positive in Figure 3D. Most cells seem to have the same level of GFP because they are daughter cells of the blastomeres injected with GFP.

      In the images of GFP-expressing cells used to track the control of KD cells shown in Figure 3A, it seems that the control embryos have mostly GFP cells in the ICM. Is that the case, or just a bad example?

      Thanks for your advisable comment. The green fluorescent signals in Figure 3A represented OCT4 protein, an ICM marker.

      Can the authors do FISH against HSPA2 and visualize their GFP cells to validate the heterogeneous expression in situ?

      Thanks for your advisable comment. We have verified the heterogeneous expression of HSPA2 in Figure 1.

      (5) Issue with fluorescent images:

      Many images are shown with inappropriate look-up tables with saturated DAPI, OCT4, CDX2, and FISH. This raises the doubt that analyses were made on saturated images, which would be incorrect.

      The LUT of Figure 5H should be adjusted similarly between the control and siRNA.

      Thanks for your advisable comment. We revised the images that showed inappropriate look-up tables in our revised version. The LUT of Figure 5H has been adjusted consistently between the control and siRNA.

      (6) Issue with schematics:

      Schematics of blastomere isolation grown into blastocyst-like structures are misleading since the final blastocyst-like structure should not have a zona pellucida and should have fewer cells than regular blastocysts.

      Thanks for your advisable comment. We revised schematics of blastomere grown into morula in our revised version (Figure 1A and Figure S1A).

      The summary schematics in the final figure should not state HSPA2 -/- since experiments in the study did not use KO but KD.

      Thanks for your advisable comment. We revised the summary schematics in our revised version.

      The blastocysts are the same sizes as the cleavage stage or morula embryos which implies that cells lose volume to the lumen, which is not the case.

      Thanks for your advisable comment. We revised the schematics in our revised version.

      (7) Issue with data presentation:

      In the tables within the figures, the number of decimals given should be the same for the mean and SE (one decimal should be more than enough).

      Thanks for your advisable comment. We revised the figure 2H in our revised version.

      The comparison of cell number and distribution within embryos (e.g. Figure 2B) would be best represented by a correlation analysis of TE vs ICM cells.

      Thanks for your advisable comment. We added a correlation analysis of TE vs ICM cells in our revised version (Figure 3B).

      The docking simulations are described in the main text as "experiments".

      Thanks for your advisable comment. We redescribed the docking simulations in our revised version.

      (8) Issue with data interpretation:

      The reduced number of ICM cells is interpreted as a slowed-down cell cycle. This could also be explained by failed cytokinesis and the generation of binucleated or polyploid cells. Have the authors checked for that? For example, by looking at their DAPI staining. 

      Thanks for your advisable comment. Our RNA-seq results revealed that the differentially expressed genes (DEGs) at the blastocyst stage after HSPA2 knockdown are closely related to negative regulation of cell cycle, G1/S transition of mitotic cell cycle, mitotic cell cycle phase transition and regulation of mitotic cell cycle phase transition. In addition, a previous study demonstrated that knockdown of HSPA2 reduced cell proliferation and led to G1/S phase cell cycle arrest (Hu Ann Transl Med 2019). The lower cell number in the ICM may also be associated with failed cytokinesis and the generation of binucleated or polyploid cells. Thus, we speculate that HSPA2 has a role in ICM lineage establishment, although half of the ICM cells were able to survive with HSPA2 deficiency (lines 463-472).

      It is unclear to me why reduced ICM should lead to fewer blastocysts. Blastocysts should be able to form as long as their TE is fine. In Figure 2G, embryos seem to be cultured in close proximity, which is fine if they are healthy but not if some of the embryos start dying and releasing toxic compounds (e.g. ROS). Have the authors tried removing the dying KD embryos to see if the development of the remaining embryos would improve?

      Thanks for your advisable comment. We think HSPA2 may affect blastocyst development by affecting other signaling pathways, and the enriched GO terms were closely related to blastocyst development (Figure 2E). There was no significant difference in morula formation rate between the Hspa2-KD group and the NC group, so the assumption that toxic compounds released by some embryos lower the blastocyst rate may not be correct. Indeed, the rate of blastocyst formation in Hspa2-KD embryos was still significantly reduced when a few embryos were cultured separately. In addition, we discussed the possibility that the lower cell number in the ICM may also be associated with failed cytokinesis and the generation of binucleated or polyploid cells.

      Author response image 2.

      Reviewer #2 (Recommendations for the authors):

      One of the significant findings in the paper is the discovery portion where Hspa2 is identified as a heterogeneous transcript. To improve the logic and impact of the manuscript, it may benefit from reorganizing some of the figures and text. For example:

      (1) The paragraph in the introduction (Lines 56-68) should be moved to the discussion as the Hspa2 reveal should be in section 3.1, not prior to the RNA-seq results presented in Figure 1.

      Thanks for your advisable comment. We think it is more logical that HSPA2 needs to be introduced in the introduction.

      (2) Add text at the beginning of Section 3.1 to describe the rationale and results for the RNAseq. It would help the readers if the authors clearly stated why they chose the 4-cell stage.

      Thanks for your advisable comment. We explain why we chose the 4-cell stage in our revised version (line 272-273).

      (3) As this is the first time Hspa2 is identified, consider moving Figure S1C to the main figure to show expression throughout development.

      Thanks for your advisable comment. We moved Figure S1C to the main figure in our revised version (line 286-291).

      (4) Figure 1C: the correlation between Hspa2 and ICM markers would be strengthened if additional transcripts were used (Oct4, Sox2, Sox21). The graph in 1C would also be more informative if represented as a scatter plot with correlation coefficients (Nanog log2TPM vs Hspa2 log2TPM), rather than bar graphs.

      Thanks for your advisable comment. We chose Nanog, an ICM marker, because it showed the strongest correlation with Hspa2 in our results. Figure 1C shows that the positive correlation between Nanog and Hspa2 expression is stronger than that of random gene pairs (n = 100, where n is the number of random gene pairs). We therefore consider the bar graph in Figure 1C easier to understand.
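
      A comparison of one gene pair against random gene pairs can be sketched as below. Note that this is a swapped-in, simpler technique (the empirical fraction of random pairs correlating at least as strongly), not the t-test between correlations that the authors report for Figure 1C. The function names, the placeholder genes `GeneA` to `GeneD`, and all expression values are hypothetical.

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


def fraction_random_pairs_at_least(expr, gene_a, gene_b, n_pairs=100, seed=0):
    """Correlation of (gene_a, gene_b) and the fraction of randomly drawn
    gene pairs whose correlation is at least as high."""
    rng = random.Random(seed)
    observed = pearson(expr[gene_a], expr[gene_b])
    others = [g for g in expr if g not in (gene_a, gene_b)]
    null = []
    for _ in range(n_pairs):
        g1, g2 = rng.sample(others, 2)
        null.append(pearson(expr[g1], expr[g2]))
    frac = sum(r >= observed for r in null) / n_pairs
    return observed, frac


# Hypothetical log2(TPM) values across four blastomeres.
expr = {
    "Hspa2": [1.0, 2.0, 3.0, 4.0],
    "Nanog": [1.1, 2.0, 3.2, 3.9],
    "GeneA": [4.0, 1.0, 3.0, 2.0],
    "GeneB": [2.0, 4.0, 1.0, 3.0],
    "GeneC": [3.0, 3.5, 1.0, 2.0],
    "GeneD": [1.0, 3.0, 2.0, 4.0],
}

observed, frac = fraction_random_pairs_at_least(expr, "Hspa2", "Nanog")
print(f"r(Hspa2, Nanog) = {observed:.3f}; fraction of random pairs >= r: {frac}")
```

      The per-blastomere values fed into `pearson` are the same points a scatter plot with a correlation coefficient, as the reviewer suggests, would display.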

      (5) Figure 1D: how were individual blastomeres grouped into B1-4? Individually run and then pooled based on relative expression?

      Thanks for your advisable comment. Blastomeres are named B1 to B4 in order of increasing Hspa2 concentration in Figure 1E.

      (6) Figures 1F, 1I, 5H: the DAPI channel appears to be saturated, but is used to normalize fluorescence intensity and may incorrectly account for light scattering within the embryo. Please clarify by adding more details regarding image analysis. Were partial stacks through the nucleus used for analysis, or max projections? Graph axes should be "relative fluorescence intensity."

      Thanks for your advisable comment. We added the details of the fluorescence image analysis. The graph axes have been revised in our revised version.

      (7) Line 278: the results in Figure S1C would benefit from more text regarding expression patterns throughout development. The maternal transcript appears to have a sharp downregulation by the early 2-cell stage, and is then upregulated coinciding with ZGA.

      Thanks for your advisable comment. We added more description of the figure in the main text (lines 285-290).

      (8) For the analyses in Figure 2 I-J and 2K-L, were arrested embryos excluded from analysis? This is an important detail as including arrested embryos would significantly bias the RNA-seq results. 

      Thanks for your advisable comment. The arrested embryos were excluded in Figure 2 I-J and 2K-L.

      (9) Figures 2G-H would be aided by converting the table in 2H to a bar graph and adding development rates for all stages (2-, 4-, 8-, morula, and blast). This would also show when an arrest occurs.

      Thanks for your advisable comment. We converted the table in 2H to a bar graph.

      (10) Blast rates are represented with too many significant digits (Figures 2H, 4B). They should only be reported to the closest ones given the unit of measure (number of blasts divided by number of zygotes). For instance, a blast rate of 81.63 ± 2.000 reflects excessive precision that is not measured in the data, it should rather read 82 ± 2%. This is also true for % cells (Figures 3E, 4H).

      Thanks for your advisable comment. Values were rounded down to one decimal place.

      (11) The clarity and impact of Figure 3A and 3D would benefit from 2D slices through the ICM. 

      Thanks for your advisable comment. To give a more comprehensive view of the 3D structure of the blastocysts in Figures 3A and 3D, we did not choose 2D slices.

      (12) To improve clarity and logic, separate the 1-cell and 2-cell knockdown experiments in the text and figures:

      a) 1-cell knockdown with RNA-seq results (Fig 2A-F).

      b) 1-cell knockdown showing less ICM/pluripotency markers in (combine Figures 2G-M and Figures 3A-B; "new Fig 3").

      c) 2-cell knockdown tracing lineage (Figures 2D-E; "new Fig 4").

      The new Figures 3 and 4 should mirror one another (i.e. for each knockdown experiment, development rates and cell counts should be included). For the 2-cell knockdown (Figures 2 D-E), what were the developmental rates (8-cell, morula, blast)?

      Thanks for your advisable comment. However, to preserve the overall logic of the article, we did not separate the 1-cell and 2-cell knockdown experiments in the text and figures. We did add the developmental rates (8-cell, morula, blast) of the 2-cell knockdown group in our revised version (Figure S2).

      For the overexpression experiment (Figure 4), why were injections performed at the zygote stage versus the 2-cell stage? Given the significant downregulation of maternal transcript demonstrated in Figure S1C, it seems plausible that the injected RNA was also downregulated.

      Thanks for your advisable comment. For the overexpression experiment, we first chose to inject Hspa2 mRNA at the zygote stage and found that overexpression of Hspa2 does not bias blastomeres toward an ICM fate. The qRT-PCR results indicated that the expression level of Hspa2 in the overexpression group was significantly increased compared with the normal group at the 4-cell and blastocyst stages (Figure 4C, 4D). In addition, there is no guarantee that an equal amount of Hspa2 mRNA would be injected into each blastomere at the 2-cell stage. Thus, we did not microinject Hspa2 mRNA at the 2-cell stage.

      The 3.5 subheading overstates the results as the Hspa2-Carm1 interaction is not linked to lineage segregation. For example, a more specific subtitle might be, "Hspa2 interacts with Carm1 and alters H3R26me2 levels."

      Thanks for your advisable comment. We revised the subtitle in our revised version (line 376).

      Figures 5B-C and 5D-E. The qRT-PCR and WB analysis of knockdown blasts shows a correlation between Hspa2 downregulation and Carm1 downregulation. However, if the proposed mechanism is Hspa2 binding to Carm1 to mediate downstream methylation, why would it be expected to alter transcript levels at the 4-cell or blast stage? Please add further details and discussion in the results and discussion sections.

      Thanks for your advisable comment. The reason we chose to work at the 4-cell stage is that previous studies on CARM1 have focused on the 4-cell stage (Zernicka-Goetz Cell 2018, 2016).

      In the discussion, the statement in Lines 430-431 is an overinterpretation: "the heterogeneity of HSPA2... acts as an upstream factor to drive [the] first cell-fate decision." The knockdown experiments don't alter heterogeneity per se, but total abundance. Furthermore, the results do not show that heterogeneity drives heterogeneity in H3R26me2 patterns, for example.

      Thanks for your advisable comment. We redescribe the relevant statement in the discussion.

      More needs to be said regarding the ICM cells that persisted in the 1-cell KD experiment (Fig 3B). Lines 449-450 point out this result, but do not propose any plausible explanations. For instance, ICM cells may still form due to the incomplete knockdown achieved or the possibility that redundant pathways exist.

      Thanks for your advisable comment. We redescribe the relevant statement in our revised version (line 468-473).

      The 5th paragraph of the discussion seems incomplete. The authors point out a possible link between Hspa2 and Hippo and Wnt signaling pathways, but need to expand their discussion on how this may act as an additional mechanism incorporating Hspa2 with lineage segregation.

      Thanks for your advisable comment. We redescribe the 5th paragraph of the discussion (line 483-494).

      Statistics: all comparisons with greater than 2 groups should be performed with a one-way ANOVA and multiple comparisons, rather than Student's t-test (Figures 1B, 1D, 1E, 1F).

      All figure legends lack statistical test details.

      Thanks for your advisable comment. Statistical test details have been added to all figure legends and to the statistical analysis section.
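
      For reference, the one-way ANOVA the reviewer asks for reduces to a ratio of between-group to within-group variance. The sketch below computes only the F statistic and its degrees of freedom in plain Python; it is an illustration with hypothetical data, not the authors' analysis. Obtaining the p-value (for example with `scipy.stats.f_oneway`) and running a multiple-comparison procedure afterwards are left to a statistics package.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]
    # Variation of group means around the grand mean.
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    # Variation of observations around their own group mean.
    ss_within = sum(
        sum((v - m) ** 2 for v in g) for g, m in zip(groups, group_means)
    )
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical relative expression values for three groups.
groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
f_stat, df_b, df_w = one_way_anova_f(groups)
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```

      A single F test over all groups avoids the inflated false-positive rate that comes from running a separate Student's t-test for each pair of groups.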

      Minor comments:

      In all graphs, individual blastomere expression levels should be represented as box-whisker/bar/scatter/violin plots since the comparison is groups rather than time points (i.e. symbols should not be connected with a line in Figures 1B, 1D, 1F-G, 1I, S1D, S1F).

      Thanks for your advisable comment. Each colored line represents a single cell, and the dots of the same color represent the blastomere of the same cell. Thus, we use a line representation individual blastomere.

      For all fluorescent images, having two representative images may be confusing for the reader. Figures may be improved by just including one representative image for each stage/treatment (Figures 1F, 1I, S1F, 3A, 3D, 4E, 4G).

      Thanks for your advisable comment. The figures now include just one representative image for each stage in our revised version. In addition, two representative images from each group are shown for each treatment (Figures 3A, 3D, 4E, 4G).

      The manuscript would be improved with thorough grammar and typo editing.

      For example:

      (1) Lines 18, 73, the wording is confusing, consider: "knockdown of Hspa2 in one of the two-cell blastomeres biased its progeny towards the trophectoderm lineage.".

      (2) Line 23, overstatement. Consider: "we demonstrated that HSPA2 levels correlate with ICM-associated genes and that it interacts with the CARM1.".

      (3) Line 25 confusing wording, "via the execution of commitment and differentiation phases.".

      (4) Line 37, replace "that" with "of;" replace "cell-fate decisions" with "cell-fate decision".

      (5) Line 40: needs space before (CARM1).

      (6) Line 43: the wording is confusing, consider "can result in higher expression levels of".

      (7) Line 45: wording, consider "Recent [studies have] further suggested".

      (8) Line 70: plurality, consider "analyzed gene expression pattern".

      (9) Line 73 typo: "prevents its".

      (10) Line 76-77 wording, consider "Hspa2 expression patterns can bias cell fate in the mouse embryo".

      (11) Line 276: remove "in whole embryos," since MII eggs are not embryos.

      (12) Line 617 "There" should be "Three".

      (13) Axis label in Fig 3b "Totle" should be "Total".

      (14) Lines 417, 419 missing spaces.

      (15) Line 448 missing word, "interfering [with] the cell cycle".

      (16) Line 462 incorrect word, "[a]polar cells being specified as ICM".

      (17) Line 469 incorrect plural, "cell differentiation".

      Thanks for your advisable comment. We revised the whole manuscript carefully according to the reviewers' suggestions.

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1:

      Summary:

      The manuscript by Zhang et al describes the use of a protein language model (pLM) to analyse disordered regions in proteins, with a focus on those that may be important in biological phase separation. While the paper is relatively easy to read overall, my main comment is that the authors could perhaps make it clearer which observations are new, and which support previous work using related approaches. Further, while the link to phase separation is interesting, it is not completely clear which data supports the statements made, and this could also be made clearer.

      We thank the reviewer for their thoughtful evaluation of our manuscript and for the supportive comments. As outlined in the responses below, we have made substantial revisions to clarify the novel observations presented in our study and to strengthen the connection between sequence conservation and phase separation.

      Comment 1: With respect to putting the work in a better context of what has previously been done before, this is not to say that there is not new information in it, but what the authors do is somewhat closely related to work by others. I think it would be useful to make those links more directly.

      We have addressed the specific comments as outlined below.

      Comment 1a: Alderson et al (reference 71) analysed in detail the conservation of IDRs (via pLDDT, which is itself related to conservation) to show, for example, that conserved residues fold upon binding. This analysis is very similar to the analysis used in the current study (using ESM2 as a different measure of conservation). Thus, the result that "Given that low ESM2 scores generally reflect mutational constraint in folded proteins, the presence of region a among disordered residues suggests that certain disordered amino acids are evolutionarily conserved and likely functionally significant" is in some ways very similar to the results of that (Alderson et al) paper.

      We thank the reviewer for the comment. However, we would like to clarify that our findings differ in subtle but important ways from those reported by Alderson et al. Specifically, Alderson et al. used AlphaFold2 predictions to identify IDRs that undergo disorder-to-order transitions, which the authors termed conditionally folded IDRs. These regions could be functionally important, assuming that the function of IDRs requires folding.

      We argue, however, that the validity of this structure-function relationship for IDRs remains to be tested. In our opinion, the most direct way to assess functional significance is to evaluate evolutionary conservation.

      As shown in Author response image 1, the correlation between pLDDT scores and the conservation score, while noticeable, is significantly weaker than that between the ESM2 score and the conservation score.

      Author response image 1.

      Comparison of the correlation between AlphaFold2 pLDDT scores and conservation scores with the correlation between ESM2 scores and conservation scores. Calculations were performed using proteins in the MLO-hProt dataset. (A) Correlation between the mean AlphaFold2 pLDDT scores and conservation scores for various amino acids. Pearson correlation coefficients (r) are indicated in the figure legends. The four panels on the right present analogous correlation plots for amino acids grouped by structural order, as defined by their pLDDT scores. (B) Same as in part A, but for ESM2 scores.

      Therefore, we believe that the ESM2 score is a better indicator of functional relevance than the AlphaFold2 pLDDT score.

      Furthermore, for the human IDRs, we explicitly selected amino acids with pLDDT scores ≤ 70.

      These would be classified as structureless, disordered amino acids, according to the study by Alderson et al. Nevertheless, as shown in Figures 2 and 3 of the main text, our analyses still identify conserved regions. Therefore, these regions may function via mechanisms distinct from the disorder-to-order transition.

      We now discuss the novelty of our work in the context of existing studies in the newly added Conclusions and Discussion: Related Work, as quoted below.

      “Numerous studies have sought to identify functionally relevant amino acid groups within IDRs [cite]. For instance, using multiple sequence alignment, several groups have identified evolutionarily conserved residues that contribute to phase separation [cite]. Alderson et al. employed AlphaFold2 to detect disordered regions with a propensity to adopt structured conformations, suggesting potential functional relevance [alderson et al].

      In contrast, our approach based on ESM2 is more direct: it identifies conserved residues without relying on alignment or presupposing that functional significance requires folding into stable 3D structures. Notably, many of the conserved residues identified in our analysis exhibit low pLDDT scores (Figure 2), implying potential functional roles independent of stable conformations.”

      Comment 1b: Dasmeh et al, Lu et al and Ho & Huang analysed conservation in IDRs, including aromatic residues and their role in phase separation.

      We thank the reviewer for bringing these works to our attention! We now explicitly discuss these studies in both the Discussion section as mentioned above and in the Introduction as quoted below.

      “Evolutionary analysis of IDRs is challenging due to difficulties in sequence alignment [cite], though several studies have attempted alignment of disordered proteins with promising results [Dasmeh et al, Lu et al and Ho & Huang].”

      Comment 1c: A number of groups have performed proteome-wide saturation scans using pLMs, including variants of the ESM family, including Meier (reference 89, but cited about something else) and Cagiada et al (https://doi.org/10.1101/2024.05.21.595203) that analysed variant effects in IDRs using a pLM. Thus, I think statements such as "their applicability to studying the fitness and evolutionary pressures on IDRs has yet to be established" should possibly be qualified.

      We added a new paragraph in the Introduction to discuss the application of protein language models to IDRs and cited the suggested references.

      “While protein language models have been widely applied to structured proteins [cite], it is important to emphasize that these models themselves are not inherently biased toward folded domains. For example, the Evolutionary Scale Model (ESM2) [cite] is trained as a probabilistic language model on raw protein sequences, without incorporating any structural or functional annotations. Its unsupervised learning paradigm enables ESM2 to capture statistical patterns of residue usage and evolutionary constraints without relying on explicit structural information. Thus, the success of ESM2 in modeling the mutational landscapes of folded proteins [cite] reflects the model’s ability to learn sequence-level constraints imposed by natural selection, a property that is equally applicable to IDRs if those regions are also under functional selection. Indeed, protein language models are increasingly being used to analyze variant effects in IDRs [cite].”

      Comment 2: On page 4, the authors write, "The conserved residues are primarily located in regions associated with phase separation." These results are presented as a central part of the work, but it is not completely clear what the evidence is.

      We thank the reviewer for this insightful comment. We realize that our wording was not as precise as it should have been. We meant to state that the regions associated with phase separation are significantly enriched in these conserved residues. This is a significant finding and indicates that phase separation could be a source of evolutionary pressure in dictating IDP sequence conservation. However, we do not intend to suggest that phase separation is the only evolutionary pressure.

      The sentence has been revised to

      “Notably, regions associated with phase separation are significantly enriched in these conserved residues.”

      We further replaced the section title "Conserved, Disordered Residues Localize in Regions Driving Phase Separation" with "Regions Driving Phase Separation Are Enriched with Conserved, Disordered Residues" to further clarify our findings and avoid overinterpretation.

      Finally, we revised the following sentence in the discussion

      “Notably, these conserved, disordered residues are predominantly located in regions actively involved in phase separation, contributing to the formation of membraneless organelles.”

      to

      “Notably, regions actively involved in phase separation are enriched with these conserved, disordered residues, supporting their potential role in the formation of membraneless organelles.”

      The submitted manuscript provides clear evidence supporting the enrichment of conserved residues in MLO-driving IDRs. Specifically, Figures 4A and 4C demonstrate that these IDRs exhibit a substantially higher fraction of conserved residues compared to other IDRs involved in phase separation.

      In this analysis, the nMLO-hIDR group serves as a baseline, representing the distribution of conservation in disordered regions lacking MLO-related functions. In contrast, IDRs from MLO-associated groups show a pronounced downward shift in their median and interquartile ranges, indicating stronger evolutionary constraints. Within the dMLO cohort, the degree of conservation follows a distinct gradient: driving residues exhibit the highest levels of conservation, followed by participant residues, with non-participant residues showing values closer to the nMLO baseline. This pattern reflects the relative functional importance of each group in phase separation, with conservation levels corresponding to their roles in MLO scaffolding.

      To further support this, we computed, for each IDR, the fraction of conserved amino acids. As shown in Figure S11B, for IDRs that actively contribute to phase separation, the fraction is indeed higher than for those not involved in phase separation. This analysis is now included in the SI.

      During the revision, we explicitly evaluated whether conserved residues are preferentially located in regions associated with phase separation. To this end, for each protein in the MLO-hProt dataset, we calculated the probability p of finding conserved residues within regions contributing to phase separation. These regions include both "driving" and "participating" segments as defined in Figure 4 of the main text.

      Figure S11A presents the distribution of p across all proteins. For comparison, we also include the distribution of 1− p, representing the probability of finding conserved residues in regions not associated with phase separation. On average, p exceeds 0.5, suggesting a tendency for conserved residues to be more frequently located in phase-separating regions. However, the difference between the two distributions is not statistically significant. This result may be due to the generally low density of conserved residues in IDRs, which makes the estimation of p challenging for individual proteins. Additionally, some conserved sites may be involved in functions unrelated to phase separation.
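
      The per-protein quantity p described above can be sketched as follows. This is a minimal illustration, not the code used in the study: it assumes p is the fraction of a protein's conserved residues that fall inside the phase-separating ("driving" or "participating") segments, and the function name, the 0-based indexing, and the half-open interval convention are all hypothetical choices made for the example.

```python
def conserved_fraction_in_regions(conserved_positions, ps_regions):
    """Estimate p for one protein.

    conserved_positions: iterable of 0-based residue indices flagged as conserved
        (how residues are flagged is outside the scope of this sketch).
    ps_regions: list of (start, end) half-open intervals covering the segments
        that contribute to phase separation (hypothetical input format).
    Returns the fraction of conserved residues inside those segments,
    or None when the protein has no conserved residues (p is undefined).
    """
    conserved = set(conserved_positions)
    if not conserved:
        return None
    inside = sum(
        1 for pos in conserved
        if any(start <= pos < end for start, end in ps_regions)
    )
    return inside / len(conserved)


# Toy example: four conserved residues, three of them inside the two segments.
p = conserved_fraction_in_regions([3, 10, 11, 40], [(0, 5), (10, 12)])
one_minus_p = 1 - p  # fraction of conserved residues outside those segments
```

      Because many IDRs contain only a handful of conserved residues, p computed this way takes few distinct values per protein, which is consistent with the noisy per-protein estimates noted above.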

      We added the following text to the Discussion section of the main text.

      “We emphasize that the results presented in Figure 4 do not directly demonstrate that conserved residues are preferentially located in regions associated with phase separation. Although these regions are more enriched in conserved amino acids, their total sequence length can be smaller than that of non-phase-separating regions. As a result, the absolute number of conserved residues may still be higher outside phase-separating regions. To quantitatively assess this, we calculated, for each protein in the MLO-hProt dataset, the probability p of finding conserved residues within regions contributing to phase separation. These regions include both "driving" and "participating" segments, as defined in Figure 4 of the main text. Figure S11 shows the distribution of p across all proteins. For comparison, we also present the distribution of 1− p, which reflects the probability of finding conserved residues in non-phase-separating regions. While the average value of p exceeds 0.5, indicating a trend toward conserved residues being more frequently located in phase-separating regions, the difference between the two distributions is not statistically significant. Future studies with expanded datasets may be necessary to clarify this trend.”

      Comment 3: It would be useful with an assessment of what controls the authors used to assess whether there are folded domains within their set of IDRs.

      We acknowledge that our previous labeling may have caused some confusion. Protein sequences used in Figures 2 and 3 include both folded and disordered domains. Results presented in these figures were constructed using full-length protein sequences to highlight the similarities and differences in ESM2 scores between folded and disordered domains.

      In contrast, the analyses presented in Figures 4 and 5 focus exclusively on IDRs to examine their role in phase separation.

      To prevent further confusion, we have renamed the dataset used in Figures 2 and 3 as MLO-hProt, emphasizing that the analysis pertains to entire protein sequences. The term MLO-hIDR is now reserved for a new dataset that includes only disordered residues, as used in Figures 4 and 5, and corresponding SI Figures.

      For the dMLO-IDR dataset, all except one amino acid (P40967, residue G592) are annotated as disordered in the MobiDB database (https://mobidb.org/). This database characterizes disordered regions based on a combination of predictive algorithms and experimental data. As illustrated in Figure S5A, 25.5% of the proteins in the dataset have direct experimental evidence supporting their disorderedness. These experimental annotations are derived from a diverse range of techniques (Figure S5B). For the remaining proteins, disorder was predicted by one or more computational tools. Although not all tools were applied to every protein, each protein in the dataset was identified as disordered by at least one method.

      For human proteins, IDRs were identified based on AlphaFold2 pLDDT scores, using a threshold of 70. As established in prior studies [1, 2], the pLDDT score provides a quantitative measure of local structural confidence, with lower values indicating greater structural disorder. IDRs associated with conditional folding or disorder-to-order transitions generally exhibit high pLDDT values (e.g., >70).
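
      As an illustration of the pLDDT-based selection, the sketch below extracts contiguous low-confidence segments from a per-residue pLDDT profile. It is only a sketch under stated assumptions: the function name and the half-open index convention are hypothetical, and the inclusive cutoff (pLDDT ≤ 70) follows the threshold quoted earlier in this response.

```python
def low_plddt_segments(plddt, threshold=70.0):
    """Return half-open (start, end) index pairs for maximal runs of
    residues whose pLDDT is at or below the threshold."""
    segments = []
    start = None
    for i, score in enumerate(plddt):
        if score <= threshold:
            if start is None:
                start = i  # open a new low-confidence run
        elif start is not None:
            segments.append((start, i))  # close the current run
            start = None
    if start is not None:
        segments.append((start, len(plddt)))  # run extends to the C-terminus
    return segments


# Toy profile: residues 1-2 and 4-5 fall at or below the cutoff.
profile = [90.0, 65.0, 60.0, 85.0, 70.0, 40.0]
segments = low_plddt_segments(profile)
lengths = [end - start for start, end in segments]
```

      Segment lengths obtained this way are the kind of quantity summarized when short runs (for example 1 to 5 residues) are counted separately from longer disordered stretches.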

      Author response image 2 shows a violin plot of AlphaFold2 pLDDT scores for the various MLO-hIDR groups. The consistently low scores support the conclusion that these regions are structurally disordered.

      We also cross-checked the MLO-hIDR regions against the MobiDB database. As shown in Figure S6, approximately 76% of the proteins in the dataset are predicted to contain disordered regions. Among the non-labeled segments with pLDDT scores ≤ 70, the majority are relatively short, with segments of 1–5 residues accounting for approximately 80%.

      Author response image 2.

      AlphaFold pLDDT scores of hIDRs in different MLO-related groups.

      In addition to renaming the dataset, we also revised the manuscript to highlight the validation of disorderedness in the Results section: Regions Driving Phase Separation Are Enriched with Conserved, Disordered Residues.

      “The presence of evolutionarily conserved disordered residues raises the question of their functional significance. To explore this, we identified disordered regions of MLO-hProt using a pLDDT score less than 70 and partitioned these regions into two categories: drivers (dMLO-hIDR), which actively drive phase separation, and clients (cMLO-hIDR), which are present in MLOs under certain conditions but do not promote phase separation themselves [cite]. Additionally, IDRs from human proteins not associated with MLOs, termed nMLO-hIDR, were included as a control. To enhance statistical robustness, we extended our dataset by incorporating driver proteins from additional species [cite], resulting in the expanded dMLO-IDR dataset. Beyond the pLDDT-based classification, the majority of residues in these datasets are also predicted to be disordered by various computational tools and supported by experimental evidence (Figures S5 and S6).”

      Recommendation 1: The authors use the terms "evolutionary fitness of IDRs" (abstract and p. 5, for example), "fitness of amino acids" (p. 4), and "quantify the fitness of particular residues at specific sites" (p. 5). It is not clear what is meant by fitness in this context.

      We thank the reviewer for pointing out the ambiguity in the term fitness. To enhance clarity, we have replaced “fitness” with “mutational tolerance” to more directly emphasize the evolutionary conservation of specific residues.

      Recommendation 2: The authors write (p. 6) "Previous studies have demonstrated a strong correlation between ESM2 scores and changes in free energy related to protein structure stability". While that may be true, it might be worth noting that ESM2 scores report on the effects of mutations and function more broadly than stability because these models have previously been shown to capture conservation effects beyond stability.

      We fully agree with the reviewer’s comment and have revised the main text accordingly. Specifically, the referenced sentence has been revised and relocated, as shown below.

      “Our analysis demonstrated that HP1α’s structured domains consistently yield low ESM2 scores, reflecting strong mutational constraints characteristic of folded regions. These constraints are further evident in the local LLR predictions, as shown in Figure 2B, where we illustrate the folded region G120-T130. Given the functional importance of preserving the 3D structure of structured domains, mutations with greater detrimental effects are likely to disrupt protein folding substantially. This interpretation is consistent with previous studies reporting a significant correlation between ESM2 LLRs and changes in free energy associated with protein structural stability [cite].”

      Recommendation 3: p. 10: The authors write "To exclude sequences that no longer qualify as homologs, we filtered for sequences with at least 20% identity to the reference". How did they decide on 20% and why? And over which residues are these 20% calculated?

      We apologize for the earlier lack of clarity. Sequence alignment was performed using the full-length protein sequences, encompassing both folded and disordered regions. For each sequence, we calculated the percent identity by counting the number of positions, denoted as n, at which the amino acid matches the reference. The percent identity was then computed as n/N, where N represents the total length of the aligned reference sequence. This total includes residues in folded and disordered regions, as well as gap positions introduced during alignment.

      We updated the Methods section of the main text to clarify.

      “We performed multiple sequence alignment (MSA) analysis using HHblits from the HH-suite3 software suite [citations], a widely used open-source toolkit known for its sensitivity in detecting sequence similarities and identifying protein folds. HHblits builds MSAs through iterative database searches, sequentially incorporating matched sequences into the query MSA with each iteration. Sequence alignment was performed using the full-length protein sequences, encompassing both folded and disordered regions.

      ...

      To refine alignment quality by focusing on closely related homologs, we filtered out sequences with < 20% identity to the query, excluding weakly related sequences where only short segments show similarity to the reference. For each sequence, we calculated the percent identity by counting the number of positions, denoted as n, at which the amino acid matches the reference. The percent identity was then computed as n/N, where N represents the total length of the aligned reference sequence. This total includes residues in folded and disordered regions, as well as gap positions introduced during alignment.”
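
      The identity calculation described above can be sketched as follows. This is a hedged illustration rather than the pipeline used in the study: the function names are hypothetical, the input is assumed to be rows of an alignment that all have the same length, gaps are assumed to be written as "-", and positions where both rows carry a gap are assumed not to count as matches. Whether a sequence at exactly the cutoff is retained is also an assumption of this sketch.

```python
def percent_identity(aligned_ref, aligned_seq):
    """Compute n/N for one aligned sequence.

    n: positions where the residue matches the reference (gap-gap pairs
       are not counted, which is an assumption of this sketch).
    N: full length of the aligned reference, including gap columns.
    """
    if len(aligned_ref) != len(aligned_seq):
        raise ValueError("aligned rows must have equal length")
    n_total = len(aligned_ref)
    n_match = sum(
        1 for ref_res, seq_res in zip(aligned_ref, aligned_seq)
        if ref_res == seq_res and ref_res != "-"
    )
    return n_match / n_total


def filter_homologs(aligned_ref, aligned_seqs, min_identity=0.20):
    """Keep only aligned sequences whose identity to the reference
    reaches the cutoff."""
    return [
        seq for seq in aligned_seqs
        if percent_identity(aligned_ref, seq) >= min_identity
    ]


# Toy alignment with one gap column in the reference.
reference = "MK-TAY"
candidates = ["MKQTGY", "M-----"]
kept = filter_homologs(reference, candidates)
```

      Because N includes gap columns, a homolog that aligns well over only a short stretch of a long reference receives a low identity, which is the behaviour the filter relies on.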

      We selected a 20% sequence identity threshold to balance inclusion of true homologs with exclusion of distant matches that may not share functional relevance. To determine this cutoff, we compared identity thresholds of 0%, 10%, 20%, and 40% and examined the resulting distributions of conservation and ESM2 scores across aligned residues for the MLO-hProt dataset (Author response image 3). Thresholds of 10%, 20%, and 40% produced qualitatively similar results, with a consistent correspondence between low ESM2 scores and high conservation. Lower thresholds introduced highly divergent sequences that added noise to the alignment, resulting in reduced overall conservation scores. In contrast, higher thresholds excluded homologs with potentially meaningful conservation, particularly in disordered regions where conservation scores tend to be relatively low.

      Author response image 3.

      Histograms of the ESM2 score and the conservation score, presented in a format consistent with Figure 3B of the main text. The conservation scores were computed using aligned sequences with identity thresholds of ≥0%, ≥10%, ≥20%, and ≥40% (left to right). Contour lines represent different levels of −log P(CS, ESM2), where P is the joint probability density of the conservation score (CS) and the ESM2 score. Contours are spaced at 0.5-unit intervals, highlighting regions of distinct density.

      Recommendation 4: In their description of "motif" searching algorithm (p. 20) I think that the search algorithm would give a different result whether the search is performed N->C or C->N (because the first residue (i) needs to have a score <0.5 but the last (j) could have a score >0.5 as long as the average is below 0.5). Is that correct? And if so, why did they choose an asymmetric algorithm?

      We thank the reviewer for highlighting the asymmetry in our motif-search algorithm.

      To investigate this issue, we repeated the algorithm starting from the C-terminus and compared the resulting motifs with those obtained from the N-terminal scan. We found that the two sets of motifs overlap entirely: each motif identified from the C-terminal direction has a corresponding counterpart from the N-terminal scan. However, the motifs are not identical. The directionality of the search introduces additional amino acids—referred to here as peripheral residues—at the motif boundaries, which differ between the two sets.

      As shown in Author response image 4, the number of peripheral residues is small relative to the total motif length.

      To eliminate asymmetry and ambiguity, we have revised our method to perform bidirectional scans—from both the N- and C-termini—and define each motif as the overlapping region identified by both directions. This approach emphasizes the conserved core and avoids the inclusion of spurious terminal residues. The updated procedure is described in Methods: Motif Identification.

      “To identify motifs within a given IDR, we implemented the following iterative procedure. Starting from either the N- or C-terminus of the sequence, we first locate the initial residue i whose ESM2 score falls below 0.5. From i, residues are sequentially appended…”

      Author response image 4.

      Number of peripheral residues and their length relative to the full motif length, for motifs identified from each direction. (A) Unique motifs identified in the N-to-C terminal direction. (B) Unique motifs identified in the C-to-N terminal direction.

      “…in the direction toward the opposite terminus until the segment’s average ESM2 score exceeds 0.5; the first residue to breach this threshold is denoted j. The segment (i,i+1,..., j−1) is then recorded as a candidate motif. This process repeats starting from j until the end of the IDR is reached.

      We perform this full procedure independently from both termini and designate the final motif as the intersection of the two candidate-motif sets. This bidirectional overlap strategy excludes terminal residues that might transiently satisfy the average-score criterion only due to adjacent low-scoring regions, thereby isolating the conserved core of each motif. All other residues—those not included in either directional pass—are classified as non-motif regions, minimizing peripheral artifacts.”
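
      The bidirectional procedure quoted above can be sketched in a few lines. This is a minimal reading of the description, not the authors' implementation: the function names are hypothetical, segments are returned as half-open 0-based intervals, no minimum motif length is imposed, and ties (a score or running average exactly equal to 0.5) are resolved by an arbitrary choice made here.

```python
def scan_motifs(scores, threshold=0.5):
    """One directional pass over per-residue ESM2 scores.

    A candidate starts at the first residue scoring below the threshold and
    is extended until adding the next residue would push the segment's
    average above the threshold. Returns half-open (start, end) intervals.
    """
    motifs = []
    n = len(scores)
    pos = 0
    while pos < n:
        if scores[pos] >= threshold:
            pos += 1
            continue
        start, end, total = pos, pos + 1, scores[pos]
        # Extend while the average including the next residue stays at or
        # below the threshold; the residue that breaches it is j.
        while end < n and (total + scores[end]) / (end + 1 - start) <= threshold:
            total += scores[end]
            end += 1
        motifs.append((start, end))
        pos = end  # resume the search from j
    return motifs


def bidirectional_motifs(scores, threshold=0.5):
    """Intersect candidates from the N-to-C and C-to-N passes."""
    n = len(scores)
    forward = scan_motifs(scores, threshold)
    # Scan the reversed sequence, then map intervals back to N-to-C indices.
    backward = [(n - e, n - s) for s, e in scan_motifs(scores[::-1], threshold)]
    cores = []
    for fs, fe in forward:
        for bs, be in backward:
            s, e = max(fs, bs), min(fe, be)
            if s < e:
                cores.append((s, e))
    return sorted(cores)


# Toy profile: the forward pass picks up a high-scoring residue at index 3,
# the backward pass picks one up at index 0; the intersection drops both.
toy_scores = [0.9, 0.2, 0.1, 0.9, 0.9, 0.3, 0.9]
forward_only = scan_motifs(toy_scores)
cores = bidirectional_motifs(toy_scores)
```

      In the toy profile the N-to-C pass yields residues 1 to 3 and the C-to-N pass yields residues 0 to 2; their overlap, residues 1 and 2, is the low-scoring core, and the trimmed residues correspond to what the response calls peripheral residues.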

      Accordingly, we have updated the Supplementary material: ESM2_motif_with_exp_ref.csv to list the newly identified motifs common to both the N-terminal and C-terminal searches. Minor changes were observed in the set of motifs, as discussed above, but these do not affect the main conclusions. Figures 5C, 5D, and S6 have been revised accordingly.

      Reviewer #2:

      Summary:

      Unfortunately, I do not believe that the results can be trusted. ESM2 has not been validated for IDRs through experiments. The authors themselves point out its little use in that context. In this study, they do not provide any further rationale for why this situation might have changed. Furthermore, they mention that experimental perturbations of the predicted motifs in in vivo studies may further elucidate their functional importance, but none of that is done here. That some of the motifs have been previously validated does not give any credibility to the use of ESM2 here, given that such systems were probably seen during the training of the model.

      We thank the reviewer for their detailed and thoughtful critique of our manuscript. We recognize the importance of careful model validation, especially in the context of IDRs, and appreciate the opportunity to clarify the scope and rationale of our study. Below, we respond point-by-point to the main concerns.

      (1) The use of ESM2 is not validated for IDRs, and the authors provide no rationale for its applicability in this context.

      We thank the reviewer for raising this important point.

      First, we emphasize that ESM2 is a probabilistic language model trained entirely on amino acid sequences, without any structural supervision. The model does not receive any input about protein structure — folded or disordered — during training. Instead, it learns to estimate the likelihood of each amino acid at a given position, conditioned on the surrounding sequence context. This makes ESM2 agnostic to whether a sequence is folded or disordered; the model’s capacity to identify patterns of residue usage arises solely from the statistics of natural sequences.

      As such, ESM2 is not inherently biased toward folded proteins, even though previous studies have demonstrated its usefulness in identifying conserved and functionally constrained residues in structured domains [3–9]. These findings support the broader utility of language models for uncovering evolutionary constraints — and by extension, suggest that similar signatures could exist in IDRs, particularly if they are under functional selection.

      Indeed, if certain residues or motifs in IDRs are conserved due to their importance in biological processes (e.g., phase separation), we would expect such selection to be reflected in sequence-based features, which ESM2 is designed to detect. The model’s applicability to IDRs, then, is a natural extension of its core probabilistic architecture.

      To further evaluate this, we carried out an independent in silico validation using multiple sequence alignments (MSAs). This analysis allowed us to compute the evolutionary conservation of individual amino acids without any reliance on ESM2. We then compared these conservation scores to ESM2 scores and found a strong correlation between the two. This provides direct, quantitative support for the idea that ESM2 is capturing biologically meaningful sequence constraints — even in disordered regions.

      While we agree that experimental testing would ultimately provide the most compelling validation, we believe that our MSA-based comparison constitutes a strong and arguably ideal computational validation of the model’s predictions. It offers an orthogonal measure of evolutionary pressure that confirms the biological plausibility of ESM2 scores.

      We added the following text in the introduction to highlight the applicability of ESM2 to IDRs.

      “While protein language models have been widely applied to structured proteins, it is important to emphasize that these models themselves are not inherently biased toward folded domains. For example, the Evolutionary Scale Model (ESM2) [cite] is trained as a probabilistic language model on raw protein sequences, without incorporating any structural or functional annotations. It operates by estimating the likelihood of observing a given amino acid at a particular position, conditioned on the entire surrounding sequence context. This unsupervised learning paradigm enables ESM2 to capture statistical patterns of residue usage and evolutionary constraints without relying on explicit structural information. Thus, the success of ESM2 in modeling fitness landscapes of folded proteins reflects the model’s ability to learn sequence-level constraints imposed by natural selection, a property that is equally applicable to IDRs if those regions are also under functional selection. Indeed, protein language models are increasingly being used to analyze variant effects in IDRs [cite].”

      (2) There is no experimental validation of the ESM2-based predictions in this study.

      We agree that experimental validation would provide definitive support for the utility of ESM2 in IDRs, and we explicitly state this as a limitation in the revised manuscript as quoted below.

      “Limitations: Despite the promising findings, our study has several limitations. Most notably, our analysis is purely computational, relying on ESM2-derived predictions and sequence-based conservation without accompanying experimental validation. While the strong correlation between ESM2 scores and evolutionary conservation provides compelling evidence that the identified motifs are functionally constrained, the precise biological roles of these motifs remain uncharacterized. ESM2 is well-suited for highlighting regions under selective pressure, but it does not provide mechanistic insights into how conserved motifs contribute to specific molecular functions such as phase separation, molecular recognition, or dynamic regulation. Determining these roles will require targeted experimental investigations, including mutagenesis and biophysical characterization.”

      In addition, we revised the manuscript title from “Protein Language Model Identifies Disordered, Conserved Motifs Driving Phase Separation” to “Protein Language Model Identifies Disordered, Conserved Motifs Implicated in Phase Separation”. This revision softens the original claim to better reflect the absence of direct experimental evidence for the motifs’ role in phase separation.

      However, we also emphasize that the goal of our study is not to claim definitive predictive power, but rather to explore whether ESM2-derived mutational profiles align with known biological features of IDRs — and in doing so, to generate new, testable hypotheses.

      In addition, while no in vivo experiments were performed, our study does include an in silico validation step, as detailed in the response to the previous comment. The strong correlation between ESM2 scores and conservation scores provides direct support for the utility of ESM2 in identifying residues under evolutionary constraint in disordered regions.

      (3) The overlap between predicted motifs and known ones may be due to training data leakage.

      We respectfully clarify that training data leakage is not possible in this case, as ESM2 is trained using unsupervised learning on raw protein sequences alone. The model has no access to experimental annotations, functional labels, or knowledge of which motifs are involved in phase separation. It only models statistical sequence patterns derived from evolutionarily observed proteins.

      Therefore, any agreement between ESM2-derived predictions and previously validated motifs arises not from memorization of experimental data, but from the model’s ability to learn meaningful sequence constraints from the natural distribution of proteins.

      (4) The authors should revamp the study with a testable predictive framework.

      We respectfully suggest that a full revamp is not necessary or appropriate in this context.

      As outlined in our previous responses, we believe that certain misunderstandings about the nature and capabilities of ESM2 may have influenced the reviewer’s assessment.

      Importantly, both Reviewer 1 and Reviewer 3 express strong support for the significance and novelty of this work, and recommend publication following minor revisions.

      In this context, we believe the manuscript provides a useful contribution as a first step toward understanding disordered regions using language models, and that it has value even in the absence of direct experimental testing. We have now better positioned the manuscript in this light, clarified limitations, and suggested concrete next steps for follow-up research.

      We hope these clarifications and revisions address the reviewer’s concerns, and we thank them again for helping us strengthen the framing, rigor, and clarity of our study.

      Reviewer #3:

      Summary:

      This is a very nice and interesting paper to read about motif conservation in protein sequences and mainly in IDR regions using the ESM2 language model. The topic of the paper is timely, with strong biological significance. The paper can be of great interest to the scientific community in the field of protein phase transitions and future applications using the ESM models. The ability of ESM2 to identify conserved motifs is crucial for disease prediction, as these regions may serve as potential drug targets. Therefore, I find these findings highly significant, and the authors strongly support them throughout the paper. The work motivates the scientific community towards further motif exploration related to diseases.

      Strengths:

      (1) Revealing conserved regions in IDRs by the ESM-2 language model.

      (2) Identification of functionally significant residues within protein sequences, especially in IDRs.

      (3) Findings supported by useful analyses.

      We appreciate the reviewer’s thoughtful words and support for our work.

      Weaknesses:

      (1) Lack of examples demonstrating the potential biological functions of these conserved regions.

      As detailed in the Response to Recommendation 6, we conducted additional analyses to connect the identified conserved regions with their biological functions.

      (2) Very limited discussion of potential future work and of limitations.

      We have substantially revised the Conclusions and Discussion section to provide a detailed analysis of the study’s limitations and to propose several directions for future research, as elaborated in our Response to Recommendation 5 below.

      Recommendation 1: The authors describe the ESM2 score such that lower scores are associated with conserved residues, stating that "lower scores indicate higher mutational constraint and reduced flexibility, implying that these residues are more likely essential for protein function, as they exhibit fewer permissible mutational states." However, when examining intrinsically disordered regions (IDRs), which are known to drive phase separation, I observe that the ESM2 score is relatively high (Figure 3C, pLDDT < 50, and Supplementary Figure S2). Could the authors clarify how this relatively high score aligns with the conservation of motifs that drive phase separation?

      We thank the reviewer for this insightful comment. We would like to clarify that most amino acids in IDRs are not conserved, even in IDRs that contribute to phase separation. Only a small set of amino acids in these IDRs, which we term motifs, are evolutionarily conserved and have low ESM2 scores. The ESM2 scores therefore exhibit a bimodal distribution with peaks at high and low values, as shown in Figures 4A and 4C of the manuscript. When averaged over all amino acids, the mean ESM2 scores plotted in Figure 3C are relatively high because non-conserved amino acids dominate the population.

      Recommendation 2: The authors mention: "We first analyzed the relationship between ESM2 and pLDDT scores for human Heterochromatin Protein 1 (HP1, residues 1-191)". I appreciate this example as a demonstration of amino acid conservation in IDRs. However, it is questionable whether the authors could provide some more examples to support amino acid conservation particularly within the IDRs along with lower ESM2 score (e.g, Could the authors provide some additional examples of "conserved disordered" regions in various proteins which are associated with relatively low ESM2 score as appear in Figure 2A).

      We thank the reviewer for this valuable suggestion. We would like to note that conserved residues in IDRs are prevalent, as indicated in Figures 2D and 3B. To further illustrate the prevalence of “conserved disordered” regions, we generated ESM2 versus pLDDT score plots for the full dMLO–hProt dataset (82 proteins) in Figure S2. In these plots, residues with pLDDT ≤ 70 are highlighted in blue to denote structural disorder (dMLO-hIDR), and disordered residues with an ESM2 score ≤ 1.5 are shown in purple to indicate conserved disordered segments.
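      The residue labeling described above can be sketched as follows. This is a minimal illustration that uses only the two thresholds quoted in the response (pLDDT ≤ 70 for disorder, ESM2 score ≤ 1.5 for conservation); the function name, the label strings, and the example values are hypothetical and are not taken from the manuscript or its dataset.

```python
from collections import Counter

# Thresholds quoted in the response; everything else here is illustrative.
PLDDT_DISORDER_MAX = 70.0
ESM2_CONSERVED_MAX = 1.5


def classify_residue(plddt: float, esm2_score: float) -> str:
    """Label one residue by structural order (pLDDT) and conservation (ESM2 score)."""
    if plddt > PLDDT_DISORDER_MAX:
        return "structured"
    if esm2_score <= ESM2_CONSERVED_MAX:
        return "conserved disordered"
    return "disordered"


# Hypothetical per-residue (pLDDT, ESM2 score) pairs, not real data.
residues = [(92.1, 0.4), (65.0, 1.2), (40.3, 2.8), (70.0, 1.5), (33.7, 3.1)]
labels = [classify_residue(plddt, score) for plddt, score in residues]
counts = Counter(labels)
```

      Both thresholds are treated as inclusive, matching the "≤" signs in the text, so a residue with pLDDT exactly 70 counts as disordered.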

      Recommendation 3: Could the authors plot a Violin conservation score plot for Figure 4A to emphasise the relationship between ESM2 scores and conservation scores of disordered residues?

      We thank the reviewer for this suggestion. We included a violin plot illustrating the distribution of conservation scores for disordered residues across all four IDR groups, shown in Author response image 5. Consistent with the findings in Figure 4A, the phase separation drivers (dMLO-hIDR and dMLO-IDR) exhibit a higher proportion of conserved amino acids compared to the client group (cMLO-hIDR).

      We also note that the nMLO-hIDR group may contain conserved residues due to functions unrelated to MLO formation, which could contribute to the higher observed levels of conservation in this group.

      Author response image 5.

      Violin plots illustrating the distribution of conservation scores for disordered residues across the nMLO–hIDR, cMLO–hIDR, dMLO–hIDR, and dMLO–IDR datasets. Pairwise statistical comparisons were conducted using two-sided Mann–Whitney U tests on the conservation score distributions (null hypothesis: the two groups have equal medians). P-values indicate the probability, under the null hypothesis, of observing rank differences at least as large as those observed. Statistical significance is denoted as follows: ***: p < 0.001; **: p < 0.01; *: p < 0.05.
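      As a rough sketch of the comparison described in this caption, the code below computes the Mann–Whitney U statistic by direct pair counting and maps a p-value to the star notation used above. It is an illustration under stated assumptions, not the authors' analysis code: the function names are hypothetical, the "ns" label for non-significant results is an added convention, and in practice the p-value itself would come from a statistics library (for example `scipy.stats.mannwhitneyu` with `alternative="two-sided"`).

```python
def mann_whitney_u(x, y):
    """U statistic of sample x versus y: pairs (xi, yj) with xi > yj, ties counted as 0.5."""
    u = 0.0
    for xi in x:
        for yj in y:
            if xi > yj:
                u += 1.0
            elif xi == yj:
                u += 0.5
    return u


def significance_stars(p_value):
    """Map a p-value to the caption's notation; 'ns' is an assumed extra label."""
    if p_value < 0.001:
        return "***"
    if p_value < 0.01:
        return "**"
    if p_value < 0.05:
        return "*"
    return "ns"


# Hypothetical conservation scores for two groups, not real data.
group_a = [0.9, 0.8, 0.7]
group_b = [0.3, 0.2, 0.7]
u_ab = mann_whitney_u(group_a, group_b)
u_ba = mann_whitney_u(group_b, group_a)
```

      The two U statistics always sum to the product of the sample sizes, which is a quick sanity check on any implementation.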

      Recommendation 4: It will be appreciated if the authors could add to Figure 4 Violin plots, a statistical comparison between the different groups.

      We thank the reviewer for this valuable suggestion. We included the p-values for Figures 4A and 4C to quantify the statistical significance of differences in the distributions.

      Most comparisons are highly significant (p < 0.001), while the largest p-value (p = 0.089) between the conservation score of driving and non-participating groups (Figure 4C) still suggests a marginally significant trend.

      Recommendation 5: Could the authors expand more on potential future research directions using ESM2, given its usefulness in identifying conserved motifs? Specifically, how do the authors envision conserved motifs will contribute to future discoveries/applications/models using ESM (e.g, discuss the importance of conserved motifs, especially in IDRs motifs, in protein phase transition prediction in relation to diseases).

      We thank the reviewer for this insightful comment. To further assess the functional relevance of the conserved motifs, we incorporated pathogenic variant data from ClinVar [10, 11] to evaluate mutational impacts. As shown in Figure S12A and B, a substantial number of pathogenic variants in MLO-hProt proteins are associated with low ESM2 LLR values. This pattern holds for both folded and disordered residues.

      Moreover, we observed that variants located within motifs are more frequently pathogenic compared to those outside motifs (Figure S12C). In the main text, motifs were defined only for driver proteins; however, the available variant data for this subset are limited (6 data points). To improve statistical power, we extended motif identification to include both client and driver human proteins, following the same methodology described in the main text. Consistent with previous findings, variants within motifs in this expanded set are also more likely to be pathogenic. These results further support the functional importance of both low ESM2-scoring residues and the conserved motifs in which they reside.

      The following text was added in the Discussion section of the manuscript to discuss these results and outline future research directions.

      “Several promising directions could extend this work, both to refine our mechanistic understanding and to explore clinical relevance. One avenue is testing the hypothesis that conserved motifs in scaffold proteins act as functional stickers, mediating strong intermolecular interactions. This could be evaluated computationally via free energy calculations or experimentally via interaction assays. Deletion of such motifs in client proteins may also reduce their partitioning into condensates, illuminating their roles in molecular recruitment.

      To explore potential clinical implications, we analyzed pathogenicity data from ClinVar [10, 11]. As shown in Figure S12A, single-point mutations with low LLR values, which are indicative of constrained residues, are enriched among clinically reported pathogenic variants, while benign variants typically exhibit higher LLR values. Moreover, mutations within conserved motifs are significantly more likely to be pathogenic than those in non-motif regions (Figure S12B). These findings highlight the potential of ESM2 as a first-pass screening tool for identifying clinically relevant residues and suggest that the conserved motifs described here may serve as priorities for future studies, both mechanistic and therapeutic.”

      Moreover, the functional significance of conserved motifs, particularly their implications in disease and pathology, warrants further investigation. As an initial analysis, we incorporated ClinVar pathogenic variant data [citation] to assess mutational effects within our datasets. As illustrated in Figure R12A, single-point mutations with low LLR values are enriched among clinically reported pathogenic variants, whereas benign variants are more commonly associated with higher LLR values. Notably, mutations within conserved motifs are substantially more likely to be pathogenic compared to those in non-motif regions. These findings highlight the potential of ESM2 as a first-pass tool for identifying residues of clinical relevance. The conserved motifs identified here may be prioritized in future studies aimed at elucidating their biological roles and evaluating their viability as therapeutic targets.

      Recommendation 6: The authors mention: "Our findings provide strong evidence for evolutionary pressures acting on specific IDRs to preserve their roles in scaffolding phase separation mechanisms, emphasizing the functional importance of entire motifs rather than individual residues in MLO formation." They also present a word cloud of functional motifs in Figure 5D. Although it makes sense that evolutionarily conserved motifs, especially within the IDRs regions, act as functional units, I think there is no direct evidence for such functionality (e.g., examples of biological pathways associated with IDRs and phase separation). Hence, there is no justification to write in the figure caption: "ESM2 Identifies Functional Motifs in driving IDRs" unless the authors provide some examples of such functionality. This will even make the paper stronger by establishing a clear connection to biological pathways, and hence these motifs can serve as potential drug targets.

      We thank the reviewer for this insightful suggestion. We have replaced “functional motifs” with “conserved motifs” in the figure caption.

      Identifying the precise biological pathways associated with the conserved motifs is a complex task and a comprehensive investigation lies beyond the scope of this study. Nonetheless, as an initial effort, we explored the potential functions of these motifs using annotations available in DisProt (https://disprot.org/).

      DisProt is the leading manually curated database dedicated to IDPs, providing both structural and functional annotations. Expert curators compile experimentally validated data, including definitions of disordered regions, associated functional terms, and supporting literature references. Author response image 6 presents a representative DisProt entry for DNA topoisomerase 1 (UniProt ID: P11387), illustrating its structural and biological annotation.

      For each motif, we located the corresponding DisProt entry and assigned a functional annotation based on the annotated IDR from which the motif originates. We emphasize that this functional assignment should be regarded as an approximation. Because experimental annotations often pertain to the entire IDR, regions outside the motif may also contribute to the reported function.

      Nevertheless, the annotations provide valuable insights.

      Author response image 6.

      Screenshot of information provided by the DisProt database. Detailed annotations of biological functions and structural features, along with experimental references, are accessible via mouse click.

      Approximately 50% of ESM2-predicted IDR motifs lack functional annotations. Among those that are annotated, motifs from the dMLO-IDR dataset are predominantly associated with “molecular condensate scaffold activity,” followed by various biomolecular binding functions (Author response image 7A). These findings support the role of these motifs in MLO formation.

      For comparison, we applied the same identification procedure (described in Methods: Motif Identification) to motifs from the nMLO-hIDR dataset. In contrast to the dMLO-IDR motifs, these exhibit a broader range of annotated functions related to diverse cellular processes. Collectively, these results suggest that motifs identified by ESM2 are aligned with biologically relevant functions captured in current databases.

      Finally, as illustrated in Figure S12 and discussed in the Response to Recommendation 5, variants occurring within identified motifs are more likely to be pathogenic than those in non-motif regions, further underscoring their functional importance.

      Author response image 7.

      Biological functions of ESM2-predicted motifs. (A) Distribution of biological functions associated with all identified motifs from dMLO-IDR driving groups. (B) Distribution of biological functions associated with all identified motifs from nMLO-hIDR groups.

      Recommendation 7: In Figure 2C the authors present FE (I assume this is free energy), some discussion about the difference in the free energy referring to the "a" region is missing (i.e. both "Folded" and "Disordered" regions are associated with low ESM score but with low and high free energy (FE), respectively.

      We thank the reviewer for the comments. FE indeed abbreviates free energy. To improve clarity and avoid confusion, we have updated all figure captions by replacing “FE” with “−logP” to explicitly denote the logarithm of probability in the contour density plots.

      We used “a” in Figures 2C and 2D to refer to regions with low ESM2 scores, which appear as a local minimum in both plots. Since most residues in folded regions are conserved, region a has lower free energy than region b in Figure 2C. In contrast, because most residues in disordered regions are not conserved, as we elaborated in the Response to Recommendation 1, region a has a lower population and therefore a higher free energy than region b in Figure 2D.

      To avoid confusion, we have replaced “a” and “b” in Figure 2D with “I” and “II”.
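      The link between population and −logP in the explanation above can be illustrated with a small calculation. The sketch below is only an example under stated assumptions: the residue counts are invented, the region labels follow the response, and the natural logarithm is assumed because the response does not state the base.

```python
import math


def neg_log_prob(counts):
    """Convert per-region residue counts to -ln(P), where P is the region's population fraction."""
    total = sum(counts.values())
    return {region: -math.log(n / total) for region, n in counts.items() if n > 0}


# Hypothetical counts: region "a" has low ESM2 scores, region "b" has high ESM2 scores.
folded_counts = {"a": 800, "b": 200}      # mostly conserved residues
disordered_counts = {"a": 100, "b": 900}  # mostly non-conserved residues

folded = neg_log_prob(folded_counts)
disordered = neg_log_prob(disordered_counts)
```

      With these made-up counts, region a has the lower −logP in the folded case and the higher −logP in the disordered case, mirroring the trend described for Figures 2C and 2D.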

      Recommendation 8: Figure S2: It would be useful to plot the same figure for structured and disordered regions as well.

      We are not certain we fully understood this comment, as we believe the requested analysis has already been addressed. In Figure S2, we used the AlphaFold2 pLDDT score to represent the structural continuum of different protein regions, where residues with pLDDT > 70 (red and light-red bars) are classified as structured, while those with pLDDT ≤ 70 (blue and light-blue bars) are classified as disordered.

      Minor suggestion 1: Could the authors clarify the meaning of the abbreviation "FE" in the colorbar of the contour line? I assume this is free energy.

      We have updated all contour density plot figure captions by replacing “FE” with “−logP” to explicitly denote the logarithm of probability.

      Minor suggestion 2: In Figure 2A - do the authors mean "Conserved folded" instead of just "Folded"? If so, could the authors indicate this?

      We thank the reviewer for this comment. The ESM2 scores indeed suggest that, within folded regions, there may be multiple distinct groups exhibiting varying degrees of evolutionary conservation. However, as our primary focus is on IDRs, we chose not to investigate these distinctions further.

      Figure 2A illustrates a randomly selected folded region based on AlphaFold2 pLDDT scores.

      References

      (1) Ruff, K. M.; Pappu, R. V. AlphaFold and Implications for Intrinsically Disordered Proteins. Journal of Molecular Biology 2021, 433, 167208.

      (2) Alderson, T. R.; Pritišanac, I.; Kolarić, Đ.; Moses, A. M.; Forman-Kay, J. D. Systematic Identification of Conditionally Folded Intrinsically Disordered Regions by AlphaFold2. Proceedings of the National Academy of Sciences of the United States of America 2023, 120, e2304302120.

      (3) Brandes, N.; Goldman, G.; Wang, C. H.; Ye, C. J.; Ntranos, V. Genome-Wide Prediction of Disease Variant Effects with a Deep Protein Language Model. Nature Genetics 2023, 55, 1512–1522.

      (4) Lin, Z. et al. Evolutionary-Scale Prediction of Atomic-Level Protein Structure with a Language Model. 2023.

      (5) Zeng, W.; Dou, Y.; Pan, L.; Xu, L.; Peng, S. Improving Prediction Performance of General Protein Language Model by Domain-Adaptive Pretraining on DNA-binding Protein. Nature Communications 2024, 15, 7838.

      (6) Gong, J. et al. THPLM: A Sequence-Based Deep Learning Framework for Protein Stability Changes Prediction upon Point Variations Using Pretrained Protein Language Model. Bioinformatics 2023, 39, btad646.

      (7) Lin, W.; Wells, J.; Wang, Z.; Orengo, C.; Martin, A. C. R. Enhancing Missense Variant Pathogenicity Prediction with Protein Language Models Using VariPred. Scientific Reports 2024, 14, 8136.

      (8) Saadat, A.; Fellay, J. Fine-Tuning the ESM2 Protein Language Model to Understand the Functional Impact of Missense Variants. Computational and Structural Biotechnology Journal 2025, 27, 2199–2207.

      (9) Chu, S. K. S.; Narang, K.; Siegel, J. B. Protein Stability Prediction by Fine-Tuning a Protein Language Model on a Mega-Scale Dataset. PLOS Computational Biology 2024, 20, e1012248.

      (10) Landrum, M. J.; Lee, J. M.; Riley, G. R.; Jang, W.; Rubinstein, W. S.; Church, D. M.; Maglott, D. R. ClinVar: Public Archive of Relationships among Sequence Variation and Human Phenotype. Nucleic Acids Research 2014, 42, D980–D985.

      (11) Landrum, M. J. et al. ClinVar: Improving Access to Variant Interpretations and Supporting Evidence. Nucleic Acids Research 2018, 46, D1062–D1067.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer 1:

      (1) A major issue throughout the paper is that Hox expression analysis is done exclusively through quantitative PCR, with values ranging from 2-fold to several thousand-fold upregulation, with no antibody validation for any Hox protein (presumably they are all upregulated).

      Thank you for your comment.

      We tried to verify the stimulated Hox expression pattern by in situ hybridization. In early embryos (E9.5), we could clearly detect Hox expression patterns (i.e., Hox8 and Hox9 in Author response image 1) in the neural tube by whole-mount in situ hybridization. However, we failed to detect a clear pattern in the brainstem at E18.5, either in whole-mount tissue or on sections. That is one reason why we turned to single-nucleus RNA-seq instead.

      This is likely due to the low expression levels of Hox genes at late developmental stages, which require a more sensitive detection method. However, we estimate that the stimulated expression levels of the representative Hox genes are at least comparable to the physiological levels in the posterior spinal cord and are thus sufficient to evoke a functional effect.

      Author response image 1.

      Hox8 and Hox9 expression patterns in E9.5 embryos.

      (2) In Figure 1, massive upregulation of most Hox genes in the brainstem is shown after e16.5 but the paper quickly focuses on analysis of PN nuclei. What are the other consequences of this broad upregulation of Hox genes in the brainstem? There is no discussion of the overall phenotype of the mice, the structure of the brainstem, the migration of neurons, etc. The very narrow focus on motor cortex projections to PN nuclei seems bizarre without broad characterization of the mice, and the brainstem in particular. There is only a mention of "severe motor deficits" from previous studies, but given the broad expression of Rnf220, the fact that is a global knockout, and the effects on spinal cord populations shown previously the justification for focusing on PN nuclei does not seem strong.

      Thank you for your comment.

      Although RNF220 is important for dorsal-ventral patterning of the spinal cord and the hindbrain during embryonic development, early neural patterning and differentiation are normal in Rnf220+/- mice (Wang et al., 2022). However, these mice showed reduced postnatal survival and motility to varying degrees (Ma et al., 2019; Ma et al., 2021), likely suggesting a dosage-dependent role of RNF220 in maintaining late neural development. Because our microarray assay showed deregulation of the Hox genes in the brain, we followed this direction in this study and narrowed the affected region down to the pons. Our single-nucleus RNA-seq (snRNA-seq) data further show that the Hox deregulation occurred mainly in 3 clusters of neurons. However, the pons is complex and contains tens of nuclei, and the current resolution of our data does not allow us to assign a clear identity to each of them. Although more nuclei are likely affected, the PN (cluster 7) is the only cluster that we could identify and follow in the current study.

      As to the general effect of RNF220 haploinsufficiency on the brainstem, we carried out Nissl staining assays and found no clear difference in neuronal cell organization between WT and Rnf220+/- pons (revised Figure 2-figure supplement 2).

      (3) It is stated that cluster 7 in scRNA-seq corresponds to the PN nuclei. The modest effect shown on Hox3-5 expression in that data in Figure 1 is inconsistent with the larger effect shown in Figure 2.

      Thank you for your comment.

      Owing to the low capture efficiency and limited sequencing depth of snRNA-seq, the quantification of Hox expression based on the snRNA-seq data is likely less accurate than that based on qRT-PCR. In addition, only nuclear mRNAs are captured by snRNA-seq, whereas mRNAs from both the nucleus and the cytoplasm were reverse-transcribed and examined in the qRT-PCR assays in Figure 2A.

      (4) Presumably, Hox genes are not the only targets of Rnf220 as shown in the microarray/RNA-sequencing data. There is no definitive evidence that any phenotypes observed (which are also not clear) are specifically due to Hox upregulation. The only assay the authors use to look at a Hox-dependent phenotype in the brainstem is the targeting of PN nuclei by motor cortex axons. This is only done in 2 animals and there are no details as to how the data was analyzed and quantified. The only 2 images shown are not convincing of a strong phenotype, they could be taken at slightly different levels or angles. At the very least, serial sections should be shown and the experiment repeated in more animals. There is also no discussion of how these phenotypes, if real, would relate to previous work by the Rijli group which showed very precise mechanisms of synaptic specificity in this system.

      Thank you for your comments and suggestions.

      The deregulation of Hox genes is the most obvious phenomenon observed in the RNA-seq data, and we tried to assign its specific phenotypic effect in this study. As the roles of Hox genes in PN patterning and circuit formation are well established, we focused on the PN in the subsequent experiments. Based on the literature, we carried out circuit analysis to examine the targeting of PN neurons by motor cortex axons. An additional cohort of animals of both genotypes (n=10 for WT and n=9 for Rnf220+/-) was used to repeat the experiment, and we reached the same conclusion. More detailed information on data analysis and serial images was included in the revised manuscript and figure legends.

      (5) The temporal aspect of this regulation in vivo is not clear. The authors show some expression changes begin at e16.5 but are also present at 2 months. Is the presumed effect on neural circuits a result of developmental upregulation at late embryonic stages or does the continuous overexpression in adult mice have additional influence? Are any of the Hox genes upregulated normally expressed in the brainstem, or PN specifically, at 2 months? Why perform single-cell sequencing experiments at 2 months if this is thought to be mostly a developmental effect? Similarly, the significance of the upregulated WRD5 in the pons and pontine nuclei at 2 months in Figure 3 is not clear.

      Thank you for your comment.

      The spatial and temporal expression pattern of Hox genes is established at early embryonic stages and then maintained throughout development in mammals. As we have shown, the de-repression of Hox genes in Rnf220+/- mice is a long-lasting defect that begins at late embryonic stages. Since the neuronal circuit is established after birth in mice, we speculate that the circuit defects from motor cortex to PN neurons are due to the long-lasting up-regulation of Hox genes in PN neurons. We could not distinguish whether the effect on the neural circuit results from developmental up-regulation of Hox genes or from their continuous overexpression in adult mice. An inducible knockout mouse model may help to answer this question in the future. A discussion of this point was included in the revised manuscript.

      We carried out snRNA-seq analysis using pons tissues from adult mice aiming to identify the specific cell population with Hox up-regulation, which we failed to specify by in situ hybridization.

      We repeated the related experiments in the original Figure 3 and some of the blot images were replaced and quantified.

      (6) In Figure 3C, the levels of RNF220 in wt and het don't seem to be that different.

      We repeated the experiments and changed the related image in the revised Figure 3C.

      (7) Based on the single-cell experiments, and the PN nuclei focus, the rescue experiments are confusing. If the Rnf220 deletion has a sustained effect for up to 2 months, why do the injections in utero? If the focus is the PN nuclei why look at Hox9 expression and not Hox3-5 which are the only Hox genes upregulated in PN based on sc-sequencing? No rescue of behavior or any phenotype other than Hox expression by qPCR is shown and it is unclear whether upregulation of Hox9 paralogs leads to any defects in the first place. The switch to the Nes-cre driver is not explained. Also, it seems that wdr5 mRNA levels are not so relevant and protein levels should be shown instead (same for rescue experiments in P19 cells).

      Thank you for your comments.

      Since our data suggest that the upregulation of Hox gene expression is a long-lasting effect beginning at the late embryonic stage of E16.5, we conducted the rescue experiments by in utero injection of a WDR5 inhibitor at E15.5 and examined the expression of Hox genes at E18.5. Although it would also be necessary to examine whether the rescue by WDR5 inhibitor injection persists to adult stages, it is difficult to identify the injected embryos among the pups after birth. As a supplement, rescue assays with genetic ablation of the Wdr5 gene were conducted, and the results showed that ablation of a single Wdr5 allele could reverse the upregulation of Hox genes caused by RNF220 haploinsufficiency in the hindbrain at P15.

      Most of the upregulated Hox genes including both Hox9 and Hox3-5 were examined in our rescue experiments. Since this study focuses on the PN nuclei, the results of Hox3-5 genes were shown in the revised main Figure 6.

      We conducted the rescue experiments by deleting Wdr5 in neural tissue using Nestin-Cre mice because Wdr5+/- mice are embryonic lethal. The up-regulation of Hox genes was also observed in the hindbrains of Rnf220fl/wt; Nestin-Cre mice. Although Rnf220fl/wt; Wdr5fl/wt; Nestin-Cre mice are viable and survive to adult stages, they show developmental defects in the forebrain, including the cerebral cortex and hippocampus. Therefore, no behavioral rescue tests were conducted in this study. We believe that the role of WDR5 in forebrain development is beyond the scope of this study.

      The potential defects due to the up-regulation of Hox9 paralogs await further investigation.

      Wdr5 mRNA levels were first examined to confirm the genetic deletion or siRNA-mediated knockdown of Wdr5. We have since carried out western blots to examine WDR5 protein levels, and the results were included in the revised Figure 3.

      (8) What is the relationship between Retinoic acid and WRD5? In Figure 3E there is no change in WRD5 levels without RA treatment in Rnf KO but an increase in expression with RA treatment and Rnf KO. However, the levels of WRD5 do not seem to change with RA treatment alone. Does Rnf220 only mediate WDR5 degradation in the presence of RA? This does not seem to be the case in experiments in 293 cells in Figure 4.

      Thank you for your comment.

      We believe that the regulation of WDR5 and Hox expression by RNF220 is context dependent and precisely controlled in vivo, depending on the molecular and epigenetic status of the cell, a status that RA treatment provides in P19 cells. In Figure 4, the experiments are based on exogenous overexpression assays, which might not fully reflect the situation in vivo.

      (9) Why are the levels of Hox upregulation after RA treatment so different in Figure 5 and Figure Supplement 5?

      In Figure 5C, the Hox expression levels were normalized against the control group in the presence of RA, whereas in Figure Supplement 5 they were normalized to the control group without RA treatment.
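      The effect of the reference group on the reported fold change can be seen in a small worked example. The numbers below are invented purely for illustration and do not come from the manuscript; the condition names and the `fold_change` helper are likewise hypothetical.

```python
def fold_change(expression, reference):
    """Express every condition relative to the chosen reference condition."""
    ref_value = expression[reference]
    return {condition: value / ref_value for condition, value in expression.items()}


# Hypothetical relative Hox transcript levels (arbitrary units), not real data.
expression = {
    "control -RA": 1.0,
    "control +RA": 50.0,
    "Rnf220 KD +RA": 150.0,
}

# Normalized to the RA-treated control (the scheme described for Figure 5C).
vs_control_plus_ra = fold_change(expression, "control +RA")
# Normalized to the untreated control (the scheme described for Figure Supplement 5).
vs_control_minus_ra = fold_change(expression, "control -RA")
```

      In this made-up case the same knockdown sample reads as a 3-fold increase against the RA-treated control but a 150-fold increase against the untreated control, which is how one measurement can appear very different across two figures.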

      (10) In Figures 4B+C which lanes are input and which are IP? There is no quantitation of Figure 4D, from the blot it does look that there is a reduction in the last 2 columns as well. The band in the WT flag lane seems to have a bubble. Need to quantitate band intensities. Same for E, the effect does not seem to be completely reversed with MG132.

      Thanks for pointing this out. The labels were included in the revised Figure 4B and 4C.

      We repeated the experiments for Figure 4D and 4E. Some of the blot images were replaced and quantified in the revised Figure 4D and 4E.

      Reviewer 2:

      (1) Figure 1E shows that Rnf220 knockdown alone could not induce an increase in Hox expression without RA, which indicates that Rnf220 might endogenously upregulate Retinoic acid signaling. The authors should test if RA signaling is downstream of Rnf220 by looking at differences in the expression of Retinaldehyde dehydrogenase genes (as a proxy for RA synthesis) upon Rnf220 knockdown.

      Thank you for your comment and suggestion.

      Two sequential reactions are required for RA synthesis from retinol; they are catalyzed by alcohol dehydrogenases (ADHs)/retinol dehydrogenases (RDHs) and retinaldehyde dehydrogenases (RALDHs, also known as ALDHs), respectively. When RA is no longer needed, it is catabolized by cytochrome enzymes (CYP26 enzymes) (Niederreither et al., 2008; Kedishvili et al., 2016). Here, we tested the expression of ADHs, ALDHs and CYP26 enzymes in E16.5 WT and Rnf220-/- embryos.

      The results are as follows. ADH7 and ADH10 are slightly upregulated. ALDH1 and ALDH3 are upregulated and downregulated in Rnf220-/- embryos, respectively, but there is no significant change in the expression of ALDH2, which plays a key role in RA synthesis during embryonic development (Niederreither et al., 2008). Furthermore, Cyp26a1, which is responsible for RA catabolism, was upregulated in Rnf220-/- embryos. Collectively, these data do not support a clear effect of RNF220 on RA signaling.

      Author response image 2.

      The effect of Rnf220 on RA synthesis and degradation pathways

      (2) In Figure 2C-D further explanation is required to describe what criteria were used to segment the tissue into Rostral, middle, and caudal regions. Additionally, it is unclear whether the observed change in axonal projection pattern is caused due to physical deformation and rearrangement of the entire Pons tissue or due to disruption of Hox3-5 expression levels. Labeling of the tissue with DAPI or brightfield image to show the structural differences and similarities between the brain regions of WT and Rnf220 +/- will be helpful.

      Thank you for your comment and suggestion.

      More information on the quantification of the results shown in Figure 2C-D was included in our revised manuscript. We carried out Nissl staining assays using coronal sections of the brainstem and found that there is no significant difference in neuronal cell organization between WT and Rnf220+/- (revised Figure 2-figure supplement 2).

      (3) Line 192-195. These roles of PcG and trxG complexes are inconsistent with their initial descriptions in the text - lines 73-74.

      We are sorry for the mistake. We have carefully revised the related descriptions to avoid such mistakes. Thank you.

      (4) In Figure 4D, the band in the gel seems unclear and erased. Please provide a different example for the same data which shows a full band with lower intensity. These data show that neither Rnf220 nor wdr5 directly regulates Hox gene expressions. The effect of double knockdown in the presence of RA suggests that they work together to suppress Hox gene expression via a different downstream target. This point should be addressed in the text and discussion section of the paper.

      Thank you for your suggestion.

      We repeated the experiment of Figure 4D and some of the blot images were replaced in the revised Figure 4D.

      Indeed, in the presence of RA, knockdown of Rnf220 alone can upregulate the expression of Hox genes (Figure 5C). Knockdown of Wdr5 reversed the upregulation of Hox genes in Rnf220 knockdown cells, suggesting that Rnf220 regulates Hox gene expression in a Wdr5-dependent manner. However, in the absence of RA, neither single knockdown of Rnf220 or Wdr5 nor double knockdown of both had a significant effect on the expression of Hox genes in P19 cells. It seems that RA signaling plays a crucial role in the regulation of WDR5 by RNF220 in P19 cells, and a discussion of this point was included in the revised manuscript.

      (5) In Figure 4G the authors could provide some form of quantitation for changes in ubiquitination levels to make it easier for the reader. They should also describe the experimental procedures and conditions used for each of the pull-down and ubiquitination assays in greater detail in the methods section.

      Thank you for your suggestion.

      The quantitation and statistics for the original Figure 4G were included in the revised Figure 4. More information on the biochemical assays was included in the “Methods and Materials” section of our revised manuscript.

      (6) Figure 5 shows that neither Rnf220 nor wdr5 directly regulate Hox gene expressions. The effect of double knockdown in the presence of RA suggests that they work together to suppress Hox gene expression via a different downstream target.

      Thank you for your comment.

      In fact, knockdown of Rnf220 alone can upregulate the expression of Hox genes in the presence of RA (Figure 5C). Furthermore, knockdown of Wdr5 reversed the upregulation of Hox genes in Rnf220 knockdown cells, which suggests that Rnf220 regulates Hox gene expression in a Wdr5-dependent manner. However, in the absence of RA, neither single knockdown of Rnf220 or Wdr5 nor double knockdown of both had a significant effect on the expression of Hox genes in P19 cells. It seems that RA signaling plays a crucial role in the regulation of WDR5 by RNF220 in P19 cells, and a discussion of this point was included in the revised manuscript.

      (7) In Figure 6, while the reversal of changes in Hox gene expression upon concurrent Rnf220; Wdr5 inhibition highlights the importance of Wdr5 in this regulatory process, the mechanistic role of wdr5 and its functional consequences are unclear. To answer these questions, the authors need to: (i) Assay for activated and repressive epigenetic modifications upon double knockdown of Rnf220 and Wdr5 similar to that shown in Figure 3- supplement 1. This will reveal if wdr5 functions according to its intended role as part of the TrxG complex. (ii) The authors need to assay for changes in axon projection patterns in the double knockdown condition to see if Wdr5 inhibition rescues the neural circuit defects in Rnf220 +/- mice.

      Thank you for your suggestion.

      Although it is also necessary to examine whether the rescue effect of in utero WDR5 inhibitor injection persists in the neuronal circuit at adult stages, it is difficult to distinguish the embryos or pups after they are born. Although Rnf220fl/wt;Wdr5fl/wt;Nestin-Cre mice are viable and survive to adult stages, they show developmental defects in the forebrain, including the cerebral cortex and hippocampus. Therefore, rescue of the behavioral and neuronal circuit defects was not examined in this study. A PN nuclei-specific inducible Cre mouse line could help address this question in the future.

      We carried out ChIP-qPCR to test activating and repressive epigenetic modifications upon double knockdown of Rnf220 and Wdr5 in the P19 cell line, and found that Rnf220 and Wdr5 double knockdown rescued Hox epigenetic modifications to a certain degree (Figure 6-figure supplement 1).

      References

      Kedishvili, N.Y. 2016. Retinoic acid synthesis and degradation. Subcell Biochem, 81:127-161. DOI: 10.1007/978-94-024-0945-1_5, PMID: 2783050

      Ma, P., Li, Y., Wang, H., Mao, B., Luo, Z.-G. 2021. Haploinsufficiency of the TDP43 ubiquitin E3 ligase RNF220 leads to ALS-like motor neuron defects in the mouse. Journal of Molecular Cell Biology, 13: 374-382. DOI: 10.1093/jmcb/mjaa072, PMID: 33386850

      Ma, P., Song, N.-N., Li, Y., Zhang, Q., Zhang, L., Zhang, L., Kong, Q., Ma, L., Yang, X., Ren, B., Li, C., Zhao, X., Li, Y., Xu, Y., Gao, X., Ding, Y.-Q., Mao, B. 2019. Fine-Tuning of Shh/Gli Signaling Gradient by Non-proteolytic Ubiquitination during Neural Patterning. Cell Rep, 28: 541-553.e544. DOI: 10.1016/j.celrep.2019.06.017, PMID: 31291587

      Niederreither, K., Dollé, P. 2008. Retinoic acid in development: towards an integrated view. Nat Rev Genet, 9: 541-53. DOI: 10.1038/nrg2340, PMID: 18542081

      Wang, Y.-B., Song, N.-N., Zhang, L., Ma, P., Chen, J.-Y., Huang, Y., Hu, L., Mao, B., Ding, Y.-Q. 2022. Rnf220 is Implicated in the Dorsoventral Patterning of the Hindbrain Neural Tube in Mice. Front Cell Dev Biol, 10. DOI: 10.3389/fcell.2022.831365, PMID: 35399523

    1. Author Response

      The following is the authors’ response to the original reviews.

      Thank you for the thoughtful consideration of our work, including both reviewers’ constructive comments. Our apologies for taking some extra time for this revision, but we wanted to address comments thoroughly with new analyses, not to mention a PhD defense, parental leave and my teaching ultimately being the bottleneck for the team’s work!

      Reviewer #1 (Public Review):

      The authors use a combination of structural and MD simulation approaches to characterize phospholipid interactions with the pentameric ligand-gated ion channel, GLIC. By analyzing the MD simulation data using clusters of closed and open states derived previously, the authors also seek to compare lipid interactions between putative functional states. The ultimate goal of this work is to understand how lipids shape the structure and function of this channel.

      The strengths of this article include the following:

      1) The MD simulation data provide extensive sampling of lipid interactions in GLIC, and these interactions were characterized in putative closed and open states of the channel. The extensive sampling permits confident delineation of 5-6 phospholipid interaction sites per subunit. The agreement in phospholipid binding poses between structures and the all-atom MD simulations supports the utility of MD simulations to examine lipid interactions.

      2) The study presents phospholipid binding sites/poses that agree with functionally-important lipid binding sites in other pLGICs, supporting the notion that these sites are conserved. For example, the authors identify interactions of POPC at an outer leaflet intersubunit site that is specific for the open state. This result is quite interesting as phospholipids or drugs that positively modulate other pLGICs are known to occupy this site. Also, the effect of mutating W217 in the inner leaflet intersubunit site suggests that this residue, which is highly conserved in pLGICs, is an important determinant of the strength of phospholipid interactions at this site. This residue has been shown to interact with phospholipids in other pLGICs and forms the binding site of potentiating neurosteroids in the GABA(A) receptor.

      Weaknesses of this article include the following:

      1) The authors describe in detail state-dependent lipid interactions from the MD simulations; however, the functional significance of these findings is unclear. GLIC function appears to be insensitive to lipids, although this understanding is based on experiments where GLIC proteoliposomes were fused to oocyte membranes, which may not be optimal to control the lipid environment. Without functional studies of GLIC in model membranes, the lipid dependence of GLIC function is not definitively known. Therefore, it is difficult to interpret the meaning of these state-dependent lipid interactions in GLIC.

      2) It is unlikely that the bound phospholipids in the GLIC structures, which are co-purified from E. coli membranes, are POPC. Rather, these are most likely PE or PG lipids. While it is difficult to accommodate mixed phospholipid membranes in all-atom MD simulations, the choice of POPC for this model, while practically convenient, seems suboptimal, especially since it is not known if PE or PG lipids modulate GLIC function. Nevertheless, it is striking that the overall binding poses of POPC from the simulations agree with those identified in the structures. It is possible that the identity of the phospholipid headgroup will have more of an impact on the strength of interactions with GLIC rather than the interaction poses (see next point).

      3) The all-atom MD simulations provide limited insight into the strength of the POPC interactions at each site, which is important to interpret the significance of these interactions. It is unlikely that the system has equilibrated within the 1.7 microseconds of simulation for each replicate preventing a meaningful assessment of the lipid interaction times. Although the authors report exchange of up to 4 POPC interacting at certain residues in M4, this may not represent binding/unbinding events (depending on how binding/interaction is defined), since the 4 Å cutoff distance for lipid interactions is relatively small. This may instead be a result of small movements of POPC in and out of this cutoff. The ability to assess interaction times may have been strengthened if the authors performed a single extended replicate up to, for example, 10-20 microseconds instead of extending multiple replicates to 1.7 microseconds.

      Reviewer #2 (Public Review):

      The authors convincingly show multiple inner and outer leaflet non-protein (lipid) densities in a cryo-EM closed state structure of GLIC, a prokaryotic homologue of canonical pentameric ligand-gated ion channels, and observe lipids in similar sites during extensive simulations at both resting and activating pH. The simulations not only corroborate structural observations, but also suggest the existence of a state-dependent lipid intersubunit site only occupied in the open state. These important findings will be of considerable interest to the ion channel community and provide new hypotheses about lipid interactions in conjunction with channel gating.

      Recommendations for the authors: please note that you control which, if any, revisions, to undertake

      In particular, a discussion of whether the timescale of the simulations permit measurements of residence or interaction times of the lipids should be addressed.

      Reviewer #1 (Recommendations for the authors):

      Comment 1.1: The authors may consider expanding the discussion about the significance of state-dependent lipid interactions. On the one hand, they emphasize state-dependent interactions of POPC with closed and open states in the outer leaflet in the results. On the other hand, they state that GLIC is insensitive to its lipid environment. What is the significance of the state-dependent interactions of POPC in GLIC, if any? Is it possible that GLIC agonist responses are sensitive to phospholipids (such as PE or PG found in E. coli)? The state-dependent differences in lipid interaction identified in this study support this possibility and suggest the need to better understand the effects of phospholipids on GLIC function.

      Response 1.1: We agree with the reviewer that this is an interesting question and we have therefore extended the discussion with additional references on the functional effects on GLIC of various lipid membranes:

      p. 11 (Discussion)

      “Sampling was further simplified by performing simulations in a uniform POPC membrane. Prior experiments have been conducted to assess the sensitivity of GLIC in varying lipid environments (Labriola et al., 2013; Carswell et al., 2015; Menny et al., 2017), indicating that GLIC remains fully functional in pure POPC bilayers. In our cryo-EM experiments, the protein was recombinantly expressed from E. coli, which means that the experimental density would likely represent phosphatidylglycerol or phosphatidylethanolamine lipids. However, as the molecular identities of bound lipids could not be precisely determined, POPC lipids were built for straightforward comparison with simulation poses. While it appears that GLIC is capable of gating in a pure POPC bilayer, it remains plausible that its function could be influenced by different lipid species, especially due to the presence of multiple charged residues around the TMD/ECD interface which might interact differently with different lipid head groups. Further experiments would be needed to confirm whether the state dependence observed in simulations is also lipid-dependent. It is possible that certain types of lipids bind in one but not the other state, or that certain states are stabilized by a particular lipid type.”

      Comment 1.2: It would be helpful to state in the discussion that the co-purified lipids from GLIC structures are likely PE or PG from E. coli membranes. Nevertheless, it is interesting that the phospholipid poses from the structures generally agree with those identified from the MD simulations using PC.

      Response 1.2: Good point. We have clarified in the discussion that the native lipids in the cryo-EM structure are likely PG or PE lipids, as quoted in the preceding Response.

      Comment 1.3: The authors describe a more deeply penetrating interaction of POPC in the outer intrasubunit cleft in the open state, but this is difficult to appreciate from the images in Fig. 4B, 4E or S3B. The same is true of the deep POPC interaction at the outer intersubunit site. It may be helpful to show these densities from a different perspective to appreciate the depth of these binding poses.

      Response 1.3: We have added Figure 4 – figure supplement 1 to better show the depth of lipid binding poses, especially the ones in the outer leaflet intrasubunit cleft and at the inner intersubunit site, and cited the figure on p. 7 (Results).

      Comment 1.4: The representation of the lipid densities in Fig. 4B is not easy to interpret. First, the meaning of resting versus activating conditions and closed versus open states can be easily missed for readers who are not familiar with the author's previous study. It may be helpful to describe this (i.e. how open and closed state clusters were generated from structures determined in resting and activating conditions) in greater detail in either the figure legend, results or methods. Second, the authors state that there are differences in lipid poses between the closed and open states but not resting and activating conditions. With the exception of the intersubunit density, this is difficult to appreciate from Fig. 4B. As stated in point #3, the difference, for example, in the complementary intrasubunit site may be better appreciated with an image from a different perspective.

      Response 1.4: Acknowledged - the distinction between resting and activating conditions vs. open and closed states can be confusing. We have tried to clarify these differences at the beginning of the results section, the methods section, and in the caption of Figure 4. Regarding differences in lipid poses between open and closed states, we agree it is difficult to appreciate from Figure 4, but here we refer the reader to Figure 4 – figure supplement 2 for an overlay between open and closed densities. Additionally, we now added Figure 1 – figure supplement 1 which provides lipid densities for all five subunits and overlays with the built cryo-EM lipids, possibly making differences easier to appreciate. Regarding images from different perspectives, we trust the new figure supplement described in Response 1.3 provides a better perspective.

      p. 3 (Results)

      “For computational quantification of lipid interactions and binding sites, we used molecular simulations of GLIC conducted under either resting or activating conditions (Bergh et al., 2021a). As described in Methods, resting conditions corresponded to neutral pH with most acidic residues deprotonated; activating conditions corresponded to acidic pH with several acidic residues protonated. Both open and closed conformations were present in both conditions, albeit with different probabilities.”

      p. 8 (Figure 4)

      “Overlaid densities for each state represent simulations conducted under resting (dark shades) or activating (light shades) conditions, which were largely superimposable within each state.”

      p. 24 (Methods)

      “We analyzed previously published MSMs of GLIC gating under both resting and activating conditions (Bergh et al., 2021a). Resting conditions corresponded to pH 7, at which GLIC is nonconductive in functional experiments, with all acidic residues modeled as deprotonated. Activating conditions corresponded to pH 4.6, at which GLIC is conductive and has been crystallized in an open state (Bocquet et al., 2009). These conditions were modeled by protonating a group of acidic residues (E26, E35, E67, E75, E82, D86, D88, E177, E243; H277 doubly protonated) as previously described (Nury et al., 2011).”

      Comment 1.5: The new closed GLIC structure was obtained by merging multiple datasets. What were the conditions of the datasets used? Was it taken from samples in resting or also activating conditions?

      Response 1.5: We have updated the Results, Discussion, and Methods to clarify this important point, in particular by merging datasets and rerunning the classification:

      p. 3 (Results)

      “In our cryo-EM work, a new GLIC reconstruction was generated by merging previously reported datasets collected at pH 7, 5, and 3 (Rovšnik et al., 2021). The predominant class from the merged data corresponded to an apparently closed channel at an overall resolution of 2.9 Å, the highest resolution yet reported for GLIC in this state (Figure 1 – figure supplement 2, Table 1).”

      p. 11 (Discussion)

      “Interestingly, the occupational densities varied remarkably little between resting and activating conditions (Figure 1 – figure supplement 1), indicating state- rather than pH-dependence in lipid interactions, also further justifying the approach of merging closed-state GLIC cryo-EM datasets collected at different pH conditions to resolve lipids.”

      p. 14 (Methods)

      “After overnight thrombin digestion, GLIC was isolated from its fusion partner by size exclusion in buffer B at pH 7, or in buffer B with citrate at pH 5 or 3 substituted for Tris. The purified protein was concentrated to 3–5 mg/mL by centrifugation. [...] Data from three different grids, at pH 7, 5, and 3, were merged and processed together.”

      Comment 1.6: In Fig. 3D, do the spheres represent the double bond? If so, please state in the legend.

      Response 1.6: We have clarified in the legend of Figure 3D that the yellow spheres on the lipid tails represent a double bond.

      Comment 1.7: In Fig. 3E, what is the scale of the color representation?

      Response 1.7: We have clarified in the legend of Figure 3E that colors span 0 (white) to 137015 contacts (dark red).

      Reviewer #2 (Recommendations For The Authors):

      Comment 2.1: I'm not sure I fully understand how the final lipids were modeled (built). Fig. 1 caption suggests they may have been manually built? I understand that the idea was to place them in the overlap of simulation densities and structure densities, but can the authors please clarify if there were any quantifiable conditions that were employed during this process or if this was entirely manual placement in a pose that looked good? Regardless, it would be helpful to see an overlay of the built lipids with both the cryo and simulation densities (e.g., overlay of Fig. 1F/H and G/H) to better visualize how the final built lipids compare.

      Response 2.1: We thank the reviewer for pointing out unclarities regarding our methods. We have extended the methods section to clarify how the lipids were manually built in the cryo-EM structure. We have also added Figure 1 – figure supplement 1 showing overlays of the computational densities and built cryo-EM lipids.

      p. 15 (Methods)

      “Lipids were manually built in COOT by importing a canonical SMILES format of POPC (Kim et al., 2021) and adjusting it individually into the cryo-EM density in each of the sites associated with a single subunit, based in part on visual inspection of lipid densities from simulations, as described above. After building, 5-fold symmetry was applied to generate lipids at the same sites in the remaining four subunits.”

      Comment 2.2: Regarding the state-dependent lipid entry to the outer leaflet intersubunit site associated with channel opening, if the authors could include a movie depicting this process that would be great. The current short explanation does not do this justice. Also, what were the dynamics of this process? Beyond the correlation between site occupancy and the pore being open, how did the timing of lipid entry/exit and pore opening/closing correlate?

      Response 2.2: The point regarding the timing of state-dependent lipid binding at the subunit interface and pore opening is indeed an interesting one. We have added Figure 4 – figure supplement 3D showing that the state-dependent P250 lipid interaction precedes pore opening, as quantified by pore hydration levels, indicating a potential role in gating. The interaction between lipid binding and conformational change of the protein is also depicted in the newly added Figure 4 - video supplement 1, which we hope will be able to better communicate the conclusions regarding state-dependent interactions. We have also expanded the results and discussion to better explain these results:

      p. 9 (Results)

      “The lipid head made particularly close contacts with residue P250 on the M2-M3 loop, which undergoes substantial conformational change away from the pore upon channel opening, along with outer-leaflet regions of M1–M3 (Figure 4E, Figure 4—figure Supplement 3A,B,C, Figure 4—video 1). These conformational changes were accompanied by a flip of M1 residue F195, which blocked the site in the closed state but rotated inward to allow closer lipid interactions in the open state (Figure 4—figure Supplement 3C, Figure 4—video 1). Indeed, P250 was predominantly located within 3 Å of the nearest lipid atom in open- but not closed-state frames (Figure 4F). Despite being restricted to the open state, interactions with P250 were among the longest duration in all simulations (Figure 2C) and as these binding events preceded pore opening, it is plausible to infer a role for this state-dependent lipid interaction in the gating process (Figure 4 – figure supplement 3D).”

      p. 12 (Discussion)

      “The state-dependent binding event at this site preceded pore opening in MSMs, where lipid binding coincided with crossing a smaller energy barrier between closed and intermediate states, followed by pore opening at the main energy barrier between intermediate and open states (Figure 4 – figure supplement 3D). Further, since the P250-lipid interaction was characterized by relatively long residence times (Figure 2), it is possible this lipid interaction has a role to play in GLIC gating.”

      Comment 2.3: Although the interaction times are helpful, I didn't get a great sense of how mobile the lipids are during the simulations. Can the authors discuss this a bit more. For example, are interaction times dominated by lipids that jiggle a bit away from a residue and then back again, vs how often are lipids exchanging with other lipids initially further away from the protein?

      Response 2.3: We have now added various measures of lipid diffusion, both for initially interacting lipids and for bulk lipids, which are summarized in the new Figure 2 – figure supplement 1. We have further addressed the question of simulation timescales in Results, Discussion, and Methods. These numbers highlight that it is possible for lipids several nanometers away from the protein surface to exchange with lipids of the first lipid shell.

      p. 3,6 (Results)

      “Lateral lipid diffusion coefficients were estimated to 1.47 nm²/µs for bulk lipids and 0.68 nm²/µs for lipids of the first lipid shell (Figure 2 – figure supplement 1A), which is relatively slow compared to the timescales of each trajectory (1.7 µs). However, multiple residues throughout the M1, M3, and M4 helices exchanged contacts with 2-4 different lipid molecules in individual simulations (Figure 2C). Furthermore, 1.7-µs root mean square displacement of lipids originally in the first lipid shell was 2.15 nm, and 3.16 nm in the bulk bilayer, indicating such exchanges are not limited to nearby lipids (Figure 2 – figure supplement 1B). Thus, exchange events and diffusion estimates indicate that the duration of lipid contacts observed in this work can be at least partly attributed to interaction stabilities and not solely to sampling limitations.”
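
      The quoted displacements can be cross-checked against the quoted diffusion coefficients. The sketch below assumes two-dimensional Brownian diffusion, for which the mean square displacement grows as 4Dt; the factor of 4 and the function name are assumptions for illustration and are not stated in the passage above.

```python
import math

def rms_displacement_2d(diff_coeff_nm2_per_us, time_us):
    """RMS lateral displacement, assuming MSD = 4 * D * t (2D diffusion)."""
    return math.sqrt(4.0 * diff_coeff_nm2_per_us * time_us)

TRAJECTORY_LENGTH_US = 1.7  # length of each trajectory, as quoted above

# Diffusion coefficients quoted above (nm^2/us)
first_shell = rms_displacement_2d(0.68, TRAJECTORY_LENGTH_US)
bulk = rms_displacement_2d(1.47, TRAJECTORY_LENGTH_US)

print(f"first shell: {first_shell:.2f} nm, bulk: {bulk:.2f} nm")
# prints "first shell: 2.15 nm, bulk: 3.16 nm"
```

      Under this assumption, the computed values match the 2.15 nm and 3.16 nm displacements reported in the quoted Results text.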

      p. 11 (Discussion)

      “Indeed, the unrestrained atomistic MD simulations studied here were not expected to capture the maximal duration of stable contacts, as indicated by some interaction times approaching the full 1.7-µs trajectory (Figure 2). Nevertheless, simulations were of sufficient length to sample exchange of up to four lipids, particularly around the M4 helix. Calculation of lipid lateral diffusion coefficients resulted in average displacements at the end of simulations of 2.15 nm for lipids initially interacting with the protein surface, roughly corresponding to lipids diffusing out to the 4th lipid shell. Diffusion of bulk lipids was faster, allowing lipids originally 3.16 nm away from the protein surface to ingress the first lipid shell. This observation underscores the potential for lipid exchange events even among lipids initially distant from the protein surface. Of course, duration of exceptionally stable interactions, such as those involving T274 (Figure 2C), inevitably remain bounded by the length of our simulations. Still, diffusion metrics, supported by robust statistical analysis encompassing diverse starting conditions (500 trajectories), enable confident estimation of relative interaction times.”

      p. 13 (Methods)

      “Time-based measures of protein-lipid interactions, such as mean duration times and exchange of interactions, were calculated for the 100 x 1.7 µs-long simulations using prolintpy (Sejdiu and Tieleman, 2021) with a 4 Å interaction cutoff. Analysis of lateral lipid diffusion in individual simulations was carried out for two disjoint sets of lipids: the first lipid shell defined as lipids with any part within 4 Å of the protein surface (~90 lipids), and bulk lipids consisting of all other lipids (~280 lipids). Mean square displacements of each lipid set were calculated using GROMACS 2021.5 (Abraham et al., 2015b) with contributions from the protein center of mass removed. Diffusion coefficients for each set, D_A, were calculated using the Einstein relation (Equation 1) by estimating the slope of the linear curve fit to the data.

      where r_i(t) is the coordinate of the center of mass of lipid i of set A at time t and D_A is the self-diffusion coefficient.”
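
      The slope-based estimate described in the quoted Methods can be sketched as follows. This is a minimal illustration and not the GROMACS implementation: it assumes the general Einstein relation MSD = 2·d·D·t with d = 2 for lateral diffusion, uses a synthetic mean-square-displacement curve, and all function names are hypothetical.

```python
def fit_slope(times, values):
    """Ordinary least-squares slope of values against times."""
    n = len(times)
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    cov = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    var = sum((t - mean_t) ** 2 for t in times)
    return cov / var

def lateral_diffusion_coefficient(times_us, msd_nm2, dimensions=2):
    """Estimate D from MSD = 2 * d * D * t (d = 2 assumed for in-plane motion)."""
    return fit_slope(times_us, msd_nm2) / (2 * dimensions)

# Synthetic, noise-free MSD curve generated with D = 0.68 nm^2/us (illustrative only)
times = [0.1 * k for k in range(1, 18)]   # 0.1 to 1.7 us
msd = [4 * 0.68 * t for t in times]       # nm^2

d_est = lateral_diffusion_coefficient(times, msd)
print(f"estimated D = {d_est:.2f} nm^2/us")
```

      With real trajectories the curve is noisy at long lag times, so the fit is usually restricted to the portion of the curve that is approximately linear.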

      Comment 2.4: How symmetric or asymmetric are the cryo and simulation densities across subunits and was there subunit asymmetry in the final built lipids? I could not tell from any of the figures beyond the casual observation that they maybe look somewhat similar in Fig. 1?

      Response 2.4: We thank the reviewer for this useful remark. We have clarified in the methods that the cryo-EM lipids were built in C5-symmetry, and thus the positions are symmetric. The computational densities were calculated independently for each subunit and are thus not necessarily symmetric. We have added Figure 1 – figure supplement 1 showing densities for all five subunits, also serving as an indication of convergence of the results.

      p. 3 (Results) “Although the stochastic nature of simulations resulted in nonidentical lipid densities associated with the five GLIC subunits, patterns of lipid association were notably symmetric (Figure 1 – figure supplement 1).”

      p. 14-15 (Methods)

      “A smaller subset of particles was used to generate an initial model. All subsequent processing steps were done using 5-fold symmetry. […] A monomer of that model was fit to the reconstructed density and 5-fold symmetry was applied with PHENIX 1.19.2-4158 through NCS restraints detected from the reconstructed cryo-EM map, to generate a complete channel. […] After building, 5-fold symmetry was applied to generate lipids at the same sites in the remaining four subunits.”
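
      As an illustration of what applying 5-fold symmetry means geometrically, the sketch below copies a set of atom positions around a symmetry axis in steps of 72°. It is not the PHENIX or COOT procedure: it assumes the pore axis coincides with the z axis through the origin, and the coordinates and function names are hypothetical.

```python
import math

def rotate_about_z(point, angle_rad):
    """Rotate a 3D point about the z axis by the given angle."""
    x, y, z = point
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    return (c * x - s * y, s * x + c * y, z)

def apply_c5(points):
    """Return five copies of the points, rotated by multiples of 72 degrees."""
    return [
        [rotate_about_z(p, 2.0 * math.pi * k / 5) for p in points]
        for k in range(5)
    ]

# Hypothetical lipid atom coordinates (in Å) built for a single subunit
lipid_atoms = [(30.0, 0.0, 10.0), (31.5, 1.0, 8.0)]
copies = apply_c5(lipid_atoms)

for k, copy in enumerate(copies):
    x, y, z = copy[0]
    print(f"subunit {k}: first atom at ({x:.2f}, {y:.2f}, {z:.2f})")
```

      Because every copy is an exact rotation of the first, lipids generated this way are symmetric by construction, in contrast to the simulation densities, which were calculated independently for each subunit.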

      Minor comments:

      Comment 2.5: Fig. 1 is probably not easy to follow for the general reader and the caption is very brief. I suggest adding an additional explanation to the caption and/or additional annotations to the figure to help a general reader step through this.

      Response 2.5: We have expanded the caption of Figure 1 and clarified the meanings of colors, labels, and annotations.

      Comment 2.6: Fig. 1B - Caption is confusing. I would not call the state separation lines outlines as they are not closed loops. Also, I see red/orange and two shades of blue whereas the caption mentions orange and blue only. The caption should also explicitly say what the black lines are (other cluster separations).

      Response 2.6: We have edited the caption to better describe colors, annotations, and the meaning of the data:

      p. 4 (Figure 1)

      “(B) Markov state models were used to cluster simulations conducted under resting (R) or activating (A) conditions into five states, including closed (left of the light or dark orange lines) and open (right of the light or dark blue lines). Black lines mark edges of other state clusters derived from MSM eigenvectors. Experimental structures are highlighted as white circles.”

      Comment 2.7: Fig. 3F caption appears to conflict with data where interaction with W217A appears longer than W217. I think the authors want to suggest here that W217A reduces contact time with T274 as stated in the main text.

      Response 2.7: We have clarified in this legend that “Mutation of residue W217, lining this pocket, reveals shortened interactions at the T274 binding site” (p. 6, Figure 3).

      Comment 2.8: Ref 25 and 26 are the same.

      Response 2.8: Apologies; this mistake has been corrected.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary: 

      This study investigated the role of CD47 and TSP1 in extramedullary erythropoiesis by utilization of both global CD47-/- mice and TSP1-/- mice. 

      Strengths:  

      Flow cytometry combined with spleen bulk and single-cell transcriptomics were employed. The authors found that stress-induced erythropoiesis markers were increased in CD47-/- spleen cells, particularly genes that are required for terminal erythroid differentiation. Moreover, a CD47-dependent erythroid precursor population was identified by spleen scRNA sequencing. In contrast, the same cells were not detected in TSP1-/- spleen. These findings provide strong evidence to support the conclusion that CD47 and TSP1 play differential roles in extramedullary erythropoiesis in mouse spleen. 

      Weaknesses: 

      Methods and data analysis are appropriate. However, some clarifications are required. The discussion section needs to be expanded.  

      (1) The sex of mice that were used in the study is unknown.  

      (2) In the method of Single-cell RNA sequencing (page 10), it mentioned that single cell suspensions from mouse spleens were depleted of all mature hematopoietic cell lineages by passing through CD8a microbeads and CD8a+ T cell isolation Kit. As described, it is confusing what cell types are obtained for performing scRNAseq. More information is required for clarity.  

      (3) The constitutive CD47 knockout mouse model is utilized in this study. The observed accumulation of erythroid precursors in the spleens of CD47-/- mice suggests a chronic effect of CD47 on spleen function. Can the current findings be extrapolated to acute scenarios involving CD47 knockdown or loss, as this may have more direct relevance to the potential side effects associated with anti-CD47-mediated cancer therapy? Please expand on this topic in the discussion section.  

      (1) The missing information on mouse sex has been incorporated into the revised manuscript. For flow cytometry, two male and two female mice of each genotype were used. For single cell RNA sequencing, two female and one male mouse of each genotype were used. For the bulk RNA sequencing, four male cd47−/− mice and four male wildtype mice were used.

      (2) We apologize for the confusing presentation, which has been corrected. The bulk RNA sequencing analysis identified elevated expression of erythropoietic genes in CD8+ spleen cells from cd47−/− versus wildtype mice that were obtained using magnetic bead depletion of all other lineages. Therefore, we used the same Miltenyi negative selection kit as the first step to prepare the cells for single cell RNA sequencing. These untouched cells were then depleted of most mature CD8 T cells using a Miltenyi CD8a(Ly2) antibody positive selection kit. An important consideration underlying this approach was recognizing that the commercial magnetic bead depletion kits used for preparing specific immune cell types are optimized to give relatively pure populations of the intended immune cells using wildtype mice. Our previous experience studying NK cell development in the cd47−/− mice taught us that NK precursors, which are rare in wildtype mouse spleens, accumulate in cd47−/− spleens and were not removed by the antibody cocktail optimized for wildtype spleen cells (Nath et al Front Immunol 2018). The present data indicate that erythroid precursors behave similarly.

      (3) The Discussion was edited as recommended. Anemia is a prevalent side effect of several CD47 therapeutic antibodies being developed for cancer therapy. This anemia would be expected to induce erythropoiesis in bone marrow and possibly at extramedullary sites. Human spleen cells are not accessible to directly evaluate extramedullary erythropoiesis in cancer patients, but analysis of circulating erythroid precursors or liquid biopsy methods could be useful to detect induction of extramedullary erythropoiesis by these therapeutics. We are currently investigating the ability of CD47 antibodies to directly induce erythropoiesis using a human in vitro model.

      Reviewer #2 (Public Review):

      Summary: 

      The authors used existing mouse models to compare the effects of ablating the CD47 receptor and its signaling ligand Thrombospondin. The CD47-KO model used in this study was generated by Kim et al, 2018, where hemolytic anemia and splenomegaly was reported. This study analyzes the cell composition of the spleens from CD47-KO and Thsp-KO, focusing on early hematopoietic and erythroid populations. The data broadly shows that splenomegaly in the CD47-KO is largely due to an increase in committed erythroid progenitors as seen by Flow Cytometry and single-cell sequencing, whereas the Thsp-KO shows a slight depletion of committed erythroid progenitors but is otherwise similar to WT in splenic cell composition.  

      Strengths:

      The techniques used are appropriate for the study and the data support the main conclusions of the study. This study provides novel insights into a putative role of Thsp-CD47 signaling in triggering definitive erythropoiesis in the mouse spleen in response to anemic stress and constitutes a good resource for researchers seeking to understand extramedullary erythropoiesis.  

      Weaknesses:

      The Flow cytometry data alone supports the authors' main conclusion and single-cell sequencing confirms them but does not add further information, other than those already observed in the Flow data. The single-cell sequencing analysis and presentation could be improved by using alternate clustering methods as well as separating the data by genotype and displaying them in order for readers to fully grasp the nuanced differences in marker expression between the genotypes. Further, it is not clear from the authors' description of their results whether the increased splenic erythropoiesis is a direct consequence of CD47-KO or a response to the anemic stress in this mouse model. The enrichment of cKit+ Ter119+ Sca1- cells in CD47-KO indicates that these are likely stress erythroid progenitors. Another CD47-KO mouse model (Lindberg et al 1996) has no reported erythroid defects and was also not examined in this study.  

      (1) The reviewer asked, “whether the increased splenic erythropoiesis is a direct consequence of CD47-KO or a response to the anemic stress in this mouse model.” Our data support both a direct role for CD47 and an indirect role resulting from the response to anemic stress. We cited our previous publications describing increased Sox2+ stem cells in spleens of Cd47 and Thbs1 knockout mice, but we neglected to emphasize another study where we found that bone marrow from cd47−/− mice subjected to the stress of ionizing radiation exhibited more colony forming units for erythroid (CFU-E) and burst-forming unit-erythroid (BFU-E) progenitors compared to bone marrow from irradiated wildtype mice (Maxhimer Sci Transl Med 2009). Taken together, our published data demonstrate that loss of CD47 results in an intrinsic protection of hematopoietic stem cells from genotoxic stress. This function of CD47 is thrombospondin-1-dependent and is consistent with the up-regulation of early erythroid precursors in the spleens of both knockout mice but cannot explain why the Thbs1−/− mice have fewer committed erythroid precursors than wildtype. We cited studies that documented increased red cell turnover in cd47−/− mice but less red cell turnover in Thbs1−/− mice compared to wildtype mice. Increased red cell clearance in cd47−/− mice is mediated by loss of the “don’t eat me” function of CD47 on red cells. In wildtype mice, clearance is augmented by thrombospondin-1 binding to the clustered CD47 on aging red cells (Wang, Aging Cell 2020). Thus, anemic stress in the mouse strains studied here decreases in the order cd47−/− > WT > Thbs1−/−. This is consistent with the increased committed erythroid progenitors reported here in cd47−/− spleens and decreased committed progenitors in the Thbs1−/− spleens. 

      (2) Based on the reviewer’s question regarding alternative mechanisms and the publication of Yang et al 2022 identifying a role for CD47 in stress erythropoiesis through transfer of mitochondria to erythroblasts, we asked whether cd47-/- erythroid precursors would show decreased mRNA expression for mitochondrial chromosome genes (new Figure 4−figure supplement 3C). Some of these mRNAs were more abundant in cd47-/- and thbs1-/- erythroid cells, which is the opposite of what we expected based on Yang 2022 but consistent with our previous publications identifying thrombospondin-1 and CD47 as negative regulators of mitochondrial homeostasis in muscle cells and T cells.

      (3) The cd47−/− mice used for the current study are the same strain as those reported by Lindberg et al in 1996, with additional backcrossing onto a C57BL/6 background.

      Recommendations For The Authors:

      Reviewer #2 (Recommendations For The Authors):

      Suggestions for improved or additional experiments, data, or analyses.  

      Significant efforts went into analyzing the type of erythroid progenitors by marker expression, but typical Flow cytometry strategies using Ter119 and CD44 combined with forward scatter can be used to stage the committed erythroid progenitors precisely.  

      We appreciate this suggestion to extend the flow data. However, the upcoming retirement of the PI required closing our breeding colony, and the mice are no longer available.  

      How can the difference between the erythroid phenotypes of the Lindberg et al 1996 CD47-KO (exon2 Neo knock-in) and Kim et al 2018 CD47-ko (exon1 26bp indel) be explained?  

      We are not convinced that the erythroid phenotypes of the Lindberg and Kim CD47-KO mice differ at the age used in our studies. Kim et al. focused on progressive hemolytic anemia and changes in T cells in spleen that emerge at 26 weeks of age, whereas the mice used here were younger. The Lindberg and Kim mice have similar spleen enlargement at the age we used.

      Another manuscript under review from our lab suggests that cis-regulation of an adjacent colinear gene could contribute to some phenotypes observed when perturbing the Cd47 gene. The Lindberg mouse exhibits minimal perturbation of that adjacent gene, but we have no data regarding the Kim et al mouse. The reviewer’s question brought to our attention that we neglected to state in the Methods that the mice used here are the Lindberg mice, not the Kim mice. This omission is now corrected.

      The authors used Lindberg mouse for 2018 study on NK cells and observed splenomegaly. Did they check for extramedullary erythropoiesis there?  

      Retrospective examination of the RNAseq data for the spleen cells enriched in NK precursors used in our 2018 publication (Nath, 2018) reveals significantly elevated expression for a majority of the extramedullary erythroid markers listed in Table 1, but they were generally less abundant than observed for the lineage-depleted spleen cells used in the present manuscript.   

      Author response table 1.

      To clarify the stress erythropoiesis issue, it might be helpful to examine the sc-seq data for the expression of specific stress erythropoiesis markers in CD47-KO. Targets of BMP4 and Hedgehog signaling can also be examined. Further colony assays can help determine if stress BFU-Es are prevalent in the CD47-KO spleens and depleted in Thsp-KO  

      As noted in Table 1, twelve of the genes we studied are established markers of stress-induced extramedullary erythropoiesis, and most of these were included in the scRNA seq data presented. Our previous publication demonstrated that bone marrow from cd47−/− mice subjected to the stress of ionizing radiation exhibited more colony forming units for erythroid (CFU-E) and burst-forming unit-erythroid (BFU-E) progenitors compared to bone marrow from irradiated wildtype mice (Maxhimer Sci Transl Med 2009). We have not performed colony formation assays using spleen.

      To address the reviewer’s question regarding BMP4 and hedgehog signaling, we performed gene set enrichment analysis for known BMP4 and hedgehog signaling signatures. Using GSE26351_UNSTIM_VS_BMP_PATHWAY_STIM_HEMATOPOIETIC_PROGENITORS, cd47-/- cells in cluster 12 or their CD34+ or CD34- subsets did not show significant enrichment for BMP4 targets compared to WT. Thbs1-/- cells in clusters 12 and 14 showed marginally significant depletion of the BMP4 signature (p=0.04 and p=0.023, respectively). Using the KEGG_HEDGEHOG_SIGNALING_PATHWAY, we did not find any significant enrichment. However, only a few genes in this pathway were detectable in the scRNAseq data. These data suggest that BMP4 signaling may be regulated by thrombospondin-1, but properly testing this hypothesis would require achieving greater sequencing depth combined with a cell isolation method that better enriches the early hematopoietic progenitors that are known to utilize the BMP4 pathway.

      In the reclustering of erythroid progenitors in Figure 5, inclusion of Gata1 as a selection marker may help capture more of the early erythroid progenitors from the dataset and provide a more complete picture of the erythroid populations. 

      We thank the reviewer for suggesting inclusion of Gata1. We repeated the reclustering including Gata1 and found the selected cell count increased from 876 cells to 1007 cells. However, most of the increase was not in the erythroid cluster, which increased from 413 cells to 419 cells. Most of the increase represented Gata1+ T cells (548 cells including Gata1 versus 463 cells without). The revised manuscript presents genotype-dependent differential gene expression based on including Gata1 selection, but none of the specific conclusions were changed from the initial submission. The new Table 4 and Figure 7−figure supplement 1 enabled us to compare differential expression of erythropoietic genes obtained using supervised and unsupervised clustering and show that both methods yield comparable results.

      Just out of curiosity, was there an attempt to make a CD47 Thsp double KO? Is it viable?  

      Cd47 KO mice are somewhat difficult breeders, and several previous attempts to cross with other transgenics have produced viable homozygous offspring that could not be propagated.

      Recommendations for improving the writing and presentation.  

      Perhaps readers would find it more intriguing if the paper led with the single-cell sequencing showing enrichment of erythroid populations in CD47-KO, and later confirmed with Flow Cytometry (even if this was not necessarily the order in which the experiments were done). 

      We considered this suggestion but believe that some of the flow cytometry data is needed to understand why we focused on CD34+ and CD34- subsets and proliferation markers when analyzing the scRNAseq data.

      The single-cell sequencing data in Figure 3 might benefit from UMAP clustering as well. In addition, it would greatly help readers if the data points were separated by genotype and displayed after clustering. A similar analysis has been done in this paper: doi:10.1038/s41556-022-00898-9 by clustering different conditions together but displaying them separately by condition. 

      We initially explored tSNE and UMAP clustering and obtained similar results. We have added violin plots separated by genotype in Figure 4-figure supplement 2. We also included improved clusters separated by genotype in the revised Figure 3 panels C and D and for the reclustering in Figure 6D. UMAP plots provided better presentation for the reclustering (revised Figure 7). All data have been updated to the latest pipeline as noted in the Methods.

      Minor corrections to the text and figures.  

      Figure 4: Labels and plot legends are illegible in general, please relabel manually and if possible, redo plots with bigger font size and legends (relatively easy using ggplot2) 

      All figure panels were relabeled using larger fonts.

      Figure 5D: Individual plots are stacked randomly atop each other and in many cases, gene names are not visible. Please restack the layers and ensure that the gene names are visible 

      Panel D was made a separate figure with enlarged labels (now Figure 7).

      Supp Fig 2: Layout can be organized a little better. Consider splitting into two figures for better organization  

      The figure was split as recommended. Now Figure 1-figure supplement 2 and Figure 2-figure supplement 1.

      Abstract Line 10: "...mRNA expression of Kit, Ermap, and Tfrc, Induction of committed erythroid precursors is...". Replace comma after "Tfrc" with period   

      Done.

      Discussion Page 9 Line 8: "...WT spleens, s. mRNAs for some markers of committed erythroid cells including Nr3c1 mRNA...". Remove ", s" after spleens.   

      Done.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      The paper from Hsu and co-workers describes a new automated method for analyzing the cell wall peptidoglycan composition of bacteria using liquid chromatography and mass spectrometry (LC/MS) combined with newly developed analysis software. The work has great potential for determining the composition of bacterial cell walls from diverse bacteria in high-throughput, allowing new connections between cell wall structure and other important biological functions like cell morphology or host-microbe interactions to be discovered. In general, I find the paper to be well written and the methodology described to be useful for the field. However, there are areas where the details of the workflow could be clarified. I also think the claims connecting cell wall structure and stiffness of the cell surface are relatively weak. The text for this topic would benefit from a more thorough discussion of the weak points of the argument and a toning down of the conclusions drawn to make them more realistic.

      Thank you for your thorough and insightful review of our manuscript. We greatly appreciate your positive and constructive feedback on our methodology. We have carefully reviewed your comments and have responded to each point as follows:

      Specific points:

      1) It was unclear to me from reading the paper whether or not prior knowledge of the peptidoglycan structure of an organism is required to build the "DBuilder" database for muropeptides. Based on the text as written, I was left wondering whether bacterial samples of unknown cell wall composition could be analyzed with the methods described, or whether some preliminary characterization of the composition is needed before the high-throughput analysis can be performed. The paper would be significantly improved if this point were explicitly addressed in the main text.

      We apologize for not making this clearer. Prior knowledge of the peptidoglycan structure of an organism is indeed required to build the “DBuilder” database and accurately identify muropeptides; otherwise, the false discovery rate might increase. Although the peptidoglycan structures of certain organisms might not have been extensively studied, users retain the flexibility to adapt the muropeptide compositions to their study, referencing closely related species for database construction. We have addressed this aspect in the main text to ensure a clearer understanding.

      “(Section HAMA platform: a High-throughput Automated Muropeptide Analysis for Identification of PGN Fragments) …(i) DBuilder... Based on their known (or putative) PGN structures, all possible combinations of GlcNAc, MurNAc and peptide were input into DBuilder to generate a comprehensive database that contains monomeric, dimeric, and trimeric muropeptides (Figure 1b).”
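      The combinatorial idea behind such a database can be illustrated with a short sketch. This is not the DBuilder implementation: the stem peptides in `STEMS`, the function names, and the simplification that every linkage (glycosidic or peptide) removes one water are assumptions made for illustration. Masses are standard monoisotopic values for non-reduced, unmodified species.

```python
from itertools import combinations_with_replacement

WATER = 18.010565
# Monoisotopic masses of the free sugars and of amino acid residues (Da).
GLYCAN = {"GlcNAc": 221.089939, "MurNAc": 293.111069}
RESIDUE = {"Ala": 71.037114, "Glu": 129.042593, "mDAP": 172.084793}

# Hypothetical stem peptides for an mDAP-type peptidoglycan.
STEMS = {
    "M3": ["Ala", "Glu", "mDAP"],
    "M4": ["Ala", "Glu", "mDAP", "Ala"],
    "M5": ["Ala", "Glu", "mDAP", "Ala", "Ala"],
}

def monomer_mass(stem):
    """Mass of GlcNAc-MurNAc carrying one stem peptide (non-reduced)."""
    disaccharide = GLYCAN["GlcNAc"] + GLYCAN["MurNAc"] - WATER
    peptide = sum(RESIDUE[r] for r in stem) + WATER
    # The amide bond between MurNAc and the peptide releases one water.
    return disaccharide + peptide - WATER

def build_database(stems, max_units=3):
    """Enumerate monomers, dimers, and trimers with their masses."""
    db = {}
    for n in range(1, max_units + 1):
        for combo in combinations_with_replacement(sorted(stems), n):
            # Assumption: each linkage between units removes one water.
            mass = sum(monomer_mass(stems[name]) for name in combo)
            db["-".join(combo)] = round(mass - (n - 1) * WATER, 4)
    return db

db = build_database(STEMS)
```

      A real database would also need to distinguish linkage types and positions, include reduction and chemical modifications, and cover species-specific interpeptide bridges, none of which this sketch attempts.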

      2) The potential connection between the structure of different cell walls from bifidobacteria and cell stiffness is pretty weak. The cells analyzed are from different strains such that there are many possible reasons for the change in physical measurements made by AFM. I think this point needs to be explicitly addressed in the main text. Given the many possible explanations for the observed measurement differences (lines 445-448, for example), the authors could remove this portion of the paper entirely. Conclusions relating cell wall composition to stiffness would be best drawn from a single strain of bacteria genetically modified to have an altered content of 3-3 crosslinks.

      We understand your concern regarding the weak connection between cell wall structure and cell stiffness. We will make a clear and explicit statement in the main text to acknowledge that the cells analyzed are derived from different strains, introducing the possibility of various factors influencing the observed changes in physical measurements as determined by AFM. Furthermore, we greatly appreciate your suggestion to consider genetically modified strains to investigate the role of cross-bridge length in determining cell envelope stiffness. In this regard, we are in the process of developing a CRISPR/Cas genome editing toolbox for Bifidobacterium longum, and we plan to pursue this avenue of investigation in future work.

      Reviewer #2 (Public Review):

      The authors introduce "HAMA", a new automated pipeline for architectural analysis of the bacterial cell wall. Using MS/MS fragmentation and a computational pipeline, they validate the approach using well-characterized model organisms and then apply the platform to elucidate the PG architecture of several members of the human gut microbiota. They discover differences in the length of peptide crossbridges between two species of the genus Bifidobacterium and then show that these species also differ in cell envelope stiffness, resulting in the conclusion that crossbridge length determines stiffness.

      We appreciate your thoughtful review of our manuscript and your recognition of the potential significance of our work in elucidating the poorly characterized peptidoglycan (PGN) architecture of the human gut microbiota.

      The pipeline is solid and revealing the poorly characterized PG architecture of the human gut microbiota is worthwhile and significant. However, it is unclear if or how their pipeline is superior to other existing techniques - PG architecture analysis is routinely done by many other labs; the only difference here seems to be that the authors chose gut microbes to interrogate.

      We apologize if this could have been clearer. The HAMA platform stands apart from other pipelines by automatically analyzing LC-MS/MS data to identify muropeptides. In contrast, most routine PGN architecture analyses use LC-UV/Vis or LC-MS platforms, for which PGFinder is the only available software for automated analysis. To the best of our knowledge, a comparable pipeline for automatically analyzing LC-MS/MS data was reported by Bern et al., who used the commercial Byonic software with an in-house FASTA database and specific glycan modifications. They achieved accurate and sensitive identification of monomeric muropeptides but struggled with cross-linked muropeptides due to the limitations of the Byonic software. We believe that our pipeline, which introduces automatic and comprehensive muropeptide identification (particularly for Gram-positive bacterial peptidoglycans), would be a valuable addition to the field. To enhance clarity, we have adjusted the text as follows:

      “(Introduction) … Although they both demonstrated great success in identifying muropeptide monomers, the accurate identification of muropeptide multimers and other various bacterial PGN structures still remains unresolved. This is because deciphering the compositions requires MS/MS fragmentation, but it is still challenging to automatically annotate MS/MS spectra from these complex muropeptide structures.”

      I do not agree with their conclusions about the correlation between crossbridge length and cell envelope stiffness. These experiments are done on two different species of bacteria and their experimental setup therefore does not allow them to isolate crossbridge length as the only differential property that can influence stiffness. These two species likely also differ in other ways that could modulate stiffness, e.g. turgor pressure, overall PG architecture (not just crossbridge length), membrane properties, teichoic acid composition etc.

      Regarding the conclusions drawn about the correlation between cross-bridge length and cell envelope stiffness, we understand your point and appreciate your feedback. We have revisited this section of our manuscript and toned down the conclusions drawn from this aspect of the study. We also recognize the importance of considering other potential factors that could influence stiffness, as you mentioned above. In light of this, we have noted in the main text the need for further investigations, potentially involving genetically modified strains, to isolate and accurately determine the impact of bridge length on cell envelope stiffness.

      Reviewer #1 (Recommendations For The Authors):

      Minor points:

      1) One thing to consider would be testing the robustness of the analysis pipeline with one of the well-characterized bacteria studied, but genetically modifying them to change the cell wall composition in predictable ways. Does the analysis pipeline detect the expected changes?

      We appreciate the reviewer's suggestion and would like to provide a clear response. Regarding testing the pipeline with genetically modified strains, our lab previously worked on genetically modified S. maltophilia (KJΔmrdA).1 Inactivation of mrdA increased the level of N-acetylglucosaminyl-1,6-anhydro-N-acetylmuramyl-L-alanyl-D-glutamyl-meso-diaminopimelic acid-D-alanine (GlcNAc-anhMurNAc tetrapeptide) in muropeptide profiles, which is the critical activator ligand for β-lactamase expression in the ΔmrdA mutant strain. In this case, our platform could provide rapid PGN analysis for verifying the expected change in muropeptide profiles (see Author response image 1). In addition, if the predictable changes involve genetic modifications of interpeptide bridges within the PGN structure, for example in the femA/B genes of S. aureus, which encode the synthesis of interpeptide bridges,2 our current HAMA pipeline is capable of detecting these anticipated changes. However, if the genetic modifications introduce novel components into PGN structures, a dedicated database specific to the genetically modified strain would need to be created.

      Author response image 1.

      2) Line 368: products catalyzed > products formed

      The sentence has been revised.

      “(Section Inferring PGN Cross-linking Types Based on Identified PGN Fragments) …Based on the muropeptide compositional analysis mentioned above, we found high abundances of M3/M3b monomer and D34 dimer in the PGNs of E. faecalis, E. faecium, L. acidophilus, B. breve, B. longum, and A. muciniphila, which may be the PGN products formed by Ldts.”

      3) Lines 400-402: Is it possible the effect is related to porosity, not "hardness"?

      Thank you for the suggestion. The possibility of the slower hydrolysis rate of purified PGN in B. breve being related to porosity is indeed noteworthy. While this could be a potential factor, we would like to acknowledge the limited existing literature that directly addresses the relation between PGN architecture and porosity. It is plausible that current methods available for assessing cell wall porosity may have certain limitations, contributing to the scarcity of relevant studies. In light of this, we would like to propose a speculative explanation for the observed effect. It is plausible that the tighter PGN architecture resulting from shorter interpeptide bridges in B. breve could contribute to its harder texture. This speculation is grounded in the concept that a more compact PGN structure might lead to increased stiffness, aligning with our observations of higher cell stiffness in B. breve.

      4) Lines 403-408: See point #2 above.

      Thank you for the suggestion. We have explicitly addressed this point in the main text:

      “(Section Exploring the Bridge Length-dependent Cell Envelope Stiffness in B. longum and B. breve) … Taken all together, we speculate that a tight peptidoglycan network woven by shorter interpeptide bridges or 3-3 cross-linkages could give bacteria stiffer cell walls. However, it is important to note that cell stiffness is a mechanical property that also depends on PGN thickness, overall architecture, and turgor pressure. These parameters may vary among different bacterial strains. Hence, carefully controlled, genetically engineered strains with similar characteristics will be needed to dissect the role of cross-bridge length in cell envelope stiffness.”

      5) Lines 428-429: It is not clear to me how mapping the cell wall architecture provides structural information about the synthetic system. It is also not clear how antibiotic resistance can be inferred. More detail is needed here to flesh out these points.

      Thank you for the suggestion. To provide further clarity on these important aspects, the context in the manuscript has been revised.

      “(Discussion) …Importantly, our HAMA platform provides a powerful tool for mapping peptidoglycan architecture, giving structural information on the PGN biosynthesis system. This involves the ability to infer possible PGN cross-linkages based on the type of PGN fragments obtained from hydrolysis. For instance, the identification of 3-3 cross-linkage formed by L,D-transpeptidases (Ldts) is of particular significance. Unlike 4-3 cross-linkages, the 3-3 cross-linkage is resistant to inhibition by β-Lactam antibiotics, a class of antibiotics that commonly targets bacterial cell wall synthesis through interference with 4-3 cross-linkages. Therefore, by elucidating the specific cross-linkage types within the peptidoglycan architecture, our approach offers insights into antibiotic resistance mechanisms.”

      6) Line 478: "maneuvers are proposed for" > "work is needed to generate". Also, delete "innovative". Also "in silico" > "in silico-based".

      The sentence has been revised.

      “(Discussion) …To achieve a more comprehensive identification of muropeptides, future work is needed to generate an expanded database, in silico-based fragmentation patterns, and improved MS/MS spectra acquisition.”

      7) Line 485: "Its" > "It has potential"

      The sentence has been revised.

      “(Discussion) …It has potential applications in identifying activation ligands for antimicrobial resistance studies, characterizing key motifs recognized by pattern recognition receptors for host-microbiota immuno-interaction research, and mapping peptidoglycan in cell wall architecture studies.”

      8) Figure 1 legend: Define Gb and Pb.

      Gb and Pb are the abbreviations for glycosidic bonds and peptide bonds. We have revised Figure legend 1 as follows:

      “(Figure legend 1) …(b) DBuilder constructs a muropeptide database containing monomers, dimers, and trimers with two types of linkage: glycosidic bonds (Gb) and peptide bonds (Pb).”

      9) Figure 2: It is hard to see what is going on in panel a and c with all the labels. Consider removing them and showing a zoomed inset with labels in addition to an unlabeled full chromatogram.

      We apologize for not making this clearer. Panels a and c in Figure 2 were generated directly by the Analyzer as software screenshots of the peak annotations on the chromatogram. Our intention was to present a comprehensive PGN mapping (approximately 70% of the peak area was assigned to muropeptide signals) using this platform. We understand that the label density might affect clarity, so we have added the output tables of all muropeptide identifications as source data (Table 1–Source Data 1&2). Additionally, we have uploaded the Analyzer output files (see Additional Files), which can be better visualized in the Viewer program, which also allows users to zoom in for detailed labeling information.

      10) Figure 3: It is worth pointing out what features of the MS/MS fingerprints are helping to discriminate between species.

      Thank you for the suggestion. We have revised Figure 3 and the legend as follows:

      “(Figure legend 3) …The sequence of each isomer was determined using in silico MS/MS fragmentation matching, with the identified sequence having the highest matching score. The key MS/MS fragments that discriminate between two isomers are labeled in bold brown.”

      Author response image 2.

      11) Figure 4 and 5 legend: Can you condense the long descriptions of the abbreviations - or at least only refer to them once?

      Certainly. To enhance clarity and conciseness in the figure legends, we have revised Figure legend 5 as follows:

      “(Figure legend 5) …(b) Heatmap displaying …. Symbols: M, monomer; D, dimer; T, trimer (numbers indicate amino acids in stem peptides). Description of symbol abbreviations as in Figure legend 4, with the addition of "Glycan-T" representing trimers linked by glycosidic bonds.”

      Reviewer #2 (Recommendations For The Authors):

      1. Please read the manuscript carefully for spelling errors.

      We appreciate your careful review of our manuscript. We have thoroughly rechecked the entire manuscript for spelling errors and have made the necessary corrections to ensure the accuracy and quality of the text.

      2. Line 46 - "multilayered" is likely only true for Gram-positive bacteria.

      We thank Reviewer #2 for bringing up this concern. Indeed, Gram-negative bacteria mostly possess a single layer of peptidoglycan, although there can be up to three layers in some parts of the cell surface [3, 4]. To reduce confusion, we have rewritten the text as follows: “(Introduction) …PGN is a net-like polymeric structure composed of various muropeptide molecules, with their glycans linearly conjugated and short peptide chains cross-linked through transpeptidation.”

      3. Methods section: It seems like pellets from a 10 mL bacterial culture were ultimately suspended in 1.5 L (750 mL water + 750 mL tris) - why such a large volume? And how were PG fragments subsequently washed (centrifugation? There is no information on this in the Methods).

      We apologize for the mislabeling of the units. The correct volume is “1.5 mL (750 µL water + 750 µL tris)”. We have corrected the volume in the Methods section (lines 99-100). To wash the purified PGN, we added 1 mL water, centrifuged at 10,000 rpm for 5 minutes, and removed the supernatant. This information has been added to the Methods section (lines 95-98).

      4. Line 183 - why were 6 modifications chosen as the cutoff? Please make the rationale clearer.

      We thank Reviewer #2 for the comment. We set the maximum number of modifications to 6 on the assumption of one modification on each sugar of a trimeric muropeptide. A lower cutoff effectively limits the identification of muropeptides with unlikely numbers of modifications, whereas a higher cutoff allows for more modifications on a single muropeptide. In our hands, muropeptide modifications of E. coli are mostly N-deacetyl-MurNAc and anhydro-MurNAc, and modifications of the gut microbes used here are mostly N-deacetyl-GlcNAc, anhydro-MurNAc, O-acetyl-MurNAc, loss of GlcNAc, and amidated iso-Glu. While we recommend starting data analysis with a cutoff of 6 modifications, users are free to adjust this based on their studies.
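      As an illustration only, the modification cutoff can be pictured as a simple filter over candidate identifications. The sketch below is not the HAMA code; the record type, field names, and example entries are hypothetical, and only the default of 6 comes from the response above.

```python
from dataclasses import dataclass

# Hypothetical record for one candidate muropeptide identification.
# Field names are illustrative and are not taken from the HAMA software.
@dataclass(frozen=True)
class Candidate:
    name: str
    modifications: tuple  # e.g. ("anhydro-MurNAc", "N-deacetyl-GlcNAc")

# Default from the response: one modification per sugar of a trimer (3 x 2 sugars).
MAX_MODIFICATIONS = 6

def filter_candidates(candidates, max_modifications=MAX_MODIFICATIONS):
    """Keep only candidates whose modification count is within the cutoff."""
    return [c for c in candidates if len(c.modifications) <= max_modifications]

# Hypothetical candidates: a monomer, a trimer at the cutoff, a trimer above it.
candidates = [
    Candidate("monomer-example", ("anhydro-MurNAc",)),
    Candidate("trimer-at-cutoff", ("O-acetyl-MurNAc",) * 6),
    Candidate("trimer-over-cutoff", ("O-acetyl-MurNAc",) * 7),
]
kept = [c.name for c in filter_candidates(candidates)]
```

      Passing a different `max_modifications` value corresponds to the user-adjustable cutoff mentioned above; a stricter value simply discards more candidates.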

      5. Line 339 - define donor vs. acceptor here (can be added in parentheses after explaining the relevant chemical reactions further above in the text)

      Thank you for the suggestion. To provide greater clarity regarding the roles of the donor and acceptor substrates in the transpeptidation process, we have revised the content in the manuscript as follows:

      “(Section Inferring PGN Cross-linking Types Based on Identified PGN Fragments) …In general, there are two types of PGN cross-linkage…. Transpeptidation involves two stem peptides which function as acyl donor and acceptor substrates, respectively. As the enzyme names imply, the donor substrates that Ddts and Ldts bind to are terminated as D,D-stereocenters and L,D-stereocenters, which structurally means pentapeptides and tetrapeptides. During D,D-transpeptidation, Ddts recognize D-Ala4-D-Ala5 of the donor stem (pentapeptide) and remove the terminal D-Ala5 residue, forming an intermediate. The intermediate then cross-links the NH2 group in the third position of the neighboring acceptor stem, forming a 4-3 cross-link.”

      6. Line 366 following - can you calculate % crosslinks based on these numbers? What does "high abundance" of 3,3 crosslinks mean in this context? Is this the majority of PG?

      Thank you for your questions. Calculating the percentage of crosslinks based on the muropeptide compositional numbers is a valid consideration. However, it is important to note that the muropeptides we analyzed were hydrolyzed by mutanolysin, and as such, a % crosslink value derived from these data might not be a true representation of the crosslinking percentage within the PGN network. For a more precise determination of % crosslinks, methods such as solid-state NMR on purified peptidoglycan would be required. Our research provides insights into the characterization of PGN fragments and allows us to infer potential PGN cross-linkage types and the enzymes involved based on the dominant muropeptide fragments. The phrase "high abundance" in this context indicates that the M3b/M4b monomer and D34 dimer muropeptides represent a significant portion of the hydrolysis products. These muropeptides are major constituents of the PGN fragments obtained from the enzymatic hydrolysis.

      7. Line 375 - I am not sure PG is a meaningful diffusion barrier for drugs and signaling molecules, given that even larger proteins can apparently diffuse through the pores.

      Thank you for raising this point. Peptidoglycan indeed possesses relatively wide pores that allow for the diffusion of larger molecules, including proteins [5]. Research has provided a rough estimate of the porosity of the PGN meshwork, suggesting that it allows for the diffusion of proteins with a maximum molecular mass of around 50 kDa [6]. Considering this, we acknowledge that PGN may not serve as a significant diffusion barrier for drugs and signaling molecules. The porosity of the PGN scaffold, which is defined by the degree of cross-linking, plays a role in influencing the transport of molecules to the cell membrane. Thus, while PGN may not serve as a strict diffusion barrier, its structural characteristics still impact bacterial cell mechanics and interactions. We have revised the manuscript to reflect this understanding:

      “(Section Exploring the Bridge Length-dependent Cell Envelope Stiffness in B. longum and B. breve) …The porosity of the PGN scaffold, defined by the degree of cross-linking, influences the transport of larger molecules such as proteins. Therefore, modifications to PGN structure are anticipated to significantly affect bacterial cell mechanics and interactions.”

      8. Line 400 - what does "slower hydrolysis rate" refer to, is this chemical hydrolysis or enzymatic (autolysins?). Also, I am not sure hydrolysis rate of either modality allows for solid conclusions about how hard (line 402) the PG is.

      Thank you for your comments. The hydrolysis rate here refers to enzymatic hydrolysis, specifically mutanolysin cleaving the β-N-acetylmuramyl-(1,4)-N-acetylglucosamine linkage. Indeed, there is no direct correlation between the hydrolysis rate and the hardness of the PGN architecture, although structural rigidity is a key determinant in protein digestion [7]. Considering that the enzymatic hydrolysis rate depends on the accessibility of the substrate to the enzyme, we proposed that a tighter PGN architecture could also lead to a slower hydrolysis rate. This speculation aligns with our observations of higher cell stiffness, or a more compact PGN structure, in B. breve and its slower hydrolysis rate. We understand this is indirect evidence, so the revised sentence now reads:

      “(Section Exploring the Bridge Length-dependent Cell Envelope Stiffness in B. longum and B. breve) …Furthermore, B. breve also showed a slower enzymatic hydrolysis rate in purified PGNs, implying that the cell wall structure of B. breve is characterized by a compact PGN architecture.”

      9. Line 424 - I am not convinced this pipeline can detect PG architectures that other pipelines cannot; likely, the difference between previous analyses and theirs is due to different growth conditions (3,3 crosslink formation is often modulated by environmental factors/growth stage). In the next sentence, it sounds like mutanolysin treatment is a novelty in PG analysis (which it is not).

      We apologize that this was not clearer; we have revised the paragraph to describe our study more accurately. We agree that different growth conditions could influence PGN architecture and that other pipelines could identify the PGN architectures manually, or automatically if they are not too complex. Our original intention was to highlight the ability of the HAMA program to automatically identify unreported PGN structures. Here are the revised sentences:

      “(Discussion) …We speculate that this finding may be influenced by the comprehensive mass spectrometric approaches we employed or by variations in growth conditions. Moreover, we utilized the well-established enzymatic method involving mutanolysin to cleave the β-N-acetylmuramyl-(1,4)-N-acetylglucosamine linkage, which preserves the original peptide linkage in intact PGN subunits.”

      10. Lines 440-442: As outlined in more detail above: I don't think you can conclude something about the relationship between bridge length and envelope stiffness based on these data.

      Thank you for your valuable feedback. We agree that our data may not definitively support a direct conclusion about the relationship between bridge length and envelope stiffness in Bifidobacterium species. Instead, we have rephrased this section to present the observed correlations accurately without overgeneralizing:

      “(Discussion) … Notably, our study suggested a potential correlation between the cell stiffness and the compactness of bacterial cell walls in Bifidobacterium species (Figure 5). B. longum, which predominantly harbors tetrapeptide bridges (Ser-Ala-Thr-Ala), exhibits a trend towards lower stiffness, whereas B. breve, characterized by PGN cross-linked with monopeptide bridges (Gly), demonstrates a trend towards higher stiffness. These findings suggested that it may be correlated between the increased rigidity and the more compact PGN architecture built by shorter cross-linked bridges.”

      References:

      1. Huang, Y.-W.; Wang, Y.; Lin, Y.; Lin, C.; Lin, Y.-T.; Hsu, C.-C.; Yang, T.-C., Impacts of Penicillin Binding Protein 2 Inactivation on β-Lactamase Expression and Muropeptide Profile in Stenotrophomonas maltophilia. mSystems 2017, 2 (4), 00077-00017.

      2. Jarick, M.; Bertsche, U.; Stahl, M.; Schultz, D.; Methling, K.; Lalk, M.; Stigloher, C.; Steger, M.; Schlosser, A.; Ohlsen, K., The serine/threonine kinase Stk and the phosphatase Stp regulate cell wall synthesis in Staphylococcus aureus. Sci. Rep. 2018, 8 (1), 13693.

      3. Labischinski, H.; Goodell, E. W.; Goodell, A.; Hochberg, M. L., Direct proof of a "more-than-single-layered" peptidoglycan architecture of Escherichia coli W7: a neutron small-angle scattering study. J. Bacteriol. 1991, 173 (2), 751-756.

      4. Rohde, M., The Gram-Positive Bacterial Cell Wall. Microbiol. Spectr. 2019, 7 (3), gpp3-0044-2018.

      5. Vollmer, W.; Höltje, J. V., The architecture of the murein (peptidoglycan) in gram-negative bacteria: vertical scaffold or horizontal layer(s)? J. Bacteriol. 2004, 186 (18), 5978-5987.

      6. Vollmer, W.; Blanot, D.; De Pedro, M. A., Peptidoglycan structure and architecture. FEMS Microbiol. Rev. 2008, 32 (2), 149-167.

      7. Li, Q.; Zhao, D.; Liu, H.; Zhang, M.; Jiang, S.; Xu, X.; Zhou, G.; Li, C., "Rigid" structure is a key determinant for the low digestibility of myoglobin. Food Chem.: X 2020, 7, 100094.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary

      Type 1 diabetes mellitus (T1DM) progression is accelerated by oxidative stress and apoptosis. Eugenol (EUG) is a natural compound previously documented as anti-inflammatory, anti-oxidative, and anti-apoptotic. In this manuscript by Jiang et al., the authors study the effects of EUG on T1DM in MIN6 insulinoma cells and a mouse model of chemically induced T1DM. The authors show that EUG increases nuclear factor E2-related factor 2 (Nrf2) levels. This results in a reduction of pancreatic beta-cell damage, apoptosis, oxidative stress markers, and a recovery of insulin secretion. The authors highlight these effects as indicative of the therapeutic potential of EUG in managing T1DM.

      Strengths

      Relevant, timely, and addresses an interesting question in the field. The authors consistently observe enhanced beta cell functionality following EUG treatment, which makes the compound a promising candidate for T1DM therapy.

      Weaknesses

      (1) The in vivo experiments have too few biological replicates. With an n=3 (as all figure legends indicate) in complex mouse studies such as these, drawing robust conclusions becomes challenging. It is important to reproduce these results in a larger cohort, to validate the conclusions of the authors.

      Thanks for your comments. In the figure legends of the first draft of the manuscript, n=3 means at least 3 biological replicates, whereas in the Materials and Methods section, n=30 refers to the sample size. Each group contained 30 mice, for a total of 150 mice in this study, and the mice were assigned as follows for the in vivo experiments. The relevant information has been added to the revised manuscript.

      Author response image 1.

      (2) Another big concern is the lack of quantifications and statistical analysis throughout the manuscript. Although the authors claim statistical significance in various experiments, the limited information provided makes it difficult to verify. The authors use vague and minimal descriptions of their experiments, which further reduces the reader's comprehension and the reproducibility of the experiments.

      Thanks for your constructive suggestion. We repeated the quantitative and statistical analyses throughout the manuscript using GraphPad Prism software. Additionally, we have improved the experimental descriptions in the revised manuscript.

      (3) Finally, the use of Min6 cells as a model for pancreatic beta cells is a strong limitation of this study. Future studies should seek to reproduce these findings in a more translational model and use more relevant in vitro cell systems (eg. Islets).

      Thanks for your professional comments. Mouse insulinoma cells (the MIN6 cell line) are a permanent cell line isolated from mouse islet β cell tumors and can reflect the functional changes of islet β cells. As mature islet cells, MIN6 cells have been widely used in the study of type 1 diabetes mellitus[1-4], so in this study, MIN6 cells were used as the in vitro cell model. In future studies, we will try to validate our findings using more relevant in vitro cell systems (e.g., islets).

      References:

      (1) WU M, CHEN W, ZHANG S, et al. Rotenone protects against β-cell apoptosis and attenuates type 1 diabetes mellitus [J]. Apoptosis, 2019, 24(11-12): 879-91.

      (2) LUO C, HOU C, YANG D, et al. Urolithin C alleviates pancreatic β-cell dysfunction in type 1 diabetes by activating Nrf2 signaling [J]. Nutr Diabetes, 2023, 13(1): 24.

      (3) LAKHTER A J, PRATT R E, MOORE R E, et al. Beta cell extracellular vesicle miR-21-5p cargo is increased in response to inflammatory cytokines and serves as a biomarker of type 1 diabetes [J]. Diabetologia, 2018, 61(5): 1124-34.

      (4) LIN Y, SUN Z. Antiaging Gene Klotho Attenuates Pancreatic β-Cell Apoptosis in Type 1 Diabetes [J]. Diabetes, 2015, 64(12): 4298-311.

      Reviewer #3 (Public Review):

      Summary:

      This study by Jiang et al. aims to establish the streptozotocin (STZ)-induced type 1 diabetes mellitus (T1DM) mouse model in vivo and the STZ-induced pancreatic β cell MIN6 cell model in vitro to explore the protective effects of Eugenol (EUG) on T1DM. The authors tried to elucidate the potential mechanism by which EUG inhibits the NRF2-mediated anti-oxidative stress pathway. Overall, this study is well executed with solid data, offering an intriguing report from animal studies for a potential new treatment strategy for T1DM.

      Strengths:

      The in vivo efficacy study is comprehensive and solid. Given that STZ-induced T1DM is a devastating and harsh model, the in vivo efficacy of this compound is really impressive.

      Weaknesses:

      (1) The mechanism is linked with the anti-oxidant property of the compound, which is common for many natural compounds, such as flavonoids and polyphenols. However, this kind of compound has rarely been successfully developed into therapeutics in clinical usage. Indeed, if that is the case, Vitamin C or Vitamin E could be used here as the positive control.

      Thanks for your comments. In fact, many anti-oxidant drugs are used in the clinic for the treatment of type 1 diabetes mellitus. For example, lipoic acid is used to treat diabetic peripheral neuropathy[5]. Vitamin E can effectively eliminate free radicals, protect cell membranes, and significantly reduce the risk of cardiovascular disease in patients with SPACE or ICARE diabetes[6]. Glutathione plays crucial roles in the detoxification and anti-oxidant systems of cells and has been used to treat acute poisoning and chronic liver diseases by intravenous injection[7]. Therefore, eugenol enhances the management of type 1 diabetes mellitus by modulating oxidative stress pathways and holds potential as a future therapeutic option for clinical application. In future studies, we will try to use Vitamin C or Vitamin E as the positive control.

      References:

      (5) ZIEGLER D, PAPANAS N, SCHNELL O, et al. Current concepts in the management of diabetic polyneuropathy [J]. J Diabetes Investig, 2021, 12(4): 464-75.

      (6) VARDI M, LEVY N S, LEVY A P. Vitamin E in the prevention of cardiovascular disease: the importance of proper patient selection [J]. J Lipid Res, 2013, 54(9): 2307-14.

      (7) HONDA Y, KESSOKU T, SUMIDA Y, et al. Efficacy of glutathione for the treatment of nonalcoholic fatty liver disease: an open-label, single-arm, multicenter, pilot study [J]. BMC Gastroenterol, 2017, 17(1): 96.

      Reviewer #1 (Recommendations For The Authors):

      • For each of the figure panels the authors should indicate the exact number of biological replicates (how many mice or how many independent in vitro experiments). For IF panels, the number of mice, the number of histology slides per mouse, number of fields analyzed should be indicated.

      Thanks for your constructive suggestion. These details have been added to the revised manuscript.

      • The methods state n=30 and Figure 1 states n=3. N=3 is too little for such a complex in vivo study and would severely reduce the reliability of the in vivo experiments.

      Thanks for your suggestion. In the figure legends of the first draft of the manuscript, n=3 means at least 3 biological replicates, whereas in the Materials and Methods section, n=30 refers to the sample size. Each group contained 30 mice, for a total of 150 mice in this study, and the mice were assigned as follows for the in vivo experiments. The in vivo experimental data for Figure 1 have been supplemented in the revised manuscript.

      • Individual data points should be included in each of the graphs from this manuscript.

      Thanks for your reminder. The revised manuscript now shows the individual data points in each of the graphs.

      • The quantifications and statistics in the manuscript need improvement. Several experiments are missing quantifications and/or statistical tests (e.g. Figure 1J). Other experiments show a quantification but without any explanation of replicates (e.g. Figures 2B and 2G). None of the experiments show individual data points, and as in the previous comment, these should be included.

      Thanks for your comments. In the revised manuscript, the statistics and replicate information for the experimental data have been supplemented, and individual data points are shown in each graph.

      • What is the reason for intragastric administration? The previous studies on which the dosages were based used oral administration (gavage). (Discussed in methods 4.2).

      Thanks for your professional comments. Intervention in T1DM mice is typically delivered by one of two methods: oral administration[8] or oral gavage[9-11]. Because of limited experimental conditions, it was not feasible to house each mouse in its own cage, which makes it difficult to precisely control the actual daily dose each mouse receives with oral administration. To ensure that each mouse received a dose matched to its weight and the intended dosage, we used gavage. In addition, oral gavage is more convenient and easier to perform than oral administration. Therefore, the in vivo experiments in this study used eugenol gavage as the treatment method. These details have been added to the revised manuscript.

      References:

      (8) ZHAO H, WU H, DUAN M, et al. Cinnamaldehyde Improves Metabolic Functions in Streptozotocin-Induced Diabetic Mice by Regulating Gut Microbiota [J]. Drug Des Devel Ther, 2021, 15: 2339-55.

      (9) XING D, ZHOU Q, WANG Y, et al. Effects of Tauroursodeoxycholic Acid and 4-Phenylbutyric Acid on Selenium Distribution in Mice Model with Type 1 Diabetes [J]. Biol Trace Elem Res, 2023, 201(3): 1205-13.

      (10) SUDIRMAN S, LAI C S, YAN Y L, et al. Histological evidence of chitosan-encapsulated curcumin suppresses heart and kidney damages on streptozotocin-induced type-1 diabetes in mice model [J]. Sci Rep, 2019, 9(1): 15233.

      (11) YAO H, SHI H, JIANG C, et al. L-Fucose promotes enteric nervous system regeneration in type 1 diabetic mice by inhibiting SMAD2 signaling pathway in enteric neural precursor cells [J]. Cell Commun Signal, 2023, 21(1): 273.

      • Urine volume cannot be specified per mouse (methods 4.4) unless the mice were single-housed or if the different groups were not mixed, both are not ideal study set-ups. Please clarify in the methods section.

      Thanks for your constructive suggestion. After successful modeling of T1DM mice, the mice were grouped according to Method 4.2 as follows: Control, T1DM, T1DM + EUG (5 mg/kg/day), T1DM + EUG (10 mg/kg/day), and T1DM + EUG (20 mg/kg/day). To ensure consistency among groups, each group consisted of 5 mice and received equal amounts of diet (100 g) and drinking water (250 mL) under the same environmental conditions. The urine-soaked area for each group was recorded to quantify urine volume. The conditions were the same for each group. The description of Method 4.4 has been improved in the revised manuscript.

      • OGTT (Figure 1H) of week 2 is missing. This is an important control time point, as it would show the effect of STZ before EUG treatment.

      Thanks for your careful review. OGTT (Figure 1H) of week 2 has been added in the revised manuscript.

      • In Figure 1J, the control group does not follow the expected ITT trajectory. If possible, add the 120-minute time point to see if the blood glucose levels return to baseline in the control group. The graph shows increased basal glucose levels in the experimental groups, but no differences in insulin tolerance. It also misses the AUC calculations. It is probably not significantly different, which should be noted in the text.

      Thanks for your suggestion. T1DM primarily manifests as pancreatic β cell damage and an absolute reduction in insulin secretion, resulting in disordered glucose metabolism in vivo. The oral glucose tolerance test (OGTT) measures a series of plasma glucose concentrations within 2 h after oral gavage of a defined amount of glucose. It is a standard method for evaluating an individual's ability to regulate blood glucose and for assessing islet β cell function. Insulin resistance refers to a reduced efficiency of insulin in promoting glucose uptake and utilization, for various reasons; the body compensates by secreting excess insulin, which leads to hyperinsulinemia, in order to keep blood glucose stable. The insulin tolerance test (ITT) is commonly used to detect insulin resistance in T2DM. However, the ITT was found to have little relevance to T1DM. Therefore, the ITT experiment in Figure 1J and the related description have been removed from the revised manuscript.
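      For context on the AUC calculation raised in the comment, a glucose curve sampled at discrete time points (as in an OGTT over 2 h) is commonly summarized by the area under the curve using the trapezoidal rule. The following is a minimal sketch under that assumption; the function name, time points, and glucose values are hypothetical and are not data from this study.

```python
def trapezoid_auc(times_min, glucose):
    """Area under a sampled glucose curve by the trapezoidal rule.

    times_min: sampling times in minutes, strictly increasing.
    glucose:   glucose concentrations at those times (e.g. mmol/L).
    Returns the AUC in concentration x minutes.
    """
    if len(times_min) != len(glucose) or len(times_min) < 2:
        raise ValueError("need at least two matching time/glucose points")
    auc = 0.0
    for i in range(1, len(times_min)):
        dt = times_min[i] - times_min[i - 1]
        if dt <= 0:
            raise ValueError("time points must be strictly increasing")
        # Each interval contributes the area of one trapezoid.
        auc += dt * (glucose[i] + glucose[i - 1]) / 2.0
    return auc

# Hypothetical OGTT readings, for illustration only.
times = [0, 30, 60, 90, 120]
glucose = [5.0, 12.0, 10.0, 8.0, 6.0]
auc = trapezoid_auc(times, glucose)
```

      Comparing such AUC values between groups with an appropriate statistical test gives a single summary number per animal rather than a separate comparison at each time point.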

      • The staining and FACS data on the effects of STZ+EUG+/- ML385 are not convincing (Figure 6 and Figure 7) and do not seem to align with the bar graphs and the conclusions in the text. It would be good to include immunofluorescent staining for insulin to further validate the effects of STZ+EUG+/- ML385 on insulin expression.

      Thanks for your comments.

      (1) To resolve the mismatch between the statistical results and the images, we repeated the statistical analysis of the NRF2 and HO-1 immunofluorescence results for the revised manuscript, as follows:

      (1) NRF2 immunofluorescence staining:

      Author response image 2.

      Group 1

      Author response image 3.

      Group 2

      Author response image 4.

      Group 3

      Author response image 5.

      Group 4

      Author response image 6.

      Group 5

      Author response image 7.

      NRF2 immunofluorescence staining statistics:

      (2) HO-1 immunofluorescence staining:

      Author response image 8.

      Group 1

      Author response image 9.

      Group 2

      Author response image 10.

      Group 3

      Author response image 11.

      Group 4

      Author response image 12.

      Group 5

      Author response image 13.

      HO-1 immunofluorescence staining statistics:

      (2) The quadrants of the flow cytometry analysis are interpreted as follows: Q1 represents necrotic cells, which are PI positive and Annexin V negative; Q2 represents late apoptotic cells, which are positive for both PI and Annexin V; Q3 represents early apoptotic cells, which are Annexin V positive and PI negative; Q4 represents living cells, which are negative for both PI and Annexin V. In the experiment, the number of apoptotic cells was calculated as the sum of late apoptotic cells in Q2 and early apoptotic cells in Q3. As shown in Figure 9F-G, these results were consistent with those observed in Figure 6G, 6J and Figure 7D-F.
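      The apoptosis calculation (Q2 + Q3) can be sketched as follows. This is an illustration only: it assumes the conventional Annexin V/PI interpretation (double-negative cells are live, Annexin V-positive only are early apoptotic, double-positive are late apoptotic, PI-positive only are necrotic), and the gate thresholds and event values are hypothetical rather than taken from the study.

```python
def classify_event(annexin_v, pi, annexin_gate, pi_gate):
    """Assign one event to a quadrant from its Annexin V and PI signals."""
    annexin_pos = annexin_v >= annexin_gate
    pi_pos = pi >= pi_gate
    if pi_pos and not annexin_pos:
        return "Q1"  # necrotic: PI+, Annexin V-
    if pi_pos and annexin_pos:
        return "Q2"  # late apoptotic: PI+, Annexin V+
    if annexin_pos:
        return "Q3"  # early apoptotic: PI-, Annexin V+
    return "Q4"      # live: PI-, Annexin V-

def apoptosis_rate(events, annexin_gate, pi_gate):
    """Return (percent apoptotic, per-quadrant counts); apoptotic = Q2 + Q3."""
    if not events:
        raise ValueError("no events to classify")
    counts = {"Q1": 0, "Q2": 0, "Q3": 0, "Q4": 0}
    for annexin_v, pi in events:
        counts[classify_event(annexin_v, pi, annexin_gate, pi_gate)] += 1
    percent = 100.0 * (counts["Q2"] + counts["Q3"]) / len(events)
    return percent, counts

# Hypothetical (Annexin V, PI) intensities and gates, for illustration only.
events = [(10, 10), (500, 10), (500, 500), (10, 500), (20, 15)]
rate, counts = apoptosis_rate(events, annexin_gate=100, pi_gate=100)
```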

      (3) MIN6 cells, a mouse islet β cell line, secrete insulin. STZ intervention causes an absolute decrease in the number of islet β cells, so insulin immunofluorescence staining would only show a decrease in the number of MIN6 cells in each group. In addition, insulin protein levels are routinely measured by ELISA, which assesses the insulin secreted into the cell supernatant. Figure 6E shows the ELISA results for insulin secretion in the cell supernatant.

      • The experimental design for the in vitro experiments was unclear from the text. Consider including a schematic to show when cells were treated with STZ, EUG, and ML385.

      Thanks for your suggestion. The experimental design for the in vitro experiments of this study has been added in Figure 6A of the revised manuscript.

      • As stated in the Discussion, the use of the insulinoma line Min6 as a model instead of primary pancreatic beta cells is a clear limitation of the study. The mechanistic data would be stronger if validated on a more relevant system (eg. untransformed Islets).

      Thanks for your comments. Mouse insulinoma cells (the MIN6 cell line) are a permanent cell line isolated from mouse islet β cell tumors and can reflect the functional changes of islet β cells. As mature islet cells, MIN6 cells have been widely utilized as an in vitro cellular model for diabetes research to investigate the functionality of β cells within pancreatic islets[1, 2, 12]. Therefore, in this study, MIN6 cells were used as the in vitro cell model. In future studies, we will try to validate our findings using more relevant in vitro cell systems (e.g., islets).

      References:

      (1) WU M, CHEN W, ZHANG S, et al. Rotenone protects against β-cell apoptosis and attenuates type 1 diabetes mellitus [J]. Apoptosis, 2019, 24(11-12): 879-91.

      (2) LUO C, HOU C, YANG D, et al. Urolithin C alleviates pancreatic β-cell dysfunction in type 1 diabetes by activating Nrf2 signaling [J]. Nutr Diabetes, 2023, 13(1): 24.

      (12) CHEN H, LOU Y, LIN S, et al. Formononetin, a bioactive isoflavonoid constituent from Astragalus membranaceus (Fisch.) Bunge, ameliorates type 1 diabetes mellitus via activation of Keap1/Nrf2 signaling pathway: An integrated study supported by network pharmacology and experimental validation [J]. J Ethnopharmacol, 2024, 322: 117576.

      • The use of small molecule inhibitors such as ML385 can have unspecific effects. Genetic manipulation or the use of siRNAs to inhibit the NRF2 pathway would have been preferable for the in vitro experiments.

      Thanks for your constructive suggestion. ML385 is a commonly used and stable inhibitor of NRF2 and has been used in studies of a variety of diseases[13-15]. The MIN6 cells utilized in this study were cultured under challenging conditions and exhibited a sluggish growth rate. Owing to the cytotoxicity associated with siRNA transfection reagents, a significant proportion of MIN6 cells succumbed following transfection. Consequently, the small-molecule inhibitor ML385 was employed in this investigation. In future studies, we will try to validate our findings using siRNAs.

      References:

      (13) DANG R, WANG M, LI X, et al. Edaravone ameliorates depressive and anxiety-like behaviors via Sirt1/Nrf2/HO-1/Gpx4 pathway [J]. J Neuroinflammation, 2022, 19(1): 41.

      (14) WANG Z, YAO M, JIANG L, et al. Dexmedetomidine attenuates myocardial ischemia/reperfusion-induced ferroptosis via AMPK/GSK-3β/Nrf2 axis [J]. Biomed Pharmacother, 2022, 154: 113572.

      (15) LI J, DENG S H, LI J, et al. Obacunone alleviates ferroptosis during lipopolysaccharide-induced acute lung injury by upregulating Nrf2-dependent antioxidant responses [J]. Cell Mol Biol Lett, 2022, 27(1): 29.

      • The study proposes a mechanism in which EUG-induced disruption of KEAP1 and NRF2 interaction leads to NRF2 translocation to the nucleus and upregulation of proteins required to prevent oxidative stress. In Figure 6H it is unclear whether the nuclear NRF2 increases. Please add quantifications of the immunostainings.

      Thanks for your reminder. Figure 6J shows the quantifications of the immunostainings of NRF2 in the revised manuscript.

      • Some of the figure legends lack important information. In Figure 5A, 6E for instance, what is the protein expression normalized to?

      Thanks for your constructive suggestion. Protein normalization standardizes proteins from different sources and with different properties so that protein content and expression can be compared across samples. In WB experiments, normalization of protein expression is an essential step. Western blots of nuclear proteins generally cannot use β-Actin as the internal reference. Lamin B was chosen because β-Actin is an internal reference that is not found in the nucleus. N-NRF2, as a nuclear protein, requires Lamin B as the reference for normalization. The missing WB information has been added to the figure legends of the revised manuscript.
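      The normalization described above amounts to dividing each target band intensity by the intensity of its loading reference in the same lane and, optionally, expressing the result relative to the control group. A minimal sketch, assuming densitometry values are already available; the function names, group labels, and intensities are hypothetical and are not data from this study.

```python
def normalize_to_reference(target_intensity, reference_intensity):
    """Ratio of a target band to its loading reference in the same lane
    (for example, nuclear NRF2 relative to Lamin B)."""
    if reference_intensity <= 0:
        raise ValueError("reference intensity must be positive")
    return target_intensity / reference_intensity

def fold_change_vs_control(samples, control_name):
    """Normalize every sample, then express each ratio relative to the control.

    samples: dict mapping sample name -> (target intensity, reference intensity).
    """
    ratios = {
        name: normalize_to_reference(target, reference)
        for name, (target, reference) in samples.items()
    }
    control_ratio = ratios[control_name]
    return {name: ratio / control_ratio for name, ratio in ratios.items()}

# Hypothetical densitometry readings (target, reference), for illustration only.
samples = {
    "Control": (1000.0, 2000.0),
    "STZ": (600.0, 2400.0),
    "STZ+EUG": (1500.0, 2000.0),
}
fold = fold_change_vs_control(samples, "Control")
```

      Dividing by the reference first corrects for unequal loading between lanes, so that differences between groups reflect the target protein rather than the amount of sample loaded.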

      • Please acknowledge previous literature on the effects of EUG/clove oil in diabetes models. The meta-analytical review by Carvalho et al. (DOI: 10.1016/j.phrs.2020.105315) should be cited and discussed.

      Thanks for your suggestion. It has been cited and discussed in the revised manuscript.

      • Consider revising the text for grammar, language mistakes, and readability. The text is not always precise (e.g. in the explanation of gamma-H2AX in the results), does not explain terminology (e.g. the oxidative stress markers - line 204+205), or simplifies conclusions (e.g. "improved islet function" based on glucose tolerance test, line 129).

      Thanks for your comments. The above problems have been addressed in the revised manuscript. In addition, we have sent our manuscript to a professional English language editing company to improve the paper, and the editing certificate has been submitted as a supplementary document.

      • In the current format, some figures are out of focus. Please make sure to upload a high-quality version for publication.

      Thanks for your suggestion. High-quality versions of the figures have been uploaded. The file may have been compressed after upload because of its size, leaving the figures out of focus. Therefore, all figures in this study have been uploaded separately for download in the review system.

      Reviewer #2 (Recommendations For The Authors):

      Below are specific points of criticism on the experiments presented.

      (1a) There is no comparison among eugenol treatments with regards to fasting weight, blood glucose, water intake, food intake, and, crucially, OGTT. All three treatments appear to show very similar effects but has this been statistically assessed? Shown statistical significance of ketonuria between no and high eugenol treatments seems exaggerated.

      Thanks for your comments. EUG intervention has a dose-dependent effect on T1DM. According to Figure 1B-I, 20 mg/kg EUG has the best effect. Fasting body weight, blood glucose, water intake, food intake, and OGTT were statistically assessed in Figure 1 of the revised manuscript. In addition, we repeated the statistical analysis of ketonuria between the no-eugenol and high-dose eugenol treatments in the revised manuscript. We have also revised the description of eugenol's efficacy to be more objective.

      (b) ITT is not used to detect T1DM (line 126).

      Thanks for your suggestion. T1DM primarily manifests as pancreatic β cell damage and an absolute reduction in insulin secretion, resulting in disordered glucose metabolism in vivo. The oral glucose tolerance test (OGTT) measures a series of plasma glucose concentrations within 2 h after oral gavage of a defined amount of glucose. It is a standard method for evaluating an individual's ability to regulate blood glucose and for assessing islet β cell function. Insulin resistance is a reduction, for various reasons, in the efficiency with which insulin promotes glucose uptake and utilization; the body compensates by secreting excess insulin, leading to hyperinsulinemia, to keep blood glucose stable. The insulin tolerance test (ITT) is commonly employed to detect insulin resistance in T2DM. However, we found that the ITT has little relevance to T1DM. Therefore, the ITT experiment and related description have been removed from the revised manuscript.

      (2) Here it is hard to reconcile the gradual increase of Ins protein levels in (STZ) and (STZ + increasing eugenol) samples with (a) results in 1 suggesting that the dose of eugenol does not significantly affect the outcome and (b) Ins expression, which is essentially undetectable in both STZ and STZ+EUG mice. A likely explanation is that EUG just postpones beta cell death. I assume that these analyses were done in week 10 but it is not stated.

      Thanks for your professional suggestion. Perhaps because the file was compressed, the gray values of the WB bands are not obvious, so INS expression cannot be seen clearly. In fact, STZ caused a significant decrease in INS expression compared with the Control group, and this decrease was alleviated by EUG treatment. However, because INS levels in the STZ and STZ + EUG groups differ greatly from those in the Control group, the INS bands in the STZ and STZ + EUG groups appear faint. As mentioned in Methods 4.12-4.13, our WB and PCR samples were from 10-week mice.

      (3) The γH2Ax stainings provided are weak and do not fully correspond to the quantitation - the 5 mg/Kg EUG treatment appears less severe than the 10 mg/Kg. In contrast, changes in the PCD pathway are convincingly demonstrated.

      Thanks for your reminder. γH2AX immunohistochemical staining was assessed within the islets. We counted the number of brown-stained β cells rather than measuring the brown area. The zoomed images of γH2AX staining show that 10 mg/kg EUG produced a greater improvement than 5 mg/kg. As a marker of DNA damage, γH2AX localizes to the nucleus and is absent from the cytoplasm. Therefore, in Figure 4C-D, we quantified the proportion of cells exhibiting brown staining. In Figure 4C, black arrows highlight brown-stained islet β cells.

      (4) Is there a reason for looking at mRNA levels of Ho-1 but not KEAP1 or NQO-1 ? What is the expression of Nrf2 itself at the RNA level? Please give in the text what the abbreviations MDA, SOD, CAT GSH-Px stand for. Are these protein levels or activity assays? Units in the y-axis of graphs?

      Thanks for your constructive suggestion. The required KEAP1 and NQO-1 primers have been synthesized, and the relevant data have been added to the revised manuscript. The expression of Nrf2 itself at the RNA level is shown as T-NRF2 (total NRF2). MDA, SOD, CAT, and GSH-Px stand for malondialdehyde, superoxide dismutase, catalase, and glutathione peroxidase, respectively; these definitions have been added to the revised manuscript. These are serum activity assays, and the y-axis units have been added to the graphs in the revised manuscript.

      (5) The Ins levels in the culture medium of STZ + ML treated cells are much lower than the levels in STZ treated cells (6D). This is not consistent with the results of Ins cell content or Ins expression as stated (6B and D).

      Thanks for your careful review. In the revised manuscript, the samples in Figure 6C are proteins extracted from the cells of each group, whereas the samples in Figure 6E are the culture supernatants of each group. ML385 is an NRF2 inhibitor that effectively suppresses the NRF2 signaling pathway and aggravates MIN6 cell damage, resulting in lower INS expression in the STZ+ML385 group than in the STZ group in both Figures 6C and 6E. Although the two sample sources differ and the trends vary slightly, INS is lower overall in the STZ+ML385 group than in the STZ group.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Chen et al. identified the role of endocardial id2b expression in cardiac contraction and valve formation through pharmaceutical, genetic, electrophysiology, calcium imaging, and echocardiography analyses. CRISPR/Cas9 generated id2b mutants demonstrated defective AV valve formation, excitation-contraction coupling, reduced endocardial cell proliferation in AV valve, retrograde blood flow, and lethal effects.

      Strengths:

      Their methods, data and analyses broadly support their claims.

      Weaknesses:

      The molecular mechanism is somewhat preliminary.

      We thank the reviewer for the positive assessment of our work. A detailed point-by-point response has been incorporated in the response to “Recommendations for the authors” section.

      Reviewer #2 (Public review):

      Summary:

      Biomechanical forces, such as blood flow, are crucial for organ formation, including heart development. This study by Shuo Chen et al. aims to understand how cardiac cells respond to these forces. They used zebrafish as a model organism due to its unique strengths, such as the ability to survive without heartbeats, and conducted transcriptomic analysis on hearts with impaired contractility. They thereby identified id2b as a gene that is regulated by blood flow and is crucial for proper heart development, in particular, for the regulation of myocardial contractility and valve formation. Using both in situ hybridization and transgenic fish they showed that id2b is specifically expressed in the endocardium, and its expression is affected by both pharmacological and genetic perturbations of contraction. They further generated a null mutant of id2b to show that loss of id2b results in heart malformation and early lethality in zebrafish. Atrioventricular (AV) valve formation and excitation-contraction coupling were also impaired in id2b mutants. Mechanistically, they demonstrate that Id2b interacts with the transcription factor Tcf3b to restrict its activity. When id2b is deleted, the repressor activity of Tcf3b is enhanced, leading to suppression of the expression of nrg1 (neuregulin 1), a key factor for heart development. Importantly, injecting tcf3b morpholino into id2b-/- embryos partially restores the reduced heart rate. Moreover, treatment of zebrafish embryos with the Erbb2 inhibitor AG1478 results in decreased heart rate, in line with a model in which Id2b modulates heart development via the Nrg1/Erbb2 axis. The research identifies id2b as a biomechanical signaling-sensitive gene in endocardial cells that mediates communication between the endocardium and myocardium, which is essential for heart morphogenesis and function.

      Strengths:

      The study provides novel insights into the molecular mechanisms by which biomechanical forces influence heart development and highlights the importance of id2b in this process.

      Weaknesses:

      The claims are in general well supported by experimental evidence, but the following aspects may benefit from further investigation:

      (1) In Figure 1C, the heatmap demonstrates the up-regulated and down-regulated genes upon tricaine-induced cardiac arrest. Aside from the down-regulation of id2b expression, it was also evident that id2a expression was up-regulated. As a predicted paralog of id2b, it would be interesting to see whether the up-regulation of id2a in response to tricaine treatment was a compensatory response to the down-regulation of id2b expression.

      We thank the reviewer for the comment. As suggested, we performed qRT-PCR analysis to assess id2a expression in tricaine-treated hearts. Our results demonstrate a significant upregulation of id2a following the inhibition of cardiac contraction, suggesting a potential compensatory response to the decreased id2b expression. These new results have been incorporated into the revised manuscript (Figure 1D).

      (2) The study mentioned that id2b is tightly regulated by the flow-sensitive primary cilia-klf2 signaling axis; however aside from showing the reduced expression of id2b in klf2a and klf2b mutants, there was no further evidence to solidify the functional link between id2b and klf2. It would therefore be ideal, in the present study, to demonstrate how Klf2, which is a transcriptional regulator, transduces biomechanical stimuli to Id2b.

      We have examined the expression levels of id2b in both klf2a and klf2b mutants. The whole mount in situ results clearly demonstrate a decrease in id2b signal in both mutants (Figure 3E). As noted by the reviewer, klf2 is a transcriptional regulator, suggesting that the regulation of id2b may occur at the transcriptional level. However, dissecting the molecular mechanisms underlying the crosstalk between klf2 and id2b is beyond the scope of the present study.

      (3) The authors showed the physical interaction between ectopically expressed FLAG-Id2b and HA-Tcf3b in HEK293T cells. Although the constructs being expressed are of zebrafish origin, it would be nice to show in vivo that the two proteins interact.

      We thank the reviewer for this insightful comment. As suggested, we synthesized Flag-id2b and HA-tcf3b mRNA and co-injected them into 1-cell stage zebrafish embryos. We collected 100-300 embryos at 12, 24, and 48 hpf and performed western blot analysis using the same anti-HA and anti-Flag antibodies validated in HEK293 cell experiments. Despite multiple independent attempts, we were unable to detect clear bands of the tagged proteins in zebrafish embryos. We speculate that this could be due to mRNA instability, translational efficiency, or the low abundance of Id2b and Tcf3b proteins. We have acknowledged these technical limitations in the revised manuscript and clarified that the HEK293 cell data support a potential interaction between Id2b and Tcf3b, while confirming their endogenous interaction will require further investigations (Lines 295-296).

      Reviewer #3 (Public review):

      Summary:

      How mechanical forces transmitted by blood flow contribute to normal cardiac development remains incompletely understood. Using the unique advantages of the zebrafish model system, Chen et al make the fundamental discovery that endocardial expression of id2b is induced by blood flow and required for normal atrioventricular canal (AVC) valve development and myocardial contractility by regulating calcium dynamics. Mechanistically, the authors suggest that Id2b binds to Tcf3b in endocardial cells, which relieves Tcf3b-mediated transcriptional repression of Neuregulin 1 (NRG1). Nrg1 then induces expression of the L-type calcium channel component LRRC1. This study significantly advances our understanding of flow-mediated valve formation and myocardial function.

      Strengths:

      Strengths of the study are the significance of the question being addressed, use of the zebrafish model, and data quality (mostly very nice imaging). The text is also well-written and easy to understand.

      Weaknesses:

      Weaknesses include a lack of rigor for key experimental approaches, which led to skepticism surrounding the main findings. Specific issues were the use of morpholinos instead of genetic mutants for the bmp ligands, cilia gene ift88, and tcf3b, lack of an explicit model surrounding BMP versus blood flow induced endocardial id2b expression, use of bar graphs without dots, the artificial nature of assessing the physical interaction of Tcf3b and Id2b in transfected HEK293 cells, and artificial nature of examining the function of the tcf3b binding sites upstream of nrg1.

      We thank the reviewer for the positive assessment and the constructive suggestions. We have performed additional experiments and data analysis to address these issues. A detailed point-by-point response has been incorporated in the response to “Recommendations for the authors” section.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Questions/Concerns:

      (1) In the introduction, it would be beneficial to include background information on the id2b gene, what is currently known about its function in heart development/regeneration and in other animal models than just the zebrafish.

      We thank the reviewer for the constructive suggestion. In the revised manuscript, we have added a paragraph in the Introduction to provide background on id2b and its role in heart development. Specifically, we discuss its function as a member of the ID (inhibitor of DNA binding) family of helix-loop-helix (HLH) transcriptional regulators and highlight its involvement in cardiogenesis in both zebrafish and mouse models. These additions help place our findings in a broader developmental and evolutionary context (Lines 91-100).

      (2) Of the 6 differentially expressed genes identified in Figure 1C, why did the authors choose to focus on id2b and not the other 5 downregulated genes?

      We thank the reviewer for the comments. As suggested, we have added a sentence in the revised manuscript to clarify the rationale for selecting id2b as the focus of the present study (Lines 117-121).

      (3) As the authors showed representative in situ images for id2b expression with blebbistatin treatment in Figure 1E, and tnnt2a MO in Figure 1F, it would also be beneficial to show relative mRNA expression levels for id2b in conditions of blebbistatin treatment and tnnt2a MO knockdown. In Fig. 1C: id2b is downregulated with tricaine, but id2a is upregulated with tricaine. Do these genes perform similar or different functions, results of gene duplication events?

      We thank the reviewer for the thoughtful suggestion. Our in situ hybridization results demonstrate reduced id2b expression following tricaine, blebbistatin, and tnnt2a morpholino treatment. To further validate these observations and enhance cellular resolution, we generated an id2b:eGFP knockin line. Analysis of this reporter line confirmed a significant reduction in id2b expression in the endocardium upon inhibition of cardiac contraction and blood flow (Figure 3A-D), supporting our in situ results. The divergent expression patterns of id2a and id2b in response to tricaine treatment likely reflect functional specification following gene duplication in zebrafish. While our current study focuses on characterizing the role of id2b in zebrafish heart development, the specific function of id2a remains to be determined.

      (4) In Fig. 2b, could the authors compare the id2b fluorescence with RNAscope ISH at 24, 48, and 72 hpf? RNAscope ISH allows for the visualization of single RNA molecules in individual cells. The authors should at least compare these in the heart to demonstrate that id2b accurately reflects the endogenous id2b expression. In Fig. 2E: Suggest showing the individual fluorescent images for id2b:eGFP and kdrl:mCherry in the same colors as top panel images instead of in black and white. In Fig. 2F: The GFP fluorescence from id2b:eGFP signals looks overexposed.

      We thank the reviewer for the valuable comment. In response, we attempted RNAscope in situ hybridization on embryos carrying the id2b:eGFP reporter to directly compare fluorescent reporter expression with endogenous id2b transcripts. However, we encountered a significant reduction in id2b:eGFP fluorescence following the RNAscope procedure, and even subsequent immunostaining with anti-GFP antibodies yielded only weak signals. Despite this technical limitation, the RNAscope results independently confirmed id2b expression in endocardial cells (Figure 2E), supporting the specificity and cell-type localization observed with the reporter line. As suggested by the reviewer, we have updated Figure 2G to display id2b:eGFP and kdrl:mCherry images in the same color scheme as the top panel to improve consistency and clarity. Additionally, we have replaced the images in Figure 2F to avoid overexposure and better represent the spatial distribution of id2b:eGFP in adult heart.

      (5) In Fig. 3A: are all the images in panel A taken with the same magnification? In Fig. 3e, could the authors show the localization of klf2 and id2b and confirm their expression in the same endocardial cells? In Fig. 3, the authors conclude that klf2-mediated biomechanical signaling is essential for activating id2b expression. This statement is somewhat overstated because they only demonstrated that knockout of klf2 reduced id2b expression.

      We thank the reviewer for these constructive comments. All images presented in Figure 3A were captured using the same magnification, as now clarified in the revised figure legend. We appreciate the reviewer’s question regarding the localization of klf2 and id2b. While we were unable to directly visualize both markers in the same embryos due to the current unavailability of klf2 reporter lines, prior studies using klf2a:H2B-eGFP transgenic zebrafish have demonstrated that klf2a is broadly expressed in endocardial cells, with enhanced expression in the atrioventricular canal region (Heckel et al., Curr Biol 2015, PMID: 25959969; Gálvez-Santisteban et al., eLife 2019, PMID: 31237233). Our id2b:eGFP reporter analysis revealed a similarly broad endocardial expression pattern. These independent observations support the likelihood that klf2a and id2b are co-expressed in the same endocardial cell population.

      We also appreciate the reviewer’s comments regarding the connection between biomechanical signaling and id2b expression. Previous studies have already established that biomechanical cues directly regulate klf2 expression in zebrafish endocardial cells (Vermot et al., PLoS Biol 2009, PMID: 19924233; Heckel et al., Curr Biol 2015, PMID: 25959969). In the present study, we observed a significant reduction in id2b expression in both klf2a and klf2b mutants, suggesting that id2b acts downstream of klf2. These observations together establish the role of the biomechanical cues-klf2-id2b signaling axis in endocardial cells. Nevertheless, we agree with the reviewer that further investigation is required to elucidate the precise mechanism by which klf2 regulates id2b expression.

      (6) In Fig. 4: What's the mRNA expression for id2b in WT and id2b mutant fish hearts?

      We performed qRT-PCR analysis on purified zebrafish hearts and observed a significant reduction in id2b mRNA levels in id2b mutants compared to wild-type controls. These new results have been incorporated into the revised manuscript (Figure 4A).

      (7) In Fig. 5E, the heart rate shows no difference between id2b+/+ and id2b-/- fish according to echocardiography analysis. However, Fig. 5B indicates a difference in heart rate. Could the authors explain this discrepancy?

      We thank the reviewer for this insightful observation. In our study, we observed a reduction in heart rate in id2b mutants during embryonic stages (120 hpf), as shown in Figure 5B. However, this difference was not evident in adult fish based on echocardiography analysis (Figure 5E). While the exact reason for these changes during development remains unclear, it is possible that the reduction in cardiac output observed in id2b mutants during early development triggers compensatory mechanisms over time, ultimately restoring heart rate in adulthood. Given that heart rate is primarily regulated by pacemaker activity, further investigation will be required to determine whether such compensatory adaptations occur and to elucidate the underlying mechanisms.

      (8) In Fig. 6A: it's a little hard to read the gene names in the left most image in the panel. In Fig. 6B, the authors conducted qRT-PCR analysis of 72 hpf embryonic hearts and validated decreased nrg1 levels in id2b-/- compared to control. Since nrg1 is not specifically expressed in endocardial cells in the developing heart, the authors should isolate endocardial cells and compare nrg1 expression in id2b-/- to control. This would ensure that the loss of id2b affects nrg1 expression derived from endocardial cells rather than other cell types. In Supp Figure S6: Suggest adding an image of the UMAP projection to show tcf3b expression in endocardial cells from sequencing analysis.

      We thank the reviewer for these helpful suggestions. In response, we have increased the font size of gene names in the leftmost panel of Figure 6A to improve readability. Regarding nrg1 expression, we acknowledge the importance of assessing its cell-type specificity. Unfortunately, due to the lack of reliable transgenic or knock-in tools for nrg1, its precise expression pattern in embryonic hearts remains unclear. We attempted to isolate endocardial cells from embryonic hearts using FACS, but the limited number of cells obtained at this stage precluded reliable qRT-PCR analysis. Nonetheless, our data show that id2b is specifically expressed in endocardial cells, and publicly available single-cell RNA-seq datasets also support that nrg1 is predominantly expressed in endocardial, but not myocardial or epicardial cells during embryonic heart development (Figure 6-figure supplement 1). These findings suggest that id2b may regulate nrg1 expression in a cell-autonomous manner within the endocardium. As suggested, we have also added a UMAP image to Figure 7-figure supplement 1 to show tcf3b expression in endocardial cells, further supporting the cell identity in single-cell dataset.

      (9) In Fig. 6, Nrg1 knockout shows no gross morphological defects and normal trabeculation in larvae. Could the authors explain why they propose that endocardial id2b promotes nrg1 synthesis, thereby enhancing cardiomyocyte contractile function? Did Nrg1 knockdown with Mo lead to compromised calcium signaling and cardiac contractile function? Nrg2a has been reported to be expressed in endocardial cells in larvae, and its loss leads to heart function defects. Perhaps Nrg2a plays a more important role than Nrg1.

      We thank the reviewer for raising this important point. Although we did not directly test nrg1 knockout in our study, previous reports have shown that genetic deletion of nrg1 in zebrafish does not impair cardiac trabeculation during embryonic stages (Rasouli et al., Nat Commun 2017, PMID: 28485381; Brown et al., J Cell Mol Med 2018, PMID: 29265764). However, reduced trabecular area and signs of arrhythmia were observed in juvenile and adult fish (Brown et al., J Cell Mol Med 2018, PMID: 29265764), suggesting a potential role for nrg1 in maintaining cardiac structure and function later in development. Whether calcium signaling and cardiac contractility are affected at these stages remains to be determined. Given that morpholino-induced knockdown is limited to early embryonic stages, it is not suitable for assessing nrg1 function in juvenile or adult hearts.

      As noted by the reviewer, nrg2a is expressed in endocardial cells, and its deletion has been associated with cardiac defects (Rasouli et al., Nat Commun 2017, PMID: 28485381). To assess its potential involvement in our model, we performed qRT-PCR analysis and observed increased nrg2a expression in id2b mutant hearts (Author response image 1). This upregulation may reflect a compensatory response to the loss of id2b. Therefore, nrg2a is unlikely to play an essential role in mediating the depressed cardiac function in this context.

      Author response image 1.

      Expression levels of nrg2a. qRT-PCR analysis of nrg2a mRNA in id2b<sup>+/+</sup> and id2b<sup>-/-</sup> adult hearts. Data were normalized to the expression of actb1. N=5 biological replicates, with each sample containing two adult hearts.

      (10) In Fig. 7A of the IP experiment, it is recommended that the authors establish a negative control using control IgG corresponding to the primary antibody source. This control helps to differentiate non-specific background signal from specific antibody signal.

      As suggested, we have included an IgG control corresponding to the primary antibody species in the immunoprecipitation (IP) experiment to distinguish specific from non-specific binding. The updated data are presented in Figure 7A of the revised manuscript.

      (11) In Pg. 5, line 115: there is no reference included for previous literature on blebbistatin.

      We have added the corresponding reference (Line 126, Reference #5).

      In Pg. 5, lines 118-119; pg. 6 line 144: It would be beneficial to include a short sentence describing why choosing a tnnt2a morpholino knockdown to help provide mechanistic context to readers.

      We thank the reviewer for the constructive suggestion. In cardiomyocytes, tnnt2a encodes a sarcomeric protein essential for cardiac contraction, and its knockdown is a well-established method for abolishing heartbeat and blood flow in zebrafish embryos, thereby allowing investigation of flow-dependent gene regulation. In the revised manuscript, we have added a sentence and corresponding reference to clarify the rationale for using tnnt2a morpholino in our study (Lines 128-129, Reference #35).

      In Pg. 6, line 140: Results title of "Cardiac contraction promotes endocardial id2b expression through primary cilia but not BMP" is misleading and contradicts the results presented in this section and corresponding figure. For example, the bmp Mo knockdown experiments led to decreased id2b fluorescence and the last statement of this results section contradicts the title that BMP does not promote endocardial id2b in lines 179-180: "Collectively, these results suggest that BMP signaling and blood flow modulate id2b expression in a developmental-stage-dependent manner." It would be helpful to clarify whether BMP signaling is involved in id2b expression or not.

      We apologize for any confusion caused by the section title. Our results demonstrate that id2b expression is regulated by both BMP signaling and biomechanical forces in a developmental-stage-specific manner. Specifically, morpholino-mediated knockdown of bmp2b, bmp4, and bmp7a at the 1-cell stage significantly reduced id2b:eGFP fluorescence at 24 hpf (Figure 3-figure supplement 1A, B), suggesting that id2b is responsive to BMP signaling during early embryonic development. However, treatment with the BMP inhibitor Dorsomorphin during later stages (24-48 or 36-60 hpf) did not significantly alter id2b:eGFP fluorescence intensity in individual endocardial cells, although a modest reduction in total endocardial cell number was noted (Figure 3-figure supplement 1C, D). These results suggest that BMP signaling is required for id2b expression during early development but becomes dispensable at later stages, when biomechanical cues may play a more prominent role. To address this concern and better reflect the data, we have revised the Results section title to: "BMP signaling and cardiac contraction regulate id2b expression". This revised title more accurately reflects the dual regulation of id2b expression (Line 153).

      In line 205: Any speculation on why the hemodynamics was preserved between id2b mutant and WT siblings at 96 hpf?

      As suggested, we have included a sentence to address this observation. “Surprisingly, the pattern of hemodynamics was largely preserved in id2b<sup>-/-</sup> embryos compared to id2b<sup>+/+</sup> siblings at 96 hpf (Figure 4-figure supplement 1E, Video 1, 2), suggesting that the reduced number of endocardial cells in the AVC region was not sufficient to induce functional defects.” (Lines 223-225)

      In line 246: Fig. 6k and 6j are referenced, but should be figure 5k and 5j.

      We have corrected this in the revised manuscript.

      Reviewer #2 (Recommendations for the authors):

      The manuscript was overall well explained, aside from a few minor points that would help facilitate reader comprehension:

      (1) The last paragraph of the introduction could be a brief summary of the study.

      We thank the reviewer for this constructive suggestion. As recommended, we have included a paragraph in the Introduction section summarizing our key findings to provide clearer context for the study (Lines 96-100).

      (2) Lines 127-128: 'revealed a substantial recapitulation of the... of endogenous id2b expression' may need to be rephrased.

      We thank the reviewer for the valuable suggestion. In the revised manuscript, we have changed the sentence to: “Comparison of id2b:eGFP fluorescence with in situ hybridization at 24, 48, and 72 hpf revealed that the reporter signal closely recapitulates the endogenous id2b expression pattern.” (Lines 137-139)

      (3) Line 182: '... in a developmental-stage-dependent manner' sounds a bit ambiguous, may need to slightly elaborate/ clarify what this means.

      We thank the reviewer for the helpful comment. To improve clarity, we have revised the statement to: “Collectively, these results suggest that id2b expression is regulated by both BMP and biomechanical signaling, with the relative contribution of each pathway varying across developmental stages.” (Lines 195-197)

      Reviewer #3 (Recommendations for the authors):

      (1) The conclusion that BMP signaling prior to 24 hpf is necessary for id2b expression is not fully supported by the data. How do the authors envision pre-linear heart tube BMP signaling impacting endocardial id2b expression during later chamber stages? Id2b reporter fluorescence can be clearly visualized in the linear heart tube in panel B from Figure 1. Does id2b expression initiate prior to contraction? Can the model be refined by showing when id2b endocardial reporter fluorescence is first observed, and whether this early/pre-contractile expression is dependent on BMP signaling?

      We thank the reviewer for the important comment. As suggested, we performed morpholino-mediated knockdown of bmp2b, bmp4, and bmp7a at the 1-cell stage. Live imaging at 24 hpf showed significantly reduced id2b:eGFP fluorescence compared to controls (Figure 3-figure supplement 1A, B), suggesting that id2b is responsive to BMP signaling during early embryonic development. However, treatment with the BMP inhibitor Dorsomorphin during 24-48 or 36-60 hpf did not significantly impact id2b:eGFP fluorescence intensity in individual endocardial cells, although a reduction in endocardial cell number was observed (Figure 3-figure supplement 1C, D). These results suggest that BMP signaling is essential for id2b expression during early embryonic development, while it becomes dispensable at later stages, when biomechanical cues exert a more significant role.

      (2) Overexpressing tagged versions of TCF3b and Id2b in HEK293 cells is a very artificial way to make the major claim that these two proteins interact in endogenous endocardial cells. Can this be done in zebrafish embryonic or adult hearts?

      We thank the reviewer for this insightful comment. As suggested, we synthesized Flag-id2b and HA-tcf3b mRNA and co-injected them into 1-cell stage zebrafish embryos. We collected 100-300 embryos at 12, 24, and 48 hpf and performed western blot analysis using the same anti-HA and anti-Flag antibodies validated in HEK293 cell experiments. Despite multiple independent attempts, we were unable to detect clear bands of the tagged proteins in zebrafish embryos. We speculate that this could be due to mRNA instability, translational efficiency, or the low abundance of Id2b and Tcf3b proteins. We have acknowledged these technical limitations in the revised manuscript and clarified that the HEK293 cell data support a potential interaction between Id2b and Tcf3b, while confirming their endogenous interaction will require further investigations (Lines 295-296).

      (3) The data presented are consistent with the claim that the tcf3b binding sites are functional upstream of nrg1 to repress its transcription. To fully support this idea, those two sites should be disrupted with gRNAs if possible.

      We thank the reviewer for the valuable suggestion. In response, we attempted to disrupt the tcf3b binding sites using sgRNAs. However, we encountered technical difficulties in identifying sgRNAs that specifically and efficiently target these binding sites without affecting adjacent regions. Despite these challenges, our luciferase reporter assay, using tcf3b mRNA overexpression and morpholino knockdown, clearly demonstrated that tcf3b binds to and regulates the nrg1 promoter region. Nevertheless, we acknowledge that future studies using genome editing will be necessary to validate the direct binding of tcf3b to the nrg1 promoter.

      Minor Points:

      (1) Must remove all of the "data not shown" statements and add the primary data to the Supplemental Figures.

      As suggested, we have removed all of the “data not shown” statements and added the original data to the revised manuscript (Figure 4E, middle panels, and Figure 4-figure supplement 1F).

      (2) Must present the order of the panels in the figure as they are presented in the text. One example is Figure 6 where 6E is discussed in the text before 6C and 6D.

      We thank the reviewer for bringing up this important point. In the revised manuscript, we have ensured that the order of figure panels matches the sequence in which they are discussed in the text. Specifically, we have reorganized the presentation of Figure 6 so that panels 6C and 6D are discussed before panel 6E, and we have updated the figure and the corresponding text accordingly.

      (3) Change the italicized gene names (e.g. tcf3b) to non-italicized names with the first letter capitalized (e.g. Tcf3b) when referencing the protein.

      As suggested, we have revised the manuscript to use non-italicized names with the first letter capitalized when referring to proteins.

      (4) All bar graphs should be replaced with dot bar graphs.

      We have replaced all bar graphs with dot bar graphs throughout the manuscript.

      (5) The new id2b mutant allele should be validated as a true null using quantitative RT-PCR to show that the message becomes destabilized through non-sense mediated decay or by immunostaining/western blot analysis if there is a zebrafish Id2b-specific antibody available.

      We thank the reviewer for this important suggestion. We have performed qRT-PCR analysis and detected a significant reduction in id2b mRNA levels in id2b<sup>-/-</sup> compared to id2b<sup>+/+</sup> controls. These new results are presented in Figure 4A of the revised manuscript.

      (6) Was tricaine used to anesthetize embryos for capturing heart rate and percent fractional area change? This analysis should be performed with no or very limited tricaine as it affects heart rate and systolic function. These parameters were captured at 120 hpf, but the authors should also look earlier at 72 hpf at a time when valves are not present but calcium transients are necessary to support heart function.

      We thank the reviewer for this important comment. When performing live imaging to assess cardiac contractile function, we used low-dose tricaine (0.16 mg/mL) to anesthetize the zebrafish embryos. We have included this important information in the Methods section (Line 503). As suggested, we have also included the heart function results at 72 hpf, which are now presented in Figure 5-figure supplement 2A-C of the revised manuscript.

      (7) The alpha-actinin staining in Figure 5-supplement 2D is very pixelated and unconvincing. This should be repeated and imaged at a higher resolution.

      As suggested, we have re-performed the α-actinin staining and acquired higher-resolution images. The updated results are now presented in Figure 5-figure supplement 2G of the revised manuscript.

      (8) The authors claim that reductions in id2b mutant heart contractility are due to perturbed calcium transients instead of sarcomere integrity. Why do the authors think that regulation of calcium dynamics was not observed in the DEG enriched GO-terms? Was significant downregulation of cacna1 identified in the bulk RNAseq?

      We thank the reviewer for raising this important point. In our bulk RNAseq dataset comparing id2b mutant and control hearts, GO term enrichment was primarily associated with pathways related to cardiac muscle contraction and heart contraction (Figure 5-figure supplement 1B). We speculate that the transcriptional changes related to calcium dynamics may be relatively subtle and thus were not captured as significantly enriched GO terms. In addition, our qRT-PCR analysis revealed a significant reduction in cacna1c expression in id2b mutant hearts compared to controls, suggesting that id2b deletion impairs calcium channel expression. However, this change was not detected by RNA-seq, likely due to limitations in sensitivity.

      (9) In line 277, the authors say, "To determine whether this interaction occurs in zebrafish, Flag-id2b and HA-tcf3b were co-expressed in HEK293 cells...". This should be re-phrased to, "To determine if zebrafish Id2b and Tcf3b interact in vitro, Flag-id2b and HA-tcf3b were co-expressed in HEK293 cells for co-immunoprecipitation analysis." The sentence in line 275 should be changed to, "....heterodimer with Tcf3b to limit its function as a potent transcriptional repressor."

      We thank the reviewer for these constructive comments and have revised the text accordingly (Lines 291-294).

      (10) Small text corrections or ideas:

      Line 63: emphasized

      We have corrected this in the revised manuscript.

      Line 71: studied signaling pathways

      We have corrected this in the revised manuscript.

      Line 106: the top 6 DEGS (I think that the authors mean top 6 GO-terms) and is Id2b in one of the enriched GO categories?

      id2b is one of the top DEGs. This point has been clarified in the revised manuscript (Lines 116-117).

      Line 125: a knockin id2b:eGFP reporter line

      We have corrected this in the revised manuscript (Line 136).

      Line 138: This paragraph could use a conclusion sentence.

      We have added a conclusion sentence in the revised manuscript (Lines 150-151).

      Line 190: id2b-/- zebrafish experienced early lethality

      We have revised the statement as suggested (Line 206).

      Line 193: The prominent enlargement of the atrium with a smaller ventricle has been characterized as cardiomyopathy in zebrafish (Weeks et al. Cardiovasc Res, 2024, PMID: 38900908), which has also been associated with disruptions in calcium transients (Kamel et al J Cardiovasc Dev Dis, 2021, PMID: 33924051 and Kamel et al, Nat Commun 2021, PMID: 34887420). This information should be included in the text along with these references.

      We thank the reviewer for this helpful suggestion. We have incorporated these important references into the revised manuscript and included the relevant information to acknowledge the established link between atrial enlargement, cardiomyopathy, and disrupted calcium transients in zebrafish models (Reference #41, 42, and 45; Lines 210 and 260).

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      [...] Weaknesses

      Showing that A-2 and especially A-3 are outliers in the PCA analysis is useful, but it may be hiding other interesting signals in the data. The other strains are remarkably colinear on these plots, hinting that if the outliers were removed, one main component would emerge along which they are situated. It also seems possible that this additional analysis step would allow the second dimension to better differentiate them in a way that is interesting with respect to their mutator status or mutations in key metabolic or regulatory genes.

      We thank the reviewer for their positive comments and their constructive feedback on the manuscript. Following the reviewer’s recommendation, we performed the PCA analysis on metabolism data after removing A-2 and A-3 data. We have detailed those results below. Consistent with a similar analysis performed on RNA-seq datasets in our previous publication, we find that removing these outliers has only a modest effect on separating mutators from non-mutators. We find that, while the new PC2 separates most mutators from the non-mutators, the separation is rather weak. Moreover, we do not see a similar distinction when looking at metabolic data in the stationary phase. In the interest of improving the readability of the manuscript, we recommend not including these analyses in the final manuscript. We have presented the data for the reviewer’s benefit in Author response images 1, 2, and 3.

      Author response image 1.

      Author response image 2.

      Author response image 3.
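
      The outlier-removal step described in this response can be illustrated with a minimal sketch. It assumes a table with one row per population and one column per metabolite; the function name `first_pc_scores`, the toy values, and the use of power iteration are illustrative assumptions and do not represent the authors' actual pipeline or data. The sketch drops the named outliers, centers each metabolite, and projects the remaining populations onto the leading principal component.

```python
import math

def first_pc_scores(samples, exclude=()):
    """Project samples onto their first principal component.

    samples: dict mapping a sample name to a list of feature values.
    exclude: names to drop before the PCA is fitted (hypothetical outliers).
    Returns (scores, explained_fraction).
    """
    kept = {name: row for name, row in samples.items() if name not in exclude}
    names = list(kept)
    n, p = len(names), len(next(iter(kept.values())))

    # Center each feature (column) on its mean across the kept samples.
    means = [sum(kept[s][j] for s in names) / n for j in range(p)]
    centered = [[kept[s][j] - means[j] for j in range(p)] for s in names]

    # Sample covariance matrix (p x p).
    cov = [[sum(row[a] * row[b] for row in centered) / (n - 1)
            for b in range(p)] for a in range(p)]

    # Power iteration converges to the leading eigenvector of cov.
    vec = [1.0] * p
    for _ in range(500):
        nxt = [sum(cov[a][b] * vec[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(x * x for x in nxt))
        vec = [x / norm for x in nxt]

    # Rayleigh quotient gives the variance captured by the first component.
    eigenvalue = sum(vec[a] * sum(cov[a][b] * vec[b] for b in range(p))
                     for a in range(p))
    total_var = sum(cov[a][a] for a in range(p))
    scores = {s: sum(c * v for c, v in zip(row, vec))
              for s, row in zip(names, centered)}
    return scores, eigenvalue / total_var

# Toy, made-up abundances (three metabolites); not data from the study.
toy = {
    "A+1": [1.0, 2.0, 0.5],
    "A+2": [2.0, 4.1, 0.4],
    "A-1": [3.0, 5.9, 0.6],
    "A-4": [4.0, 8.0, 0.5],
    "A-2": [10.0, 1.0, 9.0],
    "A-3": [0.0, 12.0, -8.0],
}
scores, explained = first_pc_scores(toy, exclude={"A-2", "A-3"})
print(scores, round(explained, 3))
```

      In this toy example the four remaining populations are nearly collinear, so a single component captures almost all of the variance once the two outliers are removed. Whether the same holds for real metabolomics data depends on the data themselves.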

      There is a missed opportunity to connect some key results to what is known about LTEE mutations that reduce the activity of pykF (pyruvate kinase I). This gene is mutated in all 12 LTEE populations, and often these mutations are frameshifts or transposon insertions that should completely knock out its activity. At first glance, inactivating an enzyme for a step in glycolysis does not make sense when the nutrient source in the growth medium is glucose, even though PykF is only one of two isozymes E. coli encodes for this reaction. There has been speculation that inactivating pykF increases the concentration of phosphoenolpyruvate (PEP) in cells and that this can lead to increased rates of glucose import because PEP is used by the phosphotransferase system of E. coli to import glucose (see https://doi.org/10.1002/bies.20629). The current study has confirmed the higher PEP levels, which is consistent with this model.

      We thank the reviewer for pointing out this missed opportunity. We have expanded the discussion around the role of pykF mutations and the elevated concentrations of PEP observed in our data in section 3.4.

      In the introduction, the papers cited to show the importance of changes in metabolism for adaptation do not seem to fit the focus of this study very well. They stress production of toxins and secondary metabolites, which do not seem to be mechanisms that are at work in the LTEE. I can think of two areas of background that would be more relevant: (1) studies of how bacterial metabolism evolves in adaptive laboratory evolution (ALE) experiments to optimize metabolic fluxes toward biomass production (for example, https://doi.org/10.1038/nature01149), and (2) discussions of how cross-feeding, metabolic niche specialization, and metabolic interdependence evolve in microbial communities, including in other evolution experiments (for example, https://doi.org/10.1073/pnas.0708504105 and https://doi.org/10.1128/mBio.00036-12).

      We thank the reviewer for pointing out missed citations in our introduction. We agree that these papers are relevant to the topic and have added their citations. Additionally, following the suggestion of another reviewer, we have reorganized the introduction so that the concept of the role of metabolism in evolution is presented first and the LTEE second.

      Reviewer #2 (Public Review):

      [...] Overall, this is a significant and well-executed research study. It offers new insights into the complex relationship between genetic changes and observable traits in evolving populations and utilizes metabolomics in the LTEE, a novel approach in combination with RNA-seq and mutation datasets.

      However, the paper's overall clarity is lacking. It is spread too thin and covers many topics without a clear focus. I strongly recommend a substantial rewrite of the manuscript, emphasizing structure and readability. The science is well executed, but the current writing does not do it justice.

      We thank the reviewer for their positive comments and their constructive feedback on the lack of clarity in writing. Following the reviewer’s suggestions, we have rewritten parts of the manuscript and reorganized a few sections to improve readability. We hope the revised manuscript is significantly improved.

      Recommendations for the authors

      Reviewer #1 (Recommendations For The Authors):

      1) Title and Abstract: Add the study organism to the abstract, and probably also the title. Currently, E. coli is not mentioned in either! I'm also not sure that the LTEE is a sufficiently well-known acronym to abbreviate this in the title.

      We have revised the title of the manuscript and now spell out LTEE and included E. coli in the title and the abstract.

      2) Abstract: I would switch the usage of metabolome to metabolism in a few more places. For example, "changes in its metabolism", "networked and convoluted nature of metabolism". The metabolome, the concentrations of all metabolites, is what is being measured, but I think of this as a phenotypic readout of how metabolism is evolving.

      We have changed “metabolome” to “metabolism” in cases where we refer to what is evolving and use “metabolome” when we refer to what is being measured.

      3) Line 16: Technically, the 12 LTEE populations were not initially identical. The Ara- differed from the Ara+ ancestors by one intentional mutation and one unintentional mutation that was not discovered until whole genomes were sequenced. I would rephrase this to "where 12 replicate populations of E. coli are propagated" or something similar so that it can be correct without needing to describe this unnecessary detail.

      The line has been rephrased as suggested.

      4) General Note: The text refers to populations as Ara-3 but the figures use A-3. I'd suggest going with A-3 and similar throughout for consistency.

      Instances of Ara have been changed to A+/-, and a sentence noting this convention has been added to the introduction.

      5) Lines 43-44, 97-98. My understanding is that both S and L ecotypes in A-2 can use both glucose and acetate, but that the differentiation is related to their specialization that leads to each one being better on one or the other nutrient. The descriptions make it sound like each grows at a different time. Also, by definition, cells are not growing during "stationary phase". The change from glucose utilization (and acetate secretion) to acetate utilization during one cycle of growth is better described as a diauxic shift.

      We have reworded this part to remove mention of “growth” during stationary phase and changed the wording such that it no longer sounds like they grow at different times.

      6) Line 54: The statement "provide the ability to test hypotheses from previous data" is vague. Either provide an example or delete.

      We have removed this sentence as suggested.

      7) Lines 71-72: The terms "interphase" and "intraphase" sound too much like parts of the cell cycle. I'd suggest describing the comparisons as between and within growth phases.

      The use of intra and interphase have been changed as suggested.

      8) Line 79: The citrate is presumably still a chelating agent, so change phrasing to "Citrate is present in the medium because it was originally added as a chelating agent" or something similar.

      This sentence has been rewritten as suggested.

      9) Line 83: Write out "mutation accumulations" so it is easier to understand as "the number of mutations that have accumulated".

      The phrase has been changed as suggested.

      10) Line 116: It's unclear whether the abundances of metabolites are "strategies of survival" in stationary phase. An equally valid explanation is that there is less selection on the metabolome to have a specific composition during stationary phase to have high fitness.

      We have added a line about the possibility for alternative hypotheses.

      11) Figure 1: There seems to be some information missing from the legend. What are R06 and R07 in Panels A and B? Is panel D exponential phase and panel E stationary phase?

      This information was inadvertently missing from the caption and has been added.

      12) Figures 2 and 3: Gene names should be in italics. To me, the gray for deleted genes is hard to tell apart from the blue/red. Perhaps you could put a little X in these boxes instead? I think that having a little triangle pointing from each gene or metabolite name to its corresponding abundance panel would help the reader track which information goes with which features. In Fig. 3 the placement of L-aspartate is a bit awkward. I'd suggest moving it down so the dashed line does not have to go through the abundance panel.

      These figures have been edited to include small triangles that link a gene or metabolite and its heatmap. Additionally, an X has been added where genes have suffered inactivating mutations and the placement of some elements has been moved to improve overall clarity.

      13) Lines 183-185: It would be easier to see and judge the consistency of these argR related relationships if a correlation graph of some kind was shown, probably as a supplemental figure. This plot could, for example, have genes/metabolites across the x-axis and fold-change on the y-axis with lines connecting points corresponding to each of the twelve populations across these categories (like Fig S8 but with lines added). Alternatively, it could be a heat map with the populations across one axis and the genes/metabolites across the other axis (like Fig S3).

      We have added a supplementary figure consisting of heatmaps showing the consistency of these changes within an evolved line. It is now Figure S9.

      14) Line 195: I think adding a sentence elaborating on what exactly mutation accumulation means in this context would be helpful to readers.

      We have attempted to clarify the meaning of this by specifically stating that it is due to the accumulation of deleterious mutations.

      15) Line 293: Is standard LTEE medium DM25? These omics experiments with the LTEE sometimes use similar media with different glucose concentrations, and this is a very important detail to precisely specify.

      We reference “standard” LTEE medium in the methods section and have additionally specified the amount of sugar to make it clear that we are not supplementing the media with additional sugar.

      16) Figure S8B. Is "cystine" used instead of "cysteine" on purpose here since the compound is oxidized in the metabolomics treatment?

      The use of cystine is intentional; we detect the oxidized compound.

      Reviewer #2 (Recommendations For The Authors):

      Title:

      The abbreviation "LTEE" should not be in the title. Most readers will not recognize what it means. Instead, either the full name of the experiment, "Long-Term Evolution Experiment with E. coli," should be used, or the title should be rephrased to "Linking genotypic and phenotypic changes during a long-term evolution experiment using metabolomics."

      We have spelled out LTEE and included E. coli in the title.

      Abstract:

      Sentence 1: Consider softening the statement: "Do changes in an organism's environment, genome, or gene expression patterns often lead to changes in its metabolome?"

      We have rephrased this sentence to “Changes in an organism's environment, genome, or gene expression patterns can lead to changes in its metabolism”.

      Sentence 4: Use a hyphen for "Long-Term."

      This addition has been made.

      Sentence 4: Replace "transduce" with a more appropriate term: "...how the effects of mutations can be distributed through a cellular network to eventually affect metabolism and fitness."

      We have rewritten this sentence as “to understand how mutations can eventually affect metabolism and perhaps fitness”.

      Sentence 5: Clarify the use of "both" to refer to the ancestor of the LTEE and its descendant populations as two classes.

      We have reworded this sentence so it’s clear that the ancestors and evolved lines are two separate classes “We used mass-spectrometry to broadly survey the metabolomes of the ancestral strains and all 12 evolved lines…”.

      Sentence 6: Reverse the order for better emphasis: "Our work provides a better understanding of how mutations might affect fitness through the metabolome in the LTEE, and thus provides a major step in developing a complete genotype-phenotype map for this experimental system."

      We have rearranged this sentence per the reviewer’s suggestion.

      Introduction:

      Revise the introduction for clarity, readability, and logical narrative progression. Start with the second paragraph to set up the basic scientific principles being studied and then transition to describing the LTEE as a model system to examine those principles.

      The introduction has been rearranged and reworded in parts to increase clarity.

      Sentence 1: Revise for clarity: "The Long-Term Evolution Experiment (LTEE) has studied 12 initially identical populations of Escherichia coli as they have evolved in a carbon-limited, minimal glucose medium under a daily serial transfer regime."

      Sentence 2: Suggestion: "Begun in 1988, the LTEE populations have evolved for more than 75,000 generations, making it the longest-running experiment of its kind."

      Paragraph 2, sentence 2: Italicize "Drosophila."

      Paragraph 3, sentence 2: Make an important distinction: "Ara-3 is unique in that it evolved the ability to grow aerobically on citrate."

      Paragraph 3, sentence 4: Introduce the IS-mediated loss of the rbs operon in the LTEE as if it has not been described elsewhere.

      These suggestions have been incorporated into the manuscript.

      Results:

      Section 3.1: The use of samples from hours 2 and 24 to represent exponential and stationary phase may present some issues. For instance, capturing Ara-3 during its exponential growth on glucose, but not citrate, at hour 2. Furthermore, except for Ara-3, the LTEE populations reach stationary phase after approximately 4 hours, and there could be significant differences between early, mid, and late stationary phase. This possibility should be acknowledged, and future follow-up work should consider exploring these differences.

      We have added sentences in the first paragraph of the results section to include these details. We have also added a short paragraph to the conclusions suggesting additional studies of stationary phase, citing work on evolution of E. coli during long term stationary phase.

      Paragraph 3: While Turner et al. 2017 is an essential reference regarding resource use differences between Ara-3 and other LTEE populations, it would be more suitable to reference Blount et al. 2012 for the mutations that enabled access to citrate. Also, it is important to note that the difference lies in the ability to grow aerobically on citrate, rather than the ability to metabolize it.

      This citation has been added.

      Paragraph 4: As mentioned elsewhere, most LTEE populations exhibit balanced polymorphisms. Therefore, it is more appropriate to state that Ara-2 is the best-understood example of long-term diversity. It is likely that there are important metabolic differences between co-existing lineages in other LTEE populations.

      We now refer to Ara-2 as being the best-understood example of long-term diversity.

      Paragraph 5: The first sentence of this paragraph should likely end with "levels."

      The word “levels” was added to the end of this sentence.

      Figure 3: It is preferable to refer to the "Superpathway of arginine and polyamine biosynthesis," citing EcoCyc as a reference, rather than a descriptor.

      This has been changed to a reference.

      Section 3.3, Paragraph 3: While higher intracellular amino acid abundances may facilitate higher translation rates and faster growth, the higher abundances themselves do not evaluate the hypothesis. To evaluate the hypothesis, it is necessary to demonstrate that higher abundances are associated with higher translation or growth rates. Therefore, the final sentence of this paragraph is not meaningful.

      We have reworded this sentence to say that it’s not possible to tell what the additional amino acids are being used for given only this data and that additional experiments are needed to confirm this hypothesis.

      Section 3.4: The first paragraph of this section misstates how evolution works. The low level of glucose in the LTEE does not drive innovation; instead, innovation occurs at random through the introduction of variation by mutation. Although the existence of the citrate resource acts as a reward that selects for variation that provides access to it, it is essential to remember that evolution is blind to such a reward. Moreover, regarding the evolution of the Cit+ trait, it is incorrect to assert that low glucose contributed to its evolution. As shown by Quandt et al. (2015), it seems probable that Cit+ evolution was potentiated by adaptation to specialization on acetate, which is produced by overflow metabolism resulting from rapid growth on glucose. This rapid growth only occurs when glucose is relatively abundant. The level of glucose seems low to us because it is low relative to traditional levels in bacteriological media, but not to the bacteria.

      We agree that this is a semantic but important distinction. We have reworded this part so as not to suggest that evolution has any forward-thinking properties; it is indeed blind to any rewards that might occur as the result of adaptation.

      In general, all instances of "utilize" and its cognates should be replaced with "use" and its cognates.

      Instances of “utilize” and its cognates have been changed to “use” and its cognates.

      There is some uncertainty about the expectation of ramping up the TCA cycle in the LTEE. Overflow metabolism and acetate production appear to be prevalent in the LTEE, suggesting that many lineages only partially oxidize carbon derived from glucose, thereby bypassing the TCA cycle. While it is possible that this interpretation is incorrect, it would be helpful to see it addressed in the manuscript.

      We agree that this is a plausible hypothesis. We have added a paragraph at the end of this section that discusses the implications of overflow metabolism as an alternative hypothesis.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Several concerns are raised from the current study.

      1) Previous studies showed that iTregs generated in vitro from culturing naïve T cells with TGF-b are intrinsically unstable and prone to losing Foxp3 expression due to lack of DNA demethylation in the enhancer region of the Foxp3 locus (Polansky JK et al, Eur J Immunol., 2008, PMID: 18493985). It is known that removing TGF-b from the culture media leads to rapid loss of Foxp3 expression. In the current study, TGF-b was not added to the media during iTreg restimulation, therefore, the primary cause for iTreg instability should be the lack of the positive signal provided by TGF-b. NFAT signal is secondary at best in this culturing condition.

      During restimulation, the absence of TGF-β is necessary to induce iTreg instability; otherwise, the setup resembles the iTreg-inducing environment (Author response image 1). In addition, the ultimate goal of this study is to model a scenario that bears some resemblance to clinical treatment, where TGF-β may not be available. The reviewer is correct in stating that TGF-β is essential for iTreg stability. Our focus is on the role played by NFAT in iTreg instability in vitro, and possibly in the potential clinical use of iTregs.

      Author response image 1.

      Restimulation in the presence of TGF-β maintains an iTreg-inducing environment, resulting in less pronounced instability. Sorted Foxp3-GFP+ iTregs were rested for 1 d, and then rested or restimulated in the presence of TGF-β for 2 d. Percentages of Foxp3+ cells were analyzed by intracellular staining of Foxp3 after 2 d.

      2) It is not clear whether the NFAT pathway is unique in accelerating the loss of Foxp3 expression upon iTreg restimulation. It is also possible that enhancing T cell activation in general could promote iTreg instability. The authors could explore blocking T cell activation by inhibiting other critical pathways, such as NF-kb and c-Jun/c-Fos, to see if a similar effect could be achieved compared to CsA treatment.

      We thank the reviewer for this suggestion. We performed this experiment to assess the extent of the role that NFAT plays and whether other major pathways are involved. As Author response image 2 shows, inhibiting NFAT alone effectively rescued the instability of iTregs. Inhibition of NF-κB (BAY 11-7082), c-Jun (SP600125), or the c-Jun/c-Fos complex (T5224) had no discernible effect or, in one case, possibly reduced stability further. These results may indicate that NFAT plays a crucial and specific role in the TCR activation that leads to iTreg instability. Within the limits of this experimental design, the other pathways do not appear to be significantly involved.

      Author response image 2.

      Comparing effects of NFAT, NF-kB and c-Jun/c-Fos inhibitors on iTreg instability. Sorted Foxp3-GFP+ iTregs were rested for 1d, then restimulated by anti-CD3 and CD28 in the presence of listed inhibitors. Percentages of Foxp3+ cells were analyzed by intracellular staining after 2d restimulation.

      3) The authors linked chromatin accessibility and increased expression of T helper cell genes to the loss of Foxp3 expression and iTreg instability. However, it is not clear how the former can lead to the latter. It is also not clear whether NFAT binds directly to the Foxp3 locus in the restimulated iTregs and inhibits Foxp3 expression.

      T helper gene activation is likely to cause instability in iTregs through increased secretion of inflammatory cytokines, for example IL-21, as shown in Figure Q9. Further investigation is needed to understand exactly how these genes contribute to Foxp3 instability. With our limited insight, we see two possibilities. 1. IL-21 directly affects Foxp3 through its impact on certain inflammation-related transcription factors (TFs). 2. There could be an indirect relationship in which NFAT has a greater tendency to bind to those inflammatory TFs when iTreg instability appears, promoting the upregulation of these Th genes as in activated T cells, while being less likely to bind to SMAD and Foxp3, representing a competitive behavior. At the moment, we cannot fully explain the intricacies that lead to the differential effects on T helper genes and Treg-related genes.

      With that said, we have previously attempted to explore the direct effect of NFAT on the Foxp3 gene locus. Foxp3 transcription in iTregs primarily relies on histone modifications such as H3K4me3 (Tone et al., 2008; Lu et al., 2011) rather than DNA demethylation (Ohkura et al., 2012; Hilbrands et al., 2016). Previous studies have reported that NFAT and SMAD3 can together promote the histone acetylation of the Foxp3 gene (Tone et al., 2008). In our previous set of experiments, we simultaneously obtained information on NFAT binding sites and H3K4me3. At the Foxp3 locus, we observed a decreasing trend in NFAT binding to the CNS3 region of Foxp3 in restimulated iTregs compared to resting iTregs (Author response image 3). Additionally, the H3K4me3 modification in the CNS3 region of Foxp3 decreased upon iTreg restimulation, but inhibiting NFAT nuclear translocation with CsA could maintain this modification at its original level (Author response image 3).

      Author response image 3.

      NFAT binding and histone modification at the Foxp3 gene locus. Genome track visualization of NFAT binding profiles and H3K4me3 profiles at the Foxp3 CNS3 locus in two batches of datasets.

      Based on these preliminary explorations, we conclude that NFAT can directly bind to the Foxp3 locus. NFAT binding appears to decrease upon restimulation, resulting in a decrease in H3K4me3, which closely associates NFAT with Foxp3 instability. However, due to limited sample replicates, these data need to be verified before more solid conclusions can be drawn. We speculate that during the induction of iTregs, NFAT may recruit histone-modifying enzymes to open the Foxp3 CNS3 region, and this effect is synergistic with SMAD. When instability occurs upon restimulation, NFAT binding to Foxp3 weakens due to the absence of SMAD's assistance, subsequently reducing the recruitment of histone-modifying enzymes and ultimately inhibiting Foxp3 transcription.

      Reviewer #2 (Public Review):

      (1) Some concerns about data processing and statistic analysis.

      The authors did not provide sufficient information on statistical data analysis; e.g. lack of detailed descriptions about

      -the precise numbers of technical/biological replicates of each experiment

      -the method of how the authors analyze data of multiple comparisons... Student t-test alone is generally insufficient to compare multiple groups; e.g. figure 1.

      These inappropriate data handlings are ruining the evidence level of the precious findings.

      We thank the reviewer for pointing out this important aspect. In the figure legend, numbers of independently-performed experiment repeats are shown as N, biological replicates of each experiment as n. Student’s t test was used for comparing statistical significance between two groups. In this manuscript, all calculations of significant differences were based on comparisons between two groups. There were no multiple conditions compared simultaneously within a single group, and thus, no other calculation methods were used.
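
      As an illustration of the two-group comparison described above, the following minimal Python sketch computes the mean ± SEM for each group and a pooled-variance Student's t statistic. The values are hypothetical placeholders, not data from this study, and the function names are chosen for illustration only. A P value would then be read from the t distribution with the returned degrees of freedom (for example with scipy.stats.ttest_ind, which performs the same pooled-variance test by default).

```python
import math
import statistics


def sem(values):
    """Standard error of the mean: sample SD divided by sqrt(n)."""
    return statistics.stdev(values) / math.sqrt(len(values))


def student_t(group_a, group_b):
    """Two-sample Student's t statistic with pooled variance.

    Assumes equal variances in the two groups. Returns the t statistic
    and the degrees of freedom (n_a + n_b - 2).
    """
    na, nb = len(group_a), len(group_b)
    var_a = statistics.variance(group_a)
    var_b = statistics.variance(group_b)
    pooled = ((na - 1) * var_a + (nb - 1) * var_b) / (na + nb - 2)
    diff = statistics.mean(group_a) - statistics.mean(group_b)
    return diff / math.sqrt(pooled * (1 / na + 1 / nb)), na + nb - 2


# Hypothetical % Foxp3+ values, n = 3 biological replicates per group.
rested = [80.0, 82.0, 84.0]
restimulated = [40.0, 44.0, 48.0]

t_stat, dof = student_t(rested, restimulated)
print(f"rested: {statistics.mean(rested):.1f} +/- {sem(rested):.2f} (SEM)")
print(f"restimulated: {statistics.mean(restimulated):.1f} +/- {sem(restimulated):.2f} (SEM)")
print(f"t = {t_stat:.2f}, degrees of freedom = {dof}")
```

      This sketch covers only the case of exactly two groups; comparing three or more conditions at once would call for a different procedure, which is the point the reviewer raised.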

      (2) Untransparent data production; e.g. the method of Motif enrichment analysis was not provided. Thus, we should wait for the author's correction to fully evaluate the significance and reliability of the present study.

      Per this reviewer’s request, we have provided detailed descriptions of the data analysis for Fig 5, including both the method section and the Figure legend, as presented below:

      “Peak annotation was performed with the “annotatePeak” function in the R package ChIPseeker (Yu et al, 2015).

      Cut&Tag signals over a set of genomic regions were calculated using the “computeMatrix” function in deepTools and plotted using the “plotHeatmap” and “plotProfile” functions in deepTools. Motif enrichment analysis was performed using the "findMotifsGenome.pl" command in HOMER with default parameters.

      Motif occurrences in each peak were identified using FIMO (MEME suite v5.0.4) with the following settings: a first-order Markov background model, a P value cutoff of 10⁻⁴, and PWMs from the mouse HOCOMOCO motif database (v11).”

      Additionally, we have also supplemented the method section with further details on the analysis of RNA-seq and ATAC-seq data.

      (3) Lack of evidence in human cells. I wonder whether human PBMC-derived iTreg cells are similarly regulated.

      This is a rather complicated issue. Human T cells express FoxP3 upon TCR stimulation (PNAS, 103(17): 6659–6664); its function there is likely to protect T cells from activation-induced cell death, and it does not confer Treg-like properties. In contrast, in mice, FoxP3 can be used as an indicator of Treg. Because FoxP3 is currently not a definitive marker for Treg in humans, our FoxP3-based readouts do not directly apply. Nevertheless, we have now investigated whether inhibiting calcium signaling or NFAT could enhance the stability of human iTreg. As shown in Author response image 4, the proportion of Foxp3-expressing cells did not change significantly across the different conditions, while the MFI analysis revealed that CsA-treated iTreg exhibited higher Foxp3 expression levels than both restimulated iTreg and rested iTreg. However, CM4620 had no significant effect on Foxp3 stability, consistent with the observation of its limited efficacy in suppressing long-term activation of human iTreg. In summary, our results suggest that inhibiting NFAT signaling through CsA treatment can help maintain higher levels of Foxp3 expression in human iTreg.

      Author response image 4.

      Effect of inhibiting NFAT and calcium on human iTreg stability. Human naïve CD4 cells from PBMC were subjected to a two-week induction process to generate human iTreg. Subsequently, human iTreg were restimulated for 2 days with Dynabeads followed by 2 days of rest in the presence of CsA and CM-4620. Four days later, percentages of Foxp3+ cells and Foxp3 mean fluorescence intensity (MFI) were analyzed by intracellular staining.

      (4) NFAT regulation did not explain all of the differences between iTregs and nTregs, as the authors mentioned as a limitation. Also, it is still an open question whether NFAT can directly modulate the chromatin configuration on the effector-type gene loci, or whether NFAT exploits pre-existing open chromatin due to the incomplete conversion of Treg-type chromatin landscape in iTreg cells. The authors did not fully demonstrate that the distinct pattern of chromatin regional accessibility found in iTreg cells is the direct cause of an effector-type gene expression.

      To our surprise, the inhibition of NF-kB (BAY 11-7082), c-Jun (SP600125), and the c-Jun/c-Fos complex (T5224) resulted in minimal alterations, as shown in Fig Q1. This seems to argue that NFAT may play a more special role in events leading to iTreg instability.

      We hypothesize that NFAT takes advantage of a pre-existing open chromatin state that results from the incomplete conversion of the chromatin landscape in iTreg cells, because iTreg cells, after induction, already exhibit inherent chromatin instability, with highly open inflammatory genes. Furthermore, when iTreg cells were restimulated, the subsequent change in chromatin accessibility was relatively limited and not rescued by NFAT inhibitor treatment (Author response image 5). Therefore, in the case of iTreg cells, we propose that NFAT exploits the easy access to those inflammatory genes, leading to rapid destabilization of iTreg cells in the short term.

      In contrast, tTreg cells possess a relatively stable chromatin structure to begin with. It would be interesting to investigate whether NFAT or calcium signaling could disrupt chromatin accessibility during the activation or expansion of tTreg cells. It is possible that NFAT might cause the loss of the originally established demethylation map and open up inflammatory loci, thereby inducing a shift in gene transcriptional profiles, equally leading to instability.

      Author response image 5.

      Chromatin accessibility of rested, restimulated, and CsA/ORAI inhibitor-treated restimulated iTreg. PCA visualization of chromatin accessibility profiles of different cell types. Color indicates cell type.

      To establish a direct relationship between gene locus accessibility and its overexpression, a controlled experimental approach can be employed. One such method involves precise manipulation of the accessibility of a specific genomic locus using CRISPR-mediated epigenetic modifications at targeted loci. Subsequently, the impact of this manipulation on the expression level of the target gene can be precisely examined. By conducting these experiments, it will be possible to determine whether the augmented gene accessibility directly causes the observed gene overexpression.

      Reviewer #1 (Recommendations For The Authors):

      1) It might be helpful to add TGF-b to the iTreg restimulation culture to remove the influence of the lack of TGF-b from the equation, and measure the influence of SOCE/NFAT on iTreg instability.

      Please refer to Author response image 1.

      2) Alternatively, authors can also culture iTreg cells with TGF-b for 2 weeks when they undergo epigenetic changes and become more stabilized (Polansky JK et al, Eur J Immunol., 2008, PMID: 18493985). At this point, the stabilized iTregs can be used to measure the influence of SOCE/NFAT on iTreg instability.

      In the study conducted by Polansky et al., Figure 1 shows that prolonged exposure to TGF-β fails to induce stable Foxp3 expression and demethylation of the Treg-specific demethylated region (TSDR). Based on this finding, we could consider exploring alternative approaches to obtain a more stabilized iTreg population. One such approach could be isolating Foxp3+Helios-Nrp1- iTreg cells, also known as pTregs, directly from the periphery in vivo. Generally, pTreg cells generated in vivo tend to be more stable than iTreg cells induced in vitro, and they already exhibit partial demethylation of the Treg signature, as shown in Fig 6C (Polansky JK et al, Eur J Immunol., 2008, PMID: 18493985). Investigating the role of NFAT and calcium signaling in pTreg cells would provide further insights into the additional roles of NFAT in Treg phenotypical transitions, particularly its role in chromatin accessibility.

      3) In Figure 3, NFAT binding to the inflammatory genes in iTreg cells was even stronger than in activated T conventional cells. This is possibly due to Tconv cells being stimulated only once while iTregs were restimulated. A fair comparison should be conducted with restimulated activated conventional T cells.

      Figure 3 demonstrates the accessibility of inflammatory gene loci, rather than NFAT binding. Comparing restimulated Tconvs with restimulated iTreg cells is indeed a valuable suggestion, as differences in their activation state and in iTreg polarization could lead to distinct chromatin accessibility. Although one population is activated long term under standard conditions and the other under iTreg-polarizing conditions, it is highly likely that the chromatin of both is highly open, especially at inflammatory genes. This may provide us with a new perspective for understanding iTreg cells, but is unlikely to affect our central conclusion.

      4) In the in vivo experiment in Figure 6, a control condition without OVA immunization should be included as a baseline.

      We have performed this experiment in the absence of OVA, as depicted in Author response image 6. In the absence of OVA immunization, both WT-ORAI and DN-ORAI iTreg exhibited substantial stability, although DN-ORAI showed a slightly less stable trend. Upon activation with 40 μg and 100 μg of OVA, DN-ORAI iTreg demonstrated greater stability than WT-ORAI iTreg, maintaining a higher proportion of Foxp3-expressing cells.

      Author response image 6.

      Stability of DN-ORAI iTreg in vivo with or without OVA immunization. WT-ORAI/DN-ORAI-GFP+-transfected CD45.2+ Foxp3-RFP+ OT-II iTregs were transferred i.v. into CD45.1 mice. Recipients were left unimmunized or immunized with OVA323-339 in Alum adjuvant. On day 5, mLN were harvested and analyzed for Foxp3 expression by intracellular staining.

      Reviewer #2 (Recommendations For The Authors):

      Major

      Some concerns about the data processing and statistic analysis, as mentioned in the public review. In the figure legend, what does it mean e.g. n=3, N=3? Technical triplicate experiments? Three mice? Independently-performed three experiments? The authors should define it at least in the "Statistical analysis" in the method section otherwise the readers cannot determine the reason why they mainly use SEM for the data description.

      Moreover, in some cases, the number of experiments was not sure; e.g., Fig.1B, Fig. 5.

      How did the authors analyze data including multiple comparisons? Student t-test alone is generally insufficient to compare multiple groups; e.g. figure 1.

      We thank the reviewer for pointing out this omission. Now, in the figure legend, numbers of independently-performed experiment repeats are shown as N, biological replicates of each experiment as n. For Fig. 1B, N=2; for Fig 5, we acquired NFAT Cut&Tag data twice, so N=2. Student’s t test was used for comparing statistical significance between two groups. In this manuscript, all calculations of significant differences were based on comparisons between two groups. There were no multiple conditions compared simultaneously within a single group, and thus, no other calculation methods were involved apart from the Student's t-test.

      In Figure 1A, the difference in suppressiveness seemed subtle. Data collection of multiple doses of Tconv:Treg ratio will enhance the reliability of such kind of analysis.

      We have now attempted the suppression assay with varying Treg:Tconv ratios and observed that the suppressive effect of iTreg was more obvious than that of tTreg when co-cultured at a 1:1 ratio with Tconv cells. However, as the cell number of tTreg and iTreg decreased, the inhibitory effects converged.

      Author response image 7.

      Comparison of suppressive function at multiple Tconv:Treg ratios. CFSE-labelled OT-II T cells were stimulated with OVA-pulsed DC, then different numbers of Foxp3-GFP+ iTregs and tTregs were added to the culture to suppress the OT-II proliferation. After 4 days, CFSE dilution was analyzed. Left, representative histograms of CFSE in divided Tconvs. Right, graph of the percentage of divided Tconvs.

      In Figure 3F, to which group did the shaded peaks belong? In this context, the authors should focus on "Activation Region" peaks (open chromatin signature in both TcAct & iTreg defined in Fig. 4D) but I did not find the peak in the focusing DNA regions in TcAct (e.g. the shaded regions in IL-4 loci). The clear attribution of the peaks to the heatmap will enhance the visibility and understanding of readers.

      We have selected some typical peaks that belong to Fig 3D. These genes encompass some T-cell activation-associated transcription factors, such as Irf4, Atf3, as well as multiple members of the Tnf family including Lta, Tnfsf4, Tnfsf8, and Tnfsf14. Additionally, genes related to inflammation such as Il12rb2, Il9, and Gzmc are included. These genes show elevated accessibility upon T-cell activation, partially open in activated nTreg cells, referred to as the "Activation Region." They collectively exhibit high accessibility in iTreg cells, which may contribute to their instability.

      Author response image 8.

      Chromatin accessibility of selected “Activation Region” loci. Genomic tracks showing chromatin accessibility of Irf4, Atf3, Lta, Tnfsf8, Tnfsf4, Tnfsf14, Il12rb2, Il9, Gzmc in activated Tconv and iTreg.

      In Figure 4A/S4A, the information on cell death will help the understanding of readers because the sustained SOCE is associated with cell survival as shown in Fig. S2. The authors can discuss the relationships between cell death and Foxp3 retention, which potentially leads to a further interesting question; e.g. the selective/resistance to activation-induced cell death as the identity of Treg cells.

      As shown in Author response image 9, activated iTreg cells indeed exhibit a certain degree of cell death compared to resting iTreg cells. The inhibition of NFAT by CsA enhances the survival rate of iTreg cells, but the inhibition of ORAI by CM-4620 leads to more severe cell death. The cell death induced by CsA and CM-4620 is not consistent, indicating that there may not be a direct proportional relationship between cell death and the expression of Foxp3 and Treg identity.

      Author response image 9.

      Relationship between cell death and Foxp3 stability in restimulated iTregs. Sorted Foxp3-GFP+ iTregs were rested for 1 d, then restimulated by anti-CD3 and CD28 in the presence of CsA or CM-4620. After 2 d of restimulation, percentages of live cells were analyzed by staining with Live/Dead fixable Aqua, and percentages of Foxp3+ cells were analyzed by intracellular staining of Foxp3. Upper, percentage of live cells among iTregs. Lower, percentages of Foxp3+ cells among iTregs.

      In Figure 5, the information for the data interpretation was insufficient.

      We have provided detailed descriptions of the data analysis for Fig 5, including both the method section and the Figure legend, as presented below:

      “Peak annotation was performed with the “annotatePeak” function in the R package ChIPseeker (Yu et al, 2015). Cut&Tag signals over a set of genomic regions were calculated using the “computeMatrix” function in deepTools and plotted using the “plotHeatmap” and “plotProfile” functions in deepTools. Motif enrichment analysis was performed using the "findMotifsGenome.pl" command in HOMER with default parameters. Motif occurrences in each peak were identified using FIMO (MEME suite v5.0.4) with the following settings: a first-order Markov background model, a P value cutoff of 10⁻⁴, and PWMs from the mouse HOCOMOCO motif database (v11).”

      Additionally, we have also supplemented the method section with further details on the analysis of RNA-seq and ATAC-seq data.

      The correlation between the open chromatin status of the gene loci described in Fig.5E and the expression at mRNA level? e.g.; Do iTreg-Act cells produce a higher level of IL-21 than nTreg-act? The analysis in Fig.5F-G should be performed in parallel with nTreg cells to emphasize the distinct NFAT-chromatin regulation in iTreg cells.

      We have now used ELISA to compare the secretion levels of IL-21 in tTreg and iTreg upon activation, with or without CsA treatment. As shown in Author response image 10, tTreg did not secrete IL-21 regardless of activation status (undetectable), while iTreg did not secrete IL-21 at resting state but exhibited IL-21 secretion after 48 h of activation. Moreover, the secretion of IL-21 was inhibited by CsA and CM-4620 treatment. This observation aligns with our earlier findings, where we observed nuclear binding of NFAT to the gene loci of these cytokines, enhancing their expression and rendering iTreg unstable under inflammatory conditions. These findings further underscore the likelihood that the inhibition of calcium and NFAT signaling might contribute to the stabilization of iTreg by suppressing the secretion of inflammatory cytokines.

      Author response image 10.

      IL-21 secretion in tTreg and iTreg upon activation. iTregs and tTregs were sorted and restimulated with anti-CD3 and anti-CD28 antibodies, in the presence of CsA and CM-4620. Cell culture supernatants were harvested after 2 d of restimulation and IL-21 secretion was analyzed by ELISA.

      Performing a parallel comparison of NFAT activity between tTreg and iTreg cells was initially part of our experimental plan. However, it proved challenging in practice, as we encountered difficulties in efficiently infecting tTreg cells with NFAT-flag. Consequently, we could not obtain a sufficient number of tTreg cells for conducting Cut&Tag experiments.

      Based on our observations, we speculate that there might be substantial differences in the accessibility of genes in tTreg cells, leading to considerable variations in the repertoire of genes available for NFAT to regulate. As a result, we expect significant differences in the nuclear localization and activity of NFAT between iTreg and tTreg cells.

      In Figure 6C, what does the FCM plot between Foxp3-CFSE look like?

      The authors can discuss the mechanism of ORAI-DN-mediated through such analysis; e.g. the possibility that selective proliferation defect by ORAI-DN in Foxp3- cells led to an increased percentage of Foxp3, not only just unstable transcription of Foxp3.

      This is an in vitro experiment to assess the suppressive effect of iTreg on Tconv proliferation. CFSE was therefore used to stain Tconv cells, but not iTreg cells, so we did not measure the proliferation of iTreg cells.

      Minor

      Confusing terminology of "tTreg" at line 47, etc. "natural Treg" contains both thymic-derived Treg and periphery-derived Treg cells. (A Abbas et al. Nat Immunol. 2013)

      We have now changed the designation to tTreg at line 47. tTreg refers to thymus-derived regulatory T cells, while nTreg includes both tTreg and pTreg. However, it is important to note that the Treg cells used in our study were isolated from the spleen of 2-4-month-old Foxp3-GFP or Foxp3-RFP mice. The CD4+ T cells were first enriched using the CD4 Isolation kit, and the FACSAriaII was used to collect CD4+ Foxp3-GFP/RFP+ Treg cells. Subsequently, Helios and Nrp-1 staining revealed that the majority of these cells were tTreg, with only approximately 6% being pTreg. Overall, we consider the cells we used to be tTreg.

      In all FCM analyses, the authors should clarify how to detect Foxp3 expression; Foxp3-GFP/Foxp3-RFP/Intracellular staining like Figure S5A (but not specified in the other FCM plots)

      All Foxp3 expressions in the article were assessed using intracellular staining, as described in the methods section, and we have added specific descriptions to each figure legend. The reason for employing intracellular staining is that we used Foxp3-IRES-GFP mice, where GFP and Foxp3 are not fused into a single protein, existing as separate proteins after expression. Therefore, during induction, the appearance of GFP protein might potentially represent the presence of Foxp3. However, in cases of Foxp3 instability, the degradation of GFP protein may not be entirely synchronized with that of Foxp3 protein, making GFP an unreliable indicator of Foxp3 expression levels. As a result, for the purification of pure iTreg cells, we used Foxp3-GFP/RFP fluorescence, while for observing instability, we employed intranuclear staining of Foxp3.

      In Figure 6B, the captions were lacking in the two graphs on the right side

      The two restimulation conditions, 0.125+0.25 and 0.25+0.5, have been added into Fig 6B right side.

      In Figure S2, the annotation of the x-y axis was missing.

      Added.

      Lack of reference at line 292.

      References 42-46 were added.

      In the method section, the authors should note the further product information of antibodies and reagents to enhance reproducibility and transparency. Making a list that clarifies the suppliers, Ab clone, product IDs, etc. is encouraged. The authors did not specify the supplier of recombinant proteins and which type of TGF-beta (TGF-beta 1, 2, or 3?).

      A detailed description of the mice, antibodies, peptides, recombinant proteins, commercial kits, and software has been provided and incorporated into the methods section.

      In the method section, the authors should clarify which Foxp3-reporter strain. There are many strains of Foxp3-reporter mice in the world. In line 373, is the "FoxP3-IRES-GFP transgenic mice" true? Knock-in strain or BAC-transgene?

      This mouse was a gift from the Hai Qi Lab at Tsinghua University. They acquired this mouse strain from Jackson Laboratory, and the strain name is B6.Cg-Foxp3tm2Tch/J, Strain #:006772. An IRES-EGFP-SV40 poly A sequence was inserted immediately downstream of the endogenous Foxp3 translational stop codon, but upstream of the endogenous polyA signal, generating a bicistronic locus encoding both Foxp3 and EGFP.

      The age of mice used in the experiments should be specified, and confusing words such as "young" should not be used in any method descriptions; e.g. line 405.

      The detailed mouse age has been added in the methods section. “To prepare Tconv, tTreg and iTreg for experiments, spleen was isolated from 2-4-month-old Foxp3-GFP mice for Tconv and tTreg sorting, and 6-week-old mice for iTreg induction.”

      The method of how the original ATAC-seq/Cut & Tag data were generated was not described in the method section.

      Added in method section.

      The reference section was incomplete, and the style was not unified. e.g.; ref 7, 24, 25, 26 ... I gave up checking all.

      The style of refs 7, 22, 24, 26, 28, 31, 33, 35 was modified.

      Changes in manuscript:

      Author Name: “Huiyun Lv” to “Huiyun Lyu”.

      Fig 1A was updated according to Reviewer 2’s suggestion.

      Fig S3E and associated description were added according to Reviewer 2’s suggestion.

      Fig S4C and associated description were added according to Reviewer 1’s suggestion.

      Fig 5H and associated description were added according to Reviewer 2’s suggestion.

      Fig 6D was updated according to Reviewer 1’s suggestion.

      Fig 2D was corrected: the labels for gapdh and actin in the iTreg panel were inadvertently switched. The mistake has been rectified, and the original gel image will be provided.

      Fig 2A and Fig 4A were updated.

      The style of Fig 6B and Fig S2A was modified.

      Method:

      Mice: FoxP3-IRES-GFP with more description.

      Flow Cytometry sorting and FACS: the detailed mouse age has been added. RNA-seq analysis, ATAC-sequencing, ATAC-seq analysis, Cut&Tag assay, Cut&Tag data analysis: more description was added.

      Statistical analysis: “Numbers of independently-performed experiment repeats are shown as N, biological replicates of each experiment as n.” was added.

      Reference: Refs 42-46 and 49-52 were added. The style of refs 7, 22, 24, 26, 28, 31, 33, 35 was corrected.

      A detailed description of the mice, antibodies, peptides, recombinant proteins, commercial kits, and software has been provided.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This study provides potentially important, new information about the combination of information from the two eyes in humans. The data included frequency tagging of each eye's inputs and measures reflecting both cortical (EEG) and sub-cortical processes (pupillometry). Binocular combination is of potentially general interest because it provides -in essence- a case study of how the brain combines information from different sources and through different circuits. The strength of supporting evidence appears to be solid, showing that temporal modulations are combined differently than spatial modulations, with additional differences between subcortical and cortical pathways. However, the manuscript's clarity could be improved, including by adding more convincing motivations for the approaches used.

      We thank the editor and reviewers for their detailed comments and suggestions regarding our paper. We have implemented most of the suggested changes. In doing so we noticed a minor error in our analysis code that affected the functions shown in Figure 2e (previously Figure 1e), and have fixed this and rerun the modelling. Our main results and conclusions are unaffected by this change. We have also added a replication data set to the Appendix, as this bears on one of the points raised by a reviewer, and included a co-author who helped run this experiment.

      Reviewer #1 (Public Review):

      In this paper, the interocular/binocular combination of temporal luminance modulations is studied. Binocular combination is of broad interest because it provides a remarkable case study of how the brain combines information from different sources. In addition, the mechanisms of binocular combination are of interest to vision scientists because they provide insight into when/where/how information from two eyes is combined.

      This study focuses on how luminance flicker is combined across two eyes, extending previous work that focused mainly on spatial modulations. The results appear to show that temporal modulations are combined in different ways, with additional differences between subcortical and cortical pathways.

      1. Main concern: subcortical and cortical pathways are assessed in quite different ways. On the one hand, this is a strength of the study (as it relies on unique ways of interrogating each pathway). However, this is also a problem when the results from two approaches are combined - leading to a sort of attribution problem: Are the differences due to actual differences between the cortical and subcortical binocular combinations, or are they perhaps differences due to different methods. For example, the results suggest that the subcortical binocular combination is nonlinear, but it is not clear where this nonlinearity occurs. If this occurs in the final phase that controls pupillary responses, it has quite different implications.

      At the very least, this work should clearly discuss the limitations of using different methods to assess subcortical and cortical pathways.

      The modelling asserts that the nonlinearity is primarily interocular suppression, and that this is stronger in the subcortical pathway. Moreover, the suppression acts before binocular combination, so this is quite a specific location. We now say more about this in the Discussion, and also suggest that fMRI might avoid the limits on the conclusions we can draw from different methods.

      2. Adding to the previous point, the paper needs to do a better job of justifying not only the specific methods but also other details of the study (e.g., why certain parameters were chosen). To illustrate, a semi-positive example: Only page 7 explains why 2Hz modulation was used, while the methods for 2Hz modulation are described in detail on page 3. No justifications are provided for most of the other experimental choices. The paper should be expanded to better explain this area of research to non-experts. A notable strength of this paper is that it should be of interest to those not working in this particular field, but this goal is not achieved if the paper is written for a specialist audience. In particular, the introduction should be expanded to better explain this area of research, the methods should include justifications for important empirical decisions, and the discussion should make the work more accessible again (in addition to addressing the issues raised in point 1 above). The results also need more context. For example, why do EEG data have overtones while pupillometry data do not?

      We now explain the choice of frequency in the final paragraph of the introduction as follows:

      ‘We chose a primary flicker frequency of 2Hz as a compromise between the low-pass pupil response (see Barrionuevo et al., 2014; Spitschan et al., 2014), and the relatively higher-pass EEG response (Regan, 1966).’

      We also mention why the pupil response is low-pass:

      ‘The pupil response can be modulated by periodic changes in luminance, and is temporally low-pass (Barrionuevo et al., 2014; Spitschan et al. 2014), most likely due to the mechanical limitations of the iris sphincter and dilator muscles’.

      Reviewer #2 (Public Review):

      Previous studies have extensively explored the rules by which patterned inputs from the two eyes are combined in the visual cortex. Here the authors explore these rules for un-patterned inputs (luminance flicker) at both the level of the cortex, using Steady-State Visual Evoked Potentials (SSVEPs) and at the sub-cortical level using pupillary responses. They find that the pattern of binocular combination differs between cortical and sub-cortical levels with the cortex showing less dichoptic masking and somewhat more binocular facilitation.

      Importantly, the present results with flicker differ markedly from those with gratings (Hou et al., 2020, J Neurosci; Baker and Wade, 2017, Cereb Cortex; Norcia et al., 2000, Neuroreport; Brown et al., 1999, IOVS). When SSVEP responses are measured under dichoptic conditions where each eye is driven with a unique temporal frequency, in the case of grating stimuli, the magnitude of the response in the fixed contrast eye decreases as a function of contrast in the variable contrast eye. Here the response increases by varying (small) magnitudes. The authors favor a view that cortex and perception pool binocular flicker inputs approximately linearly using cells that are largely monocular. The lack of a decrease below the monocular level when modulation strength increases is taken to indicate that the previously observed normalization mechanism in pattern vision does not play a substantial role in the processing of flicker. The authors present a computational model of binocular combination that captures features of the data when fit separately to each data set. Because the model has no frequency dependence and is based on scalar quantities, it cannot make joint predictions for the multiple experimental conditions, which is one of its limitations.

      A strength of the current work is the use of frequency-tagging of both pupil and EEG responses to measure responses for flicker stimuli at two anatomical levels of processing. Flicker responses are interesting but have been relatively neglected. The tagging approach allows one to access responses driven by each eye, even when the other eye is stimulated which is a great strength. The tagging approach can be applied at both levels of processing at the same time when stimulus frequencies are low, which is an advantage as they can be directly compared. The authors demonstrate the versatility of frequency tagging in a novel experimental design which may inspire other uses, both within the present context and others. A disadvantage of the tagging approach for studying sub-cortical dynamics via pupil responses is that it is restricted to low temporal frequencies given the temporal bandwidth of the pupil. The inclusion of a behavioral measure and a model is also a strength, but there are some limitations in the modeling (see below).

      The authors suggest in the discussion that luminance flicker may preferentially drive cortical mechanisms that are largely monocular and in the results that they are approximately linear in the dichoptic cross condition (no effect of the fixed contrast stimulus in the other eye). By contrast, prior research using dichoptic dual frequency flickering stimuli has found robust intermodulation (IM) components in the VEP response spectrum (Baitch and Levi, 1988, Vision Res; Stevens et al., 1994 J Ped Ophthal Strab; France and Ver Hoeve, 1994, J Ped Ophthal Strab; Suter et al., 1996 Vis Neurosci). The presence of IM is a direct signature of binocular interaction and suggests that at least under some measurement conditions, binocular luminance combination is "essentially" non-linear, where essential implies a point-like non-linearity such as squaring of excitatory inputs. The two views are in striking contrast. It would thus be useful if the authors could show spectra for the dichoptic, two-frequency conditions to see if non-linear binocular IM components are present.

      This is an excellent point, and one that we had not previously appreciated the importance of. We have generated a figure (Fig 8) showing the IM response in the cross frequency conditions. There is a clear response at 0.4Hz in the pupillometry data (2-1.6Hz), and at 3.6Hz in the EEG data (2+1.6Hz). We therefore agree that this shows the system is essentially nonlinear, despite the binocular combination appearing approximately linear. We now say in the Discussion:

      ‘In the steady-state literature, one hallmark of a nonlinear system is the presence of intermodulation responses at the sums and differences of fundamental flicker frequencies (Baitch & Levi, 1988; Tsai et al., 2012). In Figure 8 we plot the amplitude spectra of conditions from Experiment 1 in which the two eyes were stimulated at different frequencies (2Hz and 1.6Hz) but at the same contrast (48%; these correspond to the binocular cross and dichoptic cross conditions in Figures 2d,e and 3d,e). Consistent with the temporal properties of pupil responses and EEG, Figure 8a reveals a strong intermodulation difference response at 0.4Hz (red dashed line), and Figure 8b reveals an intermodulation sum response at 3.6Hz (red dashed line). The presence of these intermodulation terms is predicted by nonlinear gain control models of the type considered here (Baker and Wade, 2017; Tsai et al., 2012), and indicates that the processing of monocular flicker signals is not fully linear prior to the point at which they are combined across the eyes.’

      If the IM components are indeed absent, then there is a question of the generality of the conclusions, given that several previous studies have found them with dichoptic flicker. The previous studies differ from the authors' in terms of larger stimuli and in their use of higher temporal frequencies (e.g. 18/20 Hz, 17/21 Hz, 6/8 Hz). Either retinal area stimulated (periphery vs central field) or stimulus frequency (high vs low) could affect the results and thus the conclusions about the nature of dichoptic flicker processing in cortex. It would be interesting to sort this out as it may point the research in new directions.

      This is a great suggestion about retinal area. As chance would have it, we had already collected a replication data set where we stimulated the periphery, and we now include a summary of this data set as an Appendix. In general the results are similar, though we obtain a measurable (though still small) second harmonic response in the pupillometry data with this configuration, which is a further indication of nonlinear processing.

      Whether these components are present or absent is of interest in terms of the authors' computational model of binocular combination. It appears that the present model is based on scalar magnitudes, rather than vectors as in Baker and Wade (2017), so it would be silent on this point. The final summation of the separate eye inputs is linear in the model. In the first stage of the model, each eye's input is divided by a weighted input from the other eye. If we take this input as inhibitory, then IM would not emerge from this stage either.

      We have performed the modelling using scalar values here for simplicity and transparency, and to make the fitting process computationally feasible (it took several days even done this way). This type of model is quite capable of processing sine waves as inputs, and producing a complex output waveform which is Fourier transformed and then analysed in the same way as the experimental data (see e.g. Tsai, Wade & Norcia, 2012, J Neurosci; Baker & Wade, 2017, Cereb Cortex). However our primary aim here was to fit the model, and make inferences about the parameter values, rather than to use a specific set of parameter values to make predictions. We now say more about this family of models and how they can be applied in the methods section:

      “Models from this family can handle both scalar contrast values and continuous waveforms (Tsai et al., 2012) or images (Meese and Summers, 2007) as inputs. For time-varying inputs, the calculations are performed at each time point, and the output waveform can then be analysed using Fourier analysis in the same way as for empirical data. This means that the model can make predictions for the entire Fourier spectrum, including harmonic and intermodulation responses that arise as a consequence of nonlinearities in the model (Baker and Wade, 2017). However, for computational tractability, we performed fitting here using scalar contrast values.”

      As a side point, there are quite a lot of ways to produce intermodulation terms, meaning they are not as diagnostic as one might suppose. We demonstrate this in Author response image 1, which shows the Fourier spectra produced by a toy model that multiplies its two inputs together (for an interactive python notebook that allows various nonlinearities to be explored, see here). Intermodulation terms also arise when two inputs of different frequencies are summed, followed by exponentiation. So it would be possible to have an entirely linear binocular summation process, followed by squaring, and have this generate IM terms (not that we think this is necessarily what is happening in our experiments).

      Author response image 1
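      The point about intermodulation can be illustrated with a minimal sketch. It is an illustration under stated assumptions, not the notebook or the toy model referred to above: the 100 Hz sampling rate, the unit amplitudes and the helper names `sine` and `amplitude_at` are all hypothetical choices. It compares a purely linear sum of a 2Hz and a 1.6Hz input with their product and with their squared sum, and reads out the amplitude at the difference (0.4Hz) and sum (3.6Hz) frequencies.

```python
import cmath
import math

SAMPLE_RATE_HZ = 100.0   # assumed sampling rate (illustrative only)
DURATION_S = 10.0        # 10-second trial, so frequency resolution is 0.1 Hz
N_SAMPLES = int(SAMPLE_RATE_HZ * DURATION_S)


def sine(freq_hz):
    """Unit-amplitude sine wave sampled over the whole trial."""
    return [math.sin(2 * math.pi * freq_hz * n / SAMPLE_RATE_HZ)
            for n in range(N_SAMPLES)]


def amplitude_at(signal, freq_hz):
    """Amplitude of a single Fourier component (valid for freq_hz > 0)."""
    total = sum(x * cmath.exp(-2j * math.pi * freq_hz * n / SAMPLE_RATE_HZ)
                for n, x in enumerate(signal))
    return 2 * abs(total) / len(signal)


left = sine(2.0)    # hypothetical input to one eye
right = sine(1.6)   # hypothetical input to the other eye

# Three toy combination rules; only the last two are nonlinear.
responses = {
    "linear sum": [a + b for a, b in zip(left, right)],
    "product": [a * b for a, b in zip(left, right)],
    "squared sum": [(a + b) ** 2 for a, b in zip(left, right)],
}

for name, response in responses.items():
    difference_term = amplitude_at(response, 0.4)   # 2 - 1.6 Hz
    sum_term = amplitude_at(response, 3.6)          # 2 + 1.6 Hz
    print(f"{name:12s} 0.4 Hz: {difference_term:.3f}   3.6 Hz: {sum_term:.3f}")
```

      In this sketch the linear sum carries no energy at 0.4Hz or 3.6Hz, whereas both nonlinear rules do, which is why the presence of intermodulation terms alone does not identify which nonlinearity produced them.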

      Related to the model: One of the more striking results is the substantial difference between the dichoptic and dichoptic-cross conditions. They differ in that the latter has two different frequencies in the two eyes while the former has the same frequency in each eye. As it stands, if fit jointly on the two conditions, the model would make the same prediction for the dichoptic and dichoptic-cross conditions. It would also make the same prediction whether the two eyes were in-phase temporally or in anti-phase temporally. There is no frequency/phase-dependence in the model to explain differences in these cases or to potentially explain different patterns at the different VEP response harmonics. The model also fits independently to each data set which weakens its generality. An interpretation outside of the model framework would thus be helpful for the specific case of differences between the dichoptic and dichoptic-cross conditions.

      As mentioned above, the limitations the reviewer highlights are features of the specific implementation, rather than the model architecture in general. Furthermore, although this particular implementation of the model does not have separate channels for different phases, these can be added (see e.g. Georgeson et al., 2016, Vis Res, for an example in the spatial domain). In future work we intend to explore the phase relationship of flicker, but do not have space to do this here.

      Prior work has defined several regimes of binocular summation in the VEP (Apkarian et al., 1981, EEG Journal). It would be useful for the authors to relate the use of their terms "facilitation" and "suppression" to these regimes and to justify/clarify differences in usage, when present. Experiment 1, Fig. 3 shows cases where the binocular response is more than twice the monocular response. Here the interpretation is clear: the responses are super-additive and would be classed as involving facilitation in the Apkarian et al. framework. In the Apkarian et al. framework, a ratio of 2 indicates independence/linearity. Ratios between 1 and 2 indicate sub-additivity and are diagnostic of the presence of binocular interaction but are noted by them to be difficult to interpret mechanistically. This should be discussed. A ratio of <1 indicates frank suppression which is not observed here with flicker.

      Operationally, we use facilitation to mean an increase in response relative to a monocular baseline, and suppression to mean a decrease in response. We now state this explicitly in the Introduction. Facilitation greater than a factor of 2 indicates some form of super-additive summation. In the context of the model, we also use the term suppression to indicate divisive suppression between channels; however, this feature does not always result in empirical suppression (it depends on the condition, and the inhibitory weight). We think that interpretation of results such as these is greatly aided by the use of a computational modelling framework, which is why we take this approach here. The broad applicability of the model we use in the domain of spatial contrast lends it credibility for our stimuli here.

      Can the model explore the full range of binocular/monocular ratios in the Apkarian et al framework? I believe much of the data lies in the "partial summation" regime of Apkarian et al and that the model is mainly exploring this regime and is a way of quantifying varying degrees of partial summation.

      Yes, in principle the model can produce the full range of behaviours. When the weight of suppression is 1, binocular and monocular responses are equal. When the weight is zero, the model produces linear summation. When the weight is greater than 1, suppression occurs. It is also possible to produce super-additive summation effects, most straightforwardly by changing the model exponents. However this was not required for our data here, and so we kept these parameters fixed. We agree that the model is a good way to unify the results across disparate experimental paradigms, and that is our main intention with Figure 7i.
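      The dependence on the suppressive weight can be sketched numerically. The code below is a simplified illustration, not the fitted model from the paper: it assumes a single divisive stage of the form own / (z + own + weight × other) followed by linear summation, with all exponents fixed at 1, and the saturation constant `z`, the contrast value and the function names are hypothetical.

```python
def monocular_stage(own, other, weight, z=0.01):
    """One eye's signal after divisive suppression from the other eye.

    Assumed simplified form: own / (z + own + weight * other).
    """
    return own / (z + own + weight * other)


def binocular_response(left, right, weight, z=0.01):
    """Linear sum of the two suppressed monocular signals."""
    return (monocular_stage(left, right, weight, z)
            + monocular_stage(right, left, weight, z))


def summation_ratio(contrast, weight, z=0.01):
    """Binocular response divided by the monocular response."""
    binocular = binocular_response(contrast, contrast, weight, z)
    monocular = binocular_response(contrast, 0.0, weight, z)
    return binocular / monocular


# Hypothetical high-contrast input (0.96 as a proportion of maximum).
for weight in (0.0, 1.0, 2.0):
    ratio = summation_ratio(0.96, weight)
    print(f"weight = {weight:.1f}: binocular/monocular ratio = {ratio:.2f}")
```

      Under these assumptions a weight of zero gives a ratio of 2 (linear summation), a weight of 1 gives a ratio close to 1 at high contrast, and a weight above 1 pushes the ratio below 1 (suppression). Ratios above 2 would require changing the exponents, which this sketch leaves fixed.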

      Reviewer #3 (Public Review):

      This manuscript describes interesting experiments on how information from the two eyes is combined in cortical areas, sub-cortical areas, and perception. The experimental techniques are strong and the results are potentially quite interesting. But the manuscript is poorly written and tries to do too much in too little space. I had a lot of difficulty understanding the various experimental conditions, the complicated results, and the interpretations of those results. I think this is an interesting and useful project so I hope the authors will put in the time to revise the manuscript so that regular readers like myself can better understand what it all means.

      Now for my concerns and suggestions:

      The experimental conditions are novel and complicated, so readers will not readily grasp what the various conditions are and why they were chosen. For example, in one condition different flicker frequencies were presented to the two eyes (2Hz to one and 1.6Hz to the other) with the flicker amplitude fixed in the eye presented with the lower frequency and the flicker amplitude varied in the eye presented with the higher frequency. This is just one of several conditions that the reader has to understand in order to follow the experimental design. I have a few suggestions to make it easier to follow. First, create a figure showing graphically the various conditions. Second, come up with better names for the various conditions and use those names in clear labels in the data figures and in the appropriate captions. Third, combine the specific methods and results sections for each experiment so that one will have just gone through the relevant methods before moving forward into the results. The authors can keep a general methods section separate, but only for the methods that are general to the whole set of experiments.

      We have created a new figure (now Fig 1) that illustrates the conditions from Experiment 1, and is referenced throughout the paper. We have kept the names constant, as they are rooted in a substantial existing literature, and it will be confusing to readers familiar with that work if we diverge from these conventions. We did consider separating out the methods section, but feel it helps the flow of the results section to keep it as a single section.

      I wondered why the authors chose the temporal frequencies they did. Barrionuevo et al (2014) showed that the human pupil response is greatest at 1Hz and is nearly a log unit lower at 2Hz (i.e., the change in diameter is nearly a log unit lower; the change in area is nearly 2 log units lower). So why did the authors choose 2Hz for their primary frequency? And why did the authors choose 1.6Hz which is quite close to 2Hz for their off frequency? The rationale behind these important decisions should be made explicit.

      We now explain this in the Introduction as follows:

      ‘We chose a primary flicker frequency of 2Hz as a compromise between the low-pass pupil response (see Barrionuevo et al., 2014; Spitschan et al., 2014), and the relatively higher-pass EEG response (Regan, 1966).’

      It is a compromise frequency that is not optimal for either modality, but generates a measurable signal for both. The choice of 1.6 Hz was for similar reasons - for a 10-second trial it is four frequency bins away from the primary frequency, so can be unambiguously isolated in the spectrum.
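      The bin arithmetic behind this choice can be checked with a short sketch. It assumes only the 10-second trial duration stated above; the helper name `frequency_bin` is hypothetical. With a 10-second window the Fourier resolution is 0.1Hz, so a frequency lands exactly on a bin when frequency × duration is an integer.

```python
def frequency_bin(freq_hz, duration_s):
    """Return (nearest bin index, whether freq_hz falls exactly on that bin)."""
    position = freq_hz * duration_s   # bin spacing is 1 / duration_s
    nearest = round(position)
    return nearest, abs(position - nearest) < 1e-9


TRIAL_DURATION_S = 10.0

for label, freq_hz in [("primary", 2.0), ("off frequency", 1.6),
                       ("difference", 0.4), ("sum", 3.6)]:
    index, exact = frequency_bin(freq_hz, TRIAL_DURATION_S)
    print(f"{label:14s} {freq_hz:.1f} Hz -> bin {index} (exact: {exact})")

primary_bin, _ = frequency_bin(2.0, TRIAL_DURATION_S)
off_bin, _ = frequency_bin(1.6, TRIAL_DURATION_S)
print("separation in bins:", primary_bin - off_bin)
```

      Both flicker frequencies, and their sum and difference, fall on exact bins for this trial length, so none of them leaks into its neighbours in the spectrum.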

      By the way, I wondered if we know what happens when you present the same flicker frequencies to the two eyes but in counter-phase. The average luminance seen binocularly would always be the same, so if the pupil system is linear, there should be no pupil response to this stimulus. An experiment like this has been done by Flitcroft et al (1992) on accommodation where the two eyes are presented stimuli moving oppositely in optical distance and indeed there was no accommodative response, which strongly suggests linearity.

      We have not tried this yet, but it’s on our to-do list for future work. The accommodation work is very interesting, and we now cite it in the manuscript as follows:

      ‘Work on the accommodative response indicates that binocular combination there is approximately linear (Flitcroft et al. 1992), and can even cancel when signals are in antiphase (we did not try this configuration here).’

      Figures 1 and 2 are important figures because they show the pupil and EEG results, respectively. But it's really hard to get your head around what's being shown in the lower row of each figure. The labeling for the conditions is one problem. You have to remember how "binocular" in panel c differs from "binocular cross" in panel d. And how "monocular" in panel d is different than "monocular 1.6Hz" in panel e. Additionally, the colors of the data symbols are not very distinct so it makes it hard to determine which one is which condition. These results are interesting. But they are difficult to digest.

      We hope that the new Figure 1 outlining the conditions has helped with interpretation here.

      The authors make a strong claim that they have found substantial differences in binocular interaction between cortical and sub-cortical circuits. But when I look at Figures 1 and 2, which are meant to convey this conclusion, I'm struck by how similar the results are. If the authors want to continue to make their claim, they need to spend more time making the case.

      Indeed, it is hard to make direct comparisons across figures - this is why Figure 4 plots the ratio of binocular to monocular conditions, and shows a clear divergence between the EEG and pupillometry results at high contrasts.

      Figure 5 is thankfully easy to understand and shows a very clear result. These perceptual results deviate dramatically from the essentially winner-take-all results for spatial sinewaves shown by Legge & Rubin (1981), whom they should cite, by the way. Thus, very interestingly, the binocular combination of temporal variation is quite different from the binocular combination of spatial variation. Can the pupil and EEG results also be plotted in the fashion of Figure 5? You'd pick a criterion pupil (or EEG) change and use it to make such plots.

      We now cite Legge & Rubin. We see what you mean about plotting the EEG and pupillometry results in the same coordinates as the matching data, but we don’t think this is especially informative as we would end up only with data points along the axes and diagonal of the plot, without the points at other angles. This is a consequence of how the experiments were conducted.

      My main suggestion is that the authors need to devote more space to explaining what they've done, what they've found, and how they interpret the data. I suggest therefore that they drop the computational model altogether so that they can concentrate on the experiments. The model could be presented in a future paper.

      We feel that the model is central to the understanding and interpretation of our results, and have retained it in the revised version of the paper.

      Reviewer #2 (Recommendations For The Authors):

      I found the terms for the stimulus conditions confusing. I think a simple schematic diagram of the conditions would help the reader.

      Now added (the new Fig 1).

      In reporting the binocular to monocular ratio, please clarify whether the monocular data was from one eye alone (and how that eye was chosen) or from both eyes and then averaged, or something else. It would be useful to plot the results from the dichoptic condition in this form, as well.

      These were averaged across both eyes. We now say in the Methods section:

      ‘We confirmed in additional analyses that the monocular consensual pupil response was complete, justifying our pooling of data across the eyes.’

      Also, clarify whether the term facilitation is used as above throughout (facilitation being > 2 times monocular response under binocular condition) or if a different criterion is being used. If we take facilitation to mean a ratio > 2, then facilitation depends on temporal frequency in Figure 4.

      We now explain our use of these terms in the final paragraph of the Introduction:

      ‘Relative to the response to a monocular signal, adding a signal in the other eye can either increase the response (facilitation) or reduce it (suppression).’

      The magnitude of explicit facilitation attained is interesting, but not without precedent. Ratios of binocular to mean monocular response > 2 have been reported previously, and values of summation depend strongly on the stimulus used (see for example Apkarian et al., EEG Journal, 1981; Nicol et al., Doc Ophthal, 2011).

      We now mention this in the Discussion as follows:

      ‘(however we note that facilitation as substantial as ours has been reported in previous EEG work by Apkarian et al. (1981))’

      In Experiment 3, the authors say that the psychophysical matching results are consistent with the approximately linear summation effects observed in the EEG data of Experiment 1. In describing Fig. 3, the claim is that the EEG is non-linear, e.g. super-additive - at least at high contrasts. Please reconcile these statements.

      We think that the ‘superadditive’ effects are close enough to linear that we don’t want to make too much of a big deal about them - this could be measurement error, for example. So we use terms such as near-linear, or approximately linear, when referring to them throughout.

      Reviewer #3 (Recommendations For The Authors):

      Let me make some more specific comments using a page/paragraph/line format to indicate where in the text they're relevant.

      1/2 (middle)/3 from end. "In addition" seems out of place here.

      Removed.

      1/3/4. By "intensities" do you mean "contrasts"?

      Fixed.

      1/3/last. "... eyes'...".

      Fixed.

      2/5/3. By "one binocular disc", you mean into "one perceptually fused disc".

      Rewritten as: ‘to help with their perceptual fusion, giving the appearance of a single binocular disc’

      3/1/1. "calibrated" seems like the wrong word here. I think you're just changing the vergence angle to enable fusion, right?

      Now rewritten as: ‘Before each experiment, participants adjusted the angle of the stereoscope mirrors to achieve binocular fusion’

      3/1/1. "adjusting the angles...". And didn't changing the mirror angles affect the shapes of the discs in the retinal images?

      Perhaps very slightly, but this is well within the tolerance of the visual system to compensate for in the fused image, especially for such high contrast edges.

      3/3/5. "fixed contrast" is confusing here because it's still a flickering stimulus if I follow the text here. Reword.

      Now ‘fixed temporal contrast’

      3/4/1. It would be clearer to say "pupil tracker" rather than "eye tracker" because you're not really doing eye tracking.

      True, but the device is a commercial eye tracker, so this is the appropriate term regardless of what we are using it for.

      3/5/6. I'm getting lost here. "varying contrast levels" applies to the dichoptic stimulus, right?

      Yes, now reworded as ‘In the other interval, a target disc was displayed, flickering at different contrast levels on each trial, but with a fixed interocular contrast ratio across the block.’

      3/5/7. Understanding the "ratio of flicker amplitudes" is key to understanding what's going on here. More explanation would be helpful.

      Addressed in the above point.

      4/3/near end. Provide some explanation about why the Fourier approach is more robust to noise.

      Added ‘(which can make the phase and amplitude of a fitted sine wave unstable)’

      Figure 1. In panel a, explain what the numbers on the ordinate mean. What's zero, for example? Which direction is dilation? Same question for panel b. It's interesting in panel c that the response in one eye to 2Hz increases when the other eye sees 1.6Hz. Would be good to point that out in the text.

      Good idea about panel (a) - we have changed the y-axis to ‘Relative amplitude’ for clarity, and now note in the figure caption that ‘Negative values indicate constriction relative to baseline, and positive values indicate dilation.’ Panel (b) is absolute amplitude, so is unsigned. Panel (c) only contains 2Hz conditions, but there is some dichoptic suppression across the two frequencies in panels (d,e) - we now cover this in the text and include statistics.

      6/2/1. Make clear in the text that Figure 1c shows contrast response functions for the pupil.

      Now noted in the caption.

      Figure 3. I'm lost here. I feel like I should be able to construct this figure from Figures 1 and 2, but don't know how. More explanation is needed at least in the caption.

      Done. The caption now reads:

      ‘Ratio of binocular to monocular response for three data types. These were calculated by dividing the binocular response by the monocular response at each contrast level, using the data underlying Figures 2c, 3c and 3f. Each value is the average ratio across N=30 participants, and error bars indicate bootstrapped standard errors.’
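      The calculation described in this caption can be sketched as follows. This is one plausible reading, not the authors' analysis code: it assumes the ratio is computed per participant and then averaged, and that the standard error is the standard deviation of resampled means. The function names, the number of resamples and the example values are hypothetical placeholders, not data from the study.

```python
import random
import statistics


def bootstrap_se(values, n_resamples=2000, seed=0):
    """Standard error of the mean, estimated by resampling participants."""
    rng = random.Random(seed)
    means = []
    for _ in range(n_resamples):
        resample = rng.choices(values, k=len(values))   # with replacement
        means.append(statistics.fmean(resample))
    return statistics.stdev(means)


def ratio_summary(binocular, monocular):
    """Mean binocular/monocular ratio across participants and its bootstrap SE."""
    ratios = [b / m for b, m in zip(binocular, monocular)]
    return statistics.fmean(ratios), bootstrap_se(ratios)


# Placeholder amplitudes for four hypothetical participants at one contrast.
example_binocular = [2.1, 1.8, 2.4, 2.0]
example_monocular = [1.0, 1.0, 1.2, 0.8]

mean_ratio, standard_error = ratio_summary(example_binocular, example_monocular)
print(f"mean ratio = {mean_ratio:.2f}, bootstrapped SE = {standard_error:.2f}")
```

      The same function would be applied separately at each contrast level to build a curve like the one the caption describes.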

      9/1/1-2. I didn't find the evidence supporting this statement compelling.

      We now point the reader to Figure 4 as a reminder of the evidence for this difference.

      9/1/6-9. You said this. But this kind of problem can be fixed by moving the methods sections as I suggested above.

      As mentioned, we feel that the results section flows better with the current structure.

      Figure 4. Make clear that this is EEG data.

      Now added to caption.

      Figure 5 caption. Infinite exponent in what equation?

      Now clarified as: ‘models involving linear combination (dotted) or a winner-take-all rule (dashed)’

      Figure 6. I hope this gets dropped. No one will understand how the model predictions were derived. And those who look at the data and model predictions will surely note (as the authors do) that they are rather different from one another.

      As noted above, we feel that the model is central to the paper and have retained this figure. We have also worked out how to correct the noise parameter in the model for the number of participants included in the coherent averaging, which fixes the discrepancy at low contrasts. The correspondence between the data and model is now very good, and we have plotted the data points and curves in the same panels, which makes the figure less busy.

      12/1. Make clear in this paragraph that "visual cortex" is referring to EEG and perception results and that "subcortical" is referring to pupil. Explain clearly what "linear" would be and what the evidence for "non-linear" is.

      Good suggestion, we have added qualifiers linking to both methods. Also tidied up the language to make it clearer that we are talking about binocular combination specifically in terms of linearity, and spelled out the evidence for each point.

      12/2/6-9. Explain the Quaia et al results enough for the reader to know what reflexive eye movements were studied and how.

      We now specify that these eye movements are also known as the ‘ocular following response’ and were measured using scleral search coils.

      12/2/9-10. Same for Spitschan and Cajochen: more explanation.

      Added:

      “(melatonin is a hormone released by the pineal gland that regulates sleep; its production is suppressed by light exposure and can be measured from saliva assays)”

      12/3/2-3. Intriguing statements about optimally combining noisy signals, but explain this more. It won't be obvious to most readers.

      We have added some more explanation to this section.

      13/1. This is an interesting paragraph where the authors have a chance to discuss what would be most advantageous to the organism. They make the standard argument for perception, but basically punt on having an argument for the pupil.

      Indeed, we agree that this point is necessarily speculative, however we think it is interesting for the reader to consider.

      13/2/1. "Pupil size affects the ..." is more accurate.

      Fixed.

      13/2/2 from end. Which "two pathways"? Be clear.

      Changed to ‘the pupil and perceptual pathways’

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      (1) The mechanism by which STAMBPL1 mediates GRHL3 transcription through its interaction with FOXO1 is not sufficiently discussed, especially in relation to how STAMBPL1 regulates FOXO1. Some reported effects are modest.

      We appreciate the reviewer’s comments. In response, we have added a discussion of the potential mechanisms by which STAMBPL1 regulates FOXO1 transcriptional activity to the Discussion, highlighted in red on page 18, lines 342 to 352. The added text is as follows: “The transcriptional activity of FOXO1 is primarily regulated by its nucleocytoplasmic shuttling process (Van Der Heide, Hoekman et al. 2004). The PI3K/AKT pathway promotes the phosphorylation of FOXO1, resulting in the formation of a complex with members of the 14-3-3 family (including 14-3-3σ, 14-3-3ε, and 14-3-3ζ), which facilitates its export from the nucleus and inhibits its transcriptional activity (Huang and Tindall 2007, Tzivion, Dobson et al. 2011). It has been reported that TDAG51 prevents the binding of 14-3-3ζ to FOXO1 in the nucleus by interacting with FOXO1, thereby enhancing its transcriptional activity through increased accumulation within the nucleus (Park, Jeon et al. 2023). Our results indicate that the overexpression of STAMBPL1 and STAMBPL1-E292A did not affect the protein levels of FOXO1 (Fig.7E and Fig.S5E), but STAMBPL1 co-localizes with FOXO1 in the nucleus (Fig.7M) and interacts with it (Fig.7N and Fig.S5I-J). This suggests that STAMBPL1 enhances the transcriptional activity of FOXO1 on GRHL3 by interacting with nuclear FOXO1.” The result was added to Supplementary Figure 5 as Fig.S5E.

      Reviewer #2 (Public review):

      (1) A potential limitation of the study is the reliance on specific cellular and animal models, which may constrain the extrapolation of these findings to the broader spectrum of human TNBC biology. Furthermore, while the study provides evidence for a novel regulatory axis involving STAMBPL1, FOXO1, and GRHL3, the multifaceted nature of angiogenesis may implicate additional regulatory factors not exhaustively addressed in this research.

      We appreciate the valuable suggestions provided by the reviewer. In the Discussion, we have added an in-depth discussion of the limitations of the study, as well as an analysis of the regulatory factors related to tumor angiogenesis, which is highlighted in red on pages 20 to 21, lines 396 to 412. The relevant content added is as follows: “In this study, we utilized two triple-negative breast cancer cell lines, HCC1806 and HCC1937, along with human primary umbilical vein endothelial cells (HUVECs) and a nude mouse breast orthotopic transplantation tumor model to investigate the regulatory mechanism by which STAMBPL1 activates the GRHL3/HIF1α/VEGFA signaling pathway through its interaction with FOXO1, thereby promoting angiogenesis in TNBC. The results of this study have certain limitations regarding their applicability to human TNBC biology. Furthermore, in addition to the HIF1α/VEGFA signaling pathway emphasized in this study, tumor cells can continuously release or upregulate various pro-angiogenic factors, such as Angiopoietin and FGF, which activate endothelial cells, pericytes (PCs), cancer-associated fibroblasts (CAFs), endothelial progenitor cells (EPCs), and immune cells (ICs). This leads to capillary dilation, basement membrane disruption, extracellular matrix remodeling, pericyte detachment, and endothelial cell differentiation, thereby sustaining a highly active state of angiogenesis (Liu, Chen et al. 2023). It is important to collect clinical TNBC tissue samples in the future to analyze the expression of the STAMBPL1/FOXO1/GRHL3/HIF1α/VEGFA signaling axis. Furthermore, patient-derived organoid and xenograft models are useful to elucidate the regulatory relationship of this axis in TNBC angiogenesis”

      Reviewer #3 (Public review):

      The main weaknesses of this work are that the relevance of this molecular axis to the pathogenesis of TNBC is not clear, and it is not clearly established whether this is a regulatory pathway that occurs in hypoxic conditions or independently of oxygen levels.

      (1) With respect to the first point, both FOXO1 and GRHL3 have been previously described as tumor suppressors, with reports of FOXO1 inhibiting tumor angiogenesis. Therefore, this work describes an apparently contradictory function of these proteins in TNBC. While it is not surprising that the same genes perform divergent functions in different tumor contexts, stronger evidence in support of the oncogenic function of these two genes should be provided to make the data more convincing. As an example, the data in support of high STAMBPL1, FOXO1 and GRHL3 gene expression in TNBC TCGA specimens provided in Figure 8 is not very strong and it is not clear what the non-TNBC specimens are (whether other breast cancers or other tumors, perhaps those tumors where these genes perform tumor suppressive functions). To strengthen the notion that STAMBPL1, FOXO1 and GRHL3 are overexpressed in TNBC, the authors could provide a comparison with normal tissue, as well as the analysis of other publicly available datasets (like the NCI Clinical Proteomic Tumor Analysis Consortium as an example). Finally, it is not clear what the basal protein expression levels of STAMBPL1 are in the cell lines used in this study, as based on the data presented in Figures 2D and F it appears that the protein is not expressed if not exogenously overexpressed. It would be helpful if the authors addressed this issue and provided further evidence of STAMBPL1 expression in TNBC cell lines.

      We appreciate the suggestions. In this study, we utilized the BCIP online tool to analyze the Metabric database, incorporating adjacent normal tissues as controls. Although the expression levels of STAMBPL1, FOXO1, and GRHL3 in breast cancer tissues are not uniformly higher than those in adjacent tissues, their expression levels in triple-negative breast cancer (TNBC) are significantly elevated compared to non-TNBC. The results of this re-analysis have been added in Supplementary Figure 6 as Fig.S6A-C.

      About the question of the basal protein expression levels of STAMBPL1 in the cell lines used in this study, our response is that Fig. 2A showed the endogenous level of STAMBPL1 in HCC1806 and HCC1937. For Fig. 2D and 2F, the overexpressed STAMBPL1 was fused with a 3xFlag tag, resulting in a higher molecular weight compared to the endogenous STAMBPL1. In the revised Figure 2, we have indicated the positions of the endogenous (Endo.) and exogenous (OE.) STAMBPL1 bands with arrows.

      (2) Linked to these considerations is the second major criticism, namely that it is not made clear if this new regulatory axis is proposed to act in normoxic or hypoxic conditions. The experiments presented in this paper are performed in both conditions but a clear explanation as to why cells are exposed to hypoxia is not given and would be necessary being that HIF-1a transcription and not protein stability is being analyzed. Also, different hypoxic conditions are sometimes used, resulting in different mRNA levels of HIF-1a and its downstream targets and quite significant fluctuations within the same cell line from one experimental setting to the next. The authors should provide an explanation as to why experimental conditions are changed and, more importantly, the experiments presented in Figure 2 should be performed also in normoxia.

      Thanks for the comments. Under normoxic conditions, HIF1α is recognized by pVHL due to hydroxylation and is rapidly degraded via the proteasomal pathway. In contrast, under hypoxic conditions, HIF1α protein is accumulated. To investigate the effect of STAMBPL1 knockdown on HIF1A gene transcription levels, we conducted experiments under hypoxic conditions to avoid interference from the rapid degradation of HIF1α at the protein level, as shown in Figures 2B-C. Furthermore, under normoxic conditions, the overexpression of STAMBPL1 had been demonstrated to significantly enhance the protein levels of HIF1α and upregulate the transcription of VEGFA through HIF1α. To avoid the potential impact of excessive accumulation of HIF1α protein under hypoxic conditions on its protein level detection and the transcription of downstream VEGFA, the related experiments shown in Figure 2D-G were performed under normoxic conditions. We have explained the corresponding experimental conditions in the “Result” and “Figure legends” according to the reviewer's comments, highlighted in red.

      (3) Another critical point is that necessary experimental controls are sometimes missing, and this is reducing the strength of some of the conclusions enunciated by the authors. As examples, experiments where overexpression of STAMBPL1 is coupled to silencing of FOXO1 to demonstrate dependency lack FOXO1 silencing the absence of STAMBPL1 overexpression. Because diminishing FOXO1 expression affects HIF-1a/VEGF transcription even in the absence of STAMBPL1 (shown in Figure 7C, D), it is not clear if the data presented in Figure 7G are significant. The difference between HIF-1a expression upon FOXO1 silencing should be compared in the presence or absence of STAMBPL1 overexpression to understand if FOXO1 impacts HIF-1a transcription dependently or independently of STAMBPL1.

      Thank you for this comment. For Fig.7G-H, our experimental objective was to determine whether STAMBPL1 activates HIF1A/VEGFA transcription via FOXO1. Therefore, under STAMBPL1 overexpression, we knocked down FOXO1 to investigate whether FOXO1 silencing could reverse the upregulation of HIF1A/VEGFA transcription induced by STAMBPL1 overexpression.

      (4) In addition, some minor comments to improve the quality of this manuscript are provided.

      (4.1) As a general statement, the manuscript is extremely synthetic. While this is not necessarily a negative feature, sometimes results are discussed in the figure legends and not in the main text (as an example, western blots showing HIF-1a expression) and this makes it hard to read through the data in an easy and enjoyable manner.

      Thank you for this suggestion. We have revised the figure legends to make them clearer and more concise, highlighted in red.

      (4.2) The effect of STAMBPL1 overexpression on HIF-1a transcription is minor (Figure 2) The authors should explain why they think this is the case and whether hypoxia may provide a molecular environment that is more permissive to this type of regulation.

      Thank you for the comment. Under normoxic conditions, we conducted WB to examine the protein expression of HIF1α after the overexpression of STAMBPL1 and the knockdown of HIF1α. To visually illustrate the impact of STAMBPL1 overexpression on HIF1A protein levels, as well as the effectiveness of HIF1α knockdown, we annotated the grayscale analysis results of the bands in Figures 2D and 2F. As the reviewer pointed out, under normoxic conditions, HIF1α is rapidly degraded, which may explain why the upregulation of HIF1α protein levels by STAMBPL1 overexpression is not very pronounced.

      (4.3) HIF-1a does not appear upregulated at the protein level by STAMBPL1 or GRHL3 overexpression, even though this is stated in the legends of Figures 2 and 6. The authors should show unsaturated western blot images and provide quantitative data of independent experiments to make this point.

      Thank you for this comment. We have added the unsaturated image of HIF1α into Fig.2D, and performed a grayscale analysis of the HIF1α bands in Fig.2D and Fig.6A to indicate the relative protein level of HIF1α.
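
      The response does not describe how the grayscale analysis was carried out. As an illustration only, a common approach is to divide each target band intensity by its loading-control band and then express every lane relative to the control lane. The function name, the lane values, and the choice of a loading-control normalization below are all assumptions, not the authors' procedure.

```python
def relative_band_levels(target, loading, control_lane=0):
    """Normalize target band intensities to a loading control, then to one lane.

    target, loading: per-lane grayscale intensities (hypothetical values).
    control_lane: index of the lane that is set to 1.0.
    """
    if len(target) != len(loading):
        raise ValueError("target and loading must have the same number of lanes")
    if any(value <= 0 for value in loading):
        raise ValueError("loading-control intensities must be positive")
    # Step 1: correct each lane for the amount of protein loaded.
    ratios = [t / l for t, l in zip(target, loading)]
    if ratios[control_lane] == 0:
        raise ValueError("control lane has no detectable target signal")
    # Step 2: express each lane relative to the control lane.
    return [r / ratios[control_lane] for r in ratios]


# Hypothetical intensities: lane 0 = vector control, lane 1 = overexpression.
levels = relative_band_levels(target=[100.0, 150.0], loading=[200.0, 200.0])
print(levels)
```

      Values obtained this way are only comparable within one blot, and they are only meaningful when the bands are unsaturated, which is the point the reviewer raises.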

      Reviewer #1 (Recommendations for the authors):

      (1) The authors previously reported that STAMBPL1 stabilizes MKP1 in TNBC. However, in this study, they focus on HIF1a. Given that STAMBPL1 affects HIF1a expression, it would be valuable to examine the levels of ROS in TNBC cells with or without STAMBPL1, as ROS is known to influence HIF1a stability.

      Thank you for your comments. It’s known that STAMBPL1 functions as a deubiquitinating enzyme. However, our study reveals that the upregulation of HIF1α by STAMBPL1 is independent of its deubiquitinating activity. This conclusion is supported by the observation that overexpression of the deubiquitinase active site mutant, STAMBPL1-E292A, also upregulated HIF1α expression (Figure 1F). Moreover, STAMBPL1 overexpression enhanced HIF1α transcription (Figures 4E and S3E), while STAMBPL1 knockdown was able to inhibit the transcription of HIF1α (Figures 2B-C). These results indicate that STAMBPL1 mediates the transcription of HIF1α but does not affect the stability of HIF1α. For these reasons, we think that it is unnecessary to examine the ROS levels.

      (2) Figure 1A: The regulation of HIF1a mRNA by STAMBPL1, but not its protein levels, could be better addressed by using MG132 to rule out the impact of protein degradation.

      Thanks for this comment. Under normoxic conditions, the oxygen-sensitive prolyl hydroxylases PHD1-3 act on HIF1α, specifically inducing hydroxylation at the proline 402 and 564 residues. These hydroxylated residues are recognized by the pVHL/E3 ubiquitin ligase complex, leading to ubiquitination and subsequent degradation via the proteasome pathway. Conversely, under hypoxic conditions, PHD1-3 are inactivated, and non-hydroxylated HIF1α is not recognized by the pVHL/E3 ubiquitin ligase complex, thereby avoiding ubiquitination and proteasomal degradation (DOI: 10.1073/pnas.95.14.7987, DOI: 10.1515/BC.2004.016, and DOI: 10.1042/BJ20040620). The mechanism of HIF1α accumulation under hypoxia is analogous to the action of the proteasome inhibitor MG132. When we treated cells with hypoxia, the ubiquitination and proteasomal degradation pathway of HIF1α was blocked. At this time, STAMBPL1 knockdown could downregulate the expression of HIF1α (Fig.1A). Meanwhile, since the knockdown of STAMBPL1 significantly downregulated the mRNA level of HIF1α under hypoxia (Fig.2B-C), we concluded that STAMBPL1 affects the expression of HIF1α by mediating its transcription. In addition, MG132 will block all proteasomal substrate degradation and may affect HIF1α mRNA levels indirectly.

      (3) Figure 2D and 2F: The effect of STAMBPL1 in promoting HIF1a expression is quite mild, and the effect of HIF1a knockdown is also modest. Given the high levels of STAMBPL1 in TNBC cell lines (Figure 2A), it would be better to repeat these experiments in a STAMBPL1-knockdown setting for clearer insights.

      We appreciate this insightful suggestion. Considering that the regulation of HIF1α expression by STAMBPL1 occurs at the transcriptional level, and to prevent excessive accumulation of HIF1a during hypoxia that could confound the effect of STAMBPL1 overexpression on HIF1α regulation, we opted to overexpress STAMBPL1 under normoxic conditions and subsequently knock down HIF1α, as shown in Fig.2D and Fig.2F. This approach allowed us to observe that STAMBPL1 overexpression can upregulate HIF1a expression to some extent. Additionally, in response to the reviewer's suggestion to knock down STAMBPL1, we have conducted the corresponding experiments, with results presented in Fig.1A-E and Fig.2B-C.

      (4) Figure 4A: Why does the RNA-seq pattern differ significantly between the two siRNAs? Additionally, the authors should clarify why they focus primarily on transcription factors, as other mechanisms, such as mRNA stability and RNA modification, could also influence gene transcription.

      Thank you for this comment. Two siRNAs for STAMBPL1 were designed and synthesized by a biotechnology company. Although both siRNAs target STAMBPL1, they target different sequences. While both siRNAs effectively knocked down STAMBPL1 (Fig. 1A and Fig. 2A), the possibility of off-target effects cannot be completely ruled out. Therefore, we used both siRNAs for RNA-seq and focused on genes downregulated by both siRNAs, to ensure that the observed gene expression changes are due to the knockdown of STAMBPL1. Additionally, among the 27 genes downregulated by both siRNAs, only 18 genes were annotated. Of these 18 genes, except for GRHL3, which is a transcription factor reported to be involved in gene transcription regulation, the remaining 17 genes have no documented association with RNA transcription, stability, or modification. Therefore, we focused on the GRHL3 gene.
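
      The filtering logic described here, keeping only genes downregulated by both siRNAs, amounts to a set intersection. The sketch below illustrates that idea with invented data; the cutoffs, the placeholder gene names (GENE_A, GENE_B), and the fold-change values are assumptions and do not come from the study.

```python
def shared_downregulated(results_a, results_b, lfc_cutoff=-1.0, padj_cutoff=0.05):
    """Return genes that pass the downregulation cutoffs in BOTH datasets.

    results_a, results_b: dicts mapping gene -> (log2 fold change, adjusted p).
    The cutoffs are illustrative defaults, not the thresholds used by the authors.
    """
    def hits(results):
        return {
            gene
            for gene, (lfc, padj) in results.items()
            if lfc <= lfc_cutoff and padj <= padj_cutoff
        }

    # A gene affected by only one siRNA may be an off-target effect,
    # so only the intersection is kept.
    return sorted(hits(results_a) & hits(results_b))


# Hypothetical differential-expression results for the two siRNAs.
si1 = {"GRHL3": (-2.1, 0.001), "GENE_A": (-1.5, 0.010), "GENE_B": (-0.2, 0.600)}
si2 = {"GRHL3": (-1.8, 0.003), "GENE_A": (-0.3, 0.400), "GENE_B": (-1.9, 0.002)}

shared = shared_downregulated(si1, si2)
print(shared)
```

      In this toy input, GENE_A and GENE_B each pass the cutoffs for only one siRNA and are therefore excluded.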

      (5) Figure 5G: To investigate whether STAMBPL1 and GRHL3 function epistatically in the pathway, a double knockdown of STAMBPL1 and GRHL3 should be examined. Additionally, a double knockdown of STAMBPL1 and FOXO1 should be assessed.

      Thank you for your comment. In Figure 5G, we aimed to assess the knockdown efficiency of GRHL3 using siRNAs. To determine whether STAMBPL1 upregulates the HIF1a/VEGFA axis via GRHL3, we overexpressed STAMBPL1 and subsequently knocked down GRHL3. Our findings indicated that STAMBPL1 overexpression indeed enhanced the HIF1a/VEGFA axis, which was rescued by the knockdown of GRHL3, as shown in Figures 4E-F and S3E-F. Similarly, upon overexpressing STAMBPL1 and knocking down FOXO1, we observed that STAMBPL1 overexpression increased the GRHL3/HIF1a/VEGFA axis, which could also be rescued by knocking down FOXO1, as shown in Figures 7F-H. These results suggest that STAMBPL1 upregulates the GRHL3/HIF1a/VEGFA axis through FOXO1. We do not think that a double knockdown of STAMBPL1 together with FOXO1 or GRHL3 is the appropriate way to address this question.

      (6) Figure 7: It remains unclear how STAMBPL1 regulates FOXO1. The authors show that STAMBPL1 increases the transcriptional activation of FOXO1 at the GRHL3 promoter, but it is not clear if STAMBPL1 is required for FOXO1 binding to the GRHL3 promoter. To address this, STAMBPL1-knockdown should be included to examine its effect on FOXO1 binding to the GRHL3 promoter. Furthermore, it would be important to determine whether the STAMBPL1-FOXO1 interaction is essential for GRHL3 transcription. Since the interaction sites of STAMBPL1-FOXO1 have been mapped, a mutant disrupting the interaction would provide better insight into how STAMBPL1 promotes GRHL3 transcription by interacting with FOXO1.

      Thank you for this comment. It has been reported that FOXO1 promotes the transcription of the GRHL3 gene by interacting with its promoter (DOI: 10.1093/nar/gkw1276). We also verified through ChIP assay that FOXO1 can bind to the promoter of the GRHL3 gene (Fig.7I) and mediate its transcription. Specifically, knocking down FOXO1 significantly down-regulated the mRNA level of GRHL3 (Fig.7B), and the GRHL3 promoter lacking the FOXO1 binding site almost completely lost transcriptional activity (Fig.7J), indicating that FOXO1 is crucial for the transcriptional activity of the GRHL3 promoter. Overexpression of STAMBPL1 enhances the activating effect of FOXO1 on the transcriptional activity of the GRHL3 promoter (Fig.7K). However, the up-regulation of GRHL3 transcription by overexpression of STAMBPL1 is completely blocked by FOXO1 knockdown (Fig.7F), and the knockdown of FOXO1 essentially blocks the binding of STAMBPL1 to the GRHL3 promoter (Fig.7L), suggesting that STAMBPL1 affects the transcriptional expression of GRHL3 in a FOXO1-dependent manner. As we added in the Discussion, the transcription factor activity of FOXO1 is mainly regulated by its nucleocytoplasmic shuttling process, and the accumulation of FOXO1 in the nucleus can enhance its transcription factor activity (DOI: 10.1042/BJ20040167; DOI: 10.15252/embj.2022111867). In our research, neither STAMBPL1 nor its deubiquitinase active-site mutant affected the expression of FOXO1 (Fig.S5E), but STAMBPL1 and FOXO1 co-localized in the nucleus (Fig.7M), and they interacted with each other (Fig.7N, Fig.S5I-J). Therefore, we speculate that STAMBPL1 interacts with FOXO1 in the nucleus, obstructs the binding of FOXO1 to members of the 14-3-3 family, and inhibits the export of FOXO1, thereby enhancing its transcriptional activity. This interaction between STAMBPL1 and FOXO1 does not necessarily affect the binding of FOXO1 to DNA, including the GRHL3 promoter.

      (7) Figure 8 A-C: What is the correlation among the expressions of STAMBPL1, FOXO1, and GRHL3 in TNBC tumors compared to non-TNBC tumors?

      Thank you for your comment. In Figure 8A-C, we analyzed the expression levels of STAMBPL1, FOXO1, and GRHL3 in both TNBC and non-TNBC samples using the BCIP. The results indicate that the expression levels of these three genes are significantly higher in TNBC compared to non-TNBC samples. To investigate the correlation among the expressions of STAMBPL1, FOXO1, and GRHL3 in TNBC versus non-TNBC, we further utilized the Metabric data. Apart from a positive correlation trend between STAMBPL1 and GRHL3 expression in TNBC clinical samples (Pearson R = 0.27), no significant correlation was observed among the expression levels of STAMBPL1, FOXO1, and GRHL3 in TNBC and non-TNBC clinical samples (as shown in Author response image 1 below). Since STAMBPL1 and FOXO1 act as proteins in the transcriptional regulation of the GRHL3 gene, whereas the data obtained from the Metabric database are the transcript levels of these three genes, this might be the reason why no correlation between their expression levels was observed.

      Author response image 1.
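
      For readers unfamiliar with the statistic quoted above, the Pearson coefficient can be computed as follows. This is a minimal sketch on invented numbers, not a reanalysis of the Metabric data; the function name and the example expression values are assumptions.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Covariance and variances (the shared 1/n factors cancel in the ratio).
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical expression values for two genes across five samples.
gene_1 = [5.1, 6.3, 4.8, 7.0, 5.9]
gene_2 = [2.2, 2.9, 2.4, 2.6, 3.1]
r = pearson_r(gene_1, gene_2)
print(round(r, 3))
```

      The coefficient ranges from -1 to 1, so a value of 0.27 corresponds to a weak positive linear association.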

      Reviewer #2 (Recommendations for the authors):

      The authors have thoroughly elucidated the role of STAMBPL1 in TNBC. However, it would be beneficial to discuss the potential clinical implications of these findings, such as how targeting STAMBPL1 or FOXO1 might impact current treatment strategies for TNBC. However, several issues need to be addressed.

      Major:

      (1) While the study provides an exhaustive analysis of the molecular mechanisms, a comparison with other subtypes of breast cancer could enhance our understanding of the specificity of the STAMBPL1/FOXO1/GRHL3/HIF1α/VEGFA axis in TNBC.

      Thank you for your comment. According to a previous report, STAMBPL1 is significantly associated with the mesenchymal characteristics of breast cancer (DOI: 10.1038/s41416-020-0972-x). We utilized cBioPortal (http://www.cbioportal.org/) to analyze the expression of STAMBPL1 across various clinical subtypes of breast cancer. The results indicated that STAMBPL1 is highly expressed in invasive breast cancer, which has been added to Supplementary Figure 6 as Fig.S6D. Given that TNBC is an aggressive type of invasive breast cancer, we further examined the expression of STAMBPL1 in TNBC compared to non-TNBC using BCIP (http://omicsnet.org/bcancer/database). Our findings revealed that the expression level of STAMBPL1 in TNBC was elevated relative to its levels in non-TNBC (Fig.8A). Additionally, since tumor angiogenesis is a critical factor influencing the metastasis of cancer cells, our study focused specifically on the pro-angiogenic effects of STAMBPL1 in TNBC.

      (2) The authors might consider discussing any potential off-target effects of the siRNA and shRNA used in the study to bolster the conclusions drawn from the knockdown experiments.

      We appreciate the reviewer's suggestion. It is well-known that siRNA or shRNA have off-target effects. To address this concern, we employed two siRNAs for each gene knockdown in our study. Specifically, we knocked down genes such as STAMBPL1, FOXO1, GRHL3, and HIF1A in two TNBC cell lines, HCC1806 and HCC1937, using two siRNAs. Except for siRNA#1 targeting HIF1A, which did not show a significant knockdown effect in HCC1806 cells (Fig.2D and Fig.6A), the knockdown effects of other siRNAs on their respective genes were effective, and the resulting phenotypes were consistent. As shown in Fig.2F and Fig.S4H, siRNA#1 targeting HIF1A had a significant knockdown effect in HCC1937 cells. The lower knockdown efficiency of this siRNA in HCC1806 cell line might be attributed to cell-specific factors.

      (3) It would be advantageous if the authors could provide further details on the patient demographics and tumor characteristics in the TCGA database analysis to better comprehend the clinical relevance of their findings.

      Thanks for the reviewer's suggestions. We have now indicated the number of clinical samples in each group in the legend of Fig.8A-C. Since we utilized the BCIP online database to analyze and compare the expression levels of the three genes STAMBPL1, FOXO1, and GRHL3 in TNBC and non-TNBC, we are unable to obtain more specific information regarding the tumor characteristics of each sample. However, our analysis clearly shows that the expression levels of these three genes are significantly higher in TNBC compared to non-TNBC.

      (4) The authors should consider discussing any limitations regarding the generalizability of their findings, such as potential variations among different TNBC subtypes or the specificity of their observations to certain stages of the disease.

      We appreciate the reviewer's comment. Accordingly, we have added a discussion on the limitation of this study in Discussion, highlighted in red font on pages 20 to 21, lines 396 to 412. In addition, we utilized the bc-GenExMiner online database to conduct a comparative analysis of STAMBPL1 expression in different subtypes of non-TNBC and TNBC. The result indicates that STAMBPL1 is highly expressed in mesenchymal-like and basal-like TNBC, which has been added into Supplementary Figure 6 as Fig.S6E. Since these two subtypes of TNBC are highly invasive and metastatic, it suggests that targeting the signaling pathway of STAMBPL1/FOXO1/GRHL3/HIF1α/VEGFA may offer clinical benefits for patients with invasive TNBC.

      Minor:

      The paper is generally well-written, but it's crucial to maintain vigilance for subject-verb agreement, proper use of tense, and consistent terminology.

      Thank you for this suggestion. We have thoroughly revised the article for issues such as grammar, including tense, subject-verb agreement, and terminology.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1:

      Summary:

      In this manuscript (eLife-RP-RA-2024-103904), the authors identified that NOLC1 was upregulated in gastric cancer samples, which promoted cancer progression and cisplatin resistance. They further found that NOLC1 could bind to p53 and decrease its nuclear transcriptional activity, then inhibit p53-mediated ferroptosis. There are several major concerns regarding the conclusions.

      Strengths:

      This study identified that NOLC1 could bind to p53 and decrease its nuclear transcriptional activity, then inhibit p53-mediated ferroptosis in gastric cancer.

      Weaknesses:

      The major conclusions were not sufficiently supported by the results. The experiments were not conducted in a comprehensive manner.

      Major concerns

      (1) The authors investigated NOLC1 expression in gastric cancer (GC) using clinical samples, which is valuable; however, the sample array includes only 3 patients. This sample size is insufficient to support conclusions for human samples. Please increase the sample size and apply a more robust statistical analysis. Additionally, specify the statistical methods used in the figure legend.

      Thanks very much for the kind comments and great suggestions. As suggested, we have increased the sample size of GC patients, and the new data (six paired samples) are shown in Fig. S1A, further indicating that NOLC1 is upregulated in gastric cancer (GC). Moreover, the statistical methods have been added to each figure legend.

      (2) These data are not sufficient to support the key conclusion of this study "NOLC1 is significantly upregulated in GC tissues and Cis-resistant GC cells". There is no convincing data showing that NOLC1 upregulation is specific to cancer cells or any other cell types. Based on the following results that NOLC1 expressed in cancer cells can support cancer cell survival and drug resistance, the authors switched to investigating the role of NOLC1 in cancer cells without demonstrating cancer cells indeed highly upregulate NOLC1.

      Thanks for raising this good question. As shown in Fig. 1E-F, the TCGA database shows that NOLC1 is upregulated in GC. Moreover, we further analyzed the NOLC1 expression level in other cancer types, according to the Human Protein Atlas (https://www.proteinatlas.org/). The results indicated that the NOLC1 mRNA level was much higher in almost all cancers except acute myeloid leukemia (LAML). In addition, according to the gene expression profiling interactive analysis (GEPIA, http://gepia.cancer-pku.cn/index.html), the NOLC1 mRNA level was above 100 nTPM in most gastric cancer cell lines, whereas it was below 100 nTPM in most non-cancerous cell lines, indicating that NOLC1 is upregulated in gastric cancer.

      Author response image 1.

      The mRNA level of NOLC1 in different GC cells and non-cancerous cells.

      (3) The authors primarily use MGC-803 cells for experiments; however, MGC-803 is known to be a HeLa-contaminated cell line. Could the authors explain this choice of using this cell line only? Did they validate key findings with additional cell lines? This is particularly important for assays such as cisplatin resistance validation, in vivo experiments, TEM imaging, and MitoPeDPP fluorescence imaging.

      Thanks for raising this good question. We did not use MGC-803 cells only: the key in vitro findings were also validated in MKN-45 cells (Fig. 2), and the in vivo experiments were also validated in a Mouse Forestomach Carcinoma (MFC) tumor-bearing 615 mouse model (Fig 7). Furthermore, we added further experiments in MKN-45 cells. TEM imaging showed that NOLC1 could significantly inhibit cisplatin (Cis)-induced lipid membrane damage in MKN-45 cells (Fig. S6A). Moreover, the MitoPeDPP fluorescence assay analyzed by FCAs also indicated that ROS was rapidly enriched in mitochondria in MKN-45 cells (Fig. 4E, Fig. S6J).

      (4) In Figure 2, did the authors perform assays with NOLC1 overexpression? If so, please include these results to strengthen the conclusions.

      Thanks very much for the kind comments and great suggestions. As suggested, we added new data from a NOLC1 overexpression assay. The Cell Counting Kit-8 assay shows that the NOLC1-overexpression group is more resistant to Cis than the vector group (Fig. S4E, S5A).

      (5) The authors show in Figures 2A-B that shNOLC1 without cisplatin treatment does not affect cell viability. However, Figures 2D-E suggest increased apoptosis in shNOLC1 cells without cisplatin treatment. Additionally, in vivo studies in Figure 3 show no significant difference between the shNC+PBS and shNOLC1+PBS groups, which appears contradictory to the apoptosis assays. Similarly, Ki67 staining shows decreased scores in the shNOLC1 group compared to shNC. Could the authors clarify this inconsistency?

      Thanks for raising this good question. In Fig 2D-E, the differences in the proportion of dead cells between the shNOLC1 and shNC groups treated with PBS were only 3% (MGC-803) and 7% (MKN-45), which are much lower than those observed with cisplatin treatment in vitro. Moreover, in vivo analysis indicated that the average tumor volume in the NOLC1+PBS group was smaller than that in the NC group, but the difference was not statistically significant (p value = 0.3962). In addition, tumor proliferation is a complex process regulated by many factors [1,2]; thus, the level of Ki67 is by no means the same as the rate of tumor proliferation, although the two might be positively correlated.

      (6) In Figure 4, NOLC1 knockdown appears to enhance cisplatin-induced ferroptosis rather than apoptosis. Given p53's role in apoptosis, did the authors compare the effects of NOLC1 on cisplatin-induced apoptosis vs. ferroptosis? If so, please clarify whether NOLC1 predominantly regulates apoptosis or ferroptosis.

      Thanks for raising this good question. We did compare the effects of NOLC1 on cisplatin-induced apoptosis vs. ferroptosis. As shown in Fig. 5A, NOLC1 knockdown obviously increased the protein level of BCL-2, which is an anti-apoptotic protein mediated by p53 via protein interaction in the cytoplasm [3,4]; this phenomenon may be caused by the increased level of p53 in the cytoplasm (Fig. 6I). Also, TEM imaging showed the classic morphological changes of ferroptosis rather than apoptosis (Fig. 5A, S6A). Taken together, NOLC1 mainly regulates p53-mediated ferroptosis rather than apoptosis.

      (7) Did the authors perform co-IP assays with p53 or HA antibodies to immunocapture NOLC1? If not, please add this experiment to support protein interactions. The mechanistic correlation between p53 and NOLC1 can be supported by adding experiments using multiple GC cell lines with various p53 alterations (such as loss-of- function or gain-of-function mutations/deletions). This is critical because the authors specifically claimed that NOLC1 can inhibit p53-mediated ferroptosis, but not other tumor suppressors.

      Thanks very much for the kind comments and great suggestions. As suggested, we performed a Co-IP assay with anti-HA antibodies to immunocapture NOLC1-FLAG. As shown in Fig. 5K, the p53 DNA binding domain (DBD)-HA co-immunoprecipitated with NOLC1, further indicating that NOLC1 binds to the p53 DBD. Moreover, we concur with the reviewer on the value of adding experiments using multiple p53 alterations; however, different p53 mutants have completely different functional changes. Therefore, we used siRNA to knock down p53 in MGC-803 cells, and the results showed that NOLC1-mediated resistance disappeared and the GPX4 level increased (Fig. S10). These data show that NOLC1 promotes GC resistance by modulating p53 function.

      (8) In Figure S5B, the LDH release can be blocked by Fer-1?

      Thanks for raising this good question. As suggested, we tested this, and Fer-1 (20 μmol/mL) significantly blocked LDH release in the NOLC1 knockdown group (Fig. S6E). These data further confirm that NOLC1 suppresses Cis-induced ferroptosis.

      (9) How about the ubiquitination assay in MGC-803 cells?

      Thanks for raising this good question. As suggested, we also performed the ubiquitination assay in MGC-803 cells. The results showed that NOLC1 also increased the level of p53 ubiquitination (Fig. 6H).

      (10) In Figure 6H, the DBD domain of NOLC1 is required for inhibiting P53 ubiquitination.

      Thanks for your opinion. However, in our paper we only mentioned the p53 DBD, not a NOLC1 DBD. In addition, we did not find any reported DNA-binding function of NOLC1 in the PubMed database. We would therefore like to ask whether this comment is stated as intended.

      (11) In Figure 8B, the CD3 antibody is not specific, please change it to a new one.

      Thanks very much for the kind comments and great suggestions. As suggested, we used a new CD3 antibody, and the new data were added to Fig. 8B.

      (12) The authors report that NOLC1 influences peripheral blood lymphocytes with cisplatin treatment, with or without PD-1. Could the authors explain why NOLC1 would affect peripheral blood lymphocytes? Additionally, did they assess immune cell infiltration in the tumor microenvironment (TME) by flow cytometry?

      Thanks for raising this good question. The tumors in the knockdown group treated with Cis + PD-1 were too small (less than 100 mg) to extract enough infiltrating immune cells (fewer than 10,000 CD45<sup>+</sup> cells), so we chose to analyze immune cells in the blood of the mice. Infiltrating immune cells, including CTLs, originate from the peripheral blood via the circulation. Under normal conditions, several aspects of tumor biology shape the TME to limit immune responses and present barriers to cancer therapy. For example, tumors can express or secrete many negative regulators, such as PD-L1, so that immune cells cannot recognize tumor cells or infiltrate the tumor tissue. Ferroptosis, as a new form of immunogenic cell death (ICD), can damage the tumor cell plasma membrane and release large amounts of tumor-associated and tumor-specific antigens, leading to immune cell priming and activation. Eventually, the activated immune cells in the peripheral blood travel towards the tumor site, infiltrating the tumor tissue under favorable co-stimulatory conditions and guided by chemokine gradients. Once within the tumor microenvironment, these activated T cells can control tumor growth through direct tumor cell destruction and cytokine-mediated processes [5–8].

      To assess immune cell infiltration in the TME, we analyzed tumor-infiltrating CD3<sup>+</sup> and CD8<sup>+</sup> immune cells in tumor tissue by immunofluorescence (Fig. 8B). These results suggest that the peripheral blood lymphocytes reflect the infiltration of immune cells in the tumor.

      Minor concerns:

      (1) Please clarify the statistical methods in each figure legend.

      Thanks for your opinion. We have added statistical methods in each figure legend.

      (2) In Figure 2D, please provide statistical data of cleaved-caspase3 expression.

      Thanks for your opinion. The relative cleaved-caspase3 levels are provided in Fig. S5B-C.

      (3) Please ensure that the canonical expressions used in the research paper are adhered to.

      Thanks for your opinion. We have carefully modified our expressions in our paper.

      (4) Please pay more attention to the grammar and formatting of texts.

      Thanks for your opinion. We revised our manuscript through the American Journal Experts (AJE) service.

      Reviewer #2:

      Summary:

      Shengsheng Zhao et al. investigated the role of nucleolar and coiled-body phosphoprotein 1 (NOLC1) in regulating gastric cancer (GC) development and cisplatin-induced drug resistance in GC. They found a significant correlation between high NOLC1 expression and the poor prognosis of GC. Meanwhile, upregulation of NOLC1 was associated with cis-resistant GC. Experimentally, the authors demonstrate that knocking down NOLC1 increased GC sensitivity to Cis, possibly by regulating ferroptosis. Mechanistically, they found NOLC1 suppressed ferroptosis by blocking the translocation of p53 from the cytoplasm to the nucleus and promoting its degradation. In addition, the authors evaluated the effect of combinational treatment of anti-PD-1 and cisplatin in NOLC1-knockdown tumor cells, revealing a potential role of NOLC1 in the targeted therapy for GC.

      Strengths:

      Chemoresistance is considered a major reason causing failure of tumor treatment and death of cancer patients. This paper explored the role of NOLC1 in the regulation of Cis-mediated resistance, which involves a regulated cell death named ferroptosis. These findings provide more evidence highlighting the study of regulated cell death to overcome drug resistance in cancer treatment, which could give us more potential strategies or targets for combating cancer.

      Weaknesses:

      More evidence supporting the regulation of ferroptosis induced by Cisplatin by NOLC1 should be added. Particularly, the role of ferroptosis in the cisplatin-resistance should be verified and whether NOLC1 regulates ferroptosis induced by additional FINs should be explored. Besides, the experiments to verify the regulation of ferroptosis sensitivity by NOLC1 are sort of superficial. The role of MDM2/p53 in ferroptosis or cisplatin resistance mediated by NOLC1 should be further studied by genetic manipulation of p53, which is the key evidence to confirm its contribution to NOLC1 regulation of GC and relative cell death.

      Major points:

      (1) More evidence supporting the regulation of ferroptosis induced by Cisplatin by NOLC1 should be added. Particularly, the role of ferroptosis in the cisplatin-resistance should be verified and whether NOLC1 regulates ferroptosis induced by additional FINs should be explored.

      Thanks very much for the kind comments and great suggestions. As suggested, we further analyzed the ferroptosis-inhibiting ability of NOLC1 in MGC-45 cells treated with erastin, a commonly used ferroptosis activator. As shown in Fig. S6B, the ferroptosis activated by erastin was also blocked by NOLC1.

      (2) In Figure 1J, the CR cell line should obviously have less expression of the apoptosis marker c-PARP, which means these cells are resistant to apoptosis induced by CR. Thus, it would be more rational to study the role of apoptosis regulation by NOLC1. Why did the later data shift to the study of ferroptosis?

      Thanks for raising this good question. In the CR cells, the expression levels of many genes were changed, so it is uncertain whether the decreased level of cleaved PARP in the resistant cells is caused by NOLC1 upregulation. To explore the specific mechanism of NOLC1-mediated resistance, we performed TEM imaging (Fig. 4A, S6A), and the results showed that the cells exhibited classic ferroptotic morphological changes. Moreover, BCL-2 (an anti-apoptotic protein regulated by p53 through protein interaction in the cytoplasm) was increased after NOLC1 knockdown (Fig. S5A). This may be caused by the increased p53 level in the cytoplasm [3,4] (Fig. 5I). Taken together, these findings led us to focus on cisplatin-induced ferroptosis.

      (3) Besides, how about the regulation of apoptosis during cis-resistance by NOLC1 in GC?

      Thanks for raising this good question. As mentioned above, Cis-induced apoptosis was not as pronounced as ferroptosis. This is due to the increase in BCL-2 (a key anti-apoptotic protein), which is mediated by p53 through protein interaction in the cytoplasm: NOLC1 increased the cytoplasmic p53 level, which subsequently increased the BCL-2 level.

      (4) The experiments to verify the regulation of ferroptosis sensitivity by NOLC1 are sort of superficial. The role of MDM2/p53 in ferroptosis or cisplatin resistance mediated by NOLC1 should be further studied by genetic manipulation of p53, which is the key evidence to confirm its contribution to NOLC1 regulation of GC and relative cell death.

      Thanks for raising this good question. As shown in Fig. S10, after p53 was knocked down with siRNA, NOLC1 could no longer promote Cis resistance and the GPX4 level increased, indicating that NOLC1 promotes Cis resistance by modulating p53 function.

      (5) In Figure 2, the data indicated that the knockdown of NOLC1 increased γH2AX in the presence of Cisplatin, which indicated that NOLC1 might regulate DNA damage-related cellular function. These functions should be more relevant to cisplatin resistance, considering the fundamental effect of this chemo drug.

      Thanks very much for the kind comments and great suggestions. Indeed, we found that DNA damage was more pronounced in the knockdown groups, but ferroptotic changes such as ROS and mitochondrial membrane damage were also significantly different in the knockdown groups. As a chemotherapeutic drug, cisplatin not only damages DNA but also acts as a stressor that can activate various signaling pathways, including apoptosis, ferroptosis, pyroptosis, and necroptosis, depending on drug concentration and exposure time [9–11]. Therefore, it is important to identify the pathway that NOLC1 predominantly blocks in GC.

      (6) In Figure 4, ferroptosis inhibitors like Fer-1 or DFO should be used to verify the regulation of ferroptosis by Cisplatin and NOLC1.

      Thanks very much for the kind comments and great suggestions. As suggested, we performed an additional LDH release assay. The results showed that Fer-1 also blocked cisplatin-induced LDH release in the NOLC1 knockdown groups (Fig. S6E).

      (7) In Figure 4H, Cisplatin decreased FSP1 and GPX4, which could be enhanced in the NOLC1-knockdown cell line. Meanwhile, the knockdown of NOLC1 increased the ACSL4 level. These findings could be the key reason for the regulation of ferroptosis by NOLC1 rather than p53 since they all are direct regulators of ferroptosis.

      Thanks very much for the kind comments and great suggestions. We rewrote the text as you suggested. It has also recently been reported that ACSL4-regulated ferroptosis is related to p53, but the exact mechanism is still unclear [12]. Further studies of the specific relationship between NOLC1 and FSP1/ACSL4 will be conducted in the future.

      (8) Whether p53 mediates the regulation of ferroptosis and cisplatin resistance by NOLC1 should be thoroughly studied using p53-KO cell lines.

      Thanks very much for the kind comments and great suggestions. As previously mentioned, when siRNA was used to knock down p53, NOLC1-mediated Cis resistance was blocked (Fig. S10). Meanwhile, the GPX4 level was also increased in the p53/NOLC1 double-knockdown groups compared to the NOLC1 knockdown group. These data indicate that NOLC1 suppresses ferroptosis by modulating p53 function.

      Reviewer #3:

      The authors have put forth a compelling argument that NOLC1 is indispensable for gastric cancer resistance in both in vivo and in vitro models. They have further elucidated that NOLC1 silencing augments cisplatin-induced ferroptosis in gastric cancer cells. The mechanistic underpinning of their findings suggests that NOLC1 modulates the p53 nuclear/plasma ratio by engaging with the p53 DNA Binding Domain, which in turn impedes p53-mediated transcriptional regulation of ferroptosis. Additionally, the authors have shown that NOLC1 knockdown triggers the release of ferroptosis-induced damage-associated molecular patterns (DAMPs), which activate the tumor microenvironment (TME) and enhance the efficacy of the anti-PD-1 and cisplatin combination therapy.

      Strengths:

      The manuscript presents a robust dataset that substantiates the authors' conclusion. They have identified NOLC1 as a potential oncogene that confers resistance to immuno-chemotherapy in gastric cancer through the mediation of ferroptosis and subsequent TME reprogramming. This discovery positions NOLC1 as a promising therapeutic target for gastric cancer treatment. The authors have delineated a novel mechanistic pathway whereby NOLC1 suppresses p53 transcriptional functions by reducing its nuclear/plasma ratio, underscoring the significance of p53 nuclear levels in tumor suppression over total protein levels.

      Weaknesses:

      While the overall findings are commendable, there are specific areas that could benefit from further refinement. The authors have posited that NOLC1 suppresses p53-mediated ferroptosis; however, the mRNA levels of ferroptosis genes regulated by p53 have not been quantified, which is a critical gap in the current study. In Figure 4A, transmission electron microscopy (TEM) results are reported solely for the MGC-803 cell line. It would be beneficial to include TEM data for the MKN-45 cell line to strengthen the findings. The authors have proposed a link between NOLC1-mediated reduction in the p53 nuclear/plasma ratio and gastric cancer resistance, yet the correlation between this ratio and patient prognosis remains unexplored, which is a significant limitation in the context of clinical relevance.

      Thanks very much for the kind comments and great suggestions. Recent studies have reported that CDKN1A (also called p21, a protein transcriptionally regulated by p53) can promote ferroptosis [13]; as suggested, the mRNA levels of ferroptosis genes regulated by p53 were quantified (Fig. S8G-H). Moreover, we performed TEM imaging in MKN-45 cells, and the results were consistent with those in MGC-803 cells, suggesting that NOLC1 broadly promotes drug resistance in gastric cancer. Studies have also reported that the p53 transcriptionally active and p53 transcriptionally inactive types include patients with intermediate prognosis and recurrence rates, with the p53-active group showing a better prognosis [14]. Because p53 transcriptional activity depends on p53 nuclear accumulation, we assume that a low p53 nuclear/plasma ratio may lead to poor prognosis in gastric cancer. We will collect sufficient samples and their prognostic information to analyze the relationship between the NOLC1-mediated reduction in the p53 nuclear/plasma ratio and gastric cancer resistance.

      References

      (1) Z. Seferbekova, A. Lomakin, L.R. Yates, M. Gerstung, Spatial biology of cancer evolution, Nat Rev Genet 24 (2023) 295–313. https://doi.org/10.1038/s41576-022-00553-x.

      (2) T. Matsuoka, M. Yashiro, Molecular Mechanism for Malignant Progression of Gastric Cancer Within the Tumor Microenvironment, IJMS 25 (2024) 11735. https://doi.org/10.3390/ijms252111735.

      (3) Y. Liu, Z. Su, O. Tavana, W. Gu, Understanding the complexity of p53 in a new era of tumor suppression, Cancer Cell (2024) S1535610824001338. https://doi.org/10.1016/j.ccell.2024.04.009.

      (4) R. Pan, V. Ruvolo, H. Mu, J.D. Leverson, G. Nichols, J.C. Reed, M. Konopleva, M. Andreeff, Synthetic Lethality of Combined Bcl-2 Inhibition and p53 Activation in AML: Mechanisms and Superior Antileukemic Efficacy, Cancer Cell 32 (2017) 748-760.e6. https://doi.org/10.1016/j.ccell.2017.11.003.

      (5) E. Catanzaro, M. Beltrán-Visiedo, L. Galluzzi, D.V. Krysko, Immunogenicity of cell death and cancer immunotherapy with immune checkpoint inhibitors, Cell Mol Immunol 22 (2024) 24–39. https://doi.org/10.1038/s41423-024-01245-8.

      (6) G. Lei, L. Zhuang, B. Gan, The roles of ferroptosis in cancer: Tumor suppression, tumor microenvironment, and therapeutic interventions, Cancer Cell 42 (2024) 513–534. https://doi.org/10.1016/j.ccell.2024.03.011.

      (7) E. Catanzaro, R. Demuynck, F. Naessens, L. Galluzzi, D.V. Krysko, Immunogenicity of ferroptosis in cancer: a matter of context?, Trends in Cancer 10 (2024) 407–416. https://doi.org/10.1016/j.trecan.2024.01.013.

      (8) X. Jiang, B.R. Stockwell, M. Conrad, Ferroptosis: mechanisms, biology and role in disease, Nat Rev Mol Cell Biol 22 (2021) 266–282. https://doi.org/10.1038/s41580-020-00324-8.

      (9) J.-L. Roh, E.H. Kim, H. Jang, D. Shin, Nrf2 inhibition reverses the resistance of cisplatin-resistant head and neck cancer cells to artesunate-induced ferroptosis, Redox Biology 11 (2017) 254–262. https://doi.org/10.1016/j.redox.2016.12.010.

      (10) X. Wang, Y. Zhou, D. Wang, Y. Wang, Z. Zhou, X. Ma, X. Liu, Y. Dong, Cisplatin-induced ototoxicity: From signaling network to therapeutic targets, Biomedicine & Pharmacotherapy 157 (2023) 114045. https://doi.org/10.1016/j.biopha.2022.114045.

      (11) J. Liang, G. Bi, Y. Huang, G. Zhao, Q. Sui, H. Zhang, Y. Bian, J. Yin, Q. Wang, Z. Chen, C. Zhan, MAFF confers vulnerability to cisplatin-based and ionizing radiation treatments by modulating ferroptosis and cell cycle progression in lung adenocarcinoma, Drug Resistance Updates 73 (2024) 101057. https://doi.org/10.1016/j.drup.2024.101057.

      (12) M.Y. Kosim, T. Fukazawa, M. Miyauchi, N. Hirohashi, K. Tanimoto, p53 status modifies cytotoxic activity of lactoferrin under hypoxic conditions, Front. Pharmacol. 13 (2022) 988335. https://doi.org/10.3389/fphar.2022.988335.

      (13) Q. Gao, J. Chen, C. Li, J. Zhan, X. Yin, B. Li, H. Dong, L. Luo, Z. Li, CDKN1A promotes Cis-induced AKI by inducing cytoplasmic ROS production and ferroptosis, Food and Chemical Toxicology 193 (2024) 115003. https://doi.org/10.1016/j.fct.2024.115003.

      (14) R. Cristescu, Molecular analysis of gastric cancer identifies subtypes associated with distinct clinical outcomes, Nature Medicine (2015).

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      This study aimed at replicating two previous findings that showed (1) a link between prediction tendencies and neural speech tracking, and (2) that eye movements track speech. The main findings were replicated which supports the robustness of these results. The authors also investigated interactions between prediction tendencies and ocular speech tracking, but the data did not reveal clear relationships. The authors propose a framework that integrates the findings of the study and proposes how eye movements and prediction tendencies shape perception.

      Strengths:

      This is a well-written paper that addresses interesting research questions, bringing together two subfields that are usually studied in separation: auditory speech and eye movements. The authors aimed at replicating findings from two of their previous studies, which was overall successful and speaks for the robustness of the findings. The overall approach is convincing, methods and analyses appear to be thorough, and results are compelling.

      Weaknesses:

      Linking the new to the previous studies could have been done in more detail, and the extent to which results were replicated could have been discussed more thoroughly.

      Eye movement behavior could have been presented in more detail and the authors could have attempted to understand whether there is a particular component in eye movement behavior (e.g., microsaccades) that drives the observed effects.

      We would like to thank you for your time and effort in reviewing our work and we appreciate the positive comments!

      We extended our manuscript, which now provides intermediate results on individual prediction tendency that can be compared to our results from Schubert et al. (2023).

      Furthermore, we expanded our discussion now detailing the extent to which our results (do not) replicate the previous findings (e.g. differences in horizontal vs. vertical ocular speech tracking, lack of distractor tracking, link between ocular speech tracking and behavioral outcomes).

      While we agree with the reviewer that it is an important and most interesting question, to what extent individual features of gaze behavior (such as microsaccades, blinks etc.) contribute to the ocular speech tracking effect, it is beyond the scope of the current manuscript. It will be methodologically and conceptually challenging to distinguish these features from one another and to relate them to diverse cognitive processes. We believe that a separate manuscript is needed to give these difficult questions sufficient space for new methodological approaches and control analyses. The primary goal of this manuscript was to replicate the findings of Gehmacher et al. (2024) using similar methods and to relate them to prediction tendencies, attention, and neural speech tracking. 

      Reviewer #2 (Public review):

      Summary

      Schubert et al. recorded MEG and eye-tracking activity while participants were listening to stories in single-speaker or multi-speaker speech. In a separate task, MEG was recorded while the same participants were listening to four types of pure tones in either structured (75% predictable) or random (25%) sequences. The MEG data from this task was used to quantify individual 'prediction tendency': the amount by which the neural signal is modulated by whether or not a repeated tone was (un)predictable, given the context. In a replication of earlier work, this prediction tendency was found to correlate with 'neural speech tracking' during the main task. Neural speech tracking is quantified as the multivariate relationship between MEG activity and speech amplitude envelope. Prediction tendency did not correlate with 'ocular speech tracking' during the main task. Neural speech tracking was further modulated by local semantic violations in the speech material, and by whether or not a distracting speaker was present. The authors suggest that part of the neural speech tracking is mediated by ocular speech tracking. Story comprehension was negatively related to ocular speech tracking.

      Strengths

      This is an ambitious study, and the authors' attempt to integrate the many reported findings related to prediction and attention in one framework is laudable. The data acquisition and analyses appear to be done with great attention to methodological detail (perhaps even with too much focus on detail; see below). Furthermore, the experimental paradigm used is more naturalistic than was previously done in similar setups (i.e. stories instead of sentences).

      Weaknesses

      For many of the key variables and analysis choices (e.g. neural/ocular speech tracking, prediction tendency, mediation) it is not directly clear how these relate to the theoretical entities under study, and why they were quantified in this particular way. Relatedly, while the analysis pipeline is outlined in much detail, an overarching rationale and important intermediate results are often missing, which makes it difficult to judge the strength of the evidence presented. Furthermore, some analysis choices appear rather ad-hoc and should be made uniform and/or better motivated.

      We would like to thank you very much for supporting our paper and your thoughtful feedback!

      To address your concern that our theoretical entities and some of our analytical choices lack transparency, we expanded our manuscript in several ways:

      (1) We now provide the intermediate results of our prediction tendency analysis (see new Figure 2 of our manuscript). These results are comparable to our findings from Schubert et al. (2023), demonstrating that on a group level there is a tendency to pre-activate auditory stimuli of high probability and illustrating the distribution of this tendency value in our subject population.

      (2) We expanded our methods section in order to explain our analytical choices (e.g. why this particular entropy modulation paradigm was used to measure individual prediction tendency).

      (3) We now provide an operationalisation of the terms “neural speech tracking” and “ocular speech tracking” at their first mention, to make these metrics more transparent to the reader.

      (4) We now summarize important methodological information ahead of each results section, in order to give the reader a comprehensible background without having to read through the detailed methods section.

      (5) We expanded our discussion section, with a special emphasis on relating the key variables of the current investigation to theoretical entities.

      Reviewer #3 (Public review):

      Summary:

      In this paper, the authors measured neural activity (using MEG) and eye gaze while individuals listened to speech from either one or two speakers, which sometimes contained semantic incongruencies.

      The stated aim is to replicate two previous findings by this group: (1) that there is "ocular speech tracking" (that eye-movements track the audio of the speech), (2) that individual differences in neural response to tones that are predictable vs. not-predictable in their pitch is linked to neural response to speech. In addition, here they try to link the above two effects to each other, and to link "attention, prediction, and active sensing".

      Strengths:

      This is an ambitious project, that tackles an important issue and combines different sources of data (neural data, eye-movements, individual differences in another task) in order to obtain a comprehensive "model" of the involvement of eye-movements in sensory processing.

      The authors use many adequate methods and sophisticated data-analysis tools (including MEG source analysis and multivariate statistical models) in order to achieve this.

      Weaknesses:

      Although I sympathize with the goal of the paper and agree that this is an interesting and important theoretical avenue to pursue, I am unfortunately not convinced by the results and find that many of the claims are very weakly substantiated in the actual data.

      Since most of the analyses presented here are derivations of statistical models and very little actual data is presented, I found it very difficult to assess the reliability and validity of the results, as they currently stand. I would be happy to see a thoroughly revised version, where much more of the data is presented, as well as control analyses and rigorous and well-documented statistical testing (including addressing multiple comparisons).

      We thank you for your thoughtful feedback. We appreciate your concerns and will address them below in greater detail.

      These are the main points of concern that I have regarding the paper, in its current format.

      (1) Prediction tendencies - assessed by listening to sequences of rhythmic tones, where the pitch was either "predictable" (i.e., followed a fixed pattern, with 25% repetition) or "unpredictable" (no particular order to the sounds). This is a very specific type of prediction, which is a general term that can operate along many different dimensions. Why was this specific design selected? Is there theoretical reason to believe that this type of prediction is also relevant to "semantic" predictions or other predictive aspects of speech processing?

      Theoretical assumptions and limitations of our quantification of individual prediction tendency are now briefly summarized in the first paragraph of our discussion section. With this paradigm we focus on anticipatory “top-down” predictions, whilst controlling for possibly confounding “bottom-up” processes. Since this study aimed to replicate our previous work, we chose the same entropy-modulation paradigm as in other studies from our group (e.g. Demarchi et al. 2019; Schubert et al. 2023, 2024; Reisinger et al. 2024), which has been shown to yield reproducible findings of feature-specific preactivations of sounds in a context of low entropy. One advantage of this design is that it gives us the opportunity to directly compare the processing of “predictable” and “unpredictable” sounds of the same frequency in a time-resolved manner (this argument is now also included in the Methods section).

      Regarding the question to what extent this type of prediction might also be relevant to “semantic” predictions we would like to refer to our previous study (Schubert et al., 2023), where we explicitly looked at the interaction between individual prediction tendency and encoding of semantic violations in the cortex. (In short, there we found a spatially dissociable interaction effect, indicating an increased encoding of semantic violations that scales with prediction tendency in the left hemisphere, as well as a disrupted encoding of semantic violations for individuals with stronger prediction tendency in the right hemisphere.) We did not aim to replicate all our findings in the current study, but instead we focused on merging the most important results from two independent phenomena in the domain of speech processing and bringing them into a common framework. However, as now stated in our discussion, we believe that “predictions are directly linked to the interpretation of sensory information. This interpretation is likely to occur at different levels along the cognitive (and anatomical) hierarchy…” and that “this type of prediction is relevant for acoustic processing such as speech and music, whose predictability unfolds over time.”

      (2) On the same point - I was disappointed that the results of "prediction tendencies" were not reported in full, but only used later on to assess correlations with other metrics. Even though this is a "replication" of previous work, one would like to fully understand the results from this independent study. On that note, I would also appreciate a more detailed explanation of the method used to derive the "prediction tendency" metric (e.g, what portion of the MEG signal is used? Why use a pre-stimulus and not a post-stimulus time window? How is the response affected by the 3Hz steady-state response that it is riding on? How are signals integrated across channels? Can we get a sense of what this "tendency" looks like in the actual neural signal, rather than just a single number derived per participant (an illustration is provided in Figure 1, but it would be nice to see the actual data)? How is this measure verified statistically? What is its distribution across the sample? Ideally, we would want enough information for others to be able to replicate this finding).

      We now included a new figure (similar to Schubert et al. 2023) showing the interim results of the “prediction tendency” effect as well as individual prediction tendency values of all subjects.

      Furthermore, we expanded the description of the “prediction tendency” metric in the Methods section, where we explain our analytical choices in more detail. In particular, we used a pre-stimulus time window in order to capture “anticipatory predictions”. The temporally predictable design gives us the opportunity to capture this type of prediction. The integration across channels is handled by the multivariate pattern analysis (MVPA), which inherently integrates multidimensional data (as mentioned in the Methods section, we used data from 102 magnetometers) and links it to (in this case) categorical information.

      (3) Semantic violations - half the nouns ending sentences were replaced to create incongruent endings. Can you provide more detail about this - e.g., how were the words selected? How were the recordings matched (e.g., could they be detected due to audio editing?)? What are the "lexically identical controls that are mentioned"? Also, is there any behavioral data to know how this affected listeners? Having so many incongruent sentences might be annoying/change the nature of listening. Were they told in advance about these?

      We expanded the Methods section and included the missing information: 

      “We randomly selected half of the nouns that ended a sentence (N = 79) and replaced them with the other half to induce unexpected semantic violations. The swap of nouns happened in the written script before the audio material was recorded in order to avoid any effects of audio clipping. Narrators were aware of the semantic violations and had been instructed to read out the words as normal. Consequently all target words occurred twice in the text, once in a natural context (serving as lexical controls) and once in a mismatched context (serving as semantic violations) within each trial, resulting in two sets of lexically identical words that differed greatly in their contextual probabilities (see Figure 1F for an example). Participants were unaware of these semantic violations.” Since we only replaced 79 words with semantic violations in a total of ~24 minutes of audio material, we believe that natural listening was not impaired. In fact, none of the participants reported having noticed the semantic violations during debriefing (even though they had an effect on speech tracking in the brain).
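      The swap described in the quoted passage can be illustrated with a minimal sketch. It is not the script used in the study: the function name `swap_final_nouns`, the fixed random seed, and the assumption of an even number of distinct sentence-final nouns are all hypothetical choices made for illustration. Half of the sentence-final positions are drawn at random and filled with nouns taken from the untouched half, so every reused noun occurs once in its natural context (control) and once in a mismatched context (violation).

```python
import random


def swap_final_nouns(final_nouns, seed=0):
    """Replace a random half of the sentence-final nouns with the other half.

    Hypothetical helper for illustration only. Returns the new list of
    sentence-final nouns and a parallel list of labels, "control" for
    untouched positions and "violation" for replaced positions.
    """
    if len(final_nouns) % 2:
        raise ValueError("this sketch assumes an even number of nouns")

    rng = random.Random(seed)
    positions = list(range(len(final_nouns)))
    rng.shuffle(positions)

    half = len(positions) // 2
    violation_positions = positions[:half]  # nouns here get replaced
    control_positions = positions[half:]    # nouns here stay in place

    new_nouns = list(final_nouns)
    labels = ["control"] * len(final_nouns)
    for target, source in zip(violation_positions, control_positions):
        # The control noun is copied into a foreign sentence ending,
        # so it now appears twice: once natural, once mismatched.
        new_nouns[target] = final_nouns[source]
        labels[target] = "violation"
    return new_nouns, labels
```

      Applying the swap to the written script, as the authors describe, rather than to the recorded audio means that no splicing is needed and the two occurrences of each target word are spoken naturally.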

      (4) TRF in multi-speaker condition: was a univariate or multivariate model used? Since the single-speaker condition only contains one speech stimulus - can we know if univariate and multivariate models are directly comparable (in terms of variance explained)? Was any comparison to permutations done for this analysis to assess noise/chance levels?

      For mTRF models it depends on the direction (“encoding” vs. “decoding”) whether or not the model is comparable to a univariate model. In our case of an encoding model the TRFs are fitted to each MEG channel independently. This gives us the possibility to explore the effect over different areas (whereas a multivariate “decoding” model would result in only one speech reconstruction value).

      In both conditions (single and multi speaker) a single input feature (the envelope of the attended speech stream) was used. Of course, it would be possible to fit a multivariate encoding model that predicts the brain’s response to the total input of sounds. This would, however, target a slightly different question than ours, as we aimed to investigate how much of the attended speech is tracked.

      Regarding your suggestion of a comparison to permutations to assess noise levels, we would like to point out that we chose the same methodological approach as in our previous studies, which we aimed to replicate here. Indeed, in these original studies no permuted versions were used (with the exception of the mediation analysis, where comparing a model with an additional input predictor to a single-predictor model would not result in a fair comparison). We conducted the mTRF approach following the guidelines of Crosse et al. (2016) to the best of our knowledge and in accordance with similar studies in this field.

      Crosse, M. J., Di Liberto, G. M., Bednar, A., & Lalor, E. C. (2016). The multivariate temporal response function (mTRF) toolbox: a MATLAB toolbox for relating neural signals to continuous stimuli. Frontiers in human neuroscience, 10, 604.

      (5) TRF analysis at the word level: from my experience, 2-second segments are insufficient for deriving meaningful TRFs (see for example the recent work by Mesik & Wojtczak). Can you please give further details about how the analysis of the response to semantic violations was conducted? What was the model trained on (the full speech or just the 2-second long segments?) Is there a particular advantage to TRFs here, relative - say - to ERPs (one would expect a relatively nice N400 response, not)? In general, it would be nice to see the TRF results on their own (and not just the modulation effects).

      We fully agree with the reviewer’s statement that 2-second segments would have been too short to derive meaningful TRFs. To investigate the effect of semantic violations, we used the same TRFs trained on the whole dataset (with 4-fold cross validation). The resulting true as well as predicted data were segmented into single-word epochs of 2 seconds. We selected semantic violations as well as their lexically identical controls and correlated true with predicted responses for every word. Thus, we conducted the same analysis as for the overall encoding effect, focusing on only part of the data. We have reformulated the Methods section accordingly to clear up this misunderstanding. Since the TRFs are identical to the standard TRFs from the overall neural speech tracking, they are not informative about the semantic violation effect. However, since the mTRF approach is the key method throughout the manuscript (and our main focus is not on the investigation of brain responses to semantic violations), we have favoured this approach over the classical ERF analysis.
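
      As a minimal sketch of the word-level step described above, the following Python snippet cuts true and predicted signals into 2-second epochs and correlates them per word. It is an illustration under stated assumptions, not our analysis code: the function name `epoch_correlations`, the sampling rate, the onset-locked epoch definition, and the simulated data are all hypothetical.

```python
import numpy as np

def epoch_correlations(true, pred, onsets, sfreq, epoch_sec=2.0):
    """Pearson r between true and predicted signal for each word epoch.

    Assumption: each epoch starts at the word onset (given in samples).
    """
    n = int(round(epoch_sec * sfreq))
    rs = []
    for onset in onsets:
        if onset + n > len(true):
            continue  # skip epochs that would run past the recording
        seg_true = true[onset:onset + n]
        seg_pred = pred[onset:onset + n]
        rs.append(np.corrcoef(seg_true, seg_pred)[0, 1])
    return np.array(rs)

# Simulated single-channel data (hypothetical): the "prediction" is a noisy
# copy of the "true" response, standing in for the output of a TRF model.
rng = np.random.default_rng(0)
sfreq = 100  # Hz, assumed
true = rng.standard_normal(60 * sfreq)
pred = true + 0.5 * rng.standard_normal(true.size)

violation_onsets = np.array([500, 1500, 2500])  # samples, hypothetical
control_onsets = np.array([1000, 2000, 3000])   # samples, hypothetical

r_violation = epoch_correlations(true, pred, violation_onsets, sfreq)
r_control = epoch_correlations(true, pred, control_onsets, sfreq)
```

      The two arrays of per-word correlations can then be compared between semantic violations and their lexically identical controls.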

      (6) Another related point that I did not quite understand - is the dependent measure used for the regression model "neural speech envelope tracking" the r-value derived just from the 2sec-long epochs? Or from the entire speech stimulus? The text mentions the "effect of neural speech tracking" - but it's not clear if this refers to the single-speaker vs. two-speaker conditions or to the prediction manipulation. Or is it different in the different analyses? Please spell out exactly what metric was used in each analysis.

      As suggested we now provide a clear definition of each dependent metric for each analysis.

      “Neural speech tracking” refers to the correlation coefficients between predicted and true brain responses from the aforementioned encoding model, trained and tested on the whole audio material within condition (single vs. multi-speaker).
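
      To make this definition concrete, here is a minimal sketch of a cross-validated encoding model in Python. It uses plain ridge regression on time-lagged copies of the envelope rather than the mTRF toolbox itself, and the lag range, regularization value, contiguous 4-fold split, and simulated channels are assumptions chosen for illustration only.

```python
import numpy as np

def lagged_design(stimulus, n_lags):
    """Stack time-lagged copies of a 1-D stimulus (lags 0..n_lags-1 samples)."""
    n = len(stimulus)
    X = np.zeros((n, n_lags))
    for lag in range(n_lags):
        X[lag:, lag] = stimulus[:n - lag]
    return X

def tracking_score(stimulus, responses, n_lags=10, ridge=1.0, n_folds=4):
    """Cross-validated Pearson r between true and predicted responses, per channel."""
    X = lagged_design(stimulus, n_lags)
    all_idx = np.arange(len(stimulus))
    folds = np.array_split(all_idx, n_folds)  # contiguous folds (assumed)
    r = np.zeros((n_folds, responses.shape[1]))
    for k, test_idx in enumerate(folds):
        train_idx = np.setdiff1d(all_idx, test_idx)
        X_train, Y_train = X[train_idx], responses[train_idx]
        # Ridge solution; each column of W is an independent TRF for one channel
        W = np.linalg.solve(X_train.T @ X_train + ridge * np.eye(n_lags),
                            X_train.T @ Y_train)
        pred = X[test_idx] @ W
        for ch in range(responses.shape[1]):
            r[k, ch] = np.corrcoef(pred[:, ch], responses[test_idx, ch])[0, 1]
    return r.mean(axis=0)  # average over folds

# Simulated example: one channel driven by the envelope, one pure-noise channel.
rng = np.random.default_rng(0)
envelope = rng.standard_normal(4000)
kernel = np.array([0.0, 0.5, 1.0, 0.5, 0.0])  # hypothetical response function
driven = np.convolve(envelope, kernel)[:4000] + 0.5 * rng.standard_normal(4000)
noise = rng.standard_normal(4000)
scores = tracking_score(envelope, np.column_stack([driven, noise]))
```

      In this toy case the driven channel yields a high correlation and the noise channel a correlation near zero, which is the sense in which the coefficient quantifies "tracking".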

      Recommendations for the authors:

      Reviewing Editor Comments:

      The reviewers have provided a number of recommendations to improve the manuscript, particularly requesting that more data be reported, with an emphasis on the measurements themselves (eye movements and TRFs) rather than just the numerical outputs of mathematical models.

      We appreciate all the reviewers' and editor’s comments and effort to improve our manuscript. In the revised version we provide interim findings and missing data, updated figures that include an intuitive illustration of the metrics (such as TRFs), and a thoroughly revised discussion section where we focus on the relationship between our observed quantities and theoretical entities. We now offer operationalized definitions of the relevant concepts (“prediction tendency”, “active ocular sensing” and “selective attention”) and suggest how these entities might be related in the context of speech processing, based on the current findings. We are confident that this revision has improved the quality of our paper a lot and we are grateful for all the feedback and suggestions. 

      Reviewer #1 (Recommendations for the authors):

      (1) Participants had to fixate throughout the tasks. How did the authors deal with large eye movements that violated the instructed fixation?

      As described in the Methods section: “Participants were instructed to look at a black fixation cross at the center of a grey screen.” This instruction was not intended to enforce strict fixation but rather to provide a general reference point, encouraging participants to keep their gaze on the grey screen and avoid freely scanning the room or closing their eyes. Unlike trial-based designs, where strict fixation is feasible due to shorter trial durations, this approach did not impose rigid fixation requirements. Consequently, the threshold for "instruction violation" was inherently more flexible, and no additional preprocessing was applied to the gaze vectors.

      Fixating for such an extended period of time (1.5 hours?) is hard. Did fixation behavior change over time? Could (fixation) fatigue affect the correlations between eye movements and speech tracking? For example, fatigued participants had to correct their fixation more often and this drives, in part, the negative correlation with comprehension?

      Yes, participants spent approximately 2 hours in the MEG, including preparation time (~30 minutes). However, participants were given opportunities to rest their eyes between different parts and blocks of the experiment (e.g., resting state, passive listening, and audiobook blocks), which should help mitigate fatigue to some extent.

      That said, we agree that it is an intriguing idea that fatigue could drive the ocular speech tracking effect, with participants potentially needing to correct their gaze more as the experiment progresses. However, our analysis suggests this is unlikely for several reasons:

      (1) Cross-validation in encoding models: Ocular speech tracking effects were calculated using a 4-fold cross-validation approach (this detail has now been added to the Methods section; please see our response to public review #3). This approach reduces the influence of potential increases in gaze corrections over time, as the models are trained and validated on independent data splits.  Moreover, if there were substantial differences in underlying response magnitudes between folds - for instance, between the first and fourth fold - this would likely compromise the TRF's ability to produce valid response functions for predicting the left-out data. Such a scenario would not result in significant tracking, further supporting the robustness of the observed effects.

      (2) TRF time-course stability: If fatigue were driving increased gaze corrections, we would expect this to be reflected in a general offset (capturing the mean difference between folds) in the TRF time-courses shown in Figure 4 (right panel). However, no such trend / offset is evident.

      (3) Comparison of eye movement data: To directly investigate this possibility, we compared the amount of total eye movements between the first and last blocks for both the single and multi-speaker conditions. Total movement was calculated by first calculating the differences in pixel values between consecutive eye positions on both the x- and y-axes. The Euclidean distance was then computed for each difference, providing a measure of movement between successive time points. Summing these distances yielded the total movement for each block. Statistical analysis was performed separately for the single speaker (ASS) and multi-speaker (AMS) conditions. For each condition, paired comparisons were made between the first and last blocks (we resorted to non-parametric tests, if assumptions of normality were violated):

      For the single speaker condition (ASS), the normality assumption was not satisfied (p≤0.05, Kolmogorov-Smirnov test). Consequently, a Wilcoxon signed-rank test was conducted, which revealed no significant difference in total movements between the first and last blocks (z=−1.330, p=0.184). For the multi-speaker condition (AMS), the data met the normality assumption (p>0.05), allowing the use of a paired t-test. The results showed no significant difference in total movements between the first and last blocks (t=−0.184, p=0.855).

      The results are visualized in a bar plot (see below), where individual data points are displayed alongside the mean and standard error for each block. Statistical annotations indicate that neither condition demonstrated significant differences between the blocks. These findings suggest that total eye movements remained stable across the experimental conditions, regardless of whether participants were exposed to a single or multiple speakers.

      Author response image 1.

      (4) Behavioral responses: Participants’ behavioral responses did not indicate any decrease in comprehensibility for later blocks compared to earlier ones. Specifically, a comparison of comprehension scores between the first and last blocks revealed no significant difference in either the single-speaker condition (ASS; Wilcoxon signed-rank test Z=−0.5911, p=0.5545) or the multi-speaker condition (AMS; Wilcoxon signed-rank test: Z=0.5018, p=0.6158). These findings suggest that participants maintained consistent levels of comprehension throughout the experiment, regardless of the condition or block order. The results are visualized in a bar plot (see below), where individual data points are displayed alongside the mean and standard error for each block. Statistical annotations indicate that neither condition demonstrated significant differences between the blocks.

      Author response image 2.

      Together, these factors suggest that fatigue is unlikely to be a significant driver of the ocular speech tracking effects observed in this study.
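
      The total-movement comparison in point (3) can be sketched as follows. This is a hedged illustration, not our analysis script: the function names, the choice to run the Kolmogorov-Smirnov test on standardized paired differences, and the simulated per-participant totals are assumptions. Note also that SciPy's Wilcoxon routine reports the signed-rank statistic rather than the z-value quoted above.

```python
import numpy as np
from scipy import stats

def total_movement(x, y):
    """Sum of Euclidean distances between consecutive gaze positions (pixels)."""
    dx, dy = np.diff(x), np.diff(y)
    return float(np.sum(np.hypot(dx, dy)))

def compare_blocks(first, last, alpha=0.05):
    """Paired comparison of per-participant totals from the first vs. last block.

    Assumption: normality is checked on the standardized paired differences.
    """
    first = np.asarray(first, dtype=float)
    last = np.asarray(last, dtype=float)
    diff = last - first
    z = (diff - diff.mean()) / diff.std(ddof=1)
    normal = stats.kstest(z, "norm").pvalue > alpha
    if normal:
        res = stats.ttest_rel(first, last)
        return "paired t-test", res.statistic, res.pvalue
    res = stats.wilcoxon(first, last)  # non-parametric fallback
    return "Wilcoxon signed-rank", res.statistic, res.pvalue

# Simulated per-participant totals (hypothetical values, in pixels).
rng = np.random.default_rng(0)
first_block = rng.normal(1000, 100, size=30)
last_block = first_block + rng.normal(0, 50, size=30)
test_name, statistic, p_value = compare_blocks(first_block, last_block)
```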

      (2) The authors should provide descriptive statistics of fixation behavior /fixational eye movements. What was the frequency and mean direction of microsaccades, do they follow the main sequence, etc., quantify drift and tremor?

      Thank you for the suggestion regarding descriptive statistics. To address this, we computed the rates of microsaccades (which were extracted using the microsaccade detection algorithm as proposed in Liu, B., Nobre, A. C. & van Ede, F. Functional but not obligatory link between microsaccades and neural modulation by covert spatial attention. Nat. Commun. 13, 3503 (2022)) and fixations, as these metrics are directly relevant to our study and the requests above.

      Microsaccade Rates:

      - Single speaker Condition: Mean = 2.306 Hz, SD = 0.363 Hz.
      - Multi speaker Condition: Mean = 2.268 Hz, SD = 0.355 Hz.

      Fixation Rates:

      - Single speaker Condition: Mean = 2.858 Hz, SD = 1.617 Hz.
      - Multi speaker Condition: Mean = 2.897 Hz, SD = 1.542 Hz.

      These values fall within the expected ranges reported in the literature (fixation rates: 2–4 Hz, microsaccade rates: ~0.5–2.5 Hz) and serve as a sanity check, confirming the plausibility of our eye-tracking data. Regarding the reviewer’s request for additional metrics (e.g., microsaccade direction, main sequence analysis, drift, and tremor), extracting these features would require advanced algorithms and analyses not supported by our current preprocessing pipeline or dataset. We hope that the provided metrics, which were the main focus of this study, serve as a sufficient sanity check and highlight the robustness of our data.

      Related to this, I am wondering whether microsaccades are the feature that drives speech tracking.

      This is an important and pressing question that we aim to address in future publications. Currently, our understanding - and the reason microsaccades and blinks are not analysed in this manuscript - is limited by methodological constraints. Specifically, microsaccades are binary response vectors, which are not compatible with TRF analyses. Addressing this would require adapting future models to handle time-continuous binary response data or exploring alternative approaches, such as regression-based ERFs (for example as in Heilbron et al. 2022). As the primary goal of this manuscript was to replicate the findings of Gehmacher et al. (2024) using similar methods and to integrate these findings into an initial unified framework, we did not investigate additional eye movement features here. However, we agree that microsaccades (and also blinks, see below) likely contribute, at least in part, to the observed ocular speech tracking effects, and we now suggest this in the Discussion:

      “Relatedly, it remains an open question whether microsaccades are a key feature driving ocular speech tracking. However, our current study does not analyze microsaccades due to methodological constraints: microsaccades are binary response vectors, which are incompatible with TRF analyses used here. Addressing this would require adapting models to handle time-continuous binary response data or potentially exploring alternative approaches, such as regression-based ERFs (e.g., as in Heilbron et al., 2022). While these limitations preclude microsaccade analysis in the current study, we hypothesize that they could enhance temporal precision and selectively amplify relevant sensory input, supporting auditory perception. Future studies should explore this possibility to uncover the specific contributions of microsaccades to speech tracking.”

      (3) Can the authors make sure that interpolated blinks did not drive any of the effects? Can interpolated blink trials be excluded?

      Using continuous audiobooks as stimuli meant that we could not exclude blink periods from the analysis without introducing substantial continuation artifacts in the TRF analysis. Importantly, the concept of covert motor routines and active sensing suggests that participants engage more strongly in motor routines - including ocular behaviors such as microsaccades and blinks - during tasks like speech tracking. These motor routines are inherently tied to individual gaze patterns, making microsaccades and blinks correlated with other ocular behaviors. This complicates efforts to disentangle their individual contributions to the observed ocular speech tracking effects.

      Engagement in these motor routines, as posited by active sensing, would naturally load onto various viewing behaviors, further intertwining their roles.

      Even if we were to examine correlations, such as the amount of blinks with the ocular speech tracking effect, it is unlikely to provide a clearer understanding due to these inherent overlaps. The methodological and conceptual challenge lies in distinguishing these features from one another and understanding their respective roles in driving the observed effects.

      However, the aim of this manuscript was not to dissect the ocular speech tracking effect in greater detail, but rather to relate it - based on similar analytical choices as in Gehmacher et al. (2024) - to prediction tendencies, attention, and neural speech tracking. While it will be crucial in future work to differentiate these patterns and their connections to diverse cognitive processes, it is beyond the scope of this study to address all these questions comprehensively.

      We acknowledge that eye movements, including microsaccades and blinks (however, see challenges for this in response 2), remain underexplored in many experimental paradigms. Their interplay with cognitive processes - such as attention, prediction, and sensory integration - will undoubtedly be an important focus for future studies. 

      (4) Could the authors provide more details on how time shuffling was done for the eye-movement predictor, and include a circularly shifted version (or a version that does not destroy temporal contiguity) in their model comparisons? Some types of shuffling can result in unrealistic time series, which would end up in an unfair comparison with the model that has the real eye movement traces as predictors.

      We thank the reviewer for their insightful question regarding the time-shuffling procedure for the eye-movement predictor and for suggesting the inclusion of a circularly shifted version in our model comparisons. Below, we provide further details about our approach and the rationale behind it:

      (1) Random Shuffling: In our analysis, the eye-movement predictor was randomly shuffled over time, meaning that individual samples were randomly replaced. This method completely disrupts the temporal structure of the signal, providing a null model that directly tests whether the temporal mediation observed is due to the specific temporal relationship between ocular movements and envelope tracking.

      (2) Circular Shifting: While circular shifting maintains temporal contiguity, it introduces certain challenges in the context of TRF analysis. Specifically:

      - Adaptation to Shifts: The TRF model could adapt to the introduced shift, potentially reducing the validity of the null comparison.

      - Similarity due to Repetition: The broadband envelope exhibits strong repetitive patterns over time, such as rhythms inherent to speech. Circular shifting can therefore produce predictors that are very similar to the original signal. As a result, this similarity may lead to null distributions that do not adequately disrupt the temporal mediation we aim to test, making it less robust as a control.

      (3) Rationale for Random Shuffling: The primary goal of our mediation analysis is to determine whether there is a temporal mediation of envelope tracking by ocular movements. By deliberately destroying the temporal structure through random shuffling, we ensure that the null model tests for the specific temporal relationship that is central to our hypothesis. Circularly shifted predictors, on the other hand, may partially preserve temporal dependencies, making them less suitable for this purpose.

      In summary, while circular shifting is a valuable approach in other contexts, it is less appropriate for the specific goals of this study. We hope this explanation clarifies our methodological choices and demonstrates their alignment with the aims of our analysis.
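
      The contrast between the two null predictors discussed above can be illustrated with a toy example. The signal below is a deliberately rhythmic stand-in for a gaze trace (real speech-related signals are only quasi-rhythmic), and the period, shift size, and helper function `lag1_autocorr` are hypothetical choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
t = np.arange(2000)
# Toy predictor: strongly rhythmic signal (period of 50 samples) plus noise.
gaze = np.sin(2 * np.pi * t / 50) + 0.1 * rng.standard_normal(t.size)

# Random shuffling: same sample values, temporal order destroyed.
shuffled = rng.permutation(gaze)

# Circular shifting: temporal order preserved, signal rotated by 500 samples.
shifted = np.roll(gaze, 500)

def lag1_autocorr(x):
    """Correlation between a signal and itself delayed by one sample."""
    return np.corrcoef(x[:-1], x[1:])[0, 1]

autocorr_shuffled = lag1_autocorr(shuffled)            # near zero
autocorr_shifted = lag1_autocorr(shifted)              # close to the original
similarity_shifted = np.corrcoef(gaze, shifted)[0, 1]  # high for rhythmic input
```

      In this constructed case the shift is a multiple of the rhythm's period, so the shifted predictor remains almost identical to the original, whereas the shuffled predictor retains no temporal structure. This is the worst case for circular shifting; with less regular signals the resemblance would be weaker.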

      (5) Replication: I want to point out that it is great that the previous findings were in principle replicated. However, I would like to suggest a more nuanced evaluation of the replication:

      a) Instead of a (direct) replication, the present study should be called a 'conceptual replication', since modifications in design and procedure were made.

      Thank you very much for this suggestion! We now use the term ‘conceptual replication’ throughout the manuscript.

      b) Not all the findings from the Gehmacher et al., 2024 study were replicated to a full extent:

      Did the authors find indications of a vertical vs. horizontal tracking difference in the Gehmacher 2024 data? Could they check this in the Gehmacher 2024 data?

      The findings for horizontal and vertical gaze tracking in Gehmacher et al. (2024) are detailed in the supplementary material of that publication. Both single-speaker and multi-speaker target conditions showed significant speech tracking effects in both horizontal and vertical directions. However, there was a slightly stronger tracking effect for the single-speaker condition in the vertical direction. Due to the highly predictable structure of words in Gehmacher et al., effects there were probably boosted overall compared to continuous audiobook listening, likely leading to the differentiation of horizontal and vertical gaze. See the figures in the Gehmacher et al. supplementary file for reference.

      c) Another difference between their previous and this study is the non-existent tracking of the multi-speaker distractor in this study. The authors should point this out clearly in the discussion and potentially provide an explanation.

      Thank you for highlighting this point! We now address this in the discussion:

      “Importantly, in contrast to Gehmacher et al. (2024), we did not observe ocular tracking of the multi-speaker distractor in this study. This difference is likely attributable to the simplistic single-trial, 5-word task structure in Gehmacher et al., which resulted in high temporal overlap between the target and distractor speech streams and likely drove the significant distractor-tracking effects observed in that study. The absence of such an effect during continuous listening in our study suggests that ocular tracking is indeed more specific to selective attention.”

      Minor:

      (1) I was a little surprised to not see an indication of eyes/eye movements in Figure 6. The intention of the authors might have been to create a general schematic illustration, but I find this a bit misleading. This paper provides nice evidence for a specific ocular effect in speech tracking. There is, to my knowledge, no indication that speech would be influenced by different kinds of active sensing (if there are, please include them in the discussion). Given that the visuomotor system is quite dominant in humans, it might actually be the case that the speech tracking the authors describe is specifically ocular.

      Taking into account all the reviewers' remarks on the findings and interpretations, we have updated this figure (now Fig. 7) in the manuscript to make it more specific and aligned with the revised discussion section. Throughout the manuscript, we now explicitly refer to active ocular sensing in relation to speech processing and have avoided the broader term 'active sensing' in this context. We hope these revisions address the concerns raised.

      (2) I find the part in the discussion (page 2, last paragraph) on cognitive processes hard to follow. I don't agree that 'cognitive processes' are easily separable from any of the measured responses (eye and brain). Referring to the example they provide, there is evidence that eye movements are correlated with brain activity that is correlated with memory performance. How, and more importantly, why would one separate those?

      Thank you for raising this important point. We have carefully considered your comments, particularly regarding the interplay between cognitive processes and measured responses (eye and brain), as well as the challenge of conceptually separating them. Additionally, we have incorporated Reviewer #2's query (13) into a unified and complementary reasoning. In response, we have rewritten the relevant paragraph in the discussion to provide a clearer and more detailed explanation of how ocular and neural responses contribute to speech processing in an interdependent manner. We hope this revision addresses your concerns and offers a more precise and coherent discussion on this topic:

      “Despite the finding that eye movements mediate neural speech tracking, the behavioural relevance for semantic comprehension appears to differ between ocular and neural speech tracking. Specifically, we found a negative association between ocular speech tracking and comprehension, indicating that participants with lower comprehension performance exhibited increased ocular speech tracking. Interestingly, no significant relationship was observed between neural tracking and comprehension.

      In this context, the negative association between ocular tracking and comprehension might reflect individual differences in how participants allocate cognitive resources. Participants with lower comprehension may rely more heavily on attentional mechanisms to process acoustic features, as evidenced by increased ocular tracking. This reliance could represent a compensatory strategy when higher-order processes, such as semantic integration or memory retrieval, are less effective. Importantly, our comprehension questions (see Experimental Procedure) targeted a broad range of processes, including intelligibility and memory, suggesting that this relationship reflects a trade-off in resource allocation between low-level acoustic focus and integrative cognitive tasks.

      Rather than separating eye and brain responses conceptually, our analysis highlights their complementary contributions. Eye movements may enhance neural processing by increasing sensitivity to acoustic properties of speech, while neural activity builds on this foundation to integrate information and support comprehension. Together, these systems form an interdependent mechanism, with eye and brain responses working in tandem to facilitate different aspects of speech processing.

      This interpretation is consistent with the absence of a difference in ocular tracking for semantic violations (e.g., words with high surprisal versus lexically matched controls), reinforcing the view that ocular tracking primarily reflects attentional engagement with acoustic features rather than direct involvement in semantic processing. This aligns with previous findings that attention modulates auditory responses to acoustic features (e.g., Forte et al., 2017), further supporting the idea that ocular tracking reflects mechanisms of selective attention rather than representations of linguistic content.

      Future research should investigate how these systems interact and explore how ocular tracking mediates neural responses to linguistic features, such as lexical or semantic processing, to better understand their joint contributions to comprehension.”

      (3) Attention vs. predictive coding. I think the authors end up with an elegant description of the observed effects, "as an "active sensing" mechanism that implements the attentional optimization of sensory precision." However, I feel the paragraph starts with the ill-posed question "whether ocular speech tracking is modulated not by predictive, but other (for example attentional) processes". If ocular tracking is the implementation of a process (optimization of sensory precision, aka attention), how could it be at the same time modulated by that process? In my opinion, adding the notion that there is a modulation by a vague cognitive concept like attention on top of what the paper shows does not improve our understanding of how speech tracking in humans works.

      Thank you for raising this point. We agree that it is critical to clarify the relationship between ocular speech tracking, attention, and predictive processes, and we appreciate the opportunity to refine this discussion.  

      To avoid the potential confusion that active ocular sensing on the one hand represents an implementation of selective attention while on the other hand seeming to be modulated by it, we now use the formulation “ocular speech tracking reflects attentional mechanisms rather than predictive processes.”

      To address your concern that the conceptualization of attention seems rather vague, we have revised the whole paragraph in order to redefine the theoretical entities in question (including selective attention) and to provide a clearer and more precise picture (see also our revised version of Fig. 6, now Fig. 7). We now focus on highlighting the distinct yet interdependent roles of selective attention and individual prediction tendencies for speech tracking:

      “With this speculative framework we attempt to describe and relate three important phenomena with respect to their relevance for speech processing: 1) “Anticipatory predictions” that are created in absence of attentional demands and contain probabilistic information about stimulus features (here, inferred from frequency-specific pre-activations during passive listening to sound sequences). 2) “Selective attention” that allocates resources towards relevant (whilst suppressing distracting) information (which was manipulated by the presence or absence of a distractor speaker). And finally 3) “active ocular sensing”, which refers to gaze behavior that is temporally aligned to attended (but not unattended) acoustic speech input (inferred from the discovered phenomenon of ocular speech tracking). We propose that auditory inflow is, at a basic level, temporally modulated via active ocular sensing, which “opens the gates” in the sensory periphery at relevant timepoints. How exactly this mechanism is guided (for example where the information about crucial timepoints comes from, if not from prediction, and whether it requires habituation to a speech stream etc.) is yet unclear. Unlike predictive tendencies, active ocular sensing appears to reflect selective attention, manifesting as a mechanism that optimizes sensory precision. Individual differences with respect to anticipatory predictions, on the other hand, seem to be independent from the other two entities, but nevertheless relevant for speech processing. We therefore support the notion that representational content is interpreted based on prior probabilistic assumptions. If we consider the idea that “a percept” of an (auditory) object is actually temporally and spatially distributed (across representational spacetime - see Fig. 7), the content of information depends on where and when it is probed (see for example Dennett, 1991 for similar ideas on consciousness).
      Having to select from multiple interpretations across space and time requires a careful balance between the weighting of internal models and the allocation of resources based on current goals. We suggest that in the case of speech processing, this challenge results in an independent adaptation of feature-based precision-weighting by predictions on the one hand and temporal precision-weighting by selective attention on the other.”

      Reviewer #2 (Recommendations for the authors):

      My main recommendation is outlined in the Weaknesses above: the overarching rationale for many analysis choices should be made explicit, and intermediate results should be shown where appropriate, so the reader can follow what is being quantified and what the results truly mean. Specifically, I recommend to pay attention to the following (in no particular order):

      (1) Define 'neural speech tracking' early on. (e.g.: 'The amount of information in the MEG signal that can multivariately be explained by the speech amplitude envelope.' (is that correct?))

      Thank you for pointing out that this important definition is missing. It is now defined at the first mention in the Introduction as follows: “Here (and in the following) “neural speech tracking” refers to a correlation coefficient between actual brain responses and responses predicted from an encoding model based solely on the speech envelope”.

      (2) Same for 'ocular speech tracking'. Here even reading the Methods does not make it unambiguous how this term is used.

      It is now defined at the first mention in the Introduction as follows: “Ocular speech tracking” (similarly to “neural speech tracking”) refers to the correlation coefficient between actual eye movements and movements predicted from an encoding model based on the speech envelope”.

      In addition also define both (neural and ocular speech tracking) metrics in the Methods Section.
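      To make the two definitions above concrete, the following is a minimal sketch of "tracking" as the correlation between an observed response and the response predicted from an envelope-based encoding model. It is an illustration under stated assumptions, not the pipeline used in the study: the TRF is fitted here with ridge regression on a lagged design matrix rather than with Boosting, the data are simulated, and all function names, the kernel, and the parameter values are hypothetical.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation coefficient between two equally long sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    var_a = sum((u - ma) ** 2 for u in a)
    var_b = sum((v - mb) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)


def lagged(x, n_lags):
    """Row t holds x[t], x[t-1], ..., x[t-n_lags+1], zero-padded at the start."""
    return [[x[t - k] if t >= k else 0.0 for k in range(n_lags)]
            for t in range(len(x))]


def solve(A, b):
    """Solve A w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    w = [0.0] * n
    for i in reversed(range(n)):
        w[i] = (M[i][n] - sum(M[i][j] * w[j] for j in range(i + 1, n))) / M[i][i]
    return w


def fit_trf(x, y, n_lags, ridge=1.0):
    """Ridge estimate of the TRF (one weight per lag). Stand-in for Boosting."""
    X = lagged(x, n_lags)
    XtX = [[sum(r[i] * r[j] for r in X) + (ridge if i == j else 0.0)
            for j in range(n_lags)] for i in range(n_lags)]
    Xty = [sum(r[i] * yt for r, yt in zip(X, y)) for i in range(n_lags)]
    return solve(XtX, Xty)


def predict(x, trf):
    """Convolve the stimulus with the TRF to obtain a predicted response."""
    return [sum(w * v for w, v in zip(trf, row)) for row in lagged(x, len(trf))]


def tracking(x_train, y_train, x_test, y_test, n_lags=5):
    """Tracking = correlation of predicted and observed held-out response."""
    trf = fit_trf(x_train, y_train, n_lags)
    return pearson(predict(x_test, trf), y_test)


# Simulated data: hypothetical kernel and noise level, for illustration only.
rng = random.Random(0)
true_kernel = [0.0, 0.5, 1.0, 0.5, 0.0]


def simulate(n, noise):
    env = [abs(rng.gauss(0, 1)) for _ in range(n)]
    resp = [r + rng.gauss(0, noise) for r in predict(env, true_kernel)]
    return env, resp


env_tr, resp_tr = simulate(400, 0.5)
env_te, resp_te = simulate(200, 0.5)
unrelated = [rng.gauss(0, 1) for _ in range(200)]

r_tracked = tracking(env_tr, resp_tr, env_te, resp_te)  # response driven by envelope
r_null = tracking(env_tr, resp_tr, env_te, unrelated)   # response unrelated to envelope
```

      The same function applies whether the response is a MEG channel (neural speech tracking) or a horizontal or vertical gaze trace (ocular speech tracking); only the response signal changes.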

      (3) Related to this: for ocular speech tracking, are simply the horizontal and vertical eye traces compared to the speech envelope? If so, this appears somewhat strange: why should the eyes move more rightward/upward with a larger envelope? And the direction here depends on the (arbitrary) sign of right = positive, etc. (It would make more sense to quantify 'amount of movement' in some way, but if this is done, I missed it in Methods.)

      Thank you for your insightful comments. You are correct that the horizontal and vertical traces were used for ocular speech tracking, and no additional details were included in the Methods. While we agree that the observed rightward/upward movement may seem unusual, this pattern is consistent with previous findings, including those reported in Gehmacher et al. (2024). In that study, we discussed how ocular speech tracking could reflect a broader engagement of the motor system during speech perception. For example, we observed a general right-lateralized gaze bias when participants attended to auditory speech, which we hypothesized might resemble eye movements during text reading, with a similar temporal alignment (~200 ms). We also speculated that this pattern might differ in cultures that read text from right to left.

      We appreciate your suggestion to explore alternative methods for quantifying gaze patterns, such as the "amount of movement" or microsaccades. While these approaches hold promise for future studies, our primary aim here was to replicate previous findings using the same signal and analysis methods to establish a basis for further exploration.  

      (4) In the Introduction, specifically blink-related ocular activity is mentioned as being related to speech tracking (for which a reference is, incidentally, missing), while here, any blink-related activity is excluded from the analysis. This should be motivated, as it appears in direct contradiction.

      Thank you for pointing this out. The mention of blink-related ocular activity in the Introduction refers to findings by Jin et al. (2018), where such activity was shown to align with higher-order syntactic structures in artificial speech. We have now included the appropriate reference for clarity.

      While Jin et al. focused on blink-related activity, in the present study we focused on gaze patterns to investigate ocular speech tracking, replicating findings from Gehmacher et al. (2024). This approach was motivated by our goal to validate previous results using the same methodology. Importantly, blinks were excluded from our analysis because of a methodological constraint of TRF analysis, which requires a continuous response signal; blinks, being discrete and artifact-prone, are incompatible with this approach.

      To address your concern, we revised the Introduction to clarify this distinction and provide explicit motivation for focusing on gaze patterns. It now reads:

      “Along these lines, it has been shown that covert, mostly blink-related eye activity aligns with higher-order syntactic structures of temporally predictable, artificial speech (i.e. monosyllabic words; Jin et al., 2018). In support of ideas that the motor system is actively engaged in speech perception (Galantucci et al., 2006; Liberman & Mattingly, 1985), the authors suggest a global entrainment across sensory and (oculo)motor areas which implements temporal attention.

      In another recent study from our lab (Gehmacher et al., 2024), we showed that eye movements continuously track intensity fluctuations of attended natural speech, a phenomenon we termed ocular speech tracking. In the present study, we focused on gaze patterns rather than blink-related activity, both to replicate findings from Gehmacher et al. (2024) and because blink activity is unsuitable for TRF analysis due to its discrete and artifact-prone nature. Hence, “Ocular speech tracking” (similarly to “neural speech tracking”) refers to the correlation coefficient between actual eye movements and movements predicted from an encoding model based on the speech envelope.”

      Jin, P., Zou, J., Zhou, T., & Ding, N. (2018). Eye activity tracks task-relevant structures during speech and auditory sequence perception. Nature communications, 9(1), 5374.

      (5) The rationale for the mediation analysis is questionable. Let speech envelope = A, brain activity = B, eye movements = C. The authors wish to claim that A -> C -> B. But it is equally possible that A -> B -> C. They reflect on this somewhat in Discussion, but throughout the rest of the paper, the mediation analysis is presented as specifically testing whether A -> B is mediated by C, which is potentially misleading.

      Indeed we share your concern regarding the directionality of the relationships in the mediation analysis. Our choice of ocular movements as a mediator was motivated by the fact that the relationship between acoustic speech and neural activity is well established, as well as previous results indicating that oculomotor activity contributes to cognitive effects in auditory attention (Popov et al., 2022). 

      We treat both interpretations (“ocular movements contribute to neural speech tracking” versus “neural activity contributes to ocular speech tracking”) as equally plausible. We now emphasise this point thoroughly in our Discussion:

      “It is important to note that our current findings do not allow for inference on directionality. Our choice of ocular movements as a mediator was motivated by the fact that the relationship between acoustic speech and neural activity is well established, as well as previous results indicating that oculomotor activity contributes to cognitive effects in auditory attention (Popov et al., 2022). However, an alternative model may suggest that neural activity mediates the effect of ocular speech tracking. Hence, it is possible that ocular mediation of speech tracking may reflect a) active (ocular) sensing for information driven by (top-down) selective attention or b) improved neural representations as a consequence of temporally aligned increase of sensory gain or c) (not unlikely) both. In fact, when rejecting the notion of a single bottom-up flow of information and replacing it with a model of distributed parallel and dynamic processing, it seems only reasonable to assume that the direction of communication (between our eyes and our brain) will depend on where (within the brain) as well as when we look at the effect. Thus, the regions and time-windows reported here should be taken as an illustration of oculo-neural communication during speech processing rather than an attempt to "explain" neural speech processing by ocular movements.”

      (6) The mediation analysis can be improved by a proper quantification of the effect (sizes or variance explained). E.g. how much % of B is explained by A total, and how much of that can in turn be explained by C being involved? For drawing directional conclusions perhaps Granger causality could be used.

      In Figure 4 (now Figure 5) of our manuscript we use standardized betas (which correspond to effect sizes) to illustrate the mediation effect. With the current mTRF approach it is, however, not possible (or insightful) to compare the variance explained. It is reasonable to assume that variance in neural activity will be explained better when oculomotor behavior is included as a second predictor along with acoustic stimulation. However, this increase gives no indication of the extent to which this oculomotor behavior was task-relevant or irrelevant (since all kinds of “arbitrary” movements will be captured in brain activity and therefore lead to an increase in variance explained). For this reason we chose to pursue the widely accepted framework of mediation (Baron & Kenny, 1986). This (correlational) approach is indeed limited in its interpretations (see previous response); however, the goal of the current study was to replicate and illustrate the triadic relationship of acoustic speech input, neural activity and ocular movements, with no particular hypotheses on directionality.
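      For readers less familiar with the mediation framework, the logic of standardized paths can be sketched as follows. This is a simplified scalar illustration in the spirit of Baron & Kenny (1986), assuming one value per observation for speech (A), eye movements (C) and brain activity (B); it is not the time-resolved TRF-based analysis of the manuscript, and the simulated coefficients and all names are hypothetical.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation coefficient between two equally long sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    var_a = sum((u - ma) ** 2 for u in a)
    var_b = sum((v - mb) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)


def mediation(a, c, b):
    """Standardized paths for the model A -> C -> B.

    For standardized variables, the regression weights follow directly
    from the three pairwise correlations.
    """
    r_ab = pearson(a, b)  # total effect of A on B
    r_ac = pearson(a, c)  # path a: A -> C
    r_cb = pearson(c, b)
    denom = 1.0 - r_ac ** 2
    direct = (r_ab - r_ac * r_cb) / denom  # c': A -> B, controlling for C
    b_path = (r_cb - r_ac * r_ab) / denom  # b: C -> B, controlling for A
    return {
        "total": r_ab,
        "a_path": r_ac,
        "b_path": b_path,
        "direct": direct,
        "indirect": r_ac * b_path,  # mediated part; total = direct + indirect
    }


# Simulated example with hypothetical coefficients.
rng = random.Random(1)
speech = [rng.gauss(0, 1) for _ in range(500)]
eye = [0.6 * s + rng.gauss(0, 1) for s in speech]
brain = [0.3 * s + 0.5 * e + rng.gauss(0, 1) for s, e in zip(speech, eye)]

res = mediation(speech, eye, brain)
alt = mediation(speech, brain, eye)  # alternative model: A -> B -> C
```

      Note that `alt`, which swaps mediator and outcome, is estimated just as readily from the same data. The decomposition itself therefore cannot decide between the two directions, which is the limitation acknowledged above.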

      (7) Both prediction tendency and neural speech tracking depend on MEG data, and thus on MEG signal-to-noise ratio (SNR). It is possible some participants may have higher SNR recordings in both tasks, which may result in both higher (estimated) prediction tendency and higher (estimated) speech tracking. This would result in a positive correlation, as the authors observe. This trivial explanation should be ruled out, by quantifying the relative SNR and testing for the absence of a mediation here.

      We agree that for both approaches (MVPA and mTRF models) individual MEG SNR plays an important role. This concern has been raised previously and addressed in our previous manuscript (Schubert et al., 2023). First, it should be noted that our prediction tendency value is the result of a condition contrast (rather than simple decoding accuracy) which compensates for the influence of subject specific signal-to-noise ratio (as no vacuous difference in SNR is to be expected between conditions). Second, in our previous study we also used frequency decoding accuracy as a control variable to correlate with speech tracking variables of interest and found no significant effect.

      (8) Much of the analysis pipeline features temporal response functions (TRFs). These should be shown in a time-resolved manner as a key intermediate step.

      We have now included the neural speech tracking TRFs in the figure (now Figure 3).

      (9) Figure 2 shows much-condensed results from different steps in the pipeline. If I understand correctly, 2A shows raw TRF weights (averaged over some time window?), while 2B-F shows standardized mean posterior regressor weights after Bayesian stats? It would be very helpful to make much more explicit what is being shown here, in addition to showing the related TRFs.

      Thank you for pointing this out! The figure description has indeed not been very insightful on this issue. We have now adapted the caption and hope this clarifies the confusion: “Neural speech tracking is related to prediction tendency and word surprisal, independent of selective attention. A) Envelope (x) - response (y) relationships are estimated using deconvolution (Boosting). The TRF (filter kernel, h) models how the brain processes the envelope over time. This filter is used to predict neural responses via convolution. Predicted responses are correlated with actual neural activity to evaluate model fit and the TRF's ability to capture response dynamics. Correlation coefficients from these models are then used as dependent variables in Bayesian regression models. (Panel adapted from Gehmacher et al., 2024b). B) Temporal response functions (TRFs) depict the time-resolved neural tracking of the speech envelope for the single speaker and multi speaker target condition, shown here as absolute values averaged across channels. Solid lines represent the group average. Shaded areas represent 95% Confidence Intervals. C–H) The beta weights shown in the sensor plots are derived from Bayesian regression models in A). For Panel C, this statistical model is based on correlation coefficients computed from the TRF models (further details can be found in the Methods Section). C) In a single speaker condition, neural tracking of the speech envelope was significant for widespread areas, most pronounced over auditory processing regions. D) The condition effect indicates a decrease in neural speech tracking with increasing noise (1 distractor). E) Stronger prediction tendency was associated with increased neural speech tracking over left frontal areas. F) However, there was no interaction between prediction tendency and conditions of selective attention. G) Increased neural tracking of semantic violations was observed over left temporal areas. H) There was no interaction between word surprisal and speaker condition, suggesting a representation of surprising words independent of background noise. Marked sensors indicate ‘significant’ clusters, defined as at least two neighboring channels showing a significant result. N = 29.”

      Gehmacher, Q., Schubert, J., Kaltenmaier, A., Weisz, N., & Press, C. (2024b). The "Ocular Response Function" for encoding and decoding oculomotor related neural activity. bioRxiv, 2024-11.

      (10) Bayesian hypothesis testing is not done consistently. Some parts test for inclusion of 0 in 94% HDI, while some parts adopt a ROPE approach. The same approach should be taken throughout. Additionally, Bayes factors would be very helpful (I appreciate these depend on the choice of priors, but the default Bambi priors should be fine).

      Our primary aim in this study was to replicate two recent findings: (1) the relationship between individual prediction tendencies and neural speech tracking, and (2) the tracking of the speech envelope by eye movements. To maintain methodological consistency with the original studies, we did not apply a ROPE approach when analyzing these replication effects. Instead, we followed the same procedures as the original work, focusing on the inclusion of 0 in the HDI for the neural effects and using the same methods for the ocular effects. Additionally, we were not specifically interested in potential null effects in these replication analyses, as our primary goal was to test whether we could reproduce the previously reported findings.

      For the mediation analysis, however, we chose to extend the original approach by not only performing the analysis in a time-resolved manner but also applying a ROPE approach. This decision was motivated by our interest in gaining more comprehensive insights — beyond the replication goals — by also testing for potential null effects, which can provide valuable information about the presence or absence of mediation effects.

      We appreciate your thoughtful feedback and hope this clarifies our rationale for the differing approaches in our Bayesian hypothesis testing. 
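      The difference between the two decision rules described above can be sketched as follows. This is an illustrative example under stated assumptions: the posterior samples are simulated, the ROPE bounds of ±0.1 are a hypothetical choice rather than the bounds used in the manuscript, and the function names are ours.

```python
import math
import random


def hdi(samples, prob=0.94):
    """Narrowest interval containing `prob` of the posterior samples."""
    s = sorted(samples)
    n = len(s)
    k = math.ceil(prob * n)  # number of samples inside the interval
    start = min(range(n - k + 1), key=lambda i: s[i + k - 1] - s[i])
    return s[start], s[start + k - 1]


def excludes_zero(interval):
    """Rule used for the replication analyses: is 0 outside the HDI?"""
    lo, hi = interval
    return lo > 0 or hi < 0


def rope_decision(interval, rope=(-0.1, 0.1)):
    """Rule used for the mediation analysis (ROPE bounds are hypothetical)."""
    lo, hi = interval
    if hi < rope[0] or lo > rope[1]:
        return "reject null"   # HDI entirely outside the ROPE
    if rope[0] <= lo and hi <= rope[1]:
        return "accept null"   # HDI entirely inside the ROPE
    return "undecided"


# Simulated posteriors for illustration only.
rng = random.Random(2)
effect = [rng.gauss(0.3, 0.05) for _ in range(4000)]
null_like = [rng.gauss(0.0, 0.02) for _ in range(4000)]

effect_hdi = hdi(effect)
null_hdi = hdi(null_like)
```

      Only the ROPE rule can return an explicit "accept null", which is why it was added for the analysis in which null effects were of interest.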

      Regarding Bayes Factors: we understand that some researchers find Bayes Factors appealing, as they offer a seemingly simple and straightforward way to evaluate the evidence for or against H0 relative to H1 (e.g. BF₁₀ > 10² = decisive, according to the Jeffreys scale). However, in practice Bayes Factors are often misunderstood, e.g. by interpreting a Bayes Factor as posterior odds or by not acknowledging that it expresses relative evidence (see Wong et al., 2022). Instead of using Bayes Factors, we prefer to estimate and report the posterior distribution of parameters given the data, prior, and model assumptions (in the form of the 94% HDI). This allows for a continuous evaluation of evidence for a given hypothesis that is, in our eyes, easier to interpret than a Bayes Factor.

      Jeffreys, Harold (1998) [1961]. The Theory of Probability (3rd ed.). Oxford, England. p. 432. ISBN 9780191589676.

      Wong, T. K., Kiers, H., & Tendeiro, J. (2022). On the Potential Mismatch Between the Function of the Bayes Factor and Researchers’ Expectations. Collabra: Psychology, 8(1), 36357. https://doi.org/10.1525/collabra.36357

      (11) It would be helpful if Results could be appreciated without a detailed read of Methods. I would recommend a recap of each key methodological step before introducing the relevant Result. (This may also help in making the rationale explicit.)

      In addition to the short recaps of methods that were already present, and the information on the quantification of neural and ocular tracking and on Bayesian statistics (see responses 1, 2, 9), we have now added the following passages to the Results sections. In the context of the manuscript, they provide a recap of the key methodological steps and the rationale needed to readily understand each analysis and the results it led to:

      Individual prediction tendency is related to neural speech tracking:

      “Thus, this measure is a single value per subject, which comprises a) differences between two contextual probabilities (i.e. ordered vs. random) in b) feature-specific tone representations c) in advance of their observation (summed over a time-window of -0.3 - 0 s). Importantly, this prediction tendency was assessed in an independent entropy modulation paradigm (see Fig. 1). On a group level we found an increased tendency to pre-activate a stimulus of high probability (i.e. forward transition) in an ordered context compared to a random context (see Fig. 2A). This effect replicates results from our previous work (Schubert et al., 2023, 2024). Using the summed difference between entropy levels (ordered - random) across pre-stimulus time, one value was extracted per subject (Fig. 2B). This value was used as a proxy for “individual prediction tendency” and correlated with encoding of clear speech across different MEG sensors. [...]

      Neural speech tracking, quantified as the correlation coefficients between predicted and observed MEG responses to the speech envelope, was used as the dependent variable in Bayesian regression models. These models included condition (single vs. multi-speaker) as a fixed effect, with either prediction tendency or word surprisal as an additional predictor, and random effects for participants.”

      Eye movements track acoustic speech in selective attention:

      “For this, we separately predicted horizontal and vertical eye movements from the acoustic speech envelope using temporal response functions (TRFs). The resulting model fit (i.e. correlation between true and predicted eye movements) is commonly referred to as “speech tracking”. Bayesian regression models were applied to evaluate tracking effects under different conditions of selective attention (single speaker, attended multi-speaker, unattended multi-speaker). Furthermore, we assessed whether individual prediction tendency or semantic word surprisal influenced ocular speech tracking.”

      Neural speech tracking is mediated by eye movements:

      “This model evaluates to what extent gaze behaviour functions as a mediator between acoustic speech input and brain activity.”

      Neural and ocular speech tracking are differently related to comprehension:

      “Bayesian regression models were used to investigate relationships between neural/ocular speech tracking and comprehension or difficulty. Ocular speech tracking was analyzed separately for horizontal and vertical eye movements.”

      (12) The research questions in the Introduction should be sharpened up, to make explicit when a question concerns a theoretical entity, and when it concerns something concretely measured/measurable.

      We have sharpened the research questions as follows:

      “Taking into account the aforementioned study by Schubert and colleagues (2023), the two recently uncovered predictors of neural tracking (individual prediction tendency and ocular tracking) raise several empirical questions regarding the relationship between predictive processes, selective attention, and active ocular sensing in speech processing:

      (1) Are predictive processes related to active ocular sensing in the same way they are to neural speech tracking? Specifically, do individuals with a stronger tendency to anticipate predictable auditory features, as quantified through prestimulus neural representations in an independent tone paradigm, show increased or even decreased ocular speech tracking, measured as the correlation between predicted and actual eye movements? Or is there no relationship at all?

      (2) To what extent does selective attention influence the relationship between prediction tendency, neural speech tracking, and ocular speech tracking? For example, does the effect of prediction tendency or ocular speech tracking on neural tracking differ between a single-speaker and multi-speaker listening condition?

      (3) Are individual prediction tendency and ocular speech tracking related to behavioral outcomes, such as comprehension and perceived task difficulty? Speech comprehension is assessed through accuracy on comprehension questions, and task difficulty is measured through subjective ratings.

      Although predictive processes, selective attention, and active sensing have been shown to contribute to successful listening, their potential interactions and specific roles in naturalistic speech perception remain unclear. Addressing these questions will help disentangle their contributions and establish an integrated framework for understanding how neural and ocular speech tracking support speech processing.”

      (13) The negative relationship between story comprehension and ocular speech tracking appears to go against the authors' preferred interpretation, but the reflection on this in the Discussion is very brief and somewhat vague.

      Thank you for pointing this out. We have taken your comments into careful consideration and also incorporated Reviewer #1's query (Minor point 2) into a unified and complementary reasoning. We have rewritten the relevant paragraph in the discussion to provide a clearer and more detailed explanation. We hope this revision offers a more precise and less vague discussion on this important point.

      “Despite the finding that eye movements mediate neural speech tracking, the behavioural relevance for semantic comprehension appears to differ between ocular and neural speech tracking. Specifically, we found a negative association between ocular speech tracking and comprehension, indicating that participants with lower comprehension performance exhibited increased ocular speech tracking. Interestingly, no significant relationship was observed between neural tracking and comprehension.

      In this context, the negative association between ocular tracking and comprehension might reflect individual differences in how participants allocate cognitive resources. Participants with lower comprehension may rely more heavily on attentional mechanisms to process acoustic features, as evidenced by increased ocular tracking. This reliance could represent a compensatory strategy when higher-order processes, such as semantic integration or memory retrieval, are less effective. Importantly, our comprehension questions (see Experimental Procedure) targeted a broad range of processes, including intelligibility and memory, suggesting that this relationship reflects a trade-off in resource allocation between low-level acoustic focus and integrative cognitive tasks.

      Rather than separating eye and brain responses conceptually, our analysis highlights their complementary contributions. Eye movements may enhance neural processing by increasing sensitivity to acoustic properties of speech, while neural activity builds on this foundation to integrate information and support comprehension. Together, these systems form an interdependent mechanism, with eye and brain responses working in tandem to facilitate different aspects of speech processing.

      This interpretation is consistent with the absence of a difference in ocular tracking for semantic violations (e.g., words with high surprisal versus lexically matched controls), reinforcing the view that ocular tracking primarily reflects attentional engagement with acoustic features rather than direct involvement in semantic processing. This aligns with previous findings that attention modulates auditory responses to acoustic features (e.g., Forte et al., 2017), further supporting the idea that ocular tracking reflects mechanisms of selective attention rather than representations of linguistic content.

      Future research should investigate how these systems interact and explore how ocular tracking mediates neural responses to linguistic features, such as lexical or semantic processing, to better understand their joint contributions to comprehension.”

      (14) Page numbers would be helpful.

      We added the page numbers.

      Reviewer #3 (Recommendations for the authors):

      Results

      (1) Figure 2 - statistical results are reported in this figure, but they are not fully explained in the text, nor are statistical values provided for any of the analyses (as far as I can tell).

      Also, how were multiple comparisons dealt with (the choice of two neighboring channels seems quite arbitrary)? Perhaps for this reason, the main result - namely the effect of "prediction tendency" and "semantic violations" - is quite sparse and might not survive a more rigorous statistical criterion. I would feel more comfortable with these results if the reporting of the statistical analysis had been more thorough (ideally, including comparison to control models).

      We would like to thank you again for your detailed queries, comments, and questions on our work. First, we adapted this figure (now Figure 3 in the manuscript; please see responses 8 and 9 to Reviewer #2) to help readers understand the metrics and values within each statistical analysis. In addition, you are right that the detailed statistics were missing from the text. We have now added the missing statistical reports, calculated as averages over ‘clusters’:

      “Replicating previous findings (Schubert et al., 2023), we found widespread encoding of clear speech (average over cluster: β = 0.035, 94%HDI = [0.024, 0.046]), predominantly over auditory processing regions (Fig. 3C), that was decreased (β = -0.018, 94%HDI = [-0.029, -0.006]) in a multi-speaker condition (Fig. 3D). Furthermore, a stronger prediction tendency was associated with increased neural speech tracking (β = 0.014, 94%HDI = [0.004, 0.025]) over left frontal sensors (see Fig. 3E). We found no interaction between prediction tendency and condition (see Fig. 3F).” [...] “In a direct comparison with lexically identical controls, we found an increased neural tracking of semantic violations (β = 0.039, 94%HDI = [0.007, 0.071]) over left temporal areas (see Fig. 3G). Furthermore, we found no interaction between word surprisal and speaker condition (see Fig. 3H).”

      Regarding the "prediction tendency" effect, it is important to note that this finding replicates a result from Schubert et al. (2023). The left frontal location of this effect is also consistent across studies, which convinces us of the robustness of the finding. Furthermore, testing this relationship properly requires a mixed-effects model in order to account for the repeated-measures design and for the variability across subjects that is not explained by fixed effects. For this reason a random intercept had to be fitted for each subject (1|subject in the respective model formula). This statistical requirement motivated our decision to use Bayesian statistics as (at least to our knowledge) there is no implementation of a cluster-based permutation mixed-effects model (yet). In order to provide a more conservative criterion (as Bayesian statistics do not require a multiple comparison correction) we chose to additionally impose the requirement of a “clustered” effect.

      The choice of using two neighboring channels is consistent with the default parameter settings in FieldTrip’s cluster-based permutation testing (cfg.minnbchan = 2). This parameter specifies the minimum number of neighboring channels required for a sample to be included in the clustering algorithm, ensuring spatial consistency in the identified clusters. This alignment ensures that our methodology is comparable to numerous prior studies in the field, where such thresholds are standard. While it is true that all statistical analyses involve some degree of arbitrariness in parameter selection (e.g., alpha levels or clustering thresholds), our approach reflects established conventions and ensures comparability with previous findings.
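      The clustering requirement can be illustrated with a short sketch. It implements one reading of the criterion stated in the figure caption (a sensor is marked only if it belongs to a group of at least two neighboring sensors that each show a significant result); it does not reproduce FieldTrip's clustering code, and the sensor labels and neighbor layout are hypothetical.

```python
def clustered_sensors(significant, neighbors, min_size=2):
    """Keep significant sensors that form a connected group of >= min_size.

    significant: iterable of sensor labels with a significant result
    neighbors:   dict mapping each sensor label to its adjacent sensors
    """
    sig = set(significant)
    visited, kept = set(), set()
    for start in sig:
        if start in visited:
            continue
        # Collect the connected component of significant sensors.
        stack, component = [start], set()
        while stack:
            ch = stack.pop()
            if ch in component:
                continue
            component.add(ch)
            stack.extend(n for n in neighbors.get(ch, ())
                         if n in sig and n not in component)
        visited |= component
        if len(component) >= min_size:
            kept |= component
    return kept


# Hypothetical layout: four sensors in a chain A - B - C - D.
layout = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}

# A and B are adjacent and both significant; D is significant but isolated.
marked = clustered_sensors({"A", "B", "D"}, layout)
```

      In this example `marked` contains A and B only: D is dropped because its single neighbor, C, shows no significant result.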

      While the original study utilized source space analyses, we replicated this effect using only 102 magnetometers. This choice was made for computational simplicity, demonstrating that the effect is robust even without source-level modeling. Similarly, the "semantic violation" effect, while perceived as sparse, is based solely on magnetometer data and - in our opinion - should not be viewed as overly sparse given the methods employed. This effect aligns with the two-neighbor clustering approach, ensuring spatial consistency across magnetometers. The results reflect the robustness of the effects within the constraints of magnetometer-level analyses.

      Overall, the methodological choices, including the choice of a Bayesian linear mixed-effects model, the use of two neighboring channels and the reliance on magnetometers, are grounded in established practices and methodological considerations. While stricter thresholds or alternative approaches might yield different results, our methods align with best practices in the field and ensure the robustness, comparability, and replicability of our findings.

      (2) Figure 3 - the difference between horizontal and vertical eye-movements. This result is quite confusing and although the authors do suggest a possible interpretation for this in the discussion, I do wonder how robust this difference is or whether the ocular signal (in either direction) is simply too noisy or the effect too small to be detected consistently across conditions. Also, the ocular-TRFs themselves are not entirely convincing in suggesting reliable response/tracking of the audio - despite the small-but-significant increase in prediction accuracy.

      The horizontal versus vertical comparison was conducted to explore potential differences in how these dimensions contribute to ocular tracking of auditory stimuli (please also see our response to Reviewer #1, Response 5b, which includes the vertical vs. horizontal effects of Gehmacher et al., 2024). It would indeed be interesting to develop a measure that combines the two directions into a more natural representation of 'viewing', such as a combined vector. However, this approach would require the use of complex numbers to represent both magnitude and direction simultaneously, and hence the development of novel TRF algorithms capable of modeling this multidimensional signal. While beyond the scope of the current study, this presents an exciting avenue for future research and would allow us to move closer to understanding ocular speech tracking and the robustness of these effects, above and beyond the already successful replication.
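      As a small illustration of what such a combined representation could look like, horizontal and vertical gaze samples can be packed into one complex number per time point, from which magnitude and direction are read off directly. This sketch covers only the representation, not a TRF model operating on it; the sample values and function names are hypothetical.

```python
import cmath
import math


def to_complex(horizontal, vertical):
    """One complex sample per time point: real = horizontal, imag = vertical."""
    return [complex(h, v) for h, v in zip(horizontal, vertical)]


def magnitude_and_direction(gaze):
    """Return (amplitude, angle in degrees) for each complex gaze sample."""
    return [(abs(z), math.degrees(cmath.phase(z))) for z in gaze]


# Hypothetical gaze offsets: rightward, upward, leftward.
horizontal = [1.0, 0.0, -1.0]
vertical = [0.0, 2.0, 0.0]

polar = magnitude_and_direction(to_complex(horizontal, vertical))
```

      The amplitude is independent of the (arbitrary) sign convention for left/right and up/down, whereas the angle carries the direction.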

      It is also important to emphasize that ocular-TRFs are derived from (viewing) behavioral data rather than neural signals, and are thus inherently subject to greater variability across participants and time. This higher variability does not necessarily indicate a small or unreliable effect but reflects the dynamic and task-dependent nature of eye movement behavior. The TRFs with shaded error margins represent this variability, highlighting how eye movements are influenced by both individual differences and moment-to-moment changes in task engagement.

      Despite this inherent variability, the significant prediction accuracy improvements confirm that ocular-TRFs reliably capture meaningful relationships between eye movements and auditory stimuli. The observed differences between horizontal and vertical TRFs further support the hypothesis that these dimensions are differentially involved in the task, possibly driven by the specific roles they play in sensorimotor coupling.

      (3) Figure 4 - this figure shows source distribution of 3 PCA components, derived from the results of the mediation effect of eye movements on the speech-tracking. Here too I am having difficulty in interpreting what the results actually are. For one, all three components are quite widespread and somewhat overlapping, so although they are statistically "independent" it is hard to learn much from them about the brain regions involved and whether they truly represent separable contributions. Similarly difficult to interpret are the time courses, which share some similarities with the known TRFs to speech (especially PC3). I would have expected to find a cleaner "auditory" response, and clearer separation between sensory regions and regions involved in the control of eye movements. I also wonder why the authors chose not to show the source localization of the neural and ocular speech-tracking responses alone - this could have helped us better understand what "mediation" of the neural response might look like.

      We appreciate the reviewer’s interest in better understanding the source distribution and time courses of the PCA components. While we acknowledge that the widespread and overlapping nature of the components may make a more fine-grained interpretation challenging, it is important to emphasize that our analysis simply reflects the data; hence we can only present and interpret what the analysis revealed.

      Regarding your suggestion to show the source localization of ocular speech tracking and neural speech tracking alone, we would like to point out that ocular tracking is represented by only one channel for vertical and one channel for horizontal eye movements. Thus, in this case the estimated source of the effect is the eyes themselves. We believe that the source localization of neural speech tracking has been thoroughly studied so far (locating it to perisylvian, auditory areas with a stronger preference for the left hemisphere) and can also be seen in Schubert et al. (2023). Nevertheless, we believe the observed PCA components still provide valuable and, most importantly, novel insights into the interplay between eye movements and neural responses in speech tracking.

      Discussion/interpretation

      (1) Although I appreciate the authors' attempt to propose a "unified" theoretical model linking predictions about low-level features to higher features, and the potential involvement of eye movements in 'active sensing' I honestly think that this model is overambitious, given the data presented in the current study. Moreover, there is very little discussion of past literature and existing models of active sensing and hierarchical processing of speech, that could have helped ground the discussion in a broader theoretical context. The entire discussion contains fewer than 20 citations (some of which are by these authors) and needs to be substantially enriched in order to provide context for the authors' claims.

      Thank you very much for your thoughtful feedback and for appreciating our approach. We hope that the revised manuscript addresses your concerns. Specifically, we now emphasize that our proposal is a conceptual framework, with the main goal to operationalize “prediction tendency”, “active ocular sensing”, and “selective attention” and to “organise these entities according to their assumed function for speech processing and to describe their relationship with each other.” We did this by thoroughly revising our discussion section with a clear emphasis on the definition of terms, for example:

      “With this speculative framework we attempt to describe and relate three important phenomena with respect to their relevance for speech processing: 1) “Anticipatory predictions” that are created in absence of attentional demands and contain probabilistic information about stimulus features (here, inferred from frequency-specific pre-activations during passive listening to sound sequences). 2) “Selective attention” that allocates resources towards relevant (whilst suppressing distracting) information (which was manipulated by the presence or absence of a distractor speaker). And finally 3) “active ocular sensing”, which refers to gaze behavior that is temporally aligned to attended (but not unattended) acoustic speech input (inferred from the discovered phenomenon of ocular speech tracking).”

      Our theoretical proposals are now followed by a recap of our results that support the respective idea, for example: 

      “...these predictions are formed in parallel and carry high feature-specificity but low temporal precision (as they are anticipatory in nature). This idea is supported by our finding that pure-tone anticipation is visible over a widespread prestimulus interval, instead of being locked to sound onset”

      “....we suggest that active (ocular) sensing does not necessarily convey feature- or content-specific information, it is merely used to boost (and conversely filter) sensory input at specific timescales (similar to neural oscillations). This assumption is supported by our finding that semantic violations are not differentially encoded in gaze behaviour than lexical controls.”

      We also put a strong focus on highlighting the boundaries of these ideas, in order to avoid theoretical confusion, misunderstandings or implicit theoretical assumptions that are not grounded in data, in particular:

      “In fact, when rejecting the notion of a single bottom-up flow of information and replacing it with a model of distributed parallel and dynamic processing, it seems only reasonable to assume that the direction of communication (between our eyes and our brain) will depend on where (within the brain) as well as when we look at the effect. Thus, the regions and time-windows reported here should be taken as an illustration of oculo-neural communication during speech processing rather than an attempt to "explain" neural speech processing by ocular movements.”

      “Even though the terminology [“hierarchy”] is suggestive of a fixed sequence (similar to a multi storey building) with levels that must be traversed one after each other (and even the more spurious idea of a rooftop, where the final perceptual experience is formed and stored into memory), we distance ourselves from these (possibly unwarranted) ideas. Our usage of “higher” or “lower” simply refers to the observation that the probability of a feature at a higher (as in more associative) level affects the interpretation (and thus the representation and prediction) of a feature at lower (as in more segregated) levels (Caucheteux et al., 2023).”

      Additionally, we have made substantial efforts to present complementary results (see response to Reviewer #2, point 8) to further substantiate our interpretation. Importantly, we have updated the illustration of the model (see response to Reviewer #, minor point 1) and refined both our interpretations and the conceptual language in the Discussion. Furthermore, we have included additional citations where appropriate to strengthen our argument.

      We would also like to briefly note that this section of the Discussion aimed to highlight existing literature that bridges the gap our model seeks to address. However, as this is a relatively underexplored area, the references available are necessarily limited.

      (2) Given my many reservations about the data, as presented in the current version of the manuscript, I find much of the discussion to be an over-interpretation of the results. This might change if the authors are able to present more robust results, as per some of my earlier comments.

      We sincerely hope that our comprehensive revisions have addressed your concerns and improved the manuscript to your satisfaction.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      The manuscript by Hariani et al. presents experiments designed to improve our understanding of the connectivity and computational role of Unipolar Brush Cells (UBCs) within the cerebellar cortex, primarily lobes IX and X. The authors develop and cross several genetic lines of mice that express distinct fluorophores in subsets of UBCs, combined with immunocytochemistry that also distinguishes subtypes of UBCs, and they use confocal microscopy and electrophysiology to characterize the electrical and synaptic properties of subsets of so-labelled cells, and their synaptic connectivity within the cerebellar cortex. The authors then generate a computer model to test the possible computational functions of such interconnected UBCs.

      Using these approaches, the authors report that:

      1) GRP-driven TDtomato is expressed exclusively in a subset (20%) of ON-UBCs, defined electrophysiologically (excited by mossy fiber afferent stimulation via activation of UBC AMPA and mGluR1 receptors) and immunocytochemically by their expression of mGluR1.

      2) UBCs ID'd/tagged by mCitrine expression in Brainbow mouse line P079 are expressed in a similar minority subset of OFF-UBCs defined electrophysiologically (inhibited by mossy fiber afferent stimulation via activation of UBC mGluR2 receptors) and immunocytochemically by their expression of Calretinin. However, such mCitrine expression was also detected in some mGluR1 positive UBCs, which may not have shown up electrophysiologically because of the weaker fluorophore expression without antibody amplification.

      This is correctly stated with the exception that the P079 mouse line itself expresses mCitrine. The Brainbow mouse line was used in the connectivity study by crossing it to the GRP-Cre or Calretinin-Cre lines.

      3) Confocal analysis of crossed lines of mice (GRP X P079) stained with antibodies to mGluR1 and calretinin documented the existence of all possible permutations of interconnectivity between cells (ON-ON, ON-OFF, OFF-OFF, OFF-ON), but their overall abundance was low, and neither their absolute nor relative abundance was quantified.

      They were certainly rare to observe using our approaches, but we reasoned that the densities of such connections are not possible to estimate accurately. Please see discussion below.

      4) A computational model (NEURON) indicated that the presence of an intermediary UBC (in a polysynaptic circuit from MF to UBC to UBC) could prolong bursts (MF-ON-ON), prolong pauses (MF-ON-OFF), cause a delayed burst (MF-OFF-OFF), cause a delayed pause (MF-OFF-ON) relative to solely MF to UBC synapses which would simply exhibit long bursts (MF-ON) or long pauses (MF-OFF).

      The authors thus conclude that the pattern of interconnected UBCs provides an extended and more nuanced pattern of firing within the cerebellar cortex that could mediate longer-lasting sensorimotor responses.

      The cerebellum's long-known role in motor skills and reflexes, and associated disorders, combined with our nascent understanding of its role in cognitive, emotional, and appetitive processing, makes understanding its circuitry and processing functions of broad interest to the neuroscience and biomedical community. The focus on UBCs, which are largely restricted to vestibular lobules of the cerebellum reduces the breadth of likely interest somewhat. The overall design of specific experiments is rigorous and the use of fluorophore expressing mouse lines is creative. The data that is presented and the writing are clear. However, the overall experimental design has issues that reduce overall interpretation (please see specific issues for details), which combined with a lack of thorough analysis of the experimental outcomes severely undermines the value of the NEURON model results and the advance in our understanding of cerebellar processing in situ (again, please see specific issues for details).

      Specific issues:

      1) All data gathered with inhibition blocked. All of the UBC response data (Fig. 1) was gathered in the presence of GABAAR and Glycine R blockers. While such an approach is appropriate generally for isolating glutamatergic synaptic currents, and specifically for examining and characterizing monosynaptic responses to single stimuli, it becomes problematic in the context of assaying synaptic and action potential response durations for long-lasting responses, and in particular for trains of stimuli, when feed-forward and feed-back inhibition modulates responses to afferent stimulation. That is, even for single MF stimuli, given the >500ms duration of UBC synaptic currents, there is plenty of time for feedback inhibition from Golgi cells (or feedforward, from MF to Golgi cell excitation) to interrupt AP firing driven by the direct glutamatergic synaptic excitation. This issue is compounded further for all of the experiments examining trains of MF stimuli. Beyond the impact of feedback inhibition on the AP firing of any given UBC, it would also obviously reduce/alter/interrupt that UBC's synaptic drive of downstream UBCs. This issue fundamentally undermines our ability to interpret the simulation data of Vm and AP firing of both the modeled intermediate and downstream UBC, in terms of applying it to possible cerebellar cortical processing in situ.

      The goal of Figure 1 was to determine the cell types of labeled UBCs in transgenic mouse lines, which are determined entirely by their synaptic responses to glutamate (Borges-Merjane and Trussell, 2015). Thus, blocking inhibition was essential to produce clear results in the characterization of GRP and P079 UBCs. While GABAergic/glycinergic feedforward and feedback inhibition is certainly important in the intact circuit, it was not our intention, nor was it possible, to study its contribution in the present study. Leaving inhibition unblocked does not lead to a physiologically realistic stimulation pattern in acute brain slices, because electrical stimulation produces synchronous excitation and inhibition by directly exciting Golgi cells, rather than their synaptic inputs. The main inhibition that UBCs receive, and that is crucial to determining burst or pause durations, is not via GABA/glycine, but instead through mGluR2, which lasts for 100-1000s of milliseconds. The main excitation that drives UBC firing is through mGluR1 and AMPA receptors, which both last 100-1000s of milliseconds. Thus, these large conductances are unlikely to be significantly shaped by 1-10 ms IPSCs from feedforward and feedback GABA/glycine inhibition. Recent studies that examined the duration of bursting or pausing in UBCs had inhibition blocked in their experiments, presumably for the reasons outlined above (Guo et al., 2021; Huson et al., 2023).

      In Author response image 1 is an example showing the synaptic currents and firing patterns in an ON UBC before and after blocking inhibition. The GABA/glycinergic inhibition is fast, occurs soon after the stimuli and has little to no effect on the slow inward current that develops after the end of stimulation, which is what drives firing for 100s of milliseconds.

      Author response image 1.

      Example showing small effect of GABAergic and glycinergic inhibition on excitatory currents and burst duration. A) Excitatory postsynaptic currents in response to train of 10 presynaptic stimuli at 50 Hz before (black) and after (grey) blocking GABA and glycine receptors. The slow inward current that occurs at the end of stimulation is little affected. B) Expanded view of the synaptic currents evoked during the train of stimuli. GABA/glycine receptors mediate the fast outward currents that occur immediately after the first couple of stimuli. C) Three examples of the bursts caused by the 50 Hz stimulation in the same cell without blocking GABA and glycine receptors. D) Three examples in the same cell after blocking GABA and glycine receptors.

      2) No consideration for the involvement of polysynaptic UBCs driving UBC responses to MF stimulation in electrophysiology experiments. Given the established existence (in this manuscript and Dino et al. 2000 Neurosci, Dino et al. 2000 ProgBrainRes, Nunzi and Mugnaini 2000 JCompNeurol, Nunzi et al. 2001 JCompNeurol) of polysynaptic connections from MFs to UBCs to UBCs, the MF evoked UBC responses established in this manuscript, especially responses to trains of stimuli could be mediated by direct MF inputs, or to polysynaptic UBC inputs, or possibly both (to my awareness not established either way). Thus the response durations could already include extension of duration by polysynaptic inputs, and so would overestimate the duration of monosynaptic inputs, and thus polysynaptic amplification/modulation, observed in the NEURON model.

      We are confident that the synaptic responses shown are monosynaptic for several reasons. UBCs receive a single mossy fiber input on their dendritic brush, and thus if our stimulation produces a reliable, short-latency response consistent with a monosynaptic input, then there is not likely to be a disynaptic input, because the main input is accounted for by the monosynaptic response. In all cells included in our data set, the fast AMPA receptor-mediated currents always occurred with short latency (1.24 ± 0.29 ms; mean ± SD; n = 13), high reliability (no failures to produce an EPSC in any of the 13 GRP UBCs in this data set), and low jitter (SD of latency; 0.074 ± 0.046 ms; mean ± SD; n = 13). These measurements have been added to the results section. In some rare cases, we did observe disynaptic currents, which were easily distinguishable because a single electrical stimulation produced a burst of EPSCs at variable latencies. Please see example in Author response image 2. These cases of disynaptic input, which have been reported by others (Diño et al., 2000; Nunzi and Mugnaini, 2000; van Dorp and De Zeeuw, 2015) support the conclusion that UBCs receive input from other UBCs.
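      The three criteria above (short latency, high reliability, low jitter) can be sketched as a small calculation. This is a hedged illustration rather than our analysis code: the function names and the latency values are hypothetical, jitter is taken as the within-cell SD of latency as defined above, and the use of the sample SD (n - 1 denominator) is an assumption.

```python
from statistics import mean, stdev


def summarize_cell(latencies_ms, n_stimuli):
    """Summarize evoked-EPSC latencies for one cell.

    latencies_ms: latency of the evoked EPSC on each trial in which one occurred.
    n_stimuli: total number of stimuli delivered to the cell.
    """
    failures = n_stimuli - len(latencies_ms)
    return {
        "mean_latency_ms": mean(latencies_ms),
        "jitter_ms": stdev(latencies_ms),  # jitter = SD of latency within the cell
        "failure_rate": failures / n_stimuli,
    }


def summarize_population(cell_summaries, key):
    """Return (mean, SD) of one per-cell measure across cells."""
    values = [summary[key] for summary in cell_summaries]
    return mean(values), stdev(values)


# Hypothetical latencies (ms) for three cells, for illustration only.
cells = [
    summarize_cell([1.0, 1.2, 1.4], n_stimuli=3),
    summarize_cell([1.5, 1.5, 1.6], n_stimuli=3),
    summarize_cell([0.9, 1.0, 1.1], n_stimuli=3),
]
print(summarize_population(cells, "mean_latency_ms"))
print(summarize_population(cells, "jitter_ms"))
```

      A disynaptic input of the kind shown in Author response image 2 would appear in such a summary as a much larger jitter value, because the latency of the first evoked EPSC varies from trial to trial.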

      Author response image 2.

      Example of GRP UBC with disynaptic input. Three examples of the effect of a single presynaptic stimulus (triangle) in a GRP UBC with presumed disynaptic input. Note the variable latency of the first evoked EPSC, bursts of EPSCs, and spontaneous EPSCs.

      3) Lack of quantification of subtypes of UBC interconnectivity. Given that it is already established that UBCs synapse onto other UBCs (see refs above), the main potential advance of this manuscript in terms of connectivity is the establishment and quantification of ON-ON, ON-OFF, OFF-ON, and OFF-OFF subtypes of UBC interconnections. But, the authors only establish that each type exists, showing specific examples, but no quantification of the absolute or relative density was provided, and the authors' unquantified wording explicitly or implicitly states that they are not common. This lack of quantification and likely small number makes it difficult to know how important or what impact such synapses have on cerebellar processing, in the model and in situ.

      As noted by the reviewer, the connections between UBCs were rare to observe. We decided against attempting to quantify the absolute or relative density of connections for several reasons. A major reason for the rare observations of anatomical connections between UBCs is likely the sparse labeling. First, the GRP mouse line only labels 20% of ON UBCs and we are unable to test whether postsynaptic connectivity of GRP ON UBCs is the same as that of the rest of the population of ON UBCs that are not labeled in the GRP mouse line. Second, the Brainbow reporter mouse only labels a small population of Cre expressing cells for unknown reasons. Third, the Brainbow reporter expression was so low that antibody amplification was necessary, which then limited the labeled cells to those close to the surface of the brain slices, because of known antibody penetration difficulties. Therefore, we refrained from estimating the density of these connections, because each of these variables reduced the labeling to unknown degrees and we reasoned that extrapolating our rare observations to the total population would be inaccurate.

      A paper that investigated UBC connectivity using organotypic slice cultures from P8 mice suggests that 2/3 of the UBC population receives UBC input, based on the observation that 2/3 of the mossy fibers did not degenerate as would be expected after 2 days in vitro if they were severed from a distant cell body (Nunzi and Mugnaini, 2000). It remains to be seen if this high proportion is due to the young age of these mice or is also the case in adult mice. Even if these connections are indeed rare, they are expected to have profound effects on the circuit, as each UBC has multiple mossy fiber terminals (Berthie and Axelrad, 1994), and mossy fiber terminals are estimated to contact 40 granule cells each (Jakab and Hamori, 1988). We have added a comment regarding this point to the discussion.

      4) Lack of critical parameters in NEURON model.

      A) The model uses # of molecules of glutamate released as the presumed quantal content, and this factor is constant. However, no consideration of changes in # of vesicles released from single versus trains of APs from MFs or UBCs is included. At most simple synapses, two sequential APs alters release probability, either up or down, and release probability changes dynamically with trains of APs. It is therefore reasonable to imagine UBC axon release probability is at least as complicated, and given the large surface area of contact between two UBCs, the number of vesicles released for any given AP is also likely more complex.

      B) the model does not include desensitization of AMPA receptors, which in the case of UBCs can paradoxically reduce response magnitude as vesicle release and consequent glutamate concentration in the cleft increases (Linney et al. 1997 JNeurophysiol, Lu et al. 2017 Neuron, Balmer et al. 2021 eLIFE), as would occur with trains of stimuli at MF to ON-UBCs.

      A) The model produces synaptic AMPA and mGluR2 currents that reproduce those we recorded in vitro. We did not find it necessary to implement changes in glutamate release during a train as the model was fit to UBC data with the assumption that the glutamate transient did not change during the train. If there is a change in neurotransmitter release during a train, it is therefore built into the model, which has the advantage of reducing its complexity. UBCs are a special case where the postsynaptic currents are mediated mostly by the total amount of transmitter released. Most of the evoked current occurs tens to hundreds of milliseconds after neurotransmitter release and is therefore much more sensitive to total release and less sensitive to how it is released during the train. Author response image 3 shows the effect of reducing the amount of glutamate released by 10% on each stimulus in the model. Despite a significant change in the pattern of neurotransmitter release, as well as a reduction in the total amount of glutamate, the slow EPSC still decays over the course of hundreds of milliseconds.

      Author response image 3.

      Effect of short-term depression of neurotransmitter release. A) The top trace shows the glutamate transient that drives the AMPA receptor model used in our study. No change in release is implemented, although the slow tail of each transient summates during the train. The bottom trace shows the modeled AMPA receptor mediated current. B) In this model the amount of glutamate released is reduced by 10% on each stimulus. The duration of the slow AMPA current that develops at the end of stimulation is similar, despite a profound change in the pattern of neurotransmitter exposure.
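      The size of the change in release described for panel B can be illustrated with a few lines of arithmetic. This is a minimal sketch under stated assumptions, not the NEURON model itself: the 10% reduction is assumed to compound multiplicatively from one stimulus to the next, the train is assumed to contain 10 stimuli (as in the 50 Hz trains used in our recordings), and release is expressed relative to the first stimulus. The function name is hypothetical.

```python
def release_per_stimulus(n_stimuli, fraction_kept=1.0, initial=1.0):
    """Relative amount of glutamate released on each stimulus of a train.

    fraction_kept=1.0 gives constant release; fraction_kept=0.9 reduces
    release by 10% on each successive stimulus (assumed multiplicative).
    """
    return [initial * fraction_kept ** i for i in range(n_stimuli)]


constant = release_per_stimulus(10)
depressing = release_per_stimulus(10, fraction_kept=0.9)

# Release on the last stimulus relative to the first, and total release
# relative to the constant-release train.
last_relative = depressing[-1] / depressing[0]
total_relative = sum(depressing) / sum(constant)
print(round(last_relative, 3), round(total_relative, 3))
```

      Under these assumptions the last stimulus releases roughly 39% of the first and the train as a whole releases roughly 65% of the constant-release total, which gives a sense of how large a change in the release pattern the slow current tolerates.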

      B) The detailed kinetic AMPA receptor model used here accurately reproduces desensitization, and in fact recovery from desensitization is what mediates the slow ON UBC current. This AMPA receptor is a 13-state model, including 4 open states with 1-4 glutamates bound, 4 closed states with 1-4 glutamates bound, 4 desensitized states with 1-4 glutamates bound, and 5 closed states with 0-4 glutamates bound. The forward and reverse rates between different states in the model were fit to AMPA receptor currents recorded from dissociated UBCs and they accurately reproduced the ON UBC currents evoked by synaptic stimulation in our previous work (Balmer et al., 2021).

      5) Lack of quantification of various electrophysiological responses. UBCs are defined (ON or OFF) based on inward or outward synaptic response, but no information is provided about the range of the key parameter of duration across cells, which seems most critical to the current considerations. There is a similar lack of quantification across cells of AP duration in response to stimulation or current injections, or during baseline. The latter lack is particularly problematic because, in agreement with previous publications, the raw data in Fig. 1 shows ON UBCs as quiescent until MF stimulation and OFF UBCs firing spontaneously until MF stimulation, but, for example, at least one ON UBC in the NEURON model is firing spontaneously until synaptically activated by an OFF UBC (Fig. 11A), and an OFF UBC is silent until stimulated by a presynaptic OFF UBC (Fig. 11C). This may be expected/explainable theoretically, but then such cells should be observed in the raw data.

      To address this reasonable concern about a general lack of quantification of electrophysiological responses, we have added data characterizing the slow inward and outward currents evoked by synaptic stimulation in GRP and P079 UBCs, including their peak times and decay time constants, to the results section and to new panels in Figure 1. We also report the action potential pause lengths in P079 UBCs and burst lengths in ON UBCs in the results section. However, we favor the duration of the currents over the length of burst and pause, because the currents do not depend on a stable resting membrane potential, which is itself difficult to determine in intracellular recordings of these small cells.

      In a series of recent publications that focused on UBC firing, the authors argue that cell-attached recordings are necessary to determine accurately the burst and pause lengths, as well as spontaneous firing rates (Guo et al., 2021; Huson et al., 2023). (The trade-off of these extracellular recordings is that the monosynaptic nature of the input is nearly impossible to confirm.) Spontaneous firing rates were variable within both GRP and P079 UBCs from silent to firing regularly or in bursts, as previously reported for UBCs (Kim et al., 2012; van Dorp and De Zeeuw, 2015). For clarity, we chose to model the GRP UBCs as silent unless receiving synaptic input and P079 UBCs as active unless receiving synaptic input. As the reviewer suggests, we have observed UBCs firing in the patterns similar to those shown in the model UBCs that have input from a spontaneously active presynaptic UBC. In Author response image 4 are some examples.

      Author response image 4.

      Examples of UBCs that receive spontaneous input. A) Three ON UBCs that had spontaneous EPSCs, suggesting the presence of an active presynaptic UBC. B) Two OFF UBCs that had spontaneous outward currents.

      Reviewer #2 (Public Review):

      In this paper, the authors presented a compelling rationale for investigating the role of UBCs in prolonging and diversifying signals. Based on the two types of UBCs known as ON and OFF UBC subtypes, they have highlighted the existing gaps in understanding UBCs connectivity and the need to investigate whether UBCs target UBCs of the same subtype, different subtypes, or both. The importance of this knowledge is for understanding how sensory signals are extended and diversified in the granule cell layer.

      The authors designed very interesting approaches to study UBCs connectivity by utilizing transgenic mice expressing GFP and RFP in UBCs, Brainbow approach, immunohistochemical and electrophysiological analysis, and computational models to understand how the feed-forward circuits of interconnected UBCs transform their inputs.

      This study provided evidence for the existence of distinct ON and OFF UBC subtypes based on their electrophysiological properties, anatomical characteristics, and expression patterns of mGluR1 and calretinin in the cerebellum. The findings support the classification of GRP UBCs as ON UBCs and P079 UBCs as OFF UBCs and suggest the presence of synaptic connections between the ON and OFF UBC subtypes. In addition, they found that GRP and P079 UBCs form parallel and convergent pathways and have different membrane capacitance and excitability. Furthermore, they showed that UBCs of the same subtype provide input to one another and modify the input to granule cells, which could provide a circuit mechanism to diversify and extend the pattern of spiking produced by mossy fiber input. Accordingly, they suggested that these transformations could provide a circuit mechanism for maintaining a sensory representation of movement for seconds.

      Overall, the article is well written in a sound detailed format, very interesting with excellent discovery and suggested model, however, I have some comments/suggestions that may help to improve this manuscript:

      • The discovery of UBCs innervating each other and their own subtypes, suggesting the presence of feed-forward networks in the cerebellum, is an incredibly fascinating and exciting finding followed by an intriguing model by authors. However, it is worth considering an alternative model as well. I acknowledge that visualizing such interactions using current tools and methods can be challenging ("The approaches used here were not able to determine the existence of networks of more than 2 UBCs connected one after the other. If present, 3 or more UBCs in series could extend and transform the input in even more dramatic ways. The temporal diversity that UBC circuits generate may underlie the flexibility of the cerebellum to coordinate movements over a broad range of behaviors."). Therefore, if this is the case in which more than 2 UBCs connected one after the other, then an alternative model PERHAPS resembles the basal nuclei, with its direct and indirect circuits, can be considered (maybe a type of circular model). The basal nuclei circuits are also regulated by modulators such as D1 dopamine receptors in the direct pathway, causing depolarization, and D2 dopamine receptors in the indirect pathway, resulting in hyperpolarization upon dopamine activation. This approach could involve using computational models to gain insight into potential alternatives within this pathway (may be a future direction).

      Thank you for this suggestion to consider the potentially similar circuit interactions in the basal nuclei. We will certainly investigate this further as we move forward with modeling the feed-forward networks in the cerebellum.

      • GRP UBCs are more densely distributed in lobes VI-IX, while P079 UBCs are more densely distributed in the dorsal leaflet of lobe X in sagittal sections. While the cerebellum is well known for its characteristic stripy pattern, are UBC distributions the same in coronal/transverse section?

      UBCs of different types, based on their expression of specific proteins, have overlapping but somewhat distinct distributions in coronal sections. The densities of calretinin-expressing UBCs are higher within Zebrin II positive zones and form sagittal stripes, whereas the densities of mGluR1-expressing and PLCb4-expressing UBCs vary less but are highest at the midline (Chung et al., 2009; Sekerkova et al., 2014). The difference noted by the reviewer between the dorsal and ventral leaflets of lobe X is the most distinct that we know of in the GRP and P079 populations.

      • The axons of both UBC subtypes are long enough to pass several UBCs, and some projections are even directed toward the white matter (e.g. Fig 9A), suggesting that they target UBCs or granule cells in other lobules. Does this suggest UBC connectivity between different lobules (perhaps longitudinal connectivity)? Is there any observation or information from coronal/transverse sections that visualizes mediolateral connectivity?

      This is certainly worth exploring in future work. UBCs have been reported to project their axons into and across the white matter (Diño et al., 2000). To our knowledge, whether UBCs project their axons out of one lobule and into another has not been examined.

      • The limitation in identifying networks involving more than two sequentially connected UBCs was briefly noted. I suggest including a paragraph describing limitations and discussing the implications of the findings would enhance the overall impact of the research and broaden our understanding of cerebellar function.

      • It is a pity that there is no clear conclusion to the discussion of this very interesting study. I suggest providing the key points as a conclusion.

      Thank you for these suggestions. Limitations and implications are included throughout the discussion section and we feel that the summary figure and significance statement now sufficiently convey the key conclusions of the study.

      • Please make the correction in Figure 2A by relabeling it as IXa, IXb, and IXc to correct the typographical error.

      Fixed

      • I recommend rotating Figure 7A to align its orientation with the other figures for consistency.

      Fixed

      Reviewer #1 (Recommendations For The Authors):

      Minor comments that should be addressed for clarity:

      1) In the NEURON model, why were the reversal potential for the leak conductance and Gmax for Ih different for the two types of UBCs? Relatedly, why is Erev for GABAB -95 mV if Ek is -90 mV?

      The h-current (Ih) was estimated from a hyperpolarizing current step in both cell types, and these data have been added to the Results section and as a panel in Figure 1. The conductance of Ih in the model cells was adjusted accordingly, with OFF UBCs having ~3 times that of ON UBCs, so that the model approximated the measured voltage sag, as we now describe in the Methods section. The reversal potential of the model mGluR2 current (which is based on a model of GABAB) has been fixed.

      2) Line 69 justification for their dual genetic approach is a bit too strong: "Paired recordings not possible". It may be difficult, but it is certainly possible.

      Reworded

      3) Confusing wording, only one stat for two parameters? Line 93: These currents were produced by both mGluR1 and AMPA receptors, as they were blocked by their antagonists JNJ16259685 and GYKI53655, respectively (92.86% ± 3.25; paired t-test; P=0.0066; n = 9; mean ± SEM) (Fig 1D-E).

      Reworded

      References

      Balmer TS, Borges-Merjane C, Trussell LO (2021) Incomplete removal of extracellular glutamate controls synaptic transmission and integration at a cerebellar synapse. eLife 10:e63819.

      Berthie B, Axelrad H (1994) Granular layer collaterals of the unipolar brush cell axon display rosette-like excrescences. A Golgi study in the rat cerebellar cortex. Neuroscience Letters 167:161–165.

      Borges-Merjane C, Trussell LO (2015) ON and OFF unipolar brush cells transform multisensory inputs to the auditory system. Neuron 85:1029–1042.

      Chung SH, Sillitoe RV, Croci L, Badaloni A, Consalez G, Hawkes R (2009) Purkinje cell phenotype restricts the distribution of unipolar brush cells. Neuroscience 164:1496–1508.

      Diño MR, Schuerger RJ, Liu Y-B, Slater NT, Mugnaini E (2000) Unipolar brush cell: a potential feedforward excitatory interneuron of the cerebellum. Neuroscience 98:625–636.

      Guo C, Huson V, Macosko EZ, Regehr WG (2021) Graded heterogeneity of metabotropic signaling underlies a continuum of cell-intrinsic temporal responses in unipolar brush cells. Nat Commun 12:5491.

      Huson V, Newman LN, Regehr WG (2023) A continuum of response properties across the population of Unipolar Brush Cells in the Dorsal Cochlear Nucleus. J Neurosci Available at: https://www.jneurosci.org/content/early/2023/07/26/JNEUROSCI.0873-23.2023 [Accessed August 15, 2023].

      Jakab RL, Hamori J (1988) Quantitative morphology and synaptology of cerebellar glomeruli in the rat. Anatomy and embryology 179:81–88.

      Kim JA, Sekerkova G, Mugnaini E, Martina M (2012) Electrophysiological, morphological, and topological properties of two histochemically distinct subpopulations of cerebellar unipolar brush cells. Cerebellum 11:1012–1025.

      Nunzi M-G, Mugnaini E (2000) Unipolar brush cell axons form a large system of intrinsic mossy fibers in the postnatal vestibulocerebellum. Journal of Comparative Neurology 422:55–65.

      Sekerkova G, Watanabe M, Martina M, Mugnaini E (2014) Differential distribution of phospholipase C beta isoforms and diaglycerol kinase-beta in rodents cerebella corroborates the division of unipolar brush cells into two major subtypes. Brain structure & function 219:719–749.

      van Dorp S, De Zeeuw CI (2015) Forward signaling by unipolar brush cells in the mouse cerebellum. Cerebellum 14:528–533.

    1. Author Response

      The following is the authors’ response to the original reviews.

      First and foremost, we would like to thank all the editors and reviewers for their thoughtful and thorough evaluations of our manuscript. We greatly appreciate their assessment of the novelty and strengths of this study and have revised the manuscript according to their recommendations. Below are our detailed responses and revisions based on the reviewer recommendations.

      Reviewer #1 (Recommendations For The Authors):

      1) The rationale for choosing the P35-42 adolescent window for stimulating the mesofrontal dopamine system is unclear.

      The dopaminergic innervation in the mesofrontal circuit exhibits a protracted maturation from P21 to P56 (Kalsbeek, Voorn et al. 1988, Niwa, Kamiya et al. 2010, Naneix, Marchand et al. 2012, Hoops and Flores 2017). P35-42 is in the center of this period and captures the mid-adolescent stage in rodents (Spear 2000). We have previously shown that increasing dopamine neuron activity by wheel running or optogenetic stimulation during this period, but not adulthood, can induce formation of mesofrontal dopaminergic boutons and enhance mesofrontal circuit activity in wild-type mice (Mastwal, Ye et al. 2014). We therefore chose the P35-P42 adolescent window to stimulate the mesofrontal dopamine circuit and test the long-term effect of this intervention on the frontal circuit and memory-guided decision-making deficits in mutant mice. We have detailed this rationale in the revised manuscript when we first introduced this intervention.

      2). Please provide a justification for choosing optical recording of M2 neuronal activity instead of the prelimbic prefrontal cortex, which is known to show the highest levels of dopamine terminals.

      While the prelimbic area has the highest level of dopamine terminals among frontal cortical regions, a robust presence of dopaminergic terminals and dopamine release in the M2 frontal cortex has been well documented (Berger, Gaspar et al. 1991, Mastwal, Ye et al. 2014, Aransay, Rodriguez-Lopez et al. 2015, Patriarchi, Cho et al. 2018). The M2 cortex plays an important role in action planning, generating the earliest neural signals among frontal cortical regions that are related to upcoming choice during spatial navigation (Sul, Kim et al. 2010, Sul, Jo et al. 2011). Our chemogenetic inactivation experiments (Supplementary Fig 1) have further confirmed the involvement of M2 in the memory-guided Y-maze navigation task used in this study. Technically, M2 has the advantage of being more amenable to optical recording of neuronal activity without the tissue damage caused by implanting a lens, which would be necessary for deeper areas such as the prelimbic cortex. We have provided this justification in the revised manuscript.

      3). What was the rationale for using the 3-day chemogenetic stimulation paradigm?

      Our previous work in wild-type adolescent mice showed that a single optogenetic stimulation session or a 2-hr wheel running session is sufficient to induce bouton formation in mesofrontal dopaminergic axons (Mastwal, Ye et al. 2014). In this study, we sought to rescue existing structural and functional deficits in the mesofrontal dopaminergic circuits due to genetic mutations. Because previous studies suggested that an optimal level of dopamine is important for normal cognitive function (Arnsten, Cai et al. 1994, Robbins 2000, Floresco 2013), we elected to do multiple stimulation sessions to boost the potential rescue effects. We tested both a 3-day and a 3-week stimulation paradigm, and found that the 3-day, but not the 3-week, paradigm led to robust functional improvement (Fig. 5). These results indicate that moderate but not excessive stimulation of dopamine neurons can provide functional improvement of a deficient mesofrontal circuit. We have revised our text to clarify the rationale for these experiments.

      4). A major maturational event occurring in the prefrontal cortex is the gain of local GABAergic transmission, which is crucial for sustaining proper levels of performance in Y-maze tasks. I am wondering if the authors have any thoughts about what is really happening at the postsynaptic level following adolescent dopamine stimulation.

      The developmental increases in dopaminergic innervation to the frontal cortex and local GABAergic transmission are likely synergistic processes, which both contribute to the maturation of high-order cognitive functions supported by the frontal cortex (Caballero and Tseng 2016, Larsen and Luna 2018). Previous electrophysiological studies have suggested that dopamine can act on five different receptors expressed in both excitatory and inhibitory postsynaptic neurons (Seamans and Yang 2004, Tseng and O'Donnell 2007, O'Donnell 2010). At the network level, dopaminergic signaling can increase the signal-to-noise ratio and temporal synchrony of neural activity during cognitive tasks (Rolls, Loh et al. 2008, Vander Weele, Siciliano et al. 2018, Lohani, Martig et al. 2019). As the frontal GABAergic inhibitory network undergoes major functional remodeling during adolescence (Caballero and Tseng 2016), adolescent stimulation of dopamine neurons may interact with this maturational process to promote a network configuration conducive to synchronous and high signal-to-noise neural computation (Porter, Rizzo et al. 1999, Murty, Calabro et al. 2016, Mukherjee, Carvalho et al. 2019). The microcircuit mechanisms underlying the changes induced by adolescent dopamine stimulation, particularly in the GABAergic inhibitory neurons, will be an exciting direction for future research. We have extended our discussion about these points in the revised manuscript.

      5). A change in the density of dopamine boutons is unlikely to be limited to the M2 region in Arc-/- mice. The authors should provide some data illustrating that similar changes are widespread across the medial prefrontal cortex, and that the optical recording in the M2 region was preferred for technical limitations and to avoid damaging areas in the frontal cortex.

      As discussed above, this study focused on the M2 region of the frontal cortex because it is functionally required for memory-guided Y-maze navigation, generates behavioral choice-related neural signals during spatial navigation, and is optically most accessible. The medial prefrontal regions (anterior cingulate, prelimbic and infralimbic) ventral to M2 also receive dense dopaminergic innervation and can act in concert with M2 in decision making (Sul, Kim et al. 2010, Sul, Jo et al. 2011, Barthas and Kwan 2017). As dopaminergic innervations to the frontal cortical regions progress in a ventral-to-dorsal direction during development (Kalsbeek, Voorn et al. 1988, Hoops and Flores 2017), how the changes induced by adolescent dopamine stimulation may proceed spatiotemporally across different frontal subregions requires more extensive investigation in the future. We have added this discussion into the revised manuscript.

      Reviewer #2 (Public Review):

      The manuscript by Mastwal and colleagues explores how transient adolescent stimulation of ventral midbrain neurons that project to the frontal cortex may help to improve performance on certain memory tasks. The manuscript provides an interesting set of observations that DREADD-based activation over only 3 days during adolescence provides a fast-acting and long-lasting improvement in performance on Y-maze spontaneous alternation as well as aspects of neuronal function as assessed using in vivo imaging methods. While interesting, there are several weaknesses. First and foremost, it is not clear that the effects the authors are observing are mediated by dopamine. It has been clearly documented that the DAT-Cre line provides a better representation of midbrain dopamine cells in the mouse, particularly near the midline of the ventral midbrain (Lammel et al., Neuron 2015). This is precisely where the cells that project to the frontal cortex are located. Therefore, the selection of TH-Cre is problematic. It is very likely that the authors are labeling a substantial number of non-dopaminergic cells.

      We agree with Reviewer 2 that the DAT-Cre line can provide specific labeling of midbrain dopamine neurons, particularly those projecting to the striatum, as discussed in the cited study (Lammel, Steinberg et al. 2015). DAT transports the extracellularly released dopamine back into presynaptic terminals, but it is not essential for dopamine synthesis and release (Sulzer, Cragg et al. 2016). Mesocortical dopamine neurons in the ventral tegmental area (VTA) express very little DAT (Sesack, Hawrylak et al. 1998, Lammel, Hetzel et al. 2008, Li, Qi et al. 2013), which limits the use of the DAT-Cre line to target these neurons (Lammel, Steinberg et al. 2015). Because mesocortical dopamine neurons have strong expression of TH, a key enzyme involved in dopamine synthesis, TH-Cre lines have been extensively used to study the mesocortical pathway (Lammel, Lim et al. 2012, Gunaydin, Grosenick et al. 2014, Ellwood, Patel et al. 2017, Vander Weele, Siciliano et al. 2018, Lohani, Martig et al. 2019). We provide more details below about our rationales for using TH-Cre rather than DAT-Cre mice in our study and the revisions we made in response to the reviewer’s specific recommendations.

      Reviewer #2 (Recommendations For The Authors):

      1). The authors should rigorously demonstrate that there is a reasonable midbrain DA projection to the coordinates that they are assessing and that their effects are due to DA release from these cells. It is not clear that there is a VTA dopaminergic projection to M2 - it does not appear for example in the Allen Mouse Brain Connectivity Atlas (https://connectivity.brain-map.org/projection/experiment/siv/160540751?imageId=160541123&imageType=TWO_PHOTON,SEGMENTATION&initImage=TWO_PHOTON&x=17321&y=15284&z=3). Though there is a projection to the mPFC, at the coordinates the authors report, there does not appear to be any signal from DAT-Cre mice. However, there is much more signal when expression is not restricted to dopamine cells (https://connectivity.brain-map.org/projection/experiment/siv/165975096?imageId=165975158&imageType=TWO_PHOTON,SEGMENTATION&initImage=TWO_PHOTON&x=17950&y=11504&z=3). The argument that these cells may express less TH is not relevant for this particular issue. Therefore, it is possible that the vast majority of observed effects are not in fact mediated by dopamine but another neurotransmitter such as glutamate. While the experiment using SCH23390 does suggest DA receptors may be involved, this result in isolation doesn't alleviate this caveat - there can be, for example, DA release from NE cells (e.g., Takeuchi et al., Nature 2016). While this does not entirely invalidate the authors' results, as their effects of stimulation of ventral midbrain cells to the forebrain don't necessarily have to occur via dopamine - the mechanism by which this is occurring needs to be clear.

      While the prelimbic area has the highest level of dopaminergic terminals among frontal cortical regions, a robust presence of midbrain dopaminergic projections and dopamine release in the M2 frontal cortex has been well established by immunostaining, viral labeling, single-cell axon-tracing, and in vivo imaging of recently developed dopamine biosensors (Berger, Gaspar et al. 1991, Mastwal, Ye et al. 2014, Aransay, Rodriguez-Lopez et al. 2015, Ye, Mastwal et al. 2017, Patriarchi, Cho et al. 2018). It has also been reported repeatedly that mesocortical dopamine neurons in the VTA express very little DAT, which is different from mesostriatal dopamine neurons (Sesack, Hawrylak et al. 1998, Lammel, Hetzel et al. 2008, Li, Qi et al. 2013). This limitation in the use of the DAT-Cre line to target mesocortical dopamine neurons has been acknowledged in previous studies (Lammel, Steinberg et al. 2015) and is consistent with the reviewer’s observation of DAT-Cre labeling in the Allen Mouse Brain Connectivity Atlas. Additionally, and interestingly, recent extensive evaluation of the DAT-Cre line reported ectopic labeling of multiple non-dopaminergic neuronal populations (Soden, Miller et al. 2016, Stagkourakis, Spigolon et al. 2018, Papathanou, Dumas et al. 2019). Our own evaluation of the DAT-Cre line’s utility for cortical imaging also revealed sparse axonal labeling and sporadic ectopic labeling of cortical cell somas. We have included representative DAT-Cre images in Author response image 1 to highlight the limitations of this line in the study of the dopaminergic mesocortical circuit.

      Author response image 1.

      Example images from DAT-Cre/Ai14 mice. The leftmost panel shows little axonal labeling in Layer 5/6 of M2. The center panel shows sparse axonal labeling in Layer 1/2 of M2, but also ectopic labeling of cell somas. The right panel shows a lack of labeling in Layer 1/2 of prelimbic cortex as well. Scale bars: 50 µm.

      We, as well as others, have confirmed that TH immunoreactivity in the frontal cortex can label dopaminergic axons originating from the VTA, and ablation of VTA dopaminergic neurons removes this labeling (Niwa, Jaaro-Peled et al. 2013, Ye, Mastwal et al. 2017). Because mesocortical dopamine neurons have much stronger TH expression than DAT expression (Sesack, Hawrylak et al. 1998, Lammel, Hetzel et al. 2008, Li, Qi et al. 2013, Lammel, Steinberg et al. 2015), TH-Cre lines have been frequently used to label these neurons and study the mesocortical pathway (Lammel, Lim et al. 2012, Gunaydin, Grosenick et al. 2014, Ellwood, Patel et al. 2017, Vander Weele, Siciliano et al. 2018, Lohani, Martig et al. 2019). While TH-Cre expression itself is not restricted to dopaminergic neurons, we targeted our viral injections to the VTA and optogenetic stimulation to the cortical dopaminergic projection target area in M2 (Patriarchi, Cho et al. 2018) to specifically modulate mesofrontal dopaminergic axons. In addition, we tested the effects of a D1 antagonist in our manipulations. Although we targeted dopamine neurons in our adolescent stimulation, the final behavioral outcome likely includes contributions from co-released neurotransmitters such as glutamate and non-dopaminergic neurons via network effects (Morales and Margolis 2017, Lohani, Martig et al. 2019), which will be interesting directions for future research. We have revised our results and discussion sections to highlight our rationales for using the TH-Cre line and the open mechanistic questions for future studies.

      2) SSFOs don't increase excitability like DREADDs, but rather, cause long-lasting hyperactivity through continuous passage of cations. What the actual firing properties are of these cells over a long period of time is not clear.

      We did not measure the precise firing patterns of the dopaminergic neurons targeted by SSFOs but evaluated the effects of SSFO activation on the frontal cortex. Similar to our DREADD-Gq mediated activity changes in the mesofrontal circuit, we found increased frontal cortical activity post-light stimulation of frontal dopamine axons in our SSFO treated animals (Fig 6a-c, S6e). While quantitatively the firing patterns of DREADD-Gq and SSFO activated dopaminergic neurons likely differ, qualitatively both of these manipulations lead to increased mesofrontal circuit activity and improvements in cognitive behaviors. In our previous work with wild-type adolescent mice, both wheel running and a single 10-min session of phasic optogenetic stimulation of the VTA resulted in dopaminergic bouton outgrowth in the frontal cortex (Mastwal, Ye et al. 2014). Taken together, these results suggest that adolescent dopaminergic mesofrontal projections are highly responsive to neural activity changes and a variety of adolescent stimulation paradigms are sufficient to elicit lasting changes in this circuit. We have added this discussion of the limitations and implications of our study into the revised manuscript.

      3) It is not clear what the increase in boutons means, given that DA release is thought to largely occur via non-synaptic release.

      Although many dopamine boutons are not associated with defined postsynaptic structures, these axonal boutons and the active zones they contain are the major release sites for dopamine (Goldman-Rakic, Leranth et al. 1989, Arbuthnott and Wickens 2007, Sulzer, Cragg et al. 2016, Liu, Goel et al. 2021). Past studies have established a consistent association between increased dopaminergic innervation in the frontal cortex and an increase in dopamine levels (Niwa, Kamiya et al. 2010, Naneix, Marchand et al. 2012). Our previous work also found that increasing dopaminergic boutons through adolescent VTA stimulation led to prolonged frontal local field potential responses with high-frequency oscillations (Mastwal, Ye et al. 2014), which is characteristic of increased dopaminergic signaling (Lewis and O'Donnell 2000, Gireesh and Plenz 2008, Wood, Kim et al. 2012, Lohani, Martig et al. 2019). Importantly, in our quantification of the structural changes in this study, we evaluated boutons which were labeled with synaptophysin, a molecular marker indicating the presence of synaptic vesicle release machinery (Li, Tasic et al. 2010, Oh, Harris et al. 2014). Thus, our study, taken in the context of the previous work, suggests that the increased number of boutons signifies an increase in dopaminergic signaling within the mesofrontal circuit. We have added this discussion into the revised manuscript.

      4) The use of Arc and DISC mutants as models of schizophrenia is perhaps a bit overstated - while deficits in prefrontal innervation certainly occur, there are many differences between these models and the human disease states. Language should be toned down accordingly, particularly in the introduction.

      We strived to avoid overstating the extent to which the mouse lines are models for specific diseases, but we can appreciate that this may not have been clear in our original writing. We have adjusted our language to better distinguish between the utility of the animal models for the purposes of our study and their relationship to specific human disease states. Particularly in the introduction, we stated that: “Genetic disruptions of several genes involved in synaptic functions related to psychiatric disorders, such as Arc and DISC1, lead to hypoactive mesofrontal dopaminergic input in mice (Niwa, Kamiya et al. 2010, Niwa, Jaaro-Peled et al. 2013, Fromer, Pocklington et al. 2014, Purcell, Moran et al. 2014, Wen, Nguyen et al. 2014, Manago, Mereu et al. 2016). Although there are many differences between these mouse lines and specific human disease states, these mice offer opportunities to test whether genetic deficits in frontal cortex function can be reversed through circuit interventions.”

      5) Some experiments are missing proper controls, e.g., Figure 3g-i where a WT mouse should be used as a positive control.

      The goal of this experimental design (Fig 3g-i) was to evaluate the potential effects of chemogenetic VTA stimulation in the Arc-/- mice. We used Arc-/- mice with mCherry injections to control for the potential effects of CNO administration. While WT mice could be used to determine if adolescent VTA stimulation would lead to long-lasting enhancement of VTA-to-cortical transmission, this wouldn’t necessarily be a positive control for our experiments, but rather a separate line of inquiry. As dopamine’s effects often display an inverted-U dose-response curve (Vijayraghavan, Wang et al. 2007, Floresco 2013), evaluating the effects of adolescent VTA stimulation in the absence of underlying dopamine deficiency could be an interesting future research direction. We have added this discussion into the revised manuscript.

      Reviewer #3 (Recommendations For The Authors):

      1) Did the SSFO stimulation of the TH+ axons in PFC during adolescence lead to the same long-term change in DA bouton number the authors saw with DREADDs?

      We did not examine the degree of bouton growth in the SSFO cohort, which is a limitation of this study. Accurate quantification of dopamine boutons requires the co-injection of another AAV vector encoding Synaptophysin-GFP to label the boutons. Because we used light to directly stimulate SSFO-labeled dopaminergic axons in the frontal cortex, we were concerned that co-injecting another AAV vector may dilute SSFO-labeling of axons and reduce the efficacy of optogenetic stimulation. Given the behavioral benefits we observed, we would expect an increase in bouton density after optogenetic stimulation. A systematic optimization of viral co-labeling and optogenetic stimulation protocols will facilitate examination of the impact of SSFO stimulation at the structural level in future studies. We have added a discussion of the limitation of this study in the revised manuscript.

      2) The DISC1 section is far less detailed than the Arc section, and it was not completely clear to me that the mechanisms of dysfunction and rescue were the same in these mice compared with the Arc mice. For example, there was no mention of DA bouton density or the patterned firing of the PFC neurons at the time of decision making.

      The initial motivation of this study was to test if adolescent dopamine stimulation can rescue the deficits in the mesofrontal dopaminergic circuit and cognitive function of Arc-/- mice, which were identified in our previous studies (Manago, Mereu et al. 2016). We first conducted multiple levels of analyses including viral tracing, in vivo calcium imaging, and behavioral tests to establish the coherent impacts of adolescent dopamine neuron stimulation on circuits and behaviors. We then examined a range of stimulation protocols to assess the efficacy requirements for cognitive improvement, which is our primary goal. Finally, we included DISC1 mice in our study to test if adolescent dopamine stimulation can also reverse the cognitive deficit in another genetic model for mesofrontal dopamine deficiency. By demonstrating a similar cognitive rescue effect of adolescent VTA stimulation in an independent mouse model, this study provides a foundation for future research to compare the detailed cellular mechanisms that underlie the functional rescue in different genetic models. We have added the discussion of the scope and limitation of this study to the revised manuscript.

      References

      Aransay, A., C. Rodriguez-Lopez, M. Garcia-Amado, F. Clasca and L. Prensa (2015). "Long-range projection neurons of the mouse ventral tegmental area: a single-cell axon tracing analysis." Front Neuroanat 9: 59.

      Arbuthnott, G. W. and J. Wickens (2007). "Space, time and dopamine." Trends Neurosci 30(2): 62-69.

      Arnsten, A. F., J. X. Cai, B. L. Murphy and P. S. Goldman-Rakic (1994). "Dopamine D1 receptor mechanisms in the cognitive performance of young adult and aged monkeys." Psychopharmacology (Berl) 116(2): 143-151.

      Barthas, F. and A. C. Kwan (2017). "Secondary motor cortex: where ‘sensory’ meets ‘motor’ in the rodent frontal cortex." Trends in neurosciences 40(3): 181-193.

      Berger, B., P. Gaspar and C. Verney (1991). "Dopaminergic innervation of the cerebral cortex: unexpected differences between rodents and primates." Trends Neurosci 14(1): 21-27.

      Caballero, A. and K. Y. Tseng (2016). "GABAergic Function as a Limiting Factor for Prefrontal Maturation during Adolescence." Trends Neurosci 39(7): 441-448.

      Ellwood, I. T., T. Patel, V. Wadia, A. T. Lee, A. T. Liptak, K. J. Bender and V. S. Sohal (2017). "Tonic or Phasic Stimulation of Dopaminergic Projections to Prefrontal Cortex Causes Mice to Maintain or Deviate from Previously Learned Behavioral Strategies." J Neurosci 37(35): 8315-8329.

      Floresco, S. B. (2013). "Prefrontal dopamine and behavioral flexibility: shifting from an "inverted-U" toward a family of functions." Front Neurosci 7: 62.

      Fromer, M., A. J. Pocklington, D. H. Kavanagh, H. J. Williams, S. Dwyer, P. Gormley, L. Georgieva, E. Rees, P. Palta, D. M. Ruderfer, N. Carrera, I. Humphreys, J. S. Johnson, P. Roussos, D. D. Barker, E. Banks, V. Milanova, S. G. Grant, E. Hannon, S. A. Rose, K. Chambert, M. Mahajan, E. M. Scolnick, J. L. Moran, G. Kirov, A. Palotie, S. A. McCarroll, P. Holmans, P. Sklar, M. J. Owen, S. M. Purcell and M. C. O'Donovan (2014). "De novo mutations in schizophrenia implicate synaptic networks." Nature 506(7487): 179-184.

      Gireesh, E. D. and D. Plenz (2008). "Neuronal avalanches organize as nested theta- and beta/gamma-oscillations during development of cortical layer 2/3." Proc Natl Acad Sci U S A 105(21): 7576-7581.

      Goldman-Rakic, P. S., C. Leranth, S. M. Williams, N. Mons and M. Geffard (1989). "Dopamine synaptic complex with pyramidal neurons in primate cerebral cortex." Proc Natl Acad Sci U S A 86(22): 9015-9019.

      Gunaydin, L. A., L. Grosenick, J. C. Finkelstein, I. V. Kauvar, L. E. Fenno, A. Adhikari, S. Lammel, J. J. Mirzabekov, R. D. Airan, K. A. Zalocusky, K. M. Tye, P. Anikeeva, R. C. Malenka and K. Deisseroth (2014). "Natural neural projection dynamics underlying social behavior." Cell 157(7): 1535-1551.

      Hoops, D. and C. Flores (2017). "Making Dopamine Connections in Adolescence." Trends Neurosci 40(12): 709-719.

      Kalsbeek, A., P. Voorn, R. M. Buijs, C. W. Pool and H. B. Uylings (1988). "Development of the dopaminergic innervation in the prefrontal cortex of the rat." J Comp Neurol 269(1): 58-72.

      Lammel, S., A. Hetzel, O. Haeckel, I. Jones, B. Liss and J. Roeper (2008). "Unique properties of mesoprefrontal neurons within a dual mesocorticolimbic dopamine system." Neuron 57(5): 760-773.

      Lammel, S., B. K. Lim, C. Ran, K. W. Huang, M. J. Betley, K. M. Tye, K. Deisseroth and R. C. Malenka (2012). "Input-specific control of reward and aversion in the ventral tegmental area." Nature 491(7423): 212-217.

      Lammel, S., E. E. Steinberg, C. Foldy, N. R. Wall, K. Beier, L. Luo and R. C. Malenka (2015). "Diversity of transgenic mouse models for selective targeting of midbrain dopamine neurons." Neuron 85(2): 429-438.

      Larsen, B. and B. Luna (2018). "Adolescence as a neurobiological critical period for the development of higher-order cognition." Neurosci Biobehav Rev 94: 179-195.

      Lewis, B. L. and P. O'Donnell (2000). "Ventral tegmental area afferents to the prefrontal cortex maintain membrane potential 'up' states in pyramidal neurons via D(1) dopamine receptors." Cereb Cortex 10(12): 1168-1175.

      Li, L., B. Tasic, K. D. Micheva, V. M. Ivanov, M. L. Spletter, S. J. Smith and L. Luo (2010). "Visualizing the distribution of synapses from individual neurons in the mouse brain." PLoS One 5(7): e11503.

      Li, X., J. Qi, T. Yamaguchi, H. L. Wang and M. Morales (2013). "Heterogeneous composition of dopamine neurons of the rat A10 region: molecular evidence for diverse signaling properties." Brain Struct Funct 218(5): 1159-1176.

      Liu, C., P. Goel and P. S. Kaeser (2021). "Spatial and temporal scales of dopamine transmission." Nat Rev Neurosci 22(6): 345-358.

      Lohani, S., A. K. Martig, K. Deisseroth, I. B. Witten and B. Moghaddam (2019). "Dopamine Modulation of Prefrontal Cortex Activity Is Manifold and Operates at Multiple Temporal and Spatial Scales." Cell Rep 27(1): 99-114 e116.

      Manago, F., M. Mereu, S. Mastwal, R. Mastrogiacomo, D. Scheggia, M. Emanuele, M. A. De Luca, D. R. Weinberger, K. H. Wang and F. Papaleo (2016). "Genetic Disruption of Arc/Arg3.1 in Mice Causes Alterations in Dopamine and Neurobehavioral Phenotypes Related to Schizophrenia." Cell Rep 16(8): 2116-2128.

      Mastwal, S., Y. Ye, M. Ren, D. V. Jimenez, K. Martinowich, C. R. Gerfen and K. H. Wang (2014). "Phasic dopamine neuron activity elicits unique mesofrontal plasticity in adolescence." J Neurosci 34(29): 9484-9496.

      Morales, M. and E. B. Margolis (2017). "Ventral tegmental area: cellular heterogeneity, connectivity and behaviour." Nat Rev Neurosci 18(2): 73-85.

      Mukherjee, A., F. Carvalho, S. Eliez and P. Caroni (2019). "Long-Lasting Rescue of Network and Cognitive Dysfunction in a Genetic Schizophrenia Model." Cell 178(6): 1387-1402 e1314.

      Murty, V. P., F. Calabro and B. Luna (2016). "The role of experience in adolescent cognitive development: Integration of executive, memory, and mesolimbic systems." Neurosci Biobehav Rev 70: 46-58.

      Naneix, F., A. R. Marchand, G. Di Scala, J. R. Pape and E. Coutureau (2012). "Parallel maturation of goal-directed behavior and dopaminergic systems during adolescence." J Neurosci 32(46): 16223-16232.

      Niwa, M., H. Jaaro-Peled, S. Tankou, S. Seshadri, T. Hikida, Y. Matsumoto, N. G. Cascella, S. Kano, N. Ozaki, T. Nabeshima and A. Sawa (2013). "Adolescent stress-induced epigenetic control of dopaminergic neurons via glucocorticoids." Science 339(6117): 335-339.

      Niwa, M., A. Kamiya, R. Murai, K. Kubo, A. J. Gruber, K. Tomita, L. Lu, S. Tomisato, H. Jaaro-Peled, S. Seshadri, H. Hiyama, B. Huang, K. Kohda, Y. Noda, P. O'Donnell, K. Nakajima, A. Sawa and T. Nabeshima (2010). "Knockdown of DISC1 by in utero gene transfer disturbs postnatal dopaminergic maturation in the frontal cortex and leads to adult behavioral deficits." Neuron 65(4): 480-489.

      O'Donnell, P. (2010). "Adolescent maturation of cortical dopamine." Neurotox Res 18(3-4): 306-312.

      Oh, S. W., J. A. Harris, L. Ng, B. Winslow, N. Cain, S. Mihalas, Q. Wang, C. Lau, L. Kuan, A. M. Henry, M. T. Mortrud, B. Ouellette, T. N. Nguyen, S. A. Sorensen, C. R. Slaughterbeck, W. Wakeman, Y. Li, D. Feng, A. Ho, E. Nicholas, K. E. Hirokawa, P. Bohn, K. M. Joines, H. Peng, M. J. Hawrylycz, J. W. Phillips, J. G. Hohmann, P. Wohnoutka, C. R. Gerfen, C. Koch, A. Bernard, C. Dang, A. R. Jones and H. Zeng (2014). "A mesoscale connectome of the mouse brain." Nature 508(7495): 207-214.

      Papathanou, M., S. Dumas, H. Pettersson, L. Olson and A. Wallen-Mackenzie (2019). "Off-Target Effects in Transgenic Mice: Characterization of Dopamine Transporter (DAT)-Cre Transgenic Mouse Lines Exposes Multiple Non-Dopaminergic Neuronal Clusters Available for Selective Targeting within Limbic Neurocircuitry." eNeuro 6(5).

      Patriarchi, T., J. R. Cho, K. Merten, M. W. Howe, A. Marley, W. H. Xiong, R. W. Folk, G. J. Broussard, R. Liang, M. J. Jang, H. Zhong, D. Dombeck, M. von Zastrow, A. Nimmerjahn, V. Gradinaru, J. T. Williams and L. Tian (2018). "Ultrafast neuronal imaging of dopamine dynamics with designed genetically encoded sensors." Science 360(6396): 1420-+.

      Porter, L. L., E. Rizzo and J. P. Hornung (1999). "Dopamine affects parvalbumin expression during cortical development in vitro." J Neurosci 19(20): 8990-9003.

      Purcell, S. M., J. L. Moran, M. Fromer, D. Ruderfer, N. Solovieff, P. Roussos, C. O'Dushlaine, K. Chambert, S. E. Bergen, A. Kahler, L. Duncan, E. Stahl, G. Genovese, E. Fernandez, M. O. Collins, N. H. Komiyama, J. S. Choudhary, P. K. Magnusson, E. Banks, K. Shakir, K. Garimella, T. Fennell, M. DePristo, S. G. Grant, S. J. Haggarty, S. Gabriel, E. M. Scolnick, E. S. Lander, C. M. Hultman, P. F. Sullivan, S. A. McCarroll and P. Sklar (2014). "A polygenic burden of rare disruptive mutations in schizophrenia." Nature 506(7487): 185-190.

      Robbins, T. W. (2000). "Chemical neuromodulation of frontal-executive functions in humans and other animals." Exp Brain Res 133(1): 130-138.

      Rolls, E. T., M. Loh, G. Deco and G. Winterer (2008). "Computational models of schizophrenia and dopamine modulation in the prefrontal cortex." Nat Rev Neurosci 9(9): 696-709.

      Seamans, J. K. and C. R. Yang (2004). "The principal features and mechanisms of dopamine modulation in the prefrontal cortex." Prog Neurobiol 74(1): 1-58.

      Sesack, S. R., V. A. Hawrylak, C. Matus, M. A. Guido and A. I. Levey (1998). "Dopamine axon varicosities in the prelimbic division of the rat prefrontal cortex exhibit sparse immunoreactivity for the dopamine transporter." J Neurosci 18(7): 2697-2708.

      Soden, M. E., S. M. Miller, L. M. Burgeno, P. E. M. Phillips, T. S. Hnasko and L. S. Zweifel (2016). "Genetic Isolation of Hypothalamic Neurons that Regulate Context-Specific Male Social Behavior." Cell Rep 16(2): 304-313.

      Spear, L. (2000). "Modeling adolescent development and alcohol use in animals." Alcohol Res Health 24(2): 115-123.

      Stagkourakis, S., G. Spigolon, P. Williams, J. Protzmann, G. Fisone and C. Broberger (2018). "A neural network for intermale aggression to establish social hierarchy." Nat Neurosci 21(6): 834-842.

      Sul, J. H., S. Jo, D. Lee and M. W. Jung (2011). "Role of rodent secondary motor cortex in value-based action selection." Nat Neurosci 14(9): 1202-1208.

      Sul, J. H., H. Kim, N. Huh, D. Lee and M. W. Jung (2010). "Distinct roles of rodent orbitofrontal and medial prefrontal cortex in decision making." Neuron 66(3): 449-460.

      Sulzer, D., S. J. Cragg and M. E. Rice (2016). "Striatal dopamine neurotransmission: regulation of release and uptake." Basal Ganglia 6(3): 123-148.

      Tseng, K. Y. and P. O'Donnell (2007). "Dopamine modulation of prefrontal cortical interneurons changes during adolescence." Cereb Cortex 17(5): 1235-1240.

      Vander Weele, C. M., C. A. Siciliano, G. A. Matthews, P. Namburi, E. M. Izadmehr, I. C. Espinel, E. H. Nieh, E. H. S. Schut, N. Padilla-Coreano, A. Burgos-Robles, C. J. Chang, E. Y. Kimchi, A. Beyeler, R. Wichmann, C. P. Wildes and K. M. Tye (2018). "Dopamine enhances signal-to-noise ratio in cortical-brainstem encoding of aversive stimuli." Nature 563(7731): 397-401.

      Vijayraghavan, S., M. Wang, S. G. Birnbaum, G. V. Williams and A. F. Arnsten (2007). "Inverted-U dopamine D1 receptor actions on prefrontal neurons engaged in working memory." Nat Neurosci 10(3): 376-384.

      Wen, Z., H. N. Nguyen, Z. Guo, M. A. Lalli, X. Wang, Y. Su, N. S. Kim, K. J. Yoon, J. Shin, C. Zhang, G. Makri, D. Nauen, H. Yu, E. Guzman, C. H. Chiang, N. Yoritomo, K. Kaibuchi, J. Zou, K. M. Christian, L. Cheng, C. A. Ross, R. L. Margolis, G. Chen, K. S. Kosik, H. Song and G. L. Ming (2014). "Synaptic dysregulation in a human iPS cell model of mental disorders." Nature 515(7527): 414-418.

      Wood, J., Y. Kim and B. Moghaddam (2012). "Disruption of prefrontal cortex large scale neuronal activity by different classes of psychotomimetic drugs." J Neurosci 32(9): 3022-3031.

      Ye, Y., S. Mastwal, V. Y. Cao, M. Ren, Q. Liu, W. Zhang, A. G. Elkahloun and K. H. Wang (2017). "Dopamine is Required for Activity-Dependent Amplification of Arc mRNA in Developing Postnatal Frontal Cortex." Cereb Cortex 27(7): 3600-3608.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer 1:

      The main weaknesses of the paper are a lack of significance in key findings, and relatedly, concluding effects from insignificant findings. Additional elements could be improved to help strengthen this overall well-rounded and intriguing set of results.

      In the original manuscript, we reported that chemogenetic silencing of POA-social neurons (previously called POA-iso neurons; more details on rationale for renaming below in our responses to reviewer recommendations) tended to reduce mounting in both single-housed female and single-housed male mice, although these effects were non-significant. We have added samples to both datasets and now report that chemogenetic silencing of POA-social neurons significantly reduces the proportion of trials with mounting in both sexes (Fig. 2C and Fig. 6G). 

      We have also included new analyses to test whether optogenetic activation of POA-social neurons in group-housed females promotes social investigation (in addition to USV production, as reported in the original manuscript). We now report that optogenetic activation of POA-social neurons significantly increases the probability of social investigation (Fig. 4E-F) and significantly increases the duration of social investigation bouts (Fig. 4G).

      Additional recommendations from the reviewer are addressed in detail below. Thank you for your critical and insightful feedback.

      Reviewer 2:

      All the activity-dependent labeling experiments with TRAP mice, including the subsequent neural activity manipulation experiments (Figures 2, 3, 4, 5E-F), were conducted by labeling neurons only in socially isolated animals, not group-housed animals. The authors labeled neurons after 30-minute social interactions, raising the possibility that the labeled neurons simply represent a "social interaction/behavior population" (mediating mounting and USVs in females and males) rather than a set of neurons specific to social isolation.

      I strongly recommend including experimental groups that involve labeling neurons after 30-minute social interactions in group-housed female or male mice and inhibit TRAPed neurons after social isolation or activate TRAPed neurons after group housing. If manipulating the group-housed TRAP neurons has similar effects to manipulating the isolated TRAP neurons, it would suggest the current labeling paradigm is not isolating neurons specific to the effect of social isolation per se. Rather, the neurons may mediate more general social interaction or motivation-related activities. Given the known role of POA in male mating behavior, a group-housed TRAP experiment in males with a female visitor is especially important for understanding the selectivity of the labeled cells.

      Without proper controls, referring to the labeled neurons as "POAiso" neurons is potentially misleading. The data thus far suggests these neurons may predominantly reflect a "POA social behavior" population rather than a set of cells distinctly responsive to isolated housing.

      We agree with the reviewer that the POA neurons we are studying regulate the production of social behaviors in females and males, rather than representing a set of cells distinctly responsive to single housing. To more clearly reflect our thinking, we have changed the name of the neurons from “POA-iso neurons” to “POA-social neurons”. Thank you for this helpful criticism.

      Our Fos data are consistent with the idea that the POA may regulate social behaviors in group-housed females (not just single-housed females). Namely, we found that counts of Fos-positive POA neurons are significantly related to rates of social investigation (p = 0.01) and tend to be related to USV rates (p = 0.05) in group-housed females that engaged in same-sex interactions (Fig. S1C). We now include two new sets of experiments aimed at further testing this idea.

      First, we include 2 control groups in which TRAPing sessions were performed in group-housed females following same-sex interactions. We find that chemogenetic silencing of group-housed-TRAPed POA neurons fails to reduce social behaviors in females that are subsequently single-housed and given a same-sex social interaction (Fig. 5A-D), and that optogenetic activation of group-housed-TRAPed POA neurons fails to promote female social behavior (Fig. 5E-H). At face value, these findings do not support the idea that the POA contains neurons that regulate social behaviors in group-housed females.

      However, one important caveat is that group-housed females engage in low rates of social behaviors (low investigation time, no mounting, and few USVs), and thus TRAP-based labeling may not work efficaciously in these mice. There may be POA neurons that regulate social behaviors in group-housed females but that do not upregulate Fos following production of relatively low rates of social behaviors. To test this idea, we also include females in which POA neurons are chemogenetically silenced using a viral strategy that does not depend on activity-dependent labeling. In this new experiment, we report that silencing of POA neurons significantly reduces USV production in group-housed females (Fig. 5J-L) and significantly reduces social investigation, mounting, and USV production when these same females are retested following single-housing (Fig. 5M-O). Together, these experiments suggest that the POA may regulate the production of social behaviors during same-sex interactions in group-housed females, but that these effects may be difficult to detect in some cases given the low rates at which group-housed females engage in social behaviors during same-sex interactions relative to single-housed females.

      Finally, we want to highlight an additional new dataset that supports the idea that POA-social neurons regulate social behaviors, rather than encoding the “state” of social isolation. We now include a control group for the chemogenetic silencing of female POA-social neurons, in which females were single-housed but were not given a social interaction prior to 4-OHT treatment (N = 5 non-social controls). Rates of social behaviors were subsequently unaffected following CNO delivery in these females (Fig. S2D-G). These new data support the conclusion that POA-social neurons regulate the production of social behaviors, rather than encoding the state of social isolation.

      Reviewer 3:

      While the authors should be commended for performing and reporting multiple circuit perturbation experiments (e.g., chemogenetics, ablation), the conflicting effects on behavior are hard to interpret without additional experiments. For example, chemogenetic silencing of the POA neurons (using DREADDs) attenuated all three behavioral measures but the ablation of the same POA neurons (using CASPACE) decreased mounting duration without impacting social investigation or USV production. Similarly, optogenetic activation of POA neurons was sufficient to generate USV production as reported in earlier studies but mounting or social investigation remained unaffected. 

      Do these discrepancies arise due to the efficiency differences between DREADD-mediated silencing vs. Casp3 ablation? Or does the chemogenetic result reflect off-manifold effects on downstream circuitry whereas a more permanent ablation strategy allows other brain regions to compensate due to redundancy? It is important to resolve whether these arise due to technical reasons or whether these reflect the underlying (perhaps messy) logic of neural circuitry. Therefore, while it is clear that POA neurons likely contribute to multiple behavioral readouts of social isolation, understanding their exact roles in any greater detail will require further experiments.

      We have added new analyses to consider the possibility that optogenetic activation of female POA-social neurons promotes social investigation. In the original manuscript, we analyzed the duration of social investigation bouts in POA-social-ChR2 females according to whether they overlapped with laser stimulation or whether they did not overlap. We realized that we made an error in this first analysis and inadvertently included social investigation bouts that occurred during the first 5 minutes of the social sessions, prior to any laser stimulation. Because these earlier bouts tend to be longer duration than later bouts, this mistake washed out the effect of laser stimulation on social bout duration. After correcting that error, we now report that optogenetic activation of female POA-social neurons lengthens social investigation bout duration (Fig. 4G). Inspired by this interesting finding, we also included analyses of the probability of social investigation following laser stimulation (Fig. 4E-F; excluding laser stimulations that were preceded by social investigation in the pre-laser baseline period). These analyses support the conclusion that optogenetic activation of POA-social neurons promotes both USV production and social investigation in group-housed females.  

      The majority of the females that we used in our TRAP2-based ablation experiments were heterozygous for TRAP2 (N = 11 of 15 POA-social-caspase subjects were TRAP2;Ai14 females), whereas all females used in our chemogenetic silencing experiments were homozygous for TRAP2. To test whether a more effective ablation of POA-social neurons might drive decreases in social investigation and USV production, we set up additional TRAP2 homozygous POA-social-caspase females and directly compared the effects of ablation between the two genotypes (Fig. S3; N = 11 hets in total and N = 9 homozygotes in total). These experiments revealed that effects on mounting were more pronounced following POA-social ablation in TRAP2 homozygotes vs. heterozygotes, but that neither group exhibited decreased social investigation or USV production following 4-OHT treatment.

      To ask whether caspase-mediated ablation in TRAP2 homozygotes was effective in eliminating neural activity associated with social behaviors in females, we performed Fos immunostaining in a subset of the POA-social-caspase TRAP2 homozygotes following a same-sex interaction. We found that POA Fos expression was robustly reduced in these females relative to control group-housed and control single-housed females that also engaged in same-sex interactions, down to levels seen in group-housed and single-housed females that did not engage in a social interaction (comparison shown in Fig. S3D; control female data same as in Fig. 1). Moreover, the remaining POA Fos in these TRAP2 homozygotes was no longer positively correlated to social investigation or USV production (Fig. S3E-F). Together, these findings lead us to favor the interpretation suggested by the reviewer below, that permanent ablation of POA-social neurons leads to compensation from other brain regions due to redundancy. In addition, our finding that optogenetic activation of POA-social neurons promotes both USV production and social investigation supports the idea that POA-social neurons directly regulate these behaviors. We agree with the reviewer that additional work is needed to understand the complex sex- and context-dependent role played by the POA in the regulation of mouse social behaviors.

      Recommendations for the Authors:

      Reviewer 1 Recommendations:

      (1) The largest issue is that many of the stated "key" behavioral findings are not statistically significant.

      (1a) Figure 2C is not significant and Figure 5G is not significant

      We have added N = 5 POA-social-hM4Di females, N = 3 POA-social-hM4Di males, and N = 3 POA-social-GFP males to the dataset. The decrease in mounting following chemogenetic silencing of POA-social neurons is now statistically significant in both sexes (p < 0.05 for both; see current Figs. 2C and 6G). We also simplified our statistical analysis of mounting in these experiments to consider the proportion of trials with and without resident-initiated mounting on saline vs. CNO days, using McNemar’s test for paired proportions. 
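      As an illustration of the paired-proportion comparison described above, the following is a minimal sketch of an exact (binomial) McNemar test. It assumes each animal contributes one saline trial and one CNO trial, so only the discordant animals (mounting on one day but not the other) enter the test. The function name and the counts in the usage example are hypothetical and are not taken from the study; the response does not state whether the exact or the chi-square form of the test was used.

```python
from math import comb


def mcnemar_exact(b: int, c: int) -> float:
    """Two-sided exact McNemar p-value from discordant pair counts.

    b: animals that mounted on the saline day but not on the CNO day
    c: animals that mounted on the CNO day but not on the saline day
    (Hypothetical inputs; concordant animals do not affect the test.)
    """
    n = b + c
    if n == 0:
        # No discordant pairs: no evidence of a difference.
        return 1.0
    k = min(b, c)
    # Lower tail P(X <= k) for X ~ Binomial(n, 0.5), doubled and capped at 1.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


if __name__ == "__main__":
    # Hypothetical example: 6 animals mounted only on saline, 0 only on CNO.
    print(mcnemar_exact(6, 0))
```

      With these made-up counts, all six discordant animals fall in the same direction, which is the pattern that yields a small p-value; evenly split discordant pairs would give a p-value of 1.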

      (1b) Mounting graphs are completely omitted in Figure 4. 

      Given that mounting was only observed infrequently in POA-social-ChR2 females, we simply report this information in the Results text (lines 382-388). In our prior summary of the mounting results, we reported that mounting was observed in a total of 3 trials from 2 females, but we inadvertently included information from a duplicate trial from one of the POA-social-ChR2 females in this summary (all other analyses of the POA-social-ChR2 females included one trial per female). We have corrected that error and now report that we observed mounting following laser stimulation in 1 trial from 1 POA-social-ChR2 female. We have expanded our consideration of potential effects of optogenetic activation of POA-social neurons on social investigation and include these new analyses as part of Figure 4 (Fig. 4E-G), following the existing analyses of USV production.

      (1c) Figure 3C shows a reduction of mounting following the ablation of POA (although no stats on the graph to denote significance), but this ablation approach can't resolve whether POA is required to encode the state produced by the short period of isolation, and/or whether it needs to be online at test.

      We have now added an asterisk in Fig. 3C to denote a p value less than 0.05. Thank you for catching our oversight.

      We designed our activity-dependent labeling experiments to TRAP and express viruses in POA neurons that increase their activity in conjunction with the production of social behaviors in single-housed females. We believe our findings are most consistent with the conclusion that these neurons regulate the production of social behaviors, rather than encoding the state of social isolation, and we have renamed these neurons as “POA-social” neurons to better reflect our thinking.

      We also now include control experiments (albeit chemogenetic inhibition, not caspase ablation) in which the TRAP2 strategy is used to express hM4Di in the POA of single-housed females that do not experience a social interaction prior to 4-OHT delivery (non-social controls, Fig. S2D-G). We report that chemogenetic inhibition of these neurons does not decrease social behavior in single-housed females during a subsequent same-sex interaction (p > 0.05 for saline vs. CNO rates of social investigation, mounting, and USVs). These additional findings support the idea that the activity of POA-social neurons is related to the production of social behaviors rather than to the state of social isolation. 

      The reviewer is correct that our ablation approach cannot resolve the question of whether POA-social neuronal activity is required online during testing, but our reversible chemogenetic inhibition experiments provide evidence that the activity of POA-social neurons is required online at the time of testing to regulate social behavior.

      (1d) A similar issue is seen regarding investigation (a general lack of significance with most of the LOF and GOF manipulations).

      As reported in the original manuscript, we find that chemogenetic inhibition of POA-social neurons reduces social investigation in females, while caspase-mediated ablation of female POA-social neurons does not. Our original caspase dataset used mostly but not all TRAP2 heterozygous females (N = 11 TRAP2 heterozygotes (TRAP2;Ai14), generated by crossing TRAP2 mice with Ai14 mice, for the purpose of visualizing the absence of tdTomato labeling to estimate spread of the caspase virus; and N = 4 TRAP2 homozygotes). By adding to the TRAP2 homozygous caspase dataset and comparing the effects on female social behavior of ablation of POA-social neurons in TRAP2 heterozygous vs. TRAP2 homozygous females, we now provide evidence that the attenuation of mounting is more efficacious in TRAP2 homozygous females than in heterozygotes (Fig. S3B). Nonetheless, we fail to see effects on social investigation and USV production, even when caspase ablation of POA-social neurons is performed in TRAP2 homozygous females (Fig. S3A,C).

      In spite of the lack of effect on these behaviors, we show that caspase-mediated ablation of POA-social neurons in TRAP2 homozygous females leads to a dramatic reduction in social interaction-induced Fos expression in the POA. POA Fos expression in these caspase females is reduced to the levels seen in control group-housed and single-housed females that are not given social interactions and are significantly lower than Fos expression in group-housed and single-housed females that are given a same-sex interaction (Fig. S3D). Moreover, the remaining POA Fos expression in the caspase females is no longer related to rates of social investigation (Fig. S3E), as is normally the case in group-housed and single-housed control females (Fig. S1C, left). Together, these data support the idea that some type of neuronal compensation outside of the POA is occurring following ablation of POA-social neurons, and this compensation permits normal levels of USV production and social investigation.

      As in the original manuscript, we report that chemogenetic inhibition of POA-social neurons in male mice reduces mounting but does not reduce social investigation (or USV production). We now include quantification of social behaviors produced by male and female POA-social-hM4Di mice in the TRAPing sessions that preceded 4-OHT delivery (Fig. S5). These measurements show that males spent significantly more time than females engaged in mounting, and we speculate that this bias in TRAPing session behavior might have led to a bias in TRAP-mediated viral labeling of male POA neurons that regulate mounting, at the expense of male POA neurons that regulate social investigation (or USV production).

      We have added new analyses to consider the possibility that optogenetic activation of female POA-social neurons promotes social investigation. In the original manuscript, we analyzed the duration of social investigation bouts in POA-social-ChR2 females according to whether they overlapped with laser stimulation or whether they did not overlap. We realized that we made an error in this first analysis and inadvertently included social investigation bouts that occurred during the first 5 minutes of the social sessions, prior to any laser stimulation. Because these earlier bouts tend to be longer duration than later bouts, this mistake washed out the effect of laser stimulation on social bout duration. After correcting that error, we now report that optogenetic activation of female POA-social neurons lengthens social investigation bout duration (Fig. 4G). Inspired by this encouraging finding, we also included analyses of the probability of social investigation following laser stimulation (Fig. 4E-F; excluding laser stimulations that were preceded by social investigation in the pre-laser baseline period). These analyses support the conclusion that optogenetic activation of POA-social neurons promotes both USV production and social investigation in group-housed females.

      (2) In Figure 1 and elsewhere, the authors use a Mann-Whitney U test, which should be used for non-parametric data, but in other places, they use statistical tests for normally distributed data. Why? How was the normality of distributions tested?

      We tested the normality of data distributions using the Shapiro-Wilk test. Parametric tests were used for analyses that contained normally distributed data, and non-parametric tests were used for analyses that contained non-normally distributed data. This information is included in the Methods (lines 997-1000), and full details of statistical analyses can be found in Table S1.

      (3) The method for "trapping" neurons that are part of the short-term isolation ensemble has some caveats that have not been adequately addressed. First, 4-OHT was administered after social interaction, but before 24 hours of isolation, making it unclear exactly WHAT is being trapped.

      i) Is it neurons that encode the recent 3-day iso experience? (seems unlikely, as this would have been hours after the end of that iso window)

      We now include a group of control females to directly test this possibility (Fig. S2D-G). These TRAP2 females were single-housed for 3 days but were not given a social interaction prior to 4-OHT treatment (N = 5 non-social controls). Presumably, POA neurons TRAPed in these females might encode the experience of short-term isolation. However, we found that chemogenetic inactivation of these TRAPed neurons during a subsequent same-sex interaction failed to decrease social behaviors in single-housed females (Fig. S2E-G; p > 0.05 for CNO vs. saline rates of social investigation, mounting, and USV production). These control experiments support the idea that we are TRAPing neurons whose activity is related to the production of social behaviors, and we have renamed the neurons as “POA-social” neurons to reflect this thinking.

      ii) Is it neurons that encode the recent behavior impacted by the 3-day iso? (this seems to be the goal, but the authors do not provide evidence that the time course of their injection is efficient enough to recruit the recently activated neurons, nor do they provide evidence that opening the trapping window directly after the behavior is better than directly before)

      We opted to perform IP injections of 4-OHT immediately following the behavior session, rather than before it, due to concern that handling the mice and delivering IP injections prior to behavior sessions would stress the mice, leading to lower rates of social behaviors. The non-social female hM4Di experiments described above support the idea that we are TRAPing neurons related to the production of social behaviors, as the reviewer suggests.

      iii) Is it trapping neurons active during the subsequent 24 hours of isolation? (seems possible, but this would mean that the authors are looking at a different population of neurons than they claim).

      If chemogenetic silencing of POA neurons that were TRAPed following 3 days of social isolation but in the absence of a social interaction (N = 5 non-social controls, Fig. S2D-G) does not alter social behaviors, there is no compelling reason to hypothesize that TRAPing POA neurons activated during the 24 hours of social isolation that follow a social interaction would do so. Moreover, in the original study characterizing the TRAP2 mice (DeNardo et al., 2019), the authors performed experiments to characterize the time course of TRAPing relative to 4-OHT treatment and concluded that the majority of TRAPing occurs within a 6-hour window centered around the 4-OHT injection.

      (4) Relatedly, the authors seem to find a fair bit of variability in their TRAP-mediated experiments. This begs the question - are the effects of their GOF and LOF approaches

      i) dependent on the iso-behaviors that were "trapped" for each animal (in other words, how does behavior at test 1 correlate with behavior at test 2)? 

      To test the reviewer’s idea, we compared rates of TRAPing session behaviors for the POA-social-hM4Di females to the subsequent effects of neuronal silencing on these behaviors (calculated as CNO behavior – saline behavior). These correlations are shown in Fig. S2A-C and are all non-significant. We also include below for the reviewer the same types of correlations for the other datasets in our study (loss-of-function experiments: female POA-social-caspase, male POA-social-hM4Di; and gain-of-function experiments: female POA-social-ChR2).

      Author response image 1.

      The only loss-of-function experiment comparison in the above figure that reveals a negative and significant correlation is the mounting comparison for the POA-social-hM4Di males (time spent mounting during TRAPing session vs. (CNO time spent mounting - saline time spent mounting)). This significant correlation likely reflects the fact that (1) no males mounted in the CNO session and (2) mounting rates for individual males are relatively consistent over time (in comparison to female mounting, which is more variable; see Author response image 2 below of TRAPing session vs. saline mounting in male vs. female POA-social-hM4Di experiments). The correlation between TRAPing session and testing session mounting is significant for the POA-social-ChR2 females, but despite the significant correlation, we would want to see more instances of optogenetically-elicited mounting to make any claim about its relationship to TRAPing session behavior.

      Author response image 2.

      Nonetheless, we agree with the reviewer’s intuition that one would expect the effects of POA activity manipulations on different behaviors to scale with rates at which these behaviors were performed during the TRAPing session. We speculate that variability in the TRAPing process might have obscured such a relationship. There is inevitable variability in the exact body cavity placement of IP injections, which can affect drug absorption. In addition, we delivered a fixed volume of 4-OHT (10 mg/mL 4-OHT in 150 uL filtered corn oil) to all mice in the study, regardless of their weight, which likely added variability in TRAPing efficacy from animal to animal. This detail was reported inaccurately in the Methods, and that error has been corrected (line 920). With regard to our male POA-social-hM4Di dataset, we find that these males spend more time mounting during their TRAPing sessions than female POA-social-hM4Di mice (Fig. S5; males also spent less time investigating and tended to produce fewer USVs than females), a fact that we hypothesize may have led to a bias toward TRAPing mounting-related POA neurons in male subjects. In addition, however, the fact that male mice typically weigh more than females and would have received a slightly lower effective dosage of 4-OHT may also have contributed to the weaker effects on behavior in the male POA-social-hM4Di experiments relative to the female POA-social-hM4Di experiments.
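      The dosing point above can be illustrated with simple arithmetic: a fixed 150 uL injection at 10 mg/mL delivers 1.5 mg of 4-OHT, so the dose per unit body weight falls as body weight rises. The sketch below assumes hypothetical body weights of 20 g and 30 g purely for illustration; these weights and the function names are not taken from the study.

```python
def fixed_volume_dose_mg(volume_ul, concentration_mg_per_ml):
    """Total drug delivered (mg) by a fixed injection volume."""
    return volume_ul / 1000.0 * concentration_mg_per_ml

def dose_per_body_weight(dose_mg, body_weight_g):
    """Effective dose in mg per kg of body weight."""
    return dose_mg / (body_weight_g / 1000.0)

# Volume and concentration as stated in the response; weights are hypothetical.
total_mg = fixed_volume_dose_mg(150, 10)
lighter_mouse = dose_per_body_weight(total_mg, 20.0)
heavier_mouse = dose_per_body_weight(total_mg, 30.0)
print(total_mg, lighter_mouse, heavier_mouse)
```

      Under these assumed weights, the heavier animal receives about two thirds of the per-kilogram dose of the lighter one, which is the direction of the effect hypothesized for males relative to females.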

      We also want to highlight that interpreting correlations for females between time spent mounting during the TRAPing session and time spent mounting during the test sessions can be complicated. For example, we see 2 cases in the female POA-social-hM4Di dataset in which the female did not mount in the TRAPing session, and then mounted on the saline day (12s and 10s total mounting for those 2 females) but not on the CNO day. One interpretation of the data from these 2 females is that mounting on the TRAPing day is not required to attenuate mounting on the later test days. However, female mounting behavior itself is variable, both across different females and across different tests of a given female, as noted above. If we consider all single-housed females included in our dataset for which we quantified control behavioral data (i.e., behavior trials from unmanipulated females and TRAPing sessions from females that were later manipulated), we find that mounting is not observed in ~30% of the females (24 of 83). In ongoing behavioral experiments not included in this manuscript, we are investigating factors that regulate female mounting following single-housing. In that dataset, we also see little evidence that female mounting in one social interaction predicts mounting in a subsequent interaction (i.e., there don’t appear to be stable “high mounters” and “low mounters” following single housing). Thus, the small number of cases in which females did not mount in the TRAPing session and then displayed mounting on the CNO only day are difficult to interpret.

      Two additional considerations are that TRAPing may not be equally efficacious for POA neurons that regulate different behaviors, and that different behaviors may be differentially sensitive to perturbations of the POA. Previous elegant calcium imaging work has shown that different subsets of Esr1+ POA neurons exhibit activity that is “tuned” to specific behaviors (sniffing vs. mounting in males interacting with females; Yang et al., 2023). However, it is possible that these subsets of neurons display differential levels of Fos expression following the production of their preferred behavior and that some behavior-related subsets may thus be more easily TRAPed than others. It may also be the case that some behaviors are more easily disrupted by POA activity manipulations than others (e.g., perturbation in a smaller percentage of behavior-related POA neurons may be required to disrupt some behaviors relative to others). 

      Despite these caveats, we have two lines of evidence that the effects of chemogenetic silencing of POA-social neurons depend on the behaviors produced during the TRAPing sessions.

      (1) Social behavior is required during the TRAPing session to see subsequent effects on social behavior following chemogenetic silencing of TRAPed POA neurons. In control females that were single-housed but were not given a social interaction prior to 4-OHT treatment, social behaviors are not reduced by chemogenetic silencing of TRAPed POA neurons (Figs. S2D-G).

      (2) To directly test whether mounting in the TRAPing session is required to see attenuation of mounting during subsequent chemogenetic silencing of POA-social neurons, we performed control experiments in which single-housed females interacted with a female visitor that was placed under a cup during the TRAPing session prior to 4-OHT treatment. Mounting was not possible in this context, and we also found that females produced lower rates of USVs during the TRAPing session relative to single-housed females engaged in free social interaction. However, subject females spent more time engaged in social investigation of the visitor relative to single-housed females engaged in free social interactions (see Author response image 3 below).

      Author response image 3.

      Unfortunately, none of the experimental females in this cohort displayed mounting in the CNO or saline sessions. Given that we could not use this dataset to address the intended question, we did not include it in the manuscript. However, it is quite interesting that female subjects displayed higher than normal social investigation and lower than normal USV production in their TRAPing sessions (relative to single-housed females engaged in free interactions), and subsequently, chemogenetic inhibition of TRAPed POA neurons decreased social investigation but did not decrease USV production (Author response image 4 below).

      Author response image 4.

      Together, we think our data support the idea that the POA neurons that are TRAPed are related to the social behaviors performed by the animals, but these relationships may be complex and difficult to detect from comparisons across animals within a single experimental group.

      And/or are they

      ii) influenced by the spread or amount of virus for each animal? These correlations could help shed light on what exactly is being trapped - is it specific behaviors or is it the "state" of short-term isolation?

      Our control experiments with females that were single-housed but did not receive a social interaction prior to 4-OHT treatment provide evidence that the production of social behaviors is required to see subsequent effects on behavior following chemogenetic inhibition of TRAPed POA neurons (Figs. S2D-G).

      The same volume of virus was injected across all activity manipulation experiments (200 nL). Because of the trajectory of our POA viral injections (performed at a slight rostral angle relative to vertical), we did sometimes see viral labeling that spread into the AH caudal to the POA. For this reason, we included the AH TRAPed control group (Fig. 2), to rule out the possibility that viral spread into the AH could account for the effects of chemogenetic silencing of POA-social neurons on female social behaviors. Also because of the injection angle used, we don’t see substantial viral spread rostral to our injection coordinates. In short, there isn’t systematic variability in the targeting or spread of our POA viral injections that can account for variability in the effects on USV production and social investigation of our LOF and GOF manipulations (female hM4Di and female ChR2 experiments).

      In older lesion studies in male rodents and birds, there is some support for the idea that rostral vs. caudal POA neurons differentially regulate appetitive vs. consummatory sexual behaviors (as reviewed in Balthazart and Ball, 2007). However, all of our viral injections were placed in what that review paper would have considered ‘caudal’ POA. We also note that more recent imaging studies have reported that subsets of POA neurons are differentially tuned to male sniffing vs. male mounting (Yang et al.,2023), and these subsets must be relatively co-localized given that they are imaged in the same field of view. Whether distinct subsets of POA neurons regulate the production of different female social behaviors, and if so, how these subsets are localized within the POA, remains an important question for future study.

      (5) The authors label their region of interest as the "POA" but images throughout (e.g. their fos image, Figure 1E), look more like the MPO. Why label it POA?

      The POA neurons in our study are found in a band that spans the medial POA, as well as a bit of the lateral POA. To avoid over-specifying, we call this region the POA more generally.

      (6) In all the experiments, mice are isolated and then re-group housed with siblings. Do all the siblings in the group belong to the same experimental group, or are siblings naïve? This may be critical to help determine whether some of the effects observed may be "group" effects.

      In general, multiple (although not always all) mice in a cage belonged to the same experimental group. In our inhibitory DREADDs experiments, it is unclear how that could drive our observed effects on behavior, given that home cage behavior would only be expected to differ for a given mouse in the time period following their CNO session. 

      For the female POA-social-caspase mice, we cannot rule out the possibility that their home cage behaviors differed in the time period following 4-OHT treatment and re-group-housing and prior to post-4-OHT behavior measurements. However, given that the only social behavior affected by ablation of POA-social neurons was mounting, and that rates of mounting would be expected to be very low in group-housed females within home cages, it is unclear how our experimental result could be attributed to group effects.

      If by “group” effects the reviewer means “litter” effects, we include a plot below that shows the CNO vs. saline behaviors for the POA-social-hM4Di females, separated by cage ID. There is no evidence that the effects of chemogenetic silencing of POA-social-hM4Di females are being driven by only certain cages (only social investigation and USVs are shown, because mounting was uniformly low (1 of 17 females mounted) in the CNO session).

      Author response image 5.

      (7) For chemogenetic experiments, the authors state that CNO and Saline were given in a counterbalanced order (eg line 189). Did the authors see any order effects?

      We did not see order effects, and we include plots of those data below for the female and male POA-social-hM4Di groups, with mice plotted according to which treatment they received first.

      Author response image 6.

      (8) In the control experiments in Figure 2 where VMH or AH are chemogenetically silenced, it isn't clear whether these groups include mice that were subjected to 3 days of isolation. Please clarify.

      Yes, these female groups were also subjected to 3 days of isolation (first prior to the TRAPing session, and for a second time prior to the onset of the CNO/saline testing sessions). That information has been clarified in the Results section (line 214) and in the Methods (lines 935-938).

      (9) Line 312. The title for this section, "POA neurons increase their activity....." is somewhat misleading. It sounds like the authors imaged trapped neurons. I think what they mean is that more POA neurons are activated following opposite-sex interactions with males.

      Thanks for this catch. We have modified the section title, as well as the title of the first results sub-section.

      (10) Figure 5A, right panels. The authors fail to find an increase in the investigation of male-male pairs following the short-term isolation of one. This contrasts with the main finding in Matthews et al., 2016 Cell, where short periods of isolation are said to promote pro-social behaviors. The authors could comment on this discrepancy in their discussion (eg difference in testing apparatus/test type? Difference in the number of days of isolation? etc.).

      In current Fig. 6A, there is no significant interaction between the two main effects, but each main effect is significant: single-housed males spend more time investigating partners than group-housed males, and males spend more time investigating female partners than male partners. The significant main effect of housing condition is consistent with the findings of Matthews et al., 2016 and is included within the Results (lines 486-492). 

      (11) Figure 5F, the authors seem to have a main effect of virus (more overall investigation in dreadds mice). Nothing about this is addressed.

      We sometimes see differences in social behavior between cohorts of males when they are tested at different times and, correspondingly, with different groups of female social partners. Our POA-social-hM4Di and POA-social-GFP males were set-up and tested at largely non-overlapping times. We have added a brief note to the Results section to include this information (lines 535-539).

      Reviewer 2 Recommendations:

      (1) (C)ritical control experiments are missing to support this claim (that a population of preoptic hypothalamic neurons contribute to the effects of short-term social isolation on the social behaviors of female mice).  

      (1a) All the activity-dependent labeling experiments with TRAP mice, including the subsequent neural activity manipulation experiments (Figures 2, 3, 4, 5E-F), were conducted by labeling neurons only in socially isolated animals, not group-housed animals. The authors labeled neurons after 30-minute social interactions, raising the possibility that the labeled neurons simply represent a "social interaction/behavior population" (mediating mounting and USVs in females and males) rather than a set of neurons specific to social isolation behaviors of mice… The data thus far suggests these neurons may predominantly reflect a "POA social behavior" population rather than a set of cells distinctly responsive to isolated housing.

      We agree with the reviewer that the POA neurons we are studying regulate the production of social behaviors in females and males, rather than representing a set of cells distinctly responsive to single housing. To more clearly reflect our thinking, we have changed the name of the neurons from “POA-iso neurons” to “POA-social neurons”. Thank you for this helpful criticism.

      Our Fos data are consistent with the idea that the POA may regulate social behaviors in group-housed females (not just single-housed females). Namely, we found that counts of Fos-positive POA neurons are significantly related to rates of social investigation (p = 0.01) and tend to be related to USV rates (p = 0.05) in group-housed females that engaged in same-sex interactions (Fig. S1C). We now include two new sets of experiments aimed at further testing this idea.

      First, we include 2 control groups in which TRAPing sessions were performed in group-housed females following same-sex interactions. We find that chemogenetic silencing of these group-housed-TRAPed POA neurons fails to reduce social behaviors in females that are subsequently single-housed and given a same-sex social interaction (Fig. 5A-D; GH-TRAPed POA hM4Di females), and that optogenetic activation of group-housed-TRAPed POA neurons fails to promote female social behavior (Fig. 5E-H; GH-TRAPed POA ChR2 females). At face value, these findings do not support the idea that the POA contains neurons that regulate social behaviors in group-housed females.

      However, one important caveat is that group-housed females engage in low rates of social behaviors (low investigation time, no mounting, and few USVs), and thus TRAP-based labeling may not work efficaciously in these mice. There may be POA neurons that regulate social behaviors in group-housed females but that do not upregulate Fos following production of relatively low rates of social behaviors. To test this idea, we also include females in which POA neurons are chemogenetically silenced using a viral strategy that does not depend on activity-dependent labeling. In this new experiment, we report that silencing of POA neurons significantly reduces USV production in group-housed females (Fig. 5J-L) and significantly reduces social investigation, mounting, and USV production when these same females are retested following single-housing (Fig. 5M-O).

      (2) Please add strain background information of subject animals in the methods.

      This information has been added to the Animals section within the Methods (lines 788-802).

      Responses to Reviewer 3 Recommendations:

      (1a) (T)he conflicting effects on behavior are hard to interpret without additional experiments….Similarly, optogenetic activation of POA neurons was sufficient to generate USV production as reported in earlier studies but mounting or social investigation remained unaffected. 

      We have added new analyses to consider the possibility that optogenetic activation of female POA-social neurons promotes social investigation. In the original manuscript, we analyzed the duration of social investigation bouts in POA-social-ChR2 females according to whether they overlapped with laser stimulation or whether they did not overlap. We realized that we made an error in this first analysis and inadvertently included social investigation bouts that occurred during the first 5 minutes of the social sessions, prior to any laser stimulation. Because these earlier bouts tend to be longer duration than later bouts, this mistake washed out the effect of laser stimulation on social bout duration. After correcting that error, we now report that optogenetic activation of female POA-social neurons lengthens social investigation bout duration (Fig. 4G). Inspired by this interesting finding, we also included analyses of the probability of social investigation following laser stimulation (Fig. 4E-F; excluding laser stimulations that were preceded by social investigation in the pre-laser baseline period). These analyses support the conclusion that optogenetic activation of POA-social neurons promotes both USV production and social investigation in group-housed females.

      (1b) Do these discrepancies (between hM4Di and caspase) arise due to the efficiency differences between DREADD-mediated silencing vs. Casp3 ablation? Or does the chemogenetic result reflect off-manifold effects on downstream circuitry whereas a more permanent ablation strategy allows other brain regions to compensate due to redundancy? It is important to resolve whether these arise due to technical reasons or whether these reflect the underlying (perhaps messy) logic of neural circuitry.  

      The possibility that the difference in effects on behavior between chemogenetic silencing and caspase ablation arises for purely technical reasons seems, at face value, inconsistent with the findings of previous experiments, in which ablation of large numbers of POA neurons failed to reduce USV production in male mice (POA lesions in Bean et al., 1981; ablation of VGAT+ POA neurons by Gao et al., 2018). These findings stand in contrast to those using chemogenetic silencing of large numbers of POA neurons, which report reduced USV production in male mice (VGAT+/Esr1+ in Karigo et al., 2021; Esr1+ in Chen et al., 2021).

      However, it is the case that the majority of the females that we used in our TRAP2-based ablation experiments were heterozygous for TRAP2 (N = 11 of 15 POA-social-caspase subjects were TRAP2;Ai14 females), whereas all females used in our chemogenetic silencing experiments were homozygous for TRAP2. To test whether a more effective ablation of POA-social neurons might drive decreases in social investigation and USV production, we set up additional TRAP2 homozygous POA-social-caspase females and directly compared the effects of ablation between the two genotypes (Fig. S3; N = 11 hets in total and N = 9 homozygotes in total). These experiments revealed that effects on mounting were more pronounced following POA-social ablation in TRAP2 homozygotes vs. heterozygotes, but that neither group exhibited decreased social investigation or USV production following 4-OHT treatment.

      To ask whether caspase-mediated ablation in TRAP2 homozygotes was effective in eliminating neural activity associated with social behaviors in females, we performed Fos immunostaining in a subset of the POA-social-caspase TRAP2 homozygotes following a same-sex interaction. We found that POA Fos expression was robustly reduced in these females relative to control group-housed and control single-housed females that also engaged in same-sex interactions, down to levels seen in group-housed and single-housed females that did not engage in a social interaction (comparison shown in Fig. S3D; control female data same as in Fig. 1). Moreover, the remaining POA Fos in these TRAP2 homozygotes was no longer positively correlated to social investigation or USV production (Fig. S3E-F). Together, these findings lead us to favor the interpretation suggested by the reviewer below, that permanent ablation of POA-social neurons leads to compensation from other brain regions due to redundancy.

      Given the negative results above, we favor this possibility and indicate so in our Discussion. In addition, our finding that optogenetic activation of POA-social neurons promotes both USV production and social investigation supports the idea that POA-social neurons directly regulate these behaviors. We agree with the reviewer that additional work is needed to understand the complex sex- and context-dependent role played by the POA in the regulation of mouse social behaviors.

      (2) L 49: Please define Mesolimbic circuitry the first time it is mentioned.

      We have added a definition (lines 52-53).

      (3) L 210: In Figure 2C, the mounting duration baseline (saline) distribution seems lower than the same experimental baseline in Figures 1C and 3C. Does this reflect natural variability in the behavioral assay and might this be mitigated by additional sampling of animals?

      Yes, there is substantial variability in the display of mounting behavior by single-housed females, including in the proportion of trials with mounting as well as in the total duration of mounting. In the revised manuscript, we have simplified our analysis of mounting in our TRAP-based experiments to quantify the proportion of trials with mounting, rather than considering the total time spent mounting. After adding N = 5 additional females to the POA-social-hM4Di dataset, we now report a statistically significant decrease in the proportion of trials with mounting following chemogenetic silencing of POA-social neurons (Fig. 2C; McNemar’s test for paired proportions).
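      For readers unfamiliar with McNemar’s test, the sketch below shows how paired yes/no outcomes (mounting or no mounting in the saline and CNO sessions of the same animal) reduce to the two discordant counts on which the test depends. The response does not state whether the exact or the chi-square version was used; the exact binomial form is assumed here. The function names and the example outcomes are hypothetical and are not data from the study.

```python
from math import comb

def discordant_counts(saline, cno):
    """Count animals that mounted in only one of the two paired sessions."""
    saline_only = sum(1 for s, c in zip(saline, cno) if s and not c)
    cno_only = sum(1 for s, c in zip(saline, cno) if c and not s)
    return saline_only, cno_only

def mcnemar_exact(b, c):
    """Two-sided exact McNemar p-value from the discordant counts b and c."""
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs, so no evidence of a change
    k = min(b, c)
    # Binomial(n, 0.5) lower tail, doubled for a two-sided test and capped at 1.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)

# Hypothetical paired outcomes (1 = mounting observed), one entry per female.
saline_sessions = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 0]
cno_sessions = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0]

b, c = discordant_counts(saline_sessions, cno_sessions)
p_value = mcnemar_exact(b, c)
print(b, c, p_value)
```

      Animals with the same outcome in both sessions do not contribute to the statistic, which is why the test suits a within-animal comparison of proportions.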

      (4) L 310: The authors claim that "These findings suggest that a subset of POAiso neurons overlap with GABAergic, PAG-projecting POA neurons that have been demonstrated in previous work to promote USVs via disinhibition of excitatory PAG neurons important to USV production (Chen et al., 2021; Michael et al., 2020)." I think the data reported suggests the opposite since only 18.3% of all POA->PAG neurons are cFos+. Perhaps better rephrased as "A subset (18.3%) of POA->PAG neurons are labelled by cFos and that is sufficient to drive the production of USVs". Is it surprising?

      We modified the phrasing (lines 468-469), but a bit differently than suggested above, because although we suspect that optogenetic activation of the PAG-projecting neurons within the larger population of POA-social neurons is responsible for eliciting USV production, we did not technically demonstrate this to be the case in the current dataset. 

      We do find it surprising that so few (only ~20%) of PAG-projecting POA neurons upregulate Fos following female-female interactions marked by high rates of USV production. Even though optogenetic activation of PAG-projecting POA neurons elicits USV production, our finding suggests that the majority of PAG-projecting POA neurons may not play a role in regulating vocalization. In future work, it may be useful to apply an intersectional approach to further understand how the POA regulates USV production (for example, measure or manipulate activity selectively in projection-defined subsets of POA-social neurons).

      (5) Given the considerable prior evidence of POA->PAG circuit in promoting USVs, it is hard to understand why chemogenetic inactivation of POA neurons in males affects mounting but not USV production (Figures 5F-H). Any potential explanation for this discrepancy?

      We have two ideas about this surprising result. First, we examined the TRAPing session social behaviors of female and male POA-social-hM4Di mice. We found that male POA-social-hM4Di mice spent more time than female subjects mounting during the TRAPing sessions, and conversely, males spent less time investigating visitors and tended to produce fewer USVs than female subjects (Fig. S5). Given that our labeling method is activity-dependent, one possibility is that this bias in behavior is reflected in a bias toward labeling of POA neurons related to mounting.

      Second, each mouse in the TRAP2-based hM4Di datasets received an IP injection of the same amount of 4-OHT (150 uL of 10 mg/mL 4-OHT in filtered corn oil), not adjusted for the weight of the mouse. This information was not reported accurately in the Methods, and we have adjusted that section accordingly (line 920). As a result, because male mice typically weigh more than females and would have received a lower effective dosage of 4-OHT, another possibility is that TRAPing in males was less efficient than in females and accounts for the less complete effects on social behaviors. We have added language to the Results to discuss these possibilities (lines 540-560).

      (6) L 472: Typo. "we found that short-term isolation exerts more robust on the effects of male behavior during subsequent interactions with females than during interactions with males."

      Thank you for catching this mistake.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this study, the authors address whether the dorsal nucleus of the inferior colliculus (DCIC) in mice encodes sound source location within the front horizontal plane (i.e., azimuth). They do this using volumetric two-photon Ca2+ imaging and high-density silicon probes (Neuropixels) to collect single-unit data. Such recordings are beneficial because they allow large populations of simultaneous neural data to be collected. Their main results and the claims about those results are the following:

      (1) DCIC single-unit responses have high trial-to-trial variability (i.e., neural noise);

      (2) approximately 32% to 40% of DCIC single units have responses that are sensitive to sound source azimuth;

      (3) single-trial population responses (i.e., the joint response across all sampled single units in an animal) encode sound source azimuth "effectively" (as stated in title) in that localization decoding error matches average mouse discrimination thresholds;

      (4) DCIC can encode sound source azimuth in a similar format to that in the central nucleus of the inferior colliculus (as stated in Abstract);

      (5) evidence of noise correlation between pairs of neurons exists;

      and (6) noise correlations between responses of neurons help reduce population decoding error.

      While simultaneous recordings are not necessary to demonstrate results #1, #2, and #4, they are necessary to demonstrate results #3, #5, and #6.

      Strengths:

      - Important research question to all researchers interested in sensory coding in the nervous system.

      - State-of-the-art data collection: volumetric two-photon Ca2+ imaging and extracellular recording using high-density probes. Large neuronal data sets.

      - Confirmation of imaging results (lower temporal resolution) with more traditional microelectrode results (higher temporal resolution).

      - Clear and appropriate explanation of surgical and electrophysiological methods. I cannot comment on the appropriateness of the imaging methods.

      Strength of evidence for claims of the study:

      (1) DCIC single-unit responses have high trial-to-trial variability - The authors' data clearly shows this.

      (2) Approximately 32% to 40% of DCIC single units have responses that are sensitive to sound source azimuth - The sensitivity of each neuron's response to sound source azimuth was tested with a Kruskal-Wallis test, which is appropriate since response distributions were not normal. Using this statistical test, only 8% of neurons (median for imaging data) were found to be sensitive to azimuth, and the authors noted this was not significantly different than the false positive rate. The Kruskal-Wallis test was not performed on electrophysiological data. The authors suggested that low numbers of azimuth-sensitive units resulting from the statistical analysis may be due to the combination of high neural noise and relatively low number of trials, which would reduce statistical power of the test. This may be true, but if single-unit responses were moderately or strongly sensitive to azimuth, one would expect them to pass the test even with relatively low statistical power. At best, if their statistical test missed some azimuth-sensitive units, they were likely only weakly sensitive to azimuth. The authors went on to perform a second test of azimuth sensitivity-a chi-squared test-and found 32% (imaging) and 40% (e-phys) of single units to have statistically significant sensitivity. This feels a bit like fishing for a lower p-value. The Kruskal-Wallis test should have been left as the only analysis. Moreover, the use of a chi-squared test is questionable because it is meant to be used between two categorical variables, and neural response had to be binned before applying the test.

      The determination of what is a physiologically relevant “moderate or strong azimuth sensitivity” is not trivial, particularly when comparing tuning across different relays of the auditory pathway like the CNIC, auditory cortex, or in our case DCIC, where physiologically relevant azimuth sensitivities might be different. This is likely the reason why azimuth sensitivity has been defined in diverse ways across the bibliography (see Groh, Kelly & Underhill, 2003 for an early discussion of this issue). These diverse approaches include reaching a certain percentage of maximal response modulation, like used by Day et al. (2012, 2015, 2016) in CNIC, and ANOVA tests, like used by Panniello et al. (2018) and Groh, Kelly & Underhill (2003) in auditory cortex and IC respectively. Moreover, the influence of response variability and biases in response distribution estimation due to limited sampling has not been usually accounted for in the determination of azimuth sensitivity.

      As Reviewer #1 points out, in our study we used an appropriate ANOVA test (Kruskal-Wallis) as a starting point to study response sensitivity to stimulus azimuth at DCIC. Please note that the alpha = 0.05 used for this test is not based on experimental evidence about physiologically relevant azimuth sensitivity but instead is an arbitrary p-value threshold. Using this test on the electrophysiological data, we found that ~ 21% of the simultaneously recorded single units reached significance (n = 4 mice). Nevertheless, these percentages, in our small sample (n = 4), were not significantly different from our false positive detection rate (p = 0.0625, Mann-Whitney, see Author response image 1 below). In consequence, for both our imaging (Fig. 3C) and electrophysiological data, we could not ascertain whether the neurons reaching significance in these ANOVA tests were indeed meaningfully sensitive to azimuth or whether this was due to chance.

      Author response image 1.

      Percentage of the neuropixels recorded DCIC single units across mice that showed significant median response tuning, compared to false positive detection rate (α = 0.05, chance level).

      We reasoned that the observed markedly variable responses from DCIC units, which frequently failed to respond in many trials (Fig. 3D, 4A), in combination with the limited number of trial repetitions we could collect, results in under-sampled response distribution estimations. This under-sampling can bias the determination of stochastic dominance across azimuth response samples in Kruskal-Wallis tests. We would like to highlight that we decided not to implement resampling strategies to artificially increase the azimuth response sample sizes with “virtual trials”, in order to avoid “fishing for a smaller p-value”, when our collected samples might not accurately reflect the actual response population variability.

      As an alternative to hypothesis testing based on ranking and determining stochastic dominance of one or more azimuth response samples (Kruskal-Wallis test), we evaluated the overall statistical dependency of the collected responses on stimulus azimuth. To do this we implemented the Chi-square test by binning neuronal responses into categories. Binning responses into categories can reduce the influence of response variability to some extent, which constitutes an advantage of the Chi-square approach, but we note the important consideration that these response categories are arbitrary.
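
      The binned dependency test described above can be illustrated with a short Python sketch. It is an illustration under stated assumptions, not the analysis code used in the study: the bin edges, the function names, and the use of a label-permutation p-value (instead of the chi-square reference distribution) are all choices made here to keep the example self-contained.

```python
import random
from bisect import bisect_right

def contingency_table(azimuths, responses, bin_edges):
    """Count trials per (azimuth, response category) cell.

    bin_edges are hypothetical, arbitrary thresholds that split the
    responses into len(bin_edges) + 1 categories.
    """
    levels = sorted(set(azimuths))
    row = {a: i for i, a in enumerate(levels)}
    table = [[0] * (len(bin_edges) + 1) for _ in levels]
    for a, r in zip(azimuths, responses):
        table[row[a]][bisect_right(bin_edges, r)] += 1
    return table

def chi_square_stat(table):
    """Pearson chi-square statistic; cells with zero expectation are skipped."""
    row_sums = [sum(r) for r in table]
    col_sums = [sum(c) for c in zip(*table)]
    n = sum(row_sums)
    stat = 0.0
    for i, rs in enumerate(row_sums):
        for j, cs in enumerate(col_sums):
            expected = rs * cs / n
            if expected > 0:
                stat += (table[i][j] - expected) ** 2 / expected
    return stat

def permutation_p_value(azimuths, responses, bin_edges, n_perm=2000, seed=0):
    """Return the observed statistic and a permutation p-value.

    The null distribution is built by shuffling responses across trials,
    which breaks any dependency between response category and azimuth.
    """
    rng = random.Random(seed)
    observed = chi_square_stat(contingency_table(azimuths, responses, bin_edges))
    shuffled = list(responses)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        null_stat = chi_square_stat(contingency_table(azimuths, shuffled, bin_edges))
        if null_stat >= observed - 1e-12:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)
```

      Because the result depends on where the bin edges are placed, repeating such a test over a few plausible binnings would show how much the arbitrary categories influence the outcome.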

      Altogether, we acknowledge that our Chi-square approach to define azimuth sensitivity is not free of limitations and despite enabling the interrogation of azimuth sensitivity at DCIC, its interpretability might not extend to other brain regions like CNIC or auditory cortex. Nevertheless we hope the aforementioned arguments justify why the Kruskal-Wallis test simply could not “have been left as the only analysis”.

      (3) Single-trial population responses encode sound source azimuth "effectively" in that localization decoding error matches average mouse discrimination thresholds - If only one neuron in a population had responses that were sensitive to azimuth, we would expect that decoding azimuth from observation of that one neuron's response would perform better than chance. By observing the responses of more than one neuron (if more than one were sensitive to azimuth), we would expect performance to increase. The authors found that decoding from the whole population response was no better than chance. They argue (reasonably) that this is because of overfitting of the decoder model (too few trials used to fit too many parameters) and provide evidence from decoding combined with principal components analysis which suggests that overfitting is occurring. What is troubling is the performance of the decoder when using only a handful of "top-ranked" neurons (in terms of azimuth sensitivity) (Fig. 4F and G). Decoder performance seems to increase when going from one to two neurons, then decreases when going from two to three neurons, and doesn't get much better for more neurons than for one neuron alone. It seems likely there is more information about azimuth in the population response, but decoder performance is not able to capture it because spike count distributions in the decoder model are not being accurately estimated due to too few stimulus trials (14, on average). In other words, it seems likely that decoder performance is underestimating the ability of the DCIC population to encode sound source azimuth.

      To get a sense of how effective a neural population is at coding a particular stimulus parameter, it is useful to compare population decoder performance to psychophysical performance. Unfortunately, mouse behavioral localization data do not exist. Therefore, the authors compare decoder error to mouse left-right discrimination thresholds published previously by a different lab. However, this comparison is inappropriate because the decoder and the mice were performing different perceptual tasks. The decoder is classifying sound sources to 1 of 13 locations from left to right, whereas the mice were discriminating between left or right sources centered around zero degrees. The errors in these two tasks represent different things. The two data sets may potentially be more accurately compared by extracting information from the confusion matrices of population decoder performance. For example, when the stimulus was at -30 deg, how often did the decoder classify the stimulus to a lefthand azimuth? Likewise, when the stimulus was +30 deg, how often did the decoder classify the stimulus to a righthand azimuth?
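
      The hemifield comparison suggested above can be sketched in a few lines of Python. This is only an illustration: the function name is hypothetical, negative azimuths are assumed to denote left-hand sources, and trials presented at 0 degrees are excluded because they have no correct side.

```python
from collections import defaultdict

def hemifield_accuracy_by_azimuth(true_az, predicted_az):
    """Fraction of trials decoded to the correct side, per presented azimuth.

    A prediction of exactly 0 degrees is counted as incorrect, since it
    falls in neither hemifield.
    """
    counts = defaultdict(lambda: [0, 0])  # azimuth -> [correct, total]
    for t, p in zip(true_az, predicted_az):
        if t == 0:
            continue  # midline stimuli have no left/right label
        counts[t][1] += 1
        counts[t][0] += int(t * p > 0)  # same sign means same hemifield
    return {az: c / n for az, (c, n) in sorted(counts.items())}

# Toy example with made-up decoder outputs (not data from the study)
true_az = [-30, -30, 30, 30, 0]
predicted_az = [-45, 15, 30, 0, 15]
print(hemifield_accuracy_by_azimuth(true_az, predicted_az))
```

      Applied to cross-validated decoder outputs, the resulting per-azimuth fractions are closer in spirit to a left-right judgment than the absolute decoding error, although the task differences discussed in this exchange would still apply.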

      The azimuth discrimination error reported by Lauer et al. (2011) comes from engaged and highly trained mice, which is a very different context to our experimental setting with untrained mice passively listening to stimuli from 13 random azimuths. Therefore we did not perform analyses or interpretations of our results based on the behavioral task from Lauer et al. (2011) and only made the qualitative observation that the errors match for discussion.

      We believe it is further important to clarify that Lauer et al. (2011) tested the ability of mice to discriminate between a positively conditioned stimulus (reference speaker at 0º center azimuth associated with a liquid reward) and a negatively conditioned stimulus (coming from one of five comparison speakers positioned at 20º, 30º, 50º, 70º and 90º azimuth, associated with an electrified lickport) in a conditioned avoidance task. In this task, mice are not precisely “discriminating between left or right sources centered around zero degrees”, making further analyses to compare the experimental design of Lauer et al. (2011) and ours even more challenging for valid interpretation.

      (4) DCIC can encode sound source azimuth in a similar format to that in the central nucleus of the inferior colliculus - It is unclear what exactly the authors mean by this statement in the Abstract. There are major differences in the encoding of azimuth between the two neighboring brain areas: a large majority of neurons in the CNIC are sensitive to azimuth (and strongly so), whereas the present study shows a minority of azimuth-sensitive neurons in the DCIC. Furthermore, CNIC neurons fire reliably to sound stimuli (low neural noise), whereas the present study shows that DCIC neurons fire more erratically (high neural noise).

      Since sound source azimuth is reported to be encoded by population activity patterns at CNIC (Day and Delgutte, 2013), we refer to a population activity pattern code as the “similar format” in which this information is encoded at DCIC. Please note that this is a qualitative comparison and we do not claim this is the “same format”, due to the differences the reviewer precisely describes in the encoding of azimuth at CNIC where a much larger majority of neurons show stronger azimuth sensitivity and response reliability with respect to our observations at DCIC. By this qualitative similarity of encoding format we specifically mean the similar occurrence of activity patterns from azimuth sensitive subpopulations of neurons in both CNIC and DCIC, which carry sufficient information about the stimulus azimuth for a sufficiently accurate prediction with regard to the behavioral discrimination ability.

      (5) Evidence of noise correlation between pairs of neurons exists - The authors' data and analyses seem appropriate and sufficient to justify this claim.

      (6) Noise correlations between responses of neurons help reduce population decoding error - The authors show convincing analysis that performance of their decoder was better when simultaneously measured responses were tested (which include noise correlation) than when scrambled-trial responses were tested (eliminating noise correlation). This makes it seem likely that noise correlation in the responses improved decoder performance. The authors mention that the naïve Bayesian classifier was used as their decoder for computational efficiency, presumably because it assumes no noise correlation and, therefore, assumes responses of individual neurons are independent of each other across trials to the same stimulus. The use of a decoder that assumes independence seems key here in testing the hypothesis that noise correlation contains information about sound source azimuth. The logic of using this decoder could be more clearly spelled out to the reader. For example, if the null hypothesis is that noise correlations do not carry azimuth information, then a decoder that assumes independence should perform the same whether population responses are simultaneous or scrambled. The authors' analysis showing a difference in performance between these two cases provides evidence against this null hypothesis.

      We sincerely thank the reviewer for this careful and detailed consideration of our analysis approach. Following the reviewer’s constructive suggestion, we justified the decoder choice in the results section at the last paragraph of page 18:

      “To characterize how the observed positive noise correlations could affect the representation of stimulus azimuth by DCIC top ranked unit population responses, we compared the decoding performance obtained by classifying the single-trial response patterns from top ranked units in the modeled decorrelated datasets versus the acquired data (with noise correlations). With the intention to characterize this with a conservative approach that would be less likely to find a contribution of noise correlations as it assumes response independence, we relied on the naive Bayes classifier for decoding throughout the study. Using this classifier, we observed that the modeled decorrelated datasets produced stimulus azimuth prediction error distributions that were significantly shifted towards higher decoding errors (Fig. 5B, C) and, in our imaging datasets, were not significantly different from chance level (Fig. 5B). Altogether, these results suggest that the detected noise correlations in our simultaneously acquired datasets can help reduce the error of the IC population code for sound azimuth.”
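
      For readers unfamiliar with trial shuffling, the following Python sketch shows one common way to build a decorrelated dataset. It assumes that decorrelation is done by permuting each unit's trials independently within each azimuth; the data layout and function name are hypothetical, and the manuscript's own procedure may differ in detail.

```python
import random

def decorrelate_trials(responses, seed=0):
    """Shuffle trials independently per unit within each azimuth.

    responses maps azimuth -> list of trials, each trial being a list of
    per-unit responses. Each unit keeps its own response distribution for
    every azimuth, but the trial-to-trial pairing between units (the noise
    correlation) is destroyed.
    """
    rng = random.Random(seed)
    shuffled = {}
    for azimuth, trials in responses.items():
        n_units = len(trials[0])
        columns = []
        for u in range(n_units):
            col = [trial[u] for trial in trials]  # one unit across trials
            rng.shuffle(col)
            columns.append(col)
        # Reassemble trials from the independently shuffled unit columns
        shuffled[azimuth] = [list(t) for t in zip(*columns)]
    return shuffled
```

      Decoding the original and the shuffled datasets with the same classifier then isolates the contribution of the trial-to-trial pairing, since the per-unit tuning is unchanged by construction.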

      Minor weakness:

      - Most studies of neural encoding of sound source azimuth are done in a noise-free environment, but the experimental setup in the present study had substantial background noise. This complicates comparison of the azimuth tuning results in this study to those of other studies. One is left wondering if azimuth sensitivity would have been greater in the absence of background noise, particularly for the imaging data where the signal was only about 12 dB above the noise. The description of the noise level and signal + noise level in the Methods should be made clearer. Mice hear from about 2.5 - 80 kHz, so it is important to know the noise level within this band as well as specifically within the band overlapping with the signal.

      We agree with the reviewer that this information is useful. In our study, the background R.M.S. SPL during imaging across the mouse hearing range (2.5-80kHz) was 44.53 dB and for neuropixels recordings 34.68 dB. We have added this information to the methods section of the revised manuscript.

      Reviewer #2 (Public Review):

      In the present study, Boffi et al. investigate the manner in which the dorsal cortex of the inferior colliculus (DCIC), an auditory midbrain area, encodes sound location azimuth in awake, passively listening mice. By employing volumetric calcium imaging (scanned temporal focusing or s-TeFo), complemented with high-density electrode electrophysiological recordings (neuropixels probes), they show that sound-evoked responses are exquisitely noisy, with only a small portion of neurons (units) exhibiting spatial sensitivity. Nevertheless, a naïve Bayesian classifier was able to predict the presented azimuth based on the responses from small populations of these spatially sensitive units. A portion of the spatial information was provided by correlated trial-to-trial response variability between individual units (noise correlations). The study presents a novel characterization of spatial auditory coding in a non-canonical structure, representing a noteworthy contribution specifically to the auditory field and generally to systems neuroscience, due to its implementation of state-of-the-art techniques in an experimentally challenging brain region. However, nuances in the calcium imaging dataset and the naïve Bayesian classifier warrant caution when interpreting some of the results.

      Strengths:

      The primary strength of the study lies in its methodological achievements, which allowed the authors to collect a comprehensive and novel dataset. While the DCIC is a dorsal structure, it extends up to a millimetre in depth, making it optically challenging to access in its entirety. It is also more highly myelinated and vascularised compared to e.g., the cerebral cortex, compounding the problem. The authors successfully overcame these challenges and present an impressive volumetric calcium imaging dataset. Furthermore, they corroborated this dataset with electrophysiological recordings, which produced overlapping results. This methodological combination ameliorates the natural concerns that arise from inferring neuronal activity from calcium signals alone, which are in essence an indirect measurement thereof.

      Another strength of the study is its interdisciplinary relevance. For the auditory field, it represents a significant contribution to the question of how auditory space is represented in the mammalian brain. "Space" per se is not mapped onto the basilar membrane of the cochlea and must be computed entirely within the brain. For azimuth, this requires the comparison between miniscule differences between the timing and intensity of sounds arriving at each ear. It is now generally thought that azimuth is initially encoded in two, opposing hemispheric channels, but the extent to which this initial arrangement is maintained throughout the auditory system remains an open question. The authors observe only a slight contralateral bias in their data, suggesting that sound source azimuth in the DCIC is encoded in a more nuanced manner compared to earlier processing stages of the auditory hindbrain. This is interesting, because it is also known to be an auditory structure to receive more descending inputs from the cortex.

      Systems neuroscience continues to strive for the perfection of imaging novel, less accessible brain regions. Volumetric calcium imaging is a promising emerging technique, allowing the simultaneous measurement of large populations of neurons in three dimensions. But this necessitates corroboration with other methods, such as electrophysiological recordings, which the authors achieve. The dataset moreover highlights the distinctive characteristics of neuronal auditory representations in the brain. Its signals can be exceptionally sparse and noisy, which provide an additional layer of complexity in the processing and analysis of such datasets. This will be undoubtedly useful for future studies of other less accessible structures with sparse responsiveness.

      Weaknesses:

      Although the primary finding that small populations of neurons carry enough spatial information for a naïve Bayesian classifier to reasonably decode the presented stimulus is not called into question, certain idiosyncrasies, in particular the calcium imaging dataset and model, complicate specific interpretations of the model output, and the readership is urged to interpret these aspects of the study's conclusions with caution.

      I remain in favour of volumetric calcium imaging as a suitable technique for the study, but the presently constrained spatial resolution is insufficient to unequivocally identify regions of interest as cell bodies (and are instead referred to as "units" akin to those of electrophysiological recordings). It remains possible that the imaging set is inadvertently influenced by non-somatic structures (including neuropil), which could report neuronal activity differently than cell bodies. Due to the lack of a comprehensive ground-truth comparison in this regard (which to my knowledge is impossible to achieve with current technology), it is difficult to imagine how many informative such units might have been missed because their signals were influenced by spurious, non-somatic signals, which could have subsequently misled the models. The authors reference the original Nature Methods article (Prevedel et al., 2016) throughout the manuscript, presumably in order to avoid having to repeat previously published experimental metrics. But the DCIC is neither the cortex nor hippocampus (for which the method was originally developed) and may not have the same light scattering properties (not to mention neuronal noise levels). Although the corroborative electrophysiology data largely alleviates these concerns for this particular study, the readership should be cognisant of such caveats, in particular those who are interested in implementing the technique for their own research.

      A related technical limitation of the calcium imaging dataset is the relatively low number of trials (14) given the inherently high level of noise (both neuronal and imaging). Volumetric calcium imaging, while offering a uniquely expansive field of view, requires relatively high average excitation laser power (in this case nearly 200 mW), a level of exposure the authors may have wanted to minimise by maintaining a low number of repetitions, but I yield to them to explain.

      We assumed that the levels of heating by excitation light measured at the neocortex in Prevedel et al. (2016) were representative for DCIC also. Nevertheless, we recognize this approximation might not be very accurate, due to the differences in tissue architecture and vascularization between these two brain areas, just to name a few factors. The limiting factor preventing us from collecting more trials in our imaging sessions was that we observed signs of discomfort or slight distress in some mice after ~30 min of imaging in our custom setup, which we established as a humane end point to prevent distress. In consequence, imaging sessions were kept to 25 min in duration, limiting the number of trials collected. However, we cannot rule out that with more extensive habituation prior to experiments the imaging sessions could be prolonged without these signs of discomfort, nor can we rule out that some influence of our custom setup, like potential heating of the brain by illumination light, caused the observed distress. Nevertheless, we note that previous work has shown that ~200mW average power is a safe regime for imaging in the cortex by keeping brain heating minimal (Prevedel et al., 2016), without producing the lasting damage observed by immunohistochemistry against apoptosis markers above 250mW (Podgorski and Ranganathan 2016, https://doi.org/10.1152/jn.00275.2016).

      Calcium imaging is also inherently slow, requiring relatively long inter-stimulus intervals (in this case 5 s). This unfortunately renders any model designed to predict a stimulus (in this case sound azimuth) from particularly noisy population neuronal data like these as highly prone to overfitting, to which the authors correctly admit after a model trained on the entire raw dataset failed to perform significantly above chance level. This prompted them to feed the model only with data from neurons with the highest spatial sensitivity. This ultimately produced reasonable performance (and was implemented throughout the rest of the study), but it remains possible that if the model was fed with more repetitions of imaging data, its performance would have been more stable across the number of units used to train it. (All models trained with imaging data eventually failed to converge.) However, I also see these limitations as an opportunity to improve the technology further, which I reiterate will be generally important for volume imaging of other sparse or noisy calcium signals in the brain.

      Transitioning to the naïve Bayesian classifier itself, I first openly ask the authors to justify their choice of this specific model. There are countless types of classifiers for these data, each with their own pros and cons. Did they actually try other models (such as support vector machines), which ultimately failed? If so, these negative results (even if mentioned en passant) would be extremely valuable to the community, in my view. I ask this specifically because different methods assume correspondingly different statistical properties of the input data, and to my knowledge naïve Bayesian classifiers assume that predictors (neuronal responses) are independent within a class (azimuth). As the authors show that noise correlations are informative in predicting azimuth, I wonder why they chose a model that doesn't take advantage of these statistical regularities. It could be because of technical considerations (they mention computing efficiency), but I am left generally uncertain about the specific logic that was used to guide the authors through their analytical journey.

      One of the main reasons we chose the naïve Bayesian classifier is indeed because it assumes that the responses of the simultaneously recorded neurons are independent and therefore it does not assume a contribution of noise correlations to the estimation of the posterior probability of each azimuth. This model would represent the null hypothesis that noise correlations do not contribute to the encoding of stimulus azimuth, which would be verified by an equal decoding outcome from correlated or decorrelated datasets. Since we observed that this is not the case, the model supports the alternative hypothesis that noise correlations do indeed influence stimulus azimuth encoding. We wanted to test these hypotheses with the most conservative approach possible that would be least likely to find a contribution of noise correlations. Another relevant reason that justifies our choice of the naive Bayesian classifier is its robustness against the limited numbers of trials we could collect, in comparison to other more “data hungry” classifiers like SVM, KNN, or artificial neural networks. We did perform preliminary tests with alternative classifiers but the obtained decoding errors were similar when decoding the whole population activity (Author response image 2A). Dimensionality reduction following the approach described in the manuscript showed a tendency towards smaller decoding errors observed with an alternative classifier like KNN, but these errors were still larger than the ones observed with the naive Bayesian classifier (median error 45º). Nevertheless, we also observe a similar tendency for slightly larger decoding errors in the absence of noise correlations (decorrelated, Author response image 2B). Sentences detailing the logic of classifier choice are now included in the results section at page 10 and at the last paragraph of page 18 (see responses to Reviewer 1).
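
      The independence assumption discussed above can be made explicit with a minimal naive Bayes decoder in Python. This is a sketch under stated assumptions rather than the classifier used in the study: it assumes Gaussian per-unit likelihoods, a uniform prior over azimuths, a variance floor (min_var) to avoid division by zero, and leave-one-out cross-validation; all function names are hypothetical.

```python
import math

def fit(trials_by_azimuth, min_var=1e-3):
    """Estimate a per-unit mean and variance for each azimuth (class)."""
    model = {}
    for az, trials in trials_by_azimuth.items():
        n = len(trials)
        columns = list(zip(*trials))  # one tuple of responses per unit
        means = [sum(col) / n for col in columns]
        variances = [
            max(sum((x - m) ** 2 for x in col) / n, min_var)
            for col, m in zip(columns, means)
        ]
        model[az] = (means, variances)
    return model

def log_likelihood(trial, means, variances):
    """Sum of per-unit Gaussian log-likelihoods.

    Summing over units is the 'naive' step: units are treated as
    independent given the azimuth, so noise correlations are ignored.
    """
    return sum(
        -0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
        for x, m, v in zip(trial, means, variances)
    )

def decode(model, trial):
    """Return the azimuth with the highest likelihood (uniform prior)."""
    return max(model, key=lambda az: log_likelihood(trial, *model[az]))

def leave_one_out_errors(trials_by_azimuth):
    """Absolute decoding error for each held-out trial."""
    errors = []
    for az, trials in trials_by_azimuth.items():
        for i, held_out in enumerate(trials):
            train = dict(trials_by_azimuth)
            train[az] = trials[:i] + trials[i + 1:]  # drop the test trial
            errors.append(abs(decode(fit(train), held_out) - az))
    return errors
```

      Comparing the error distribution returned by leave_one_out_errors for a simultaneously recorded dataset against a trial-shuffled copy of the same dataset follows the logic of the test described in the response above.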

      Author response image 2.

      A) Cumulative distribution plots of the absolute cross-validated single-trial prediction errors obtained using different classifiers (blue; KNN: K-nearest neighbors; SVM: support vector machine ensemble) and chance level distribution (gray) on the complete populations of imaged units. Cumulative distribution plots of the absolute cross-validated single-trial prediction errors obtained using a Bayes classifier (naive approximation for computation efficiency) to decode the single-trial response patterns from the 31 top ranked units in the simultaneously imaged datasets across mice (cyan), modeled decorrelated datasets (orange) and the chance level distribution associated with our stimulation paradigm (gray). Vertical dashed lines show the medians of cumulative distributions. K.S. w/Sidak: Kolmogorov-Smirnov with Sidak.

      That aside, there remain other peculiarities in model performance that warrant further investigation. For example, what spurious features (or lack of informative features) in these additional units prevented the models of imaging data from converging?

      Considering the amount of variability observed throughout the neuronal responses both in imaging and neuropixels datasets, it is easy to suspect that the information about stimulus azimuth carried in different amounts by individual DCIC neurons can be mixed up with information about other factors (Stringer et al., 2019). In an attempt to study the origin of these features that could confound stimulus azimuth decoding we explored their relation to face movement (Supplemental Figure 2), finding a correlation to snout movements, in line with previous work by Stringer et al. (2019).

      In an orthogonal question, did the most spatially sensitive units share any detectable tuning features? A different model trained with electrophysiology data in contrast did not collapse in the range of top-ranked units plotted. Did this model collapse at some point after adding enough units, and how well did that correlate with the model for the imaging data?

      Our electrophysiology datasets were much smaller in size (number of simultaneously recorded neurons) compared to our volumetric calcium imaging datasets, resulting in a much smaller total number of top ranked units detected per dataset. This precluded the determination of a collapse of decoder performance due to overfitting beyond the range plotted in Fig 4G.

      How well did the form (and diversity) of the spatial tuning functions as recorded with electrophysiology resemble their calcium imaging counterparts? These fundamental questions could be addressed with more basic, but transparent analyses of the data (e.g., the diversity of spatial tuning functions of their recorded units across the population). Even if the model extracts features that are not obvious to the human eye in traditional visualisations, I would still find this interesting.

      The diversity of the azimuth tuning curves recorded with calcium imaging (Fig. 3B) was qualitatively larger than the ones recorded with electrophysiology (Fig. 4B), potentially due to the larger sampling obtained with volumetric imaging. We did not perform a detailed comparison of the form and a more quantitative comparison of the diversity of these functions because the signals compared are quite different, as calcium indicator signal is subject to non linearities due to Ca2+ binding cooperativity and low pass filtering due to binding kinetics. We feared this could lead to misleading interpretations about the similarities or differences between the azimuth tuning functions in imaged and electrophysiology datasets. Our model uses statistical response dependency to stimulus azimuth, which does not rely on features from a descriptive statistic like mean response tuning. In this context, visualizing the trial-to-trial responses as a function of azimuth shows “features that are not obvious to the human eye in traditional visualizations” (Fig. 3D, left inset).

      Finally, the readership is encouraged to interpret certain statements by the authors in the current version conservatively. How the brain ultimately extracts spatial neuronal data for perception is anyone's guess, but it is important to remember that this study only shows that a naïve Bayesian classifier could decode this information, and it remains entirely unclear whether the brain does this as well. For example, the model is able to achieve a prediction error that corresponds to the psychophysical threshold in mice performing a discrimination task (~30°). Although this is an interesting coincidental observation, it does not mean that the two metrics are necessarily related. The authors correctly do not explicitly claim this, but the manner in which the prose flows may lead a non-expert into drawing that conclusion.

      To avoid misleading the non-expert readers, we have clarified in the manuscript that the observed correspondence between decoding error and psychophysical threshold is explicitly coincidental.

      Page 13, end of middle paragraph:

      “If we consider the median of the prediction error distribution as an overall measure of decoding performance, the single-trial response patterns from subsamples of at least the 7 top ranked units produced median decoding errors that coincidentally matched the reported azimuth discrimination ability of mice (Fig 4G, minimum audible angle = 31º) (Lauer et al., 2011).”

      Page 14, bottom paragraph:

      “Decoding analysis (Fig. 4F) of the population response patterns from azimuth dependent top ranked units simultaneously recorded with neuropixels probes showed that the 4 top ranked units are the smallest subsample necessary to produce a significant decoding performance that coincidentally matches the discrimination ability of mice (31° (Lauer et al., 2011)) (Fig. 5F, G).”

      We also added to the Discussion sentences clarifying that a relationship between these two variables remains to be determined and it also remains to be determined if the DCIC indeed performs a bayesian decoding computation for sound localization.

      Page 20, bottom:

      “… Concretely, we show that sound location coding does indeed occur at DCIC on the single trial basis, and that this follows a comparable mechanism to the characterized population code at CNIC (Day and Delgutte, 2013). However, it remains to be determined if indeed the DCIC network is physiologically capable of Bayesian decoding computations. Interestingly, the small number of DCIC top ranked units necessary to effectively decode stimulus azimuth suggests that sound azimuth information is redundantly distributed across DCIC top ranked units, which points out that mechanisms beyond coding efficiency could be relevant for this population code.

      While the decoding error observed from our DCIC datasets obtained in passively listening, untrained mice coincidentally matches the discrimination ability of highly trained, motivated mice (Lauer et al., 2011), a relationship between decoding error and psychophysical performance remains to be determined. Interestingly, primary sensory representations should theoretically be even more precise than the behavioral performance as reported in the visual system (Stringer et al., 2021).”

      Moreover, the concept of redundancy (of spatial information carried by units throughout the DCIC) is difficult for me to disentangle. One interpretation of this formulation could be that there are non-overlapping populations of neurons distributed across the DCIC that each could predict azimuth independently of each other, which is unlikely what the authors meant. If the authors meant generally that multiple neurons in the DCIC carry sufficient spatial information, then a single neuron would have been able to predict sound source azimuth, which was not the case. I have the feeling that they actually mean "complementary", but I leave it to the authors to clarify my confusion, should they wish.

      We observed that the response patterns from relatively small fractions of the azimuth sensitive DCIC units (4-7 top ranked units) are sufficient to generate an effective code for sound azimuth, while 32-40% of all simultaneously recorded DCIC units are azimuth sensitive. In light of this observation, we interpreted that the azimuth information carried by the population should be redundantly distributed across the complete subpopulation of azimuth sensitive DCIC units.

      In summary, the present study represents a significant body of work that contributes substantially to the field of spatial auditory coding and systems neuroscience. However, limitations of the imaging dataset and model as applied in the study muddle concrete conclusions about how the DCIC precisely encodes sound source azimuth, and even more so about sound localisation in a behaving animal. Nevertheless, it presents a novel and unique dataset, which, regardless of secondary interpretation, corroborates the general notion that auditory space is encoded in an extraordinarily complex manner in the mammalian brain.

      Reviewer #3 (Public Review):

      Summary:

      Boffi and colleagues sought to quantify the single-trial, azimuthal information in the dorsal cortex of the inferior colliculus (DCIC), a relatively understudied subnucleus of the auditory midbrain. They used two complementary recording methods while mice passively listened to sounds at different locations: a large volume but slow sampling calcium-imaging method, and a smaller volume but temporally precise electrophysiology method. They found that neurons in the DCIC were variable in their activity, unreliably responding to sound presentation and responding during inter-sound intervals. Boffi and colleagues used a naïve Bayesian decoder to determine if the DCIC population encoded sound location on a single trial. The decoder failed to classify sound location better than chance when using the raw single-trial population response but performed significantly better than chance when using intermediate principal components of the population response. In line with this, when the most azimuth-dependent neurons were used to decode azimuthal position, the decoder performed equivalently to the azimuthal localization abilities of mice. The top azimuthal units were not clustered in the DCIC, possessed a contralateral bias in response, and were correlated in their variability (e.g., positive noise correlations). Interestingly, when these noise correlations were perturbed by inter-trial shuffling, decoding performance decreased. Although Boffi and colleagues show that azimuthal information can be extracted from DCIC responses, it remains unclear to what degree this information is used and what role noise correlations play in azimuthal encoding.

      Strengths:

      The authors should be commended for collection of this dataset. When done in isolation (which is typical), calcium imaging and linear array recordings have intrinsic weaknesses. However, those weaknesses are alleviated when done in conjunction with one another - especially when the data largely recapitulates the findings of the other recording methodology. In addition to the video of the head during the calcium imaging, this data set is extremely rich and will be of use to those interested in the information available in the DCIC, an understudied but likely important subnucleus in the auditory midbrain.

      The DCIC neural responses are complex; the units unreliably respond to sound onset, and at the very least respond to some unknown input or internal state (e.g., large inter-sound interval responses). The authors do a decent job in wrangling these complex responses: using interpretable decoders to extract information available from population responses.

      Weaknesses:

      The authors observe that neurons with the most azimuthal sensitivity within the DCIC are positively correlated, but they use a naïve Bayesian decoder, which assumes independence between units. Although this is a bit strange given their observation that some of the recorded units are correlated, it is unlikely to be a critical flaw. At one point the authors reduce the dimensionality of their data through PCA and use the loadings onto these components in their decoder. PCA incorporates the correlational structure when finding the principal components and constrains these components to be orthogonal and uncorrelated. This should alleviate some of the concern regarding the use of the naïve Bayesian decoder because the projections onto the different components are independent. Nevertheless, the decoding results are a bit strange, likely because there is not much linearly decodable azimuth information in the DCIC responses. Raw population responses failed to provide sufficient information concerning azimuth for the decoder to perform better than chance. Additionally, it only performed better than chance when certain principal components or top ranked units contributed to the decoder but not as more components or units were added. So, although there does appear to be some azimuthal information in the recorded DCIC populations - it is somewhat difficult to extract and likely not an 'effective' encoding of sound localization as their title suggests.

      As described in the responses to reviewers 1 and 2, we chose the naïve Bayes classifier as a decoder to determine the influence of noise correlations through the most conservative approach possible, as this classifier would be least likely to find a contribution of correlated noise. We also chose this decoder for its robustness against the limited number of trials collected, in comparison to “data hungry” nonlinear classifiers like KNN or artificial neural networks. Lastly, we observed that small populations of noisy, unreliable DCIC neurons (which do not respond in every trial) can encode stimulus azimuth in passively listening mice, matching the discrimination error of trained mice. Therefore, while this encoding is definitely not efficient, it can still be considered effective.
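
      To make the classifier choice concrete, the following is a minimal sketch of a naive Bayes decoder evaluated with leave-one-out cross-validation. It is an illustration only, not the analysis code used in the study: the Gaussian likelihood, the function names, and the toy response values are all assumptions made for the example.

```python
import math

def fit_gaussian_nb(X, y, var_floor=1e-6):
    """Estimate per-class, per-unit mean and variance plus class log-priors."""
    model = {}
    for label in sorted(set(y)):
        rows = [x for x, lab in zip(X, y) if lab == label]
        n = len(rows)
        means = [sum(col) / n for col in zip(*rows)]
        # Floor the variance so a constant unit cannot cause a division by zero.
        variances = [
            max(sum((v - m) ** 2 for v in col) / n, var_floor)
            for col, m in zip(zip(*rows), means)
        ]
        model[label] = (math.log(n / len(y)), means, variances)
    return model

def predict(model, x):
    """Return the class with the highest posterior, treating units as independent."""
    def log_posterior(params):
        log_prior, means, variances = params
        return log_prior + sum(
            -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
            for v, m, var in zip(x, means, variances)
        )
    return max(model, key=lambda label: log_posterior(model[label]))

def leave_one_out_errors(X, y):
    """Absolute decoding error (in label units) for each held-out trial."""
    errors = []
    for i in range(len(X)):
        model = fit_gaussian_nb(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        errors.append(abs(predict(model, X[i]) - y[i]))
    return errors

# Hypothetical single-trial responses of two units (columns) to three azimuths.
X = [[1.0, 0.1], [1.2, 0.0], [0.9, 0.2],
     [0.5, 0.5], [0.6, 0.4], [0.4, 0.6],
     [0.1, 1.0], [0.0, 1.1], [0.2, 0.9]]
y = [-30] * 3 + [0] * 3 + [30] * 3  # azimuth in degrees
errors = leave_one_out_errors(X, y)
```

      Because each unit contributes its own likelihood term, the classifier ignores correlations between units by construction, which is the independence assumption discussed in this exchange.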

      Although this is quite a worthwhile dataset, the authors present relatively little about the characteristics of the units they've recorded. This may be due to the high variance in responses seen in their population. Nevertheless, the authors note that units do not respond on every trial but do not report what percentage of trials fail to evoke a response. Is it that neurons are noisy because they do not respond on every trial or is it also that when they do respond they have variable response distributions? It would be nice to gain some insight into the heterogeneity of the responses.

      The limited number of azimuth trial repetitions that we could collect precluded us from making any quantification of the unreliability (failures to respond) and variability in the response distributions from the units we recorded, as we feared they could be misleading. In qualitative terms, “due to the high variance in responses seen” in the recordings and the limited trial sampling, it is hard to make any generalization. In consequence we referred to the observed response variance altogether as neuronal noise. Considering these points, our datasets are publicly available for exploration of the response characteristics.

      Additionally, is there any clustering at all in response profiles or is each neuron they recorded in the DCIC unique?

      We attempted to qualitatively visualize response clustering using dimensionality reduction, observing different degrees of clustering or lack thereof across the azimuth classes in the datasets collected from different mice. It is likely that the limited number of azimuth trials we could collect and the high response variance contribute to an inconsistent response clustering across datasets.

      They also only report the noise correlations for their top ranked units, but it is possible that the noise correlations in the rest of the population are different.

      For this study, since our aim was to interrogate the influence of noise correlations on stimulus azimuth encoding by DCIC populations, we focused on the noise correlations from the top ranked unit subpopulation, which likely carries the bulk of the sound location information. Noise correlations can be defined as correlations in the trial-to-trial response variation of neurons. In this respect, it is hard to ascertain whether the units outside the top ranked percentage are really responding and showing response variation that would allow this correlation to be evaluated, or are simply not responding at all and show unrelated activity altogether. This makes observations about noise correlations from “the rest of the population” potentially hard to interpret.

      It would also be worth digging into the noise correlations more - are units positively correlated because they respond together (e.g., if unit x responds on trial 1 so does unit y) or are they also modulated around their mean rates on similar trials (e.g., unit x and y respond and both are responding more than their mean response rate)? A large portion of trials with no response can occlude noise correlations. More transparency around the response properties of these populations would be welcome.

      Due to the limited number of azimuth trial repetitions collected, we evaluated noise correlations using the nonparametric Kendall tau correlation coefficient, which is a measure of pairwise rank correlation or ordinal association in the responses to each azimuth. A positive rank correlation would represent neurons that are more likely to respond together. Evaluating response modulation “around their mean rates on similar trials” would require assumptions about the response distributions, which we avoided due to the potential biases associated with limited sample sizes.
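
      As an illustration of the rank-based measure described here, the following minimal sketch computes a pairwise Kendall tau coefficient from the single-trial responses of two units to one azimuth. It is not the analysis code used in the study: the function name and response values are hypothetical, and the simplified tau-a variant shown here does not correct for ties, unlike the tau-b variant offered by most statistics libraries.

```python
from itertools import combinations

def kendall_tau_a(x, y):
    """Kendall tau-a between two equal-length response vectors (no tie correction)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length vectors with at least 2 trials")
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        # The sign of the product tells whether both units changed in the same direction.
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs

# Hypothetical responses of two units across 5 repetitions of the same azimuth.
unit_x = [0.0, 0.2, 0.9, 0.4, 1.5]
unit_y = [0.1, 0.3, 1.1, 0.2, 2.0]
tau = kendall_tau_a(unit_x, unit_y)  # 9 concordant, 1 discordant pair out of 10
```

      Only the ordering of the responses across trials enters the coefficient, so no assumption about the shape of the response distributions is needed.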

      It is largely unclear what the DCIC is encoding. Although the authors are interested in azimuth, sound location seems to be only a small part of DCIC responses. The authors report responses during inter-sound interval and unreliable sound-evoked responses. Although they have video of the head during recording, we only see a correlation to snout and ear movements (which are peculiar since in the example shown it seems the head movements predict the sound presentation). Additional correlates could be eye movements or pupil size. Eye movements are of particular interest due to their known interaction with IC responses - especially if the DCIC encodes sound location in relation to eye position instead of head position (though much of the eye-position-IC work was done in primates and not rodents). Alternatively, much of the population may only encode sound location if an animal is engaged in a localization task. Ideally, the authors could perform more substantive analyses to determine if this population is truly noisy or if the DCIC is integrating un-analyzed signals.

      We unsuccessfully attempted eye tracking and pupillometry in our videos. We suspect that the reason behind this is a generally overly dilated pupil due to the low visible light illumination conditions we used which were necessary to protect the PMT of our custom scope.

      It is likely that DCIC population activity is integrating un-analyzed signals, like the signal associated with spontaneous behaviors including face movements (Stringer et al., 2019), which we observed at the level of spontaneous snout movements. However, investigating whether and how these signals are integrated into stimulus azimuth coding requires extensive behavioral testing and experimentation, which is out of the scope of this study. For the purpose of our study, we referred to trial-to-trial response variation as neuronal noise. We note that this definition of neuronal noise can, and likely does, include an influence from un-analyzed signals like the ones from spontaneous behaviors.

      Although this critique is ubiquitous among decoding papers in the absence of behavioral or causal perturbations, it is unclear what - if any - role the decoded information may play in neuronal computations. The interpretation of the decoder means that there is some extractable information concerning sound azimuth - but not if it is functional. This information may just be epiphenomenal, leaking in from inputs, and not used in computation or relayed to downstream structures. This should be kept in mind when the authors suggest their findings implicate the DCIC functionally in sound localization.

      Our study builds upon previous reports by other independent groups relying on “causal and behavioral perturbations” and implicating the DCIC in the experience-dependent plasticity induced by sound location learning (Bajo et al., 2019, 2010; Bajo and King, 2012), which altogether argues in favor of DCIC functionality in sound localization.

      Nevertheless, we clarified in the discussion of the revised manuscript that a relationship between the observed decoding error and the psychophysical performance, or the ability of the DCIC network to perform Bayesian decoding computations, both remain to be determined (please see responses to Reviewer #2).

      It is unclear why positive noise correlations amongst similarly tuned neurons would improve decoding. A toy model exploring how positive noise correlations in conjunction with unreliable units that inconsistently respond may anchor these findings in an interpretable way. It seems plausible that inconsistent responses would benefit from strong noise correlations, simply by units responding together. This would predict that shuffling would impair performance because you would then be sampling from trials in which some units respond, and trials in which some units do not respond - and may predict a bimodal performance distribution in which some trials decode well (when the units respond) and poor performance (when the units do not respond).

      In samples with more than 2 dimensions, the relationship between signal and noise correlations is more complex than in two-dimensional samples (Montijn et al., 2016), which makes constructing simple, interpretable toy models challenging. Montijn et al. (2016) provide a detailed characterization and model describing how the accuracy of a multidimensional population code can improve when including “positive noise correlations amongst similarly tuned neurons”. Unfortunately, we could not successfully test their model based on Mahalanobis distances, as we could not verify that the recorded DCIC population responses followed a multivariate Gaussian distribution, due to the limited azimuth trial repetitions we could sample.
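
      The inter-trial shuffling discussed in this exchange can be sketched as follows. This is a hypothetical helper written for illustration, not the procedure implemented in the study: it permutes each unit's responses independently across trials of the same azimuth, which keeps every unit's per-azimuth response distribution intact while breaking the trial-to-trial pairing between units. The data layout, function name, and seed are assumptions.

```python
import random

def shuffle_trials_within_class(responses, labels, seed=0):
    """Permute each unit's responses across trials sharing the same label.

    responses: list of trials, each a list with one response per unit.
    labels: one stimulus label (e.g. azimuth) per trial.
    Returns a new list; the input is left unchanged.
    """
    rng = random.Random(seed)
    n_units = len(responses[0])
    shuffled = [list(trial) for trial in responses]
    trials_by_class = {}
    for idx, lab in enumerate(labels):
        trials_by_class.setdefault(lab, []).append(idx)
    for idxs in trials_by_class.values():
        for u in range(n_units):
            # Each unit gets its own permutation, which removes shared variability.
            values = [responses[i][u] for i in idxs]
            rng.shuffle(values)
            for i, v in zip(idxs, values):
                shuffled[i][u] = v
    return shuffled

# Hypothetical data: 6 trials, 2 units, 2 azimuth classes.
responses = [[1, 10], [2, 20], [3, 30], [4, 40], [5, 50], [6, 60]]
labels = [0, 0, 0, 90, 90, 90]
shuffled = shuffle_trials_within_class(responses, labels, seed=1)
```

      Comparing decoder performance on the original and shuffled responses then isolates the contribution of noise correlations, since signal (mean) responses per azimuth are identical in both.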

      Significance:

      Boffi and colleagues set out to parse the azimuthal information available in the DCIC on a single trial. They largely accomplish this goal and are able to extract this information when allowing the units that contain more information about sound location to contribute to their decoding (e.g., through PCA or decoding on top unit activity specifically). The dataset will be of value to those interested in the DCIC and also to anyone interested in the role of noise correlations in population coding. Although this work is a first step into parsing the information available in the DCIC, it remains difficult to interpret if/how this azimuthal information is used in localization behaviors of engaged mice.

      Recommendations for the authors:

      Reviewer #2 (Recommendations For The Authors):

      General:

      The manuscript is generally well written, but could benefit from a quick proof by a native English speaker (e.g., "the" inferior colliculus is conventionally used with its article). The flow of arguments is also generally easy to follow, but I would kindly ask the authors to consider elaborating or clarifying the following points (including those already mentioned in my public review).

      (1) Choice of model:

      There are countless ways one can construct a decoder or classifier that can predict a presented sensory stimulus based on a population neuronal response. Given the assumptions of independence as mentioned in my public review, I would ask the authors to explicitly justify their choice of a naïve Bayesian classifier.

      A section detailing the logic of classifier choice is now included in the results section at page 10 and the last paragraph of page 18 from the revised version of the manuscript.

      (2) Number of imaging repetitions:

      For particularly noisy datasets, 14 repetitions is indeed quite few. I reckon this was not the choice of the authors, but rather limited by the inherent experimental conditions. Despite minimisation of required average laser power during the development of s-TeFo imaging, the authors still required almost 200 mW (which is still quite a lot of exposure). Although 14 repetitions for 13 azimuthal locations every 5 s is at face value a relatively short imaging session (~15 min.), at 191 mW, with the desire to image mice multiple times, I could imagine that this is a practical limitation the authors faced (to avoid excessive tissue heating or photodamage, which was assessed in the original Nature Methods article, but not here). Nevertheless, this logic (or whatever logic they had) should be explained for non-imaging experts in the readership.

      This is now addressed in the answers to the public reviews.

      (3) Redundancy:

      It is honestly unclear to me what the authors mean by this. I don't speculate that they mean there are "redundant" (small) populations of neurons that sufficiently encode azimuth, but I'm actually not certain. If that were the case, I believe this would need further clarification, since redundant representations would be inconsistent with the general (perhaps surprising) finding that large populations are not required in the DCIC, which is thought to be the case at earlier processing stages.

      In the text we are referring to the azimuth information being redundantly distributed across DCIC top ranked units. We do not mention redundant “populations of neurons”.

      (4) Correspondence of decoding accuracy with psychometric functions in mice:

      While this is an interesting coincidental observation, it should not be interpreted that the neuronal detection threshold in the DCIC is somehow responsible for its psychometric counterpart (which is an interesting yet exceedingly complex question). Although I do not believe the authors intended to suggest this, I would personally be cautious in the way I describe this correspondence. I mention this because the authors point it out multiple times in the manuscript (whereas I would have just mentioned it once in passing).

      This is now clarified in the revised manuscript.

      (5) Noisy vs. sparse:

      I'm confident that the authors understand the differences between these terms, both in concept (stochastic vs. scattered) and in context (neuronal vs. experimental), but I personally would be cautious in the way I use them in the description of the study. Indeed, auditory neuronal signals are to my knowledge generally thought to be both sparse and noisy, which is in itself interesting, but the study also deals with substantial experimental (recording) noise, and I think it's important for the readership to understand when "noise" refers to the recordings (in particular the imaging data) and to neuronal activity. I mention this specifically because "noisy" appears in the title.

      We have clarified this issue at the bottom of page 5 by adding the following sentences to the revised manuscript:

      “In this section we used the word “noise” to refer to the sound stimuli used and recording setup background sound levels or recording noise in the acquired signals. To avoid confusion, from now on in the manuscript the word “noise” will be used in the context of neuronal noise, which is the trial-to-trial variation in neuronal responses unrelated to stimuli, unless otherwise noted.”

      (6) More details in the Methods:

      The Methods section is perhaps the least-well structured part of the present manuscript in my view, and I encourage the authors to carefully go through it and add the following information (in case I somehow missed it).

      a. Please also indicate the number of animals used here.

      Added.

      b. How many sessions were performed on each mouse?

      This is already specified in the methods section in page 25:

      “mice were imaged a total of 2-11 times (sessions), one to three times a week.”

      We added for clarification:

      “Datasets here analyzed and reported come from the imaging session in which we observed maximal calcium sensor signal (peak AAV expression) and maximum number of detected units.”

      c. For the imaging experiments, was it possible to image the same units from session to session?

      This is not possible for sTeFo 2P data due to low spatial resolution which makes precisely matching neuron ROIs across sessions challenging.

      d. Could the authors please add more detail to the analyses of the videos (to track facial movements) or provide a reference?

      Added citation.

      e. The same goes for the selection of subcellular regions of interest that were used as "units."

      Added to page 25:

      “We used the CaImAn package (Giovannucci et al., 2019) for automatic ROI segmentation through constrained non negative matrix factorization and selected ROIs (Units) showing clear Ca transients consistent with neuronal activity, and IC neuron somatic shape and size (Schofield and Beebe, 2019).”

      Specific: In order to maximise the efficiency of my comments and suggestions (as there are no line numbers), my numerated points are organised in sequential order.

      (1) Abstract: I wouldn't personally motivate the study with the central nucleus of the IC (i.e. I don't think this is necessary). I think the authors can motivate it simply with the knowledge gaps in spatial coding throughout the auditory system, in which large data sets such as the ones presented here are of general value.

      (2) Page 4: 15-50 kHz "white" noise is incorrect. It should be "band-passed" noise.

      Changed.

      (3) Supplemental figure 1, panel A: Since the authors could not identify cell bodies unequivocally from their averaged volume timeseries data, it would be clearer to the readership if larger images are shown, so that they can evaluate (speculate) for themselves what subcellular structures were identified as units. Even better would be to include a planar image through a cross-section. As mentioned above, not everything determined for the cortex or hippocampus can be assumed to be true for the DCIC.

      The raw images and segmentations are publicly available for detailed inspections.

      (4) Supplemental figure 2, panel A: This panel requires further explanation, in particular the panel on the right. I assume that to be a simple subtraction of sequential frames, but I'm thrown off by the "d(Grey)" colour bar. Also, if "grey" refers to the neutral colour, it is conventionally spelled "gray" in US-American English.

      Changed.

      (5) Supplemental figure 2, panel B: I'm personally curious why the animals exhibited movement just prior to a stimulus. Did they learn to anticipate the presentation of a sound after some habituation? Is that somehow a pre-emptive startle response? We observe that in our own experiments (but as we stochastically vary the inter-trial-intervals, the movement typically occurs directly after the stimulus). I don't suggest the authors dwell on this, but I find it an interesting observation.

      It is indeed interesting, but we can’t conclude much about it without comparing it to random inter-trial-intervals.

      (6) Supplemental figure 3: I personally find these data (decoding of all electrophysiological data) of central relevance to the study, since it mirrors the analyses presented for its imaging data counterpart, and encourage the authors to move it to the main text.

      Changed.

      (7) Page 12: Do the authors have any further analyses of spatial tuning functions? We all know they can be parametrically obscure (i.e., bi-lobed, non-monotonic, etc.), but having these parameters (even if just in a supplemental figure) would be informative for the spatial auditory community.

      We dedicated significant effort to attempt to parametrize and classify the azimuth response dependency functions from the recorded DCIC cells in an unbiased way. Nevertheless, given the observed response noise and the “obscure” properties of spatial tuning functions mentioned by the reviewer, we could only reach the general qualitative observation of having a more frequent contralateral selectivity.

      (8) Page 14 (end): Here, psychometric correspondence is referenced. Please add the Lauer et al. (2011) reference, or, as I would, remove the statement entirely and save it for the discussion (where it is also mentioned and referenced).

      Changed.

      (9) Figure 5, Panels B and C: Why don't the authors report the Kruskal-Wallis tests (for increasing number of units training the model), akin to e.g., Panel G of Figure 4? I think that would be interesting to see (e.g., if the number of required units to achieve statistical significance is the same).

      Within-class randomization produced a moderate effect on decoder performance, achieving statistical significance at similar numbers of units, as seen in figure 5 panels B and C. We did not include these plots to avoid cluttering the figure with dense distributions and blurring the visualization of the differences between the distributions shown.

      (10) Figure 5, Panels B and C (histograms): I see a bit of skewness in the distributions (even after randomisation). Where does this come from? This is just a small talking point.

      We believe this is potentially due to more than one distribution of pairwise correlations combined into one histogram (like in a Gaussian mixture model).

      (11) Page 21: Could the authors please specify that the Day and Delgutte (2013) study was performed on rabbits? Since rabbits have an entirely different spectral hearing range compared to mice, spatial coding principles could very well be different in those animals (and I'm fairly certain such a study has not yet been published for mice).

      Specified.

      (12) Page 22: I'd encourage the authors to remove the reference to Rayleigh's duplex theory, since mice hardly (if at all) use interaural time differences for azimuthal sound localisation, given their generally high-frequency hearing range.

      That sentence is meant to discuss beyond the mouse model an exciting outlook of our findings in light of previous reports, which is a hypothetical functional relationship between the tonotopy in DCIC and the spatial distribution of azimuth sensitive DCIC neurons. We have clarified this now in the text.

      (13) Page 23: I believe the conventional verb for gene delivery with viruses is still "transduce" (or "infect", but not "induce"). What was the specific "syringe" used for stereotactic injections? Also, why were mice housed separately after surgery? This question pertains to animal welfare.

      Changed. The syringe was a 10 ml syringe used to generate positive or negative pressure, coupled to the glass needle through silicone tubing via a Luer 3-way T valve. Single housing was chosen to prevent mice from compromising each other’s implantations. Therefore, this can be seen as a refinement of our method to maximize the chances of successful imaging per implanted mouse.

      (14) Page 25: Could the authors please indicate the refractory period violation time window here? I had to find it buried in the figure caption of Supplementary figure 1.

      Added.

      (15) Page 27: What version of MATLAB was used? This could be important for reproduction of the analyses, since The Mathworks is infamously known to add (or even more deplorably, modify) functions in particular versions (and not update older ones accordingly).

      Added.

      Reviewer #3 (Recommendations For The Authors):

      Overall I thought this was a nice manuscript and a very interesting dataset. Here are some suggestions and minor corrections:

      You may find this work of interest - 'A monotonic code for sound azimuth in primate inferior colliculus' 2003, Groh, Kelly & Underhill.

      We thank the reviewer for pointing out this extremely relevant reference, which we regrettably failed to cite. It is now included in the revised version of the manuscript.

      In your introduction, you state "our findings point to a functional role of DCIC in sound location coding". Though your results show that there is azimuthal information contained in a subset of DCIC units there's no evidence in the manuscript that shows a functional link between this representation and sound localization.

      This is now addressed in the answers to the public reviews.

      I found the variability in your DCIC population quite striking - especially during the inter-sound intervals. The entrainment of the population in the imaging dataset suggests some type of input activating the populations - maybe these are avenues for further probing the variability here:

      (1) I'm curious if you can extract eye movements from your video. Work from Jennifer Groh shows that some cells in the primate inferior colliculus are sensitive to different eye positions (Groh et al., 2001). With recent work showing eye movements in rodents, it may explain some of the variance in the DCIC responses.

      This is now addressed in the answers to the public reviews.

      (2) I was also curious if the motor that moves the speaker made noise. It is possible that some of the 'ongoing' activity is a sound-evoked response.

      We were careful to set the stepper motor speed so that it produced low-frequency noise, within a band mostly outside of the hearing range of mice (<4 kHz). Nevertheless, we cannot fully rule out that a very quiet but perhaps very salient component of the motor noise could influence the activity during the inter-trial periods. The motor was stationary and quiet for a period of at least one stimulus duration before and during stimulus presentation.

      (3) Was the sound you present frozen or randomly generated on each trial? Could there be some type of structure in the noise you presented that sometimes led cells to respond to a particular azimuth location but not others?

      The sound presented was frozen noise. This is now clarified in the methods section.

      It may be useful to quantify the number of your units that had refractory period violations.

      Our manual curation of sorted units was very stringent to avoid mixing differently tuned neurons. The single units analyzed had very infrequent refractory period violations, in less than ~5% of the spikes, considering a 2 ms refractory period.
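
      For readers wishing to reproduce this kind of check, the following minimal sketch counts refractory period violations for one sorted unit. It is not the curation code used in the study: the function name, the example spike times, and the choice to express violations as a fraction of all spikes (rather than of inter-spike intervals) are assumptions; only the 2 ms window comes from the text above.

```python
def refractory_violation_fraction(spike_times_s, refractory_s=0.002):
    """Fraction of spikes that follow the previous spike by less than refractory_s."""
    times = sorted(spike_times_s)
    if len(times) < 2:
        return 0.0
    # A violation is an inter-spike interval shorter than the refractory period.
    violations = sum(
        1 for prev, cur in zip(times, times[1:]) if cur - prev < refractory_s
    )
    return violations / len(times)

# Hypothetical spike train (seconds) containing two intervals shorter than 2 ms.
spikes = [0.010, 0.011, 0.050, 0.120, 0.1215, 0.300, 0.450, 0.600, 0.750, 0.900]
fraction = refractory_violation_fraction(spikes)
```

      A unit would then be retained only if this fraction stays below the chosen threshold, for example the ~5% mentioned above.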

      Was the video recording contralateral or ipsilateral to the recording?

      The side of the face ipsilateral to the imaged IC was recorded. Added to methods.

      I was struck by the snout and ear movements - in the example shown in Supplementary Figure 2B it appears as they are almost predicting sound onset. Was there any difference in ear movements in the habituated and non-habituated animals? Also, does the placement of the cranial window disturb any of the muscles used in ear movement?

      Mouse snout movements appear to be quite active, perhaps reflecting arousal (Stringer et al., 2019). We cannot rule out that the cranial window implantation disturbed ear movement, but while moving the head-fixed mouse we observed what could be considered normal ear movements.

      Did you correlate time-point by time-point in the average population activity and movement or did you try different temporal lags/leads in case the effect of the movements was delayed in some way?

      Point by point due to 250ms time resolution of imaging.

      Are the video recordings only available during the imaging? It would be nice to see the same type of correlations in the neuropixel-acquired data as well.

      Only imaging. For neuropixels recordings, we were skeptical about face videography as we suspected that face movements were likely influenced by the acute nature of the preparation procedure. Our cranial window preparation, on the other hand, involved a recovery period of at least 4 weeks. Therefore, we were inclined to perform videographic interrogation of face movements on these mice instead.

      If you left out more than 1 trial do you think this would help your overfitting issue (e.g. leaving out 20% of the data).

      Due to the relatively small number of trial repetitions collected, fitting the model with an even smaller training dataset is unlikely to help overfitting and will likely decrease decoder performance.
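
      The trade-off can be made concrete by counting how many trials remain for training under each scheme. The following sketch assumes a hypothetical number of repetitions (14, not the value used in the study) and compares leave-one-out with a 5-fold split, which holds out roughly 20% of the trials per fold.

```python
def leave_one_out_splits(n_trials):
    """Yield (train_indices, test_indices) with a single held-out trial."""
    for held_out in range(n_trials):
        train = [i for i in range(n_trials) if i != held_out]
        yield train, [held_out]


def k_fold_splits(n_trials, n_folds):
    """Yield (train_indices, test_indices) for contiguous, near-equal folds."""
    fold_sizes = [n_trials // n_folds + (1 if f < n_trials % n_folds else 0)
                  for f in range(n_folds)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n_trials) if i < start or i >= start + size]
        yield train, test
        start += size


n_trials = 14  # hypothetical number of repetitions, not the study's value
loo_train_sizes = [len(train) for train, _ in leave_one_out_splits(n_trials)]
kfold_train_sizes = [len(train) for train, _ in k_fold_splits(n_trials, 5)]
print(min(loo_train_sizes), min(kfold_train_sizes))
```

      With few repetitions, every additional held-out trial removes a noticeable share of the training data, which is the concern raised in the response above.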

      It would be nice to see a confusion matrix - even though azimuthal error and cumulative distribution of error are a fine way to present the data - a confusion matrix would tell us which actual sounds the decoder is confusing. Just looking at errors could result in some funky things where you reduce the error generally but never actually estimate the correct location.

      We considered confusion matrices early on in our study but they were not easily interpretable or insightful, likely due to the relatively low discrimination ability of the mouse model with +/- 30º error after extensive training. Therefore, we reasoned that in passively listening mice (and likely trained mice too) with limited trial repetitions, an undersampled and diffuse confusion matrix is expected which is not an ideal means of visualizing and comparing decoding errors. Hence we relied on cumulative error distributions.
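
      The reviewer's point that a moderate mean error can coexist with no exactly correct estimates can be illustrated with a toy example. The decoded azimuths below are hypothetical and are not data from the study; the helper names are likewise only for illustration.

```python
from collections import Counter


def confusion_counts(true_labels, predicted_labels):
    """Count how often each true class was decoded as each predicted class."""
    return Counter(zip(true_labels, predicted_labels))


def mean_absolute_error(true_labels, predicted_labels):
    """Mean absolute decoding error in the units of the labels (degrees)."""
    errors = [abs(t - p) for t, p in zip(true_labels, predicted_labels)]
    return sum(errors) / len(errors)


# Hypothetical decoded azimuths (degrees); not data from the study.
true_az = [-30, -30, 0, 0, 30, 30]
pred_az = [0, 0, 30, -30, 0, 0]

counts = confusion_counts(true_az, pred_az)
# Diagonal of the confusion matrix: trials decoded at the exact location.
exact_hits = sum(n for (t, p), n in counts.items() if t == p)
print(mean_absolute_error(true_az, pred_az), exact_hits)
```

      With only a handful of trials per azimuth, most cells of such a matrix contain zero or one count, which is why a sparsely sampled confusion matrix can be hard to interpret.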

      Do your top-ranked units have stronger projections onto your 10-40 principal components?

      It would be interesting to know if the components are mostly taking into account those 30ish percent of the population that is dependent upon azimuth.

      Inspection of PC loadings across units ranked based on response dependency to stimulus azimuth does not show a consistent stronger projection of top ranked units onto the first 10-40 principal components (Author response image 3).

      Author response image 3.

      PC loading matrices for each recorded mouse. The units recorded in each mouse are ranked in descending order of response dependency to stimulus azimuth based on  the p value of the chi square test. Units above the red dotted line display a chi square p value < 0.05, units below this line have p values >= 0.05.

      How much overlap is there in the tuning of the top-ranked units?

      This varies considerably from mouse to mouse and between imaging and electrophysiology, which makes it hard to generalize, since it might depend on the unique DCIC population sampled in each mouse.

      I'm not really sure I follow what the nS/N adds - it doesn't really measure tuning but it seems to be introduced to discuss/extract some measure of tuning.

      nS/N is used to quantify how noisy neurons are, independent of how sensitive their responses are to the stimulus azimuth.

      Is the noise correlation - observed to become more positive - for more contralateral stimuli a product of higher firing rates due to a more preferred stimulus presentation or a real effect in the data? Was there any relationship between distance and strength of observed noise correlation in the DCIC?

      We observed a consistent and homogeneous trend of pairwise noise correlation distributions either shifted or tailed towards more positive values across stimulus azimuths, for imaging and electrophysiology datasets (Author response image 3). The lower firing frequency observed in neuropixels recordings in response to ipsilateral azimuths could have affected the statistical power of the comparison between the pairwise noise correlation coefficient distribution to its randomized chance level, but the overall histogram shapes qualitatively support this consistent trend across azimuths (Author response image 4).

      Author response image 4.

      Distribution histograms for the pairwise correlation coefficients (Kendall tau) from pairs of simultaneously recorded top ranked units across mice (blue) compared to the chance level distribution obtained through randomization of the temporal structure of each unit’s activity to break correlations (purple). Vertical lines show the medians of these distributions. Imaging data comes from n = 12 mice and neuropixels data comes from n = 4 mice.
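
      The comparison described in this legend can be sketched as follows: compute a Kendall tau for a pair of simultaneously recorded units, then build a chance-level distribution by permuting each unit's responses. This is a minimal illustration, assuming the tau-b variant and a simple independent permutation of trials; the response values are hypothetical and the authors' exact randomization procedure may differ.

```python
import math
import random
import statistics


def kendall_tau_b(x, y):
    """Kendall tau-b rank correlation; returns 0.0 if either input is constant."""
    concordant = discordant = tied_x_only = tied_y_only = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            dx = x[i] - x[j]
            dy = y[i] - y[j]
            if dx == 0 and dy == 0:
                continue  # tied in both variables: excluded from both terms
            if dx == 0:
                tied_x_only += 1
            elif dy == 0:
                tied_y_only += 1
            elif dx * dy > 0:
                concordant += 1
            else:
                discordant += 1
    denom = math.sqrt((concordant + discordant + tied_x_only)
                      * (concordant + discordant + tied_y_only))
    return (concordant - discordant) / denom if denom else 0.0


def shuffled_tau(x, y, rng):
    """Tau after independently permuting each unit's responses (chance level)."""
    return kendall_tau_b(rng.sample(x, len(x)), rng.sample(y, len(y)))


# Hypothetical single-trial responses of two units to one azimuth.
unit_a = [3, 5, 2, 8, 7, 4, 6, 1]
unit_b = [2, 6, 1, 9, 5, 3, 7, 0]

rng = random.Random(0)
observed = kendall_tau_b(unit_a, unit_b)
chance = [shuffled_tau(unit_a, unit_b, rng) for _ in range(200)]
print(observed, statistics.median(chance))
```

      Repeating this for every pair of top-ranked units yields the two distributions whose medians are compared in the histograms.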

      Typos:

      'a population code consisting on the simultaneous" > should on be of?

      'half of the trails' > trails should be trials?

      'referncing the demuxed channels' > should it be demixed?

      Corrected.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Tateishi et al. report a Tn-seq-based analysis of genetic requirements for growth and fitness in 8 clinical strains of Mycobacterium intracellulare (Mi), and compare the findings with a type strain ATCC13950. The study finds a core set of 131 genes that are essential in all nine strains, and therefore are reasonably argued as potential drug targets. Multiple other genes required for fitness in clinical isolates have been found to be important for hypoxic growth in the type strain.

      Strengths:

      The study has generated a large volume of Tn-seq datasets of multiple clinical strains of Mi from multiple growth conditions, including from mouse lungs. The dataset can serve as an important resource for future studies on Mi, which despite being clinically significant remains a relatively understudied species of mycobacteria.

      Thank you for reviewing our manuscript and finding the significance of our data.

      Weaknesses:

      The paper lacks clarity in data presentation and organization. For example, some of the key data on cfu counts of clinical Mi strains in a mouse model can be presented along with the Tn-seq dataset in Figure 6, the visualization of which can be improved with volcano plots, etc. Improvement in data visualization is perhaps necessary throughout the paper.

      Thank you for the comment on the data presentation of in vivo studies. We previously reported the time-course data on CFUs, animal survival, and tissue pathology for the pure strains (Tateishi Y. BMC Microbiol. 2023; new Ref#22). Based on these data, we assumed that we would be able to harvest a sufficient number of colonies from mice infected with M.i.27 or M.i.198, and we performed in vivo TnSeq studies using these two strains. In the revised manuscript, we now cite our previous publication (new Ref#22) on the virulence in mice of the MAC-PD strains used in this study (page 12, line 212).

      The CFU count data are shown in new Supplementary Fig. 3b. In the manuscript text, we explained as follows (page 12, lines 212-216): “The time course of the changes in the bacterial burden showed a pattern similar to those of the wild-type strains M.i.198 and M.i.27, respectively, except that it was not possible to harvest sufficient colonies (as few as 104/mouse) in the few mice infected with the M.i.27 Tn mutant strain in week 8 and week 16 (new Supplementary Fig. 3b, new Supplementary Table 8)”.

      Regarding the suggestion to include volcano plots, we appreciate the proposal but chose not to adopt this format, as the main aim of this study was to identify genes commonly required for in vitro and in vivo fitness across multiple M. intracellulare strains, rather than to highlight differential genetic requirements within a single strain. Volcano plots are useful for visualizing differential values and significance for a single dataset but are less suited for cross-strain comparisons of shared gene sets. Our approach is aligned with the methodology used by Carey et al. (PLoS Pathog. 2018; new Ref#8), who similarly focused on identifying conserved genetic requirements across M. tuberculosis genotypes without employing volcano plots.

      [References]

      Tateishi, Y. et al. Virulence of Mycobacterium intracellulare clinical strains in a mouse model of lung infection - role of neutrophilic inflammation in disease severity. BMC Microbiol 23, 94 (2023).

      Carey, A.F. et al. TnSeq of Mycobacterium tuberculosis clinical isolates reveals strain-specific antibiotic liabilities. PLoS Pathog 14, e1006939 (2018).

      The primary claim of the study that the clinical strains are better adapted for hypoxic growth is not well-supported by the data presented in Figure 7.

      Thank you for the comments on the difference of adaptation for hypoxic growth between ATCC13950 and clinical MAC-PD strains. To clarify, growth rates shown in Figure 7 were calculated at the inflection point (midpoint) of the growth curves, which were modeled using a four-parameter logistic (4P logistic) model. As described in the Discussion, we found the pattern of hypoxic adaptation characteristics of the clinical MAC-PD strains from the growth curve forms. Taking into consideration the impact of growing bacteria on the disease progression of MAC-PD, the slow growth with early entry to log phase implicates the continuous impact on the infected hosts during logarithmic bacterial growth, which may be involved in the persistent and steadily progressive illness of MAC-PD for years in humans.

      Unlike in a time-lapse imaging assay, completely seamless sampling of a culture for a CFU assay is impossible. Nevertheless, we collected sufficient timepoints to allow reliable curve fitting with the 4P logistic model and thus consider our growth data to represent a valid approximation of continuous growth dynamics.
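
      As an illustration of how a growth rate at the inflection point follows from a fitted curve, the sketch below assumes one common parameterization of the four-parameter logistic model, symmetric in time, for which the slope at the midpoint is rate × (top − bottom) / 4. The parameterization, parameter names, and values are assumptions for illustration and may differ from the form used in the manuscript.

```python
import math


def logistic_4p(t, bottom, top, rate, t_mid):
    """Four-parameter logistic curve evaluated at time t."""
    return bottom + (top - bottom) / (1.0 + math.exp(-rate * (t - t_mid)))


def slope_at_midpoint(bottom, top, rate):
    """Analytic derivative of logistic_4p at t = t_mid."""
    return rate * (top - bottom) / 4.0


# Hypothetical parameters (log10 CFU/mL versus days), for illustration only.
params = dict(bottom=4.0, top=9.0, rate=0.8, t_mid=6.0)
analytic = slope_at_midpoint(params["bottom"], params["top"], params["rate"])

# Central finite difference as an independent check of the analytic slope.
h = 1e-5
numeric = (logistic_4p(params["t_mid"] + h, **params)
           - logistic_4p(params["t_mid"] - h, **params)) / (2 * h)
print(analytic, numeric)
```

      In this form the midpoint time indicates when a culture reaches half of its total increase, so a curve with an earlier midpoint but a smaller slope corresponds to early entry into log phase with slow growth.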

      Regarding the suggestion of mixed culture experiments, we agree that such studies could be informative. However, co-culture conditions introduce additional variables, including inter-strain competition or synergy, which can obscure the specific contributions of hypoxic adaptation in each strain. Therefore, we believe that the current approach using monoculture growth curves under defined oxygen conditions offers a clearer interpretation of strain-specific hypoxic responses.

      The title of the paper is misleading as the study doesn't provide any mechanistic aspect of hypoxic adaptation in Mi.

      Thank you for the comment on the article title. We admit that this paper does not directly reveal the mechanism of hypoxic adaptation in M. intracellulare strains but provides data on the different patterns of hypoxic adaptation between M. intracellulare strains in relation to differences in genetic requirements. Therefore, we revised the title to “Functional genomics reveals strain-specific genetic requirements conferring hypoxic growth in Mycobacterium intracellulare”.

      Reviewer #2 (Public Review):

      Summary:

      In the study titled "Functional genomics reveals the mechanism of hypoxic adaptation in nontuberculous mycobacteria" by Tateishi et al., the authors have used TnSeq to identify the common essential and growth-defect-associated genes that represent the genomic diversity of clinical M. intracellulare strains in comparison to the reference type strain. By estimating the frequency of Tn insertion, the authors speculate that genes involved in gluconeogenesis, the type VII secretion system, and cysteine desulfurase are relatively critical in the clinical MAC-PD strains than in the type strain, both for the extracellular survival and in a mouse lung infection model.

      Based on their analysis, the authors proposed to identify the mechanism of hypoxic adaptation in nontuberculous mycobacteria (NTM) which offer promising drug targets in the strains causing clinical Mycobacterium avium-intracellulare complex pulmonary disease (MAC-PD).

      Strengths:

      A major strength of the manuscript is the performance of the exhaustive set of TnSeq experiments with multiple strains of M. intracellulare during in vitro growth and animal infection.

      Thank you for reviewing our manuscript and acknowledging the performance of producing datasets in this study.

      Weaknesses:

      (1) The study suffers from the authors' preconceived bias toward a small subset of genes involved in hypoxic pellicle formation in ATCC13950.

      Thank you for the comment regarding a potential bias toward a small subset of genes involved in hypoxic pellicle formation in ATCC13950. The rationale for the importance of hypoxic pellicle genes in clinical MAC-PD strains is that the profiles of genetic requirements in each bacterial strain reflect the adaptation to the environment in which each strain lives. When the strains are placed in a special environment, they can adapt to the situation by altering the profiles of genetic requirements, resulting in the remodeling of metabolic pathways.

      In this study, we found that several of these pellicle-associated genes also showed increased genetic requirement in the clinical MAC-PD strains, suggesting a possible overlap in hypoxic adaptation mechanisms. We did not claim that clinical MAC-PD strains showed increased genetic requirements for all genes involved in hypoxic pellicle formation. Except for the gene sets involved in hypoxic pellicle formation in ATCC13950, almost no global information has been revealed on the pathogenesis of nontuberculous mycobacterial disease, which differs from the case of tuberculosis. Along with this finding, we investigated the effect of gene silencing on bacterial growth and the preferential hypoxic adaptation observed by growth kinetics in clinical MAC-PD strains compared to ATCC13950. At first glance, focusing on the gene sets of hypoxic pellicle formation may seem “biased”, but we proceeded with this research step by step based on our previous achievements. We consider that these data provide valuable information on the pathogenesis of MAC-PD by clinical MAC-PD strains.

      We have added the description of the rationale for the importance of hypoxic pellicle genes in clinical MAC-PD strains in the revised manuscript (page 9, lines 148-155).

      (2) An important set of data with the ATCC13950 reference strain is missing in the mouse infection study. In the absence of this, it is difficult to establish whether the identified genes are critical for infection/intracellular proliferation, specifically in the clinical isolates that are relatively more adapted for hypoxia.

      Thank you for the comment on the necessity of setting ATCC13950 as a control strain in the mouse TnSeq experiment. Setting ATCC13950 as a control strain in mouse infection experiments would be ideal. However, we previously showed that ATCC13950 is eliminated within 4 weeks of infection (Tateishi Y. BMC Microbiol. 2023; new Ref#22). This means that an in vivo TnSeq study with this strain is impossible because a sufficient number of colonies cannot be harvested.

      [Reference]

      Tateishi, Y. et al. Virulence of Mycobacterium intracellulare clinical strains in a mouse model of lung infection - role of neutrophilic inflammation in disease severity. BMC Microbiol 23, 94 (2023).

      (3) Statistical enrichment analysis of gene sets by GSEA wrongly involves genes required for hypoxic pellicle formation in ATCC13950 together with the gene sets found essential in the clinical MAC-PD strains, to claim that a significant % of genes belong to hypoxia-adaptation pathways. It could be factually incorrect because a majority of these might overlap with those found critical for the in vitro survival of MAC-PD strains (and may not be related to hypoxia).

      Thank you for the suggestion on the re-analysis of gene enrichment analysis of genes required for M.i.27 and M.i.198 in vivo infection, individually with genes involved in hypoxic pellicle formation in ATCC13950 and with those showing increased genetic requirements in clinical MAC-PD strains compared to ATCC13950.

      About 50% (92 and 94 out of 181 genes through Day 1 to Week 16 and Week 4 to Week 16 of infection) and 40% (70 and 79 genes out of 179 through Day 1 to Week 16 and Week 4 to Week 16 of infection) of genes required for hypoxic pellicle formation in ATCC13950 were listed as enriched in genes required for mouse lung infection in M.i.27 and M.i.198, respectively. In addition, about 42% (54 and 56 out of 128 genes through Day 1 to Week 16 and through Week 4 to Week 16 of infection) and 40% (79 and 68 out of 179 genes through Day 1 to Week 16 and through Week 4 to Week 16 of infection) of genes showing increased requirements in clinical MAC-PD strains compared to ATCC13950 were listed as enriched in genes required for mouse lung infection in M.i.27 and M.i.198, respectively.

      These data indicate that about 40-50% of the genes required for in vitro hypoxic pellicle formation are shared with the genes required for in vivo bacterial growth, and that about 40% of the strain-dependent/accessory essential genes are shared with the genes required for in vivo bacterial growth. Thus, the genes required for the growth of M.i.27 and M.i.198 in mouse lungs are enriched individually with those involved in hypoxic pellicle formation in ATCC13950, and with the gene sets found critical for in vitro growth. We have added the description of the reanalyzed GSEA data in the manuscript (pages 16-17, lines 287-290). The details of the reanalyzed GSEA data are shown in Supplementary Fig. 5 and 6 as well as Supplementary Tables 15 and 16.
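
      The rounded percentages above follow directly from the gene counts quoted in the preceding paragraph. A short sketch of the arithmetic, using only those quoted counts (the dictionary keys are descriptive labels added here for readability):

```python
def percent(shared, total):
    """Share of a gene set that overlaps with the in vivo required genes."""
    return 100.0 * shared / total


# (shared genes, gene-set size) pairs as quoted in the response above.
quoted_counts = {
    "pellicle genes, M.i.27, Day 1 to Week 16": (92, 181),
    "pellicle genes, M.i.27, Week 4 to Week 16": (94, 181),
    "pellicle genes, M.i.198, Day 1 to Week 16": (70, 179),
    "pellicle genes, M.i.198, Week 4 to Week 16": (79, 179),
    "increased-requirement genes, M.i.27, Day 1 to Week 16": (54, 128),
    "increased-requirement genes, M.i.27, Week 4 to Week 16": (56, 128),
    "increased-requirement genes, M.i.198, Day 1 to Week 16": (79, 179),
    "increased-requirement genes, M.i.198, Week 4 to Week 16": (68, 179),
}

percentages = {label: percent(*pair) for label, pair in quoted_counts.items()}
for label, value in percentages.items():
    print(f"{label}: {value:.1f}%")
```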

      (4) Validation of mouse infection experiments with individual mutants is missing.

      Thank you for the suggestion on the validation of the TnSeq hit genes on the in vivo survival. We acknowledge the importance of validating the TnSeq-hit genes by constructing knockout mutants. We have recently succeeded in constructing the vectors for making knockout strains of M. intracellulare (Tateishi. Microbiol Immunol. 2024). We will proceed to the infection experiment of knockout mutants by using our system for constructing them.

      [Reference]

      Tateishi, Y. et al. Construction of knockout mutants in Mycobacterium intracellulare ATCC13950 strain using a thermosensitive plasmid containing negative selection marker rpsL. Microbiol Immunol 68, 339-347 (2024).

      (5) Phenotypes with TnSeq and CRISPRi-based KD exhibit poor correlation with misleading justifications by the authors.

      Thank you for the comment on the issue of inconsistent results between TnSeq and CRISPR-i based knockdown. We acknowledge that some inconsistencies were observed, particularly among strain-dependent/accessory essential or growth-defect-associated genes. By contrast, we found consistent data between TnSeq and CRISPR-i based knockdown results among universal essential genes such as glcB, inhA, gyrB and embB. Although the mechanism has not been fully proven yet, we consider that such inconsistent phenotypes between TnSeq and CRISPR-i based knockdown may be related to the recently revealed bypass mechanism of gene essentiality, which is characteristically observed in strain-specific/accessory essential genes (Rosconi F. Nat Microbiol. 2022; new Ref#14). Rosconi et al. suggested this bypass mechanism of gene essentiality in strain-dependent/accessory essential or growth-defect-associated genes from ‘forced-evolution experiments’ on 36 clinical Streptococcus pneumoniae strains. For example, knockout mutants are successfully recovered from transformation experiments targeting genes that are strain-specific/accessory essential in TnSeq, such as cytidine monophosphate kinase cmk, formate tetrahydrofolate ligase fhs and farnesyl-diphosphate synthase fpp. The bypassing of gene essentiality can be inferred by observing suppressor mutations and synthetic lethality in knockout strains. By contrast, universal essential genes fulfill the following three criteria: i) high levels of conservation within and often across species, ii) limited genetic diversity, and iii) high and stable expression levels. Consequently, the universal essential genes are rigid, largely immutable key components of an organism’s survival. For the universal essential genes, knockout recovery fails, as shown by the absence of colonies or the appearance of merodiploids only.

      Taking into consideration this bypass mechanism of gene essentiality in strain-dependent/accessory essential genes, the weaker effect of silencing strain-dependent/accessory essential genes on bacterial growth may reflect pathway rewiring that supports bacterial growth when expression of the target gene is suppressed.

      We have added the description of the possible reason for inconsistency between TnSeq and CRISPR-i results in the Result and Discussion in the revised manuscript (page 21, lines 367-376; pages 28-29, lines 489-519).

      [Reference]

      Rosconi, F. et al. A bacterial pan-genome makes gene essentiality strain-dependent and evolvable. Nat Microbiol 7, 1580–1592 (2022).

      In summary, this study is unable to provide mechanistic insights into why and how different MAC-PD mutant strains exhibit differential survival (in vitro and in animals) and adaptation to hypoxia. It remains to understand why the clinical strains show better adaptation to hypoxia and what is the impact of other stresses on their growth rates.

      Thank you for the comments on the issue of being unable to prove the mechanism of MAC-PD pathogenesis and adaptation to hypoxia. We admit that the original manuscript did not provide a clear reason for, or mechanism of, MAC-PD pathogenesis and adaptation to hypoxia. Following the comment, we have modified the manuscript title to “Functional genomics reveals strain-specific genetic requirements conferring hypoxic growth in Mycobacterium intracellulare”.

      However, we revealed the diversity of genetic requirements within the species M. intracellulare, including the type strain ATCC13950 and clinical MAC-PD strains. We characterized the genetic requirements in clinical MAC-PD strains as increased requirements for gluconeogenesis, the type VII secretion system and cysteine desulfurase, the former two of which are also required for hypoxic pellicle formation in ATCC13950. Along with this, we demonstrated the difference in growth behavior under hypoxia between clinical MAC-PD strains and ATCC13950. Overall, we consider that we have provided basic information suggesting that differences in genetic requirements among strains are involved in the pathogenesis of MAC-PD.

      Reviewer #3 (Public Review):

      Summary:

      The study by Tateishi et al. utilized TnSeq in nine genetically diverse M. intracellulare strains, identifying 131 common essential and growth-defect-associated genes across those strains, which could serve as potential drug targets. The authors also provided an overview of the differences in gene essentiality required for hypoxic growth between the reference strain and the clinical strains. Furthermore, they validated the universal and accessory/strain-dependent essential genes by knocking down their expression using the CRISPRi technique. Overall, this study offers a comprehensive assessment of gene requirements in different clinical strains of M. intracellulare.

      Thank you for reviewing our manuscript and finding the significance of our data.

      (1) The rationale for using ATCC13950 versus clinical strains needs to be clarified. The reference strain ATCC13950 was obtained from the abdominal lymph node of a patient around 10 years ago and is therefore considered a clinical strain that has undergone passages in vitro. How many mutations have accumulated during these in vitro passages? Are these mutations significant enough to cause the behavior of ATCC13950 to differ from other recently sampled clinical strains? From the phylogenetic tree, ATCC13950 is located between M018 and M.i.27. Did the authors observe a similarity in gene essentiality between ATCC13950 and its neighbor strains? What is the key feature that separates ATCC13950 from these clinical strains? The authors should provide a strong rationale for how to interpret the results of this comparison in a clinical or biological context.

      Thank you for the comments on the rationale for using ATCC13950 versus clinical strains and the key feature that separates ATCC13950 from clinical MAC-PD strains.

      ATCC13950 was isolated in 1949 (not around 10 years ago) from the abdominal lymph node of a 34-month-old female patient (Cuttino. Am J Pathol 1949; new Ref#11). Of note, the clinical background of the patient infected with ATCC13950 is quite different from that of patients with MAC-pulmonary disease (MAC-PD), the incidence rate of which has been increasing worldwide in patients without predisposing immunological disorders. ATCC13950 has historically been regarded as the type strain of M. intracellulare, and it was the first M. intracellulare strain to be sequenced, in 2012 (Kim. J Bacteriol 2012; new Ref#59).

      The rationale for using ATCC13950 versus clinical MAC-PD strains is to answer the question of whether the essential genes and genetic requirements are similar or different between clinical MAC-PD strains and the historical type strain ATCC13950. So far, there are two TnSeq reports that compare genetic requirements between clinical mycobacterial strains and the type strains, one on M. tuberculosis (Carey AF. PLoS Pathogens. 2018; new Ref#8) and the other on M. abscessus (Akusobi C. mBio. 2025; new Ref#9, published after submission of our manuscript). They reported the difference and diversity of genetic requirements between clinical strains and type strains such as M. tuberculosis H37Rv and M. abscessus ATCC19977. We have added mention of these previous reports to explain the rationale for setting the type strain ATCC13950 as a referential control strain (page 5, lines 83-87).

      The genetic and functional analysis of clinical MAC-PD strains had not been conducted for a long time. In 2021, we revealed the genomic diversity between clinical MAC-PD strains and ATCC13950 by comparative genomic analysis (Tateishi BMC Microbiol. 2021; new Ref#5). Except for our TnSeq study of ATCC13950 (Tateishi Sci Rep 2020; new Ref#10), no functional analysis has been conducted in clinical M. intracellulare strains. In line with our research on clinical MAC-PD strains, we expected that we could reveal the functional genomic characteristics of clinical MAC-PD strains by setting ATCC13950 as a referential control strain for analyzing TnSeq data.

      It is an interesting viewpoint to consider the relationship between the accumulation of mutations during prolonged in vitro passage of ATCC13950 since its first isolation and the phenotypic differences between ATCC13950 and recently sampled clinical MAC-PD strains. However, there are no time-course samples of ATCC13950 isolates available. Therefore, we can neither investigate how many mutations have accumulated over time, nor evaluate how much the accumulated mutations influence the phenotype of ATCC13950. The accumulation of in vitro mutations may be expected to cause the behavior of ATCC13950 to differ from that of clinical MAC-PD strains. However, it remains to be elucidated which specific factors contribute to the characteristics of ATCC13950 that differ from those of clinical MAC-PD strains.

      It is also an interesting viewpoint to investigate the similarity of gene essentiality between genetically neighboring strains. However, we focused on an overview of the profiles of gene essentiality in clinical MAC-PD strains compared to ATCC13950. Thus, it was out of scope to elucidate the details of gene essentiality in each phylogenetic clade to which the clinical MAC-PD strains belong. For an overview of the phylogenetic tree, please refer to our previous publication on the comparative genomic analysis of 55 strains (Tateishi Y. BMC Microbiol. 2021; new Ref#5, new Supplementary Fig. 1); Fig. 1 shows the extracted phylogenetic tree of the subject strains. To elucidate the details of gene essentiality in each genetic clade, it would be necessary to include a considerable number of the strains that we used for comparative genomic analysis in 2021 (Tateishi Y. BMC Microbiol. 2021; new Ref#5). Furthermore, it would be necessary to set a referential control strain other than ATCC13950 for comparing gene essentiality. At present, elucidating the similarity of gene essentiality between phylogenetic clades in detail is not our highest priority, and such an investigation will be planned as a future study.

      The key features that separate ATCC13950 and clinical MAC-PD strains have not been proved yet, in contrast to the case of M. tuberculosis such as mutations in the gene of the response regulator PhoPR in the type strain H37Rv vs most clinical strains. However, the features that separate ATCC13950 and clinical MAC-PD strains may not be explained by a single genetic factor but may be explained by complicated factors such as epigenetic and/or regulatory factors. For example, the reason for the weakened virulence of H37Ra compared to H37Rv has not been able to be explained by simple genetic differences (Brosch R. Infect Immun. 1999).

      In summary, we set the historical type strain ATCC13950 which is derived from infant abdominal lymphadenitis as a referential control strain for TnSeq analysis, because we intended to reveal the characteristics of clinical MAC-PD strains in terms of the gene essentiality and genetic requirements by comparing the clinical MAC-PD strains with the non-MAC-PD reference strain. We consider that the profiles of gene essentiality and genetic requirements specific to clinical MAC-PD strains confer the pathogenesis in an increasing number of MAC-PD patients worldwide without predisposing immunological disorders.

      [References]

      Cuttino, J.T. & Mc, C.A. Pure granulomatous nocardiosis, a new fungus disease distinguished by intracellular parasitism; a description of a new disease in man due to a hitherto undescribed organism, Nocardia intracellularis, n. sp., including a study of the biologic and pathogenic properties of this species. Am J Pathol 25, 1-47 (1949).

      Kim, B.J. et al. Complete genome sequence of Mycobacterium intracellulare clinical strain MOTT-64, belonging to the INT1 genotype. J Bacteriol 194, 3268 (2012).

      Carey, A.F. et al. TnSeq of Mycobacterium tuberculosis clinical isolates reveals strain-specific antibiotic liabilities. PLoS Pathog 14, e1006939 (2018).

      Akusobi, C. et al. Transposon-sequencing across multiple Mycobacterium abscessus isolates reveals significant functional genomic diversity among strains. mBio 6, e0337624 (2025).

      Tateishi, Y. et al. Comparative genomic analysis of Mycobacterium intracellulare: implications for clinical taxonomic classification in pulmonary Mycobacterium avium-intracellulare complex disease. BMC Microbiol 21, 103 (2021).

      Tateishi, Y. et al. Genome-wide identification of essential genes in Mycobacterium intracellulare by transposon sequencing - Implication for metabolic remodeling. Sci Rep 10, 5449 (2020).

      Brosch, R. et al. Genomic analysis reveals variation between Mycobacterium tuberculosis H37Rv and the attenuated M. tuberculosis H37Ra strain. Infect Immun 67, 5768-5774 (1999).

      (2) Regarding the 'nine representative strains of M. intracellulare with diverse genotypes in this study,' how were these nine strains selected? To what extent do they represent the genetic diversity of the M. intracellulare population? A phylogenetic tree illustrating the global genetic diversity of the M. intracellulare population, with these strains marked on it, would be important to demonstrate their genetic representativeness.

      Thank you for the comments on the selection of 9 subject strains. We selected the 9 strains based on the phylogenetic tree we published in 2021 (BMC Microbiol 2021; new Ref#5). We have shown the global phylogenetic tree of the M. intracellulare population in new supplementary Fig. 1. We have selected 4 or 5 strains from the major two groups (typical M. intracellulare group and M. paraintracellulare-M. indicus pranii group) for this TnSeq study, respectively.

      [Reference]

      Tateishi, Y. et al. Comparative genomic analysis of Mycobacterium intracellulare: implications for clinical taxonomic classification in pulmonary Mycobacterium avium-intracellulare complex disease. BMC Microbiol 21, 103 (2021).

      (3) The authors observed a considerable amount of differential gene requirements in clinical strains. However, the genetic underpinning underlying the differential requirement of genes in clinical strains was not investigated or discussed. Because M. intracellulare has a huge number of accessory genes, the authors should at least check whether the differential requirement could be explained by the existence of a second copy of functional analogous genes or duplications.

      Thank you for the comments on the effect of gene duplication on the change of genetic requirements between strains. Following the comments, we conducted a BLAST search for the 162 genes showing increased Tn insertion reads in each subject strain. We found that M019 has duplicate copies of OCU_RS44705, which encodes adenosylhomocysteinase (LOCUS_42940: ahcY_1, LOCUS_21000: ahcY_2). However, no duplicates were found for the remaining 161 genes showing increased Tn insertion reads.

      From these results, we consider that gene duplication has minor effects on the change of genetic requirements between strains. Rather, sequence differences and accessory genes may play a key role in determining the difference of genetic requirements.

      We have added a description of the above-mentioned result to the Results section (pages 11-12, lines 191-199).

      (4) Growth in aerobic and hypoxic conditions: The authors concluded that clinical strains are better adapted to hypoxia, as reflected by their earlier entry into the log phase. They presented the 'Time at midpoint' and 'Growth rate at midpoint.' However, after reviewing the growth curves, I noticed that ATCC13950 had a longer lag phase compared to other strains under hypoxic conditions, and its phylogenetic neighbor M018 also had a longer lag phase. Hence, I do not believe a conclusion can be drawn that clinical strains are better adapted to hypoxia, as this behavior could be specific to a particular clade. It's also possible that the ATCC13950 strain has adapted to aerobic growth. I would suggest that the authors include growth curves in the main figures. The difference in 'Time at midpoint' could be attributed to several factors, and visualizing the growth curves would provide additional context and clarity.

      Thank you for the comments on the possibility of genotypes as a determinant of growth pattern in M. intracellulare. Following the comments, we performed aerobic and hypoxic growth assays in the two strains (M005 and M016) that neighbor ATCC13950.

      Author response image 1.

      The phylogenetic relationship between M005, M016 and ATCC13950. The former two strains are squared in blue.

      M005 reached the midpoint later than ATCC13950 under both aerobic and hypoxic conditions. By contrast, M016 reached the midpoint three quarters earlier than ATCC13950 under hypoxic conditions. The growth rate did not differ significantly among M005, M016 and ATCC13950 under either aerobic or hypoxic conditions, although the P-value for M005 vs ATCC13950 was 0.0512 under aerobic conditions by Steel’s multiple comparisons test.

      From the data of growth pattern in M005 and M016, we suggest that in addition to gene essentiality, genotypes may have some impact on the bacterial growth pattern under hypoxia; however, since there was a significant difference in the timing of hypoxic adaptation among ATCC13950 and its neighbor strains, bacterial growth pattern under hypoxia is considered to be determined by multiple factors such as genetic requirements and unproven regulatory systems. Taking into consideration that there are lots of genetically diverse strains other than ATCC13950 clade, many clinical MAC-PD strains are possibly better adapted to hypoxia.

      In response to the reviewer’s recommendation, we have added a description of the above-mentioned result to the revised manuscript (page 18, lines 313-322). We now show the growth curves of the original 9 subject strains in the new Fig. 7 and those of M005 and M016 in the new Supplementary Fig. 7. Additionally, we have corrected the y-axis label in new Fig. 7a and new Supplementary Fig. 7a and added the sentence “Data are represented as CFUs in 4 μl sample at each timepoint.” to the figure legends (page 58, lines 1027-1028 and Supplementary Fig. 7a legend).

      (5) Lack of statistical statement: The authors emphasized the role of pellicle-formation-associated genes in strain-dependent essential and accessory essential genes. Additionally, the authors observed that 10% of the genes required for mouse infection are also required for hypoxic pellicle formation. However, these are merely descriptive statements. There is no enrichment analysis to justify whether pellicle-formation-associated genes are significantly enriched in these groups.

      Thank you for the comments on the enrichment of pellicle-formation associated genes in strain-dependent essential and accessory essential genes. We performed GSEA and found that 9.1% (16 out of 175) genes were hit as core enrichment. Of them, 4 genes were hit commonly as genes showing increased genetic requirements analyzed by resampling plus HMM analyses including genes of phosphoenolpyruvate carboxykinase pckA (OCU_RS48660), type VII secretion-associated serine protease mycP5 (OCU_RS38275), type VII secretion protein eccC5 (OCU_RS38345) and glycine cleavage system amino-methyltransferase gcvT (OCU_RS35955).

      Author response image 2.

      We have added the description of GSEA result in the revised manuscript (page 8, lines 137-144; Supplementary Fig. 2; Supplementary Table 5).

      Reviewer #1 (Recommendations For The Authors):

      Tn-seq and hypoxia adaption in clinical isolates of M. intracellulare (Mi): The authors claim that clinical strains are better adapted to hypoxia because their genetic requirements for optimum fitness overlap with genetic requirements for fitness of the type strain under hypoxia. This is a reasonable hypothesis, but it has not been well-supported by the data presented in Figure 7. The growth rates (Figure 7b) of most of the clinical strains under hypoxia appear to be less than the type strain, although they all seem to grow better than the type strain under normoxia. Perhaps a continuous growth curve of each strain, both as pure and mixed cultures under these conditions will provide a clearer picture.

      Thank you for the comments on the difference of adaptation of hypoxic growth between ATCC13950 and MAC-PD strains. To clarify, growth rates shown in Figure 7 were calculated at the inflection point (midpoint) of the growth curves, which were modeled using a four-parameter logistic (4P logistic) model. As described in the Discussion, we found the pattern of hypoxic adaptation characteristics of the clinical MAC-PD strains from the growth curve forms. Taking into consideration the impact of growing bacteria on the disease progression of MAC-PD, the slow growth with early entry to log phase implicates the continuous impact on the infected hosts during logarithmic bacterial growth, which may be involved in the persistent and steadily progressive illness of MAC-PD for years in humans.

      Unlike time-lapse imaging assay, the completely seamless sampling of cultures for CFU assay is impossible. Nevertheless, we collected sufficient timepoints to allow reliable curve fitting with the 4P logistic model, and thus consider our growth data to represent a valid approximation of continuous growth dynamics.

      Regarding the suggestion of mixed culture experiments, we agree that such studies could be informative. However, co-culture conditions introduce additional variables, including inter-strain competition or synergy, which can obscure the specific contributions of hypoxic adaptation in each strain. Therefore, we believe that the current approach using monoculture growth curves under defined oxygen conditions offers a clearer interpretation of strain-specific hypoxic responses.

      In vivo studies: It is unclear how virulent the two clinical strains, Mi27 and Mi198 are in the mouse model. The CFU data in Figure S1b reports the bacterial burden of the Tn libraries of the two strains, of which the overall population of Mi27 library seems to be declining. Without any information on the CFU, animal survival, and tissue pathology from the pure strains, data from the library will have limited implications.

      Thank you for the comments on the data presentation of in vivo studies. We previously revealed the time-course data on CFUs, animal survival, and tissue pathology from the pure strains (Tateishi Y. BMC Microbiol. 2023; new Ref#22). Based on these data, we assumed that we would be able to harvest sufficient number of colonies from mice infected with M.i.27 or M.i.198, and we performed in vivo TnSeq studies using these two strains. We have referred to our previous publication on the virulence of MAC-PD pure strains used in this study for mice in the revised manuscript (page 12, line 212; new Ref #22).

      The CFU counts are shown in new Supplementary Figure 3b. In the manuscript text, we explain this as follows (page 12, lines 212-216): “The time course of the changes in the bacterial burden showed a pattern similar to those of the wild-type strains M.i.198 and M.i.27, respectively (Tateishi Y. BMC Microbiol. 2023; new Ref#22), except that it was not possible to harvest sufficient colonies (as few as 104/mouse) in the few mice infected with the M.i.27 Tn mutant strain in week 8 and week 16 (new Supplementary Fig. 3b, new Supplementary Table 8)”. The decline in the overall population of the M.i.27 Tn mutant library in the infected lungs can be explained by the lower virulence of the M.i.27 pure strain, which shows an intermediate virulence phenotype, compared with M.i.198, which shows a high virulence phenotype.

      [References]

      Tateishi, Y. et al. Virulence of Mycobacterium intracellulare clinical strains in a mouse model of lung infection - role of neutrophilic inflammation in disease severity. BMC Microbiol 23, 94 (2023).

      Data presentation: The manuscript suffers from a lack of clarity in data visualization and presentation, especially the Tn-Seq datasets. Panels describe the experimental workflow with a densely-worded paragraph, making it difficult to navigate through the major findings.

      Thank you for the comments on the issue of Fig. 1b. Following the suggestion, we have modified the new Fig. 1b entitled “Strategy of the procedure of TnSeq analyses”.

      Language: The paper should be extensively revised for language. Often the authors have mixed-up the terms like 'core' and 'accessory' 'genes' in lines 116-119 with 'core and accessory genomes' in Figure 2c, which is not even mentioned in the paper. It is further unclear how they identified 3153 and 5824 core and accessory genes, respectively, from 55 different strains of Mi. Line 251: ..."involved by confer..." needs revision. The terms "increased gene essentiality" and 'growth-defected associated genes" are very confusing. The essentiality of a gene is either absolute or conditional but is not quantitative. Similarly, 'growth-defect associated' can be replaced with a better phrase that alludes to fitness loss in the clone. Additional typos were found throughout the paper that need to be fixed.

      Thank you for the comments on the scientific terms used in this study, including “core and accessory genomes” and “gene essentiality”.

      Based on Rosconi’s paper (Panel C of Fig. 1 in Nat Microbiol. 2022; new Ref#14), we used the phrases “accessory genome” and “core genome” to mean the whole sets of accessory and core genes, respectively. To avoid confusion and maintain consistency, we replaced the term “genomes” with “genes” in the revised manuscript.

      In our previous comparative genomic analysis, we identified 3153 and 5824 core and accessory genes, respectively, from 55 different strains of M. intracellulare (Tateishi Y. BMC Microbiol. 2021; new Ref #5). To perform the pangenomic analysis, we used the Bacterial Pan Genome Analysis tool (BPGA) (Narendrakumar NM. Sci Rep 2016).

      We agree that gene essentiality is a qualitative rather than a quantitative trait. We have replaced the term "increased gene essentiality" with "increased genetic requirements" throughout the manuscript.

      We have used the term “growth-defect (GD)” based on the classification of gene essentiality calculated by the Hidden Markov Model (HMM) implemented in the TRANSIT software (DeJesus. PLoS Comput Biol. 2015; new Ref#12). The HMM classifies genes as essential (ES), GD, non-essential (NE) or growth-advantage (GA). GD indicates impaired growth (growth deficiency) under aerobic conditions in vitro, because Tn insertions are less frequent. The suggested phrases “fitness loss” or “less fit” may imply a comparison between two different conditions, such as different culture conditions applied to a single bacterial strain. Since the HMM analysis is performed on data from a single strain under a single condition, we consider that a phrase including “fitness” is somewhat unsuitable for describing the classification of gene essentiality. Thus, it is difficult for us to rephrase GD with a term that implies a comparison of fitness levels between two conditions in a single bacterial strain.

      [References]

      Rosconi, F. et al. A bacterial pan-genome makes gene essentiality strain-dependent and evolvable. Nat Microbiol 7, 1580–1592 (2022).

      Tateishi, Y. et al. Comparative genomic analysis of Mycobacterium intracellulare: implications for clinical taxonomic classification in pulmonary Mycobacterium avium-intracellulare complex disease. BMC Microbiol 21, 103 (2021).

      Narendrakumar, N.M. et al. BPGA - an ultra-fast pan-genome analysis pipeline. Sci Rep 6, 24373 (2016).

      DeJesus, M.A. et al. TRANSIT--A Software Tool for Himar1 TnSeq Analysis. PLoS Comput Biol 11, e1004401 (2015).

      Reviewer #2 (Recommendations For The Authors):

      Major Comments:

      (1) Result 1 (Page 6-7): Common essential and growth-defect-associated genes representing the genomic diversity of M. intracellulare strains.

      (1a) From Table S1, it is observed that the numbers of Tn-inserted TA sites significantly vary (p >0.05) among biological replicates for each strain when compared with the reference strain ATCC13950. The authors should provide an explanation of how they overcame this variation in their analysis.

      Thank you for the comment on the issue of a variable number of Tn-inserted TA sites among biological replicates for each strain of MAC-PD.

      In TRANSIT, we set the replicate option to Sum to combine read counts across replicates. We used Beta-Geometric correction (BGC) to normalize the datasets to fit an “ideal” geometric distribution with a variable probability parameter ρ.

      Following the comment, we have added a description of the options we used for handling the replicate data and for normalization (page 36, lines 640-643).

      (1b) Importantly, saturation in most of the strains is only ~50-60%. In such a case, there will be a high probability that Tn will not hit a nonessential region due to chance instead of selection (See DeJesus et al., mBio, 2017). It has been observed that the sequence pattern (GC)GNTANC(GC) is strongly associated with non-permissiveness. As shown earlier, the authors need to carefully look for the potential non-permissive sites before concluding the fate of a gene. Also, they should acknowledge the potential limitations of their approach due to the suboptimal level of saturation.

      Thank you for the comment on the saturation of Tn mutant libraries. Our method of comparing genetic requirements between strains is the same as that of a previous report that used duplicate Tn mutant libraries of clinical Mtb strains of different genotypes and triplicate Tn mutant libraries of H37Rv to identify increased genetic requirements of clinical Mtb strains (Carey. PLoS Pathog 2018; new Ref#8). Our method is also based on the coauthor’s TnSeq study on H37Rv (Minato Y. mSystems 2019; new Ref#61). Moreover, by combining replicates, the saturation of our Tn mutant libraries became 62-79% as follows: ATCC13950: 67.6%, M001: 72.9%, M003: 63.0%, M018: 62.4%, M019: 74.5%, M.i.27: 76.6%, M.i.198: 68.0%, MOTT64: 77.6%, M021: 79.9%. That is, we calculated gene essentiality from Tn mutant libraries with 62-79% saturation in each strain. The level of saturation of the transposon libraries in our study is similar to that in the very recent TnSeq analysis by Akusobi, where libraries with 52-80% saturation (so-called “high-density” transposon libraries) were used for HMM and resampling analyses (Supplemental Methods Table 1[merged saturation] in Akusobi. mBio. 2025; new Ref#9). The saturation of Tn insertion in individual replicates of our libraries is also comparable to that reported by DeJesus (Table S1 in mBio 2017; new Ref#57). Thus, we consider that our TnSeq method of identifying essential genes and detecting differences in genetic requirements between clinical MAC-PD strains and ATCC13950 is acceptable.
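      Why combining replicates raises saturation can be seen in a small sketch: a TA site counts as hit in the merged library if any replicate has a read there. The code below is an illustration only, not TRANSIT code; the function names and the toy read counts are hypothetical.

```python
def merge_replicates(replicates):
    """Sum read counts site by site across replicate libraries."""
    if not replicates:
        return []
    n_sites = len(replicates[0])
    if any(len(rep) != n_sites for rep in replicates):
        raise ValueError("all replicates must cover the same TA sites")
    return [sum(site_counts) for site_counts in zip(*replicates)]

def saturation(counts):
    """Fraction of TA sites with at least one insertion read."""
    if not counts:
        return 0.0
    return sum(1 for c in counts if c > 0) / len(counts)

# Toy data: read counts at 10 TA sites in two replicates.
rep_a = [0, 5, 0, 0, 2, 0, 0, 1, 0, 0]
rep_b = [3, 0, 0, 0, 4, 0, 7, 0, 0, 0]
merged = merge_replicates([rep_a, rep_b])
```

      In this toy case each replicate covers 3 of 10 sites, while the merged library covers 5 of 10, because the replicates hit partly different sites.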

      As the Reviewer indicates, there is a non-permissive sequence pattern in mariner transposon mutagenesis. Using more than 10 replicates of Tn mutant libraries is quite an accurate method for detecting essential genes among small nonstructural genes such as small regulatory RNAs. However, as DeJesus shows, the number of essential genes identified by TnSeq is comparable between 2 and 14 TnSeq datasets for large genes possessing more than 10 TA sites, most of which seem to be structural genes (Supplementary Fig 2 in mBio 2017; new Ref#57). Thus, we do not consider that we made a serious mistake in classifying the essentiality of most of the structural genes that encode proteins. With respect to the coverage of non-permissive sites, our TnSeq method might need to be improved if the aim is to classify gene essentiality accurately for small genes, including small regulatory RNAs.

      We investigated the non-permissive TA sites in ATCC13950. There are 4136 (6.43% of total ORFs) nonpermissive TA sites in ATCC13950, which is less than in H37Rv (9% of total ORFs) (DeJesus MA. mBio 2017; new Ref#57) and in M. abscessus ATCC19977 (8.1% of total ORFs) (Rifat D. mBio. 2021; new Ref#58). As for larger ORFs (TA sites ≥ 10), there are nonpermissive TA sites in 89 genes (ORFs) of common “essential (ES)” or “growth-defect-associated (GD)” (4.82% of a total of 1844 larger ORFs in ATCC13950). As for small ORFs (2-9 TA sites), there are nonpermissive TA sites in 41 genes (ORFs) of common ES or GD (1.35% of a total of 3021 smaller ORFs in ATCC13950).
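      A scan for the non-permissive pattern (GC)GNTANC(GC) quoted by the reviewer can be sketched as follows. This is a minimal illustration on a made-up sequence, not the procedure used in the study; the function names are hypothetical. Because the pattern equals its own reverse complement, scanning one strand is sufficient.

```python
import re

# (GC)GNTANC(GC) as a regular expression; the lookahead lets
# overlapping occurrences be reported as well.
NONPERMISSIVE = re.compile(r"(?=[GC]G[ACGT]TA[ACGT]C[GC])")
TA_OFFSET = 3  # position of the TA dinucleotide inside the 8-bp motif

def ta_sites(seq):
    """0-based positions of every TA dinucleotide."""
    seq = seq.upper()
    return [i for i in range(len(seq) - 1) if seq[i:i + 2] == "TA"]

def nonpermissive_ta_sites(seq):
    """0-based positions of TA sites embedded in the motif."""
    seq = seq.upper()
    return [m.start() + TA_OFFSET for m in NONPERMISSIVE.finditer(seq)]

# Made-up sequence with two TA sites, one inside the motif CGATAGCG.
example = "TTCGATAGCGTTTAGG"
all_sites = ta_sites(example)
blocked = nonpermissive_ta_sites(example)
fraction_blocked = len(blocked) / len(all_sites)
```

      Applied per ORF, the same two lists would give the number of TA sites and the number of non-permissive TA sites for each gene, which is the kind of tally reported above for larger and smaller ORFs.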

      We appreciate the idea of determining gene essentiality from the presence or absence of non-permissive TA sites. However, we cannot settle the essentiality classification solely on the basis of potential non-permissive sites because, strictly speaking, gene essentiality cannot be established without functional analysis using gene manipulation. To be accurate, TnSeq can “predict” gene essentiality but cannot guarantee functional significance. In practice, however, most recent TnSeq studies have been published on the basis of TnSeq analysis alone, without functional analysis using gene manipulation strains for all identified targets. Taking these limitations of TnSeq, including non-permissive sites, into consideration, we consider that the essentiality of the detected genes should be determined in further studies, mainly through biological experiments such as functional studies using gene manipulation strains.

      We have added the above-mentioned contents in the revised manuscript (pages 32-33, lines 559-580).

      [References]

      Carey, A.F. et al. TnSeq of Mycobacterium tuberculosis clinical isolates reveals strain-specific antibiotic liabilities. PLoS Pathog 14, e1006939 (2018).

      Minato, Y. et al. Genomewide assessment of Mycobacterium tuberculosis conditionally essential metabolic pathways. mSystems 4, e00070-19 (2019).

      Akusobi, C. et al. Transposon-sequencing across multiple Mycobacterium abscessus isolates reveals significant functional genomic diversity among strains. mBio 6, e0337624 (2025).

      DeJesus, M.A. et al. Comprehensive essentiality analysis of the Mycobacterium tuberculosis genome via saturating transposon mutagenesis. mBio 8, e02133-16 (2017).

      Rifat, D., Chen, L., Kreiswirth, B.N. & Nuermberger, E.L. Genome-wide essentiality analysis of Mycobacterium abscessus by saturated transposon mutagenesis and deep sequencing. mBio 12, e0104921 (2021).

      (1c) Line 100: Authors report a total of 131 genes identified as essential or growth-defect-associated with the HMM analysis across all M. intracellulare strains. It should be explained in more detail how gene essentiality was determined (see above comment in (1b)). Furthermore, in Table S3 authors should mention the essential and growth defective trait of each of the 131 genes.

      Thank you for the comment on how the 131 genes were classified as essential or growth-defect-associated by the HMM analysis across all M. intracellulare strains. As replied in (1b), the saturation of Tn insertion of our libraries became 62-79% when combining duplicate or triplicate data in each strain. The level of saturation of the transposon libraries in our study is similar to that in the very recent TnSeq analysis by Akusobi, where libraries with 52-80% saturation (so-called “high-density” transposon libraries) were used for HMM and resampling analyses, and most of the triplicate libraries ranged from 70 to 79% saturation (Supplemental Methods Table 1[merged saturation] in Akusobi. mBio. 2025; new Ref#9). The saturation of Tn insertion in individual replicates of our libraries is also comparable to that reported by DeJesus (Table S1 in mBio 2017; new Ref#57). Thus, we consider that our TnSeq libraries are acceptable for identifying essential genes and growth-defect-associated genes by the HMM method.

      We used the HMM method as reported by DeJesus (DeJesus. PLoS Comput Biol. 2015; new Ref#12). The HMM method categorizes gene essentiality throughout the genome into four classes: “Essential”, “Growth-defect”, “Non-essential” and “Growth-advantage”. “Essential” genes are defined as having no insertions in all or most of their TA sites. “Non-essential” genes are defined as regions that have usual read counts. “Growth-defect” genes are defined as regions that have unusually low read counts. “Growth-advantage” genes are defined as regions that have unusually high read counts.
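      The four-state idea can be illustrated with a toy hidden Markov model decoded by the Viterbi algorithm. This is a sketch under loudly stated assumptions, not TRANSIT's implementation: the geometric emission model, the per-state mean read counts and the transition probability below are illustrative values chosen for the example.

```python
import math

STATES = ["ES", "GD", "NE", "GA"]
# Assumed mean read count per TA site in each state (illustrative only).
MEAN_READS = {"ES": 0.01, "GD": 10.0, "NE": 100.0, "GA": 500.0}
STAY_PROB = 0.9  # assumed probability of keeping the same state

def log_emission(count, mean):
    """Log-probability of a read count under a geometric distribution."""
    p = 1.0 / (1.0 + mean)
    return math.log(p) + count * math.log(1.0 - p)

def viterbi_states(counts):
    """Most likely state sequence for read counts at consecutive TA sites."""
    if not counts:
        return []
    n = len(STATES)
    log_stay = math.log(STAY_PROB)
    log_switch = math.log((1.0 - STAY_PROB) / (n - 1))
    score = [math.log(1.0 / n) + log_emission(counts[0], MEAN_READS[s])
             for s in STATES]
    backpointers = []
    for count in counts[1:]:
        new_score, pointers = [], []
        for j, state in enumerate(STATES):
            candidates = [score[i] + (log_stay if i == j else log_switch)
                          for i in range(n)]
            best_prev = max(range(n), key=candidates.__getitem__)
            pointers.append(best_prev)
            new_score.append(candidates[best_prev]
                             + log_emission(count, MEAN_READS[state]))
        score = new_score
        backpointers.append(pointers)
    # Trace back from the best final state.
    best = max(range(n), key=score.__getitem__)
    path = [best]
    for pointers in reversed(backpointers):
        best = pointers[best]
        path.append(best)
    path.reverse()
    return [STATES[i] for i in path]

# Toy read counts: empty, low, typical and high-coverage regions.
toy_counts = [0] * 10 + [10] * 10 + [100] * 10 + [500] * 10
toy_calls = viterbi_states(toy_counts)
```

      The transition penalty is what makes the calls regional rather than site-by-site: a single empty TA site inside a well-covered region is not enough to flip the state, whereas a sustained run of empty or low-count sites is.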

      Following the previous report (Carey AF. PLoS Pathog 2018; new Ref#8), the annotation for the clinical MAC-PD strains was adapted from that of ATCC13950 by adjusting the START and END coordinates of each ORF in the clinical MAC-PD strains according to their alignment with the corresponding ORFs of ATCC13950. Using the adjusted annotation table, gene essentiality was classified by the HMM analysis.

      We have added the explanation of how we identified essential and growth-defect-associated genes in the Methods (pages 35-36, lines 620-632). And following the comment, we have added the data of classification of gene essentiality in the 131 genes in the new Supplementary Table 3 in the revised manuscript.

      [References]

      DeJesus, M.A. et al. TRANSIT--A Software Tool for Himar1 TnSeq Analysis. PLoS Comput Biol 11, e1004401 (2015).

      Carey, A.F. et al. TnSeq of Mycobacterium tuberculosis clinical isolates reveals strain-specific antibiotic liabilities. PLoS Pathog 14, e1006939 (2018).

      Akusobi, C. et al. Transposon-sequencing across multiple Mycobacterium abscessus isolates reveals significant functional genomic diversity among strains. mBio 6, e0337624 (2025).

      DeJesus, M.A. et al. Comprehensive essentiality analysis of the Mycobacterium tuberculosis genome via saturating transposon mutagenesis. mBio 8, e02133-16 (2017).

      (1d) In Table S4, the authors show strain-specific putative essential genes from the core and accessory gene sets. For the sake of clarity, it is important to have the name of all the strains against each gene in which it is predicted essential or growth defective.

      Thank you for the comment on the hit strains for the genes classified as strain-specific and accessory putative essential or growth-defect-associated. Following the comment, we have added the data on hit strains to the new Supplementary Table 4 in the revised manuscript.

      (1e) Lines 123-126: It is not clear what is the relevance of highlighting genes involved in hypoxic pellicle formation in ATCC13950. These appear to be randomly distributed across different clinical isolates and is not clear whether they correlate with differential susceptibility of the reference strain and clinical isolates to hypoxia.

      Thank you for the comment on the relevance of highlighting genes involved in hypoxic pellicle formation in ATCC13950. The rationale for the importance of hypoxic pellicle genes in clinical MAC-PD strains is that the profiles of genetic requirements in each bacterial strain reflect the adaptation to the environment in which each strain lives. When the strains are placed in a special environment, they can adapt to the situation by altering the profiles of genetic requirements, resulting in the remodeling of metabolic pathways. We indeed found that the genetic requirements of several hypoxic pellicle genes were increased in clinical MAC-PD strains in vitro situations. These data suggest the hypoxic pellicle genes become more important in clinical MAC-PD strains for in vitro growth than in ATCC13950.

      Moreover, hypoxia is known to be one of the characteristic conditions in vivo including clinical lesions (McKeown. Br J Radiol. 2014). We consider it reasonable to expect that the strains derived from MAC-PD patients without predisposing immunological disorders may adapt under hypoxic conditions for maintaining bacterial survival. Therefore, we highlighted the genes involved in hypoxic pellicle formation in ATCC13950.

      We have added the description of the rationale for the importance of hypoxic pellicle genes in clinical MAC-PD strains in the revised manuscript (page 9, lines 148-155).

      [Reference]

      McKeown, S.R. et al. Defining normoxia, physoxia and hypoxia in tumours-implications for treatment response. Br J Radiol 87, 20130676 (2014).

      (2) Result 2 (pages 8-10): Genes with increased gene essentiality in clinical MAC-PD strains are also required for hypoxic pellicle formation in the type strain.

      (2a) As reported by authors (lines 123-126), only a small fraction of genes showing essentiality in clinical MAC-PD strains are required for hypoxic pellicle formation in the reference strain, which might be due to random distribution. Authors should avoid making such a generalised statement that reflects the association of the entire essential gene pool in clinical MAC-PD strains with hypoxic pellicle formation.

      Thank you for the comment on the issue of a small fraction of genes showing increased genetic requirements in clinical MAC-PD strains that is shared with genes required for hypoxic pellicle formation in the type strain ATCC13950. We admit that the section title may mislead that the genes required for hypoxic pellicle formation confer the entire essential gene pool of clinical MAC-PD strains. Following the comment, we have revised the section title as “Partial overlap of the genes showing increased genetic requirements in clinical MAC-PD strains with those required for hypoxic pellicle formation in ATCC13950” (page 9, lines 146-147).

      We consider that it cannot be explained by a mere coincidence that we obtained the data of partial overlap of genes showing essentiality in clinical MAC-PD strains with genes required for hypoxic pellicle formation in ATCC13950, because we demonstrated the supporting data such as the pattern of genetic requirements suggesting gluconeogenic metabolic shift (Fig. 5) and the different pattern of hypoxic growth curves between clinical MAC-PD strains and ATCC13950 (Fig. 7).

      (2b) I fail to understand how the number of Tn insertions determines "more" or "less" essentiality of a gene particularly with 50-60% saturation. To my understanding, essentiality is a qualitative trait. Either a gene will be essential (based on no Tn insertion despite having the permissive sites), critical (poor representation of Tn insertions at the permissive sites due to growth defect of the strain in the pool), non-essential (expected frequency of insertion) or growth-advantageous (higher representation of Tn insertions at the permissive sites due to growth advantage of the strain in the pool). Hence, authors should avoid quantifying the essentiality of a gene.

      Thank you for the comments on the trait of gene essentiality. We recognize that essentiality is a qualitative trait, not a quantitative trait. Considering that the number of Tn insertions indicates a "more" or "less" stringent requirement for a gene, we have corrected the manuscript by using the phrase “genetic requirements” instead of “gene essentiality”.

      As mentioned earlier, our method of comparing genetic requirements between strains is the same as that of a previous report that used duplicate Tn mutant libraries of clinical Mtb strains of different genotypes and triplicate Tn mutant libraries of H37Rv to identify increased genetic requirements of clinical Mtb strains (Carey AF. PLoS Pathog 2018; new Ref#8). Moreover, as described in rebuttal (1b), the saturation of our Tn mutant libraries after combining replicates is 62-79% as follows: ATCC13950: 67.6%, M001: 72.9%, M003: 63.0%, M018: 62.4%, M019: 74.5%, M.i.27: 76.6%, M.i.198: 68.0%, MOTT64: 77.6%, M021: 79.9%. That is, we calculated gene essentiality from Tn mutant libraries with 62-79% saturation in each strain. The level of saturation of the transposon libraries in our study is similar to that in the recent TnSeq analysis by Akusobi, where libraries with 52-80% saturation (“high-density” transposon libraries) were used for HMM and resampling analyses (Supplemental Methods Table 1[merged saturation] in Akusobi C. mBio. 2025; new Ref#9).

      Thus, we consider that our data of the difference of genetic requirements between clinical MAC-PD strains and ATCC13950 are acceptable.

      [Reference]

      Akusobi, C. et al. Transposon-sequencing across multiple Mycobacterium abscessus isolates reveals significant functional genomic diversity among strains. mBio 6, e0337624 (2025).

      (2c) From Figures 3-4, it seems the authors intend to highlight the insertion frequencies of certain genes in the clinical isolates compared to those in the reference strain to conclude whether a gene has become more critical and its disruption results in the growth defective phenotype (poor representation) in the clinical isolates, or a critical/essential gene has become dispensable in these strains.

      Based on these arguments, I suggest that the authors modify the title of the result such as "Tn insertion reveals differential requirement of genes for in vitro growth of clinical MAC-PD strains" or "Identification of genes differentially required for in vitro growth of clinical MAC-PD strains" as this is precisely the information we gain from this section of the study. Also, it is suggested to re-draft the rationale of this section as only 4 genes associated with hypoxic pellicle formation, were found to exhibit reduced insertion frequencies in the clinical isolates out of total of 283 genes. Hypoxia-related genes can be highlighted in the next section (see below).

      Thank you for the suggestion to modify the section title and to re-draft the rationale of the section. Following the comment, we modified the section title to “Partial overlap of the genes showing increased genetic requirements in clinical MAC-PD strains with those required for hypoxic pellicle formation in ATCC13950” (page 9, lines 146-147).

      Following the suggestion, we have revised the rationale of this section as follows: “The sharing of strain-dependent and accessory essential or growth-defect-associated genes with genes required for hypoxic pellicle formation in ATCC13950 prompts us to consider that the profiles of gene essentiality in clinical MAC-PD strains may be associated with the genes required for hypoxic pellicle formation in ATCC13950.” (page 9, lines 151-155)

      The reviewer points out that only 4 genes associated with hypoxic pellicle formation were found to exhibit reduced insertion frequencies in the clinical isolates out of a total of 283 genes. However, to discuss what proportion of the genes were increasingly required in clinical MAC-PD strains compared to ATCC13950, we should focus on the 121 genes showing increased requirements in clinical MAC-PD strains compared to ATCC13950, excluding the 162 genes indispensable for clinical MAC-PD strains. Thus, we described in the manuscript that 4 genes associated with hypoxic pellicle formation were found to exhibit reduced insertion frequencies in the clinical isolates out of the 121 genes having significantly fewer Tn insertions than ATCC13950 (Fig. 3).

      (3) Result 3 (Page 10-14): Requirement of genes with increased gene essentiality in the clinical MAC-PD strains for mouse lung infection.

      (3a) The title should be modified to "Identification of genes in the clinical MAC-PD strains required for mouse lung infection".

      Following the comment, we have modified the section title as "Identification of genes in the clinical MAC-PD strains required for mouse lung infection". (page 12, lines 201-202).

      (3b) Further, the rationale of this experiment needs to be modified. As mentioned above, up until now the impact of hypoxic pellicle formation genes in the growth of MAC-PD strains remains unconvincing. The rationale of mouse infection experiments could be straightforward- "to identify genes critical for animal infection of the clinical isolates".

      Thank you for the comment on the rationale of the in vivo TnSeq experiment. Following the comment, we have revised the rationale as “The impact of hypoxia on mycobacteria under various ecological circumstances implies that the genes required for pathogenesis of MAC-PD may be in some degrees, overlapped with the genes with increased requirements in the clinical MAC-PD strains compared to ATCC13950, and also with the genes required for hypoxic pellicle formation in ATCC13950. To identify genes required for in vivo infection of clinical MAC-PD strains,” in the revised manuscript (page 12, lines 204-210).

      (3c) The authors should avoid using the term "genes with increased essentiality" for the reasons explained above in point #2b.

      Following the comment, we have corrected the term as “genes with increased requirements” in the revised manuscript (page 12, line 207).

      (3d) From Tables S8 and S9, I can find 93 genes in Mi198Tn and 74 genes in Mi27Tn for which Tn insertion mutants are under-represented in TnSeq at all time points from Day 1 to Wk 16 in comparison to input. Importantly, excluding results from Day 1 when the infection has just settled, I find 172 and 121 genes in Mi198Tn and Mi27Tn, respectively, under-represented in lungs between Wk 4-16. My suggestion is that authors should focus more on such genes and identify the characteristics of these genes and what fraction belongs to those involved in hypoxic pellicle formation in the reference strain. I am perplexed why authors have categorically ignored other genes and only focused on a set of genes that correspond to ~10-12% of entire differentially abundant mutant pool.

      Thank you for the suggestion to analyze whether the genes whose Tn insertion mutants are under-represented in the infected mouse lungs from Week 4 to Week 16 overlap with the genes involved in hypoxic pellicle formation in the type strain ATCC13950. We found that at all timepoints from Day 1 to Week 16, 74 genes and 99 genes were under-represented in lungs infected with M.i.27Tn and M.i.198Tn, respectively. Of them, 21 (28.3%) and 12 (12.1%) genes belonged to the genes required for hypoxic pellicle formation in the type strain. We found that at timepoints from Week 4 to Week 16, 121 genes and 172 genes were under-represented in lungs infected with M.i.27Tn and M.i.198Tn, respectively. Of them, 21 (23.1%) and 30 (18.0%) genes belonged to the genes involved in hypoxic pellicle formation in the type strain. The hypoxic pellicle-associated genes detected in both M.i.27 and M.i.198 at all time points (from Day 1 to Week 16) were related to methionine synthesis, acyl-CoA dehydrogenase, isocitrate lyase, and an MMPL family transporter. Those additionally detected from Week 4 to Week 16 were related to multifunctional oxoglutarate decarboxylase/dehydrogenase, proteasome subunits, ABC transporter ATP-binding protein/permease, and lipase chaperone. We have described these results in the Result section (page 14, lines 236-248) and in new Supplementary Tables 12 and 13.

      As for M. intracellulare, conditionally essential genes have not been revealed except for those required for hypoxic pellicle formation in ATCC13950, which we reported previously (Tateishi Y. Sci Rep. 2020; new Ref#10). This study is the first to focus on the relationship between strain differences in genetic requirements and hypoxic adaptation. We found that a certain proportion of the genes required for mouse lung infection overlapped with those required for hypoxic pellicle formation in ATCC13950. We therefore consider it reasonable to focus on the category of genes required for hypoxic pellicle formation when analyzing the TnSeq datasets from mice.

      [Reference]

      Tateishi, Y. et al. Genome-wide identification of essential genes in Mycobacterium intracellulare by transposon sequencing - Implication for metabolic remodeling. Sci Rep 10, 5449 (2020).

      (3e) Page 13, lines 224-227: "Despite the differences in the profiles of the genes required for in vivo infection between strains, these data suggest that increased gene essentiality for hypoxic growth confers advantages for pathogenesis in vivo."

      For the reason described above, I find it a misleading hypothesis that hypoxic growth confers advantages for pathogenesis in vivo. How come only 10-12% of the entire gene sets which include genes of varying functions, can be the sole contributors to bacterial survival in host organelles during infection?

      More importantly, the mouse is not considered a good model for hypoxia as mouse infection does not lead to the formation of solid granuloma with a hypoxic core Though I am not convinced with the authors' bias toward hypoxia-related genes, however, if at all they aim to investigate the role of such genes by an unbiased enrichment of TnSeq mutant, they should have used C3HeJ mice which are known to form granulomas (Boute et al., 2017 (doi: 10.1186/s13567-017-0477-7)).

      Thank you for the comments on the contribution of genes required for hypoxic growth and on the difference in hypoxic levels between mouse lineages. We did not intend to state that a set of genes required for hypoxic growth is the sole contributor to bacterial survival in host organs during infection. As we discussed in the Discussion section, we acknowledge that adaptation to the difference in carbon source between in vitro and in vivo infection (i.e. preferential usage of lipid carbon sources in vivo) is involved in the pathogenesis of mycobacterial diseases (Yang. Front Microbiol 2018; new Ref#33, Gouzy. Proc Natl Acad Sci U S A 2021; new Ref#29, Quinonez. mBio 2022; new Ref#40, Pandey. Proc Natl Acad Sci U S A 2008; new Ref#41). We consider that not only the genes required for hypoxic pellicle formation but also strain-dependent/accessory genes conferring kinds of metabolism other than hypoxic pellicle formation are likely to be involved in in vivo mouse lung infection.

      We have modified the sentence to clearly express our intention as follows: “These in vivo TnSeq data suggest that, despite the differences in the profiles of the genes required for in vivo infection between strains, increase of genetic requirements for hypoxic growth in part contribute to the pathogenesis in vivo.” (pages 15-16, lines 269-271)

      It seems to be an interesting idea to perform TnSeq by using C3HeJ mice. The granuloma formed in C3HeJ mice becomes extremely hypoxic (less than 1%, corresponding to the level of “pathological” hypoxia), which is as severe as the detection range of pimonidazole. In our model, the effect of such pathological levels of hypoxia on granuloma formation might not be detected. However, the lesion formed in C57BL/6 mice reaches a “physiological” level of hypoxia (5% O2) (McKeown SR. Br J Radiol. 2014), which is the same O2 level at which M. intracellulare forms pellicles. In principle, oxygen levels inside human bodies are physiologically hypoxic, and many biological events are experimentally investigated in this condition. Thus, we consider that we were able to observe the effect of physiological hypoxia on M. intracellulare growth both in vitro (hypoxic pellicles) and in vivo (infected C57BL/6 mice).

      [Reference]

      Yang, T. et al. Pan-genomic study of Mycobacterium tuberculosis reflecting the primary/secondary genes, generality/individuality, and the interconversion through copy number variations. Front Microbiol 9, 1886 (2018).

      Gouzy, A., Healy, C., Black, K.A., Rhee, K.Y. & Ehrt, S. Growth of Mycobacterium tuberculosis at acidic pH depends on lipid assimilation and is accompanied by reduced GAPDH activity. Proc Natl Acad Sci U S A 118, e2024571118 (2021).

      Quinonez, C.G. et al. The role of fatty acid metabolism in drug tolerance of Mycobacterium tuberculosis. mBio 13, e0355921 (2022).

      Pandey, A.K. & Sassetti, C.M. Mycobacterial persistence requires the utilization of host cholesterol. Proc Natl Acad Sci U S A 105, 4376-4380 (2008).

      McKeown, S.R. et al. Defining normoxia, physoxia and hypoxia in tumours-implications for treatment response. Br J Radiol 87, 20130676 (2014).

      (3f) An important set of data with the ATCC13950 reference strain is missing here. It is suggested that authors perform this study with the reference strain to identify whether the enrichment of genes is similar across all strains or is specific to the clinical isolates.

      Thank you for the comment on including ATCC13950 as a control strain in the mouse infection experiment. However, we previously showed that the bacterial burden of ATCC13950 decreases continuously from 4 weeks of infection, and that ATCC13950 is almost completely eliminated from 8 to 16 weeks of infection (Tateishi Y. BMC Microbiol 2023; new Ref#22). Therefore, it is impossible to perform TnSeq to detect the genes required for persistent infection in mice infected with ATCC13950.

      [Reference]

      Tateishi, Y. et al. Virulence of Mycobacterium intracellulare clinical strains in a mouse model of lung infection - role of neutrophilic inflammation in disease severity. BMC Microbiol 23, 94 (2023).

      (3g) Pages 13-14, lines 228-245: "We have performed a statistical enrichment analysis of gene sets by GSEA...".

      The comparison made here is not clear to me. It seems the authors do compare genes required for the growth of M.i.27 and M.i.198 in mouse lungs with the gene sets required for hypoxic pellicle formation in ATCC13950 together with the gene sets showing increased gene essentiality observed in the clinical MAC-PD strains, and claim that a significant % of genes belong to hypoxia-adaptation pathways. It is factually incorrect because a majority of these might overlap with those found critical for the in vitro survival of MAC-PD strains. It is suggested that authors re-analyze their data by comparing genes required for the growth of M.i.27 and M.i.198 in mouse lungs individually with those involved in hypoxic pellicle formation in ATCC13950, and with the gene sets found critical for in vitro growth, and present accordingly.

      Thank you for the suggestion on the re-analysis of gene enrichment analysis of genes required for M.i.27 and M.i.198 in vivo infection, individually with genes involved in hypoxic pellicle formation in ATCC13950 and with those showing genetic requirements in clinical MAC-PD strains compared to ATCC13950.

      About 50% (92 and 94 out of 181 genes through Day 1 to Week 16 and through Week 4 to Week 16 of infection) and 40% (70 and 79 out of 179 genes through Day 1 to Week 16 and through Week 4 to Week 16 of infection) of genes required for hypoxic pellicle formation in ATCC13950 were listed as enriched in genes required for mouse lung infection in M.i.27 and M.i.198, respectively. In addition, about 42% (54 and 56 out of 128 genes through Day 1 to Week 16 and through Week 4 to Week 16 of infection) and 40% (79 and 68 out of 179 genes through Day 1 to Week 16 and through Week 4 to Week 16 of infection) of genes showing increased requirements in clinical MAC-PD strains compared to ATCC13950 were listed as enriched in genes required for mouse lung infection in M.i.27 and M.i.198, respectively.

      The tables and graphs of GSEA results are shown in Supplementary Figs. 5, 6.

      These data indicate that 40-50% of the genes required for in vitro hypoxic pellicle formation and the strain-dependent/accessory essential genes are significantly enriched individually with in vivo bacterial growth. We have added the result of reanalyzed data of GSEA in the Result (pages 16-17, lines 287-290). We have shown the detail of reanalyzed data of GSEA in Supplementary Figs. 5, 6 and Supplementary Tables 15, 16.

      (3h) Since authors have used Tnseq of pooled mutants, which often yields misleading information, it is important to validate some of their findings upon mouse infection with individual mutants that yield prominent as well as baseline reduction at different time points. In the absence of validation, it remains a mere speculation of the role of these genes in the infection of these strains to animals.

      Thank you for the suggestion on the validation of the TnSeq hit genes on the in vivo survival. We acknowledge the importance of validating the TnSeq-hit genes by constructing knockout mutants. We have recently succeeded in constructing the vectors for making knockout strains of M. intracellulare (Tateishi Y. Microbiol Immunol. 2024). We will proceed to the infection experiment of knockout mutants by using our system for constructing them.

      [Reference]

      Tateishi Y. et al. Construction of knockout mutants in Mycobacterium intracellulare ATCC13950 strain using a thermosensitive plasmid containing negative selection marker rpsL+. Microbiol Immunol 68, 339-347 (2024).

      (4) Result 4 (Page 14-15): Preferential hypoxic adaptation of clinical MAC-PD strains evaluated with bacterial growth kinetics.

      (4a) "The metabolic remodeling, such as the increased gene essentiality of gluconeogenesis and the type VII secretion system..". As stated above, the essentiality of a gene, being a qualitative trait, should not be presented in quantitative terms. The authors should re-phrase this statement.

      Following the comment, we have corrected the term as “The metabolic remodeling, such as the increased genetic requirements of gluconeogenesis and the type VII secretion system.” (page 17, lines 296-297)

      (4b) "overlap of the genes required for mouse lung infection and those required for hypoxic pellicle formation involved by conferring these metabolic pathways..". There is a syntax error in this statement and needs revision.

      Following the comment, we have corrected the phrase as “overlap of the genes required for mouse lung infection and those required for hypoxic pellicle formation involved by these metabolic pathways”. (page 17, lines 297-299)

      (4c) The altered requirement of genes in different clinical strains for survival provides only circumstantial evidence of metabolic remodeling. Authors are suggested to perform metabolic profiling of representative clinical and reference strains, as it is important to examine whether these bacteria indeed undergo metabolic shift.

      Thank you for the comment on the metabolic profiling of the representative clinical and reference strains. We previously published the TnSeq result of ATCC13950, and we produced the current data by integrating them with our previous findings (Fig. 4 in Tateishi Y. Sci Rep 2020; new Ref#10). The priority of the current study was to elucidate the difference and diversity of genetic requirements between clinical MAC-PD strains and ATCC13950. We consider that showing even circumstantial evidence of metabolic remodeling by TnSeq is of some value, because it provides a strong rationale for proceeding to the next study, including metabolomic analysis.

      [Reference]

      Tateishi, Y. et al. Genome-wide identification of essential genes in Mycobacterium intracellulare by transposon sequencing - Implication for metabolic remodeling. Sci Rep 10, 5449 (2020).

      (5) Result 5 (Page 16-18): Effects of knockdown of universal and accessory/strain-dependent essential or growth-defect-associate genes in clinical MAC-PD strains.

      (5a) Lines 273-277: The rationale of using CRISPRi should be correctly presented to evaluate the effect of individual genes' suppression on the downstream phenotype and not to establish the CRISPRi silencing tool in MAC.

      Thank you for the comment on the rationale of the section of the CRISPR-i experiment. Following the comment, we have modified the sentence as follows: “With an intention to evaluate the effect of suppressing TnSeq-hit genes on bacterial growth.” (page 19, lines 333-334 in the revised manuscript).

      (5b) Line 278: pRH2052/pRH2521 are the plasmids and not the CRISPRi system.

      Following the comment, we have corrected the phrase as “pRH2052/pRH2521 clustered regularly interspaced short palindromic repeats interference (CRISPR-i) plasmids.” (page 19, lines 334-335 in the revised manuscript).

      (5c) Line 280: Other pioneering studies on the use of CRISPRi for gene silencing in mycobacteria (Chaudhary et al., Nat Comm, Rock et al., Nat Microbio) should also be cited.

      Thank you for the comment on adding the reference papers on CRISPR-i in mycobacteria. We have added the two suggested papers in the revised manuscript as new Ref #30 and #31. (page 19, line 336).

      (5d) Lines 282-283: It is not clear why M001 and MOTT64 could not be transformed. Did the authors use any control plasmid to evaluate the transformation efficiency of these strains?

      Thank you for the comment on the failure of transformation in M001 and MOTT64.

      Following the comment, we have performed an experiment to evaluate the efficiency of transformation in the 9 M. intracellulare strains used in this study. As a control plasmid, we used the E. coli-mycobacteria shuttle vector pSO246KM-Prhsp65-luc, which expresses firefly luciferase (Aoki K. J Biol Chem 2004). To obtain transformed colonies, we used 7H10/OADC agar plates containing the same concentration of kanamycin that we used for preparing Tn mutant libraries and for obtaining CRISPR-i knockdown strains.

      We observed no colonies on agar plates for MOTT64 after electroporation of the pSO246KM-Prhsp65-luc plasmid. In most of the remaining strains, transformed colonies had fully emerged by day 10 of culture after electroporation of the plasmid. However, M001 needed about twice as long for transformed colonies to emerge. By day 21, the number of colonies in M001 had finally become comparable to that of the other strains. We checked the luciferase activity of 6-12 colonies in each strain except for MOTT64, and we confirmed transformation of the plasmid by the higher luciferase activity in colonies that underwent electroporation of the plasmid than in those that did not.

      A possible reason for the failure to obtain transformants of the CRISPR-i vectors in MOTT64 is an extremely low efficiency of acquiring foreign DNA. A possible reason for the failure to obtain transformants of the CRISPR-i vectors in M001 is that M001 tolerates the stress caused by plasmid transformation less well than the other M. intracellulare strains. For M001, the pSO246KM-Prhsp65-luc plasmid may cause a tolerable level of stress, resulting in the delayed emergence of transformed colonies. By contrast, the CRISPR-i plasmids may cause greater stress for M001 than the pSO246KM-Prhsp65-luc plasmid, a level that M001 cannot tolerate.

      Author response table 1.

      Author response image 3.

      Luciferase activities before and after transformation of the pSO246KM-Prhsp65-luc plasmid. Fifty microliters of culture were mixed with 50 µL of assay reagent (Luciferase assay system E1500, Promega), and luciferase activity was measured with a luminometer (FilterMax F5, Molecular Devices). Data are shown as mean ± SD of 6-12 colonies.

      [Reference]

      Aoki K. Extracellular mycobacterial DNA-binding protein 1 participates in Mycobacterium-lung epithelial cell interaction through hyaluronic acid. J Biol Chem 279, 39798–39806 (2004).

      (5e) Lines 283-186: "To confirm the gene essentiality detected with the HMM analysis, we evaluated the consequent growth inhibition in the knockdown strains of representative universal essential or growth-defect-associated genes, including glcB, inhA, gyrB, and embB.." It is not clear what was the level of suppression of these genes in the respective KD strains. Authors should include the level of suppression of these genes also by qRT-PCR.

      Thank you for the comment on the suppression levels of gene expression in knockdown strains of universal essential genes. Following the comment, we evaluated them by qRT-PCR and observed comparable levels of knockdown efficiency between the knockdown strains of universal essential genes and those of strain-specific/accessory essential genes (new Supplementary Fig. 9). Overall, gene expression was suppressed to 20-70% in the knockdown strains compared to the vector control strains that do not express sgRNA.

      We have added the qRT-PCR data of knockdown strains of universal essential genes such as glcB, inhA, gyrB, and embB (new Supplementary Fig. 9). We have revised the Result and Discussion in the manuscript (page 21, lines 367-376; page 28, lines 490-497).
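
      As an illustration only, a residual expression level such as "suppressed to 20-70%" can be derived from qRT-PCR threshold cycles with the widely used 2^-ΔΔCt calculation. The sketch below assumes that method, a reference gene such as sigA (which the reviewer mentions as the normalizer), and equal amplification efficiency for both genes; the function name and all Ct values are hypothetical and do not come from the manuscript.

```python
def relative_expression(ct_target_kd, ct_ref_kd, ct_target_ctrl, ct_ref_ctrl):
    """Target expression in a knockdown strain relative to the vector control.

    Uses the 2^-ddCt calculation, which assumes both amplicons double
    every cycle (an assumption, not something stated in the manuscript).
    """
    delta_kd = ct_target_kd - ct_ref_kd        # target normalized to reference, knockdown
    delta_ctrl = ct_target_ctrl - ct_ref_ctrl  # target normalized to reference, control
    return 2.0 ** -(delta_kd - delta_ctrl)


# Hypothetical Ct values: the target appears two cycles later in the knockdown.
ratio = relative_expression(
    ct_target_kd=24.0, ct_ref_kd=18.0,
    ct_target_ctrl=22.0, ct_ref_ctrl=18.0,
)
print(f"{ratio:.2f}")  # expression relative to the vector control
```

      With these made-up values the knockdown strain retains a quarter of the control expression, which would fall within the 20-70% range described above.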

      (5f) Lines 293-: I am unable to establish any correlation between the growth of the knockdown with Tn insertion reads in the respective genes. For instance, pckA exhibits reduced Tn insertion reads in almost all the strains except in M.i.27, but the effect of its KD on growth is seen only in M.i.198 and M003; glpX exhibits reduced Tn insertion reads in M003, M019, M021 but the effect of its KD on growth is seen only in M003; csd exhibits reduced Tn insertion reads in M.i.198, M003, M019 but the effect of its KD on growth is seen only in M.i.198 and M003. The authors argue that these contradictory phenotypes are due to difficulties in the effective operation of genetically modified systems using foreign genes from different bacterial species in MAC-PD strains (Lines 310-312) or the desired effect on growth could not be observed due to the inability of CRISPRi to yield >99% suppression (Line 314) are not the valid justifications. Indeed, a close look at the RT-PCR data (Figure S5) reveals that pckA levels are ~0.22, 0.5, 0.2, 0.22, 0.2, 0.5, and 0.3 fold relative to sigA in M.i.198, M.i.27, ATCC13950, M018, M019, M003 and M021, respectively, but the effect of its suppression on growth by CRISPRi is seen only in M.i.198 and M003. Secondly, >99% suppression is not a universal prerequisite for all the genes to show growth defect (as might be the case with glcB, inhA, gyrB, and embB genes in this study). Hence, it remains unclear why contrasting results are obtained for most of the genes by TnSeq and CRISPRi.

      Thank you for the comments on the inconsistent results between TnSeq and CRISPR-i based knockdown. We acknowledge that some inconsistencies were observed, particularly among strain-dependent/accessory essential or growth-defect-associated genes. By contrast, we found consistent data between TnSeq and CRISPR-i based knockdown results for universal essential genes. Having obtained the suppression levels of gene expression in the knockdown strains of universal essential genes, we acknowledge that low knockdown efficiency does not explain the discrepancy between the TnSeq and CRISPR-i results, because the levels of knockdown efficiency were comparable between strain-dependent/accessory essential genes and universal essential genes.

      Although the mechanism cannot be fully proven from the current study alone, we consider that such inconsistent phenotypes between TnSeq and CRISPR-i based knockdown may be related to the recently revealed bypass mechanism of gene essentiality, which is characteristically observed in strain-dependent/accessory essential or growth-defect-associated genes. According to the publication by Rosconi (Nat Microbiol. 2022; new Ref#14) reporting ‘forced-evolution experiments’ in 36 clinical Streptococcus pneumoniae strains, gene essentiality can be bypassed by several mechanisms, including the composition of the accessory genome and pathway rewiring. They successfully recovered knockout mutants from transformation experiments for strain-specific/accessory essential genes such as cytidine monophosphate kinase, the folate pathway enzyme formate tetrahydrofolate ligase, and the undecaprenyl phosphate-biosynthesis pathway enzyme farnesyl-diphosphate synthase. The bypassing of gene essentiality could be inferred by observing suppressor mutations and synthetic lethality in knockout strains. By contrast, universal essential genes were reported to fulfill three criteria: high levels of conservation within and often across species, limited genetic diversity, and high and stable expression levels. Consequently, universal essential genes are considered to be rigid, largely immutable key components of an organism’s survival.

      We consider that this also applies to our study on NTM because NTM is pangenomic. The knockdown of universal essential genes resulted in clear growth suppression; however, the knockdown of strain-dependent/accessory essential genes did not show consistent growth suppression. We consider that the bypass mechanism of gene essentiality can explain the inconsistent effect of silencing strain-dependent/accessory genes on bacterial growth suppression.

      We have added the above-mentioned description in the Discussion (pages 28-29, lines 497-519).

      [Reference]

      Rosconi, F. et al. A bacterial pan-genome makes gene essentiality strain-dependent and evolvable. Nat Microbiol 7, 1580–1592 (2022).

      Minor Comments:

      (1) The authors should mention the cut-off of fold-change for all the experiments in the methods section.

      Thank you for the comment on the cut-off of fold-change. We set the cut-off as an adjusted P-value < 0.05. We have added this description to the Methods section (page 41, lines 724-725).

      (2) Figure 7 legend (Lines 888-889): "Data are shown as the means ± SD of triplicate experiments. Data from one experiment representative of three independent experiments (N = 3) are shown."

      Figure S3 legend: Data on the growth curves are the means of triplicate experiments. Data from one experiment representative of three independent experiments (N = 3) are shown.

      Figure S4 legend: Data are shown as the means ± SD of triplicate experiments. Data from one experiment representative of two independent experiments (N = 2) are shown.

      Figure S5 legend: Gene expression data are the means ± SD of triplicate experiments. Data from one experiment representative of two independent experiments (N = 2) are shown.

      These statements need clarification. Were multiple independent experiments (biological repeats) performed, each with 2-3 technical replicates, and do the data shown represent one of those biological repeats?

      Thank you for the comments on the number of experiments performed and the number of replicates. We have performed two or three independent experiments with 2-3 technical replicates. The data shown represent one of the independent experiments.

      (3) Figure 7b: Statistics are missing in the bar graph for growth rate under aerobic conditions.

      Thank you for the comment on the statistics of the data regarding growth rate under aerobic conditions. We have added the statistics in the new Fig. 7c.

      (4) The authors should check the y-axis in Figure 7b, as it is not clear whether bacteria indeed show a growth rate of 1-3 CFUs/day.

      Thank you for the comment on the y-axis in Figure 7b. We have corrected the label of y-axis as “log10[CFUs]/day” in the new Fig. 7c. Additionally, we have corrected the label of y-axis in new Fig. 7a and added the description as “Data are represented as CFUs in 4 μl sample at each timepoint.” in the Fig. 7a legend.
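
      As an illustration only, a growth rate expressed in log10[CFUs]/day can be obtained as the slope of log10-transformed CFU counts against time. The sketch below assumes an ordinary least-squares fit over the chosen timepoints, which is one common way to do this and is not stated to be the method used in the manuscript; the function name and the CFU counts are hypothetical.

```python
import math


def growth_rate_log10_per_day(days, cfus):
    """Least-squares slope of log10(CFU) versus time, in log10[CFUs]/day."""
    if len(days) != len(cfus) or len(days) < 2:
        raise ValueError("need at least two matched timepoints")
    log_cfus = [math.log10(c) for c in cfus]  # CFU counts must be positive
    n = len(days)
    mean_x = sum(days) / n
    mean_y = sum(log_cfus) / n
    sxx = sum((x - mean_x) ** 2 for x in days)
    if sxx == 0:
        raise ValueError("timepoints must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(days, log_cfus))
    return sxy / sxx


# Hypothetical counts: a tenfold increase every two days.
days = [0, 2, 4]
cfus = [1e2, 1e3, 1e4]
rate = growth_rate_log10_per_day(days, cfus)
print(f"{rate:.2f} log10[CFUs]/day")
```

      A tenfold increase every two days corresponds to 0.5 log10[CFUs]/day, so values of 1-3 on this scale would mean 10- to 1000-fold growth per day rather than 1-3 CFUs per day, which is the ambiguity the corrected axis label removes.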

      Reviewer #3 (Recommendations For The Authors):

      (1) It's notable that strains M001 and MOTT64 failed to undergo a transformation, while seven other strains did. Given that M001, MOTT64, and M019 belong to the same phylogenetic clade, it raises questions about why particular strains within this clade showed different transformation outcomes. It might be valuable for them to discuss this discrepancy in their study.

      Thank you for the comment on the difference in transformation capacity between strains belonging to the same genomic subgroup. Although the direct mechanism determining competency for foreign DNA has not been elucidated in M. intracellulare and other pathogenic NTM species, several studies in other bacteria suggest that introducing foreign DNA into clinical strains is more difficult than into laboratory strains. As suggested in Staphylococcus aureus (Corvaglia AR. PNAS. 2010; new Ref#55), some clinical strains possess systems that eliminate foreign nucleic acids, such as a type III-like restriction endonuclease. As suggested in Gram-negative bacteria (Qin J. Sci Rep. 2022; new Ref#56), there may be some differences in cell surface structures between strains, such that polymyxin B nonapeptide, which targets the cell membrane, is needed to transform clinical strains. The efficiency of eliminating foreign DNA may be attributed to various strain-specific factors, including restriction endonucleases, natural CRISPR-interference systems and cell wall structures, rather than to a simple genotypic factor.

      We have added the description on the difference of capability in transformation in the Discussion. (page 31, lines 546-558)

      [References]

      Corvaglia, A.R., François, P., Hernandez, D., Perron, K., Linder, P. & Schrenzel, J. A type III-like restriction endonuclease functions as a major barrier to horizontal gene transfer in clinical Staphylococcus aureus strains. Proc Natl Acad Sci U S A 107, 11954-11958 (2010).

      Qin, J., Hong, Y., Pullela, K., Morona, R., Henderson, I.R. & Totsika, M. A method for increasing electroporation competence of Gram-negative clinical isolates by polymyxin B nonapeptide. Sci Rep 12, 11629 (2022).

      (2) The authors should consider specifying M. intracellulare in their title.

      Thank you for the comment on the manuscript title. Following the comments from all Reviewers, we have changed the title to “Functional genomics reveals strain-specific genetic requirements conferring hypoxic growth in Mycobacterium intracellulare”.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This study presents a valuable finding on the immunophenotypes of cancer treatment-related pneumonitis. The evidence supporting the claims of the authors is solid, although the inclusion of controls, as suggested by one of the reviewers, strengthened the study. The work will be of interest to cancer immunologists.

      Response: We are thankful for the editor's recognition of the contribution our study makes to understanding the immunophenotypes associated with cancer treatment-related pneumonitis. We agree that the inclusion of control data is pivotal for benchmarking biomarkers. While our initial study design was constrained by the availability of BALF from healthy individuals within clinical settings, we addressed this limitation by incorporating scRNA-seq data from healthy control and COVID-19 BALF cells sourced from the GSE145926 dataset. This additional analysis has provided a baseline for comparison, revealing that CD16 is expressed in a minority of T cells in healthy BALF, specifically 1.0% of CD4+ T cells and 1.6% of CD8+ T cells. The inclusion of this data as Figures 6H and 6I in our manuscript offers a robust context for the significant increase in CD16-expressing T cells observed in patients with PCP, thus enhancing the robustness of our study's conclusions.

      Author response image 1.

      Reviewer #1 (Recommendations For The Authors):

      Many thanks for giving me the opportunity to review your paper. I really enjoyed the way you carried out this work - for example, your use of a wide panel of markers and the use of two analytical methods - you have clearly given great thought to bias avoidance. I also greatly appreciated your paragraph on the limitations, as there are several, but you do not 'over-sell' your conclusions so there is no issue here for me.

      To improve the piece, there are a few typos (eg 318 - specific to alpha-myosin) and I was briefly confused about the highlighted clusters in Figure 4. Perhaps mention why they are highlighted when they first appear in 4D instead of E?

      Response: We have corrected the typos, and we have rearranged the sequence of Figures 3E and 3F, as well as 4D and 4E, to ensure a logical flow. Citrus-generated violin plots are now presented prior to the heatmap of the clusters, which better illustrates the progression of our analysis and the derivation of the clusters.

      In terms of improvements to the data, obviously it would have been ideal if you had had some sort of healthy control as a point of reference for all cohorts, but working in the field I understand the difficulties in getting healthy BAL. It would be worth your while however trying to find more supportive data in the literature in general. There are studies which assess various immune markers in healthy BAL eg https://journal-inflammation.biomedcentral.com/articles/10.1186/1476-9255-11-9. and so I think it is worth looking wrt the main findings. For example, are CD16+ T cells seen in healthy BAL or any other conditions (at present the COVID study is being over-relied on)? Could these cells be gamma deltas? (gamma deltas frequently express CD8 and CD16, and can switch to APC like phenotypes).

      Response: We are grateful for the reviewer's consideration of the practical challenges associated with collecting BALF from healthy individuals. As an alternative, we have supplemented our analysis with single-cell RNA sequencing data from BALF cells of healthy controls, as found in the existing literature (Nature Medicine 2020; 26: 842-844). We accessed GSE145926 and downloaded data for BALF cells from healthy controls (n=3) and patients with severe COVID-19 (n=6). The filtered gene-barcode matrix was first normalized using the ‘NormalizeData’ method in Seurat v.4 with default parameters. The top 2,000 variable genes were then identified using the ‘vst’ method in the Seurat FindVariableFeatures function. PCA and UMAP were then performed. T cells were identified as CD2 >1 and CD3E >1, and FCGR3A expression was explored using an expression threshold of 0.5. Violin plots and bar plots were generated with the ggplot function.
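
      The thresholding step described above can be sketched as follows. This is only an illustration in Python, not the authors' Seurat (R) code: the function names and the toy expression values are hypothetical, and only the cut-offs (CD2 > 1, CD3E > 1, FCGR3A > 0.5) come from the response. How CD4+ and CD8+ subsets were separated is not stated, so the sketch stops at the overall T cell fraction.

```python
# Illustrative sketch only: mirrors the gating thresholds quoted in the
# response on made-up normalized expression values (not the GSE145926 data).

def is_t_cell(cell, cd2_min=1.0, cd3e_min=1.0):
    """A cell counts as a T cell when CD2 > 1 and CD3E > 1."""
    return cell.get("CD2", 0.0) > cd2_min and cell.get("CD3E", 0.0) > cd3e_min


def fraction_fcgr3a_positive(cells, threshold=0.5):
    """Fraction of T cells whose FCGR3A (CD16) expression exceeds the threshold."""
    t_cells = [c for c in cells if is_t_cell(c)]
    if not t_cells:
        return None  # no T cells found, so the fraction is undefined
    positive = sum(1 for c in t_cells if c.get("FCGR3A", 0.0) > threshold)
    return positive / len(t_cells)


# Hypothetical toy cells (normalized expression per gene).
toy_cells = [
    {"CD2": 2.1, "CD3E": 1.8, "FCGR3A": 0.0},
    {"CD2": 1.5, "CD3E": 2.2, "FCGR3A": 0.9},
    {"CD2": 0.2, "CD3E": 0.0, "FCGR3A": 2.5},  # fails both T cell cut-offs
    {"CD2": 1.2, "CD3E": 1.1, "FCGR3A": 0.3},
    {"CD2": 3.0, "CD3E": 0.5, "FCGR3A": 1.4},  # CD3E too low
]
toy_fraction = fraction_fcgr3a_positive(toy_cells)
```

      Returning None for an empty T cell gate keeps a sample without T cells from being reported as 0% positive.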

      Regarding the pivotal finding of increased CD16-expressing T cells in patients with PCP, the scRNA-seq data mining indicates that CD16 is expressed by a minority of T cells in healthy BALF: 1.0% of CD4+ T cells and 1.6% of CD8+ T cells. These figures, now incorporated into our revised manuscript as Figures 6H and 6I, substantiate our findings. These cells could be gamma delta T cells, but we could not confirm this with the limited data. We will investigate this in a future study. The main text has been updated to reflect these findings.

      Author response image 2.

      I would agree with your approach of not going down the transcript route, so just focus on protein expression.

      I think you need to mention more about the impact of ICI on PD1 expression - in the methods you lose one approach owing to low T cell expression (132) but in the discussion you mention ICI induced high expression (311) as previously reported. This apparent contradiction needs an explanation.

      Response: We acknowledge the need for clarification regarding the impact of ICIs on PD-1 expression. In the methods section, the low detection of PD-1 expression on T cells in patients treated with nivolumab was indeed noted; this was due to the competitive nature of the PD-1 detection antibody EH12.2 with nivolumab. As reported by Suzuki et al. (International Immunology 2020; 32: 547-557), T cells from patients with ICI-induced ILD, including those treated with nivolumab, exhibit upregulated PD-1 expression when PD-1 is detected with the MIH4 antibody clone. Conversely, as outlined by Yanagihara et al. (BBRC 2020; 527: 213-217), the PD-1 detection antibody clone EH12.2 conjugated with 155Gd (#3155009B) used in our study is unable to detect PD-1 when patients are under nivolumab treatment due to competitive inhibition. The absence of a metal-conjugated PD-1 antibody with the MIH4 clone presented a limitation in our study. Ideally, we would have conjugated the MIH4 antibody with 155Gd for our analysis, which is a refinement we aim to incorporate in future research. We have now included this discussion in our manuscript to clarify the contradiction between the methodological limitations and the high PD-1 expression induced by ICIs, as reported in the literature. This addition will guide readers through the nuances of antibody selection and its implications for detecting PD-1 expression in the context of ICI treatment.

      Finally, since you have the severity data, it would be good to assess all the significantly different clusters against this metric, as you have done for CD16+ T cells. Not only may this reveal more wrt the impact of other immune populations, but it'll also give a point of reference for the CD16+ T cell data.

      Response: Thank you for the suggestion to assess all significantly different clusters against the disease severity metric. We have expanded our analysis to include a thorough correlation study between disease severity and the intensity of various T-cell markers. Notably, we observed that the intensity of CCR7 expression correlates with disease severity. Although the precise biological significance of this correlation remains to be elucidated, it may suggest a role for CCR7+ T cells in the pathogenesis or progression of the disease. We have considered the potential implications of this finding and included it as Supplementary Figure 5. We have also discussed this observation in the discussion section.
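
      A correlation between an ordinal severity grade and a marker intensity could be computed along the following lines. The response does not say which statistic was used, so Spearman's rank correlation is an assumption here, and the function names and toy values are hypothetical.

```python
import math

# Illustrative sketch only: Spearman's rank correlation is an assumed choice,
# and the grades and intensities below are made-up values, not study data.


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the window over every value tied with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for position in range(start, end + 1):
            ranks[order[position]] = mean_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; returns None when either input is constant."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return None
    return cov / math.sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the tie-averaged ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical severity grades and marker intensities for five patients.
severity_grades = [1, 2, 2, 3, 4]
marker_intensity = [0.4, 0.9, 0.7, 1.5, 2.0]
rho = spearman(severity_grades, marker_intensity)
```

      A rank-based statistic suits graded severity scores because it does not assume equal spacing between grades.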

      Author response image 3.

      Overall though I think this is a really nice study, with a potentially very significant finding in linking CD16+ T cells with severity. Congratulations.

      Response: We would like to thank the reviewer for the heartfelt comments on our manuscript.

      Reviewer #2 (Recommendations For The Authors):

      General:

      1) The fact that this is a retrospective study should be indicated earlier in the paper.

      Response: We now mention the retrospective nature of the study in the Methods section as follows: In this retrospective study, patients who were newly diagnosed with PCP, DI-ILD, and ICI-ILD and had undergone BALF collection at Kyushu University Hospital from January 2017 to April 2022 were included. The retrospective study was approved by the Ethics Committee of Kyushu University Hospital (reference number 22117-00).

      2) tSNE and UMAP are dimensionality reduction techniques that don't cluster the cells; the authors should specify what clustering algorithm was used subsequently (e.g., FlowSOM).

      Response: The clusters were determined manually based on their expression patterns.

      3) With regards to the role of CD16 in a potential exacerbated cytotoxicity in the fatal PCP case, the authors could measure the levels of C3a related proteins in patient serum to link to a common immunopathogenic pathway with COVID.

      Response: We did not collect serum from the patients in this study as our research protocol was approved by the Ethics committee for the use of BALF only. However, we agree with your assessment that the measurement of serum C3a levels would be informative. In future studies, we will incorporate the measurement of serum C3a levels to provide more comprehensive insights into the impact of C3a on immune function. Thank you for your valuable feedback and for helping us to improve the quality of our research.

      Line-specific:

      101 The authors should provide some information on how the cryopreservation of the BALF was carried out.

      Response: Upon collection, BALF samples were immediately centrifuged at 300 g for 5 minutes to pellet the cells. The resultant cell pellets were then resuspended in Cellbanker 1 cryopreservation solution (Takara, catalog #210409). This suspension was aliquoted into cryovials and gradually frozen to –80ºC using a controlled rate freezing method to ensure cell viability. The samples were stored at –80ºC until required for experimental analysis. We have added the information in the method section.

      Fig 3B: It would be very helpful if the authors could add a supplementary figure with marker expression on the UMAP projection.

      Response: We have added Supplementary Figure 4 with marker expression on the UMAP projection in Figure 3B.

      Fig 4A: Same as Fig 3B

      Response: We have added Supplementary Figure 5 with marker expression on the UMAP projection in Figure 4A.

      Fig 5B: Same as Fig 3B

      Response: We have added Supplementary Figure 6 with marker expression on the tSNE projection in Figure 5B.

      266 Authors should state if the data is not shown with regards to differences in myeloid cell fractions

      430 Marker intensity is not shown in panel D

      Response: Corrected as follows: “Citrus network tree visualizing the hierarchical relationship of each marker between identified T cell ~”

      446 The legend says patients have IPF, CTD-ILD, sarcoidosis but the figure shows PCP, DI-ILD, ICI-ILD.

      Response: Corrected.

      451 What do the authors mean in "Graphical plots represent individual samples"? Panel B is a dot plot of all samples.

      Response: Corrected as “Dot plots represent ~”.

      472 What do the authors mean in "Graphical plots represent individual samples"? Panel C is a dot plot of all samples.

      Response: Corrected as “Dot plots represent ~”.

      Reviewer #3 (Recommendations For The Authors):

      An important thing is to add comparisons against healthy donors, at least. A common baseline is needed to firmly establish any biomarkers.

      Response: We acknowledge the reviewer's concern regarding the comparison with healthy donors. Although our study did not initially include BALF collection from healthy controls due to the constraints of clinical practice, we recognize the importance of a control baseline to validate biomarkers. To address this, we have integrated scRNA-seq data from healthy control BALF cells available in public datasets (Nature Medicine 2020; 26: 842-844), accessed from GSE145926. This dataset includes BALF cells from healthy controls (n=3) alongside severe COVID-19 patients (n=6). Data mining confirmed that CD16 expression is in a minority of T cells in healthy BALF—1.0% of CD4+ T cells and 1.6% of CD8+ T cells. We have included this comparative data in our manuscript as Figures 6H and 6I to provide context for the observed increase in CD16-expressing T cells in PCP patients, which substantiates our findings.

      Author response image 4.

      Data analysis needs to go deeper. There are several other tools on Cytobank alone that would allow a more quantitative analysis of the data. Fold changes in marker expressions would be very important as measurements of phenotypic changes.

      Response: We thank the reviewer for their constructive feedback on the depth of our data analysis. We acknowledge the value of a more quantitative approach, including the use of fold change measurements to assess phenotypic alterations, and recognize the potential insights such tools on Cytobank could provide. Due to the scope and limited space of the current study, we have focused our analysis on the most pertinent findings relevant to our research questions. We believe the present analysis serves the immediate objectives of this study. However, we agree that further quantitative analysis would enhance the understanding of the data. We have expanded our analysis to include a thorough correlation study between the disease severity of PCP and intensity of various T-cell markers. Notably, we observed that intensity of CCR7 expression correlates with the disease severity of PCP. Although the precise biological significance of this correlation remains to be elucidated, it may suggest a role for CCR7+ T cells in the pathogenesis or progression of the disease. We have considered the potential implications of this finding and included it as Supplementary Figure 5. We have also discussed this observation in the discussion section. We aim to consider these approaches in future work to build upon the foundation laid by this study. Your suggestions are invaluable and will be kept at the forefront as we plan subsequent research phases.

      Author response image 5.

      Reviewer #1 (Public Review):

      Cytotoxic agents and immune checkpoint inhibitors are the most commonly used and efficacious treatments for lung cancers. However their use brings two significant pulmonary side-effects; namely Pneumocystis jirovecii infection and resultant pneumonia (PCP), and interstitial lung disease (ILD). To observe the potential immunological drivers of these adverse events, Yanagihara et al. analysed and compared cells present in the bronchoalveolar lavage of three patient groups (PCP, cytotoxic drug-induced ILD [DI-ILD], and ICI-associated ILD [ICI-ILD]) using mass cytometry (64 markers). In PCP, they observed an expansion of the CD16+ T cell population, with the highest CD16+ T proportion (97.5%) in a fatal case, whilst in ICI-ILD, they found an increase in CD57+ CD8+ T cells expressing immune checkpoints (TIGIT+ LAG3+ TIM-3+ PD-1+), FCRL5+ B cells, and CCR2+ CCR5+ CD14+ monocytes. Given the fatal case, the authors also assessed for, and found, a correlation between CD16+ T cells and disease severity in PCP, postulating that this may be owing to endothelial destruction. Although n numbers are relatively small (n=7-9 in each cohort; common numbers for CyTOF papers), the authors use a wide panel (n=65) and two clustering methodologies giving greater strength to the conclusions. The differential populations discovered using one or two of the analytical methods are robust: whole population shifts with clear and significant clustering. These data are an excellent resource for clinical disease specialists and pan-disease immunologists, with a broad and engaging contextual discussion about what they could mean.

      Strengths:

      • The differences in immune cells in BAL in these specific patient subgroups are relatively unexplored.

      • This is an observational study, with no starting hypothesis being tested.

      • Two analytical methods are used to cluster the data.

      • A relatively wide panel was used (64 markers), with particular strength in the alpha beta T cells and B cells.

      • Relevant biomarkers, beta-D-glucan and KL-6, were also analysed.

      • Appropriate statistics were used throughout.

      • Numbers are low (7 cases of PCP, 9 of DI-ILD, and 9 of ICI-ILD) but these are difficult samples to collect and so in relative terms, and considering the use of CyTOF, these are good numbers.

      • Beta-D-glucan shows potential as a biomarker for PCP (as previously reported) whilst KL-6 shows potential as a biomarker for ICI-ILD (not reported before). Interestingly, KL-6 was not seen to be increased in DI-ILD patients.

      • Despite the relatively low n numbers and lack of matching there are some clear differentials. The CD4/CD8+CD16+HLA-DR+CXCR3+CD14- T cell result is striking - up in PCP (with EM CD4s significantly down) - whilst the CD8 EMRA population is clear in ICI-ILD and 'non-exhausted' CD4s, with lower numbers of EMRA CD8s in DI-ILD.

      • The authors identify 17/31 significantly differentiated clusters of myeloid cells, eg CD11bhi CD11chi CD64+ CD206+ alveolar macrophages with HLA-DRhi in PCP.

      • With respect to B cells, the authors found that FCRL5+ B cells were more abundant in patients with ICI-ILD compared to those with PCP and DI-ILD, suggesting these FCRL5+ B cells may have a role in irAE.

      • One patient's extreme CD16+ T cell proportion (97.5% positive) and death led the authors to consider CD16+ T cells as an indicator of disease severity in PCP. This was then tested and found to be correct.

      • Authors discuss results in context of literature leading them to suggest that CD16+ T cells may target endothelial cells and wonder if anti-complement therapy may be efficacious in PCP.

      • Great discussion on auto-reactive T cell clones where the authors suggest that in ICI-ILD CD8s may react against healthy lung, driving ILD.

      • An observation of CXCR3 in different CD8 populations in ICI-ILD and PCP led the authors to hypothesise on the chemoattractants in the microenvironment.

      • Excellent point suggesting CD57 may not always be a marker of senescence on T cells - reflective of growing change within the community.

      • Well considered suggestion that FCRL5+ B cells may be involved in ICI-ILD driven autoimmunity.

      • The authors discuss the main weaknesses in the discussion and stress that the findings detailed in the paper "demonstrate a correlation rather than proof of causation".

      • Figures and legends are clear and pleasing to the eye.

      Weaknesses:

      • This is an observational study, with no starting hypothesis being tested.

      • Only patients who were able to have a lavage taken have been recruited.

      • One set of analysis wasn't carried out for one subgroup (ICI-ILD) as PD1 expression was negative owing to the use of nivolumab.

      • Some immune cell subsets wouldn't be picked up with the markers and gating strategies used; e.g. NK cells.

      • Some immune cells would be disproportionately damaged by the storage, thawing and preparation of the samples; e.g. granulocytes.

      • Numbers are low (7 cases of PCP, 9 of DI-ILD, and 9 of ICI-ILD), sex, age and adverse event matching wasn't performed, and treatment regimens are varied and 'suspected' (suggesting incomplete clinical data) - but these are difficult samples to collect. These numbers drop further for some analyses, e.g. T cell clustering, owing to factors such as low cell number.

      • The disease comparisons are with each other, there is no healthy control.

      • Samples are taken at one time point.

      • The discussion on probably the stand-out result - the CD16+ T cells in PCP - relies on two papers - leading to a slightly skewed emphasis on one paper on CD16+ cells in COVID. There are other papers out there that have observed CD16+ T cells in other conditions. It is also worth bearing in mind that, given the markers used, these CD16+ T cells may be gamma deltas.

      • The discussion on ICI patients consistently showing increased PD1 could have been greater, as given the ICI is targeting PD1, one would expect the opposite, as commented on, and observed, in the methods section.

      Reviewer #2 (Public Review):

      Yanagihara and colleagues investigated the immune cell composition of bronchoalveolar lavage fluid (BALF) samples in a cohort of patients with malignancy undergoing chemotherapy and with lung adverse reactions including Pneumocystis jirovecii pneumonia (PCP) and immune-checkpoint inhibitor (ICI) or cytotoxic drug induced interstitial lung diseases (ILDs). Using mass cytometry, their aim was to characterize the cellular and molecular changes in BAL to improve our understanding of their pathogenesis and identify potential biomarkers and therapeutic targets. In this regard, the authors identify a correlation between CD16 expression in T cells and the severity of PCP and an increased infiltration of CD57+ CD8+ T cells expressing immune checkpoints and FCRL5+ B cells in ICI-ILD patients.

      The conclusions of this paper are mostly well supported by data, but some aspects of the data analysis need to be clarified and extended.

      1) The authors should elaborate on why different sets of markers were selected for each analysis step. E.g., different sets of markers were used for UMAP, CITRUS and viSNE in the T cell and myeloid analysis.

      2) The authors should state if a normality test for the distribution of the data was performed. If not, non-parametric tests should be used.

      3) The authors should explore the correlation between CD16 intensity and the CTCAE grade in T cell subsets such as EMRA CD8 T cells, effector memory CD4, etc as identified in Figure 1B.

      4) The authors could use CITRUS to better assess the B cell compartment.

      Reviewer #3 (Public Review):

      The authors collected BALF samples from lung cancer patients newly diagnosed with PCP, DI-ILD or ICI-ILD. CyTOF was performed on these samples, using two different panels (T-cell and B-cell/myeloid cell panels). Results were collected, cleaned-up, manually gated and pre-processed prior to visualisation with manifold learning approaches t-SNE (in the form of viSNE) or UMAP, and analysed by CITRUS (hierarchical clustering followed by feature selection and regression) for population identification - all using Cytobank implementation - in an attempt to identify possible biomarkers for these disease states. By comparing cell abundances from CITRUS results and qualitative inspection of a small number of marker expressions, the authors claimed to have identified an expansion of CD16+ T-cell population in PCP cases and an increase in CD57+ CD8+ T-cells, FCRL5+ B-cells and CCR2+ CCR5+ CD14+ monocytes in ICI-ILD cases.

      By the authors' own admission, there is an absence of healthy donor samples and, perhaps as a result of retrospective experimental design, also an absence of pre-treatment samples. The entire analysis effectively compares three yet-established disease states with no common baseline - what really constitutes a "biomarker" in such cases? The introduction asserts that "By characterizing the cellular and molecular changes in BAL from patients with these complications, we aim to improve our understanding of their pathogenesis and identify potential therapeutic targets" (lines 82-84). Given these obvious omissions, no real "changes" have been studied in the paper. These are very limited comparisons among three, and only these three, states.

      Even assuming a more thorough experimental design, the data analysis is unfortunately too shallow and has not managed to explore the wealth of information that could potentially be extracted from the results. CITRUS is accessible and convenient, but also makes a couple of big assumptions which could affect data analysis - 1) Is it justified to concatenate all FCS files to analyse the data in one batch / small batches? Could there be batch effects or other biological events that could confuse the algorithm? 2) With a relatively small number of samples, and after internal feature selection of CITRUS, is the regression model suitable for population identification or would it be too crude and miss out rare populations? There are plenty of other established methods that could be used instead. Have those methods been considered?

      Colouring t-SNE or UMAP (e.g. Figure 6C) plots by marker expression is useful for quick identification of cell populations but it is not a quantitative analysis. In a CyTOF analysis like this, it is common to work out fold changes of marker expressions between conditions. It is inadequate to judge expression levels and infer differences simply by looking at colours.
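
      As an illustration of the kind of quantitative summary meant here, a log2 fold change of median marker intensity between two conditions could be computed as below. This is a minimal sketch under stated assumptions: the function name, the pseudocount and the toy values are hypothetical and are not taken from the manuscript or from Cytobank.

```python
import math
from statistics import median

# Illustrative sketch only: per-sample intensities below are made-up values.


def log2_fold_change(condition_a, condition_b, pseudocount=1.0):
    """log2 ratio of median intensities (A over B).

    The pseudocount (an assumption of this sketch) avoids division by zero
    and damps ratios between very low intensities.
    """
    if not condition_a or not condition_b:
        raise ValueError("both conditions need at least one sample")
    return math.log2(
        (median(condition_a) + pseudocount) / (median(condition_b) + pseudocount)
    )


# Hypothetical median marker intensities per sample in two conditions.
toy_condition_a = [3.0, 3.5, 2.5, 3.0]
toy_condition_b = [1.0, 0.5, 1.5, 1.0]
toy_lfc = log2_fold_change(toy_condition_a, toy_condition_b)
```

      A positive value indicates higher median intensity in the first condition; a statistical test across samples would still be needed to judge significance.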

      The relatively small number of samples also means that most results presented in the paper are not statistically significant. Whilst it is understandable that it is not always possible to collect a large number of patient samples for studies like this, having several entire major figures showing "n.s." (e.g. Figures 3A, 4B and 5C), together with limitations in the comparisons themselves and inadequate analysis, makes the observations unconvincing, and even less so for the single fatal PCP case where N = 1.

      It would also be good scientific practice to show evidence of sample data quality control. Were individual FCS files examined? Did the staining work? Some indication of QC would also be great.

      The dataset generated and studied by the authors has the potential to address the question they set out to answer and thus potentially be useful for the field. However, in the current state of presentation, more evidence and more thorough data analysis are needed to draw any conclusions, or correlations, as the authors would like to frame them.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      This paper provides useful information about how the ionome of Arabidopsis thaliana adapts to very high CO2-levels, backed up by solid evidence and carefully designed studies. However, the broader claims of the paper about climate change and food security - heavily emphasized in the abstract, introduction, and discussion - are inappropriate, as there is no direct link to the presented work.

      We sincerely thank you for the work you have done in reviewing our manuscript. We very much appreciate your overall positive assessment of the experimental work as a whole, its value and robustness.

      In this revised version, we took on board the majority of your suggestions and your comments. In particular, we understood your critical point about overstating our objectives, which might in turn seem uncorrelated with our results. We fully agree with the comments that have been made on this point. Consequently, we have made substantial modifications and corrections in order to clarify our objectives and their implications: exploring in depth the natural variation of the shoot ionome response to elevated CO2, and generating a valuable resource allowing a better understanding of the genetic and molecular mechanisms involved in the regulation of plant mineral nutrition by the elevation of atmospheric CO2.

      We also made modifications in response to the other suggestions, including a clarification of the functional experiments carried out on the function of TIP2;2 in response to elevated CO2. Figure 7 now comprises the comparison between ambient and elevated CO2 conditions, which is much more informative than what appeared in the previous version.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The study's abstract, introduction, and conclusions are not supported by the methods and results conducted. In fact, the results presented suggest that Arabidopsis could easily adapt to an extremely high CO2 environment.

      We understand the reviewer’s comment. Although our work is considered useful, robust and well designed, we agree with the reviewer's point. We have certainly overemphasized the significance of our work to address the issue of food security in response to rising atmospheric CO2, at the expense of the factual description of the results of our fundamental study of the mechanisms at the interface between CO2 and mineral nutrition. We have clarified this focus by modifying the text of the introduction, objectives and discussion. We hope that these modifications will enable readers to better appreciate the core of this work.

      Regarding the last part of the comment, our results do suggest that genetic variation could allow adaptation to rising atmospheric CO2, and our study does indeed aim to identify the extent and basis of this genetic variation.

      This study offers good evidence pointing to a genetic basis for Arabidopsis thaliana's response to elevated CO2 (eCO2) levels and its subsequent impact on the leaf ionome. The natural variation analyses in the study support the hypothesis that genetic factors, rather than local adaptation, guide the influence of eCO2 on the ionome of rosette leaves in Arabidopsis. However, the manuscript's claim regarding its role in "the development of biofortified crops adapted to a high-CO2 world" (line 23) is overstated, especially given the absence of any analysis of the influence of eCO2 on the seed ionome and given that Arabidopsis is a poor model for the harvest index of any crop. The manuscript, in its current form, necessitates massive revisions, particularly in clarifying its broader implications and in providing more substantial evidence for some of its assertions.

      We thank the reviewer for this comment and for the positive appreciation of our identification of a genetic basis for Arabidopsis thaliana's response to elevated CO2 and its subsequent impact on the leaf ionome. Nevertheless, it is true that the study of the leaf ionome is far from being able to lead to the development of biofortified plants. Some papers have described the nutrient harvest index in Arabidopsis as a potential indicator of nutrient use efficiency (for instance, Masclaux-Daubresse and Chardon, Journal of Experimental Botany 2011 or Aranjuelo et al., Journal of Experimental Botany 2013). However, as we did not include any seed ionome data in the paper, we added clear mentions that our analyses were made on leaves (lines 56/57/250/319) and a comment in the discussion section to address this limitation (lines 325-328).

      Major Drawbacks and Questions:

      (1) Evidence for the Central Premise:

      The foundational premise of the study is the assertion that rising atmospheric CO2 levels result in a decline in plant mineral content. This phenomenon is primarily observed in C3 plants, with C4 plants seemingly less affected. The evidence provided on this topic is scant and, in some instances, contradicts the authors' own references. The potential reduction of certain minerals, especially in grains, can be debated. For instance, reduced nitrogen (N) and phosphorus (P) content in grains might not necessarily be detrimental for human and animal consumption. In fact, it could potentially mitigate issues like nitrogen emissions and phosphorus leaching. Labeling this as a "major threat to food security" (line 30) is exaggerated. While the case for microelements might be more compelling, the introduction fails to articulate this adequately. Furthermore, the introduction lacks any discussion on how eCO2 might influence nutrient allocation to grains, which would be crucial in substantiating the claim that eCO2 poses a threat to food security. A more comprehensive introduction that clearly delineates the adverse effects of eCO2 and its implications for food security would greatly enhance the manuscript.

      We partially agree with this comment. The decline in mineral status of C3 plants under conditions of elevated atmospheric CO2 has been widely described in the literature, and specifically documented for the cereal grains. While there are variations in this effect (depending on species, ecotype, cultivar), there is no debate about its acceptance. Here are just a few of the many works describing this effect, both on a global scale and at the level of the individual plant (Cotrufo MF (1998) Elevated CO2 reduces the nitrogen concentration of plant tissues. Global Change Biology 4: 43-54; Loladze I (2014) Hidden shift of the ionome of plants exposed to elevated CO(2) depletes minerals at the base of human nutrition. eLife 3: e02245; Myers SS (2014) Increasing CO2 threatens human nutrition. Nature 510: 139-142; Poorter H (1997) The effect of elevated CO2 on the chemical composition and construction costs of leaves of 27 C3 species. Plant, Cell & Environment 20: 472-482; Soares JC (2019) Preserving the nutritional quality of crop plants under a changing climate: importance and strategies. Plant and Soil 443: 1-26; Stitt M (1999) The interaction between elevated carbon dioxide and nitrogen nutrition: the physiological and molecular background. Plant, Cell & Environment 22: 583-621; Uddling J (2018) Crop quality under rising atmospheric CO2. Curr Opin Plant Biol 45: 262-267).

      In addition to this, the threat to food security posed by this alteration in plant mineral status has also been well described in the literature by several modeling approaches (Beach RH (2019) Combining the effects of increased atmospheric carbon dioxide on protein, iron, and zinc availability and projected climate change on global diets: a modelling study. Lancet Planet Health 3: e307-e317; Ebi KL (2019) Elevated atmospheric CO(2) concentrations and climate change will affect our food's quality and quantity. Lancet Planet Health 3: e283-e284; Medek DE (2017) Estimated Effects of Future Atmospheric CO2 Concentrations on Protein Intake and the Risk of Protein Deficiency by Country and Region. Environ Health Perspect 125: 087002; Smith MR (2018) Impact of anthropogenic CO2 emissions on global human nutrition. Nature Climate Change 8: 834-839; Weyant C (2018) Anticipated burden and mitigation of carbon-dioxide-induced nutritional deficiencies and related diseases: A simulation modeling study. PLoS Med 15: e1002586; Zhu C (2018) Carbon dioxide (CO2) levels this century will alter the protein, micronutrients, and vitamin content of rice grains with potential health consequences for the poorest rice-dependent countries. Sci Adv 4: eaaq1012). To reinforce this point, we have added a sentence and references (lines 30-33). Nevertheless, we understand the reviewer's comment on the nuance to be given to the intensity of this potential threat. We have therefore modified the text, replacing "major threat" by "significant threat" (lines 3 and 29).

      We would also like to answer the reviewer’s comment on the potential environmental benefit associated with reduced N and P content in grains (mitigation of N emissions and P leaching). Indeed, if this reduced N and P content results from a lowered use efficiency of soil nutrients by plants, as suggested by several studies (Bloom 2010, Cassan 2023, Gojon 2023 and references therein), it may on the contrary favor N oxide emissions and P leaching from the soil.

      (2) Exaggerated Concerns:

      The paper begins with the concern that carbon fertilization will lead to carbon dilution in our foods. While we indeed face numerous genuine threats in the coming decades, this particular issue is manageable. The increase in CO2 alone offers many opportunities for boosting yield. However, the heightened heat and increased evapotranspiration will pose massive challenges in many environments.

      While there are indeed multiple threats that we are facing in the coming decades, we don't fully agree with this comment. At present, there's no evidence to say that the negative effect of CO2 on plant mineral content will be manageable. Furthermore, there is compelling evidence that altered mineral nutrition and mineral status of plants will be an important factor limiting the high CO2-induced increase in yield, as will be heat or increased evapotranspiration (see for instance Coskun et al (2016) Nutrient constraints on terrestrial carbon fixation: The role of Nitrogen. J. Plant Physiol. 203: 95-109; Jiang M (2020) Low phosphorus supply constrains plant responses to elevated CO2: A meta-analysis. Glob Chang Biol 26: 5856-5873; Reich PB (2006) Nitrogen limitation constrains sustainability of ecosystem response to CO2. Nature 440: 922-925). Thus, although we do not negate the crucial importance of heat and water stress, we believe it is relevant to study the basic mechanisms responsible for the negative effect of CO2 on plant mineral composition.

      Figure 4 in fact suggests that 43% of the REGMAP panel (cluster 3) is already pre-adapted to very high CO2 levels. This suggests annual species could adapt very rapidly.

      We agree with the reviewer. However, this suggests that genetic variation exists in some ecotypes to support adaptation to elevated CO2. The purpose of this work is indeed to identify this genetic variation, in order to characterize the underlying mechanisms.

      (3) Assumptions on CO2 Levels:

      The assumption of 900ppm seems to be based on a very extreme climate change scenario. Most people believe we will overshoot the 1.5°C scenario, however, it seems plausible that 2.5 to 3°C scenarios are more likely. This would correspond to around 500ppm of CO2. https://www.nature.com/articles/s41597-022-01196-7/tables/4

      We agree with the reviewer that the CO2 concentration we used corresponds to a high value in the IPCC projections. That said, this value is currently considered very plausible: the following figure (from Smith and Myers (2018) Nature Climate Change) shows that current CO2 emissions align with the IPCC's most extreme model (RCP 8.5), which would result in a CO2 concentration of around 900 ppm in 2100. Furthermore, nothing in the 6th IPCC report allows the 4°C scenario to be excluded.

      Author response image 1.

      (4) Focus on Real Challenges:

      We have numerous real challenges, such as extreme heat and inconsistent rainfall, to address in the context of climate change. However, testing under extreme CO2 conditions and then asserting that carbon dilution will negatively impact nutrition is exaggerated.

      While we fully agree that several threats linked to climate change exist, and all deserve to be studied, we find it questionable to consider that the potential effect of high CO2 on the mineral nutrition of plants is not a real challenge. The mineral nutrition of plants is already a current major environmental challenge. This perspective seems to reflect the reviewer's personal opinion rather than an analysis of our work.

      In contrast, the FACE experiments are fundamental and are conducted at more realistic eCO2 levels. Understanding the interaction between a 20% increase in CO2 and new precipitation patterns is key for global carbon flux prediction.

      Again, we do not fully understand this comment, as the aim of our study was not to predict global carbon fluxes, but to unravel the genes and mechanisms underlying the negative effect of elevated CO2 on the nutrient content of Arabidopsis rosettes. However, we agree with the reviewer that FACE facilities are useful for exploring the CO2 response in more natural environments, and we highlight the fact that the decrease in mineral status of C3 plants has been widely documented in FACE studies. FACE experiments do not, however, permit fully controlled experiments (temperature, rainfall, wind and light intensity are not controllable in FACE) of the kind needed to disentangle the mechanisms by which elevated CO2 regulates the signaling pathways associated with plant mineral composition. In the longer term, studying the mechanisms we have identified in a more global context of climate change could be highly relevant.

      As I look at the literature on commercial greenhouse tomato production, 1000ppm of eCO2 is common, but it also looks like the breeders and growers have already solved for flavor and nutrition under these conditions.

      Indeed, tomato is often cultivated in CO2-enriched greenhouses at 1000 ppm. According to the literature, this results in a 20-25% reduction in vitamin C or lycopene, and requires a significantly higher nitrogen and water intake to reach expected sugar levels (Doddrell H (2023) Horticulture Research). In addition, the negative effect of elevated CO2 on tomato nutrient content seems to have significant repercussions on nutrition-health properties (Boufeldja (2023), Molecules).

      Conclusion:

      While the study provides valuable insights into the genetic underpinnings of Arabidopsis thaliana's response to elevated CO2 levels, it requires an entirely revised writeup, especially in its abstract, broader claims and implications. The manuscript would benefit from a more thorough introduction, a clearer definition of its scope, and a clear focus on the limits of this study.

      We thank the reviewer for the comments made on our manuscript. In addition to the responses that we provide to these comments, we have modified the main text of the introduction, objectives and discussion to take these comments into consideration. We believe that this will significantly improve the manuscript.

      Reviewer #2 (Public Review):

      Strengths:

      The authors have conducted a large, well-designed experiment to test the response to eCO2. Overall, the experimental design is sound and appropriate for the questions about how a change in CO2 affects the ionome of Arabidopsis. Most of the conclusions in this area are well supported by the data that the authors present.

      We thank the reviewer for this positive appreciation.

      Weakness:

      While the authors have done good experiments, it is a big stretch from Arabidopsis grown in an arbitrary concentration of CO2 to relevance to human and animal nutrition in future climates. Arabidopsis is a great model plant, but its leaves are not generally eaten by humans or animals.

      We agree with the reviewer’s comment. We recognize that implying a direct contribution of our work to human nutrition in future climates is an overstatement, as also mentioned by reviewer 1. This was not intentional, as we have always been convinced that our work contributes to the understanding of the basic mechanisms involved in the negative regulation of plant mineral nutrition by high CO2. We have significantly modified the text to correct any misunderstanding of our work’s implications.

      The authors don't justify their choice of a CO2 concentration. Given the importance of the parameter for the experiment, the rationale for selecting 900 ppm as elevated CO2 compared to any other concentration should be addressed. And CO2 is just one of the variables that plants will have to contend with in future climates, other variables will also affect elemental concentrations.

      We agree with this comment. We added a justification of the high CO2 concentration used in this work in the Material and Methods section (lines 343-344). The explanation of this choice is also given in our response to reviewer 1’s point 3.

      Given these concerns, I think the emphasis on biofortification for future climates is unwarranted for this study.

      Again, we agree with this comment and we have significantly modified the text to correct any misunderstanding of our work’s implications.

      Additionally, I have trouble with these conclusions:

      -Abstract "Finally, we demonstrate that manipulating the function of one of these genes can mitigate the negative effect of elevated CO2 on the plant mineral composition."

      -Discussion "Consistent with these results, we show that manipulating TIP2;2 expressions with a knock-out mutant can modulate the Zn loss observed under high CO2."

      The authors have not included the data to support this conclusion as stated. They have shown that this mutant increases the Zn content of the leaves when compared to WT but have not demonstrated that this response is different than in ambient CO2. This is an important distinction: one way to ameliorate the reduction of nutrients due to eCO2 is to try to identify genes that are involved in the mechanism of eCO2-induced reduction. Another way is to increase the concentration of nutrients so that the eCO2-induced reduction is not as important (i.e. a 10% reduction in Zn due to eCO2 is not as important if you have increased the baseline Zn concentration by 20%). The authors identified tip2 as a target from the GWAS on difference, but their validation experiment only looks at eCO2.

      We thank the reviewer for this comment, and we agree with it. It is much more interesting, especially in the context of this paper, to analyze the function of a candidate gene not only in elevated CO2, but in both ambient and elevated CO2. Therefore, we added in Figure 7 data for the expression of TIP2;2 in contrasting haplotypes under ambient CO2, in comparison to those already presented under elevated CO2 (now Fig. 7C and 7D). This showed that TIP2;2 expression is also lower in haplotype 0 under ambient CO2. We also added in Figure 7 (Fig. 7E) the Zn level in WT and the tip2;2-1 mutant under ambient CO2, in comparison to those already presented under elevated CO2. This showed that the tip2;2-1 mutant line did not present any decrease in Zn shoot content in response to elevated CO2, in contrast to what is observed for the WT.

      We have added comments on these new results in the Results and Discussion sections (lines 233-242 in the Results section, and lines 310-314 in the Discussion section).
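
      The reviewer's distinction between the two routes (removing the eCO2-induced loss versus raising the baseline) can be made concrete with a short calculation. The sketch below is illustrative only: the function names are hypothetical, and the 20% and 10% figures are the reviewer's own hypothetical example, not measured values from the study.

```python
def co2_response_percent(ambient, elevated):
    """Percent change in a mineral content between ambient and elevated CO2."""
    return (elevated - ambient) / ambient * 100.0


def relative_content(baseline_gain, eco2_loss):
    """Content under eCO2 relative to the unmodified baseline at ambient CO2.

    baseline_gain: fractional increase in baseline content (e.g. 0.20).
    eco2_loss: fractional loss caused by eCO2 (e.g. 0.10).
    """
    return (1.0 + baseline_gain) * (1.0 - eco2_loss)


# Reviewer's hypothetical: +20% baseline Zn, then a 10% eCO2-induced loss.
net = relative_content(0.20, 0.10)
```

      In this hypothetical case the eCO2 response is still a 10% loss, even though the final content ends up above the original baseline. Telling the two situations apart therefore requires measurements in both ambient and elevated CO2, which is what the added Fig. 7E data provide.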

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Reviewer Comments on the Article's Approach to Ionome Analysis

      (1) Omission of Phosphorus from the Ionome:

      It's surprising that phosphorus (P) was not measured in the ionome. After nitrogen (N), P is often the most limiting mineral for plant development and yield, making it a significant component of the ionome. Why did the authors omit this crucial element?

      We agree with the reviewer that P is an important mineral for plant growth. The absence of data related to P content is due to feasibility constraints rather than oversight. The MP-AES instrument we used to analyze the ionome (except N and C, which we obtained from an Elementar Analyzer) would have required an extra step and an extra analysis to obtain data for macronutrients such as P or K. In the context of this large-scale experiment, we had to compromise and proceed without these data.

      (2) Relationship Between Leaf Ionome and Seed:

      The manuscript lacks evidence demonstrating the relationship between the leaf ionome and the seed. This connection is vital to establish the study's aims as outlined in lines 20-24. If the central argument is that eCO2 threatens food security, it's essential for the authors to either:

      • Provide evidence that eCO2 induces changes in the ionome profiles of seeds.

      • Show that changes in the rosette leaf ionome lead to alterations in seed ionome profiles.

      We agree with the reviewer. Although we know that the seed ionome composition of Arabidopsis model accessions such as Columbia is indeed negatively affected by eCO2, we do not provide the data that support some of the terms used in lines 20-24. The correspondence between leaf and seed ionome in natural populations under eCO2 is certainly a next question that we will address. Therefore, to align our stated objectives with our data, we have modified the sentence in lines 20-24. We also added a comment on this point in the discussion section (lines 324-328).

      (3) Analysis of Ionome in Rosette Leaves:

      Why did the authors choose to analyze the ionome specifically in rosette leaves? Is there a known correlation between the ionome profile in rosette leaves and seeds?

      See our answer to the above comment.

      (4) Experimental Design Comments:

      • The layout of the accession growouts, the methods of randomization, blocking, and controls/checks should be detailed.

      • Were BLUEs (Best Linear Unbiased Estimators) or BLUPs (Best Linear Unbiased Predictors) employed to account for experimental design conditions? If not, it's recommended that they be used.

      We thank the reviewer for this comment. A note on replicates has been added in the Method/Plant Material section. Concerning the BLUEs/BLUPs, although I am not familiar with their use, I do not think that these approaches are relevant in our experimental design. Indeed, we pooled 3 to 5 replicates for each accession to measure the ionome (as mentioned in the Method/Ionome analysis section – we realized this was perhaps not clear enough, and thus we reinforced this point in this section). Therefore, we do not have the variance data required to perform BLUEs/BLUPs.

      (5) Carbon Dilution Effect:

      The statement, "The first component of the PCA described a clear antagonistic trend between C content and the change of other mineral elements (Fig. 3B)..." suggests a well-understood carbon dilution effect. These results are anticipated and align with existing knowledge.

      We thank the reviewer for this comment. However, this sentence does not relate to the biomass dilution hypothesis referred to by the reviewer. Indeed, the composition of each mineral (C and others) is expressed as a percentage of biomass, not as an absolute value. Therefore, this reflects more a probable effect of the increase in carbon compounds (notably soluble sugars), which could influence mineral composition.

      (6) Heritability Estimates:

      The authors should report both the broad-sense heritability and an estimate of heritability based on a GRM or Kinship matrix.

      We thank the reviewer for this suggestion. We are skeptical of using a kinship matrix to estimate heritability in our study. Estimating narrow-sense heritability using a kinship matrix is conceptually based on the infinitesimal model of Fisher, thereby meaning that phenotypic variation is driven by hundreds to thousands of QTLs with small effects. If this is the case, GWAS conducted on several hundred (or even thousands) of genotypes will not be powerful enough to detect such QTLs. Accordingly, estimates of broad-sense heritability based on estimates of variance components can drastically differ from estimates of narrow-sense heritability based on the use of a kinship matrix, as illustrated in the study of Bergelson et al. (2019 Scientific Reports).

      (7) Application of the Breeder's Equation:

      It would be beneficial if the authors applied the breeder's equation to estimate the species' potential rate of response. Based on the allele frequency of the adapted cluster 3 (69 ecotypes or 43% frequency of Figure 3B), it seems plausible that the populations could adapt within 23 generations.

      We thank the reviewer for this suggestion. Indeed, it would be really interesting to test whether sub-populations could adapt in comparison with others, and over what period of time. It is nevertheless not possible to do so using the Breeder’s equation in our case, as this requires fitness data under conditions of ambient or elevated CO2 (i.e. production of seeds) to be applied, and we do not have these data at the level of the whole population.
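
      For reference, the breeder's equation in its standard textbook form is reproduced below. The symbols are the conventional ones and do not appear in the manuscript: R is the response to selection per generation, h^2 the narrow-sense heritability, and S the selection differential.

```latex
R = h^{2} S
```

      The selection differential S is the difference between the mean trait value of the individuals that reproduce and the mean of the whole population, so estimating it requires fitness data such as seed production, which is the information that is missing here.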

      (8) Overall Quality:

      In general, the authors have executed a high-quality ionome mapping experiment. However, the abstract, introduction, and discussion should be entirely rewritten and reframed.

      We thank the reviewer for the positive evaluation of our experiment. As previously mentioned, we are for the most part in agreement with the comments made about the need to align our stated objectives with our experimental data and conclusions. To do so, we have rewritten part of the abstract, introduction and discussion. The details of these modifications are described in the responses made to each comment.

      Here's a line-by-line list of suggestions on writing:

      Line 30 would read better with a comma after thus (or by replacing thus with therefore and then a comma at the start of the sentence).

      Line 33 nevertheless would read better in between commas.

      Lines 45 - 48 sentence is too long, could probably divide it into two.

      Lines 90 - 94 are hard to interpret, recommend rephrasing for clarity.

      Line 130 - keep verbs in the past tense for consistency (ran instead of run).

      Line 194 - what do the authors mean by crossed? I'm inferring they looked at the intersection of DEGs with the list of genes identified by GWA mapping, probably should use a more concise word.

      There's a concurrent use of the adjective strong (Lines 80, 142, 144, 197, 245). I would advise using a more concise adjective or avoiding its use to let the reader form their own opinion on the data.

      Lines 174-176 the cited reference (No. 15) is incorrect. The study by Katz et al. (2022) does not provide information on the role of ZIF1 in zinc sequestration mechanisms under elevated CO2 conditions.

      We thank the reviewer for these detailed recommendations. We have corrected or rephrased the text according to these suggestions.

      Reviewer #2 (Recommendations For The Authors):

      Technical points:

      900 ppm as elevated CO2: Given the importance of the parameter for the experiment, the rationale for selection 900 ppm as elevated CO2 compared to any other concentration should be addressed.

      We acknowledge the reviewer's point, which we addressed earlier in our response. In line with this, we have included a justification for this parameter in the Method section.

      The authors do not mention what genotype was used for their root/shoot RNAseq experiment.

      We thank the reviewer for this comment, and indeed, this information was not mentioned. This is now done, in the Method section.

      Line 125: Spelling error "REGMPA".

      This has been corrected.

      Line 338: Removal of outlier observations - "Prior to GWAS and multivariate analyses such as PCA or clustering, mineral composition measures were pre-processed to remove technical outliers". The authors should mention the exact number of outliers that were removed and what the explicit criteria were for removal.

      The number of outliers removed from each dataset is now indicated in Supplemental Table 7 (this is cited in the Method section). The explicit criterion used for this analysis is mentioned in the corresponding Method section: “the values positioned more than 5 median absolute deviations away from the median were removed from the dataset”.
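
      The quoted criterion can be sketched in a few lines. This is a minimal illustration rather than the pipeline used in the study: the function name and the example values are hypothetical, and the sketch assumes an unscaled median absolute deviation (no normal-consistency constant), which the quoted sentence does not specify.

```python
from statistics import median


def remove_mad_outliers(values, n_mads=5.0):
    """Keep values lying within n_mads median absolute deviations of the median."""
    med = median(values)
    # Unscaled MAD: median of absolute deviations from the median (assumption).
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # Assumption: with no spread there is no basis for removal, so keep all.
        return list(values)
    return [v for v in values if abs(v - med) <= n_mads * mad]


# Hypothetical measurements for one element; the last value is a technical outlier.
measurements = [10, 11, 9, 10, 12, 50]
cleaned = remove_mad_outliers(measurements)
```

      Because both the center and the spread are medians, a single extreme value barely shifts the threshold, which is the usual reason for preferring this rule over one based on the mean and standard deviation.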

      Line 379: "Lowly expressed genes with an average value across conditions under 25 reads were excluded from the analysis". Providing information about the number of the lowly expressed genes that were removed from the analysis can help with the interpretation of the likelihood of the candidates selected being correct.

      This is a standard procedure in RNAseq analysis. It avoids many false positives in the differential analysis of gene expression based on ratios (where a very small number in the denominator can lead to a very high variation in expression, of no real significance). For information, this step led to the removal of 11607 and 10121 genes for the shoot and root datasets.
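
      The filtering step described above can be illustrated as follows. This is a minimal sketch under stated assumptions: the gene names, counts, and function name are hypothetical, and the actual analysis was presumably run with a dedicated RNAseq package rather than plain Python.

```python
def filter_low_expression(counts, min_mean_reads=25):
    """Keep genes whose average read count across conditions is at least the cutoff.

    counts: dict mapping a gene identifier to its read counts per condition.
    """
    kept = {}
    for gene, reads in counts.items():
        if sum(reads) / len(reads) >= min_mean_reads:
            kept[gene] = reads
    return kept


# Hypothetical counts under two conditions (ambient CO2, elevated CO2).
counts = {
    "geneA": [100, 120],  # well expressed, ratio 1.2
    "geneB": [1, 4],      # ratio 4.0, but built on a handful of reads
    "geneC": [25, 25],    # exactly at the cutoff, so kept
}
kept = filter_low_expression(counts)
```

      Here geneB shows the largest fold change of the three, yet it rests on only five reads in total, which is the kind of spurious signal the filter is meant to discard.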

      Line 384: It's not clear how many biological replicates were used.

      This has been corrected.

      Additional comment: We have also become aware of a confusion concerning one of the candidate genes located close to GWA peaks: in line 180 of the first version, we mentioned CAX1 (AT1G16380) for its role in the nutrient deficiency response. There are actually two genes annotated as CAX1 in TAIR (both are cation exchangers), but the one involved in the nutrient deficiency response is AT2G38170. We therefore removed the sentence mentioning AT1G16380/CAX1 as a potential candidate gene.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      This paper performed a functional analysis of the poorly characterized pseudo-phosphatase Styxl2, one of the targets of the Jak/Stat pathway in muscle cells. The authors propose that Styxl2 is essential for de novo sarcomere assembly by regulating autophagic degradation of non-muscle myosin IIs (NM IIs). Although a previous study by Fero et al. (2014) has already reported that Styxl2 is essential for the integrity of sarcomeres, this study provides new mechanistic insights into the phenomenon. In vivo studies in this manuscript are compelling; however, I feel the contribution of autophagy in the degradation of NM IIs is still unclear.

      Major concerns:

      1) The contribution of autophagy in the degradation of Myh9 is still unclear to this reviewer.

      It has been reported that autophagy is dispensable for sarcomere assembly in mice (Cell Metab, 2009, PMID; 1994508). In Fig. 7A, the authors showed that overexpressed Styxl2 downregulated the amount of ectopically expressed Myh9 in an ATG5-dependent manner in C2C12 cells; however, the experiment is far from a physiological condition. Therefore, the authors should test ATG5 knockdown and the genetic interaction between Styxl2 and ATG5 in vivo. That is, 1) loss of ATG5 on sarcomere assembly in zebrafish, and 2) the genetic interaction between Styxl2 and ATG5; co-injection of Styxl2 mRNA and ATG5-MO into the zebrafish embryos.

      Our response: In fact, the reference cited by the reviewer (Cell Metab, 2009; PMID: 19945408) clearly indicated that autophagy is required for sarcomere assembly. Moreover, another paper using the fish extraocular muscle regeneration model (Autophagy, 2014, PMID: 27467399) also showed that the sarcomere structure was disrupted in the regenerated muscles when autophagy was inhibited by chloroquine. In addition, other references (Nature medicine, 2007, PMID: 17450150; Autophagy, 2010, PMID: 20431347) also showed that loss of Atg5 in mouse cardiac muscles led to disorganized sarcomere structure. We also performed the Atg5 knockdown experiments as suggested by the reviewer. However, the sarcomere structure defects were not as obvious as those caused by Styxl2 knockdown (see Author response image 1 below). In fact, it was reported that Atg5 knockdown may not be a desirable strategy to disrupt autophagy as it was found “--- only a small amount of Atg5 is needed for autophagy, knockdown of Atg5 to levels low enough to block autophagy might be difficult to achieve, --” (Nature medicine, 2007, PMID: 17450150). Due to the ineffectiveness of the Atg5 MO in our assays, we did not perform the second experiment suggested by the reviewer. Moreover, as Styxl2 is not a key component of the autophagy machinery, it is less likely that overexpression of Styxl2 alone can rescue the autophagy defects caused by Atg5 knockdown.

      Author response image 1.

      The fish zygotes were injected with Atg5 or Ctrl MO. At 48 hpf, the fish were stained with an anti-Actinin antibody. Some fast muscle fibers were disrupted when Atg5 was knocked down. At the bottom of each image, the numerator is the number of fish embryos showing a normal Actinin staining pattern, and the denominator is the total number of embryos examined. Scale bar, 10 µm.

      2) As referenced, Yamamoto et al. reported that Myh9 is degraded by autophagy. Mechanistically, Nek9 acts as an autophagic adaptor that bridges Atg8 and Myh9 through interactions with both. Inconsistent with the model, the authors mentioned on page 12, lines 365-367, "A recent report showed that Myh9 could also undergo Nek9-mediated selective autophagy (Yamamoto et al., 2021), suggesting that Myh9 is ubiquitinated". I think it is not yet explored whether autophagic degradation of Myh9 requires its ubiquitination. Moreover, I cannot judge whether Myh9 is ubiquitinated in a Styxl2-dependent manner from the data in Fig. 7C. The author should test whether Nek9 is required for Myh9 degradation in muscles. If Nek plays a role in the Myh9 degradation, it would be better to remove Fig. 7C.

      Our response: Indeed, as pointed out by the reviewer, it has not been explored whether Myh9 is ubiquitinated or not. However, it has been well established that some proteins undergoing autophagic degradation are ubiquitinated and are linked to Atg8/LC3 via p62 and NBR1 (Mol Cell, 2009, PMID: 19250911; J Biol Chem, 2007, PMID: 17580304). To improve the data quality, we repeated the Myh9 ubiquitination experiment in cells with or without Styxl2 by using a slightly different strategy: as shown in the revised Figure 7C, we first co-transfected HEK 293T cells with HA-Myh9, Myc-ubiquitin, and Flag-Styxl2. We then immunoprecipitated Myc-tagged Ubiquitin from the whole cell lysates and blotted for HA-Myh9. We detected an obvious increase in Ubiquitin-conjugated HA-Myh9 (revised Figure 7C). As suggested by the reviewer, we also tested whether knockdown of Nek9 affects the degradation of Myh9. We failed to detect an obvious effect (see Author response image 2 below) caused by Nek9 knockdown. One possible explanation for this negative result is that Nek9 itself is a negative regulator of selective autophagy (J Biol Chem, 2020, PMID: 31857374). By knocking it down, the functions of the autophagy machinery are expected to be enhanced instead of being impaired. This may explain why we failed to detect an effect on Myh9 degradation simply by knocking down Nek9. To further elucidate whether Nek9 is involved in Myh9 degradation in myoblasts, we may need to use a dominant-negative mutant of Nek9 missing the LCIII-binding motif as shown by Yamamoto (Nat Commun, 2021, PMID: 34078910). This will be addressed in our future study.

      Author response image 2.

      C2C12 cells were transfected with negative control siRNA (NC), siNek9#2 or siNek9#3. 18 h later, the cells were transfected with plasmids HA-Myh9 and Flag-Styxl2 or Flag-Stk24. After another 24 h, the cells were harvested for RT-qPCR (left panel) or western blot (right panel).

      3) In Fig. 5F, the protein level of Styxl2 and Myh10 should be checked because the efficiency of Myh10-MO was not shown anywhere in this manuscript.

      Our response: As suggested by the reviewer, a Western blot showing the protein levels of Myh10 is now shown in Figure 5-figure supplement 1B.

      Reviewer #2 (Public Review):

      The authors investigated the role of the Jak1-Stat1 signaling pathway in myogenic differentiation by screening the transcriptional targets of Jak1-Stat1 and identified Styxl2, a pseudophosphatase, as one of them. Styxl2 expression was induced in differentiating muscles. The authors used a zebrafish knockdown model and conditional knockout mouse models to show that Styxl2 is required for de novo sarcomere assembly but is dispensable for the maintenance of existing sarcomeres. Styxl2 interacts with the non-muscle myosin IIs, Myh9 and Myh10, and promotes the replacement of these non-muscle myosin IIs by muscle myosin IIs through inducing autophagic degradation of Myh9 and Myh10. This function is independent of its phosphatase domain.

      A previous study using zebrafish found that Styxl2 (previously known as DUSP27) is expressed during embryonic muscle development and is crucial for sarcomere assembly, but its mechanism remains unknown. This paper provides important information on how Styxl2 mediates the replacement of non-muscle myosin with muscle myosin during differentiation. This study may also explain why autophagy deficiency in muscles and the heart causes sarcomere assembly defects in previous mouse models.

      Reviewer #3 (Public Review):

      Wu and colleagues are characterising the function of Styxl2 during muscle development, a pseudo-phosphatase that was already described to have some function in sarcomere morphogenesis or maintenance (Fero et al. 2014). The authors verify a role for Styxl2 in sarcomere assembly/maintenance using zebrafish embryonic muscles by morpholino knockdown and by a conditional Styxl2 allele in mice (knocked-out in satellite cells with Pax7 Cre).

      Experiments using a tamoxifen inducible Cre suggest that Styxl2 is dispensable for sarcomere maintenance and only needed for sarcomere assembly.

      BioID experiments with Styxl2 in C2C12 myoblasts suggest binding of nonmuscle myosins (NMs) to Styxl2. Interestingly, both NMs are downregulated when muscles differentiate after birth or during regeneration in mice. This down-regulation is reduced in the Styxl2 mutant mice, suggesting that Styxl2 is required for the degradation of these NMs.

      Impressively, reducing one NM (zMyh10) by double morpholino injection in a Styxl2 morphant zebrafish does improve zebrafish mobility and sarcomere structure. Degradation of Myh9 is also stimulated in cell culture if Styxl2 is co-expressed. Surprisingly, the phosphatase domain is not needed for these degradation and sarcomere structure rescue effects. Inhibitor experiments suggest that Styxl2 does promote the degradation of NMs by promoting the selective autophagy pathway.

      Strengths:

      A major strength of the paper is the combination of various systems, mouse and fish muscles in vivo to test Styxl2 function, and cell culture including a C2C12 muscle cell line to assay protein binding or protein degradation as well as inhibitor studies that can suggest biochemical pathways.

      Weakness:

      The weakness of this manuscript is that the sarcomere phenotypes and also the western blots are not quantified. Hence, we rely on judging the results from a single image or blot. Also, Styxl2 role in sarcomere biology was not entirely novel.

      Few high resolution sarcomere images are shown, myosins have not been stained for.

      Reviewer #1 (Recommendations For The Authors):

      Minor concerns:

      4) The position of molecular weight markers should be shown in all Western blot data.

      Our response: As suggested by the reviewer, the molecular weight markers have been added in the Western blot data.

      5) Schematic models of Styxl2deltaN509 and N513 construct would be helpful for the readers.

      Our response: A schematic has been added in Figure 6B (upper panel) to show Styxl2deltaN509 and Styxl2N513.

      6) Several data were described but not shown (data not shown). I think the data need to be included in the main or supplemental figures.

      Our response: As suggested by the reviewer, the raw data are now included in Figure 6-figure supplement 1A and Figure 7-figure supplement 1.

      Reviewer #2 (Recommendations For The Authors):

      1) In Fig. 5E, the authors suggest that the needle touch response was improved by additional knockdown of Myh10. This is a bit confusing because the germline knockout of Myh10 is lethal (line 445). The authors should provide more explanation on this point. Additionally, it would be better to include Myh10-MO in Fig. 5E.

      Our response: In line 445 of our original manuscript, we stated that germline knockout of the mouse Myh10 gene is lethal based on a published report (Proc Natl Acad Sci USA, 1997, PMID: 9356462). Here, in zebrafish zygotes, we only knocked down zMyh10; thus, we do not expect to get a lethal phenotype. In addition, other groups who knocked down Myh10 in fish also did not get a lethal phenotype (Dev Biol, 2015, PMID: 25446029). As to the control involving Myh10-MO in the experiment in Fig. 5E, we did include it in our experiments. As we did not observe any obvious effects on either motility or sarcomere structures, we did not include the data set in the figure.

      2) It was suggested that Myh9 and Myh10 form a complex (Rao et al. PLoS One 9, e114087, 2014). Thus, the IP experiments do not rule out the possibility that Styxl2 directly interacts with either Myh9 or Myh10 and indirectly with the other.

      Our response: In known myosin-II complexes, different myosin molecules can associate with each other through their tail domains (Bioarchitecture, 2013, PMID: 24002531). Thus, if we use full-length myosin molecules in our co-immunoprecipitation assays, it will be difficult to exclude the possibility raised by the reviewer. However, by using truncated myosin proteins, we showed that the head domain of either Myh9 or Myh10 could interact with Styxl2 in the absence of the tail domain (Figure 4E, F). This result strongly suggests that both Myh9 and Myh10 can independently interact with Styxl2.

      Reviewer #3 (Recommendations For The Authors):

      1) The western blot shown in Figure 3B supporting the induced deletion of Styxl2 should be quantified. Ideally, some other blots, e.g., in Figure 5, too. Please add the age of the mice in Figure 5B to the figure legend.

      Our response: As suggested by the reviewer, we quantified the data in Figures 3B, 3F, 5B, 5D, and 7A, and the data were included in the revised figures. In Fig. 5B, we already indicated the age of the mice (i.e., P1) in the legend.

      2) A quantification of the sarcomere phenotypes in the double knock-down of zMyh10 and Styxl2 compared to Styxl2 single would make the paper significantly stronger. Furthermore, a double morpholino control should be included to rule out any RNAi machinery 'dilution effect'.

      Our response: As suggested by the reviewer, we quantified the sarcomere structures using the line scan analysis in ImageJ and the scan images were placed as inserts in the upper corner of the immunofluorescent images (revised Figures 5F, and 6C). To avoid potential “dilution effects”, in all the experiments involving the use of two different MOs, the total amount of MO was kept the same in all control samples by including a control MO (e.g., in samples treated with one specific MO, an equal amount of a control MO was also included, while in samples without any specific MO, twice as much control MO was used).

      3) The sarcomere phenotypes in figure 6 should also be better quantified, for example using simple line scans of the alpha-actinin stains and assay periodicity or calculating the autocorrelation coefficients. How about myosin stains?

      Our response: We quantified Figure 6C as suggested by the reviewer. We also performed myosin staining. The results were similar to those obtained with the α-actinin antibody (see revised Figure 6-figure supplement 1B).
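
      For readers unfamiliar with the autocorrelation approach the reviewer mentions, the following is a minimal sketch of how periodicity could be estimated from a line-scan intensity profile. It is illustrative only and is not the ImageJ line-scan analysis used for the revised figures; the function names, the synthetic profile, and the pixel size are all assumptions made for the example.

```python
import numpy as np

def autocorrelation(profile):
    """Normalized autocorrelation of a 1-D intensity profile for lags >= 0."""
    x = np.asarray(profile, dtype=float)
    x = x - x.mean()                      # remove the constant background
    full = np.correlate(x, x, mode="full")
    acf = full[full.size // 2:]           # keep non-negative lags only
    return acf / acf[0]                   # acf[0] == 1 after normalization

def dominant_period(profile, pixel_size_um):
    """Return (period in µm, autocorrelation value) at the first positive
    local maximum after lag 0, or (None, 0.0) if no such peak exists."""
    acf = autocorrelation(profile)
    for lag in range(1, acf.size - 1):
        if acf[lag] > acf[lag - 1] and acf[lag] >= acf[lag + 1] and acf[lag] > 0:
            return lag * pixel_size_um, float(acf[lag])
    return None, 0.0

# Synthetic line scan (hypothetical values): a 2.0 µm repeat sampled at 0.1 µm/pixel.
pixel_size_um = 0.1
positions = np.arange(200) * pixel_size_um
profile = 1.0 + np.cos(2 * np.pi * positions / 2.0)

period, strength = dominant_period(profile, pixel_size_um)
print(period, strength)
```

      A well-ordered striated pattern should give a clear peak at the sarcomere length with a high autocorrelation value, whereas a disrupted pattern should give a weak peak or none, so the peak height can serve as a single number for comparing genotypes.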

      4) Do the authors see periodic NM patterns in developing mouse muscle fibers as indicated by the model in Figure 7D? It is unclear if nonmuscle myosin is present in a PERIODIC pattern in early myofibrils. NM myosin periodic patterns that have been observed have a periodicity of only about 1 µm fitting the shorter length of the NM bipolar filaments (about 300 nm only, PMID 28114270).

      Our response: The reviewer raised a good point here. Ideally, we should examine developing mouse muscle fibers to prove that NM shows periodic patterns. However, due to the difficulty in catching myocytes undergoing sarcomere assembly, the majority of the studies involving NM in sarcomeres use cultured cardiomyocytes. Using TA muscles from P1 newborn mice, we failed to detect the presence of NM in sarcomeres (see Author response image 3 below). Actually, nearly all the myofibers showed a mature sarcomere pattern without the NM signal. More work is needed in the future to examine developing mouse fibers at different embryonic stages to look for the presence of NM in developing sarcomeres.

      Author response image 3.

      The TA muscles were collected from male and female P1 mice. The muscles were sectioned and co-stained for α-actinin (Actn) and Myh9. The majority of myofibrils are mature without the NM II signal. Scale bar, 10 µm.

      5) Recent work suggested that mechanical tension is key to assemble the first long periodic myofibril containing immature sarcomeres. Tension is likely produced by a combination of NM and Mhc in the assembling sarcomeres themselves. This could be included in the introduction or discussion (PMIDs 24631244, 29316444, 29702642, 35920628).

      Our response: We thank the reviewer for pointing us to additional relevant references. We have added them to the Introduction.

      6) I suggest replacing "sarcomeric muscles" with "striated muscles".

      Our response: We revised the term in the manuscript as suggested by the reviewer.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      We appreciate the valuable and constructive comments of Reviewer #1 on our manuscript. We have addressed the comments from Reviewer #1 in the public review in the response to the recommendations for the authors, as the public review comments largely overlap with those of the recommendations for the authors.

      Reviewer #1 (Recommendations For The Authors):

      (1.1) Figure 1 did not use a mock-infected control for the development of R-loops but only a time before infection. I think it would have been a good control to have that after the same time of infection non-infected cells did not show increases in R-loops and this is not a product of the cell cycle.

      We prepared our DRIPc-seq library using cell extracts harvested at 0, 3, 6, and 12 h post-infection (hpi), all at the same post-seeding time point. Each sample was infected with HIV-1 virus in a time-dependent manner. Therefore, it is unlikely that the host cellular R-loop induction observed in our DRIPc-seq results was due to R-loop formation during the cell cycle. In Lines 93–95 of the Results section of the revised manuscript, we have provided a more detailed description of our DRIPc-seq library experimental scheme. Thank you. 

      (1.2) Figure 2 should have included a figure showing the proportion of DRIPc-seq peaks located in different genome features relative to one another instead of whether they were influenced by time post-infection. Figure 2C was performed in HeLa cells, but primary T cell data would have been more relevant as primary CD4+ T cells are more relevant to HIV infection.

      We have included a new figure presenting the relative proportion of DRIPc-seq peaks mapped to different genomic features at each hpi (Fig. 2C of the revised manuscript). We found that the proportion of DRIPc-seq peaks mapped to various genomic compartments remained consistent over the hours following the HIV-1 infection. This further supports our original claim that HIV-1 infection does not induce R-loop enrichment at specific genomic features but that the accumulation of R-loops after HIV-1 infection is widely distributed.

      We considered HeLa cells as the primary in vitro infection model, therefore, we conducted RNA-seq only on HeLa cells. However, we agree with the reviewer's opinion that data from primary CD4+ T cells may be more physiologically relevant. Nevertheless, as demonstrated in the new figure (Fig. 2C of the revised manuscript), HIV-1 infection did not significantly alter the proportion of R-loop peaks mapped to specific genomic compartments, such as gene body regions, in HeLa, primary CD4+ T, and Jurkat cells. Therefore, we anticipate no clear correlation between changes in gene expression levels and R-loop peak detection upon HIV-1 infection, even in primary T cells. Thank you.   

      (1.3) Figure 5G is very hard to see when printed, is there a change in brightness or contrast that could be used? The arrows are helpful but they don't seem to be pointing to much.

      We have highlighted the intensity of the PLA foci and magnified the images in Fig. 5G in the revised manuscript. While editing the images according to your suggestion, we found a misannotation regarding the multiplicity of infection in the number of PLA foci per nucleus quantification analysis graph in Fig. 5G of the original manuscript. We have corrected this issue and hope that it is now much clearer. 

      (1.4) The introduction provided a good background for those who may not have a comprehensive understanding of DNA-RNA hybrids and R-loops, but the rationale that integration in non-expressed sequence implies that R-loops may be involved is very weak and was not addressed experimentally. A better rationale would have been to point out that, although integration in genes is strongly associated with gene expression, the association is not perfect, particularly in that some highly expressed genes are, nonetheless, poor integration targets.

      In accordance with the reviewer's comment, we revised the Introduction. We have deleted the statement and reference in the introduction "... the most favored region of HIV-1 integration is an intergenic locus, ...”, which may overstate the relevance of the R-loop in HIV-1 integration events in non-expressed sequences. Instead, we introduced a more recent finding that high levels of gene expression do not always predict high levels of integration, together with the corresponding citation (Lines 46– 47 of the revised manuscript), according to the reviewer’s suggestion in the reviewer's public review 2)-(a).

      (1.5) The discussion was seriously lacking in connecting their conclusions regarding R-loop targeting of integration to how integration works at the structural level, where it is very clear that concerted integration on the two DNA strands ca 5 bp apart is essential to correct, 2-ended integration. It is very difficult to visualize how this would be possible with the triple-stranded R-loop as a target. The manuscript would be greatly strengthened by an experiment showing concerted integration into a triple-stranded structure in vitro using PICs or pure integrase.

      We believe there has been a misunderstanding of our interpretation regarding the putative role of R-loop structures in the HIV-1 integration site mechanism because of some misleading statements in our original manuscript. Based primarily on our current data, we believe that R-loop structures are bound by HIV-1 integrase proteins and lead to HIV-1 viral genome integration into the vicinity regions of the host genomic R-loops. By carefully revising our manuscript, we found that the title, abstract, and discussion of our original manuscript includes phrases, such as “HIV-1 targets R-loops for integration,” which may overstate our finding on the role of R-loop in HIV-1 integration site selection. We replaced these phrases. For example, we used phrases, such as, “HIV-1 favors vicinity regions of R-loop for the viral genome integration,” in the revised manuscript. We apologize for the inconvenience caused by the unclear and nonspecific details of our findings.  

      Using multiple biochemical experiments, we successfully demonstrated the interaction between the cellular R-loop and HIV-1 integrase proteins in cells and in vitro (Fig. 5 of the revised manuscript). However, we could not validate whether the center of the triple-stranded R-loops is the extraction site of HIV-1 integration, where the strand transfer reaction by integrase occurs. This is because an R-loop can be multi-kilobase in size (1, 2); therefore, we displayed a large-scale genomic region (30-kb windows) to present the integration sites surrounding the R-loop centers. Nevertheless, we believe that we validated R-loop-mediated HIV-1 integration in R-loop-forming regions using our pgR-poor and pgR-rich cell line models. When infected with HIV-1, pgR-rich cells, but not pgR-poor cells, showed higher infectivity upon R-loop induction in designated regions following DOX treatment (Fig. 3C and 3D of the revised manuscript). In addition, we quantified site-specific integration events in R-loop regions, and found that a greater number of integration events occurred in designated regions of the pgR-rich cellular genome upon R-loop induction by DOX treatment, but not in pgR-poor cells (Fig. 3E–G of the revised manuscript). 

      We agree with the reviewer that an experiment showing the concerted integration of purified PICs into a triple-stranded structure in vitro would greatly strengthen our manuscript. We attempted the purification of viral DNA (vDNA)-bound PICs using either Sso7d-tagged HIV-1 integrase proteins or non-tagged HIV-1 integrase proteins (F185K/C280S) procured from the NIH HIV reagent program (HRP-20203), following the method described by Passos et al., Science, 2017; 355 (89-92) (3). Despite multiple attempts, we could not purify the nucleic acid-bound protein complexes for in vitro integration assays. However, we believe that pgR-poor and pgR-rich cell line models provide a strong advantage in specificity of our primer readouts. Compounded with our in cellulo observation, we believe that our work provides strong evidence for a causative relationship between R-loop formation/R-loop sites and HIV-1 integration.

      Additionally, in the Discussion section of the revised manuscript, we have expanded our discussion of the role of genomic R-loops in molding the host genomic environment for HIV-1 integration site selection, and of a potential explanation for how R-loops drive integration over long-range genomic regions. Thank you.

      (1.6) There are serious concerns with the quantitation of integration sites used here, which should be described in detail following line 503 but isn't. In Figure 3, E-G, they are apparently shown as reads per million, while in Figure 4B as "sites (%)" and in 4C as "log10 integration frequency." Assuming the authors mean what they say, they are using the worst possible method for quantitation. Counting reads from restriction enzyme-digested, PCR-digested DNA can only mislead. At the numbers provided (MOI 0.6, 10 µg DNA assayed) there would be about 1 million proviruses in the samples assayed, so the probability of any specific site being used more than once is very low, and even less when one considers that a 10% assay efficiency is typical of integration site assays. Although the authors may obtain millions of reads per experiment, the number of reads per site is an irrelevant value, determined only by technical artefacts in the PCR reactions, most significantly the length of the amplicons, a function of the distance from the integration site to the nearest MstII site, further modified by differences in Tm. Better is to collapse identical reads to 1 per site, as may have been done in Figure 4B, however, the efficiency of integration site detection will still be inversely related to the length of the amplicon. Indeed, if the authors were to plot the read frequency against distance to the nearest MstII site, it is likely that they would get plots much like those in Figure 4B.

      Detailed methods for integration site sequencing data processing are described in the Materials and Methods section of the revised manuscript (Line 621–631 of the revised manuscript). We primarily followed HIV-1 integration site sequencing data processing methods previously described by Li et al., mBio, 2020; 11(5) (4).  

      While it may be correct that the HIV-1 integration event cannot occur more than once at a given site, our Fig. 3E, 4C, and 4D of the revised manuscript present the number of integration-site sequencing read counts expressed in reads-per-million (RPM) units or as log10-normalized values. Based on the number of mapped reads from the integration site sequencing results, we can infer that there was an integration event at this site, whether it was a single or multiple event.

      We believe that the original annotation of y-axis, “Integration frequency,” may be misleading as it can be interpreted as a probability of any specific site being used for HIV-1 integration. Therefore, we corrected it as “number of mapped read” for clarity (Fig. 3E–G, 4C and 4D, and the corresponding figure legends of the revised manuscript). We apologize for any confusion. Thank you.
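
      To make the two quantities under discussion concrete, the sketch below contrasts reads-per-million (RPM) normalization of mapped read counts with the reviewer's suggestion of collapsing identical reads to one per site. It is an illustration under stated assumptions, not the processing pipeline of the revised manuscript: the site tuples (chromosome, position, strand), the read counts, and the function names are hypothetical.

```python
import math
from collections import Counter

def reads_per_million(read_counts):
    """Scale per-site read counts so that the library total equals one million."""
    total = sum(read_counts.values())
    return {site: count * 1_000_000 / total for site, count in read_counts.items()}

def collapse_to_unique_sites(mapped_reads):
    """Collapse identical mapped positions so that each site is counted once."""
    return set(mapped_reads)

# Hypothetical mapped reads: one (chromosome, position, strand) tuple per read.
mapped_reads = (
    [("chr1", 1000, "+")] * 6
    + [("chr1", 2500, "-")] * 3
    + [("chr2", 400, "+")]
)

counts = Counter(mapped_reads)
rpm = reads_per_million(counts)
log10_rpm = {site: math.log10(value) for site, value in rpm.items()}
unique_sites = collapse_to_unique_sites(mapped_reads)

print(rpm[("chr1", 1000, "+")])   # read depth at one site, in RPM
print(len(unique_sites))          # number of distinct integration sites
```

      RPM reports read depth per site and is therefore sensitive to amplification bias, whereas the collapsed set reports only whether a site was detected at all; the two answer different questions and should be labeled accordingly.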

      Other points:

      (1.7) Overall: There are numerous grammatical and usage errors, especially in agreement of subject and verb, and missing articles, sometimes multiple times in the same sentence. These must be corrected prior to resubmission.

      The revised manuscript was edited by a professional editing service. Thank you.

      (1.8) Line 126-134: A striking result, but it needs more controls, as discussed above, including a dose-response analysis.

      We determined the doses of NVP and RAL inhibitors in HeLa cells by optimizing the minimum dose of drug treatment that provided a sufficient inhibitory effect on HIV-1 infection (Author response image 1). The primary objective of this experiment was to determine R-loop formation while reverse transcription or integration of the HIV-1 life cycle was blocked; therefore, we do not think that a dose-dependent analysis of inhibitors is required.

      Author response image 1.

      (A and B) Representative flow cytometry histograms of VSV-G-pseudotyped HIV-1-EGFP-infected HeLa cells at an MOI of 1, harvested at 48 hpi. The cells were treated with DMSO, the indicated doses of nevirapine (NVP) (A) or indicated doses of raltegravir (RAL) (B) for 24 h before infection. 

      (1.9) Line 183: Please tell us what ECFP is and why it was chosen. Is there a reference for its failure to form R-loops?

      Ibid: The human AIRN gene is a very poor target for HIV integration in PBMC.

      A high GC skew value (> 0) is a predisposing factor for R-loop formation at the transcription site. This is because a high GC skew causes a newly synthesized RNA strand to hybridize to the template DNA strand, and the non-template DNA strand remains looped out in a single-stranded conformation (5) (Ref 36 in the revised manuscript). The ECFP sequence possesses a low GC skew value and has previously been used as a non-R-loop-forming negative control sequence (6) (Ref 17 of the revised manuscript). We have added this description and the corresponding references to Lines 188–192 of the revised manuscript.

      The human AIRN gene (RefSeq DNA sequence: NC_000006.12) sequence possesses a GC skew value of -0.04, in a window centered at base 2186, while the mouse AIRN (mAIRN) sequence is characterized by a GC skew value of 0.213. The ECFP sequence gave a GC skew value of -0.086 in our calculation. We anticipated that the human AIRN gene region does not form a stable R-loop, and in fact, it did not harbor R-loop enrichment upon HIV-1 infection in our DRIPc-seq data analysis of multiple cell types (Author response image 2).

      Author response image 2.

      Genome browser screenshot over the chromosomal regions in 20-kb windows centered on human AIRN showing results from DRIPc-seq in the indicated HIV-1-infected cells (blue, 0 hpi; yellow, 3 hpi; green, 6 hpi; red, 12 hpi).
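
      For reference, GC skew is conventionally computed as (G − C)/(G + C) over a sequence or a sliding window. The sketch below shows that calculation; it is a generic illustration rather than the exact procedure behind the values quoted above, and the window size, step, and example sequences are assumptions.

```python
def gc_skew(sequence):
    """GC skew, (G - C) / (G + C), of the strand supplied; 0.0 if no G or C."""
    seq = sequence.upper()
    g = seq.count("G")
    c = seq.count("C")
    if g + c == 0:
        return 0.0
    return (g - c) / (g + c)

def windowed_gc_skew(sequence, window, step):
    """GC skew in sliding windows, returned as (window center index, skew) pairs."""
    return [
        (start + window // 2, gc_skew(sequence[start:start + window]))
        for start in range(0, len(sequence) - window + 1, step)
    ]

# Toy sequences (hypothetical): G-rich then C-rich.
print(gc_skew("GGGGCC"))
print(windowed_gc_skew("GGGGCCCC", window=4, step=4))
```

      Because the sign of the skew flips on the complementary strand, the strand supplied (typically the non-template strand) needs to be stated alongside any reported value.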

      (1.10) Line 190: You haven't shown dependence. Associated is a better word.

      Thank you for the suggestion. We have changed “R-loop-dependent site-specific HIV-1 integration events...” to “R-loop-associated site-specific HIV-1 integration events...” (Line 198 of the revised manuscript).

      (1.11) Line 239: What happened to P1? What is the relationship of the P and N regions to genes?

      We have added superimpositions of the P1 chromatin region on DRIPc-seq and the HIV-1 integration frequency to Figure 4C of the revised manuscript. We observed a relevant integration event within the P1 R-loop region, but to a lesser extent than in the P2 and P3 R-loop regions, perhaps because the P1 region has relatively less R-loop enrichment than the P2 and P3 regions, as examined by DRIP-qPCR in S3A Fig. of the revised manuscript.

      Genome browser screenshots with annotations of accommodating genes in the P and N regions are shown in S2A–E Fig. of the revised manuscript, and RNA-seq analysis of the relative gene expression levels of the P1-3 and N1,2 R-loop regions are shown in S4 Table of the revised manuscript. Thank you.

      (1.12) Line 261: But the binding affinity of integrase to the R-loop is somewhat weaker than to double-stranded DNA according to Figure 5A.

      Nucleic acid substrates were loaded at the same molarity, and the percentage of the unbound fraction was calculated by dividing the intensity of the unbound fraction in each lane by the intensity of the unbound fraction in the lane with 0 nM integrase in the binding reaction. The calculated percentages of the unbound fraction from three independent replicate experiments are shown in Fig. 5A, right of the revised manuscript. In our analysis and measurements, the integrase proteins showed higher binding affinities to the R-loop and R-loop comprising nucleic acid structures than to dsDNA in vitro. We hope that this explanation clarifies this point. 
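
      The normalization described above can be written out as a short calculation. This is a sketch only: the band intensities are invented for illustration and are not data from Fig. 5A, and the function name is hypothetical.

```python
def percent_unbound(unbound_intensity_by_conc):
    """Express the unbound-probe band intensity at each integrase concentration
    as a percentage of the intensity in the 0 nM integrase lane."""
    reference = unbound_intensity_by_conc[0]   # lane with no integrase
    return {
        conc: 100.0 * intensity / reference
        for conc, intensity in sorted(unbound_intensity_by_conc.items())
    }

# Hypothetical band intensities (arbitrary units), keyed by integrase in nM.
lanes = {0: 2000.0, 50: 1500.0, 100: 800.0, 200: 200.0}
result = percent_unbound(lanes)
print(result)
```

      A substrate whose unbound percentage falls faster with increasing integrase concentration is the one bound with higher apparent affinity, which is the comparison drawn between R-loop and dsDNA substrates.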

      (1.13) Line 337: "accumulate". This is a not uncommon misinterpretation of the results of studies on the distribution of intact proviruses in elite controllers. The only possible correct interpretation of the finding is that proviruses form everywhere else but cells containing them are eliminated, most likely by the immune system.

      Thank you for the suggestion. We have changed Line 337 of the original manuscript to “... HIV-1 proviruses in heterochromatic regions are not eliminated but selected by immune system,” in Lines 361-363 of the revised manuscript.

      (1.14) Line 371 How many virus particles per cell does this inoculum amount to?

      We determined the amount of GFP reporter viruses required to transduce ∼50% of WT Jurkat T cells, corresponding to an approximate MOI of 0.6. We repeatedly obtained 30–50% of VSV-G-pseudotyped HIV-1-EGFP positively infected cells for HIV-1 integration site sequencing library construction for Jurkat T cells.
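
      The relationship between the fraction of reporter-positive cells and MOI is commonly approximated with a Poisson model, in which the fraction of cells receiving at least one infectious unit is 1 − exp(−MOI). The sketch below applies that textbook approximation; it assumes Poisson-distributed infection and counts infectious units rather than physical virus particles, and it is not a calculation taken from the manuscript.

```python
import math

def infected_fraction(moi):
    """Expected fraction of cells with at least one infection event (Poisson)."""
    return 1.0 - math.exp(-moi)

def moi_from_infected_fraction(fraction):
    """Invert the Poisson relationship: MOI = -ln(1 - fraction infected)."""
    return -math.log(1.0 - fraction)

print(round(infected_fraction(0.6), 3))            # MOI 0.6
print(round(moi_from_infected_fraction(0.3), 3))   # 30% positive cells
print(round(moi_from_infected_fraction(0.5), 3))   # 50% positive cells
```

      Under this approximation, 30–50% positive cells corresponds to an MOI of roughly 0.36–0.69, which is consistent with the approximate value of 0.6 stated above.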

      (1.15) Line 503 and Figures 3 and 4: There must be a clear description of how integration events are quantitated.

      Detailed methods for integration site sequencing data processing are described in the Materials and Methods section of the revised manuscript (Line 621–631 of the revised manuscript). We primarily followed HIV-1 integration site sequencing data processing methods previously described in Li et al., mBio, 2020; 11(5) (4).

      Reviewer #2 (Public Review):

      Retroviral integration in general, and HIV integration in particular, takes place in dsDNA, not in R-loops. Although HIV integration can occur in vitro on naked dsDNA, there is good evidence that, in an infected cell, integration occurs on DNA that is associated with nucleosomes. This review will be presented in two parts. First, a summary will be provided giving some of the reasons to be confident that integration occurs on dsDNA on nucleosomes. The second part will point out some of the obvious problems with the experimental data that are presented in the manuscript.

      We appreciate your comments. We have carefully addressed the concerns expressed as follows (your comments are in italics):  

      (2.1) 2017 Dos Passos Science paper describes the structure of the HIV intasome. The structure makes it clear that the target for integration is dsDNA, not an R-loop, and there are very good reasons to think that structure is physiologically relevant. For example, there is data from the Cherepanov, Engelman, and Lyumkis labs to show that the HIV intasome is quite similar in its overall structure and organization to the structures of the intasomes of other retroviruses. Importantly, these structures explain the way integration creates a small duplication of the host sequences at the integration site. How do the authors propose that an R-loop can replace the dsDNA that was seen in these intasome structures?

      We do appreciate the current understanding of the HIV-1 integration site selection mechanism and the known structure of the dsDNA-bound intasome. Our study proposes an R-loop as another contributor to HIV-1 integration site selection. Recent studies providing new perspectives on HIV-1 integration site targeting motivated our current work. For instance, Ajoge et al., 2022 (7) indicated that a guanine-quadruplex (G4) structure formed in the non-template DNA strand of the R-loop influences HIV-1 integration site targeting. Additionally, I. K. Jozwik et al., 2022 (8) showed retroviral integrase protein structure bound to B-to-A transition in target DNA. R-loop structures are a prevalent class of alternative non-B DNA structures (9). We acknowledge the current understanding of HIV-1 integration site selection and explore how R-loop interactions may contribute to this knowledge in the Discussion section of our manuscript. 

      Primarily based on our current data, we believe that R-loop structures are bound by HIV-1 integrase proteins and lead to HIV-1 viral genome integration into the vicinity regions of the host genomic R-loops, but we do not claim that R-loops completely replace dsDNA as the target for HIV-1 integration. An R-loop can be multi-kilobase in size and the R-loop peak length widely varies depending on the immunoprecipitation and library construction methods (1, 2), therefore, we could not validate whether the center of triple-stranded R-loops is the extraction site of HIV-1 integration where the strand transfer reaction by integrase occurs. Therefore, we replaced phrases such as, “HIV-1 targets R-loops for integration,” which may overstate our finding on the role of R-loop in HIV-1 integration site selection, with phrases, such as, “HIV-1 favors vicinity regions of R-loop for the viral genome integration,” in the revised manuscript. We apologize for the inconvenience caused by the unclear and non-specific details of our findings. Nevertheless, we believe that we validated R-loop-mediated HIV-1 integration in R-loop-forming regions using our pgR-poor and pgR-rich cell line models. We quantified site-specific integration events in the R-loop regions, and found that a greater number of integration events occurred in designated regions of the pgR-rich cellular genome upon R-loop induction by DOX treatment, but not in pgR-poor cells (Fig. 3E–G of the revised manuscript). 

      dsDNA may have been the sole target of the intasome demonstrated in vitro, possibly because only dsDNA has been considered as a substrate for in vitro intasome assembly. We hope that our work will initiate and advance future investigations on target-bound intasome structures by considering R-loops as potential new targets for integrase proteins and intasomes.

      (2.2) As noted above, concerted (two-ended) integration can occur in vitro on a naked dsDNA substrate. However, there is compelling evidence that, in cells, integration preferentially occurs on nucleosomes. Nucleosomes are not found in R loops. In an infected cell, the viral RNA genome of HIV is converted into DNA within the capsid/core which transits the nuclear pore before reverse transcription has been completed. Integration requires the uncoating of the capsid/core, which is linked to the completion of viral DNA synthesis in the nucleus. Two host factors are known to strongly influence integration site selection, CPSF6 and LEDGF. CPSF6 is involved in helping the capsid/core transit the nuclear pore and associate with nuclear speckles. LEDGF is involved in helping the preintegration complex (PIC) find an integration site after it has been released from the capsid/core, most commonly in the bodies of highly expressed genes. In the absence of an interaction of CPSF6 with the core, integration occurs primarily in the lamin-associated domains (LADs). Genes in LADs are usually not expressed or are expressed at low levels. Depending on the cell type, integration in the absence of CPSF6 can be less efficient than normal integration, but that could well be due to a lack of LEDGF (which is associated with expressed genes) in the LADs. In the absence of an interaction of IN with LEDGF (and in cells with low levels of HRP2) integration is less efficient and the obvious preference for integration in highly expressed genes is reduced. Importantly, LEDGF is known to bind histone marks, and will therefore be preferentially associated with nucleosomes, not R-loops. LEDGF fusions, in which the chromatin binding portion of the protein is replaced, can be used to redirect where HIV integrates, and that technique has been used to map the locations of proteins on chromatin. 
Importantly, LEDGF fusions in which the chromatin binding component of LEDGF is replaced with a module that recognizes specific histone marks direct integration to those marks, confirming integration occurs efficiently on nucleosomes in cells. It is worth noting that it is possible to redirect integration to portions of the host genome that are poorly expressed, which, when taken with the data on integration into LADs (integration in the absence of a CPSF6 interaction) shows that there are circumstances in which there is reasonably efficient integration of HIV DNA in portions of the genome in which there are few if any R-loops.

      Although R-loops may not wrap around nucleosomes, long and stable R-loops likely cover stretches of DNA corresponding to multiple nucleosomes (10). For example, R-loops are associated with high levels of histone marks, such as H3K36me3, which LEDGF recognizes (2, 11). R-loops dynamically regulate the chromatin architecture. Possibly by altering nucleosome occupancy, positioning, or turnover, R-loop structures relieve superhelical stress and are often associated with open chromatin marks and active enhancers (2, 10). These features are also distributed over HIV-1 integration sites (12). In the Discussion section of the revised manuscript, we explored the R-loop molding mechanisms in the host genomic environment for HIV-1 integration site selection and its potential collaborative role with LEDGF/p75 and CPSF6 governing HIV-1 integration site selection. 

      In carefully revising our original manuscript in light of the reviewer's comment, we recognized the need to tone down our statements. We found that the title, abstract, and discussion of our original manuscript included phrases such as “HIV-1 targets R-loops for integration,” which may overstate our findings on the role of R-loops in HIV-1 integration site selection. We replaced these phrases; for example, the revised manuscript uses phrases such as “HIV-1 favors vicinity regions of R-loop for the viral genome integration.” We apologize for the inconvenience caused by the unclear and non-specific details of our findings.

      (2.3) Given that HIV DNA is known to preferentially integrate into expressed genes and that R-loops must necessarily involve expressed RNA, it is not surprising that there is a correlation between HIV integration and regions of the genome to which R loops have been mapped. However, it is important to remember that correlation does not necessarily imply causation.

      We understand the reviewer's concern regarding the possibility of a coincidental correlation between the R-loop regions and HIV-1 integration sites, particularly when the interpretation of this correlation is primarily based on a global analysis. 

      Therefore, we designed pgR-poor and pgR-rich cell lines, which we believe are suitable models for distinguishing between integration events driven by transcription and those driven by the presence of R-loops. Although the two cell lines showed comparable levels of transcription at the designated region upon DOX treatment via TRE promoter activation (Fig. 3B of the revised manuscript), only pgR-rich cells formed R-loops at the designated regions (Fig. 3C of the revised manuscript). When infected with HIV-1, pgR-rich cells, but not pgR-poor cells, showed higher infectivity after DOX treatment (Fig. 3D of the revised manuscript). Moreover, we quantified site-specific integration events in the R-loop regions and found that a greater number of integration events occurred in the designated regions of the pgR-rich cellular genome upon R-loop induction by DOX treatment, but not in pgR-poor cells (Fig. 3E of the revised manuscript). Therefore, we concluded that transcriptional activation without an R-loop (in pgR-poor cells) may not be sufficient to drive HIV-1 integration. We believe that our work provides strong evidence for a causative relationship between R-loop formation/R-loop sites and HIV-1 integration. We hope that our explanation addresses your concerns. Thank you.

      If we consider some of the problems in the experiments that are described in the manuscript:

      (2.4) In an infected individual, cells are almost always infected by a single virion and the infecting virion is not accompanied by large numbers of damaged or defective virions. This is a key consideration: the claim that infection by HIV affects R-loop formation in cells was done with a VSVg vector in experiments in which there appears to have been about 6000 virions per cell. Although most of the virions prepared in vitro are defective in some way, that does not mean that a large fraction of the defective virions cannot fuse with cells. In normal in vivo infections, HIV has evolved in ways that avoid signaling infected the cell of its presence. To cite an example, carrying out reverse transcription in the capsid/core prevents the host cell from detecting (free) viral DNA in the cytoplasm. The fact that the large effect on R-loop formation which the authors report still occurs in infections done in the absence of reverse transcription strengthens the probability that the effects are due to the massive amounts of virions present, and perhaps to the presence of VSVg, which is quite toxic. To have physiological relevance, the infections would need to be carried out with virions that contain HIV even under circumstances in which there is at most one virion per cell.

      Our virus production and our in vitro and ex vivo HIV-1 infection conditions, designed for infecting cell types such as HeLa cells and primary CD4+ T cells with VSV-G-pseudotyped HIV-1, were based on a comprehensive review of numerous references. At the very beginning of this study, we tested whether host genomic R-loop induction is specific to HIV-1 by using empty virion particles (virus-like particles, VLP) or other types of viruses (non-retrovirus, SeV; retroviruses, FMLV and FIV), all produced with a VSV G protein donor. To prevent viral spread in culture, we could not include a control that omitted the VSV G protein or used the natural HIV-1 envelope protein. We observed that, although all types of virus stocks were prepared using VSV-G, only cells infected with HIV-1 viruses showed R-loop signal enrichment (Author response image 3). Therefore, we omitted the control for the VSV G protein in subsequent analyses, such as DRIPc-seq. We have also revised our manuscript to provide a clearer description of the experimental conditions. In particular, we now clearly state throughout the abstract, results, and discussion sections of the revised manuscript that we used VSV-G-pseudotyped HIV-1 in this study. Thank you.

      Author response image 3.

      (A) Dot blot analysis of the R-loop in gDNA extracts from HIV-1 infected U2OS cells with MOI of 0.6 harvested at 6 hpi. The gDNA extracts were incubated with or without RNase H in vitro before membrane loading (anti-S9.6 signal). (B) Dot blot analysis of the R-loop in gDNA extracts from HeLa cells infected with 0.3 MOI of indicated viruses. The infected cells were harvested at 6 hpi. The gDNA extracts were incubated with or without RNase H in vitro before membrane loading (anti-S9.6 signal).

      HIV-1 co-infection may also be expected in cell-free HIV-1 infections. However, a mathematical model that estimates the frequency of multiple infections with the same virus previously suggested that the average number of infection events ranges from 1.02 to 1.65 (Figure 4c of Ito et al., Sci. Rep, 2017; 6559) (13).

      (2.5) Using the Sso7d version of HIV IN in the in vitro binding assays raises some questions, but that is not the real question/problem. The real problem is that the important question is not what/how HIV IN protein binds to, but where/how an intasome binds. An intasome is formed from a combination of IN bound to the ends of viral DNA. In the absence of viral DNA ends, IN does not have the same structure/organization as it has in an intasome. Moreover, HIV IN (even Sso7d, which was modified to improve its behavior) is notoriously sticky and hard to work with. If viral DNA had been included in the experiment, intasomes would need to be prepared and purified for a proper binding experiment. To make matters worse, there are multiple forms of multimeric HIV IN and it is not clear how many HIV INs are present in the PICs that actually carry out integration in an infected cell.

      As the reviewer has noted, HIV IN, even with Sso7d tagging, is difficult to work with. We attempted the purification of viral DNA (vDNA)-bound PICs using either Sso7d-tagged HIV-1 integrase proteins or non-tagged HIV-1 integrase proteins (F185K/C280S), procured from the NIH HIV reagent program (HRP-20203), following the method described by Passos et al., Science, 2017; 355 (89-92) (3). Despite multiple attempts, we were unable to purify the vDNA-bound IN protein complexes for in vitro assays. However, through multiple biochemical experiments, we believe that we have successfully demonstrated the interaction between cellular R-loops and HIV-1 integrase proteins both in cells and in vitro (Fig. 5A–F of the revised manuscript). We also observed a close association between integrase proteins and host cellular R-loops in HIV-1-infected cells, using a fluorescent recombinant virus (HIV-IN-EGFP) with intact IN-EGFP PICs (Fig. 5G of the revised manuscript).

      (2.6) As an extension of comment 2, the proper association of an HIV intasome/PIC with the host genome requires LEDGF and the appropriate nucleic acid targets need to be chromatinized.

      The interaction between cellular R-loops and HIV-1 integrase proteins in HeLa cells endogenously expressing LEDGF/p75 was examined using reciprocal immunoprecipitation assays in Fig. 5C–F and Figs. S6B and S6D of the revised manuscript. In addition, as discussed in more detail in our response to comment [2-8], we observed a close association between host cellular R-loops and HIV-1 integrase proteins by PLA in HIV-1-infected HeLa cells.

      (2.7) Expressing any form of IN, by itself, in cells to look for what IN associates with is not a valid experiment. A major factor that helps to determine both where integration takes place and the sites chosen for integration is the transport of the viral DNA and IN into the nucleus in the capsid core. However, even if we ignore that important part of the problem, the IN that the authors expressed in HeLa cells won't be bound to the viral DNA ends (see comment 2), even if the fusion protein would be able to form an intasome. As such, the IN that is expressed free in cells will not form a proper intasome/PIC and cannot be expected to bind where/how an intasome/PIC would bind.

      As discussed in more detail in our response to comment [2-8], we believe that our PLA experiment using the pVpr-IN-EGFP virus, which has previously been examined for virion integrity, as well as the IN-EGFP PICs (14), demonstrated a close association between host cellular R-loops and HIV-1 integrase proteins in HIV-1-infected cells. 

      (2.8) As in comment 1, for the PLA experiments presented in Figure 5 to work, the number of virions used per cell (which differs from the MOI measured by the number of cells that express a viral marker) must have a high, which is likely to have affected the cells and the results of the experiment. However, there is the additional question of whether the IN-GFP fusion is functional. The fact that the functional intasome is a complex multimer suggests that this could be a problem. There is an additional problem, even if IN-GFP is fully functional. During a normal infection, the capsid core will have delivered copies of IN (and, in the experiments reported here, the IN-GFP fusion) into the nucleus that is not part of the intasome. These "free" copies of IN (here IN-GFP) are not likely to go to the same sites as an intasome, making this experiment problematic (comment 4).

      The HIV-IN-EGFP virus stock was produced by polyethylenimine-mediated transfection of HEK293T cells with 6 µg of pVpr-IN-EGFP, 6 µg of HIV-1 NL4-3 noninfectious molecular clone (pD64E; NIH AIDS Reagent Program 10180), and 1 µg of pVSV-G, as previously described (14) and as described in the Materials and Methods section of our manuscript. The pVpr-IN-EGFP vector used to produce the HIV-1-IN-EGFP virus stock was provided by the Anna Cereseto group (Albanese et al., PLOS ONE, 2008; 6(6); Ref 34 of the revised manuscript). It was previously reported that the HIV-1-IN-EGFP virions produced by IN-EGFP trans-incorporation through Vpr are intact and infective viral particles (Figure 1 of Albanese et al., PLOS ONE, 2008; 6(6)). Therefore, we believe that the HIV-IN-EGFP used in our PLA experiments was functional.

      Additionally, by labeling and visualizing the newly synthesized cDNA produced by reverse transcriptase, Albanese et al. showed that the EGFP signal of HIV-IN-EGFP virions colocalizes with the viral matrix (p17MA) and capsid (p24CA) proteins as well as with that cDNA (14). In addition, the fluorescent recombinant virus (HIV-IN-EGFP) was structurally intact at the nuclear level (Figure 6 of Albanese et al., PLOS ONE, 2008; 6(6)). Therefore, given the integrity of the HIV-IN-EGFP virion and of the IN-EGFP PICs, we believe that our PLA result is unlikely to be misleading in the way the reviewer is concerned about.

      Furthermore, the in vitro HIV-1 infection setting of our PLA experiments was carefully determined based on multiple studies that performed image-based assays on HIV-1-infected cells. For instance, Albanese et al. infected 4 × 10⁴ cells with viral loads equivalent to 1.5 or 3 µg of HIV-1 p24 for the immunofluorescence analysis in their previous report (14). We titrated the fluorescent HIV-1 virus stocks both by determining the multiplicity of infection (MOI) and by quantifying the HIV-1 p24 antigen content (Author response image 4). By our calculation, we infected 5 × 10⁴ HeLa cells with viral loads equivalent to 1.3 µg of HIV-1 p24, which is indicated as 2 MOI in Fig. 5G of our manuscript, for our PLA experiments.
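
      The p24 inputs quoted above can be put on a common per-cell scale with a short calculation. The following is only an illustrative sketch: it assumes the cell counts are read as 4 × 10⁴ and 5 × 10⁴, and the function name and labels are hypothetical and not taken from the manuscript.

```python
# Illustrative arithmetic only: p24 input per cell from the numbers quoted in the text.
# Assumption: cell counts are 4e4 (Albanese et al.) and 5e4 (this study).
PG_PER_UG = 1_000_000  # 1 microgram = 1e6 picograms


def p24_pg_per_cell(p24_ug, n_cells):
    """Return the p24 input in picograms per cell (hypothetical helper)."""
    return p24_ug * PG_PER_UG / n_cells


inputs = {
    "this study (PLA)": (1.3, 5e4),
    "Albanese et al., lower dose": (1.5, 4e4),
    "Albanese et al., higher dose": (3.0, 4e4),
}

per_cell = {label: p24_pg_per_cell(ug, cells) for label, (ug, cells) in inputs.items()}

for label, value in per_cell.items():
    print(f"{label}: {value:.1f} pg p24 per cell")
```

      Under these assumptions the input used here is about 26 pg of p24 per cell, compared with 37.5 and 75 pg per cell for the two reference conditions.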

      Image-based assays often require an increased and enhanced signal for statistical robustness. For example, Achuthan et al. infected cells with VSV-G-pseudotyped HIV-1 at an approximate MOI of 350 for vDNA and PIC visualization (15). Therefore, we believe that the experimental conditions for our PLA experiments, which we carefully designed based on frequently cited previous studies, are reasonable. We sincerely hope that our discussion sufficiently addresses the reviewer’s concern.

      Author response image 4.

      Gating strategy used to determine HIV-1-infectivity in HeLa cells at 48 hpi. Cells were infected with a known p24 antigen content in the stock of the VSV-G-pseudotyped HIV-1-EGFP-virus. The percentages of GFP-positive cell population are indicated.

      (2.9) In the Introduction, the authors state that the site of integration affects the probability that the resulting provirus will be expressed. Although this idea is widely believed in the field, the actual data supporting it are, at best, weak. See, for example, the data from the Bushman lab showing that the distribution of integration sites is the same in cells in which the integrated proviruses are, and are not, expressed. However, given what the authors claim in the introduction, they should be more careful in interpreting enzyme expression levels (luciferase) as a measure of integration efficiency in experiments in which they claim proviruses are integrated in different places.

      We thank the reviewer for the constructive comment. We have changed the statement in Lines 41–42 in the Introduction section of our original manuscript to “The chromosomal landscape of HIV-1 integration influences proviral gene expression, persistence of integrated proviruses, and prognosis of antiretroviral therapy.” (Lines 39-41 of the revised manuscript). We believe that this change tones down the implied link between the site of integration and the provirus expression level.

      The piggyBac transposase randomly inserts the “cargo (transposon)” into TTAA chromosomal sites of the target genome, generating efficient insertions at different genomic loci (16, 17). We believe that this random insertion of the pgR-poor/rich vector by the piggyBac system prevents a locus bias in vector insertion from confounding our interpretation of R-loop-mediated HIV-1 integration. Accordingly, Figure 3 in our manuscript does not claim any relationship between the site of integration and the resulting provirus expression levels. Instead, as noted in Line 214 of the revised manuscript, we used the luciferase reporter HIV-1 virus to examine HIV-1 infection in cells with an “extra number of R-loops” in the host cellular genome. We observed that pgR-rich cells showed higher luciferase activity upon DOX treatment than pgR-poor cells (Fig. 3D of the revised manuscript). We believe that this is because a greater number of HIV-1 integration events may occur in pgR-rich cells, where DOX-inducible de novo R-loop regions are introduced. This was further examined in Fig. 3E–G of the revised manuscript. We hope this explanation clarifies Figure 3. Thank you.

      (2.10) Using restriction enzymes to create an integration site library introduces biases that derive from the uneven distribution of the recognition sites for the restriction enzymes.

      As described in the Materials and Methods section, we constructed the sequencing library using a previously established protocol (18, 19). We recognize the advantages of DNA fragmentation by sonication. However, in in vitro or ex vivo HIV-1 infection settings, where the multiplicity of infection is carefully determined based on multiple references, more copies of integrated viral sequences are expected than in samples from infected patients (18). Therefore, in these settings, restriction enzyme-based DNA fragmentation and ligation-mediated PCR sequencing are well-established methods that provide significant data sources for HIV-1 integration site sequencing (15, 20-22). Furthermore, our data showing the proportion of integration sites over R-loop regions (Fig. 4B of the revised manuscript) are presented alongside the respective random controls (i.e., proportion of integration sites within the 30-kb windows centered on randomized DRIPc-seq peaks, gray dotted lines; control comparisons between randomized integration sites with DRIPc-seq peaks, black dotted lines; and randomized integration sites with randomized DRIPc-seq peaks, gray solid lines), which do not show such a correlation between the HIV-1 integration sites and nearby areas of the R-loop regions. Therefore, we believe that our results from the integration site sequencing data analysis are unlikely to be biased.
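
      The window-based comparison with randomized controls described above can be illustrated with a minimal sketch. This is not the analysis pipeline used in the manuscript: it assumes a single chromosome, represents each DRIPc-seq peak by its center coordinate, and uses toy coordinates; all function and variable names are hypothetical.

```python
# Minimal sketch (toy data, single chromosome): fraction of integration sites that
# fall within a 30-kb window centered on a peak, compared with randomized peaks.
import bisect
import random

WINDOW = 30_000  # full window width in bp, centered on each peak


def fraction_near_peaks(sites, peak_centers, window=WINDOW):
    """Fraction of sites lying within window/2 of at least one peak center."""
    centers = sorted(peak_centers)
    half = window // 2
    hits = 0
    for site in sites:
        # First center at or to the right of the window's left edge.
        i = bisect.bisect_left(centers, site - half)
        if i < len(centers) and centers[i] <= site + half:
            hits += 1
    return hits / len(sites) if sites else 0.0


def randomized_centers(n, chrom_length, rng):
    """Draw n uniformly random positions as a simple randomized-peak control."""
    return [rng.randrange(chrom_length) for _ in range(n)]


# Toy coordinates, purely illustrative.
peak_centers = [100_000, 500_000]
sites = [95_000, 110_000, 300_000, 514_000, 520_000]

observed = fraction_near_peaks(sites, peak_centers)

rng = random.Random(0)  # fixed seed so the control is reproducible
control_centers = randomized_centers(len(peak_centers), 1_000_000, rng)
control = fraction_near_peaks(sites, control_centers)

print(f"observed fraction: {observed:.2f}")
print(f"randomized-peak control fraction: {control:.2f}")
```

      In a real analysis the randomization would be repeated many times and matched to chromosome sizes and mappability; the uniform draw here is a deliberate simplification.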

      Reviewer #3 (Public Review):

      In this manuscript, Park and colleagues describe a series of experiments that investigate the role of R-loops in HIV-1 genome integration. The authors show that during HIV-1 infection, R-loops levels on the host genome accumulate. Using a synthetic R-loop prone gene construct, they show that HIV-1 integration sites target sites with high R-loop levels. They further show that integration sites on the endogenous host genome are correlated with sites prone to R-loops. Using biochemical approaches, as well as in vivo co-IP and proximity ligation experiments, the authors show that HIV-1 integrase physically interacts with R-loop structures.

      My primary concern with the paper is with the interpretations the authors make about their genome-wide analyses. I think that including some additional analyses of the genome-wide data, as well as some textual changes can help make these interpretations more congruent with what the data demonstrate. Here are a few specific comments and questions:

      We are grateful for the time and effort the reviewer spent on our behalf and for the reviewer’s appreciation of the novelty of our work, in particular, R-loop induction by HIV-1 infection and the correlation between host R-loops and the genomic sites of HIV-1 integration. In the following sections, we provide our responses to your comments and suggestions. Your comments are in italics. We have carefully addressed the following issues.

      (3.1) I think Figure 1 makes a good case for the conclusion that R-loops are more easily detected HIV-1 infected cells by multiple approaches (all using the S9.6 antibody). The authors show that their signals are RNase H sensitive, which is a critical control. For the DRIPc-Seq, I think including an analysis of biological replicates would greatly strengthen the manuscript. The authors state in the methods that the DRIPc pulldown experiments were done in biological replicates for each condition. Are the increases in DRIPc peaks similar across biological replicates? Are genomic locations of HIV-1-dependent peaks similar across biological replicates? Measuring and reporting the biological variation between replicate experiments is crucial for making conclusions about increases in R-loop peak frequency. This is partially alleviated by the locus-specific data in Figure S3A. However, a better understanding of how the genome-wide data varies across biological replicates will greatly enhance the quality of Figure 1.

      DRIPc-seq experiments were conducted with two biological replicates. To define consensus DRIPc-seq peaks using these two replicates, we used two methods applicable to ChIP-seq analysis: the irreproducible discovery rate (IDR) method and sequencing data pooling. We found that the sequencing data pooling method yielded significantly more DRIPc-seq peaks than consensus peak identification through IDR, and we decided to utilize R-loop peaks from pooled sequencing data for our downstream analyses, as described in the figure legends and Materials and Methods of the revised manuscript. 

      As noted by the reviewer, it is important to verify whether the increasing trend in the number of R-loop peaks and genomic locations of HIV-1 dependent R-loops were consistently observed across the two biological replicates. Therefore, we independently performed R-loop calling on each replicate of the sequencing data of primary CD4+ T cells from two individual donors to verify that the increase in R-loop numbers was consistent (Author response image 5). Additionally, the overlap of the R-loop peaks between the two replicates was statistically significant across the genome (Author response table 1). Thank you.
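
      As an illustration of what a replicate-overlap check involves, the sketch below counts how many peaks from one replicate overlap at least one peak from the other. It is a simplified example on toy half-open intervals from a single chromosome, not the procedure used for Author response table 1, and it does not reproduce the significance test; the helper names are hypothetical.

```python
# Minimal sketch (toy data): overlap of peak calls between two replicates.
# Peaks are half-open intervals (start, end) on one chromosome.


def intervals_overlap(a, b):
    """True if half-open intervals a and b share at least one base."""
    return a[0] < b[1] and b[0] < a[1]


def count_overlapping(peaks_a, peaks_b):
    """Number of peaks in peaks_a that overlap any peak in peaks_b."""
    return sum(any(intervals_overlap(a, b) for b in peaks_b) for a in peaks_a)


# Toy peak calls, purely illustrative.
replicate_1 = [(100, 200), (300, 400), (900, 1000)]
replicate_2 = [(150, 250), (390, 500), (600, 700)]

shared = count_overlapping(replicate_1, replicate_2)
fraction_shared = shared / len(replicate_1)

print(f"{shared} of {len(replicate_1)} replicate-1 peaks overlap replicate 2")
```

      The pairwise comparison is quadratic in the number of peaks, which is acceptable for a toy example; genome-scale peak sets would normally be handled with sorted intervals or a dedicated interval tool.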

      Author response image 5.

      Bar graph indicating DRIPc-seq peak counts for HIV-1-infected primary CD4+ T cells harvested at the indicated hours post infection (hpi). Pre-immunoprecipitated samples were untreated (−) or treated (+) with RNase H, as indicated. Each dot corresponds to an individual data set from two biologically independent experiments.

      Author response table 1.

      DRIPc-seq peak length and Chi-square p-value in CD4+ T cells from individual donor 1 and 2 

      (3.2) I think that the conclusion that R-loops "accumulate" in infected cells is acceptable, given the data presented. However, in line 134 the authors state that "HIV1 infection induced host genomic R-loop formation". I suggest being very specific about the observation. Accumulation can happen by (a) inducing a higher frequency of the occurrence of individual R-loops and/or (b) stabilizing existing R-loops. I'm not convinced the authors present enough evidence to claim one over the other. It is altogether possible that HIV-1 infection stabilizes R-loops such that they are more persistent (perhaps by interactions with integrase?), and therefore more easily detected. I think rephrasing the conclusions to include this possibility would alleviate my concerns.

      We thank the reviewer for the thoughtful discussion of our manuscript. Following the reviewer's suggestion, we have changed Line 134 to “HIV-1 infection induces host genomic R-loop enrichment” (Lines 132-133 of the revised manuscript) and added a new concluding sentence offering a possible explanation for the R-loop signal enrichment upon HIV-1 infection (Lines 133–135 of the revised manuscript).

      (3.3) A technical problem with using the S9.6 antibody for the detection of R-loops via microscopy is that it cross-reacts with double-stranded RNA. This has been addressed by the work of Chedin and colleagues (as well as others). It is absolutely essential to treat these samples with an RNA:RNA hybrid-specific RNase, which the authors did not include, as far as their methods section states. Therefore, it is difficult to interpret all of the immunofluorescence experiments that depend on S9.6 binding.

      We understand the reviewer's concern regarding the cross-reactivity of the S9.6 antibody with the more abundant dsRNA, particularly in imaging applications. We carefully designed the experimental and analytical methods for R-loop detection using microscopy. For example, we pre-extracted the cytoplasmic fraction before staining with the S9.6 antibody and quantified the R-loop signal by subtracting the nucleolar signal. Both of these steps were taken to eliminate the possibility of misdetecting R-loops via microscopy because of the prominent cytoplasmic and nucleolar S9.6 signals, which primarily originate from ribosomal RNA. In addition, we included R-loop negative control samples in our microscopy analysis that were subjected to intensive RNase H treatment (60 U/mL RNase H for 36 h) and observed a significant reduction in the S9.6 signal (Figure 1E of the revised manuscript). RNase H-treated samples serve as essential and widely accepted negative controls for R-loop detection.

      We would like to point out that recent studies have reported strong intrinsic specificity of the S9.6 antibody for DNA:RNA hybrid duplexes over dsDNA and dsRNA, along with structural elucidation of how the S9.6 antibody recognizes hybrids (23, 24). Therefore, our interpretation of host cellular R-loop enrichment after HIV-1 infection, based on the S9.6 antibody in multiple biochemical approaches, is well supported. Nevertheless, we agree with the reviewer's opinion that additional negative controls for the detection of R-loops via microscopy, such as RNase T1- and RNase III-treated samples, could improve the robustness and accuracy of the R-loop imaging data (25).

      (3.4) Given that there is no clear correlation between expression levels and R-loop peak detection, combined with the data that show increased detection of R-loop frequency in non-genic regions, I think it will be important to show that the R-loop forming regions are indeed transcribed above background levels. This will help alleviate possible concerns that there are technical errors in R-loop peak detection.

      Figures S5D and S5E in the revised manuscript show the relative gene expression levels of the R-loop-forming positive regions (P1-3) and the referenced R-loop-positive loci (RPL13A and CALM3). The gene expression levels of these R-loop-forming regions were significantly higher than those of the ECFP or mAIRN genes without DOX treatment, which can be considered background levels of transcription in cells. Thank you.

      (3.5) In Figures 4C and D the hashed lines are not defined. It is also interesting that the integration sites do not line up with R-loop peaks. This does not necessarily directly refute the conclusions (especially given the scale of the genomic region displayed), but should be addressed in the manuscript. Additionally, it would greatly improve Figure 4 to have some idea about the biological variation across replicates of the data presented 4A.

      We thank the reviewer for the thoughtful comment on our study. First, we added an annotation for the dashed lines to the figure legends of Figures 4C and 4D in the revised manuscript.

      We agree with the reviewer's interpretation of the relationship between the integration sites and R-loop peaks. Primarily based on our current data, we believe R-loop structures are bound by HIV-1 integrase proteins and lead HIV-1 viral genome integration into the “vicinity” regions of the host genomic R-loops. We displayed a large-scale genomic region (30-kb windows) to present integration sites surrounding R-loop centers because an R-loop can be multi-kilobase in size (1, 2). Depending on the immunoprecipitation and library construction methods, the R-loop peaks varied in size, and the peak length showed a wide distribution (Figure 3B of Malig et al., 2020, Figure 1B of Sanz et al., 2016, and Figure 2A of the revised manuscript). Therefore, presenting integration site events within a wide window of R-loop peaks could be more informative and better reflect the current understanding of R-loop biology.

      R-loop formation recruits diverse chromatin-binding protein factors, such as H3K4me1, p300, CTCF, RAD21, and ZNF143 (Figure 6A and 6B of Sanz et al., 2016) (26), which allow R-loops to exhibit enhancer and insulator chromatin states that can act as distal regulatory elements (26, 27). We have demonstrated physical interactions between host cellular R-loops and HIV-1 integrase proteins (Figure 5 of the revised manuscript). Therefore, we believe that this ‘distal regulatory element-like feature’ of the R-loop is a potential explanation for how R-loops drive integration over long-range genomic regions.

      Following your suggestion, we added this explanation, together with the relevant literature, to the Discussion section of the revised manuscript.

      Author response image 6 shows the biological variation across replicates of the data presented in Figure 4A. The integration site sequencing data for Jurkat cells were adopted from SRR12322252 (4), which consists of the integration site sequencing data of HIV-1-infected wild-type Jurkat cells with one biological replicate. We hope that our explanations and discussion have successfully addressed your concerns. Thank you.

      Author response image 6.

      Bar graphs showing the quantified number of HIV-1 integration sites per Mb pair in total regions of 30-kb windows centered on DRIPc-seq peaks from HIV-1 infected HeLa cells and primary CD4+ T cells (magenta) or non-R-loop region in the cellular genome (gray). Each dot corresponds to an individual data set from two biologically independent experiments.
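
      The per-Mb normalization used in this kind of bar graph can be sketched as follows. This is an illustrative example on toy coordinates, assuming overlapping 30-kb windows are merged before their total length is summed; it is not the code used to generate Author response image 6, and the helper names are hypothetical.

```python
# Minimal sketch (toy data): integration sites per Mb within a set of windows.
# Assumption: overlapping windows are merged so that no base is counted twice.


def merge_intervals(intervals):
    """Merge overlapping or touching half-open intervals (start, end)."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def sites_per_mb(sites, intervals):
    """Integration sites inside the merged intervals, per megabase of interval."""
    merged = merge_intervals(intervals)
    total_bp = sum(end - start for start, end in merged)
    if total_bp == 0:
        return 0.0
    inside = sum(any(start <= s < end for start, end in merged) for s in sites)
    return inside / (total_bp / 1_000_000)


# Toy 30-kb windows (two of them overlap) and toy integration sites.
windows = [(0, 30_000), (20_000, 50_000), (100_000, 130_000)]
sites = [10_000, 25_000, 60_000, 120_000]

density = sites_per_mb(sites, windows)
print(f"{density:.1f} integration sites per Mb")
```

      With these toy inputs, three of the four sites fall inside 80 kb of merged windows. The same function applied to the complementary non-R-loop regions would give the comparison value.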

      (3.6) The authors do not adequately describe the Integrase mutant that they use in their biochemical experiments in Figure 5A. Could this impact the activity of the protein in such a way that interferes with the interpretation of the experiment? The mutant is not used in subsequent experiments for Figure 5 and so even though the data are consistent with each other (and the conclusion that Integrase interacts with R-loops) a more thorough explanation of why that mutant was used and how it impacts the biochemical activity of the protein will help the interpretation of the data presented in Figure 5.

      We appreciate the reviewer’s suggestions. In our EMSA analysis, we purified and used Sso7d-tagged HIV-1 integrase proteins with an active-site amino acid substitution, E152Q. First, we used the Sso7d-tagged HIV-1 integrase protein because previous studies have suggested that fusion of a small domain such as Sso7d (a DNA-binding domain) can significantly improve the solubility of HIV integrase proteins without affecting their enzymatic activity or their ability to assemble with substrate nucleic acids (Figure 1B of Li et al., PLOS ONE, 2014; 9(8)) (28, 29). Second, we used an integrase protein with the active-site substitution E152Q in our mobility shift assay because the primary goal of this experiment was to examine the ability of the protein to bind or form a complex with different nucleic acid substrates. We reasoned that abolishing the enzymatic activity of the integrase protein, such as the 3'-processing that cleaves DNA substrates, would be more appropriate for this objective. The Sso7d-tagged HIV-1 integrase with the E152Q mutation has also been used to elucidate the structural model of the integrase complex with a nucleic acid substrate by cryo-EM (3) and has been shown not to disturb substrate binding. Based on the reviewer’s comments, we have added a description of the E152Q mutant integrase protein in Lines 268–270 of the revised manuscript. Thank you.

      Reviewer #3 (Recommendations For The Authors):

      The paper suffers from many grammatical errors, which sometimes interfere with the interpretations of the experiments. In the view of this reviewer, the manuscript must be carefully revised prior to publication. For example, lines 247-248 "Intasomes consist of HIV-1 viral cDNA and HIV-1 coding protein, integrases." It is unclear from this sentence whether there are multiple integrases or multiple proteins that interact with the viral genome to facilitate integration. This makes the subsequent experiments in Figure 5 difficult to interpret. There are many other examples, too numerous to point out individually.

      We thoughtfully revised the original manuscript, making our best effort to provide clearer details of our findings. We believe that we have made substantial changes to the manuscript, including Lines 247–248 of the original manuscript that the reviewer noted. Furthermore, the revised manuscript was edited by a professional editing service. Thank you.

      (1) M. Malig, S. R. Hartono, J. M. Giafaglione, L. A. Sanz, F. Chedin, Ultra-deep Coverage Single-molecule R-loop Footprinting Reveals Principles of R-loop Formation. J Mol Biol 432, 2271-2288 (2020).

      (2) L. A. Sanz et al., Prevalent, Dynamic, and Conserved R-Loop Structures Associate with Specific Epigenomic Signatures in Mammals. Mol Cell 63, 167-178 (2016).

      (3) D. O. Passos et al., Cryo-EM structures and atomic model of the HIV-1 strand transfer complex intasome. Science 355, 89-92 (2017).

      (4) W. Li et al., CPSF6-Dependent Targeting of Speckle-Associated Domains Distinguishes Primate from Nonprimate Lentiviral Integration. mBio 11,  (2020).

      (5) P. A. Ginno, Y. W. Lim, P. L. Lott, I. Korf, F. Chedin, GC skew at the 5' and 3' ends of human genes links R-loop formation to epigenetic regulation and transcription termination. Genome Res 23, 1590-1600 (2013).

      (6) S. Hamperl, M. J. Bocek, J. C. Saldivar, T. Swigut, K. A. Cimprich, Transcription-Replication Conflict Orientation Modulates R-Loop Levels and Activates Distinct DNA Damage Responses. Cell 170, 774-786 e719 (2017).

      (7) H. O. Ajoge et al., G-Quadruplex DNA and Other Non-Canonical B-Form DNA Motifs Influence Productive and Latent HIV-1 Integration and Reactivation Potential. Viruses 14,  (2022).

      (8) I. K. Jozwik et al., B-to-A transition in target DNA during retroviral integration. Nucleic Acids Res 50, 8898-8918 (2022).

      (9) F. Chedin, C. J. Benham, Emerging roles for R-loop structures in the management of topological stress. J Biol Chem 295, 4684-4695 (2020).

      (10) F. Chedin, Nascent Connections: R-Loops and Chromatin Patterning. Trends Genet 32, 828-838 (2016).

      (11) P. B. Chen, H. V. Chen, D. Acharya, O. J. Rando, T. G. Fazzio, R loops regulate promoter-proximal chromatin architecture and cellular differentiation. Nat Struct Mol Biol 22, 999-1007 (2015).

      (12) A. R. Schroder et al., HIV-1 integration in the human genome favors active genes and local hotspots. Cell 110, 521-529 (2002).

      (13) Y. Ito et al., Number of infection events per cell during HIV-1 cell-free infection. Sci Rep 7, 6559 (2017).

      (14) A. Albanese, D. Arosio, M. Terreni, A. Cereseto, HIV-1 pre-integration complexes selectively target decondensed chromatin in the nuclear periphery. PLoS One 3, e2413 (2008).

      (15) V. Achuthan et al., Capsid-CPSF6 Interaction Licenses Nuclear HIV-1 Trafficking to Sites of Viral DNA Integration. Cell Host Microbe 24, 392-404 e398 (2018).

      (16) X. Li et al., piggyBac transposase tools for genome engineering. Proc Natl Acad Sci U S A 110, E2279-2287 (2013).

      (17) Y. Cao et al., Identification of piggyBac-mediated insertions in Plasmodium berghei by next generation sequencing. Malar J 12, 287 (2013).

      (18) E. Serrao, P. Cherepanov, A. N. Engelman, Amplification, Next-generation Sequencing, and Genomic DNA Mapping of Retroviral Integration Sites. J Vis Exp,  (2016).

      (19) K. A. Matreyek et al., Host and viral determinants for MxB restriction of HIV-1 infection. Retrovirology 11, 90 (2014).

      (20) G. A. Sowd et al., A critical role for alternative polyadenylation factor CPSF6 in targeting HIV-1 integration to transcriptionally active chromatin. Proc Natl Acad Sci U S A 113, E1054-E1063 (2016).

      (21) B. Lucic et al., Spatially clustered loci with multiple enhancers are frequent targets of HIV-1 integration. Nat Commun 10, 4059 (2019).

      (22) P. K. Singh, G. J. Bedwell, A. N. Engelman, Spatial and Genomic Correlates of HIV-1 Integration Site Targeting. Cells 11,  (2022).

      (23) C. Bou-Nader, A. Bothra, D. N. Garboczi, S. H. Leppla, J. Zhang, Structural basis of R-loop recognition by the S9.6 monoclonal antibody. Nat Commun 13, 1641 (2022).

      (24) Q. Li et al., Cryo-EM structure of R-loop monoclonal antibody S9.6 in recognizing RNA:DNA hybrids. J Genet Genomics 49, 677-680 (2022).

      (25) J. A. Smolka, L. A. Sanz, S. R. Hartono, F. Chedin, Recognition of RNA by the S9.6 antibody creates pervasive artifacts when imaging RNA:DNA hybrids. J Cell Biol 220,  (2021).

      (26) L. A. Sanz, F. Chedin, High-resolution, strand-specific R-loop mapping via S9.6-based DNA-RNA immunoprecipitation and high-throughput sequencing. Nat Protoc 14, 1734-1755 (2019).

      (27) M. Merkenschlager, D. T. Odom, CTCF and cohesin: linking gene regulatory elements with their targets. Cell 152, 1285-1297 (2013).

      (28) M. Li, K. A. Jurado, S. Lin, A. Engelman, R. Craigie, Engineered hyperactive integrase for concerted HIV-1 DNA integration. PLoS One 9, e105078 (2014).

      (29) M. Li et al., A Peptide Derived from Lens Epithelium-Derived Growth Factor Stimulates HIV-1 DNA Integration and Facilitates Intasome Structural Studies. J Mol Biol 432, 2055-2066 (2020).

    1. Author Response

      The following is the authors’ response to the original reviews.

      General remarks for the Editor and the Reviewers

      We would like to thank the Editor and the Reviewers for their feedback. Below we address their comments and present our point-by-point responses as well as the related changes in the manuscript.

      In addition to these changes, in a few cases we have found it necessary to move some texts and provide some additional explanations within the manuscript. We emphasize that these amendments have been made for only technical reasons, and do not alter the results and conclusions of the paper, but may help to render the text more coherent and understandable to readers with little knowledge of the subject.

      These minor corrections are:

      • We extended the Introduction section by a sentence (lines 40-42) that is intended to fit the proposed template directed, non-enzymatic replication mechanism into a more general prebiotic evolutionary context, thus emphasizing its biological relevance. This sentence includes an additional reference (Rosenberger et al., 2021).

      • Two very methodologically oriented and repeated descriptions of random sequence generation have been moved to the Methods section (lines 178-185) from the Results section (lines 336-339 and lines 351-354).

      • We complemented the Data availability statement with licensing information (lines 684-685).

      • Further minor changes (also indicated by red texts) have been implemented to remedy logical and grammatical glitches.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Szathmary and colleagues explore the parabolic growth regime of replicator evolution. Parabolic growth occurs when nucleic acid strand separation is the rate-limiting step of the replication process, which would have been the case for non-enzymatic replication of short oligonucleotides that could precede the emergence of ribozyme polymerases and helicases. The key result is that parabolic replication is conducive to the maintenance of genetic diversity, that is, the coexistence of numerous master sequences (the Gause principle does not apply). Another important finding is that there is no error threshold for parabolic replication except for the extreme case of zero fidelity.

      Strengths:

      I find both the analytic and the numerical results to be quite convincing and well-described. The results of this work are potentially important because they reveal aspects of a realistic evolutionary scenario for the origin of replicators.

      Weaknesses:

      There are no obvious technical weaknesses. It can be argued that the results represent an incremental advance because many aspects of parabolic replication have been explored previously (the relevant publications are properly cited). Obviously, the work is purely theoretical; an experimental study of parabolic replication is due. In the opinion of this reviewer, though, these are understandable limitations that do not actually detract from the value of this work.

      We are grateful that this Reviewer appreciates our work. We completely agree that the ultimate validation must come from experiments. It is important to stress that in this field theory often preceded experimental work by decades, and the former often guided the latter. We hope that for the topic of the present paper experiments will follow considerably faster.

      Reviewer #2 (Public Review):

      Summary:

      A dominant hypothesis concerning the origin of life is that, before the appearance of the first enzymes, RNA replicated non-enzymatically by templating. However, this replication was probably not very efficient, due to the propensity of single strands to bind to each other, thus inhibiting template replication. This phenomenon, known as product inhibition, has been shown to lead to parabolic growth instead of exponential growth. Previous works have shown that this situation limits competition between alternative replicators and therefore promotes RNA population diversity. The present work examines this scenario in a model of RNA replication, taking into account finite population size, mutations, and differences in GC content. The main results are (1) confirmation that parabolic growth promotes diversity, but that when the population size is small enough, sequences least efficient at replicating may nevertheless go extinct; (2) the observation that fitness is not only controlled by the replicability of sequences, but also by their GC content; (3) the observation that parabolic growth attenuates the impact of mutations and, in particular, that the error threshold to which exponentially growing sequences are subject can be exceeded, enabling sequence identity to be maintained at higher mutation rates.

      Strengths:

      The analyses are sound and the observations are intriguing. Indeed, although it has been noted previously that parabolic growth promotes coexistence, its role in mitigating the error threshold catastrophe - which is often presented as a major obstacle to our understanding of the origin of life - had not been examined before.

      Weaknesses:

      Although all the conclusions are interesting, most are not very surprising for people familiar with the literature. As the authors point out, parabolic growth is well known to promote diversity (Szathmary-Gladkih 89) and it has also been noted previously that a form of Darwinian selection can be found at small population sizes (Davis 2000).

      Given that under parabolic growth, no sequence is ever excluded for infinite populations, it is also not surprising to find that mutations have a less dramatic exclusionary impact.

      In the two articles cited (Szathmary-Gladkih 1989 and Davis 2000) the subexponentiality of the system was implemented in a mechanistic way, by introducing the exponent 0 < 𝑝 < 1. Although the behaviour of these models is more or less consistent with experimental findings (von Kiedrowski, 1986; Zielinski and Orgel, 1987), the divergence of per capita growth rates (𝑥̇/𝑥) at very low concentrations–which guarantees the ability to maintain unlimited diversity in the case of infinite population sizes–makes this formal approach partly unrealistic.
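
      The divergence mentioned above can be illustrated numerically. The following is a minimal sketch, assuming the simplest sub-exponential growth law dx/dt = k·x^p with no decay or outflow term; the function name and the values of k and p are illustrative choices, not taken from the cited articles.

```python
def per_capita_growth_rate(x, k=1.0, p=0.5):
    """Per capita growth rate (dx/dt)/x for the assumed law dx/dt = k * x**p.

    For 0 < p < 1 this equals k * x**(p - 1), which grows without bound
    as the concentration x approaches zero.
    """
    return k * x ** (p - 1.0)


# Illustrative concentrations: the rarer the replicator, the larger its
# per capita growth rate, so no type can be driven fully extinct in an
# infinite population.
for x in (1.0, 1e-2, 1e-4, 1e-6):
    print(f"x = {x:g}  per capita rate = {per_capita_growth_rate(x):g}")
```

      With p = 1 (exponential growth) the same function returns the constant k for every x, which is why rarity offers no such advantage in the exponential case.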

      To avoid the possible artefacts of this mechanistic approach, and as there are no previous studies analysing the diversity maintaining ability of finite populations of parabolic replicators in an individual-based model context, we implemented a simplified template replication mechanism leading to parabolic growth and analysed the dynamics in an individual-based stochastic model context. The key point of our investigation is that considerable diversity can be maintained in the system even when the population size is quite small.

      Regarding the Reviewer’s comment on selection: Darwinian selection can only occur in a simple subexponential dynamics if the ratio of replicabilities diverges, cf. Eq. (8) and the preceding paragraph in Davis, 2000.

      Our results also show (Figs. 4B and 4C) that high mutation rates and the error threshold problem can still be considered as a major limiting factor for parabolically replicating systems in terms of their diversity-maintaining ability. In the light of the above, potential mechanisms to relax the error threshold in such systems, one of which is demonstrated in the present study, seem to be important steps to account for the sequence diversification and increase in molecular complexity during the early evolution of RNA replicators.

      A general weakness is the presentation of models and parameters, whose choices often appear arbitrary. Modeling choices that would deserve to be further discussed include the association of the monomers with the strands and the ensuing polymerization, which are combined into a single association/polymerization reaction (see also below), or the choice to restrict to oligomers of length L = 10. Other models, similar to the one employed here, have been proposed that do not make these assumptions, e.g. Rosenberger et al. Self-Assembly of Informational Polymers by Templated Ligation, PRX 2021. To understand how such assumptions affect the results, it would be helpful to present the model from the perspective of existing models.

      The assumption of one-step polymerization reactions that we used here is a common technique for modelling template replication of sequence-represented replicators [see, e.g., Fontana and Schuster, 1998 (10.1126/science.280.5368.1451), Könnyű et al., 2008 (10.1186/1471-2148-8-267), Vig-Milkovics et al., 2019 (10.1016/j.jtbi.2018.11.020) or Szilágyi et al., 2020 (10.1371/journal.pgen.1009155)]. This is because assuming base-by-base polymerisation of the copy would lead to a very large number of different types of intermediates, which a Gillespie-type stochastic simulation algorithm could not handle in reasonable computation times, even if the sequences were relatively short. For comparison, in our model, where polymerization is one-step, the characteristic time of a simulation for 𝐿 = 10, 𝑁 = 10^5 and 𝛿 = 0.01 was 552 hours.

      Note that in Rosenberger et al. (PRX 2021), in contrast to a pioneering work [Fernando et al., 2007 (10.1007/s00239-006-0218-4)], sequences of replicators are not represented, which makes this approach completely inapplicable to our case, in which sequence defines the fitness. In sum, we suggest that this valid criticism points to possible future work.
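
      To make the idea of one-step polymerization in a Gillespie-type simulation concrete, here is a toy sketch. It is not the model of the manuscript: it tracks a single replicator type without sequences, mutations, or any population-size constraint, and the rate constants `k_rep`, `k_assoc` and `k_dissoc` are hypothetical names and values chosen only for illustration. Replication is treated as a single event that turns a free template into a template-copy duplex, so no partially polymerized intermediates need to be tracked.

```python
import random


def gillespie_template_replication(s0, d0, k_rep, k_assoc, k_dissoc,
                                   t_max, seed=0):
    """Toy Gillespie simulation of one replicator type (illustrative only).

    State: s = free single strands, d = template-copy duplexes.
    Reactions, each a single stochastic event:
      replication:  S     -> D      (template + monomers -> duplex, one step)
      association:  S + S -> D
      dissociation: D     -> S + S
    Returns the final (s, d) and the number of replication events.
    """
    rng = random.Random(seed)
    t, s, d = 0.0, s0, d0
    n_replications = 0
    while True:
        propensities = (
            k_rep * s,                    # replication
            k_assoc * s * (s - 1) / 2.0,  # association of two free strands
            k_dissoc * d,                 # duplex dissociation
        )
        total = sum(propensities)
        if total == 0.0:
            break
        t += rng.expovariate(total)       # waiting time to the next event
        if t > t_max:
            break
        r = rng.random() * total          # choose which reaction fires
        if r < propensities[0]:
            s, d = s - 1, d + 1
            n_replications += 1
        elif r < propensities[0] + propensities[1]:
            s, d = s - 2, d + 1
        else:
            s, d = s + 2, d - 1
    return s, d, n_replications


s, d, n = gillespie_template_replication(
    s0=10, d0=0, k_rep=1.0, k_assoc=0.01, k_dissoc=0.1, t_max=5.0, seed=1)
print(f"free strands: {s}, duplexes: {d}, replication events: {n}")
```

      Each replication event adds exactly one strand to the system, so the total strand count s + 2d equals the initial count plus the number of replication events. Modelling base-by-base elongation instead would require one state variable per partially copied intermediate, which is the combinatorial cost the response refers to.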

      The values of the (many) parameters, often very specific, also very often lack justifications. For example, why is the "predefined error factor" ε = 0.2 and not lower or higher? How would that affect the results?

      A general remark. For the more important parameters, several values were used to test the behaviour of the model (see Table 1), but due to the considerable number of parameters, it is impossible to examine all possible combinations. 𝑐+ = 1 fixes the timescale, and 𝐿 is set to 10 to obtain reasonable running times (see above).

      𝜀 characterizes how replicability decreases as the number of mutations increases. In the manuscript we used the following default vector: 𝜀 = (0.05, 0.2, 1), in which the third element corresponds to the mutation-free sequence, so it must be 1. The first element determines the baseline replicability (see Methods), which we preferred not to change because it would fundamentally alter the ratio of replication propensities to association and dissociation propensities (as a substantial fraction of the complementary sequences of the master sequences are of baseline replicability) and thus would alter the reaction kinetics to such an extent that the results would no longer be comparable with the original ones. Therefore, only the second element can be adjusted. Accordingly, we have analysed the behaviour of the model in the cases of a steeper and a more gradual loss of replicability using the following two vectors, respectively: 𝜀′ = (0.05, 0.05, 1) and 𝜀″ = (0.05, 0.5, 1). The choice of 𝜀′ is chemically more plausible, since for very short oligomers the loss of chemical activity and replicability as a function of the number of mutations can be very sharp. We performed a series of simulations with all possible combinations of 𝛿 = 0.001, 0.005, 0.1 and 𝑁 = 10^3, 10^4, 10^5 for 𝜀′ and 𝜀″ in the constant population and chemostat model contexts (36 different runs). For other parameters, we took the default values, see Table 1. These values also correspond to the parameters we used in Figures 2 and 6. The results show that the steeper loss of replicability (𝜀′) slightly increases the diversity-maintaining ability of the system, whereas the more gradual loss of replicability (𝜀″) moderately decreases it, and that these shifts are more pronounced in the constant population size model (Author response image 1) than in the chemostat model (Author response image 2). Altogether, these results confirm that the qualitative outcome of the model is robust over a wide range of loss of replicability (𝜀 vector) values.
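
      The count of 36 runs follows from crossing the four factors listed above. A minimal sketch of that bookkeeping, with parameter values copied as listed in the response and with variable names and labels that are purely illustrative:

```python
from itertools import product

# Values as listed in the response; the names and labels are illustrative.
deltas = (0.001, 0.005, 0.1)
population_sizes = (10**3, 10**4, 10**5)
epsilon_vectors = {
    "steeper": (0.05, 0.05, 1.0),
    "more gradual": (0.05, 0.5, 1.0),
}
models = ("constant population", "chemostat")

# One run per combination of delta, N, epsilon vector and model type.
runs = list(product(deltas, population_sizes, epsilon_vectors, models))
print(len(runs))  # 3 * 3 * 2 * 2 = 36
```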

      Author response image 1.

      Replicator coexistence in the constant population model with different loss of replicability (𝜀 vector) values. Within a given combination of 𝛿 and 𝑁 parameter values, the upper panel corresponds to the steeper loss of replicability (𝜀′), the middle panel to the default 𝜀 vector (Figure 2A), and the bottom panel to the more gradual loss of replicability vector (𝜀″). Within each 𝛿; 𝑁 parameter combination, the same master sequence set was used with the three different 𝜀 vectors for comparability.

      Author response image 2.

      Replicator coexistence in the chemostat model with different loss of replicability (𝜀 vector) values. Within a given combination of 𝛿 and 𝑁 parameter values, the upper panel corresponds to the steeper loss of replicability (𝜀′), the middle panel to the default 𝜀 vector (Figure 6A), and the bottom panel to the more gradual loss of replicability vector (𝜀″). Within each 𝛿; 𝑁 parameter combination, the same master sequence set was used with the three different 𝜀 vectors for comparability.

      Similarly, in equation (11), where does the factor 0.8 come from?

      This factor scales the decay rate of duplex sequences as a function of the binding energy (𝐸b). The value of 0.8 is an arbitrary choice; the value should be in the interval (0, 1) and is only relevant in the chemostat model. It is expected to have a similar effect on the dynamics as the duplex decay factor parameter 𝑓, which we have investigated over a wide range of values (cf. Table 1, Fig. 6), although 𝑓 is independent of the binding energy (𝐸b): increasing/decreasing the 0.8 factor is expected to decrease/increase the average total population size. We have investigated the diversity-maintaining ability of the system at smaller (0.6) and larger (0.9) parameter values at different population sizes (𝑁 ≈ 10^3, 10^4 and 10^5) and at different replicability distances (δ = 0.001, 0.005 and 0.01) as shown in Fig. 6. We have found that the number of coexisting master types changes very little in response to changes in this factor. Only two shifts could be detected (underlined): factor 0.9 combined with 𝑁 ≈ 10^4 and 𝛿 = 0.001 caused the number of surviving master types to decrease by one, while factor 0.9 combined with 𝑁 ≈ 10^3 and 𝛿 = 0.01 caused the number of surviving master types to increase by one (Author response table 1). Factor 0.6 produced the same number of surviving types as the default (Author response table 1). In summary, the model shows marked robustness to changes in the values of this parameter.

      Author response table 1.

      Number of coexisting master types in the chemostat model with different binding energy dependent duplex decay rates. Within each 𝛿; 𝑁 parameter combination, the same master sequence set was used with the three different factor values: 0.6, 0.8 (the original) and 0.9 for comparability.

      Why is the kinetic constant for duplex decay reaction 1.15 ⋅ 10^-8?

      Note that this value is the minimum of the duplex decay rate; Table 1 correctly shows the interval of this kinetic constant as: [1.15 ⋅ 10^-8, 6.4 ⋅ 10^-5]. Both values are derived from the basic parameters of the system and can be computed according to Eq. (11). The minimum: as the parameter set corresponding to this value is: . The maximum: with .

      Are those values related to experiments, or are they chosen because specific behaviors can happen only then?

      See above.

      The choice of the model and parameters potentially impact the two main results, the attenuation of the error threshold and the role of GC content:

      Regarding the error threshold, it is also noted (lines 379-385) that it disappears when back mutations are taken into account. This suggests that overcoming the error threshold might not be as difficult as suggested, and can be achieved in several ways, which calls into question the importance of the particular role of parabolic growth. Besides, when the concentration of replicators is low, product inhibition may be negligible, such that a "parabolic replicator" is effectively growing exponentially and an error catastrophe may occur. Do the authors think that this consideration could affect their conclusion? Can simulations be performed?

      The assumption of back mutation only provides a theoretical solution to the error threshold problem: back mutation guarantees a positive (non-zero) concentration of a master type, but, since the probability of back mutation is generally very low, this equilibrium concentration may be extremely low, or negligible for typical system sizes. Consequently, back mutation alone does not solve the problem of the error catastrophe: in our system back mutation is present (the probability that a sequence with 𝑘 errors mutates back to a master sequence is 𝜇^𝑘 (1−𝜇)^(𝐿−𝑘)), and the diversity-maintaining ability is limited. The effect of back mutation decreases exponentially with increasing sequence length.
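
      To give a sense of scale, the back-mutation probability quoted above, 𝜇^𝑘 (1−𝜇)^(𝐿−𝑘), can be evaluated directly. The sketch below uses 𝐿 = 10 as in the model; the per-base mutation rate 𝜇 = 0.01 and the function name are illustrative assumptions.

```python
def back_mutation_probability(k, L, mu):
    """Probability that a length-L sequence carrying k errors is copied back
    into the master sequence: all k erroneous positions must revert
    (mu per position) while the other L - k positions are copied
    faithfully (1 - mu per position).
    """
    return mu ** k * (1.0 - mu) ** (L - k)


L = 10     # oligomer length used in the model
mu = 0.01  # illustrative per-base mutation rate (assumed value)

# Each additional error multiplies the probability by mu / (1 - mu),
# so mutants far from the master almost never revert.
for k in range(0, 4):
    print(f"k = {k}: {back_mutation_probability(k, L, mu):.3e}")
```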

      Regarding the role of the GC content, GC-rich oligomers are found to perform the worst but no rationale is provided.

      For GC-rich oligonucleotides the dissociation probability of a template-copy complex is relatively low (cf. Eqs. (9, 10)), thus they have a relatively low number of offspring, cf. lines 557-561: “a relatively high dissociation probability and the consequential higher propensity of being in a simple stranded form provides an advantage for sequences with relatively low GC content in terms of their replication affinity, that is, the expected number of offspring in case of such variants will be relatively high.”. Note that the simulation results shown in Fig. 3A, demonstrate the realization of this effect with prepared sequences (along a GC content gradient).

      One may assume that it happens because GC-rich sequences take comparatively longer to release the product. However, it is also conceivable that higher GC content may help in the polymerization of the monomers, as the monomers stay attached longer to the template (as described in Eq. (9)). This is an instance where the choice to combine the association and polymerization reactions into a single step, independent of GC content, may be critical.

      It would be important to show that the result arises from the actual physics and not from this modeling choice.

      Some more specific points that would deserve to be addressed:

      • Line 53: it is said that p "reflects how easily the template-reaction product complex dissociates". This statement is not correct. A reaction order p<1 reflects product inhibition, the propensity of templates to bind to each other, not slow product release. Product release can be limiting, yet a reaction order of 1 can be achieved if substrate concentrations are sufficiently high relative to oligomer concentrations (von Kiedrowski et al., 1991).

      We think the key reference is Von Kiedrowski (1993) in this case. Other things being equal, his Table 1 on p. 134 shows that a sufficient increase in 𝐾4, i.e., the stability of the duplex (template and copy) (association rate divided by dissociation rate) throws the system into the parabolic regime. This is what we had in mind. In order to clarify this, we modified the quoted sentence thus: “In this kinetics, the growth order is equal or close to 0.5 (i.e., the dynamics is sub-exponential) because increased stability of the template-copy complex (rate of association divided by dissociation) promotes parabolic growth (von Kiedrowski et al., 1991; von Kiedrowski & Szathmáry, 2001).”

      • Population size is a key parameter, and a comparison is made between small (10^3) and large (10^5) populations, but without explaining what determines the scale (small/large relative to what?).

      The “small” value (10^3) corresponds to the smallest meaningful population size; significantly smaller population sizes (e.g. 10^2) cannot maintain the 10 master types (or any subset of them) and are chemically unrealistic. The “large” value (10^5) is the largest population size for which simulation times are still acceptable; in the case of 10^6 the runtimes are on the order of months.

      • In the same vein, we might expect size not to be the only important parameter, but also concentration.

      With constant volume, population size and concentration are strictly coupled.

      • Lines 543-546: if understanding correctly, the quantitative result is that the error threshold rises from 0.1 in the exponential case to 0.196 in the parabolic. Are the authors suggesting that a factor of 2 is a significant difference?

      In this paragraph we compared the empirical error threshold of our system (which is close to 𝑝mut = 0.15) with the error threshold of the well-known single peak fitness landscape (which can be approximated by ) as a reference case. To make the message even clearer we have extended the last sentence (lines 596-597) as follows: “but note that applying this approach to our system is a serious oversimplification”. The 0.196 is simply the probability of error-free replication of a sequence when , but we have removed this sentence (“corresponding to the replication accuracy of a master sequence”) from the manuscript as it seems to be confusing.

      • Figure 3C: this figure shows no statistically significant effect?

      Thank you for pointing this out. We statistically tested the hypothesis that the GC content differs between the surviving and the extinct master subsets. This analysis revealed that the differences between these two groups are statistically significant, which we have now included in the manuscript at lines 380-390: “A direct investigation of whether the sequence composition of the master types is associated with their survival outcome was conducted using the data from the constant population model simulation results (Figure 2). In these data, the average GC content was measured to be lower in the surviving master subpopulations than in the extinct subpopulations (Figure 3C). To determine whether this difference was statistically significant, nonparametric, two-sample Wilcoxon rank-sum tests (Hollander & Wolfe, 1999) were performed on the GC content of the extinct-surviving master subsets. The GC content was significantly different between these two groups in all nine investigated parameter combinations of population size (N) and replicability distance (δ) at p<0.05 level, indicating a selective advantage for a lower GC content in the constant population model context. The exact p values obtained from this analysis are shown in Figure 3C.”

      • line 542: "phase transition-like species extension (Figure 4B)": such a clear threshold is not apparent.

      Thank you for pointing out the incorrect phrasing. As there is no clear threshold in the number of coexisting types as a function of the mutation rate, we removed the “phase transition-like” expression: “However, when finite population sizes and stochastic effects are taken into account, at the largest investigated per-base mutation rate (𝑝mut = 0.15), the summed relative steady-state master frequencies approach zero (Figure 4C) with accelerating species extinction (Figure 4B), indicating that this value is close to the system's empirical error threshold.” (lines 589-594).

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      On the whole, the work is well done and presented, there are no major recommendations. It seems a good idea to cite and briefly discuss this recent paper: https://pubmed.ncbi.nlm.nih.gov/36996101/ which develops a symbiotic scenario of the coevolution of primordial replicators and reproducers that appears to be fully compatible with the results of the current work.

      Thank you for bringing this article to our attention. We have inserted the following sentence at lines 621-624: “The demonstrated diversity-maintaining mechanism of finite parabolic populations can be used as a plug-in model to investigate the coevolution of naked and encapsulated molecular replicators (e.g., Babajanyan et al., 2023).”

      The manuscript is well written, but there are some minor glitches that merit attention. For example:

      l. 5 "carriers presents a problem, because product formation and mutual hybridization" - "mutual" is superfluous here, delete

      l. 13 "amplification. In addition, sequence effects (GC content) and the strength of resource" - hardly "effects" - should be 'features' or 'properties'

      l. 41 "If enzyme-free replication of oligomer modules with a high degree of sequence" - "modules" here is only confusing - simply, "oligomers"

      l. 44 "under ecological competition conditions with which distinct replicator types with different" - delete "with" etc, there are many such minor glitches that are best corrected.

      Thank you for pointing these out; we have corrected them. Other drafting errors, glitches, and superfluous sentences have also been corrected.

      Reviewer #2 (Recommendations For The Authors):

      None

      Editor (Recommendations For The Authors):

      In the manuscript, it appears that coexistence is assessed at a given point in time, while figures seem to show that it remains time-dependent. It would be great if the authors could clarify this and/or discuss this.

      We appreciate you bringing this to our attention, as we had indeed failed to elaborate on this important point. The steady-state character of the coexistence is assessed in our model in the following way: the relative frequency of each master sequence is tested for the condition of ≥ 100- (cut-off relative frequency for survival) at every 2,000th replication step in the interval between 10,000 replication steps before termination and actual termination (10= replication steps). If the above condition is true more than once, we consider the master type in question to have survived (we have included this explanation in the Methods section: lines 258-268). Although this relatively narrow time interval can still be regarded as a snapshot of the state of the system, in our numerical experience the resulting measure is a reliable quantitative indicator of the apparent stability of species coexistence in the parabolic dynamics.
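
      The survival criterion described above can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: the cut-off frequency is left as a parameter, both ends of the sampling interval are assumed to be included, and `freq_at_step` is a hypothetical callable that returns the recorded relative frequency of one master type at a given replication step.

```python
def master_survived(freq_at_step, termination_step, cutoff,
                    window=10_000, stride=2_000):
    """Decide whether one master type counts as survived.

    The relative frequency is checked at every `stride`-th replication step
    in the last `window` steps before termination (endpoints included, which
    is an assumption). The type is considered survived if the frequency
    reaches the cut-off at more than one checkpoint.
    """
    start = termination_step - window
    checkpoints = range(start, termination_step + 1, stride)
    hits = sum(1 for step in checkpoints if freq_at_step(step) >= cutoff)
    return hits > 1


# Illustrative use with made-up frequency records and a made-up cut-off.
stable = lambda step: 0.02                                # always above cut-off
blip = lambda step: 0.02 if step == 100_000 else 0.0      # above cut-off once
print(master_survived(stable, 100_000, cutoff=0.001))
print(master_survived(blip, 100_000, cutoff=0.001))
```

      Requiring more than one hit means that a type appearing above the cut-off at a single checkpoint, for instance through a transient mutational fluctuation, is not counted as coexisting.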

    1. Author Response

      The following is the authors’ response to the original reviews.

      Responses to reviewers’ comments

      (1) The rationale of selecting tNOX/ENOX2 as a potential target of 4-dmH, but not heliomycin, is unclear by taking a biased approach. Thus, there is high possibility that 4-dmH binds to other proteins involved in apoptosis inhibition. An unbiased screen to identify 4-dmH-binding proteins would be a better approach unless there is a clear and logical rationale.

      We apologize for this oversight. In response to this comment, we rewrote the abstract, reorganized the results, and added more references to better introduce tNOX/ENOX2.

      A) Under the “4-dmH, but not heliomycin, targets intracellular tNOX, an upstream regulator of SIRT1” result section:

      We next addressed the molecular mechanisms underlying SIRT1 inhibition and concurrent cell death by these two compounds in oral cancer cells. Because SIRT1 is an NAD+-dependent protein deacetylase, its activity is primarily governed by the NAD+/NADH ratio, and the two are positively correlated [1-9]. We therefore asked whether these two compounds inhibit SIRT1 by affecting intracellular NAD+/NADH levels, and were surprised to find that 4-dmH, but not heliomycin, markedly reduced the intracellular NAD+/NADH ratio (revised Fig. 7a). This discrepancy in their ability to reduce NAD+ generation led us to explore the role of a tumor-associated NADH oxidase (tNOX, ENOX2) in 4-dmH-suppressed SIRT1 and apoptosis induction. We have previously reported that tNOX inhibition reduced the intracellular NAD+/NADH ratio and SIRT1 deacetylase activity, increasing p53 acetylation and apoptosis [10-13]. In light of this information, we assessed the effect of the compounds on tNOX expression and found that 4-dmH, but not heliomycin, considerably diminished tNOX protein expression in a concentration-dependent manner (Fig. 7b).

      B) To demonstrate that our results from ligand-binding assays (CETSA) were specific to tNOX, we conducted additional CETSA experiments to exclude PARP and NOX4 as targets of 4-dmH. PARP acts as a DNA damage sensor and is also an NAD+-consuming enzyme, affecting many cellular functions [14]. NOX4 belongs to the NOX family of NADPH oxidases, which mediate electron transport through intracellular membranes, and has also been shown to be involved in tumorigenesis [15, 16]. We show that 4-dmH treatment did not appear to increase the melting temperature of PARP or NOX4, excluding these two proteins as potential targets of 4-dmH (revised Fig. 8c).

      Author response image 1.

      (2) The authors should show whether heliomycin indeed does not induce apoptosis, while 4-dmH cannot induce autophagy.

      We have reorganized and revised our manuscript and figures (Fig. 5 and Fig. 6) to better demonstrate the different cell death pathways associated with heliomycin and 4-dmH. Using flow cytometry, we show that heliomycin, but not 4-dmH, induced autophagy in two oral cancer cell lines (Fig. 5a). In the revision, we moved the analysis of apoptosis by JC-1 staining up to Figure 5 (revised Fig. 5b). We also reorganized the protein analysis to better demonstrate the downregulation of pro-apoptotic Bak and Puma and the lack of caspase 3-directed PARP cleavage, indicating ineffective apoptosis induction by heliomycin (revised Fig. 5c). Similarly, the absence of upregulation of ULK1, Atg5, Atg7, and cleaved LC3-II provided evidence for inadequate autophagy induction by 4-dmH (revised Fig. 5d). Please see the attached revised Figure 5.

      Author response image 2.

      (3) They should demonstrate whether genetic knockdown of tNOX, SirT1, or both tNOX and SirT1 induces apoptosis or autophagy and also reduces malignant properties of oral cancer cells.

      A) In the revision, we conducted additional RNAi-knockdown experiments to understand the role of tNOX in the regulation of apoptosis and autophagy. Our results indicate that tNOX depletion effectively provoked spontaneous apoptosis and autophagy in SAS cells (revised Fig. 7e). However, given that SIRT1 per se is not the focus of the present study and that SIRT1 knockdown has been shown by other groups to increase the apoptotic population [17, 18], we decided not to pursue it further.

      Author response image 3.

      B) In our earlier studies, we have demonstrated that tNOX confers a survival advantage on cancer cells. For example, tNOX deficiency induced by RNA interference in cancer cells abolishes cancer phenotypes, reducing NAD+ production, proliferation, and migration/invasion while increasing apoptosis [19-22]. Conversely, tNOX overexpression in non-cancerous cells stimulates cell growth, decreases doubling time, and enhances cell migration [23-26].

      (4) The authors should examine whether overexpression of SirT1 or tNOX in cells treated with heliomycin or 4-dmH could nullify heliomycin-induced autophagy and 4-dmH-induced apoptosis. Also, instead of overexpressing tNOX, they can supplement NAD into cells treated with 4-dmH.

      A) tNOX overexpression has been used in several previous studies, which demonstrated that tNOX overexpression in non-cancerous cells stimulates cell growth, decreases doubling time, and enhances cell migration [23-26]. However, in our experience, the effect of tNOX overexpression is much less apparent in cancer cells than in non-cancerous cells. Thus, we decided not to study it further, given that our tNOX knockdown results clearly indicate a role for tNOX in the regulation of apoptosis and autophagy.

      B) Since SIRT1 is not the major focus of this present study and SIRT1-overexpression has been shown to reduce stress-mediated apoptosis by other groups [27, 28], we decided not to pursue it further.

      C) The systemic deterioration in NAD+ level has been correlated with many diseases and aging [29-31]. In this regard, NAD+ administration was reported to attenuate doxorubicin-induced apoptosis in the liver of mice, suggesting a protective effect [32]. The administration of nicotinamide riboside (NR), a precursor of NAD+, was also demonstrated to prevent ROS generation and apoptosis in the mouse sepsis models [33]. With data from these animal studies already demonstrating the benefits of NAD+ supplements, we decided not to conduct similar experiments in a cell-based setting.

      (5) Related to Fig. 5C and 6a, the authors should examine the effects of heliomycin and 4-dmH on the cell cycle profiles, Annexin V positivity, and colony formation.

      We added the results of colony-forming assays, which revealed that both compounds exhibited strong growth-suppressive ability against oral cancer cells (revised Fig. 6c). However, the reduction in growth was unlikely to arise from cell cycle arrest mediated by these two compounds (revised Fig. 6d). Because the fluorescence of heliomycin and its derivative may interfere with the measurement, we examined JC-1 staining rather than Annexin V positivity. The apoptotic effect of the compounds is shown in revised Fig. 5b.

      Author response image 4.

      (6) They should also examine whether either or both heliomycin and 4-dmH induce reactive oxygen species (ROS).

      In our previous report, we examined the effects of heliomycin and 4-dmH on oxidative stress using H2DCFDA [34], a dye that fluoresces in the presence of intracellularly generated reactive oxygen species (ROS). We showed that 4-dmH significantly induced ROS generation, whereas no marked ROS generation was observed in cells exposed to heliomycin.

      (7) Related to Fig. 9d, they should mutate amino acid residue(s) in tNOX that are crucial for the 4-dmH-tNOX binding, including Ile 90, Lys98, Pro111, Pro113, Leu115, Pro117, and Pro118, to examine whether these mutants lose the binding to 4-dmH and fail to rescue 4-dmH-induced apoptosis, unlike wild-type tNOX.

      To further evaluate the importance of the consistent interaction residues in the three docked compound-tNOX complexes, the seven interacting residues of tNOX were substituted with alanine or glycine, and the resulting protein structures were simulated. The simulated structures appear slightly different from the original tNOX structure. Overall, the root mean square difference between the original tNOX structure and the structures with residues substituted by alanine or glycine was estimated at 3.339 or 4.024 angstroms (Å), respectively (Fig. S1a). The simulated structures were also used for docking analysis of 4-dmH. This analysis revealed that 4-dmH could bind within the same pocket of the different tNOX structures, but with varying orientations (Fig. S1b). It also suggests that replacing the key residues with alanine or glycine could reduce the binding affinity of 4-dmH for tNOX, with values of -8.2 and -7.6 kcal/mol, respectively. Moreover, substituting the key residues with alanine or glycine also reduces the number of original interacting residues and interaction forces in core moieties of the 4-dmH-tNOX complexes (Fig. S1c and S1d). Together, our experimental results and molecular docking simulations are consistent with the notion that 4-dmH has a higher affinity for tNOX than for SIRT1.

      Author response image 5.

      The simulated tNOX structures (a, b) and the binding modes of 4-dmH after docking study (c, d). (a) Superimposition of three types of tNOX structures, including the original tNOX structure (orange) and the critical residues in tNOX protein substituted with alanine (magenta) or glycine (cyan). The substituted residues were shown as sticks. (b) Superimposition of the docked 4-dmH (blue). (c) Schematic presentations of possible interactions between 4-dmH and the interacted residues in tNOX protein substituted with alanine. (d) Schematic presentations of possible interactions between 4-dmH and the interacted residues in tNOX protein substituted with glycine. The key residues were identified based on the best docking pose of 4-dmH. The red circles and ellipses indicate the identical residues that interacted with different types of tNOX structures.

      (8) Related to Fig. 10a, heliomycin appears to also reduce tNOX levels (although the extent is not as robust as 4-dmH), which is not expected since heliomycin does not bind to tNOX. They should compare the effects of heliomycin and 4-dmH on reducing the protein levels of tNOX. If heliomycin does not change the tNOX protein levels, then they need to discuss why heliomycin reduces tNOX levels in vivo.

      In our previous studies, we have shown that tNOX knockdown partially attenuates SIRT1 expression and represses growth in various cancer cell types, such as lung [22], bladder [20], and stomach [13]. We also observed that tNOX is acetylated/ubiquitinated under certain stresses and SIRT1 depletion affects tNOX expression (data not shown). It is speculated that SIRT1 deacetylates tNOX and modulates its protein stability. Thus, there is a reciprocal regulation between tNOX and SIRT1. Although heliomycin does not bind to tNOX, its inhibition of SIRT1 activity/expression might also have an impact on tNOX expression.

      (9) Related to Fig. 10F, if tNOX is an upstream regulator of SirT1 and both heliomycin and 4-dmH ultimately target SirT1, it is unclear why heliomycin and 4-dmH cause different biological outcomes. One explanation is that tNOX has apoptosis-inhibiting function other than supporting (or independent of) SirT1 and hence 4-dmH-mediated tNOX inhibition causes apoptosis rather than autophagy. They should explain and discuss more about whether tNOX-inhibiting/binding function of 4-dmH is sufficient to explain the different biological outcomes from heliomycin.

      Thank you for this valuable suggestion. Indeed, our earlier studies demonstrated that tNOX deficiency induced by RNA interference in cancer cells abolishes cancer phenotypes, reducing NAD+ production, proliferation, and migration/invasion while increasing apoptosis; thus, tNOX confers a survival advantage on cancer cells [19-22]. Conversely, tNOX overexpression in non-cancerous cells stimulates cell growth, decreases doubling time, and enhances cell migration [23-26]. Given these lines of evidence, we believe that tNOX not only supports SIRT1 but also exerts functions independent of it. The tNOX- and SIRT1-inhibiting function of 4-dmH thus results in biological outcomes different from those of the SIRT1-binding heliomycin.

      (10) They should examine the effects of heliomycin and 4-dmH on cell viability of non-tumor cells to examine their toxicities.

      Using cell impedance measurements, we also examined the effects of heliomycin and 4-dmH on the proliferation of human non-cancerous BEAS-2B cells. Our results demonstrated that heliomycin did not exhibit cytotoxicity toward BEAS-2B cells (revised Fig. 6a). Furthermore, the water-soluble 4-dmH effectively diminished cell proliferation in a dose-dependent manner in oral cancer cells, but much less so in BEAS-2B cells (revised Fig. 6b). Similar results were reported in our previous study, in which 4-dmH displayed much higher IC50 values against non-cancerous human dermal microvascular endothelial HMEC-1 cells than against tumor cells [34].

      Author response image 6.

      (11) They should consistently use either tNOX or ENOX2 to avoid confusion.

      Thank you for the suggestion. We have now consistently used tNOX throughout the manuscript. However, for the revised Figure 7d, the commercially available antibody to ENOX2 (from Proteintech, Rosemont, IL, USA) is different from the one to tNOX (produced in our laboratory) and this is the only place we have used ENOX2 rather than tNOX.

      References

      1) Mouchiroud L, Houtkooper RH, Moullan N, Katsyuba E, Ryu D, Canto C, Mottis A, Jo YS, Viswanathan M, Schoonjans K et al: The NAD(+)/Sirtuin Pathway Modulates Longevity through Activation of Mitochondrial UPR and FOXO Signaling. Cell 2013, 154(2):430-441.

      2) He S, Gao Q, Wu X, Shi J, Zhang Y, Yang J, Li X, Du S, Zhang Y, Yu J: NAD(+) ameliorates endotoxin-induced acute kidney injury in a sirtuin1-dependent manner via GSK-3beta/Nrf2 signalling pathway. J Cell Mol Med 2022, 26(7):1979-1993.

      3) Donmez G: The neurobiology of sirtuins and their role in neurodegeneration. Trends Pharmacol Sci 2012, 33(9):494-501.

      4) Teertam SK, Phanithi PB: Up-regulation of Sirtuin-1/autophagy signaling in human cerebral ischemia: possible role in caspase-3 mediated apoptosis. Heliyon 2022, 8(12):e12278.

      5) Li BY, Peng WQ, Liu Y, Guo L, Tang QQ: HIGD1A links SIRT1 activity to adipose browning by inhibiting the ROS/DNA damage pathway. Cell reports 2023, 42(7):112731.

      6) Bai P, Canto C, Oudart H, Brunyanszki A, Cen Y, Thomas C, Yamamoto H, Huber A, Kiss B, Houtkooper RH et al: PARP-1 inhibition increases mitochondrial metabolism through SIRT1 activation. Cell Metab 2011, 13(4):461-468.

      7) Ma Y, Nie H, Chen H, Li J, Hong Y, Wang B, Wang C, Zhang J, Cao W, Zhang M et al: NAD(+)/NADH metabolism and NAD(+)-dependent enzymes in cell death and ischemic brain injury: current advances and therapeutic implications. Curr Med Chem 2015, 22(10):1239-1247.

      8) Fulco M, Schiltz RL, Iezzi S, King MT, Zhao P, Kashiwaya Y, Hoffman E, Veech RL, Sartorelli V: Sir2 regulates skeletal muscle differentiation as a potential sensor of the redox state. Mol Cell 2003, 12(1):51-62.

      9) Yang Y, Liu Y, Wang Y, Chao Y, Zhang J, Jia Y, Tie J, Hu D: Regulation of SIRT1 and Its Roles in Inflammation. Front Immunol 2022, 13:831168.

      10) Tikhomirov AS, Shchekotikhin AE, Lee YH, Chen YA, Yeh CA, Tatarskiy VV, Jr., Dezhenkova LG, Glazunova VA, Balzarini J, Shtil AA et al: Synthesis and Characterization of 4,11-Diaminoanthra[2,3-b]furan-5,10-diones: Tumor Cell Apoptosis through tNOX-Modulated NAD(+)/NADH Ratio and SIRT1. Journal of medicinal chemistry 2015, 58(24):9522-9534.

      11) Chang CF, Islam A, Liu PF, Zhan JH, Chueh PJ: Capsaicin acts through tNOX (ENOX2) to induce autophagic apoptosis in p53-mutated HSC-3 cells but autophagy in p53-functional SAS oral cancer cells. Am J Cancer Res 2020, 10(10):3230-3247.

      12) Lin CY, Islam A, Su CJ, Tikhomirov AS, Shchekotikhin AE, Chuang SM, Chueh PJ, Chen YL: Engagement with tNOX (ENOX2) to Inhibit SIRT1 and Activate p53-Dependent and -Independent Apoptotic Pathways by Novel 4,11-Diaminoanthra[2,3-b]furan-5,10-diones in Hepatocellular Carcinoma Cells. Cancers (Basel) 2019, 11(3).

      13) Chen HY, Cheng HL, Lee YH, Yuan TM, Chen SW, Lin YY, Chueh PJ: Tumor-associated NADH oxidase (tNOX)-NAD+-sirtuin 1 axis contributes to oxaliplatin-induced apoptosis of gastric cancer cells. Oncotarget 2017, 8(9):15338-15348.

      14) Xu Q, Liu X, Mohseni G, Hao X, Ren Y, Xu Y, Gao H, Wang Q, Wang Y: Mechanism research and treatment progress of NAD pathway related molecules in tumor immune microenvironment. Cancer Cell Int 2022, 22(1):242.

      15) Brandes RP, Weissmann N, Schroder K: Nox family NADPH oxidases: Molecular mechanisms of activation. Free Radic Biol Med 2014, 76:208-226.

      16) Gong S, Wang S, Shao M: NADPH Oxidase 4: A Potential Therapeutic Target of Malignancy. Front Cell Dev Biol 2022, 10:884412.

      17) Wang Y, Sui Y, Niu Y, Liu D, Xu Q, Liu F, Zuo K, Liu M, Sun W, Wang Z et al: PBX1-SIRT1 Positive Feedback Loop Attenuates ROS-Mediated HF-MSC Senescence and Apoptosis. Stem Cell Rev Rep 2023, 19(2):443-454.

      18) Wang X, Lu Y, Tuo Z, Zhou H, Zhang Y, Cao Z, Peng L, Yu D, Bi L: Role of SIRT1/AMPK signaling in the proliferation, migration and invasion of renal cell carcinoma cells. Oncol Rep 2021, 45(6).

      19) Liu SC, Yang JJ, Shao KN, Chueh PJ: RNA interference targeting tNOX attenuates cell migration via a mechanism that involves membrane association of Rac. Biochem Biophys Res Commun 2008, 365(4):672-677.

      20) Lin MH, Lee YH, Cheng HL, Chen HY, Jhuang FH, Chueh PJ: Capsaicin Inhibits Multiple Bladder Cancer Cell Phenotypes by Inhibiting Tumor-Associated NADH Oxidase (tNOX) and Sirtuin1 (SIRT1). Molecules 2016, 21(7).

      21) Cheng HL, Lee YH, Yuan TM, Chen SW, Chueh PJ: Update on a tumor-associated NADH oxidase in gastric cancer cell growth. World J Gastroenterol 2016, 22(10):2900-2905.

      22) Lee YH, Chen HY, Su LJ, Chueh PJ: Sirtuin 1 (SIRT1) Deacetylase Activity and NAD(+)/NADH Ratio Are Imperative for Capsaicin-Mediated Programmed Cell Death. J Agric Food Chem 2015, 63(33):7361-7370.

      23) Islam A, Su AJ, Zeng ZM, Chueh PJ, Lin MH: Capsaicin Targets tNOX (ENOX2) to Inhibit G1 Cyclin/CDK Complex, as Assessed by the Cellular Thermal Shift Assay (CETSA). Cells 2019, 8(10).

      24) Su YC, Lin YH, Zeng ZM, Shao KN, Chueh PJ: Chemotherapeutic agents enhance cell migration and epithelial-to-mesenchymal transition through transient up-regulation of tNOX (ENOX2) protein. Biochim Biophys Acta 2012, 1820(11):1744-1752.

      25) Zeng ZM, Chuang SM, Chang TC, Hong CW, Chou JC, Yang JJ, Chueh PJ: Phosphorylation of serine-504 of tNOX (ENOX2) modulates cell proliferation and migration in cancer cells. Experimental cell research 2012, 318(14):1759-1766.

      26) Chueh PJ, Wu LY, Morre DM, Morre DJ: tNOX is both necessary and sufficient as a cellular target for the anticancer actions of capsaicin and the green tea catechin (-)-epigallocatechin-3-gallate. Biofactors 2004, 20(4):235-249.

      27) Ran D, Zhou D, Liu G, Ma Y, Ali W, Yu R, Wang Q, Zhao H, Zhu J, Zou H et al: Reactive Oxygen Species Control Osteoblast Apoptosis through SIRT1/PGC-1alpha/P53(Lys382) Signaling, Mediating the Onset of Cd-Induced Osteoporosis. J Agric Food Chem 2023.

      28) Zhang Z, Chen X, Liu S: Role of Sirtuin-1 in Neonatal Hypoxic-Ischemic Encephalopathy and Its Underlying Mechanism. Med Sci Monit 2020, 26:e924544.

      29) McReynolds MR, Chellappa K, Baur JA: Age-related NAD(+) decline. Exp Gerontol 2020, 134:110888.

      30) Xie N, Zhang L, Gao W, Huang C, Huber PE, Zhou X, Li C, Shen G, Zou B: NAD(+) metabolism: pathophysiologic mechanisms and therapeutic potential. Signal Transduct Target Ther 2020, 5(1):227.

      31) Zapata-Perez R, Wanders RJA, van Karnebeek CDM, Houtkooper RH: NAD(+) homeostasis in human health and disease. EMBO Mol Med 2021, 13(7):e13943.

      32) Wang B, Ma Y, Kong X, Ding X, Gu H, Chu T, Ying W: NAD(+) administration decreases doxorubicin-induced liver damage of mice by enhancing antioxidation capacity and decreasing DNA damage. Chem Biol Interact 2014, 212:65-71.

      33) Hong G, Zheng D, Zhang L, Ni R, Wang G, Fan GC, Lu Z, Peng T: Administration of nicotinamide riboside prevents oxidative stress and organ injury in sepsis. Free Radic Biol Med 2018, 123:125-137.

      34) Nadysev GY, Tikhomirov AS, Lin MH, Yang YT, Dezhenkova LG, Chen HY, Kaluzhny DN, Schols D, Shtil AA, Shchekotikhin AE et al: Aminomethylation of heliomycin: Preparation and anticancer characterization of the first series of semi-synthetic derivatives. European journal of medicinal chemistry 2018, 143:1553-1562.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer 1:

      (1) General comment: The evidence for these highly novel, potentially interesting roles (of the exocyst) would need to be more compelling to support direct involvement.

      We wish to thank the reviewer for his/her comments, and for considering that the proposed functions are highly novel and potentially interesting. To strengthen the evidence supporting the new roles of the exocyst, we have performed a number of additional experiments that are depicted in novel figures or figure panels of the new version of the manuscript. Particularly, we aimed at providing further support of the direct involvement of the exocyst in different steps of the regulated secretory pathway. Please see the details below.

      (2) For instance, the localization of exocyst to Golgi or to granule-granule contact sites does not seem substantial.

      We have performed quantitative colocalization studies, as suggested by the reviewer, to further substantiate our initial findings. We have carefully analysed GFP-Sec15 distribution in relation to the Golgi complex and secretory Glue granules at relevant time points of salivary gland development. Overall, we found that GFP-Sec15 distribution is dynamic during salivary gland development. Before Glue synthesis (72 h AEL), Sec15 was observed in close association (defined as a distance equal to or less than 0.6 µm) with the Golgi complex (please see Author response image 1 below). This association was lost once Glue granules had begun to form (96 h AEL). Importantly, we do not see relevant association between GFP-Sec15 and the ER (please see Author response image 2). These observations support our conclusion that the exocyst plays a role at the Golgi complex. New images supporting these conclusions, as well as quantitative data, have been included in Figure 5 of the new version of the manuscript. In addition, real-time imaging and 3D reconstruction analyses confirming the close association between Sec15 and Golgi cisternae are now included in the manuscript (please see Supplementary Videos 1-3). These new data are described in lines 200-210 of the Results section and lines 359-368 of the Discussion section.

      Interestingly, at the time when Sec15-Golgi association is lost (96 h AEL), Sec15 foci associate instead with newly formed secretory granules (< 1µm diameter). This association persists during secretory granule maturation (100-116 h AEL), when Sec15 foci localize specifically in between neighbouring, immature secretory granules. When maturation has ended and Glue granule exocytosis begins (116-120 h AEL), this localization between granules is lost. These observations are consistent with a role of the exocyst in homotypic fusion during SG maturation. We have included new images showing that association between Sec15 and secretory granules is dynamic and depends on the developmental stage. We have quantified this association both during maturation and at a stage when SGs are already mature. We have in addition performed a 3D reconstruction analysis of these images to confirm the close association between Sec15 and immature SGs. These new data are now depicted in Figure 7BC, Supplementary Videos 4-5, and described in text lines 216-221 of the Results section. In addition, a lower magnification image is provided below in this letter (Author response image 3), quantifying the proportion of Sec15 foci localized in between SGs (yellow arrows) relative to the total number of Sec15 foci (yellow arrows + green arrowheads).

      Author response image 1.

      Criteria utilized to define Sec15 foci that were “associated” or “not associated” with the trans-Golgi network in the experiments of Figure 5C-E of the manuscript. When the distance between the maximal intensities of the GFP-Sec15 and Golgi-RFP signals was equal to or less than 0.6 µm, the signals were considered “associated” (upper panels). When the distance was more than 0.6 µm, the signals were considered “not associated” (lower panels).

      Author response image 2.

      Criteria utilized to define Sec15 foci that were “associated” or “not associated” with the ER in the experiments of Figure 5A-B of the manuscript. When the distance between the maximal intensities of the GFP-Sec15 and KDEL-RFP signals was equal to or less than 0.6 µm, the signals were considered “associated”. When the distance was more than 0.6 µm, the signals were considered “not associated”.
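      The distance criterion used for the Golgi and ER comparisons can be illustrated with a short sketch. It assumes that the positions of the intensity maxima have already been extracted as coordinates in micrometres; the function names and the input layout are hypothetical and do not describe the image-analysis software actually used. Only the 0.6 µm cutoff comes from the response.

```python
import math

ASSOCIATION_CUTOFF_UM = 0.6  # cutoff stated in the response (micrometres)


def is_associated(focus_peak_um, organelle_peak_um,
                  cutoff_um=ASSOCIATION_CUTOFF_UM):
    """Classify one Sec15 focus against one organelle signal.

    Both arguments are coordinates (x, y) or (x, y, z), in micrometres,
    of the maximal intensity of the respective fluorescence signal.
    """
    return math.dist(focus_peak_um, organelle_peak_um) <= cutoff_um


def fraction_associated(peak_pairs, cutoff_um=ASSOCIATION_CUTOFF_UM):
    """Fraction of (focus, organelle) peak pairs that count as associated."""
    if not peak_pairs:
        raise ValueError("no peak pairs to score")
    hits = sum(is_associated(f, o, cutoff_um) for f, o in peak_pairs)
    return hits / len(peak_pairs)
```

      How each focus is paired with an organelle signal (for example, with its nearest neighbour) is left open here, because the response does not specify it.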

      Author response image 3.

      (A, B) GFP-Sec15 foci (cyan) and SGs (red) in cells bearing immature SGs (A) or mature SGs (B). Yellow arrows indicate GFP-Sec15 foci localized in between SGs; green arrowheads indicate GFP-Sec15 foci that are not in between SGs. (C) Quantification of the percentage (%) of Sec15 foci localized in between SGs relative to the total number of Sec15 foci in cells filled with immature SGs (ISG) vs cells with mature SGs (MSG).

      It is worth mentioning that previous evidence from mammalian cultured cells (Yeaman et al, 2001) shows that the exocyst localizes both at the trans-Golgi network and at the plasma membrane, weighing in favour of our claim that the exocyst is required at various steps of the exocytic pathway. Thus, the exocyst may play multiple roles in the secretory pathway in other biological models as well. This concept has now been included in the Discussion section of the revised version of the manuscript (lines 359-368).

      To make the conclusions of our work clearer, in the revised version of the manuscript, we have now included a graphical abstract, summarizing the dynamic localization of the exocyst in relation to the processes of SG biogenesis, maturation and exocytosis reported in our work. 

      (3) Instead, it is possible that defects in Golgi traffic and granule homotypic fusion are not due to direct involvement of the exocyst in these processes, but secondary to a defect in canonical exocyst roles at the plasma membrane. A block in the last step of glue exocytosis could perhaps propagate backward in the secretory pathway to disrupt Golgi complexes or cause poor cellular health due to loss of cell polarity or autophagy.

      We thank the reviewer for these thoughtful comments. We have performed a number of additional experiments to assess “cellular health” or to identify possible defects in cell polarity after knock-down of exocyst subunits. These new data have been included in new supplementary figures 5 and 6 of the revised version of the manuscript (please see below). 

      In our view, the precise localization of GFP-Sec15 at the Golgi complex (Figure 5C-E), as well as in between immature secretory granules (Figure 7B-D), argues in favour of a direct involvement of the exocyst in SG biogenesis and homofusion respectively. 

      We truly appreciate the reviewer's comment raising the possibility that the defects we observe at early steps of the pathway (SG biogenesis and SG maturation) may actually stem from a backward effect of the role of the exocyst in SG-plasma membrane tethering. We wish to respectfully point out that the processes of biogenesis, maturation and plasma membrane tethering/fusion of SGs do not occur simultaneously in the Drosophila larval salivary gland in vivo, as they do in other secretory model systems (e.g. cell culture). In this regard, the experimental model is unique in terms of synchronization. In each cell of the salivary gland, the three processes (biogenesis, maturation and exocytosis) occur sequentially and are controlled by developmental cues. By the developmental stage at which SGs fuse with the plasma membrane, SG biogenesis has ceased many hours earlier: SG biogenesis occurs at 96-100 hours after egg lay (AEL), SG maturation takes place at 100-112 hours AEL, and SG-plasma membrane fusion happens only when all SGs have undergone maturation and are ready to fuse with the plasma membrane, at 116-120 h AEL. Thus, in our view, it is not conceivable that a defect in SG-plasma membrane tethering/fusion (116-120 h AEL) could act backwards on the processes of SG biogenesis or SG maturation, which occurred earlier in development (96-112 h AEL).

      As suggested by the reviewer, we have analysed several markers of cellular health and cell polarity, comparing conditions of exocyst subunit silencing (exo70RNAi, sec3RNAi or exo84RNAi) with wild type controls (whiteRNAi). These new data are depicted in Supplementary Figures 5 and 6, and described in lines 172-179 of the Results section of the revised version of the manuscript. Noteworthy, for these experiments we have applied silencing conditions that block secretory granule maturation, bringing about mostly immature SGs. Our analyses included: 1) Subcellular distribution of PI(4,5)P2, 2) subcellular distribution of the tetraspanin CD63, 3) of Rab11, 4) of filamentous actin, and 5) of CD8. We have also compared 6) nuclear size and nuclear general morphology, 7) the number and distribution of mitochondria, 8) morphology and subcellular distribution of the cis- and 9) trans-Golgi networks. Finally, 10) we have compared basal autophagy in salivary cells with or without knocking down exocyst subunits. The markers that we have analysed behaved similarly to those of control salivary glands, suggesting that the observed defects in regulated exocytosis indeed reflect different roles of the exocyst in the secretory pathway, rather than poor cellular health or impaired cell polarity.  

      Our conclusions are in line with previous studies in which apico-basal polarity, Golgi complex morphology and distribution, as well as apical membrane trafficking were also evaluated in exocyst mutant backgrounds, finding no anomalies (Jafar-Nejad et al, 2005). 

      Conversely, in studies in which apical polarity was disturbed by interfering with Crumbs levels, SG biogenesis, maturation and exocytosis were not affected (Lattner et al, 2019), indicating that these processes do not necessarily interfere with one another.

      (4) Final recommendation: In the absence of stronger evidence for these other exocyst roles, I would suggest focusing the study on the canonical role (interesting, as it was previously reported that Drosophila exocyst had no function in the salivary gland and limited function elsewhere [DOI: 10.1034/j.1600-0854.2002.31206.x]), and leave the alternative roles for discussion and deeper study in the future.  

      We appreciate the reviewer's recommendation. However, we believe that the major strength of our work is the discovery of non-canonical roles of the exocyst complex, unrelated to its function as a tethering complex for vesicle-plasma membrane fusion. We believe that in the new version of our manuscript, we provide stronger evidence supporting the two novel roles of the exocyst:

      a) Its participation in maintaining the normal structure of the Golgi complex, and b) Its function in secretory granule maturation.

      Reviewer 2:

      (5) General comment: A key strength is the breadth of the assays and study of all 8 exocyst subunits in a powerful model system (fly larvae). Many of the assays are quantitated and roles of the exocyst in early phases of granule biogenesis have not been ascribed. 

      We are grateful that the reviewer appreciates the novelty of our contribution.

      (6) However there are several weaknesses, both in terms of experimental controls, concrete statements about the granules (better resolution), and making a clear conceptual framework. Namely, why do KD of different exocysts have different effects on presumed granule formation?

      The reviewer has raised a point that is central to the interpretation of all our data throughout the manuscript. The short answer is that the extent of RNAi-dependent silencing of exocyst subunits determines the phenotype: 

      1) Maximum silencing affects Golgi complex morphology and prevents SG biogenesis. 2) Intermediate silencing blocks SG maturation, without affecting Golgi complex morphology and SG biogenesis. 3) Weak silencing blocks SG tethering and fusion with the plasma membrane, without affecting Golgi complex morphology, SG biogenesis or SG maturation. 

      In other words, 1) Low levels of exocyst subunits are sufficient for normal Golgi complex morphology and SG biogenesis. 2) Intermediate levels of exocyst subunits are sufficient for SG maturation (and also sufficient for SG biogenesis). 3) High levels of exocyst subunits are required for SG tethering and subsequent fusion with the plasma membrane. 
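
      The threshold logic described above can be sketched in a few lines of code. This is only an illustration of the reasoning: the three cut-off values are hypothetical placeholders chosen for the example, not quantities measured in the study, and the function name is ours.

```python
# Illustrative thresholds only (hypothetical values, not measured in the study).
# Each value is a fraction of remaining exocyst level relative to wild type.
BIOGENESIS_MIN = 0.1   # assumed: Golgi morphology / SG biogenesis need only low levels
MATURATION_MIN = 0.4   # assumed: SG maturation needs intermediate levels
TETHERING_MIN = 0.7    # assumed: SG tethering/fusion needs high levels


def predicted_phenotype(remaining: float) -> str:
    """Return the earliest step of the pathway expected to be blocked."""
    if not 0.0 <= remaining <= 1.0:
        raise ValueError("remaining must be a fraction between 0 and 1")
    if remaining < BIOGENESIS_MIN:
        return "impaired Golgi morphology and SG biogenesis"
    if remaining < MATURATION_MIN:
        return "impaired SG maturation"
    if remaining < TETHERING_MIN:
        return "impaired SG tethering/fusion"
    return "wild type"


# Strong, intermediate, weak and no silencing, in that order.
for level in (0.05, 0.25, 0.55, 0.9):
    print(f"{level:.2f} -> {predicted_phenotype(level)}")
```

      Because the checks run from the earliest step to the latest, a cell is assigned only the earliest blocked step, which mirrors the argument that later defects cannot be scored when SGs never form.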

      Based on the above notion, we have exploited the fact that temperature can fine-tune the level of Gal4/UAS-dependent transcription, thereby achieving different levels of silencing, as shown by Norbert Perrimon et al in their seminal paper “the level of RNAi knockdown can also be altered by using Gal4 lines of various strengths, rearing flies at different temperatures, or via coexpression of UAS-Dicer2” (Perkins et al, 2015). 

      We found in our system that indeed, by applying appropriate silencing conditions (RNAi line and temperature) to any of the eight subunits of the exocyst, we have been able to obtain one of the three alternative phenotypes: Impaired SG biogenesis, or impaired SG maturation, or impaired SG tethering/fusion with the plasma membrane.

      These concepts are summarized below in Author response image 4. Please see also at point 26, the general comment of Reviewer #3. 

      We have conducted qRT-PCR assays to provide experimental support to the notions summarized above in Author response image 4. We measured the remaining levels of mRNAs of some of the exocyst subunits, after inducing RNAi-mediated silencing at different temperatures, or with different RNAi transgenic lines. The remaining RNA levels after silencing correlate well with the observed phenotypes, following the predictions of Author response image 4 and summarized in Author response image 5. These new data are now shown in Supplementary Figure 2 of the revised version of the manuscript, and described in lines 153-159 at the Results section.
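
      For readers unfamiliar with how remaining mRNA levels are typically derived from qRT-PCR, the sketch below uses the common 2^-ΔΔCt calculation. This is an assumption made for illustration: the response does not state which quantification method was used, the Ct values are invented, and the calculation assumes roughly 100% amplification efficiency.

```python
def relative_expression(ct_target_kd: float, ct_ref_kd: float,
                        ct_target_ctrl: float, ct_ref_ctrl: float) -> float:
    """Remaining target mRNA in knockdown relative to control (2^-ddCt)."""
    # Normalise the target gene to a reference gene within each sample.
    delta_kd = ct_target_kd - ct_ref_kd
    delta_ctrl = ct_target_ctrl - ct_ref_ctrl
    # Compare knockdown to control; one extra cycle means half the template.
    return 2.0 ** -(delta_kd - delta_ctrl)


# Invented Ct values: the target crosses threshold two cycles later after
# silencing, while the reference gene is unchanged.
remaining = relative_expression(26.0, 18.0, 24.0, 18.0)
print(f"remaining mRNA: {remaining:.2f} of control")
```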

      (7) Why does just overexpression of a single subunit (Sec15) induce granule fusion?

      The reviewer raises a very important point. Based on available data from the literature, Sec15 behaves as a seed for assembly of the holocomplex and it also mediates the recruitment of the holocomplex to SGs through its interaction with Rab11 (Escrevente et al, 2021; Bhuin and Roy, 2019; Wu et al, 2005; Zhang et al, 2004; Guo et al, 1999). Thus, overexpression of Sec15 is expected to enhance exocyst assembly, thereby potentiating the activities carried out by the complex in the cell, including SG homofusion. In the revised version of the manuscript we have also performed the overexpression of Sec8, finding that, unlike Sec15, Sec8 fails to induce homotypic fusion. These results were expected, as they confirm that Sec8 does not behave as a seed for mounting the whole complex. These new data have been included in Figure 7E-H, and are described in text lines 221-229 of the Results section. 

      Author response image 4.

      Conceptual model of RNAi expression at different temperatures, remaining mRNA/protein levels and phenotypes obtained at each temperature.

      Author response image 5.

      qRT-PCR assays presented in Supplementary Figure 2 are shown in combination with the phenotypes observed at each of the conditions analyzed. Note the correlation between phenotypes and the extent of mRNA downregulation.

      (8) While the paper is fascinating, the major comments need to be addressed to really be able to make better sense of this work, which at present is hard to disentangle direct vs. secondary effects, especially as much of the TGN seems to be altered in the KDs.  

      We hope that our response to point 6) has helped to clarify this important point raised by the Reviewer. After applying silencing conditions where normal structure of the trans-Golgi network is impaired, SG biogenesis does not occur. Thus, since SGs do not form, it is not conceivable to detect defects in SG maturation or SG fusion with the plasma membrane in the same cell.

      (9) The authors conveniently ascribe many of the results to the holocomplex, but their own data (Fig. 4 and Fig. 6) are at odds with this.

      This is another central point of our work, so we thank the reviewer for his/her comment. In Figures 4A, 7A and 9A of the revised version of the manuscript, we show that, by inducing appropriate levels of silencing of any of the 8 subunits of the exocyst, each of the three alternative phenotypic manifestations can occur. In our opinion, this argues in favour of a function for the whole exocyst complex in each of the three specific activities proposed in our study: 1) SG biogenesis, 2) SG maturation, and 3) SG tethering/fusion with the plasma membrane. In detailed characterizations of these three phenotypes performed throughout the study, we decided to induce silencing of just two or three of the subunits of the exocyst, assuming that the whole complex accounts for the mechanisms involved.

      Major comments

      (10) Resolution not sufficient. Identification of "mature secretory granules" (MSG) in Fig. 3 is based on low-resolution images in which the MSG are not clearly seen (see control in Fig. 3A) and rather appear as a diffuse haze, and not as clear granules. There may be granules here, but as shown it is not clear. Thus it would be helpful to acquire images at higher resolution (at the diffraction limit, or higher) to see and count the MSG.

      We thank the reviewer for raising this point, as it may not be straightforward for the reader to identify the SGs throughout the figures of our study. To make it clearer, in Figure 3A (magnified insets on the right), we have delimited individual SGs with a green dotted line, and included diagrams (far right), which we hope will help the identification of SGs. In Figure 3B, we show that after silencing Sec84, a mosaic phenotype was observed: In some cells SGs fail to undergo maturation, and remain smaller than normal. In other cells of this mosaic phenotype, biogenesis of SGs was impaired and the fluorescent cargo remained trapped in a mesh-like structure (which we later show corresponds to the ER). The dotted line marks individual SGs, and the diagrams included on the right are intended to help the interpretation of the phenotype. The mesh-like structures where Sgs3-GFP was retained are also marked with a dotted line, and schematized on the right. These new schemes are described in the Figure 3 caption of the revised version of the manuscript.

      We wish to mention that all the confocal images depicted in this figure and throughout the manuscript have been captured at high resolution, with a theoretical resolution limit of 168-177 nm (d = λ/2NA). Given that secretory granules range from 0.8-7 µm in diameter, the resolution is more than sufficient to clearly resolve these structures.
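
      The Abbe relation d = λ/(2·NA) quoted above can be checked with a short calculation. The numerical aperture of 1.4 is the value given for the oil objective in the response to point 11; the three wavelengths are our assumption, picked to bracket typical GFP excitation, and are not stated in the letter.

```python
def abbe_limit_nm(wavelength_nm: float, numerical_aperture: float) -> float:
    """Lateral diffraction limit d = wavelength / (2 * NA), in nanometres."""
    if wavelength_nm <= 0 or numerical_aperture <= 0:
        raise ValueError("wavelength and numerical aperture must be positive")
    return wavelength_nm / (2.0 * numerical_aperture)


NA = 1.4  # oil objective reported in the response to point 11

# Assumed wavelengths (nm); 488 nm is a common GFP excitation line.
for wavelength in (470.0, 488.0, 496.0):
    limit = abbe_limit_nm(wavelength, NA)
    print(f"{wavelength:.0f} nm -> d = {limit:.0f} nm")
```

      Under these assumptions the limit falls between about 168 and 177 nm, so even the smallest granules mentioned (0.8 µm) are roughly 4.5 times larger than the diffraction limit.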

      (11) Note: the authors are not clear on which objective was used. Maybe the air objective as the resolution appears poor).  

      In this particular figure, we have utilized a Plan-Apochromat 63X/1.4NA oil objective of the inverted Carl Zeiss LSM 880 confocal microscope (mentioned in materials and methods).

      (12) They need to prove that the diffuse Sgs3-GFP haze is indeed due to MSG.  

      If we interpret the reviewer's concern correctly, what he/she calls “diffuse haze” is actually the distribution of Sgs3-GFP within individual SGs, which, as previously reported by other authors, is not homogeneous at this stage (Syed et al. 2022). We hope that the diagrams that we have included in Figure 3 A, B (point 10) will help readers interpret the images.

      (13) Related it is unclear what are the granule structures that correspond to Immature secretory granules (ISG) and cells with mesh-like structures (MLS)?

      We are confident that the diagrams now included in Figure 3A and B will help the interpretation, and particularly to identify immature granules and the mesh-like structure generated after silencing of exocyst subunits.

      (14) Similarly, Sgs3 images of KD of 8 exocyst subunits were interpreted to be identical, in Fig. 4, but the resolution is poor.

      We hope that the issue related to resolution of our images has been properly addressed in the response to point 10) of this letter. In Figure 4A, we show that after silencing of any of the 8 subunits (with the appropriate conditions), in all cases SG biogenesis was impaired, and Sgs3-GFP was instead retained in a mesh-like structure. Images obtained after silencing different exocyst subunits are of course not identical, but in all cases, a mesh-like structure has replaced the formation of SGs (Figure 4A). Hopefully, the diagrams now included in Figure 3A and B help the correct interpretation of the phenotypes throughout the study.

      To demonstrate that the structure in which Sgs3-GFP was retained upon exocyst complex knockdown corresponds to the ER, we performed a colocalization analysis between Sgs3-GFP and the ER markers GFP-KDEL or Bip-sfGFP-HDEL, after which we calculated the Pearson's coefficient, which indicated substantial colocalization (Figure 4B-G and Supplementary Figures 7 and 8). These new data are described in lines 196-199 of the revised version of the manuscript. To facilitate the visualization of the results, in the revised version of the manuscript we have included magnified cropped areas of the images shown in Figure 4A.

      (15) What is remarkable is a highly variable effect of different subunit KD on the percentage of cells with MLS (Fig. 4C). Controls = 100 %, Exo70=~75% (at 19 deg), Sec3 = ~30%, Sec10 = 0%, Exo84 = 100% ... This is interesting for the functional exocyst is an octameric holocomplex, thus why the huge subunit variability in the phenotypes? The trivial explanation is either: i) variable exocyst subunit KD (not shown) or ii) variability between experiments (no error bars are shown). Both should be addressed by quantification of the KD of different proteins and secondly by replicating the experiments.

      We agree with the reviewer's statement. We believe that both variability of KD efficiency (i) and variability between experiments (ii) contribute to the variable effect observed after knocking down the different subunits. As detailed in the response to point 6), we have performed qRT-PCR determinations to confirm that the severity of the phenotype depends on the efficiency of RNAi-mediated silencing. We chose to analyse in detail the effect on the subunits exo70 and sec3, which were those with the highest phenotypic differences between the three silencing temperatures utilized. We found that, as expected, the levels of silencing were temperature-dependent, being higher at 29°C and lower at 19°C. These data were included in Supplementary Figure 2, described in lines 153-159 of the Results section, and also summarized in Author response images 4 and 5 of this rebuttal letter.

      We thank the reviewer for his/her comment on the replication of experiments and statistics. We failed to include detailed numerical information in the original submission, such as the number of replicas and standard deviations of the data depicted in Figure 3C and Supplementary Figure 1, so we apologize for this omission. In the revised version of the manuscript, we have included a table (Supplementary Table 3) in which all the raw data of Figure 3C and Supplementary Figure 1, including standard deviations, are now depicted.

      (16) If their data holds up then the underlying mechanism here needs to be considered.

      (Note: there is some precedent from the autophagy field of differential exocyst effects)

      Our proposed mechanism is essentially that the holocomplex is required for multiple processes along the secretory pathway. Each of these actions (Golgi structure maintenance, SG maturation and SG tethering/fusion with the plasma membrane) requires different amounts of holocomplex activity, which is why each phenotype manifests at different levels of RNAi-mediated silencing (Author response image 4 of this letter). The model predicts that Golgi structure maintenance requires minimal levels of complex activity, and that is why strong knock-down of exocyst subunits is required to obtain this phenotype. In line with our results, it has been reported that other tethering complexes of the CATCHR family are also required for maintaining Golgi cisternae stuck together (D'Souza et al, 2020; Khakurel and Lupashin, 2023; Liu et al, 2019). One possibility is that the exocyst may play a redundant role in the maintenance of the normal structure of the Golgi complex, along with other CATCHR complexes. This potential redundancy could explain why severe exocyst knock-down is required to observe structural anomalies at this organelle. On the other end of the spectrum, we propose that tethering/fusion with the plasma membrane is very susceptible to even slight reduction of complex activity, so that mild RNAi-mediated silencing is sufficient to provoke defects in this process. This proposed model is depicted in Author response image 4 and discussed in lines 395-405 of the Discussion section.

      (17) In the salivary glands the authors state that the exocyst is needed for Sgs3-GFP exit from the ER. First, Pearson's coefficient should be shown so as to quantitate the degree of ER localizations of all KDs.

      We thank the reviewer for this comment, which helped us to strengthen the observation that when SG biogenesis is impaired, Sgs3-GFP remains trapped in the ER. In the revised version of the manuscript, we have calculated Pearson's coefficient to assess colocalization between ER markers (GFP-KDEL or Bip-sfGFP-HDEL) and Sgs3-GFP in salivary gland cells that express sec15RNAi. The Pearson's coefficient was around 0.6 for both ER markers, indicating that colocalization with Sgs3-GFP was substantial (Supplementary Figure 8, text lines 196-199 of the Results section).
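
      As a reminder of what the coefficient measures, the sketch below computes Pearson's r between two lists of pixel intensities, one per channel. It is a generic illustration only: the intensity values are invented, and it does not reproduce the image-analysis software or settings used for the manuscript.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between paired pixel intensities of two channels."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long sequences with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant channel has no defined correlation")
    return cov / math.sqrt(var_x * var_y)


# Invented intensities for the same five pixels in the two channels.
cargo_channel = [10, 20, 30, 40, 50]
er_channel = [12, 18, 33, 39, 52]
print(f"Pearson r = {pearson(cargo_channel, er_channel):.3f}")
```

      Values close to 1 indicate that the two signals rise and fall together across pixels, values near 0 indicate no linear relationship, and negative values indicate mutual exclusion.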

      (18) Second, there should be some rescue performed (if possible) to support specificity. 

      As suggested by the reviewer, we have performed a rescue experiment of the phenotype provoked by the expression of sec15 RNAi, which consisted of the retention of Sgs3-GFP in the endoplasmic reticulum: Expression of Sec15-GFP substantially reverted the ER retention phenotype, rescuing SG biogenesis and also SG maturation in most cells (over 60% of the cells). These new data are now shown in Supplementary Figure 4, and described in lines 168-171 of the Results section.

      (19) Third, importantly other proteins that should traffic to the PM need to be shown to traffic normally so as to rule out a non-specific effect.

      We have addressed this issue (also mentioned by Reviewer #1), by analyzing the localization of a number of polarization markers, finding that the overall polarization of the cell was not affected by loss of function of exocyst subunits. Please, see our response to the point 3) raised by Reviewer #1. The new data showing cell polarization markers are shown in Supplementary Figure 6 of the revised version of the manuscript, and described on text lines 172-179 of the Results section.

      (20) It is unclear from their model (Fig. 5) why after exocyst KD of Sec15 the cis-Golgi is more preserved than the TGN, which appears as large vacuoles. This is not quantitated and not shown for the 8 subunits.

      We thank the reviewer for this relevant comment. We agree that the phenotype of either sec15 or sec3 loss-of-function cells manifests differently with cis-Golgi and trans-Golgi markers. While the cis-Golgi marker looked fragmented and aggregated, the trans-Golgi marker adopted a swollen appearance. However, in our view, the different appearance of the two markers does not necessarily imply that one compartment is more preserved than the other. In the revised version of the manuscript, we have quantified the penetrance of the phenotypes provoked by sec15 or sec3 silencing, using both cis-Golgi and trans-Golgi markers. In both cases, the penetrance was high, although even higher with the trans-Golgi marker. These new data are now depicted in Supplementary Figure 9 of the revised version of the manuscript.

      It is interesting to mention that in HeLa cells, as well as in the retinal epithelial cell line hTERT, Golgi phenotypes similar to those we have described here have been reported after loss-of-function of other tethering complexes, which were shown to maintain the Golgi cisternae stuck together, including the COG and GARP complexes (D'Souza et al, 2020; Khakurel and Lupashin, 2023; Liu et al, 2019). As elsewhere in our work, not every aspect of the analysis included the silencing of all eight subunits. In this case, we chose to silence Sec3 and Sec15. Please note that we have modified the model depicted in Figure 6E-F, to highlight the cis- and trans-Golgi phenotypes upon exocyst knock-down, as well as the localization of the exocyst in cisternae of the Golgi complex.

      (21) Acute/Chronic control: It would be nice to acutely block the exocyst so as to better distinguish if the effects observed are primary or secondary effects (e.g. on a recycling pathway).

      We thank the reviewer for raising this important issue. To address this point, and to be able to induce silencing of exocyst subunits at specific time intervals of larval development, we utilized a strategy based on a thermosensitive variant of the Gal4 inhibitor Gal80 (Gal80ts) (Lee and Luo, 1999). We blocked Gal4 activity (and therefore RNAi expression) by maintaining the larvae at 18 °C during the 1st and 2nd instars (until 120 hours after egg lay), and then induced the activity of Gal4 specifically at the 3rd larval instar by raising the temperature to 29 °C, a condition in which Gal80ts becomes inactive. After silencing the expression of sec3 or sec15 at the 3rd larval instar only, the phenotype was very similar to that observed after chronic silencing of exocyst subunits (larvae maintained at 29 °C all throughout development, where Gal4 was never inhibited). These observations suggest that the defects observed in the secretory pathway after knock down of exocyst subunits reflect genuine functions of the exocyst in this pathway, rather than a secondary effect derived from impaired development of the salivary glands at early larval stages. These new results are now shown in Supplementary Figure 3, and described in manuscript lines 160-171 of the Results section.

      (22) Granule homotypic fusion. Strangely over-expression of just one subunit, Sec15-GFP, made giant secretory granules (SG) that were over 8 microns big! Why is that, especially if the exocyst is normally a holocomplex? Was this an effect that was specific to Sec15 or all exocyst subunits? Is the Sec15 level rate limiting in these cells? It may be that a subcomplex of Sec15/10 plays earlier roles, but in any case this needs to be addressed across all (or many) of the exocyst subcomplex members.

      Please, see our response to point 7) of this letter. Sec15 is believed to act as a seed for the formation of the whole complex.

      (23) In summary, there are clearly striking effects on secretory granule biogenesis by dysfunction of the exocyst, however right now it is hard to disentangle effects on ER-Golgi traffic, loss of the TGN, and a problem in maturation or fusion of granules.

      As discussed in detail in our response to the point 3 raised by Reviewer #1, the secretory pathway is highly synchronized in each of the cells of the Drosophila salivary gland. SG biogenesis, SG maturation and SG fusion with the plasma membrane never occur simultaneously in the same cell. Thus, in a cell in which ER-Golgi traffic is impaired (and SG biogenesis does not occur), SGs do not exist, and therefore, they cannot exhibit defects in the process of maturation or fusion with the plasma membrane. In summary, we believe that our work has shown that in Drosophila larval salivary glands the exocyst holocomplex is required for (at least) three functions along the secretory pathway: 1) To maintain the appropriate Golgi complex architecture, thus enabling ER-Golgi transport; 2) For secretory granule maturation: both homotypic fusion and acquisition of maturation factors; 3) For secretory granule exocytosis: secretory granule tethering to enable subsequent fusion with the plasma membrane. As mentioned above (point 6 of this letter), these three functions require different amounts of the holocomplex, and therefore can be revealed by inducing different levels of silencing.

      (24) It is also confusing if the entire exocyst holocomplex or subcomplex plays a key role 

      The fact that, by silencing any of the subunits (with the appropriate conditions), it is possible to obtain any of the 3 phenotypes (impaired SG biogenesis, impaired SG maturation or impaired SG fusion with the plasma membrane) argues in favour of a function of the complex as a whole in each of these three functions.

      Reviewer 3:

      (25) General comment: Freire and co-authors examine the role of the exocyst complex during the formation and secretion of mucins from secretory granules in the larval salivary gland of Drosophila melanogaster. Using transgenic lines with a tagged Sgs3 mucin the authors KD expression of exocyst subunit members and observe a defect in secretory granules with a heterogeneity of phenotypes. By carefully controlling RNAi expression using a Gal4-based system the authors can KD exocyst subunit expression to varying degrees. The authors find that the stronger the inhibition of expression of exocyst the earlier in the secretory pathway the defect. The manuscript is well written, the model system is physiological, and the techniques are innovative.

      We appreciate the reviewer's assessment of our work.

      (26) My major concern is that the evidence underlying the fundamental claim of the manuscript that "the exocyst complex participates" in multiple secretory processes lacks direct evidence.

      We thank the reviewer for raising this important issue. We believe that the analysis of Sec15 subcellular localization during salivary gland development (Figures 5, 7B-D and 9E-F), in combination with the detailed analysis of the phenotypes provoked by loss-of-function of each of the exocyst subunits, provides evidence supporting multiple functions of the exocyst in the secretory pathway. We have also included 3D reconstructions and videos of GFP-Sec15 colocalization with Golgi and SG markers to support exocyst localization associated with these structures (Supplementary Videos 1-7), text lines 200-210; 216-221 and 303-305.

      (27) It is clear from multiple lines of evidence, which are discussed by the authors, that exocyst is essential for an array of exocytic events. The fundamental concern is that loss of homeostasis on the plasma membrane proteome and lipidome might have severe pleiotropic effects on the cell.

      We agree with the reviewer that this is an important point that needed to be addressed. As discussed in detail above at the response to point 3 raised by Reviewer #1, we have analysed several plasma membrane markers (including a PI(4,5)P2 lipid reporter), and found that overall, plasma membrane integrity and polarity were not substantially affected (Supplementary Figure 6). In addition, we have analyzed several markers of general cellular “health” that indicate that salivary gland cells do not seem to be distressed by the reduction of exocyst complex activity (Supplementary Figure 5). These new data are described in lines 172-179 of the Results section.

      (28) Perhaps the authors have more evidence that exocyst is important for homotypic fusion of the SGs, as supported by the localisation of Sec15 on the fusion sites.

      We believe that the fact that, by silencing any of the exocyst subunits (with the appropriate conditions), immature smaller-than-normal granules were observed, argues in favour of the exocyst as a whole participating in SG homofusion (Figure 7A). In addition, we have included more images, quantifications, 3D reconstructions and videos of GFP-Sec15 localized just at the contact sites between immature SGs. We have quantified and compared GFP-Sec15 localization at immature SGs vs its localization at mature SGs, finding that it localizes preferentially at immature SGs, supporting a role of the exocyst as a tethering complex during homotypic fusion (shown in Figure 7B-C and Supplementary Videos 4-6, and described in lines 216-221 of the Results section). Please see also our response to the point 2 raised by reviewer 1 in this rebuttal letter, and Author response image 3 above in this letter.

      (29) The second question that I think is important to address is, what exactly do the varying RNAi levels correspond to in terms of experiments, and have these been validated? Due to the fundamental claim being that the severity of the phenotype being correlated with the level of KD, I think validation of this model is absolutely essential.  

      We thank the Reviewer for raising this important point, and agree it was lacking in the original version of our manuscript. As discussed in our response to the point 6) raised by Reviewer #2, we have performed qRT-PCR determinations for exo70 and sec3 mRNA levels after inducing silencing of these subunits at different temperatures, or with different RNAi transgenic lines. The remnant mRNA levels correlate well with the observed phenotypes. Please see Supplementary Figure 2 of the revised manuscript, and Author response image 5 of this rebuttal letter; described in lines 155-159 of the Results section. 

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      -  The authors assert in the discussion that exocyst involvement in constitutive secretion is well documented. This is based on a very recent study in mammalian culture cells. Therefore, I would not dismiss the issue as completely settled. Furthermore, a previous study of Drosophila sec10 reported no roles outside the ring gland (DOI: 10.1034/j.1600-0854.2002.31206.x).

      We have included these observations in the Discussion section. Lines 326-329.

      -  A salivary gland screening by Julie Brill's lab reported exocyst components as hits (DOI: 10.1083/jcb.201808017).

      We have referred to this paper in the Discussion section. Lines 326-329.

      -  It should be explained in more detail what is measured in graphs 7C, F, and others quantifying fluorescence around secretory granules. Looking at the images, the decrease in Rab1 and Rab11 seems less convincing.

      We have made a clearer description of how fluorescence intensity was measured in the Methods section lines 558-561. Also, we have uploaded a source data file in which the raw data of each experiment used for quantifications are disclosed. 

      Please note that the data indicate that Rab11 levels are higher in sec5 (Figure 8J-L) and sec3 (Supplementary Figure 11M-R).

      Reviewer #2 (Recommendations For The Authors):

      No major issues.

      Writing - The authors should better frame their interpretations of other studies of the exocyst that include the role in autophagy, Palade body trafficking, and differential roles of the subunits.

      We have discussed these specific points in the Discussion section, lines 348-355 and 409-410.

      Minor - Fig. 6A: Why are variable temperatures (19-29 deg C used for the 8 KD experiments)?

      Please show it all at the same temperature (control too).

      The need for the usage of specific temperatures to obtain specific phenotypes with each of the RNAi lines used was explained in point 6 of this letter.

      Reviewer #3 (Recommendations For The Authors):

      In the abstract, the authors refer to the exocytic process and go on to describe secretory granule biogenesis and exocytosis. However, there are many exocytic processes aside from secretory granule biogenesis, and I think the authors should clarify this.

      Corrected in the Abstract. Lines 19-21

      Page 17 Thomas, 2021 reference, there is a glitch with the reference.

      Thanks for noticing. Fixed.

      References

      Bhuin T, Roy JK. Developmental expression, co-localization and genetic interaction of exocyst component Sec15 with Rab11 during Drosophila development. Exp Cell Res. 2019 Aug 1;381(1):94-104. doi: 10.1016/j.yexcr.2019.04.038. Epub 2019 May 7. PMID: 31071318.

      D'Souza Z, Taher FS, Lupashin VV. Golgi inCOGnito: From vesicle tethering to human disease. Biochim Biophys Acta Gen Subj. 2020 Nov;1864(11):129694. doi: 10.1016/j.bbagen.2020.129694. Epub 2020 Jul 27. PMID: 32730773; PMCID: PMC7384418.

      Escrevente C, Bento-Lopes L, Ramalho JS, Barral DC. Rab11 is required for lysosome exocytosis through the interaction with Rab3a, Sec15 and GRAB. J Cell Sci. 2021 Jun 1;134(11):jcs246694. doi: 10.1242/jcs.246694. Epub 2021 Jun 8. PMID: 34100549; PMCID: PMC8214760.

      Guo W, Roth D, Walch-Solimena C, Novick P. The exocyst is an effector for Sec4p, targeting secretory vesicles to sites of exocytosis. EMBO J. 1999 Feb 15;18(4):1071-80. doi: 10.1093/emboj/18.4.1071. PMID: 10022848; PMCID: PMC1171198.

      Jafar-Nejad H, Andrews HK, Acar M, Bayat V, Wirtz-Peitz F, Mehta SQ, Knoblich JA, Bellen HJ. Sec15, a component of the exocyst, promotes notch signaling during the asymmetric division of Drosophila sensory organ precursors. Dev Cell. 2005 Sep;9(3):351-63. doi: 10.1016/j.devcel.2005.06.010. PMID: 16137928.

      Khakurel A, Lupashin VV. Role of GARP Vesicle Tethering Complex in Golgi Physiology. Int J Mol Sci. 2023 Mar 23;24(7):6069. doi: 10.3390/ijms24076069. PMID: 37047041; PMCID: PMC10094427.

      Lattner J, Leng W, Knust E, Brankatschk M, Flores-Benitez D. Crumbs organizes the transport machinery by regulating apical levels of PI(4,5)P2 in Drosophila. Elife. 2019 Nov 7;8:e50900. doi: 10.7554/eLife.50900. PMID: 31697234; PMCID: PMC6881148.

      Lee T, Luo L. Mosaic analysis with a repressible cell marker for studies of gene function in neuronal morphogenesis. Neuron. 1999 Mar;22(3):451-61. doi: 10.1016/s0896-6273(00)80701-1. PMID: 10197526.

      Liu S, Majeed W, Grigaitis P, Betts MJ, Climer LK, Starkuviene V, Storrie B. Epistatic Analysis of the Contribution of Rabs and Kifs to CATCHR Family Dependent Golgi Organization. Front Cell Dev Biol. 2019 Aug 2;7:126. doi: 10.3389/fcell.2019.00126. PMID: 31428608; PMCID: PMC6687757.

      Perkins LA, Holderbaum L, Tao R, Hu Y, Sopko R, McCall K, Yang-Zhou D, Flockhart I, Binari R, Shim HS, Miller A, Housden A, Foos M, Randkelv S, Kelley C, Namgyal P, Villalta C, Liu LP, Jiang X, Huan-Huan Q, Wang X, Fujiyama A, Toyoda A, Ayers K, Blum A, Czech B, Neumuller R, Yan D, Cavallaro A, Hibbard K, Hall D, Cooley L, Hannon GJ, Lehmann R, Parks A, Mohr SE, Ueda R, Kondo S, Ni JQ, Perrimon N. The Transgenic RNAi Project at Harvard Medical School: Resources and Validation. Genetics. 2015 Nov;201(3):843-52. doi: 10.1534/genetics.115.180208. Epub 2015 Aug 28. PMID: 26320097; PMCID: PMC4649654.

      Wu S, Mehta SQ, Pichaud F, Bellen HJ, Quiocho FA. Sec15 interacts with Rab11 via a novel domain and affects Rab11 localization in vivo. Nat Struct Mol Biol. 2005 Oct;12(10):879-85. doi: 10.1038/nsmb987. Epub 2005 Sep 11. PMID: 16155582.

      Yeaman C, Grindstaff KK, Wright JR, Nelson WJ. Sec6/8 complexes on trans-Golgi network and plasma membrane regulate late stages of exocytosis in mammalian cells. J Cell Biol. 2001 Nov 12;155(4):593-604. doi: 10.1083/jcb.200107088. Epub 2001 Nov 5. PMID: 11696560; PMCID: PMC2198873.

      Zhang XM, Ellis S, Sriratana A, Mitchell CA, Rowe T. Sec15 is an effector for the Rab11 GTPase in mammalian cells. J Biol Chem. 2004 Oct 8;279(41):43027-34. doi: 10.1074/jbc.M402264200. Epub 2004 Jul 29. PMID: 15292201.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      The authors present a potentially useful model involving Ca2+ signaling in inflammasome activation. As it stands, it was felt that the data were not sufficient to support the model and the claims of the study are inadequately presented.

      Public Reviews:

      Reviewer #1 (Public Review):

      This manuscript proposes a complex unclear model involving Ca2+ signaling in inflammasome activation. The experimental approaches used to study the calcium dynamics are problematic and the results shown are of inadequate quality. The major claims of this manuscript are not adequately substantiated.

      Major concerns:

      (1) The analysis of lysosomal Ca2+ release is being carried out after many hours of treatment. Such evidence is not meaningful to claim that PA activates Ca2+ efflux from lysosome and even if this phenomenon was robust, it is doubtful that such kinetics are meaningful for the regulation of inflammasome activation. Furthermore, the evidence for lysosomal Ca2+ release is indirect and relies on a convoluted process that doesn't make any conceptual sense to me. In addition to these major shortcomings, the indirect evidence of perilysosomal Ca2+ elevation is also of very poor quality and from the standpoint of my expertise in calcium signaling, the data are not credible. The use of GCaMP3-ML1, transiently transfected into BMDMs is highly problematic. The efficiency of transfection in BMDMs is always extremely low and overexpression of the sensor in a few rare cells can lead to erroneous observations. The overexpression also results in gross mislocalization of such membrane-bound sensors. The accumulation of GCaMP3-ML1 in the ER of these cells would prevent any credible measurements of perilysosomal Ca2+ signals. A meaningful investigation of this process in primary macrophages requires the generation of a mouse line wherein the sensor is expressed at low levels in myeloid cells, and shown to be localized almost exclusively in the lysosomal membrane. The mechanistic framework built around these major conceptual and technical flaws is not especially meaningful and since these are foundational results, I cannot take the main claims of this study seriously.

      Ans) We agree with the reviewer’s concern that transfection efficiency could be low in BMDMs and that GCaMP3-ML1 could be mislocalized. However, in our experiments, transfection of BMDMs with test plasmids resulted in good expression of the test proteins. Below, we present data showing good transfection efficiency in BMDMs, although a different plasmid was employed.

      Author response image 1.

      (2) The cytosolic Ca2+ imaging shown in Figure 1C doesn't make any sense. It looks like a snapshot of basal Ca2+ many hours after PA treatment - calcium elevations are highly dynamic. Snapshot measurements are not helpful, and analysis of calcium dynamics requires a recording over a certain timespan. Unfortunately, this technical approach has been used throughout the manuscript. Also, BAPTA-AM abrogates IL-1b secretion because IL-1b transcription is Ca2+ dependent - the result shown in Figure 1D does not shed light on anything to do with inflammasome activation and it is misleading to suggest that.

      Ans) We agree with the reviewer’s concern that snapshot measurements could lead to false conclusions. We have not traced cytosolic Ca2+ content after treatment with LPS+PA. However, we have traced lysosomal Ca2+ and ER Ca2+ for more than 15 min, as presented in Figure 4B. We also agree with the comment that BAPTA-AM might affect transcription of pro-IL-1β. We therefore conducted immunoblot analysis after treatment with LPS+PA in the presence of BAPTA-AM. The pro-IL-1β protein band was not affected by BAPTA-AM treatment, suggesting no effect of BAPTA-AM on transcription or translation of pro-IL-1β. This result was added to Figure 1D, as suggested.

      (3) Trpm2-/- macrophages are known to be hyporesponsive to inflammatory stimuli - the reduced secretion of IL-1b by these macrophages is not novel. From a mechanistic perspective, this study does not add much to that observation and the proposed role of TRPM2 as a lysosomal Ca2+ release channel is not substantiated by good quality Ca2+ imaging data (see point 3 above). Furthermore, the study assumes that TRPM2 is a lysosomal ion channel. One paper reported TRPM2 in the lysosomes but this is a controversial claim, with no replication or further development in the last 14 years. This core assumption can be highly misleading to readers unfamiliar with TRPM2 biology and it is necessary to present credible evidence that TRPM2 is functional in the lysosomal membrane of macrophages. Ideally, this line of investigation should rest on robust demonstration of TRPM2 currents in patch-clamp electrophysiology of lysosomes. If this is not technically feasible for the authors, they should at least investigate TRPM2 localization on lysosomal membranes of macrophages.

      Ans) We agree with the reviewer’s comment that TRPM2. However, we have shown that TRPM2 current was not activated in the plasma membrane of BMDMs after treatment with LPS+PA. We also agree with the reviewer’s comment that inflammatory cytokine release from TRPM2 KO cells or inflammasome response of TRPM2 KO macrophages to ROS or nanoparticles has been reported to be reduced; however, the role of TRPM2 in metabolic inflammation or inflammasome activation in response to lipid stimulators has not been shown, as discussed in the new lines 9-10 from the bottom of page 18. Regarding the role of lysosomal TRPM2 in inflammation, we have shown that bafilomycin A1 treatment abrogated increase of cytosolic Ca2+ by LPS+PA (Figure 3-figure supplement 1D), supporting the role of lysosome and lysosomal Ca2+ in inflammasome activation by LPS+PA.

      We agree with the reviewer’s comment that TRPM2 expression on lysosomes needs to be tested. We conducted confocal microscopy after immunofluorescence staining using anti-TRPM2 and anti-LAMP2 antibodies, which showed that a certain portion of TRPM2 was colocalized with LAMP2. This result, substantiating TRPM2 expression on the lysosomes of macrophages, was incorporated as Figure 2-figure supplement 1A.

      (4) Apigenin and Quercetin are highly non-specific and their effects cannot be attributed to CD38 inhibition alone. Such conclusions need strong loss of function studies using genetic knockouts of CD38 - or at least siRNA knockdown. Importantly, if indeed TRPM2 is being activated downstream of CD38, this should be easily evident in whole cell patch clamp electrophysiology. TRPM2 currents can be resolved using this technique and authors have Trpm2-/- cells for proper controls. Authors attempted these experiments but the results are of very poor quality. If the TRPM2 current is being activated through ADPR generated by CD38 (in response to PA stimulation), then it is very odd that authors need to include 200 uM cADPR to see TRPM2 current (Fig. 3A). Oddly, even these data cast great doubt on the technical quality of the electrophysiology experiments. Even with such high concentrations of cADPr, the TRPM2 current is tiny and Trpm2-/- controls are missing. The current-voltage relationship is not shown, and I feel that the results are merely reporting leak currents seen in measurements with substandard seals. Also 20 uM ACA is not a selective inhibitor of TRPM2 - relying on ACA as the conclusive diagnostic is problematic.

      Ans) We agree with the reviewer’s comment that effects of apigenin and quercetin could be due to mechanisms other than inhibition of CD38-mediated inflammasome activation. Indeed, that is the reason we have used TRPM2 KO mice and cells. The small TRPM2 current after treatment with high concentrations of cADPR might suggest a minor role of plasma membrane TRPM2 in macrophages. Regarding the concern about ACA, we added data showing inhibition of IL-1β release in response to LPS+PA by ACA as a new Figure 3-figure supplement 1A.

      (5) TRPM2 is expressed in many different cell lines. The broad metabolic differences observed by the authors in the Trpm2-/- mice cannot be attributed to macrophage-mediated inflammation. Such a conclusion requires the study of mice wherein Trpm2 is deleted selectively in macrophages or at least in the cells of the myeloid lineage.

      Ans) We agree with the reviewer’s comment that TRPM2 in cells other than macrophages might have affected the results. Thus, we have conducted in vitro stimulation of TRPM2-KO primary peritoneal macrophages with LPS+PA. We observed that IL-1β release from TRPM2-KO macrophages in response to in vitro treatment with LPS+PA was significantly lower than that from wild-type macrophages (Figure 2C & D), showing the role of TRPM2 in macrophages in inflammasome activation by LPS+PA, which could be independent of TRPM2 in tissues or cells other than macrophages.

      (6) The ER-Lysosome Ca2+ refilling experiments rely on transient transfection of organelle-targeted sensors into BMDMs. See point #1 to understand why I find this approach to be highly problematic. Furthermore, the data procured are also not convincing and lack critical controls (localization of sensors has not been demonstrated and their response to acute mobilization of Ca2+ has not been shown to inspire any confidence in these results).

      Ans) We agree with the reviewer’s comment that transfection of ER-targeted Ca2+ sensors could have artifactual effects. However, we have studied ER-lysosome Ca2+ refilling using not only GEM-CEPIAer but also D1ER, a FRET-based ER Ca2+ sensor which has the advantage of a short distance of molecular interaction. Thus, we believe that the changes of ER Ca2+ after treatment with LPS+PA are not due to an artifactual effect. Multiple contacts between VAPA and ORP1L (Figure 4E) also support ER-lysosome contact, likely facilitating ER-lysosome Ca2+ flux.

      (7) Authors claim that SOCE is coupled to K+ efflux. But there is no credible evidence that SOCE is activated in PA stimulated macrophages. The data shown in Fig 4 supp 1 do not investigate SOCE in a reliable manner - the conclusion is again based on snapshot measurements and crude non-selective inhibitors. The correct way to evaluate SOCE is to record cytosolic Ca2+ elevations over a period of time in absence and presence of extracellular Ca2+. However, even such recordings can be unreliable since the phenomenon is being investigated hours after PA stimulation. So, the only definitive way to demonstrate that Orai channels are indeed active during this process is through patch clamp electrophysiology of PA stimulated cells.

      Ans) We agree with the reviewer’s comment that the final proof of SOCE activation is activation of Orai channels evidenced by electrophysiology. However, we have shown STIM1 aggregation colocalized with Orai1, which is further strong evidence of SOCE channel activation (Vaca L. Cell Calcium 47:199, 2010). This paper, which shows the role of SOCE aggregation in SOCE activation, was incorporated in the text (line 4 from the bottom of page 10) and in the References.

      Reviewer #2 (Public Review):

      In this manuscript by Kang et al., the authors investigated the mechanisms of K+-efflux-coupled SOCE in NLRP3 inflammasome activation by LP (LPS+PA), and identified an essential role of TRPM2-mediated lysosomal Ca2+ release and subsequent IP3Rs-mediated ER Ca2+ release and store depletion in the process. K+ efflux is shown to be mediated by a Ca2+-activated K+ channel (KCa3.1). LP-induced cytosolic Ca2+ elevation also induced a delayed activation of ASK1 and JNK, leading to ASC oligomerization and NLRP3 inflammasome activation. Overall, this is an interesting and comprehensive study that has identified several novel molecular players in metabolic inflammation. The manuscript can benefit if the following concerns could be addressed:

      (1) The expression of TRPM2 in the lysosomes of macrophages needs to be more definitively established. For instance, the cADPR-induced TRPM2 currents should be abolished in the TRPM2 KO macrophages. Can you show the lysosomal expression of TRPM2, either with an antibody if available or with a fluorescently-tagged TRPM2 overexpression construct?

      Ans) We agree with the reviewer’s comment that TRPM2 expression on lysosomes needs to be tested. We conducted confocal microscopy after immunofluorescence staining using anti-TRPM2 and anti-LAMP2 antibodies, which showed that a certain portion of TRPM2 was colocalized with LAMP2. This result was incorporated as Figure 2-figure supplement 1A.

      (2) Can you use your TRPM2 inhibitor ACA to pharmacologically phenocopy some results, e.g., about [Ca2+]ER, [Ca2+]LY, and [Ca2+]i from the TRPM2 knockout?

      Ans) We agree with the reviewer’s comment that the effect of ACA on other experimental results needs to be shown. We did not study the effect of ACA on Ca2+ flux; however, we have observed that ACA inhibited IL-1β release in response to LPS+PA. These data were incorporated as Figure 3-figure supplement 1A.

      Author response image 2.

      (3) In Fig. S4A, bathing the cells in zero Ca2+ for three hours might not be ideal. Can you use a SOCE inhibitor, e.g, YM-58483, to make the point?

      Ans) We agree with the reviewer’s comment that a SOCE inhibitor experiment would be necessary in addition to the experiment employing zero Ca2+. In fact, we have already used two SOCE inhibitors (2-APB and BTP2) (Figure 4-figure supplement 1B-D). In particular, the BTP2 experiment could eliminate the possible role of ER Ca2+ inhibition that might occur when 2-APB was employed.

      (4) In Fig. 1A, you need a positive control, e.g., ionomycin, to show that the GPN response was selectively reduced upon LP treatment.

      Ans) We did not employ ionomycin as a control in this study. In our previous study using other agents inducing lysosomal Ca2+ efflux, we have observed lysosomal Ca2+ efflux with intact subsequent ionomycin response. While we did not include ionomycin in the current paper, we are positive that ionomycin response would be preserved.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      See Public Review.

      Reviewer #2 (Recommendations For The Authors):

      (5) In Fig. 4B, the red label should read "BAPTA-1 Dextran", but not "GAPTA-1 Dextran".

      (6) Writing should be improved in many sections.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Response to the Referee Comments

      We would like to express our appreciation to the editor and the reviewers for their thoughtful comments and constructive suggestions on the manuscript. We agree with most of the comments and have carefully revised the manuscript accordingly. The revisions are highlighted in red font in the revised manuscript. Below are point-by-point responses to the referees’ comments.

      Public Reviews:

      Reviewer #1 (Public Review):

      Microglia are increasingly recognized as playing an important role in shaping the synaptic circuit and regulating neural dynamics in response to changes in their surrounding environment and in brain states. While numerous studies have suggested that microglia contribute to sleep regulation and are modulated by sleep, there has been little direct evidence that the morphological dynamics of microglia are modulated by the sleep/wake cycle. In this work, Gu et al. applied a recently developed miniature two-photon microscope in conjunction with EEG and EMG recording to monitor microglia surveillance in freely-moving mice over extended period of time. They found that microglia surveillance depends on the brain state in the sleep/wake cycle (wake, non-REM, or REM sleep). Furthermore, they subjected the mouse to acute sleep deprivation, and found that microglia gradually assume an active state in response. Finally, they showed that the state-dependent morphological changes depend on norepinephrine (NE), as chemically ablating noradrenergic inputs from locus coeruleus abolished such changes; this is in agreement with previous publications. The authors also showed that the effect of NE is partially mediated by β2-adrenergic receptors, as shown with β2-adrenergic receptor knock-out mice. Overall, this study is a technical tour de force, and its data add valuable direct evidence to the ongoing investigations of microglial morphological dynamics and its relationship with sleep. However, there are a number of details that need to be clarified, and some conclusions need to be corroborated by more control experiments or more rigorous statistical analysis. Specifically:

      1. The number of branch points per microglia shown here (e.g., Fig. 2g) is much lower than the values of branch points in the literature, e.g., Liu T et al., Neurobiol. Stress 15: 100342, 2021 (mouse dmPFC, IHC); Liu YU et al., Nat. Neurosci. 22: 1771-81, 2019 (mouse S1, in vivo 2P imaging). The authors need to discuss the possible source of such discrepancy.

      Thank you for raising this important point. Two reasons may account for this difference. First, the software packages define branch points differently. Liu YU et al. used the Sholl analysis in ImageJ to count microglial branch points; Sholl analysis defines the number of branch points as the number of crossings between branches and concentric circles of increasing radii. We reconstructed microglia morphology using Imaris, which defines branch points as bifurcation points, so the number of bifurcations represents the number of microglia branch points. Second, this and previous studies found that more branch points are present under anesthesia. The morphological characteristics of microglia in head-fixed mice under anesthesia were reported by Liu T et al., and the microglia reconstructions presented by those authors are indeed more complex than ours. In short, we have been paying attention to this aspect, and the main reasons for the difference may lie in the definition of branch points, the analysis methods and the related choice of thresholds. True differences in brain states and the heterogeneity of microglia in different brain regions may also contribute to the apparent discrepancy.
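
      The effect of the two branch-point definitions described above can be illustrated with a minimal sketch. The toy skeleton, its coordinates, the Sholl radii and the choice to exclude the soma from the bifurcation count are all assumptions made for illustration; the code reproduces neither the ImageJ nor the Imaris implementation.

```python
import math

# Hypothetical toy skeleton of a single cell: node id -> (x, y, parent id).
# Node 0 is the soma. Coordinates are in arbitrary units, for illustration only.
SKELETON = {
    0: (0.0, 0.0, None),
    1: (10.0, 0.0, 0),
    2: (20.0, 5.0, 1),
    3: (20.0, -5.0, 1),
    4: (30.0, 8.0, 2),
    5: (30.0, 2.0, 2),
    6: (-12.0, 0.0, 0),
}


def count_bifurcations(skeleton):
    """Count non-soma nodes that have two or more children."""
    n_children = {}
    for _, _, parent in skeleton.values():
        if parent is not None:
            n_children[parent] = n_children.get(parent, 0) + 1
    return sum(
        1
        for node, n in n_children.items()
        if n >= 2 and skeleton[node][2] is not None  # skip the soma (assumption)
    )


def count_sholl_crossings(skeleton, radii):
    """Sum, over all radii, the segments that cross a circle centred on the soma.

    A segment is counted once for each radius lying between the soma distances
    of its two end nodes. This is an approximation that ignores segments which
    bend back towards the soma between their end nodes.
    """
    soma_x, soma_y, _ = next(v for v in skeleton.values() if v[2] is None)
    total = 0
    for x, y, parent in skeleton.values():
        if parent is None:
            continue
        parent_x, parent_y, _ = skeleton[parent]
        d_child = math.hypot(x - soma_x, y - soma_y)
        d_parent = math.hypot(parent_x - soma_x, parent_y - soma_y)
        lo, hi = sorted((d_child, d_parent))
        total += sum(1 for r in radii if lo < r <= hi)
    return total


print(count_bifurcations(SKELETON))                  # 2
print(count_sholl_crossings(SKELETON, [5, 15, 25]))  # 6
```

      On the same toy tree the crossing-based count is three times the bifurcation-based count, and it grows further as more radii are sampled, so values obtained with the two definitions are not directly comparable.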

      2. Microglia process end-point speed (Fig. 2h, o): here the authors show that the speed is highest in the wake state and lowest in NREM, which agrees with the measurement on microglia motility during wakefulness vs NREM in a recent publication (Hristovska I et al., Nat. Commun. 13: 6273, 2022). However, Hristovska et al. also reported lower microglia complexity in NREM vs wake state, which seems to be the opposite of the finding in this paper. The authors need to discuss the possible source of such differences.

      This is also an important point. Hristovska et al. reported the morphodynamic characteristics of microglia during wakefulness and NREM sleep. It is worth noting that the sleep state of the mice in their experiments was unnatural due to the head fixation and body restraint, so the duration of NREM sleep (sleep stability) was quite different from that of the NREM sleep analyzed under natural conditions. The limitations of this approach are also discussed by Hristovska et al.: “Even though sleep episodes were, as anticipated, shorter than those observed in freely moving animals, changes in neuronal activity characteristic of NREM sleep were monitored by EEG recordings, and changes in morphodynamics were observed during single episodes. Several episodes of REM sleep were detected, but they were too short and rare to be analyzed reliably.” The unnatural sleep state would lead to an increase in microarousals, and ultimately a change in sleep structure, which may be the main reason why microglia behavior differs from that observed in our natural sleep recordings. We have discussed this in the revised manuscript. Please see lines 292-298.

      3. Fig. 3: the authors used single-plane images to analyze the morphological changes over 3 or 6 hours of SD, which raises the concern that the processes imaged at the baseline may drift out of focus, leading to the dramatic reduction in process lengths, surveillance area, and number of branch points. In fact, a previous study (Bellesi M et al., J. Neurosci. 37(21): 5263-73, 2017) shows that after 8 h SD, the number of microglia process endpoints per cell and the summed process length per cell do not change significantly (although there is a trend to decline). The authors may confirm their findings by either 3D imaging in vivo, or 3D imaging in fixed tissue.

      Three lines of evidence indicate that the microglia morphology changes in Fig. 3 are due to SD, rather than variations in the focal plane. First, our single-plane images were quite stable over 3 or 6 hours of SD, though occasional reversible drifts might happen due to sudden motions. Second, per your suggestion, further experiments and analysis of 3D imaging were performed to monitor microglia dynamics during sleep deprivation. The new result is shown in revised Fig. S3 C-D: the length of microglia branches and the number of branch points were significantly reduced after SD, in agreement with the results of single-plane imaging. Third, we detected no significant difference in microglia branching characteristics during 6 h sleep deprivation in β2AR KO mice (Fig. S4), which indirectly affirms that single-plane imaging is stable enough for detecting true changes in branching during SD.

      4. Fig. 4b: the EEG and EMG signals look significantly different from the example given in Fig. 2a. In particular, the EMG signal appears completely flat except for the first segment of wake state; the EEG power spectrum for REM appears dark; and the wake state corresponds to stronger low frequency components (below ~ 4 Hz) compared to NREM, which is the opposite of Fig. 2a. This raises the concern whether the classification of sleep stage is correct here.

      Thank you for your insightful comments. We carefully examined the behavioral video for Figure 4b: there were occasional microarousal events, indicated by slow head rotation during NREM sleep, while the companion EMG signals were completely flat, which is atypical during the sleep-wake cycle. The microarousal events were not excluded from sleep, which makes this set of data unrepresentative and contrary to Fig. 4b. In our revised manuscript, we replaced it with more representative data that clearly and consistently distinguish between different brain states in mice on EMG and EEG. Please see revised Fig. 2a, page 34; revised Fig. 4b, page 37.

      5. Fig. 4 NE dynamics.
      • How long is a single continuous imaging session for NE?
      • When monitoring microglia surveillance, the authors were able to identify wake or NREM states longer than 15 min, and REM states longer than 5 min. Here the authors selected wake/NREM states longer than 1 min and REM states longer than 30 s. What makes such a big difference in the time duration selected for analysis?
      • Also, the definition of F0 is a bit unclear. Is the same F0 used throughout the entire imaging session, or is it defined with a moving window?

      A single continuous session of NE imaging usually took about 1 hour. Subsequent analysis was performed on imaging data from each recording that included wake, NREM sleep, and REM sleep. Because microglia morphological dynamics (relatively slow) and NE signals (fast) operate on different time scales, we used different time windows in the previous version of the manuscript.

      Per your suggestion, we have now set the same time window selection criteria for both microglia morphological and NE dynamic analysis: wake and NREM sleep durations longer than 1 minute, and REM sleep durations longer than 30 seconds. We updated the Methods and all statistics in the related figures; please see lines 151-154, 481-485, 490-492; Fig. 2e-g and 2l-n, page 34. The F0 definition is now explained in the Methods section. Please see lines 521-522.
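
      The selection criteria stated above can be sketched as a simple filter over scored episodes. The `Episode` type, the state labels and the example durations are hypothetical and introduced only for illustration; only the two thresholds (1 minute for wake and NREM, 30 seconds for REM) come from the response.

```python
from dataclasses import dataclass

# Minimum episode durations in seconds, as stated in the response:
# wake and NREM longer than 1 minute, REM longer than 30 seconds.
MIN_DURATION_S = {"wake": 60.0, "nrem": 60.0, "rem": 30.0}


@dataclass(frozen=True)
class Episode:
    """One scored sleep/wake episode (hypothetical structure)."""
    state: str      # "wake", "nrem" or "rem"
    start_s: float  # episode onset, seconds from recording start
    end_s: float    # episode offset, seconds from recording start

    @property
    def duration_s(self) -> float:
        return self.end_s - self.start_s


def select_episodes(episodes):
    """Keep only episodes strictly longer than the threshold for their state."""
    return [e for e in episodes if e.duration_s > MIN_DURATION_S[e.state]]


# Illustrative hypnogram, not real data.
scored = [
    Episode("wake", 0.0, 90.0),    # 90 s, kept
    Episode("nrem", 90.0, 140.0),  # 50 s, dropped
    Episode("rem", 140.0, 175.0),  # 35 s, kept
    Episode("rem", 175.0, 200.0),  # 25 s, dropped
]
kept = select_episodes(scored)
print([(e.state, e.duration_s) for e in kept])
```

      Applying one filter of this kind to both the microglia and the NE recordings is what makes the two analyses use the same episode definitions.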

      6. Fig. 5b: how does the microglia morphology in LC axon ablation mice compare with wild type mice under the wake state? The text mentioned "more contracted" morphology but didn't give any quantification. Also, the morphology of microglia in the wake state (Fig. 5b) appears very different from that shown in Fig. S3C1 (baseline). What is the reason?

      The morphology of microglia is indeed heterogeneous and variable, affected by factors including brain state, brain region, microenvironmental changes, and animal-to-animal differences. We did not compare microglia morphology between the LC axon ablation mice and wild type mice and, in view of this, we removed the description of “more contracted morphology” from the main text. It should also be noted that, because we primarily focused on changes in individual microglia across states over time by self-comparison, we minimized possible effects of heterogeneity in microglia morphology on our conclusions.

      7. The relationship between NE level and microglia dynamics. Fig. 4C shows that the extracellular NE level is the highest in the wake state and the lowest in REM. Previous studies (Liu YU et al., Nat. Neurosci. 22(11):1771-1781, 2019; Stowell RD et al., Nat. Neurosci. 22(11): 1782-1792, 2019) suggest that high NE tone corresponds to reduced microglia complexity and surveillance. Hence, it would be expected that microglia process length, branch point number, and area/volume are higher in REM than in NREM. However, Fig. 2l-n show the opposite. How should we understand this?

      Your point is well-taken. On the one hand, our data clearly showed that NE is critically involved in the brain state-dependent microglia dynamic surveillance, with evidence from the ablation of the LC-NE projection and from the β2AR knockout animal model.

      On the other hand, we also understand that NE is not the sole determinant, so the relationship between the NE level and the complexity and surveillance may not be unique.

      In this regard, other potential modulators are also dynamic during the sleep-wake cycle and may partake in the regulation of microglia dynamic surveillance. Previous studies (Liu YU et al., 2019; Stowell RD et al., 2019) have shown that microglia can be jointly affected by surrounding neuronal activity and NE level during wake. It has been reported that LC firing stops (Aston-Jones et al., 1981; Rasmussen et al., 1986), while inhibitory neurons, such as PV neurons and VIP neurons, become relatively active during REM sleep (Brécier et al., 2022). The ATP level in the basal forebrain is shown to be higher in REM than NREM (Peng et al., 2023). In addition, our own preliminary result (Author response image 1) also showed a higher adenosine level in REM than NREM in somatosensory cortex. Last but not least, we found that β2AR knockout failed to abolish microglial responses to sleep state switch and SD stress altogether.

      In brief, microglia are highly sensitive to varied changes in the surrounding environment, and many modulators may participate in microglia dynamics across sleep states. This may underlie the difference in microglia complexity between REM and NREM. Future investigations are warranted to delineate the signal-integrative role of microglia in physiology and under stress. We have discussed the pertinent points in the revised manuscript. Please see lines 343-354.

      Author response image 1.

      Extracellular adenosine levels in somatosensory cortex in different brain states. AAV2/9-hSyn-GRABAdo1.0 (Peng W. et al., Science. 2020) was injected into the somatosensory cortex (A/P, -1 mm; M/L, +2 mm; D/V, -0.3 mm). Data from the same recording are connected by lines. n = 9 from 3 mice.

      Reviewer #2 (Public Review):

      The manuscript describes an approach to monitor microglial structural dynamics and correlate it to ongoing changes in brain state during sleep-wake cycles. The main novelty here is the use of miniaturized 2p microscopy, which allows tracking microglia surveillance over long periods of hours, while the mice are allowed to freely behave. Accordingly, this experimental setup would permit to explore long-lasting changes in microglia in a more naturalistic environment, which were previously not possible to identify otherwise. The findings could provide key advances to the research of microglia during natural sleep and wakefulness, as opposed to anesthesia. The main findings of the paper are that microglia increase their process motility and surveillance during REM and NREM sleep as compared to the awake state. The authors further show that sleep deprivation induces opposite changes in microglia dynamics- limiting their surveillance and size. The authors then demonstrate potential causal role for norepinephrine secretion from the locus coeruleus (LC) which is driven by beta 2 adrenergic receptors (b2AR) on microglia. However, there are several methodological and experimental concerns which should be addressed.

      The major comments are summarized below:

      1. The main technological advantage of the 2p miniaturized microscope is the ability to track single cells over sleep cycles. A main question that is unclear from the analysis and the way the data is presented is: are the structural changes in microglia reversible? Meaning, could the authors provide evidence that the same cell can dynamically change in sleep state and then return to similar size in wakefulness? The same question arises again with the data which is presented for anesthesia, is this change reversible?

      As revealed by long-term mTPM imaging in freely behaving mice, the brain-state-dependent morphological changes in microglia were reproducible and reversible. Author response image 2 shows that microglia displayed reversible dynamic changes during multiple rounds of sleep-wake transition. Author response image 3 shows that microglia dynamics induced by anesthesia also exhibited reversibility.

      Author response image 2.

      Long-term tracking of microglia process area in different brain states. Data analysis used 8 cells. A total of 31 time points were selected from in vivo imaging data and were used to characterize the morphological changes of microglia over a continuous 7-hour period.

      Author response image 3.

      Reversible changes in microglial process length, area, and number of branch points under anesthesia. Wake group: 30-minute accommodation to a new environment; Isoflurane group: 1.5% isoflurane in air applied at a flow rate of 0.4 L/min for 30 minutes; Recovery group: 30 minutes after recovery from anesthesia. n = 9 cells from 3 mice for each group.

      1. The binary comparison between brain states is misleading, shouldn't the changes in structural dynamics compared to the baseline of the state onset? The authors method describes analysis of the last 5 minutes in each sleep/wake state. However, these transitions are directional- for instance, REM usually follows NREM, so the description of a decrease in length during REM sleep could be inaccurate.

      Because microglial morphological dynamics are relatively slow, we analyzed the final part of each state (the last 30 s in the revised manuscript) rather than the state onset, allowing time for the microglial response to the state transition to stabilize.

      Further, we compared microglial dynamics between two NREM groups that transitioned to different subsequent states: group 1 (NREM to REM) vs group 2 (NREM to Wake). This control was intended to exclude a directional effect of state transitions. We found no difference in microglial length, area, or number of branch points between the two NREM groups (Author response image 4), indicating that the last 30 s of each NREM episode was not affected by the following state and that a binary comparison is reasonable.

      Author response image 4.

      Microglial length, area change, and number of branch points during the last 30 s of NREM sleep followed by REM or Wake. n = 9 cells from 3 mice for each group.
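
      The selection of the final 30 s of each state, together with the label of the state that follows, can be sketched as below. This is a minimal illustration rather than the analysis code used for the manuscript: the function name `last_window_per_epoch`, the per-sample label list, and the sampling interval `dt_s` are all assumptions, as is the choice to skip epochs shorter than the window.

```python
from itertools import groupby


def last_window_per_epoch(states, dt_s, window_s=30.0):
    """Locate the final `window_s` seconds of every state epoch.

    states : per-sample state labels, e.g. "Wake", "NREM", "REM" (hypothetical input)
    dt_s   : sampling interval in seconds (hypothetical parameter)
    Returns a list of (state, next_state, start, stop) tuples, where
    start/stop are sample indices (stop exclusive) and next_state is None
    for the last epoch. Epochs shorter than the window are skipped
    (an assumption made for this sketch).
    """
    n_win = int(round(window_s / dt_s))
    # Collapse the label sequence into (state, length) runs.
    runs = [(state, sum(1 for _ in group)) for state, group in groupby(states)]

    windows = []
    pos = 0
    for k, (state, length) in enumerate(runs):
        stop = pos + length
        next_state = runs[k + 1][0] if k + 1 < len(runs) else None
        if length >= n_win:
            windows.append((state, next_state, stop - n_win, stop))
        pos = stop
    return windows


# Toy hypnogram sampled once per second.
hypnogram = ["Wake"] * 40 + ["NREM"] * 60 + ["REM"] * 20 + ["Wake"] * 35
windows = last_window_per_epoch(hypnogram, dt_s=1.0)
nrem_to_rem = [w for w in windows if w[0] == "NREM" and w[1] == "REM"]
```

      Grouping the returned windows by `next_state` gives the NREM-to-REM versus NREM-to-Wake split used for the control comparison.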

      1. Sleep deprivation- again, it is unclear whether these structural changes are reversible. This point is straightforward to address using this methodology by measuring sleep following SD. In addition, the authors chose a method to induce sleep deprivation that is rather harsh. It is unclear if the effect shown is the result of stress or perhaps an excess of motor activity.

      We used forced exercise because it is a commonly used method of sleep deprivation (Pandi-Perumal et al., 2007; Nollet M et al., 2020), although it has the potential limitation of excess motor activity.

      In light of your comments and suggestion, we present new data demonstrating that the sleep duration of the mice, mostly NREM sleep, showed a compensatory increase (ZT9-10) after the 6-hour sleep deprivation (ZT2-8) (revised Fig. S3B). This result shows that sleep deprivation indeed increased sleep pressure in the mice. As sleep pressure eased during recovery sleep, the morphological changes of microglia were reversed over a timescale of several hours (revised Fig. S3 E-J).

      1. The authors perform measurements of norepinephrine with a recently developed GRAB sensor. These experiments are performed to causally link microglia surveillance during sleep to norepinephrine secretion. They perform 2p imaging and collect data points which are single neurons, and it is unclear why the normalization and analysis is performed for bulk fluorescence similar to data obtained with photometry.

      We did not perform single-neuron analysis for two reasons. First, our experimental conditions, e.g., the expression of the NE indicator and the control of imaging laser intensity, did not yield sufficient signal-to-noise to clearly discriminate individual neurons with two-photon imaging. Second, NE signal may play a modulatory role, and fluorescence changes appeared to be global, rather than local or cell-specific. Therefore, we analyzed fluorescence changes in different brain states over the whole field-of-view in Fig. 4, rather than at the subregional or single-cell level.

      1. The experiments involving b2AR KO mice are difficult to interpret and do not provide substantial mechanistic insight. Since b2AR are expressed throughout numerous cell types in the brain and in the periphery, it is entirely not clear whether the effects on microglia dynamics are direct. The conclusion and the statement regarding the expression of b2AR in microglia is not supported by the references the authors present, which simply demonstrate the existence and function of b2AR in microglia. In addition, these mice show significant changes in sleep pattern and increased REM sleep. This could account for reasons for the changes in microglia structure rather than the interpretation that these are direct effects.

      To summarize, the main conclusions of the paper require further support with analysis of existing data and experimental validation.

      Previous studies have shown that norepinephrine (NE) modulates microglial dynamics through the β2AR pathway (Stowell RD et al., 2019; Liu YU et al., 2019). Stowell et al. and Liu et al. used in vivo two-photon imaging to demonstrate that microglial dynamics differ between awake and anesthetized mice and to highlight the roles of NE and β2AR in these states (Gyoneva S et al., 2013; Stowell RD et al., 2019; Liu YU et al., 2019). To evaluate the direct effect of β2AR on microglial dynamics, Stowell et al. administered the β2AR agonist clenbuterol to anesthetized mice and found that it decreased the motility, arbor complexity, and process coverage of microglia in the parenchyma (Stowell RD et al., 2019). Inhibition of β2AR by the antagonist ICI-118,551 in awake mice recapitulated the effects of anesthesia by enhancing microglial arborization and surveillance (Stowell RD et al., 2019). In addition, microglia have been shown to express higher levels of β2AR than any other cell type in the brain (Zhang et al., 2014).

      Our current work provides new evidence supporting the involvement of the LC-NE-β2AR axis in modulating microglial dynamics both during the natural sleep-wake cycle and under SD stress. We are aware that using a pan-tissue β2AR knockout model precludes us from pinpointing the role of microglial β2AR. Nevertheless, based on the present and previous data, it is safe to state that β2-adrenergic receptor signaling plays a significant role in sleep-state-dependent microglial dynamic surveillance.

      We have discussed this in the revised manuscript. Please see lines 324-354. As you suggested, we added references to support the statement regarding the expression of β2AR in microglia (please see line 333).

      Recommendations for the authors: please note that you control which, if any, revisions, to undertake

      Reviewer #1 (Recommendations For The Authors):

      Some technical details need to be clarified. Also, please double-check for typos.

      1. In vivo imaging preparation: how long is the recovery time between window/EEG implantation surgery and imaging/recording?

      Imaging data were collected one month after the surgery. We have added descriptions to the methods section of the revised manuscript. Please see line 419.

      1. Statistical analysis: the authors used t-test or ANOVA without first checking whether the data pass the normality test. If the data does not follow a normal distribution, nonparametric tests would be more appropriate.

      Per your suggestion, we tested statistical significance with a parametric test (ANOVA) when the data passed the normality test, and with a non-parametric test (Friedman) when they did not. Please see lines 533-535.

      1. Fig. 1b needs a minor change. In the figure, the EMG electrodes appear to be connected to the brain as well.

      We have corrected this oversight. Thank you.

      1. Fig. 1c: it would be helpful to give examples of raw EEG and EMG traces for REM and NREM separately.

      Raw traces are now shown as suggested. Please see Fig. 1c, page 32.

      1. Fig. 1h: is each data point one microglia or one end-point?

      In Fig. 1h, each data point represents the average speed of all branches of one microglial cell, not one end-point.

      1. Sleep deprivation starts at 9 am. What time corresponds to Zeitgeber Time 0 (ZT0, the beginning of the light phase)?

      We have now clarified that 9 am corresponds to Zeitgeber time 2 (ZT2). Please see line 196.

      1. Line 61: the authors referred to Ramon y Cajal's original suggestion that microglia dynamics are coupled to the sleep-wake cycle. However, the cited paper only indicates that Cajal suggested a role of astrocytes in the sleep-wake cycle, not microglia. In addition, there is a typo in the line: there should be a space between "Ramon" and "y" in Cajal's name.

      We have updated the statement and the cited reference to address the involvement of microglia in the sleep-wake cycle. The typo has been corrected. Please see lines 64-65.

      1. Fig. S3B: As each group has only 3 mice, it is unclear how t-test can yield p < 0.01 or even 0.001.

      We rechecked the original data and they are correct. The small p-values may be due to the small within-group variability of the control group.
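
      To illustrate how three animals per group can still give a very small p-value, the sketch below computes a pooled-variance two-sample t statistic for made-up numbers. The data values and the function name `pooled_t` are hypothetical, and this response does not state whether a pooled-variance or another form of t-test was used in the manuscript. The critical values are the standard two-sided thresholds of the t distribution with 4 degrees of freedom.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(a, b):
    """Two-sample Student t statistic with pooled variance; returns (t, df)."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / sqrt(pooled_var * (1 / na + 1 / nb))
    return t, df


# Two-sided critical |t| values for df = 4 (standard t-table entries).
T_CRIT_DF4 = {0.05: 2.776, 0.01: 4.604, 0.001: 8.610}

# Hypothetical data: same difference in means (4 units), different spread.
tight_control, tight_treated = [10.0, 10.2, 9.8], [14.0, 15.0, 13.0]
wide_control, wide_treated = [8.0, 10.0, 12.0], [12.0, 14.0, 16.0]

t_tight, df = pooled_t(tight_control, tight_treated)
t_wide, _ = pooled_t(wide_control, wide_treated)

tight_significant = abs(t_tight) > T_CRIT_DF4[0.01]  # tight control group: exceeds the p < 0.01 threshold
wide_significant = abs(t_wide) > T_CRIT_DF4[0.05]    # wide spread: below the p < 0.05 threshold
```

      With the same mean difference, only the example with the tightly clustered control group crosses the p < 0.01 threshold, which is the effect described above.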

      1. Line 251-253, "Figure 4h-n" should be "Figure 5h-n"?

      We have revised it. Please see line 265-266.

      1. Fig. 5h: the receptor should be "adrenergic receptor", not "adrenal receptor".

      We changed the term to “adrenergic receptor”. Please see Fig 5h.

      1. Fig. 5g, n: the number of data points is apparently less than the sample size given in the figure legend. Perhaps some data points have exactly the same value so they overlap? The authors may consider plotting identical values with a slight shift so that the number of data points shown matches the actual sample size, to avoid confusion.

      Yes, we have added a small jitter so that overlapping data points can be distinguished. Please see Fig. 5n.

      1. There are some typos (e.g., Line 217, "he" should be "the") and some incomplete references (e.g., [13], [22], [34], [35] lack volume and page number, [15] and [39] lack publisher information). Some references have inconsistent formats (e.g., "Journal of Neuroscience" is sometimes abbreviated and sometimes not). Please correct these.

      We have corrected these oversights. Please see references, page 27.

      Reviewer #2 (Recommendations For The Authors):

      Major issues:

      1. Re-analyze the data in a manner that allows to follow and compare the same cells over different state transitions. This is necessary to evaluate the reversibility of microglia structure. In addition, consider analysis of the change from the beginning to the end of each state.

      As shown in Author response image 2, microglial dynamics were reversible during multiple rounds of sleep-wake transition.

      1. It would be nice to see the raw data obtained over time, at least for Figure 1, before offline correction of movement to evaluate the imaging quality and level of drift during imaging.

      We agree with this suggestion. Please see the supplementary video.

      1. It would be helpful to add an analysis of the percent time spent in each state for the 10 hour recordings.

      We have adopted this suggestion. Please see revised Fig. S4C.

      1. In Figure 2 the results are from 15 cells from several animals. How much do the results vary between mice? It will be helpful to show if this varies between different mice by labeling cells from each mouse differently.

      In Author response image 5, we have labeled the data points from the seven mice individually. Data from different animals were intermixed at each brain state, with no clear animal-to-animal difference.

      Author response image 5.

      Quantitative analysis of microglial length based on multi-plane microglial imaging. n = 17 cells from 7 mice for each group. In the right panel, each color codes data from the same animal.

      1. SD- please add some quantification for sleep and EEG to show that the manipulation really caused sleep deprivation. To address the confound of forced movement and stress, it might be helpful to add quantification of movement compared to an undisturbed wakefulness.

      We have added related data (revised Fig. S3B), as suggested. Please see line 196-197.

      1. The DSP4 application should be also performed with NE measurements to verify the specific of the NE signal measured as well as the DSP4 toxin.

      Following your suggestion, we have added DSP4 data in revised Fig. S4B.

      1. Some suggested refined experiments for the b2AR KO are: a-A conditional b2AR KO in microglia, as cited in the work. b- Local application of a b2 blocker during SD. c- Imaging of NE dynamics in the b2 animals. If NE dynamics during natural sleep cycle are perturbed, then this suggests upstream mechanisms rather than direct microglia effects as suggested by the authors.

      We agree that the current study cannot pinpoint a direct effect of microglial β2AR. We have discussed this limitation in the revised manuscript.

      Please see lines 324-354.

      Minor:

      1. Typo on page 4 (microcopy instead of microscopy).

      It was corrected. Please see line 87.

      1. Typo page 11- 'and he largest changes in NE' - supposed to be 'the'.

      We have corrected these mistakes. Please see line 228.

      1. Fig. 4- there are several units missing in the figure in panel b: the top is Hz, but what does the color bar indicate exactly? 2 what? both for theta/delta and for NE.

      We have modified this figure and legend for clarity. Please see Fig. 4, page 37.

      2. Bottom of page 12- referring to figure 4 but talking about figure 5.

      The typo was corrected. Please see line 265-266.

      References

      1. Aston-Jones G, Bloom FE. Activity of norepinephrine-containing locus coeruleus neurons in behaving rats anticipates fluctuations in the sleep-waking cycle. J Neurosci. 1, 876–886 (1981).

      2. Bellesi M, de Vivo L, Chini M, Gilli F, Tononi G, Cirelli C. Sleep loss promotes astrocytic phagocytosis and microglial activation in mouse cerebral cortex. J Neurosci. 37, 5263–5273 (2017).

      3. Brécier A, Borel M, Urbain N, Gentet LJ. Vigilance and behavioral state-dependent modulation of cortical neuronal activity throughout the sleep/wake cycle. J Neurosci. 42, 4852–4866 (2022).

      4. Dworak M, McCarley RW, Kim T, Kalinchuk AV, Basheer R. Sleep and brain energy levels: ATP changes during sleep. J Neurosci. 30, 9007–9016 (2010).

      5. Gyoneva S, Traynelis SF. Norepinephrine modulates the motility of resting and activated microglia via different adrenergic receptors. J Biol Chem. 288, 15291–15302 (2013).

      6. Kjaerby C, Andersen M, Hauglund N, Untiet V, Dall C, Sigurdsson B, Ding F, Feng J, Li Y, Weikop P, Hirase H, Nedergaard M. Memory-enhancing properties of sleep depend on the oscillatory amplitude of norepinephrine. Nat Neurosci. 25, 1059–1070 (2022).

      7. Liu T, Lu J, Lukasiewicz K, Pan B, Zuo Y. Stress induces microglia-associated synaptic circuit alterations in the dorsomedial prefrontal cortex. Neurobiology of Stress. 15, 100342 (2021).

      8. Liu YU, Ying Y, Li Y, Eyo UB, Chen T, Zheng J, Umpierre AD, Zhu J, Bosco DB, Dong H, Wu LJ. Neuronal network activity controls microglial process surveillance in awake mice via norepinephrine signaling. Nat Neurosci. 22, 1771–1781 (2019).

      9. Nollet M, Wisden W, Franks NP. Sleep deprivation and stress: a reciprocal relationship. Interface Focus. 10, 20190092 (2020).

      10. Pandi-Perumal SR, Cardinali DP, Chrousos GP. 2007. Neuroimmunology of sleep. New York, NY: Springer.

      11. Peng W, Liu X, Ma G, Wu Z, Wang Z, Fei X, Qin M, Wang L, Li Y, Zhang S, Xu M. Adenosine-independent regulation of the sleep-wake cycle by astrocyte activity. Cell Discov. 9, 16 (2023).

      12. Peng W, Wu Z, Song K, Zhang S, Li Y, Xu M. Regulation of sleep homeostasis mediator adenosine by basal forebrain glutamatergic neurons. Science. 369, 6508 (2020).

      13. Rasmussen K, Morilak DA, Jacobs BL. Single unit activity of locus coeruleus neurons in the freely moving cat: I. During naturalistic behaviors and in response to simple and complex stimuli. Brain Research. 371, 324–334 (1986).

      14. Stowell RD, Sipe GO, Dawes RP, Batchelor HN, Lordy KA, Whitelaw BS, Stoessel MB, Bidlack JM, Brown E, Sur M, Majewska AK. Noradrenergic signaling in the wakeful state inhibits microglial surveillance and synaptic plasticity in the mouse visual cortex. Nat Neurosci. 22, 1782–1792 (2019).

      15. Umpierre AD, Bystrom LL, Ying Y, Liu YU, Worrell G, Wu LJ. Microglial calcium signaling is attuned to neuronal activity in awake mice. Elife. 27, e56502 (2020).

      16. Wang Z, Fei X, Liu X, Wang Y, Hu Y, Peng W, Wang YW, Zhang S, Xu M. REM sleep is associated with distinct global cortical dynamics and controlled by occipital cortex. Nat Commun. 13, 6896 (2022).

      17. Zhang Y, Chen K, Sloan SA, Bennett ML, Scholze AR, O’Keeffe S, Phatnani HP, Guarnieri P, Caneda C, Ruderisch N, Deng S, Liddelow SA, Zhang C, Daneman R, Maniatis T, Barres BA, Wu JQ. An RNA-sequencing transcriptome and splicing database of glia, neurons, and vascular cells of the cerebral cortex. J Neurosci. 34, 11929–11947 (2014).

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      The paper proposes an interesting perspective on the spatio-temporal relationship between FC in fMRI and electrophysiology. The study found that while similar network configurations are found in both modalities, there is a tendency for the networks to spatially converge more commonly at synchronous than asynchronous time points. However, my confidence in the findings and their interpretation is undermined by an apparent lack of justification for the expected outcomes for each of the proposed scenarios, and in the analysis pipeline itself.

      Main Concerns

      (1) Figure 1 makes sense to me conceptually, including the schematics of the trajectories, i.e.

      Scenario 1: Temporally convergent, same trajectories through connectome state space

      Scenario 2: Temporally divergent, different trajectories through connectome state space

      However, based on my understanding I am concerned that these scenarios do not necessarily translate into the schematic CRP plots shown in Figure 2C, or the statements in the main text:

      For Scenario 1: "epochs of cross-modal spatial similarity should occur more frequently at on-diagonal (synchronous) than off-diagonal (asynchronous) entries, resulting in an on-/off-diagonal ratio larger than unity"

      For Scenario 2: "epochs of spatial similarity could occur equally likely at on-diagonal and off-diagonal entries (ratio≈1)"

      Where do the authors get these statements and the schematics in Figure 2C from? Are they based on previous literature, theory, or simulations?

      I am not convinced based on the evidence currently in the paper, that the ratio of off- to on-diagonal entries (and under what assumptions) is a definitive way to discriminate between scenarios 1 and 2.

      For example, what about the case where the same network configuration reoccurs in both modalities at multiple time points? It seems to me that one would get a CRP with entries occurring equally on the on-diagonal as on the off-diagonal, regardless of whether the dynamics are matched between the two modalities or not (i.e. regardless of scenario 1 or 2 being true).

      This thought experiment example might have a flaw in it, and the authors might ultimately be correct, but nonetheless, a systematic justification needs to be provided for using the ratio of off- to on-diagonal entries to discriminate between scenarios 1 and 2 (and under what assumptions it is valid).

      In the absence of theory, a couple of ways I can think of to gain insight into this key aspect are:

      (1) Use surrogate data for scenarios 1 and 2:

      a. For scenario 1: Run the CRP using a single modality. E.g. feed in the EEG into the analysis as both modality 1 AND modality 2. This should provide at least one example of CRP under scenario 1 (although it does not ensure that all CRPs under this scenario will look like this, it is at least a useful sanity check)

      b. For scenario 2: Run the CRP using a single modality plus a shuffled version. E.g. feed in the EEG into the analysis as both modality 1 AND a temporally shuffled version of the EEG as modality 2. The temporal shuffling of the EEG could be done by simply splitting the data into blocks of say ~10s and then shuffling them into a new order. This should provide a version of the CRP under scenario 2 (although it does not ensure that all CRPs under this scenario will look like this, it is at least a useful sanity check).

      (2) Do simulations, with clearly specified assumptions, for scenarios 1 and 2. One way of doing this is to use a simplified (state-space) setup and randomly simulate N spatially fixed networks that are independently switching on and off over time (i.e. "activation" is 0 or 1). Note that this would result in a N-dimensional connectome state space.

      The authors would only need to worry about simulating the network activation time courses, i.e. they would not need to bother with specifying the spatial configuration of each network, instead, they would make the implied assumption that each of these networks has the same spatial configuration in modality 1 and modality 2.

      With that assumption, the CRP calculation should simply correspond to calculating, at each time i in modality 1 and time j in modality 2, the number of networks that are activating in both modality 1 and modality 2, by using their activation time courses. Using this, one can simulate and compute the CRPs for the two scenarios:

      a. Scenario 1: where the simulated activation timecourses are set to be the same between both modalities

      b. Scenario 2: where the simulated activation timecourses are simulated separately for each of the modalities
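
      The simplified simulation outlined in (2) can be sketched as follows. This is only an illustration under strong assumptions that go beyond the description above: each network is switched on independently at every time point with a fixed probability, and the ratio is taken as the mean of the on-diagonal CRP entries divided by the mean of the off-diagonal entries. The function names, the number of networks, the activation probability, and the random seed are all hypothetical choices.

```python
import random


def simulate_activations(n_time, n_networks, p_on, rng):
    """Return one bitmask per time point; bit k is set when network k is active."""
    frames = []
    for _ in range(n_time):
        mask = 0
        for k in range(n_networks):
            if rng.random() < p_on:
                mask |= 1 << k
        frames.append(mask)
    return frames


def cross_recurrence(frames_a, frames_b):
    """CRP[i][j] = number of networks active in modality A at time i and in B at time j."""
    return [[bin(a & b).count("1") for b in frames_b] for a in frames_a]


def on_off_diagonal_ratio(crp):
    """Mean of on-diagonal entries divided by mean of off-diagonal entries (square CRP)."""
    n = len(crp)
    on = sum(crp[i][i] for i in range(n)) / n
    off = sum(crp[i][j] for i in range(n) for j in range(n) if i != j) / (n * (n - 1))
    return on / off


rng = random.Random(0)  # fixed seed so the sketch is reproducible
modality_1 = simulate_activations(400, 10, 0.3, rng)
independent_2 = simulate_activations(400, 10, 0.3, rng)

# Scenario 1: identical activation time courses in both modalities.
ratio_scenario_1 = on_off_diagonal_ratio(cross_recurrence(modality_1, modality_1))
# Scenario 2: activation time courses simulated separately for each modality.
ratio_scenario_2 = on_off_diagonal_ratio(cross_recurrence(modality_1, independent_2))
```

      Under these assumptions the expected ratio is about 1/p_on for scenario 1 and about 1 for scenario 2. Networks that persist over time or reoccur at many time points, as in the thought experiment above, would raise the off-diagonal mean and pull the scenario 1 ratio toward 1.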

      We thank the reviewer for raising this important matter, as it directly relates to our study hypothesis. To address this point, we chose to focus on the first of the reviewer’s two alternative suggestions, as it provides evidence based on empirical data. In line with the reviewer’s suggestion 1, recurrence plots have previously been applied to connectome dynamics data from the same modality [Hansen et al., NeuroImage 2015; Fig. 2B]. In that study, where the recurrence plot was estimated within fMRI connectome dynamics, the on-diagonal entries have noticeably larger correlation values than the off-diagonal entries. As the authors state, this contrast emphasizes the autocorrelation of connectome dynamics in their single-modality recurrence plot. Extending these findings to our cross-modal recurrence plots, greater synchronicity of connectome dynamics across fMRI and EEG will, in theory, translate into stronger correlation values along the diagonal axis, as it represents neighboring timepoints in the data. Conversely, less cross-modal synchronicity translates into a lack of such correlation prevalence along the diagonal axis.

      Complementing these statements with empirical data, Author response image 1 shows the fMRI-to-iEEG and fMRI-to-fMRI CRPs side by side as suggested by the reviewer. For simplicity, we thresholded each CRP at the top 5% of entries and calculated their corresponding on-/off-diagonal ratios. The on/off-diagonal ratio for fMRI-to-fMRI CRP was 4.32 ± 6.26 across -5 to +5 TR lags (with a maximum of 16.56 at a lag of one TR), while this value was 1.00 ± 0.31 for fMRI-to-iEEG CRP. Thus, it becomes apparent that synchronicity of connectome dynamics directly translates to the on-/off-diagonal ratio in CRP.

      Author response image 1.

      Sample CRPs from one subject comparing two cases: fMRI-to-iEEG (left) and fMRI-to-fMRI (right). The comparison shows that in the presence of genuinely synchronous connectome dynamics, as expected for the within-modality case (right panel), the on-/off-diagonal ratio shows noticeably higher values. This figure establishes a strong link between our proposed on-/off-diagonal ratio metric and the extent of synchronicity of connectome dynamics.

      Author response image 2.

      The on-/off-diagonal ratio in the fMRI-to-fMRI recurrence plot is considerably higher than in the cross-modal fMRI-to-iEEG case. The horizontal axis shows the lag at which the metric was calculated in the CRP. The bars reflect the group-average metric, while the whiskers show the standard deviation. Note that for the within-modality case, the ratio is not defined at lag zero because the connectome frames are identical.

      (2) Choices in the analysis pipeline leading up to the computation of FC in fMRI or EEG will affect the quality of information available in the FC. For example, but not only, the choice of parcellation (in the study, the number of parcels is very high given the number of EEG sensors). I think it is important that we see the impact of the chosen pipeline on the time-averaged connectomes, an output that the field has some idea about what is sensible. This would give confidence that the information being used in the main analyses in the paper is based on a sensible footing and relates to what the field is used to thinking about in terms of FC. This should be trivial to compute, as it is just a case of averaging the time-varying FCs being used for the CRP over all time points. Admittedly, this approach is less useful for the intracranial EEG.

      We agree with the reviewer on ensuring that the time-averaged FC aligns with expectations of the field and prior work. For this reason, our supplementary analysis already included an analysis that replicates the well-established (albeit modest) spatial similarity between fMRI static connectome and EEG/iEEG static connectomes:

      “In scalp EEG-fMRI data, cross-modal spatial (2D) Pearson correlation of group-level time-averaged connectomes between fMRI and EEG-FCAmp or fMRI and EEG-FCPhase were calculated across all frequency bands. The average spatial correlation value across frequency bands r = 0.28 and r = 0.28 for EEG-FCAmp and EEG-FCPhase, respectively. The spatial correlation values across all frequency bands and connectivity measures were significantly higher than the corresponding null distributions generated by phase-permuted group-level fMRI-FC spatial organization (p<0.005; 200 repetitions; FDR-corrected at q<0.05 for the number of frequency bands). …. Of note, the small effect sizes are strongly in line with prior literature (Hipp and Siegel, 2015; Wirsich et al., 2017; Betzel et al., 2019) and may point to possible divergence in the dynamic domain as investigated in the main manuscript.”

      This replication directly confirms the validity of our selected atlas for further investigations into the connectome dynamics. We acknowledge that with 64 EEG channels, one can only estimate a relatively coarse connectome. Among the well-known coarse atlases, we chose the Desikan-Killiany atlas as it is based on anatomical features, eliminating possible biases towards a particular functional data modality. Moreover, this atlas has been commonly used for multimodal functional connectivity studies, facilitating the confirmation of prior findings in the time-averaged domain [Deligianni et al., Front. Neurosci. 2014; Wirsich et al., NeuroImage 2020; Wirsich et al., NeuroImage 2021].
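
      The time-averaged comparison quoted above can be sketched as follows, as a minimal illustration on toy matrices. The helper names and the 3-region example values are hypothetical. The sketch averages the time-varying FC frames, takes the unique region pairs from the upper triangle, and computes a Pearson correlation between modalities. It omits the phase-permuted null distribution and the FDR correction described in the quoted passage.

```python
from math import sqrt


def time_average(fc_frames):
    """Element-wise mean of a list of square FC matrices (lists of lists)."""
    n_frames = len(fc_frames)
    n = len(fc_frames[0])
    return [[sum(f[i][j] for f in fc_frames) / n_frames for j in range(n)]
            for i in range(n)]


def upper_triangle(matrix):
    """Unique region pairs (i < j) of a symmetric matrix, as a flat list."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    """Pearson correlation coefficient of two equally long lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


# Two toy time-varying FC frames per modality for three regions (made-up values).
fmri_frames = [
    [[1.0, 0.7, 0.1], [0.7, 1.0, 0.4], [0.1, 0.4, 1.0]],
    [[1.0, 0.9, 0.3], [0.9, 1.0, 0.6], [0.3, 0.6, 1.0]],
]
eeg_frames = [
    [[1.0, 0.5, 0.0], [0.5, 1.0, 0.2], [0.0, 0.2, 1.0]],
    [[1.0, 0.7, 0.2], [0.7, 1.0, 0.4], [0.2, 0.4, 1.0]],
]

fmri_static = upper_triangle(time_average(fmri_frames))
eeg_static = upper_triangle(time_average(eeg_frames))
r_static = pearson(fmri_static, eeg_static)
```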

      (3) Leakage correction. The paper states: "To mitigate this issue, we provide results from source-localized data both with and without leakage correction (supplementary and main text, respectively)." Given that FC in EEG is dominated by spatial leakage (see Hipp paper), then I cannot see how it can be justified to look at non-spatial leakage correction results at all, let alone put them up front as the main results. All main results/figures for the scalp EEG should be done using spatial leakage-corrected EEG data.

      We agree that relying on leakage-uncorrected scalp EEG alone would be problematic. For this reason, the intracranial data constitute the core of our results, emphasizing that the observed multiplex architecture of connectomes is indeed present in the absence of source leakage. Only once this finding is established in the intracranial EEG do we provide the scalp EEG data as a generalization to whole-cortex coverage connectomes of healthy subjects. Moreover, it is known that existing source-leakage correction algorithms may inadvertently remove some of the genuine zero-lag connectivity. For instance, Finger and colleagues have shown that the similarity of functional connectivity to structural connectivity diminishes after correction for source leakage (Finger et al., PLOS Comp. Biol. 2016). Therefore, we have deliberately chosen to include our generalization findings before source-leakage correction (main text) as well as after source-leakage correction, which reflects a more stringent approach (supplementary analysis). Importantly, our conclusions hold both before and after source-leakage correction.

      Reviewer #2 (Public Review):

      Summary:

      The study investigates the brain's functional connectivity (FC) dynamics across different timescales using simultaneous recordings of intracranial EEG/source-localized EEG and fMRI. The primary research goal was to determine which of three convergence/divergence scenarios is the most likely to occur.

      The results indicate that despite similar FC patterns found in different data modalities, the time points were not aligned, indicating spatial convergence but temporal divergence.

      The researchers also found that FC patterns in different frequencies do not overlap significantly, emphasizing the multi-frequency nature of brain connectivity. Such asynchronous activity across frequency bands supports the idea of multiple connectivity states that operate independently and are organized into a multiplex system.

      Strengths:

      The data supporting the authors' claims are convincing and come from simultaneous recordings of fMRI and iEEG/EEG, which has been recently developed and adapted.

      The analysis methods are solid and involve a novel approach to analyzing the co-occurrence of FC patterns across modalities (cross-modal recurrence plot, CRP) and robust statistics, including replication of the main results using multiple operationalizations of the functional connectome (e.g., amplitude, orthogonalized, and phase-based coupling).

      In addition, the authors provided a detailed interpretation of the results, placing them in the context of recent advances and understanding of the relationships between functional connectivity and cognitive states.

      Weaknesses:

      Despite the impressive work, the paper still lacks some analyses to make it complete.

      Firstly, the effect of the window size is unclear, especially in the case of different frequencies where the number of cycles that fall in a window will vary drastically. A typical oscillation lasts just a few cycles (see Myrov et al., 2024), and brain states are usually short-lived because of meta-stability (see Roberts et al., 2019).

      We now replicate our results with an additional window size. Please see section “Recommendations for the authors”.

      Secondly, the authors didn't examine frequencies lower than 1Hz despite similarities between fMRI and infra-slow oscillations found in prior literature (see Palva et al., 2014; Zhang et al., 2023).

      We address this issue below. Please see section “Recommendations for the authors”.

      On a minor note, the phase-locking value (PLV) is positively biased for EEG data (see Palva et al., 2018) and a different metric for phase coupling could be a more appropriate choice (e.g., iPLV/wPLI, see Vinck et al., 2011).

      While iPLV and wPLI are not positively biased, they may reduce genuine zero-phase connectivity as they were initially designed to address spurious zero-phase connectivity from source leakage in scalp EEG. Indeed, PLV connectivity is shown to be more strongly correlated with structural connectivity than wPLI and other phase coupling methods [Finger et al., PLOS Comp. Biol. 2016], emphasizing that it contains genuine connectivity that may be lacking when zero-phase connectivity is removed. We chose PLV because it is a widely used functional connectivity metric, particularly in intracranial data where source leakage is not a critical concern. Thus, using PLV facilitates cross-study comparisons including to our prior work [e.g. Mostame et al. NeuroImage 2020, Mostame et al. J Neurosci 2021].
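
      The trade-off described above can be illustrated with a minimal sketch. It assumes unit-amplitude signals, so that the cross-spectrum reduces to the phase difference, and uses two hypothetical helpers, `plv` and `wpli`, written from the standard definitions rather than taken from the authors' repository. A perfectly zero-lag coupling gives PLV = 1 but wPLI near 0, whereas a quarter-cycle lag gives both near 1.

```python
import cmath
import math


def plv(phase_x, phase_y):
    """Phase-locking value: length of the mean phase-difference vector."""
    diffs = [cmath.exp(1j * (px - py)) for px, py in zip(phase_x, phase_y)]
    return abs(sum(diffs)) / len(diffs)


def wpli(phase_x, phase_y, eps=1e-12):
    """Weighted phase-lag index, assuming unit-amplitude cross-spectra.

    Only the imaginary part (the sine of the phase difference) contributes,
    so exactly zero-lag coupling is discarded. eps avoids division by zero.
    """
    imag = [math.sin(px - py) for px, py in zip(phase_x, phase_y)]
    return abs(sum(imag)) / (sum(abs(v) for v in imag) + eps)


# Hypothetical 10 Hz oscillation sampled at 1 kHz for one second.
phase = [2 * math.pi * 10 * (k / 1000.0) for k in range(1000)]
zero_lag = (plv(phase, phase), wpli(phase, phase))
quarter_lag_phase = [p - math.pi / 2 for p in phase]
quarter_lag = (plv(phase, quarter_lag_phase), wpli(phase, quarter_lag_phase))
```

      In this toy case the zero-lag pair is invisible to wPLI even though the coupling is perfect, which is the sense in which such metrics may remove genuine zero-phase connectivity.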

      The repository with the code is also unavailable.

      Thank you for bringing this to our attention. We have now made our repository publicly accessible at: https://github.com/connectlab/Mostame2024_Multiplex_iEEG_fMRI.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      The window widths used to compute FC as a function of time are an important aspect, so I feel that this should be briefly described up-front in the main Results text.

      Methods. "Finally, to compensate for the time lag between hemodynamic and neural responses of the brain (Logothetis et al., 2001), we shifted the fMRI-FC time course 6 seconds backwards in time." What about the effects of temporal blurring from the HRF? Do we need to care about that?

      We agree that it is important to investigate the effect of temporal blurring by the HRF. The main text already included a replication of findings from CRPs generated using fMRI data and EEG amplitude signals convolved with the canonical HRF. This method serves as an alternative to the 6-second shifting. Both approaches produced similar results.
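
      As a rough illustration of how the two approaches relate, the sketch below convolves a brief burst with a double-gamma HRF. The shape parameters (6 and 16, undershoot ratio 6) are a commonly used default and are an assumption here, not necessarily the kernel used in the paper; `canonical_hrf` and `convolve_with_hrf` are hypothetical helper names.

```python
import math


def canonical_hrf(t, peak_shape=6.0, under_shape=16.0, ratio=6.0):
    """Double-gamma HRF at time t in seconds (assumed default parameters)."""
    if t < 0:
        return 0.0
    peak = t ** (peak_shape - 1) * math.exp(-t) / math.gamma(peak_shape)
    under = t ** (under_shape - 1) * math.exp(-t) / math.gamma(under_shape)
    return peak - under / ratio


def convolve_with_hrf(signal, dt, hrf_length_s=32.0):
    """Causal convolution of a sampled signal with the HRF kernel."""
    kernel = [canonical_hrf(i * dt) * dt for i in range(int(hrf_length_s / dt) + 1)]
    out = []
    for n in range(len(signal)):
        acc = 0.0
        for k, h in enumerate(kernel):
            if n - k < 0:
                break
            acc += h * signal[n - k]
        out.append(acc)
    return out


dt = 0.5                      # sampling step in seconds (illustrative)
burst_index = 10              # a brief burst at t = 5 s
impulse = [0.0] * 80
impulse[burst_index] = 1.0
response = convolve_with_hrf(impulse, dt)

# Delay of the response peak relative to the burst, and the width of the
# response above half of its maximum (a crude measure of temporal blurring).
peak_delay_s = (response.index(max(response)) - burst_index) * dt
half_max_width_s = sum(1 for v in response if v >= max(response) / 2) * dt
```

      Under these assumed parameters the peak lags the burst by about 5 s, comparable to the fixed 6-second shift, while the response is additionally smeared over several seconds; the shift captures the delay but not the smearing.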

      Methods. In fMRI connectome computation it is common to look at partial correlation rather than full correlation. Partial correlation focuses more on direct connections. It would be good if the paper acknowledged and justified why it is OK to use full correlation.

      We have now added a brief explanation in this regard in the main text (Methods section) as follows:

      “In fMRI connectome computation, some prior work has used partial correlation instead of full correlation. Partial correlation emphasizes direct connections by calculating correlation between any pair of brain regions after regressing out the timeseries of all other regions. However, we have opted to use full correlation because this permits interpretation of our outcomes in the context of the vast existing literature that uses full correlations in fMRI including the majority of bimodal (EEG-fMRI) connectome studies (e.g. Tagliazucchi et al., 2012; Deligianni et al., 2014; Wirsich et al., 2017b, 2020, 2021; Allen et al., 2018).”
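
      The distinction between the two measures can be sketched with three synthetic regions, where a hypothetical region `z` drives both `x` and `y` and there is no direct link between `x` and `y`. With a single controlled region, partial correlation has a closed form in terms of the pairwise full correlations; with many regions one would instead regress out all other timeseries. All signals and noise levels below are illustrative assumptions.

```python
import math
import random


def pearson(x, y):
    """Plain Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_corr_one_control(r_xy, r_xz, r_yz):
    """Correlation of x and y after removing the linear effect of z."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


rng = random.Random(1)  # fixed seed so the sketch is reproducible
z = [rng.gauss(0, 1) for _ in range(2000)]
x = [v + 0.5 * rng.gauss(0, 1) for v in z]  # x is driven by z
y = [v + 0.5 * rng.gauss(0, 1) for v in z]  # y is driven by z, not by x

full_xy = pearson(x, y)
partial_xy = partial_corr_one_control(full_xy, pearson(x, z), pearson(y, z))
```

      Here the full correlation between `x` and `y` is high because of the shared driver, while the partial correlation is close to zero, which is why the two measures can yield different connectomes.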

      The paper should relate the results to findings showing clear links between simultaneously recorded EEG and fMRI beyond FC. E.g. Mantini (PNAS) 2007 and Van De Ville (PNAS) 2010 to name two.

      In line with this important point, we have extended the existing discussion section that compares our outcomes to EEG-fMRI beyond functional connectivity:

      “Prior multi-modal studies of neural dynamics have predominantly aimed at methodologically cross-validating hemodynamic and electrophysiological observations, thus focusing on their convergence. These important foundational studies include e.g., the cross-modal comparison of region-wise (Mukamel et al., 2005; Nir et al., 2007) or ICN-wise (Mantini et al., 2007) activity fluctuations, instantaneous activity maps (Hunyadi et al., 2019; Zhang et al., 2020) or EEG microstates (Van de Ville 2010), infraslow connectome states (Abreu et al., 2020), or connection-wise FC including studies in the iEEG-fMRI and scalp EEG-fMRI data used in the current study (Ridley et al., 2017; and Wirsich et al., 2020, respectively). In contrast to this prior work, the current study investigated the highly time-resolved cross-modal temporal relationship at the level of FC patterns distributed over all available pairwise connections, and found a connectome-level temporal divergence. The discrepancy between temporal divergence in our study and convergence in prior studies implies that infraslow fluctuations of activity in individual regions or of FC in individual region-pairs observable in both modalities (prior studies) are neurally distinct from connectome-wide FC dynamics observable separately in each modality (current study). Indeed, we confirmed the existence of infraslow electrophysiological FC dynamics driving cross-modal temporal associations at the level of individual connections (Fig. S3) …”

      Reviewer #2 (Recommendations For The Authors):

      (1) Check different window sizes and stability of the FC patterns as a function of it.

      We thank the reviewer for the helpful feedback. We agree that the window size could affect the estimation of individual connectome frames, particularly given that neural processes unfold over hundreds of milliseconds rather than seconds. However, we expect that the asynchronous nature of cross-modal convergence observed in our data would remain intact regardless of the specific window length used for FC calculations. To confirm this, we replicated some of our main analyses in the iEEG-fMRI data with a window length of 500ms (as opposed to 3s, equivalent to one TR) as follows:

      First, we showed that changing the window length does not substantially impact the overall architecture of the connectomes (Author response image 3). Particularly, the time-averaged connectome patterns across different frequency bands were all strongly correlated between the two analyses (500ms and 3s window lengths).

      Author response image 3.

      Time-averaged connectome patterns are highly replicable when calculated using 3s or 500ms window lengths. Horizontal axis represents frequency bands, while each dot represents a subject. Vertical axis shows 2D Pearson correlation of the two connectomes. The group average within each frequency band is marked by a horizontal line.
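
      The logic of this comparison can be sketched on synthetic data. The example below uses plain Pearson correlation of white-noise channels as a stand-in for the band-limited FC measures, non-overlapping windows, and a hypothetical 100 Hz sampling rate; none of these choices is taken from the paper, and white noise does not capture the cycles-per-window issue raised by the reviewer.

```python
import math
import random


def pearson(x, y):
    """Plain Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def time_averaged_fc(channels, win):
    """Average the upper-triangle FC vector over non-overlapping windows."""
    n_ch = len(channels)
    pairs = [(i, j) for i in range(n_ch) for j in range(i + 1, n_ch)]
    n_win = len(channels[0]) // win
    totals = [0.0] * len(pairs)
    for w in range(n_win):
        s = slice(w * win, (w + 1) * win)
        for p, (i, j) in enumerate(pairs):
            totals[p] += pearson(channels[i][s], channels[j][s])
    return [t / n_win for t in totals]


rng = random.Random(0)
fs_hz = 100                      # hypothetical sampling rate
n_samples = 30 * fs_hz           # 30 s of synthetic data
shared_a = [rng.gauss(0, 1) for _ in range(n_samples)]
shared_b = [rng.gauss(0, 1) for _ in range(n_samples)]


def noisy(source):
    """Copy of a shared source with independent channel noise added."""
    return [v + 0.5 * rng.gauss(0, 1) for v in source]


# Channels 0-1 share one source, channels 2-3 share another.
channels = [noisy(shared_a), noisy(shared_a), noisy(shared_b), noisy(shared_b)]

fc_3s = time_averaged_fc(channels, win=3 * fs_hz)
fc_500ms = time_averaged_fc(channels, win=fs_hz // 2)
similarity = pearson(fc_3s, fc_500ms)  # one dot in a plot like the one above
```

      When the underlying coupling structure is stable, the time-averaged connectomes from the two window lengths are strongly correlated even though individual short-window frames are much noisier.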

      Second, we replicated our major findings of CRP and its on-/off-diagonal ratio in the iEEG-fMRI dataset using a window length of 500ms for FC calculations. Indeed, the data do not show a substantial difference in the on-/off-diagonal ratios of the CRP entries between the 3s and 500ms window lengths. Specifically, the ratio was equal to 1.02 ± 0.07 for the 500ms window length, indicating the absence of significant temporal convergence of the connectome dynamics (see Author response image 4). A paired t-test between group-averaged ratios across different lags confirms a lack of significant difference between the two analyses (p = 0.50). This finding further emphasizes the genuine asynchronous nature of connectome dynamics across the neural timescales measured in fMRI and electrophysiology. We have added this analysis to the supplementary data.

      Author response image 4.

      On-/off-diagonal ratio is shown across lags for both analyses: 3s window length (blue) and 500ms window length (red). Each bar shows the mean across subjects, while the whiskers show the corresponding standard deviations.
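
      One way to read the on-/off-diagonal ratio is sketched below. It assumes that entry `crp[i][j]` holds the similarity between the fMRI connectome at frame `i` and the electrophysiological connectome at frame `j`, and that the ratio divides the mean of the entries on the chosen lag diagonal by the mean of all other entries. The exact construction in the paper may differ, and `on_off_diagonal_ratio` is a hypothetical helper.

```python
def on_off_diagonal_ratio(crp, lag=0):
    """Mean of entries with j - i == lag divided by the mean of all others.

    Assumes positive mean similarities; with values near zero or of mixed
    sign the ratio becomes unstable.
    """
    on, off = [], []
    for i, row in enumerate(crp):
        for j, value in enumerate(row):
            (on if j - i == lag else off).append(value)
    return (sum(on) / len(on)) / (sum(off) / len(off))


# Toy 5 x 5 plots: one with no temporal structure, one in which matching
# time points are more similar than non-matching ones.
flat_crp = [[0.2 for _ in range(5)] for _ in range(5)]
convergent_crp = [[0.6 if i == j else 0.2 for j in range(5)] for i in range(5)]

flat_ratio = on_off_diagonal_ratio(flat_crp)
convergent_ratio = on_off_diagonal_ratio(convergent_crp)
```

      A ratio close to 1, as reported for both window lengths, corresponds to the flat case: connectome patterns are no more similar across modalities at matching time points than at non-matching ones.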

      (2) Try to decrease the lowest frequency of the analysis below 1Hz or just compute it for multiple log-spaced frequencies from infra-slow delta to high-gamma band.

      Thank you for pointing out this matter. We do not expect considerable signal in the frequency range below the current lower bound of delta (1Hz) because, as in most other EEG recordings, the EEG was not recorded in a DC setting and has a hardware high-pass filter of 0.1Hz. Nonetheless, we investigated the power spectral density of our iEEG-fMRI data and found that there is indeed little signal power left in the available infraslow range [0.5 – 1 Hz] after the preprocessing steps (Author response image 5).

      Author response image 5.

      Power spectral density of all subjects in the fMRI-iEEG dataset shows a lack of sufficient power in the infraslow range. Infraslow range signals are almost always filtered out during recording unless the recording setup includes a DC amplifier. The infraslow EEG signal that is often considered correlated with the fMRI signal in the literature is most commonly extracted from the slow-changing envelope of band-limited signals, such as the envelope of gamma oscillations.

      Accordingly, when the iEEG signals are filtered within the range of [0.5, 1] Hz, there is little variation in the signal timeseries, in contrast to the adjacent delta-band signal (Author response image 6). Importantly, the power envelope of the delta band (and all other canonical bands not shown here) comprises major fluctuations in the infraslow range, as expected. We would like to emphasize that the existing studies addressing infraslow EEG signal dynamics typically consider the infraslow envelope fluctuations of band-limited signals in traditional frequency bands [e.g. Nir et al., Nat Neurosci 2008] rather than direct recordings in the infraslow frequency range. Investigating HRF-convolved EEG signals similarly captures the infraslow characteristics of the timeseries [e.g. Mantini et al. PNAS 2007, Sadaghiani et al., J Neurosci 2010] (and note that HRF-convolved analyses are included as supplementary investigation in the current study). To the best of our knowledge, very few studies have investigated direct infraslow EEG signals using DC EEG, and we are aware of only two DC-EEG studies with concurrent fMRI [Hiltunen et al., J Neurosci 2014, Grooms et al., Brain Connectivity 2017]. The infraslow correlates of fMRI in electrophysiological signals reported in prior work therefore reflect the slow changes in faster activity or connectivity of traditional frequency bands, which are indeed already included in the current study.

      Author response image 6.

      Sample timeseries of the iEEG signal of the nine subjects (nine rows) for a 400 second interval. Blue signals show the bandlimited delta with its envelope shown as darker blue. The red signal represents the infraslow signal component left in the data, which is much lower in power.
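
      The distinction between a direct infraslow signal and the infraslow envelope of a faster band can be made concrete with a synthetic example. The sketch assumes a 2 Hz carrier whose amplitude is modulated at 0.05 Hz, and it uses the squared signal as a crude stand-in for the power envelope; the frequencies, the modulation depth, and the helper `amplitude_at` are illustrative assumptions.

```python
import cmath
import math


def amplitude_at(x, freq_hz, fs_hz):
    """Amplitude of the sinusoidal component of x at freq_hz (single-bin DFT)."""
    n = len(x)
    proj = sum(v * cmath.exp(-2j * math.pi * freq_hz * k / fs_hz)
               for k, v in enumerate(x))
    return 2 * abs(proj) / n


fs_hz = 20.0          # sampling rate (illustrative)
duration_s = 100      # both frequencies complete whole cycles in this span
carrier_hz = 2.0      # a delta-band oscillation
modulation_hz = 0.05  # infraslow modulation of its amplitude

t = [k / fs_hz for k in range(int(fs_hz * duration_s))]
raw = [(1 + 0.5 * math.cos(2 * math.pi * modulation_hz * tk))
       * math.cos(2 * math.pi * carrier_hz * tk) for tk in t]
power = [v * v for v in raw]  # crude power envelope (contains the slow term)

raw_infraslow = amplitude_at(raw, modulation_hz, fs_hz)
power_infraslow = amplitude_at(power, modulation_hz, fs_hz)
```

      The raw signal carries essentially no energy at 0.05 Hz, yet its power fluctuates strongly at that rate. A high-pass filter at the recording stage would therefore leave this kind of infraslow information intact.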

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Ritvo and colleagues present an impressive suite of simulations that can account for three findings of differentiation in the literature. This is important because differentiation (in which items that have some features in common, or share a common associate, are less similar to one another than are unrelated items) is difficult to explain with classic supervised learning models, as these predict the opposite (i.e., an increase in similarity). A few of their key findings are that differentiation requires a high learning rate and low inhibitory oscillations, and is virtually always asymmetric in nature.

      This paper was very clear and thoughtful: an absolute joy to read. The model is simple and elegant, and powerful enough to re-create many aspects of existing differentiation findings. The interrogation of the model and presentation of the findings were both extremely thorough. The potential for this model to be used to drive future work is huge. I have only a few comments for the authors, all of which are relatively minor.

      (1) I was struck by the fact that the "zone" of repulsion is quite narrow, compared with the zone of attraction. This was most notable in the modeling of Chanales et al. (i.e., just one of the six similarity levels yielded differentiation). Do the authors think this is a generalizable property of the model or phenomenon, or something idiosyncratic to do with the current investigation? It seems curious that differentiation findings (e.g., in hippocampus) are so robustly observed in the literature despite the mechanism seemingly requiring a very particular set of circumstances. I wonder if the authors could speculate on this point a bit; for example, might the differentiation zone be wider when competitor "pop up" is low (i.e., low inhibitory oscillations), which could help explain why it's often observed in hippocampus? This seems somewhat related to the question about what makes something "moderately" active, or how one could ensure "moderate" activation if they were, say, designing an experiment looking at differentiation.

      We thank the reviewer for this comment. In the previous version of the manuscript, in the section entitled “Differentiation Requires a High Learning Rate and Is Sensitive to Activation Dynamics”, we discussed some reasons why differentiation may be more likely to be found in the hippocampus – namely, the high learning rate of the hippocampus and the sparsity of hippocampal activation patterns (pp. 27-28):

      “These results have implications for where to look for differentiation in the brain. Our finding that differentiation requires a high learning rate suggests that differentiation will be more evident in the hippocampus than in neocortex, insofar as hippocampus is thought to have a higher learning rate than neocortex (McClelland et al., 1995). In keeping with this prediction, numerous studies have found differentiation effects in hippocampus but not in neocortical regions involved in sensory processing (e.g., Chanales et al., 2017; Favila et al., 2016; Zeithamova et al., 2018). At the same time, some studies have found differentiation effects in neocortex (e.g., Schlichting et al., 2015; Wammes et al., 2022). One possible explanation of these neocortical differentiation effects is that they are being “propped up” by top-down feedback from differentiated representations in the hippocampus. This explanation implies that disruptions of hippocampal processing (e.g., lesions, stimulation) will eliminate these neocortical differentiation effects; we plan to test this prediction in future work.

      Additionally, the simulations where we adjusted the oscillation amount (using our model of Schlichting et al., 2015) imply that differentiation will be most evident in brain regions where it is relatively hard to activate competitors. Given the U shape of the NMPH learning rule, limiting competitor activity makes it less likely that plasticity will “cross over” from weakening (and differentiation) to strengthening (and integration). Thus, within the hippocampus, subregions with sparser activity (e.g., dentate gyrus, and to a lesser extent, CA3; Barnes et al., 1990; GoodSmith et al., 2017; West et al., 1991) will be more prone to differentiation. There is strong empirical support for this prediction. For example, Wammes et al. (2022) manipulated the similarity of stimuli in a statistical learning experiment and found that moderate levels of visual similarity were associated with significant differentiation in the dentate gyrus but not other subregions. Also, numerous studies have found greater differentiation in dentate gyrus / CA3 than in CA1 (e.g., Dimsdale-Zucker et al., 2018; Wanjia et al., 2021; Molitor et al., 2021; Kim et al., 2017; but see Zheng et al., 2021).”

      In the revised draft we have supplemented this discussion with a new section entitled “Reconciling the Prevalence of Differentiation in the Model and in the Data” (pp. 30-31):

      “A key lesson from our model is that, from a computational perspective, it is challenging to obtain differentiation effects: The region of parameter space that gives rise to differentiation is much smaller than the one that gives rise to integration (for further discussion of this issue, see the section in Methods on Practical Advice for Getting the Model to Show Differentiation). However, the fact that integration is more prevalent in our simulations across parameter configurations does not mean that integration will be more prevalent than differentiation in real-life circumstances. What really matters in predicting the prevalence of differentiation in real life is how the parameters of the brain map on to parameters of the model: If the parameters of the brain align with regions of model parameter space that give rise to differentiation (even if these regions are small), this would explain why differentiation has been so robustly observed in extant studies. Indeed, this is exactly the case that we sought to make above about the hippocampus – i.e., that its use of especially sparse coding and a high learning rate will give rise to the kinds of neural dynamics that cause differentiation (as opposed to integration). As another example, while it is true that half of the overlap conditions in our simulation of Chanales et al. (2021) give rise to integration, this does not imply that integration will occur half of the time in the Chanales et al. (2021) study; it may be that the levels of overlap that are actually observed in the brain in Chanales et al. (2021) are more in line with the levels of overlap that give rise to differentiation in our model.”

      (2) With real fMRI data we know that the actual correlation value doesn't matter all that much, and anti-correlations can be induced by things like preprocessing decisions. I am wondering if the important criterion in the model is that the correlations (e.g., as shown in Figure 6) go down from pre to post, versus that they are negative in sign during the post learning period. I would think that here, similar to in neural data, a decrease in correlation would be sufficient to conclude differentiation, but would love the authors' thoughts on that.

      We thank the reviewer for bringing this up. In the paper, we define differentiation as the moving apart of representations – so we agree with the reviewer that it would be appropriate to conclude that differentiation is taking place when correlations go down from pre to post.

      In addition to the definitional question (“what counts as differentiation”), one can also ask the mechanistic question of what is happening in the model at the (simulated) neuronal level in conditions where differentiation (i.e., an average decrease in similarity from pre to post) occurs. Here, the model’s answer is clear: When the similarity of two pairmates decreases, it is because the pairmates have acquired anticorrelated representations at the (simulated) neuronal level. When similarity decreases on average from pre to post, but the average “post” similarity value is not negative, this is because there is a mix of outcomes across runs of the model (due to variance in the initial, random model weights and also variance in the order in which items are presented across training epochs) – some runs lead to differentiation (manifested as anticorrelated pairmate representations) whereas others lead to no change or integration. The average pre-to-post change depends on the relative frequencies with which these different outcomes occur.
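
      The arithmetic behind this point can be sketched with made-up numbers. The proportions and similarity values below are purely hypothetical and are not taken from the simulations; they only show how a mixture of run-level outcomes can produce an average decrease in similarity without a negative average.

```python
def mean_post_similarity(outcomes):
    """Average post-learning similarity over runs.

    outcomes: list of (proportion_of_runs, post_similarity) pairs whose
    proportions must sum to 1.
    """
    total = sum(p for p, _ in outcomes)
    if abs(total - 1.0) > 1e-9:
        raise ValueError("proportions must sum to 1")
    return sum(p * s for p, s in outcomes)


pre_similarity = 0.24  # hypothetical pre-learning pairmate similarity
outcomes = [
    (0.4, -0.14),  # runs that differentiate (anticorrelated pairmates)
    (0.5, 0.24),   # runs with no change
    (0.1, 0.60),   # runs that integrate
]
post_similarity = mean_post_similarity(outcomes)
change = post_similarity - pre_similarity
```

      With these illustrative values the average similarity falls from pre to post yet stays positive, even though every differentiating run ends up anticorrelated.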

      We have made several edits to the paper to clarify this point.

      We added a new section under “Results” in our simulation of Chanales et al. (2021) entitled, “Pairs of Items that Differentiate Show Anticorrelated Representations” (p. 15):

      “Figure 6B also highlights that, for learning rates where robust differentiation effects occur in aggregate (i.e., there is a reduction in mean pattern similarity, averaging across model runs), these aggregate effects involve a bimodal distribution across model runs: For some model runs, learning processes give rise to anticorrelated representations, and for other model runs the model shows integration; this variance across model runs is attributable to random differences in the initial weight configuration of the model. The aggregate differentiation effect is therefore a function of the proportion of model runs showing differentiation (here, anticorrelation) and the proportion of model runs showing integration. The fact that differentiation shows up as anticorrelation in the model's hidden layer relates to the learning effects discussed earlier:

      Unique competitor units are sheared away from (formerly) shared units, so the competitor ends up not having any overlap with the target representation (i.e., the level of overlap is less than you would expect due to chance, which mathematically translates into anticorrelation). We return to this point and discuss how to test for anticorrelation in the Discussion section.”
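
      The step from "no overlap" to "anticorrelation" can be checked directly. For two binary patterns over N units, each with k active units and m shared active units, the Pearson correlation is (mN - k^2) / (k(N - k)). It is zero at the chance overlap m = k^2 / N and equals -k / (N - k) when m = 0. The sketch below verifies this by building the patterns explicitly; the layer size of 50 units and the 6 active units are illustrative assumptions, not the model's actual dimensions.

```python
import math


def pearson(x, y):
    """Plain Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def pattern_correlation(n_units, n_active, n_shared):
    """Correlation between two binary patterns sharing n_shared active units."""
    if 2 * n_active - n_shared > n_units:
        raise ValueError("layer too small for the requested patterns")
    start_b = n_active - n_shared  # pattern b begins where the overlap begins
    a = [1 if i < n_active else 0 for i in range(n_units)]
    b = [1 if start_b <= i < start_b + n_active else 0 for i in range(n_units)]
    return pearson(a, b)


def closed_form(n_units, n_active, n_shared):
    """Closed-form value of the same correlation."""
    return (n_shared * n_units - n_active ** 2) / (n_active * (n_units - n_active))


overlapping = pattern_correlation(50, 6, 2)   # some shared units
disjoint = pattern_correlation(50, 6, 0)      # shared units sheared away
```

      With no shared units the correlation is negative rather than zero, because zero overlap is below the overlap expected by chance for sparse patterns of this size.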

      We added new text to the “Take-Home Lessons” section in the Chanales et al. (2021) simulation (p. 17):

      “In particular, the simulations expose some important boundary conditions for when representational change can occur according to the NMPH (e.g., that differentiation depends on a large learning rate, but integration does not), and the simulations provide a more nuanced account of exactly how representations change (e.g., that differentiation driven by the NMPH is always asymmetric, whereas integration is sometimes asymmetric and sometimes symmetric; and that, when differentiation occurs on a particular model run, it tends to give rise to anticorrelated representations in the model's hidden layer).”

      We added new text to the “Nature of Representational Change” section in the Favila et al. (2016) simulation (p. 21):

      “Figure 8 - Supplement 1 also indicates that, as in our simulation of Chanales et al. (2021), individual model runs where differentiation occurs show anticorrelation between the pairmate representations, and gradations in the aggregate level of differentiation that is observed across conditions reflect differences in the proportion of trials showing this anticorrelation effect.”

      We added new text to the “Take-Home Lessons” section in the Favila et al. (2016) simulation (p.21):

      “As in our simulation of Chanales et al. (2021), we found that the NMPH-mediated differentiation was asymmetric, manifested as anticorrelation between pairmate representations on individual model runs, and required a high learning rate, leading to abrupt representational change.”

      We added new text to the “Nature of Representational Change” section in the Schlichting et al. (2015) simulation (p. 26):

      “Also, as in our other simulations, when differentiation occurs on a particular model run it tends to give rise to anticorrelated representations (results not shown).”

      We added new text to the “Take-Home Lessons” section in the Schlichting et al. (2015) simulation (pp. 26-27):

      “As in the other versions of our model, differentiation requires a high learning rate, and – on model runs when it occurs – it is asymmetric and gives rise to anticorrelated representations.”

      We added new text at the start of the Discussion (p. 27):

      “In addition to qualitatively replicating the results from the studies we simulated, our model gives rise to several novel predictions – most notably, that differentiation driven by the NMPH requires a rapid learning rate and, when it occurs for a particular pair of items, it is asymmetric and gives rise to anticorrelated representations.”

      We also added a new section in the Discussion entitled “Testing the Model's Prediction about Anticorrelation”, which (among other things) highlights the reviewer’s point that fMRI pattern similarity values can be affected by preprocessing choices (p. 30):

      “Even though we operationally define differentiation as a reduction in similarity with learning, the way that it actually shows up on individual model runs is as anticorrelation between pairmates; in the model, the size of the aggregate differentiation effect is determined by the proportion of model runs that show this anticorrelation effect (vs. no change or integration). This implies that, if we could get a clean measurement of the similarity of pairmates in an experiment, we might see a multimodal distribution, with some pairmates showing anticorrelation, and others showing increased correlation (integration) or no change in similarity. This kind of clean readout of the similarity of individual pairs might be difficult to obtain with fMRI; it is more feasible that this could be obtained with electrophysiology. Another challenge with using fMRI to test this prediction is that anticorrelation at the individual-neuron level might not scale up to yield anticorrelation at the level of the BOLD response; also, fMRI pattern similarity values can be strongly affected by preprocessing choices – so a negative pattern similarity value does not necessarily reflect anticorrelation at the individual-neuron level. A final caveat is that, while we predict that differentiation will show up as anticorrelation in the brain region that gives rise to the differentiation effect, this might not translate into anticorrelation in areas that are downstream of this region (e.g., if the hippocampus is the source of the differentiation effect, we would expect anticorrelation there, but not necessarily in neocortical regions that receive input from the hippocampus; we revisit this point later in the discussion, when we address limitations and open questions).”

      We added new text in the Discussion, under “Limitations and Open Questions” (p. 31):

      “Importantly, while hippocampus can boost the representation of unique features in neocortex, we expect that neocortex will continue to represent shared perceptual features (e.g., in Favila et al., 2016, the fact that both pairmates are photos of barns). For this reason, in paradigms like the one used by Favila et al. (2016), the predicted effect of hippocampal differentiation on neocortical representations will be a reduction in pattern similarity (due to upregulation in the representation of unique pairmate features) but neocortex should not cross over into anticorrelation in these paradigms (due to its continued representation of shared perceptual features). Indeed, this is exactly the pattern that Wanjia et al. (2021) observed in their study, which used similar stimuli to those used in Favila et al. (2016).”

      Lastly, we updated the Abstract (p. 1):

      “What determines when neural representations of memories move together (integrate) or apart (differentiate)? Classic supervised learning models posit that, when two stimuli predict similar outcomes, their representations should integrate. However, these models have recently been challenged by studies showing that pairing two stimuli with a shared associate can sometimes cause differentiation, depending on the parameters of the study and the brain region being examined. Here, we provide a purely unsupervised neural network model that can explain these and other related findings. The model can exhibit integration or differentiation depending on the amount of activity allowed to spread to competitors – inactive memories are not modified, connections to moderately active competitors are weakened (leading to differentiation), and connections to highly active competitors are strengthened (leading to integration). The model also makes several novel predictions – most importantly, that when differentiation occurs as a result of this unsupervised learning mechanism, it will be rapid and asymmetric, and it will give rise to anticorrelated representations in the region of the brain that is the source of the differentiation. Overall, these modeling results provide a computational explanation for a diverse set of seemingly contradictory empirical findings in the memory literature, as well as new insights into the dynamics at play during learning.”
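
      The three activity regimes named in the abstract can be sketched as a piecewise-linear, U-shaped function of competitor activity. The breakpoints and magnitudes below are hypothetical placeholders chosen only to reproduce the qualitative shape; they are not the parameters of the authors' model, and `nmph_weight_change` is an illustrative name.

```python
def nmph_weight_change(activity, low=0.2, dip=0.4, cross=0.6,
                       max_weaken=1.0, max_strengthen=1.0):
    """U-shaped plasticity rule as a function of activity in [0, 1].

    Below `low`: no change. Between `low` and `cross`: weakening, deepest
    at `dip`. Above `cross`: strengthening, growing up to activity 1.
    All breakpoints are hypothetical.
    """
    if activity <= low:
        return 0.0
    if activity <= dip:
        return -max_weaken * (activity - low) / (dip - low)
    if activity <= cross:
        return -max_weaken * (cross - activity) / (cross - dip)
    return max_strengthen * (activity - cross) / (1.0 - cross)


inactive = nmph_weight_change(0.1)   # inactive memory: not modified
moderate = nmph_weight_change(0.4)   # moderately active: weakened
strong = nmph_weight_change(0.9)     # highly active: strengthened
```

      In this picture, limiting how active a competitor can become keeps it to the left of the crossover point, which favors weakening (differentiation) over strengthening (integration).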

      (3) For the modeling of the Favila et al. study, the authors state that a high learning rate is required for differentiation of the same-face pairs. This made me wonder what happens in the low learning rate simulations. Does integration occur?

      For the same-face condition of the Favila simulation, lowering the learning rate does not result in an overall integration effect:

      Author response image 1.

      In other cases, we do see integration emerge at lower learning rates – e.g., in the Schlichting interleaved condition we see a small integration effect emerge for a learning rate value of 0.3:

      Author response image 2.

      Our view is that, while integration can emerge at low learning rates, it is not a reliable property of the model – in some cases, there is a “window” of learning rates where there is enough learning to drive integration but not enough to drive differentiation, and in other cases there is not. Given this lack of reliability across simulations, we would prefer not to discuss this in the paper.

      This paradigm has a lot of overlap with acquired equivalence, and so I am thinking about whether these are the sorts of small differences (e.g., same-category scenes and perhaps a high learning rate) that bias the system to differentiate instead of integrate.

      We agree that it would be very interesting to use the model to explore acquired equivalence and related phenomena, but we think it is out of scope of the current paper. We have added some text to the Discussion under “Limitations and Open Questions” (p. 32):

      “Another important future direction is to apply the model to a wider range of learning phenomena involving representational change – for example, acquired equivalence, which (like some of the studies modeled here) involves linking distinct stimuli to a shared associate (see, e.g., Honey and Hall, 1989; Shohamy and Wagner, 2008; Myers et al., 2003; Meeter et al., 2009; de Araujo Sanchez and Zeithamova, 2023). It is possible that some of these phenomena might be better explained by supervised learning, or a mixture of unsupervised and supervised learning, than by unsupervised learning alone.”

      (4) For the simulations of the Schlichting et al. study, the A and B appear to have overlap in the hidden layer based on Figure 9, despite there being no similarity between the A and B items in the study (in contrast to Favila et al., in which they were similar kinds of scenes, and Chanales et al., in which they were similar colors). Why was this decision made? Do the effects depend on some overlap within the hidden layer? (This doesn't seem to be explained in the paper that I saw though, so maybe it's just a visualization error?)

      Overlap in the pretrained hidden representations of A and B is not strictly necessary for these effects – it would be possible to reconfigure other parameters to get high levels of competition even if there were no overlap (e.g., by upregulating the strengths of connections from shared input features). Having said that, it is definitely true that overlap between the pretrained hidden representations boosts competition, and we think it is justified to posit this in the Schlichting simulation. We have now added an explanation for this in the paper (p. 23):

      New text in the Schlichting simulation, under “Knowledge Built into the Network”:

      “Matching the previous two simulations, we pretrained the weights so the hidden representations of the stimuli initially had 2/6 units in common. Even though the A and B stimuli used in the actual experiment did not have obvious feature overlap (they were randomly selected novel objects), it is important to note that the hidden layer is not simply a representation of the sensory features of the A and B stimuli; the hidden layer also receives input from the output layer, which represents the shared associate of A and B (X). We think that the presence of this shared associate justifies our use of initially-overlapping hidden representations.”

      (5) It seems as though there were no conditions under which the simulations produced differentiation in both the blocked and intermixed conditions, which Schlichting et al. observed in many regions (as the present authors note). Is there any way to reconcile this difference?

      We thank the reviewer for bringing this up. If we set the connection strength between X (in the output layer) and A (in the hidden layer) in the blocked condition to .9 instead of .999 (keeping this connection strength at .8 for the interleaved condition) and we set Osc to .0615, we observe differentiation in both conditions.

      Rather than replacing the original results in the paper, which would entail re-making the associated videos, etc., we have added a supplementary figure (Figure 10 - Supplement 1), which is included on p. 46.

      We also added the following to the Results section of the Schlichting simulation in the main text (p. 26):

      “Figure 10 - Supplement 1 shows results from an alternative parameterization where, in the low-oscillation-amplitude condition, differentiation is observed in both the blocked and interleaved conditions (mirroring results from Schlichting et al., 2015, who found differentiation in both conditions in several regions of interest, including parts of the hippocampus and medial prefrontal cortex).”

      (6) A general question about differentiation/repulsion and how it affects the hidden layer representation in the model: Is it the case that the representation is actually "shifted" or repelled over so it is no longer overlapping? Or do the shared connections just get pruned, such that the item that has more "movement" in representational space is represented by fewer units on the hidden layer (i.e., is reduced in size)? I think, if I understand correctly, that whether it gets shifted vs. reduced would depend on the strength of connections along the hidden layer, which would in turn depend on whether it represents some meaningful continuous dimension (like color) or not. But, if the connections within the hidden layer are relatively weak and it is the case that representations become reduced in size, would there be any anticipated consequences of this (e.g., cognitively/behaviorally)?

      The representations are shifted – this is discussed in the Chanales results section:

      “Because the activity “set point” for the hidden layer (determined by the kWTA algorithm) involves having 6 units active, and the unique parts of the competitor only take up 4 of these 6 units, this leaves room for activity to spread to additional units. Given the topographic projections in the output layer, the model is biased to “pick up” units that are adjacent in color space to the currently active units; because activity cannot flow easily from the competitor back to the target (as a result of the aforementioned severing of connections), it flows instead *away* from the target, activating two additional units, which are then incorporated into the competitor representation. This sequence of events (first a severing of the shared units, then a shift away from the target) completes the process of neural differentiation, and is what leads to the behavioral repulsion effect in color recall (because the center-of-mass of the color representation has now shifted away from the target).”
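
      The “set point” idea in the passage above can be illustrated with a minimal sketch. This is not the model's actual kWTA implementation (which, as far as we describe it here, sets a layer-wide inhibition level); it is a simplified top-k rule, and the function name, the 10-unit layer, and the drive values are all hypothetical. It shows that once the two formerly shared units lose their drive, the 6-unit set point is filled by other units.

```python
import numpy as np

def k_winners_take_all(net_input, k=6):
    """Simplified kWTA (assumption): keep the k most strongly driven units, silence the rest."""
    net_input = np.asarray(net_input, dtype=float)
    # Indices of the k largest inputs; the order among the winners does not matter.
    winners = np.argpartition(net_input, -k)[-k:]
    activity = np.zeros_like(net_input)
    activity[winners] = net_input[winners]
    return activity

# Hypothetical drive to 10 hidden units:
#   units 0-1: formerly shared units whose connections were severed (weak drive)
#   units 2-5: the 4 units unique to the competitor (strong drive)
#   units 6-9: neighboring units with graded drive (e.g., via topographic projections)
drive = np.array([0.05, 0.05, 0.9, 0.9, 0.9, 0.9, 0.6, 0.5, 0.2, 0.1])

active = k_winners_take_all(drive, k=6)
active_units = sorted(np.flatnonzero(active).tolist())
print(active_units)  # the 4 unique units plus the 2 most strongly driven neighbors
```

      In this toy example the two units that join the representation are simply the ones with the next-highest drive; in the model, which units are picked up depends on the topographic projections described above.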

      Reviewer #2 (Public Review):

      This paper addresses an important computational problem in learning and memory. Why do related memory representations sometimes become more similar to each other (integration) and sometimes more distinct (differentiation)? Classic supervised learning models predict that shared associations should cause memories to integrate, but these models have recently been challenged by empirical data showing that shared associations can sometimes cause differentiation. The authors have previously proposed that unsupervised learning may account for these unintuitive data. Here, they follow up on this idea by actually implementing an unsupervised neural network model that updates the connections between memories based on the amount of coactivity between them. The goal of the authors' paper is to assess whether such a model can account for recent empirical data at odds with supervised learning accounts. For each empirical finding they wish to explain, the authors built a neural network model with a very simple architecture (two input layers, one hidden layer, and one output layer) and with prewired stimulus representations and associations. On each trial, a stimulus is presented to the model, and inhibitory oscillations allow competing memories to pop up. Pre-specified u-shaped learning rules are used to update the weights in the model, such that low coactivity leaves model connections unchanged, moderate coactivity weakens connections, and high coactivity strengthens connections. In each of the three models, the authors manipulate stimulus similarity (following Chanales et al), shared vs distinct associations (following Favila et al), or learning strength (a stand-in for blocked versus interleaved learning schedule; following Schlichting et al) and evaluate how the model representations evolve over trials.

      As a proof of principle, the authors succeed in demonstrating that unsupervised learning with a simple u-shaped rule can produce qualitative results in line with the empirical reports. For instance, they show that pairing two stimuli with a common associate (as in Favila et al) can lead to *differentiation* of the model representations. Demonstrating these effects isn't trivial and a formal modeling framework for doing so is a valuable contribution. Overall, the authors do a good job of both formally describing their model and giving readers a high level sense of how their critical model components work, though there are some places where the robustness of the model to different parameter choices is unclear. In some cases, the authors are very clear about this (e.g. the fast learning rate required to observe differentiation). However, in other instances, the paper would be strengthened by a clearer reporting of the critical parameter ranges.

      We thank the reviewer for raising this point. The interdependence of parameters in our model makes it infeasible to identify critical parameter ranges. We have added a paragraph to the “Approach to Parameterization and Data Fitting” section in the Methods to address this point (p. 33):

      “The overall goal of this modeling work is to account for key empirical regularities regarding differentiation and integration and to establish boundary conditions on these regularities. As such, the modeling work described below focuses more on qualitative fits to general properties of the data space than on quantitative fits to results from specific studies. Automatic parameter optimization is not feasible for this kind of model, given the large number of model parameters and the highly interactive, nonlinear nature of competitive dynamics in the model; consequently, model fitting was done by hand.

      These complex interactions between parameters also make it infeasible to list “critical parameter ranges” for generating particular model outcomes. Our experience in working with the model has been that activation dynamics are what matter most for learning, and that disparate parameter sets can give rise to the same activation dynamics and -- through this -- the same learning effects; likewise, similar parameter sets can give rise to different activation dynamics and different learning outcomes. Consequently, in this paper we have focused on characterizing the dynamics that give rise to different learning effects (and how they can be affected by local parameter perturbations, e.g., relating to learning rate and oscillation size), rather than the – impossible, we believe – task of enumerating the full set of parameter configurations that give rise to a particular result.”

      For instance, it's clear from the manipulation of oscillation strength in the model of Schlichting et al that this parameter can dramatically change the direction of the results. The authors do report the oscillation strength parameter values that they used in the other two models, but it is not clear how sensitive these models are to small changes in this value.

      In some cases, the effects of oscillation strength are relatively smooth. For example, in the Favila simulation, increasing the oscillation amplitude Osc effectively recapitulates the U-shaped curve (i.e., higher levels of Osc lead to more competitor activation, which initially leads to weakening / differentiation but then gives way to strengthening / integration), as is shown for the Favila Different Face condition in this plot:

      Author response image 3.

      In the Chanales 2/6 overlap condition, the effects of varying Osc are more nonlinear:

      Author response image 4.

      We think this is attributable to the increased “all-or-none” recurrent dynamics in this simulation (due to the recurrent projections within the output layer), which make it more difficult to evoke moderate (vs. high) levels of activation. This difficulty in reliably obtaining graded activation dynamics is likely a consequence of the small-scale (“toy”) nature of the model and the simple inhibitory mechanisms employed here, as opposed to being a generalizable property of the brain – presumably, the actual brain employs more nuanced and effective means of controlling activation. Furthermore, we don’t think that the high prevalence of integration in the model’s parameter space necessarily translates into a prediction that integration should be more prevalent overall – see the new “Reconciling the Prevalence of Differentiation in the Model and in the Data” section described in response to one of the reviewer’s other points below. Due to the paper already being quite long, we have opted not to include the above plots / discussion in the paper.

      Similarly, it's not clear whether the 2/6 hidden layer overlap (only explicitly manipulated in the model of Chanales et al) is required for the other two models to work.

      When we were parameterizing the model, we opted to keep the 2/6 level of overlap for all of the simulations and we adjusted other parameters to fit the data; in part, this was because overlap can only be adjusted in discrete jumps, whereas other influential parameters in the model can be adjusted in a more graded, real-valued way. Our use of 2/6 overlap (as opposed to, say, 1/6 or 3/6 overlap) for the Favila and Schlichting models was done out of convenience, and should not be interpreted as a strong statement that this particular level of overlap is necessary for obtaining differentiation; we could easily get the model to show differentiation given other overlap levels by adjusting other parameters.

      Finally, though the u-shaped learning rule is essential to this framework, the paper does little formal investigation of this learning rule. It seems obvious that allowing the u-shape to collapse too much toward a horizontal line would reduce the model's ability to account for empirical results, but there may be other more interesting features of the learning rule parameterization that are essential for the model to function properly.

      Given that the paper is already quite long, we have opted not to include further exploration of the parameters of the U-shaped learning rule in the paper. However, for the reviewer’s information, we report the effects of a few illustrative manipulations of these parameters below. As a general principle, the effects of these manipulations make sense in light of the theoretical framework described in the paper.

      For example, the parameter “DRevMag” controls the size of the negative “dip” in the U-shaped curve (more negative values = a larger dip). Given that this negative dip is essential for severing weights to competitors and causing differentiation, shifting DRevMag upwards towards zero should shift the balance of the model away from differentiation and towards integration. This is indeed what we observe, as shown in this parameter sweep from the Chanales simulation:

      Author response image 5.

      As another example: The “DRev” parameter controls where the U-shaped curve transitions from negative weight change to positive weight change. Lower values of DRev mean that the region of coactivity values leading to negative weight change will be smaller, and the region of coactivity values leading to positive weight change will be larger. As such, we would expect that lower values of DRev would bias the model toward integration. That is indeed the case, as shown in this parameter sweep from the Schlichting Blocked simulation:

      Author response image 6.
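
      The qualitative effects of these two parameters can be sketched with a toy piecewise-linear U-shaped function. This is an illustration under stated assumptions, not the function used in the simulations: only `d_rev` and `d_rev_mag` correspond to parameter names mentioned above (DRev, DRevMag); `d_thr`, `d_max_mag`, the placement of the dip midway between `d_thr` and `d_rev`, and all default values are hypothetical.

```python
import numpy as np

def u_shaped_dw(coactivity, d_thr=0.2, d_rev=0.5, d_rev_mag=-0.3, d_max_mag=0.3):
    """Toy U-shaped weight change as a function of coactivity in [0, 1].

    Below d_thr: no change. Between d_thr and d_rev: weakening, with the
    deepest point (d_rev_mag) assumed to lie midway. Above d_rev: strengthening.
    """
    xs = [0.0, d_thr, (d_thr + d_rev) / 2.0, d_rev, 1.0]
    ys = [0.0, 0.0, d_rev_mag, 0.0, d_max_mag]
    return np.interp(coactivity, xs, ys)

def fraction_negative(dw):
    """Fraction of sampled coactivity values that lead to weakening."""
    return float(np.mean(dw < 0))

grid = np.linspace(0.0, 1.0, 101)
baseline = u_shaped_dw(grid)
shallow_dip = u_shaped_dw(grid, d_rev_mag=-0.1)   # DRevMag shifted up toward zero
early_reversal = u_shaped_dw(grid, d_rev=0.4)     # lower DRev

print("deepest weakening:", baseline.min(), "->", shallow_dip.min())
print("fraction weakening:", fraction_negative(baseline), "->", fraction_negative(early_reversal))
```

      In this sketch, raising `d_rev_mag` toward zero shrinks the weakening that any coactivity value can produce, and lowering `d_rev` narrows the range of coactivity values that produce weakening; both changes would be expected to favor integration over differentiation, in line with the sweeps described above.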

      There are a few other points that may limit the model's ability to clearly map onto or make predictions about empirical data. The model(s) seem very keen to integrate, and to do so more completely than the available empirical data suggest. For instance, there is a complete collapse of representations in half of the simulations in the Chanales et al model, and the blocked simulation in the Schlichting et al model also seems to produce nearly complete integration. Even if the Chanales et al paper had observed some modest behavioral attraction effects, this model would seem to over-predict integration. The authors somewhat implicitly acknowledge this when they discuss the difficulty of producing differentiation ("Practical Advice for Getting the Model to Show Differentiation") and not of producing integration, but don't address it head on.

      We thank the reviewer for this comment – R1 had a similar comment. We have added a new section to the Discussion to address this point (p. 30):

      “Reconciling the Prevalence of Differentiation in the Model and in the Data.

      A key lesson from our model is that, from a computational perspective, it is challenging to obtain differentiation effects: The region of parameter space that gives rise to differentiation is much smaller than the one that gives rise to integration (for further discussion of this issue, see the section in Methods on Practical Advice for Getting the Model to Show Differentiation). However, the fact that integration is more prevalent in our simulations across parameter configurations does not mean that integration will be more prevalent than differentiation in real-life circumstances. What really matters in predicting the prevalence of differentiation in real life is how the parameters of the brain map on to parameters of the model: If the parameters of the brain align with regions of model parameter space that give rise to differentiation (even if these regions are small), this would explain why differentiation has been so robustly observed in extant studies. Indeed, this is exactly the case that we sought to make above about the hippocampus – i.e., that its use of especially sparse coding and a high learning rate will give rise to the kinds of neural dynamics that cause differentiation (as opposed to integration). As another example, while it is true that half of the overlap conditions in our simulation of Chanales et al. (2021) give rise to integration, this does not imply that integration will occur half of the time in the Chanales et al. (2021) study; it may be that the levels of overlap that are actually observed in the brain in Chanales et al. (2021) are more in line with the levels of overlap that give rise to differentiation in our model.”

      Second, the authors' choice of strongly prewiring associations in the Chanales and Favila models makes it difficult to think about how their model maps onto experimental contexts where competition is presumably occurring while associations are only weakly learned. In the Chanales et al paper, for example, the object-face associations are not well learned in initial rounds of the color memory test. While the authors do justify their modeling choice and their reasons have merit, the manipulation of AX association strength in the Schlichting et al model also makes it clear that the association strength has a substantial effect on the model output. Given the effect of this manipulation, more clarity around this assumption for the other two models is needed.

      We thank the reviewer for bringing this up. We have edited the section entitled “A Note on Prewiring Representations” in the Methods to further justify our choice to prewire associations in the Chanales and Favila models (p. 37):

      “In our model, our practice of “prewiring” memory representations for the A and B pairmates serves two functions. In some cases, it is meant to stand in for actual training (as in the blocked / interleaved manipulation; the connections supporting the AX association are prewired to be stronger in the blocked condition than in the interleaved condition). However, the other, more fundamental role of prewiring is to ensure that the A and B input patterns evoke sparse distributed representations in the hidden layer (i.e., where some units are strongly active but most other units are inactive). In the real brain, this happens automatically because the weight landscape has been extensively sculpted by both experience and evolution. For example, in the real hippocampus, when the second pairmate is presented for the first time, it will evoke a sparse distributed representation in the CA3 subfield (potentially overlapping with the first pairmate’s CA3 representation) even before any learning of the second pairmate has occurred, due to the strong, sparse mossy fiber projections that connect the dentate gyrus to CA3 (McNaughton & Morris, 1987). As discussed above, we hypothesize that this initial, partial overlap between the second pairmate’s representation and the first pairmate’s representation can lead to pop-up of the unique features of the first pairmate’s representation, triggering learning that leads to differentiation or integration. In our small-scale model, we are effectively starting with a “blank brain”; in the absence of prewiring, the A and B inputs would activate overly diffuse representations that do not support these kinds of competitive dynamics. As such, prewiring in our model is necessary for proper functioning. The presence of prewired A and B representations should therefore not be interpreted as reflecting a particular training history (except in the blocked / interleaved case above); rather, these prewired representations constitute the minimum step we would take to ensure well-defined competitive dynamics in our small-scale model.

      The fact that connection strengths serve this dual function – sometimes reflecting effects of training (as in our simulation of Schlichting et al., 2015) and in other cases reflecting necessary prewiring – complicates the interpretation of these strength values in the model. Our view is that this is a necessary limitation of our simplified modeling approach – one that can eventually be surmounted through the use of more biologically-detailed architectures (see Limitations and Open Questions in the Discussion).”

      Overall, this is strong and clearly described work that is likely to have a positive impact on computational and empirical work in learning and memory. While the authors have written about some of the ideas discussed in this paper previously, a fully implemented and openly available model is a clear advance that will benefit the field. It is not easy to translate a high-level description of a learning rule into a model that actually runs and behaves as expected. The fact that the authors have made all their code available makes it likely that other researchers will extend the model in numerous interesting ways, many of which the authors have discussed and highlighted in their paper.

      Reviewer #3 (Public Review):

      This paper proposes a computational account for the phenomenon of pattern differentiation (i.e., items having distinct neural representations when they are similar). The computational model relies on a learning mechanism of the nonmonotonic plasticity hypothesis, fast learning rate and inhibitory oscillations. The relatively simple architecture of the model makes its dynamics accessible to the human mind. Furthermore, using similar model parameters, this model produces simulated data consistent with empirical data of pattern differentiation. The authors also provide insightful discussion on the factors contributing to differentiation as opposed to integration. The authors may consider the following to further strengthen this paper:

      The model compares different levels of overlap at the hidden layer and reveals that partial overlap seems necessary to lead to differentiation. While I understand this approach from the perspective of modeling, I have concerns about whether this is how the human brain achieves differentiation. Specifically, if we view the hidden layer activation as a conjunctive representation of a pair that is the outcome of encoding, differentiation should precede the formation of the hidden layer activation pattern of the second pairmate. Instead, the model assumes such pattern already exists before differentiation. Maybe the authors indeed argue that mechanistically differentiation follows initial encoding that does not consider similarity with other memory traces?

      Related to the point above, because the simulation setup is different from how differentiation actually occurs, I wonder how valid the prediction of asymmetric reconfiguration of hidden layer connectivity pattern is.

      We thank the reviewer for this comment. In the revised manuscript, we have edited the “Note on Prewiring Representations” in the Methods to clarify how our assumptions about prewiring relate to what we really think is happening in the brain (p. 37):

      “In our model, our practice of “prewiring” memory representations for the A and B pairmates serves two functions. In some cases, it is meant to stand in for actual training (as in the blocked / interleaved manipulation; the connections supporting the AX association are prewired to be stronger in the blocked condition than in the interleaved condition). However, the other, more fundamental role of prewiring is to ensure that the A and B input patterns evoke sparse distributed representations in the hidden layer (i.e., where some units are strongly active but most other units are inactive). In the real brain, this happens automatically because the weight landscape has been extensively sculpted by both experience and evolution. For example, in the real hippocampus, when the second pairmate is presented for the first time, it will evoke a sparse distributed representation in the CA3 subfield (potentially overlapping with the first pairmate’s CA3 representation) even before any learning of the second pairmate has occurred, due to the strong, sparse mossy fiber projections that connect the dentate gyrus to CA3 (McNaughton & Morris, 1987). As discussed above, we hypothesize that this initial, partial overlap between the second pairmate’s representation and the first pairmate’s representation can lead to pop-up of the unique features of the first pairmate’s representation, triggering learning that leads to differentiation or integration. In our small-scale model, we are effectively starting with a “blank brain”; in the absence of prewiring, the A and B inputs would activate overly diffuse representations that do not support these kinds of competitive dynamics. As such, prewiring in our model is necessary for proper functioning. The presence of prewired A and B representations should therefore not be interpreted as reflecting a particular training history (except in the blocked / interleaved case above); rather, these prewired representations constitute the minimum step we would take to ensure well-defined competitive dynamics in our small-scale model.

      The fact that connection strengths serve this dual function – sometimes reflecting effects of training (as in our simulation of Schlichting et al., 2015) and in other cases reflecting necessary prewiring – complicates the interpretation of these strength values in the model. Our view is that this is a necessary limitation of our simplified modeling approach – one that can eventually be surmounted through the use of more biologically-detailed architectures (see Limitations and Open Questions in the Discussion).”

      Although, as the authors mentioned, there haven't been formal empirical tests of the relationship between learning speed and differentiation/integration, I am also wondering to what degree the prediction of fast learning being necessary for differentiation is consistent with current data. According to Figure 6, the learning rates that lead to differentiation in the 2/6 condition achieved differentiation after just one shot most of the time. On the other hand, Guo et al (2021) showed that humans may need a few blocks of training and test to start showing differentiation.

      We thank the reviewer for mentioning this. We have added a paragraph to the “Differentiation Requires a High Learning Rate and Is Sensitive to Activity Dynamics” section of the Discussion that addresses this point (pp. 28-29):

      “Although the results from Wanjia et al. (2021) provide strong support for the model's prediction that differentiation will be abrupt, they raise another question: What explains variance across items in when this abrupt change takes place? The answer to this question remains to be seen, but one possibility is encoding variability: If we assume that participants stochastically sample (i.e., attend to) the features of the scene pairmates, it is possible that participants might initially fail to sample the features that distinguish the scene pairmates, which can be quite subtle – and if the distinguishing features of the pairmates are not represented in high-level visual regions (i.e., the pairmates are represented in these regions as having the same features), this could delay the onset of differentiation until the point at which the distinguishing features happen (by chance) to be sampled.”

      Related to the point above, the high learning rate prediction also seems to be at odds with the finding that the cortex, which has slow learning (according to the theory of complementary learning systems), also shows differentiation in Wammes et al (2022).

      We now address this point in the section of the Discussion entitled “Differentiation Requires a High Learning Rate and Is Sensitive to Activity Dynamics” (p. 27):

      “Our finding that differentiation requires a high learning rate suggests that differentiation will be more evident in the hippocampus than in neocortex, insofar as hippocampus is thought to have a higher learning rate than neocortex (McClelland et al., 1995). In keeping with this prediction, numerous studies have found differentiation effects in hippocampus but not in neocortical regions involved in sensory processing (e.g., Chanales et al., 2017; Favila et al., 2016; Zeithamova et al., 2018). At the same time, some studies have found differentiation effects in neocortex (e.g., Schlichting et al., 2015; Wammes et al., 2022). One possible explanation of these neocortical differentiation effects is that they are being “propped up” by top-down feedback from differentiated representations in the hippocampus.”

      More details about the learning dynamics would be helpful. For example, equation(s) showing how activation, learning rate and the NMPH function work together to change the weight of connections may be added. Without the information, it is unclear how each connection changes its value after each time point.

      We thank the reviewer for this comment. We have made two major changes to address this concern. First, we have edited the “Learning” section within “Basic Network Properties” in the main text (pp. 6-7):

      “Connection strengths in the model between pairs of connected units x and y were adjusted at the end of each trial (i.e., after each stimulus presentation) as a U-shaped function of the coactivity of x and y, defined as the product of their activations on that trial. The parameters of the U-shaped learning function relating coactivity to change in connection strength (i.e., weakening / strengthening) were specified differently for each projection where learning occurs (bidirectionally between the input and hidden layers, the hidden layer to itself, and the hidden to output layer). Once the U-shaped learning function for each projection in each version of the model was specified, we did not change it for any of the various conditions. Details of how we computed coactivity and how we specified the U-shaped function can be found in the Methods section.”
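
      As a rough sketch of the verbal description above (and not a substitute for the equations in the Methods), an end-of-trial update for a single projection could look as follows. The breakpoints of the toy U-shaped function, the learning rate, the weight bounds, and the function names are all hypothetical values chosen for illustration.

```python
import numpy as np

def toy_u_shape(coactivity):
    """Toy U-shaped function: no change, then weakening, then strengthening."""
    breakpoints = [0.0, 0.2, 0.35, 0.5, 1.0]   # hypothetical coactivity breakpoints
    changes = [0.0, 0.0, -0.3, 0.0, 0.3]       # hypothetical weight changes at each breakpoint
    return np.interp(coactivity, breakpoints, changes)

def end_of_trial_update(weights, sending_act, receiving_act,
                        lrate=1.0, w_min=0.0, w_max=1.0):
    """Adjust one projection's weights once, after a stimulus presentation."""
    # Coactivity of each connected pair (x, y): product of their activations on this trial.
    coactivity = np.outer(sending_act, receiving_act)
    new_weights = weights + lrate * toy_u_shape(coactivity)
    # Keep weights within assumed lower and upper limits.
    return np.clip(new_weights, w_min, w_max)

# Three sending units with high, moderate, and low activity; two fully active receiving units.
weights = np.full((3, 2), 0.5)
sending_act = np.array([1.0, 0.35, 0.1])
receiving_act = np.array([1.0, 1.0])

updated = end_of_trial_update(weights, sending_act, receiving_act)
print(updated)
```

      With these toy values, connections from the highly active sending unit are strengthened, connections from the moderately active unit are weakened, and connections from the weakly active unit are left unchanged.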

      Second, we have added the requested equations to the “Learning” part of the Methods (pp. 37-38):

      On the right side of the function, strong activation leads to strengthening of the connectivity, which I assume will lead to stronger activation on the next time point. The model has an upper limit of connection strength to prevent connections from strengthening too much. The same idea can be applied to the left side of the function: instead of having two turning points, it can be a linear function such that low activation keeps weakening the connection until the lower limit is reached. This way the NMPH function can take a simpler form (e.g., two line-segments if you think the weakening and strengthening take different rates) and may still simulate the data.

      We thank the reviewer for mentioning this. We have added a new paragraph in the “Learning” section of the Methods to justify the particular shape of the learning curve (pp. 38-39):

      “Evidence for the U-shaped plasticity function used here (where low activation leads to no change, moderate activation leads to weakening, and higher levels of activation lead to strengthening) was previously reviewed in Ritvo et al. (2019). In brief, there are three lines of work that support the U shape: First, multiple neurophysiological studies have found that moderate postsynaptic depolarization leads to synaptic weakening and higher levels of depolarization lead to synaptic strengthening (e.g., Artola et al., 1990; Hansel et al., 1996). Second, human neuroscience studies have used pattern classifiers, applied to fMRI and EEG data, to measure memory activation, and have related this measure to subsequent memory accessibility; several studies using this approach have found that low levels of activation lead to no change in memory strength, moderate levels of activation lead to impaired subsequent memory, and higher levels of activation lead to increased subsequent memory (e.g., Newman and Norman, 2010; Detre et al., 2013; Kim et al., 2014; for related findings, see Lewis-Peacock and Norman, 2014; Wang et al., 2019). Third, a recent human fMRI study by Wammes et al. (2022) manipulated memory activation by varying the visual similarity of pairmates and observed a U-shaped function relating visual similarity to representational change in the hippocampus, whereby low levels of pairmate similarity were associated with no change, moderate levels of similarity were associated with differentiation, and the differentiation effect went away at higher levels of similarity.”

      We have also included a pointer to this new paragraph in the “Nonmonotonic Plasticity Hypothesis” section of Introduction (p. 2):

      “(for further discussion of the empirical justification for the NMPH, see the Learning subsection in the Methods)”

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      A few additional minor things about data presentation and the like:

      (1) Figure 1 legend - a more general description of how to interpret the figure might be helpful for more naive readers (e.g., explaining how one can visualize in the schematic that there is overlap in the hidden layer between A and B). Also, from the Figure 1 depiction, it's not clear what is different about the setup from the initial left hand side panels in A, B, C, to make it such that activity spreads strongly to A in panel A, weakly in panel B, and not at all in panel C since the weights are the same. Is there a way to incorporate this into the graphic, or describe it in words?

      To address this point, we have added the following text to the Figure 1 caption (p. 3):

      “Note that the figure illustrates the consequences of differences in competitor activation for learning, without explaining why these differences would arise. For discussion of circumstances that could lead to varying levels of competitor activation, see the simulations described in the text.”

      (2) I believe not all of the papers cited on lines 193-195 actually have similarity manipulations in them. I'd recommend double checking this list and removing those less relevant to the statement.

      Thank you for pointing this out; we have removed the Ballard reference and we have clarified what we mean by similarity reversal (p. 7):

      “The study was inspired by recent neuroimaging studies showing "similarity reversals", wherein stimuli that have more features in common (or share a common associate) show less hippocampal pattern similarity (Favila et al., 2016; Schlichting et al., 2015; Molitor et al., 2021; Chanales et al., 2017; Dimsdale-Zucker et al., 2018; Wanjia et al., 2021; Zeithamova et al., 2018; Jiang et al., 2020; Wammes et al., 2022).”

      (3) I wanted a bit more detail about how the parameters were set in the main paper, not just in the methods. Even something as brief as noting that model fitting was done by hand by tweaking parameters to re-create the empirical patterns (if I'm understanding correctly) would have been helpful for me.

      To address this point, we have added the following text under “Basic Network Properties” (p. 4):

      “Our goal was to qualitatively fit key patterns of results from each of the aforementioned studies. We fit the parameters of the model by hand as they are highly interdependent (see the Methods section for more details).”

      (4) In Figure 4E, it would be helpful to describe the x and y axes of the MDS plots in the legend.

      To address this point, we have added the following new text to the Figure 4 caption that clarifies how the MDS plots were generated (p. 11):

      “MDS plots were rotated, shifted, and scaled such that pairmate 1before is located at (0,0), pairmate 2before is located directly to the right of pairmate 1before, and the distance between pairmate 1before and pairmate 2before is proportional to the baseline distance between the pairmates.”
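      The alignment described in this caption can be sketched as a shift, a rotation, and a uniform scaling of the 2-D MDS coordinates. This is a minimal illustration only, not the authors' plotting code; the function name `align_mds` and its parameters are hypothetical, and `target_distance` stands in for whatever value proportional to the baseline distance is used.

```python
import math


def align_mds(points, anchor_idx, right_idx, target_distance):
    """Shift, rotate, and scale 2-D coordinates (hypothetical helper).

    After the transform, points[anchor_idx] lies at (0, 0) and
    points[right_idx] lies at (target_distance, 0), i.e. directly
    to the right of the anchor.
    """
    ax, ay = points[anchor_idx]
    # Shift: move the anchor point to the origin.
    shifted = [(x - ax, y - ay) for x, y in points]

    rx, ry = shifted[right_idx]
    dist = math.hypot(rx, ry)
    if dist == 0:
        raise ValueError("anchor and reference points coincide")

    # Rotate: undo the reference point's angle so it lands on the +x axis.
    theta = -math.atan2(ry, rx)
    cos_t, sin_t = math.cos(theta), math.sin(theta)

    # Scale: make the anchor-to-reference distance equal target_distance.
    scale = target_distance / dist
    return [
        (scale * (x * cos_t - y * sin_t), scale * (x * sin_t + y * cos_t))
        for x, y in shifted
    ]


# Toy example: the first point plays the role of pairmate 1 (before),
# the second the role of pairmate 2 (before).
aligned = align_mds([(1.0, 1.0), (1.0, 3.0), (3.0, 1.0)], 0, 1, 2.0)
```

      Because only a shift, rotation, and uniform scaling are applied, the relative distances between points are preserved up to a single constant factor.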

      (5) Figure 6 - at first I thought the thicker line was some sort of baseline, but I think it is just many traces on top of one another. If other readers may be similarly confused, perhaps this could be stated.

      Thanks for this comment. We have updated Figure 6 (p. 16).

      We have also updated the caption.

      (6) I am having a lot of difficulty understanding the terms "competitor-to-competitor," "competitor-to-target/shared," and "target/shared-to-target/shared," and therefore I don't fully get Figure 5. I think it might be helpful to expand the description of these terms where they are first introduced in the paper (p. 13?). I think I am missing something crucial here, and I am not quite sure what that is, which I know is not very helpful! But, to narrate my confusion a bit, I thought that these terms would somehow relate to connections between different connections of the network. For example is competitor-to-competitor within the hidden layer? Or is this somehow combining across relevant connections that might span different pairs of layers in the model? And, I really have no idea why it is "target/shared."

      Thank you for these comments. We have updated Figure 5 and we have also made several changes to the main text and the figure caption to address these points.

      Changes to the main text (p. 13):

      “Whether symmetric or asymmetric integration occurs depends on the relative strengths of connections between pairs of unique competitor units (competitor-competitor connections) compared to connections between unique competitor units and shared units (competitor-shared connections) after the first trial (Figure 5; note that the figure focuses on connections between hidden units, but the principle also applies to connections that span across layers). Generally, coactivity between unique competitor units (competitor-competitor coactivity) is less than coactivity between unique competitor units and shared units (competitor-shared coactivity), which is less than coactivity between unique target units and shared units (target-shared coactivity).”

      (7) Relatedly in Figure 13, I understand how some competitor-to-target/shared connections could be spared in the bottom instance given panel B. However, I'm struggling to understand how that relates to the values in the corresponding chart in panel A. What about panel A, bottom (vs. the top) means lower coactivities between some competitor-to-target/shared? Is it because if the noise level is higher, the "true" activation of competitor-to-target/shared connections is weaker? I think again, I'm missing something critical here! and wonder if other readers may be in the same situation. (I know the authors described this also on p. 36, but I'm still confused!)

      We have updated Figure 13 to clarify these points.

      (8)  In Figure 9, I believe there is no caption for panel D. Also, it looks as though the item unit active for A and B is the same. I wonder if this is an error?

      Thank you for catching these errors! They have both been fixed.

      Reviewer #2 (Recommendations For The Authors):

      -Perhaps I missed it, but I think defining coactivity (how it is computed) in the main text would be useful for readers, as this is critical for understanding the model. I did find it in the methods.

      We thank the reviewer for this suggestion. We have updated the “Learning” section within “Basic Network Properties” in the main text to address this point (pp. 6-7):

      “Connection strengths in the model between pairs of connected units x and y were adjusted at the end of each trial (i.e., after each stimulus presentation) as a U-shaped function of the coactivity of x and y, defined as the product of their activations on that trial. The parameters of the U-shaped learning function relating coactivity to change in connection strength (i.e., weakening / strengthening) were specified differently for each projection where learning occurs (bidirectionally between the input and hidden layers, the hidden layer to itself, and the hidden to output layer). Once the U-shaped learning function for each projection in each version of the model was specified, we did not change it for any of the various conditions. Details of how we computed coactivity and how we specified the U-shaped function can be found in the Methods section.”
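      To make the definition above concrete, the following sketch computes coactivity as the product of two activations and maps it through a piecewise-linear U-shaped function (no change at low coactivity, weakening at moderate coactivity, strengthening at high coactivity). It is an illustration under assumed values, not the model's implementation: the breakpoints and magnitudes (`dip_start`, `dip_peak`, `reversal`, `max_weaken`, `max_strengthen`) are hypothetical, whereas the paper specifies the function separately for each projection in its Methods section.

```python
def coactivity(act_x, act_y):
    """Coactivity of two connected units: the product of their activations."""
    return act_x * act_y


def u_shaped_delta(coact, dip_start=0.2, dip_peak=0.4, reversal=0.6,
                   max_weaken=-0.1, max_strengthen=0.1):
    """Piecewise-linear U-shaped learning function (all parameters hypothetical).

    coact <= dip_start            -> no change
    dip_start < coact <= reversal -> weakening, maximal at dip_peak
    coact > reversal              -> strengthening, growing up to coact = 1
    """
    if coact <= dip_start:
        return 0.0
    if coact <= dip_peak:
        # Descend linearly from 0 to the maximal weakening.
        frac = (coact - dip_start) / (dip_peak - dip_start)
        return frac * max_weaken
    if coact <= reversal:
        # Climb linearly from the maximal weakening back to 0.
        frac = (coact - dip_peak) / (reversal - dip_peak)
        return max_weaken * (1.0 - frac)
    # Above the reversal point, strengthening grows linearly.
    frac = (coact - reversal) / (1.0 - reversal)
    return frac * max_strengthen


# Weight changes for a weakly, moderately, and strongly coactive pair of units.
changes = [u_shaped_delta(coactivity(a, b))
           for a, b in [(0.2, 0.5), (0.8, 0.5), (1.0, 1.0)]]
```

      With these assumed parameters, the weakly coactive pair is left unchanged, the moderately coactive pair is weakened, and the strongly coactive pair is strengthened.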

      -The modeling results in the different face condition are at odds with the data for the Favila et al model (they observe some differentiation in the paper and the model predicts no change). This could be due to a number of unmodeled factors, but it is perhaps worth noting.

      Thank you for pointing this out. It is possible to better capture the pattern of results observed by Favila et al. in their paper (with some differentiation in the different-face condition and even more differentiation in the same-face condition) by slightly adjusting the model parameters (specifically, by setting the oscillation amplitude Osc for the hidden layer to .1 instead of .067).

      Rather than replacing the old (Osc = .067) results in the paper, which would entail re-making the associated videos, etc., we have added a supplementary figure (Figure 8 - Supplement 1; see p.45):

      We also added new text to the Favila Results, under “Differentiation and Integration” (p. 20):

      “Note also that the exact levels of differentiation that are observed in the different-face and same-face conditions are parameter dependent; for an alternative set of results showing some differentiation in the different-face condition (but still less than is observed in the same-face condition), see Figure 8 - Supplement 1.”

      -Related to my comment in the public review about pre-wiring associations, in the caption for Figure 9 (Schlichting model), the authors report "In both conditions, the pre-wired connection linking the "item B" hidden units to the "item X" output unit is set to .7. In the interleaved condition, the connection linking the "item A" hidden units to the "item X" output unit is set to .8, to reflect some amount of initial AX learning. In the blocked condition, the connection linking the "item A" hidden units to the "item X" output unit is set a higher value (.999), to reflect extra AX learning." What are the equivalent values for the other models, especially the Favila model since the structure is the same as Schlichting? I understood all the "strong" connections to be .99 unless otherwise stated. If that's the case, I don't understand why the blocked Schlichting model and the Favila model produce opposite effects. More clarity would be useful here.

      We have added a new paragraph to the results section for the Schlichting model (under “Differentiation and Integration”) to clarify why the blocked Schlichting model and the Favila model show different results (p. 24):

      “Note that the key feature driving integration in the blocked condition of this simulation is not the high strength of the connection from X to A on its own – rather, it is the asymmetry in the pretrained connection strengths from X to A (.999) and from X to B (.7). This asymmetry, which is meant to reflect the extensive training on A-X that occurred before the initial presentation of B-X, results in the A-X hidden representation decisively winning the competition during B-X presentation, which then leads to the B input also being linked to this representation (i.e., integration). It is instructive to compare this to the same-face condition from our simulation of Favila et al. (2016): In that simulation, the two pairmates are also linked strongly (.99 initial connection strength) to a shared associate, but in that case the connections are equally strong, so there is more balanced competition -- in this case, the competitor representation only comes to mind moderately (instead of displacing the target representation), so the result is differentiation instead of integration.”

      -The meaning of the different colored dots in Figure 5 is a bit hard to keep track of, even given the legend labels. The figure might benefit from a model sketch highlighting each of the different coactivity types. The left side of Fig 13 was useful but again somehow mapping on the colors would help further. Another note on these figures: what does having two dots of each color mean? Is it just an illustration of the variance? There would be more dots if there was one dot per coactivity value.

      We have updated Figure 5 and Figure 13 to clarify these points (including a clarification that the dots only represent a subset of the possible pairings between units).

      -While I appreciate the goal of the paper is to account for these three studies, readers who aren't familiar with or specifically interested in these studies may appreciate a small amount of intuition on why formalizing unsupervised learning models may be broadly important for computational investigations of learning/memory/cognition.

      We have added the following text under “Basic Network Properties” in the Introduction to address this point (p. 4):

      “Achieving a better understanding of unsupervised learning is an important goal for computational neuroscience, given that learning agents have vastly more opportunities to learn in an unsupervised fashion than from direct supervision (for additional discussion of this point, see, e.g., Zhuang et al., 2021).”

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This paper presents a compelling and comprehensive study of decision-making under uncertainty. It addresses a fundamental distinction between belief-based (cognitive neuroscience) formulations of choice behaviour with reward-based (behavioural psychology) accounts. Specifically, it asks whether active inference provides a better account of planning and decision-making, relative to reinforcement learning. To do this, the authors use a simple but elegant paradigm that includes choices about whether to seek both information and rewards. They then assess the evidence for active inference and reinforcement learning models of choice behaviour, respectively. After demonstrating that active inference provides a better explanation of behavioural responses, the neuronal correlates of epistemic and instrumental value (under an optimised active inference model) are characterised using EEG. Significant neuronal correlates of both kinds of value were found in sensor and source space. The source space correlates are then discussed sensibly, in relation to the existing literature on the functional anatomy of perceptual and instrumental decision-making under uncertainty.

      Strengths:

      The strengths of this work rest upon the theoretical underpinnings and careful deconstruction of the various determinants of choice behaviour using active inference. A particular strength here is that the experimental paradigm is designed carefully to elicit both information-seeking and reward-seeking behaviour; where the information-seeking is itself separated into resolving uncertainty about the context (i.e., latent states) and the contingencies (i.e., latent parameters), under which choices are made. In other words, the paradigm - and its subsequent modelling - addresses both inference and learning as necessary belief and knowledge-updating processes that underwrite decisions.

      The authors were then able to model belief updating using active inference and then look for the neuronal correlates of the implicit planning or policy selection. This speaks to a further strength of this study; it provides some construct validity for the modelling of belief updating and decision-making; in terms of the functional anatomy as revealed by EEG. Empirically, the source space analysis of the neuronal correlates licences some discussion of functional specialisation and integration at various stages in the choices and decision-making.

      In short, the strengths of this work rest upon a (first) principles account of decision-making under uncertainty in terms of belief updating that allows them to model or fit choice behaviour in terms of Bayesian belief updating - and then use relatively state-of-the-art source reconstruction to examine the neuronal correlates of the implicit cognitive processing.

      Response: We are deeply grateful for your careful review of our work and for the thoughtful feedback you have provided. Your dedication to ensuring the quality and clarity of the work is truly admirable. Your comments have been invaluable in guiding us towards improving the paper, and we appreciate your time and effort in not just offering suggestions but also providing specific revisions that we can implement. Your insights have helped us identify areas where we can strengthen the arguments and clarify the methodology.

      Comment 1:

      The main weakness of this report lies in the communication of the ideas and procedures. Although the language is generally excellent, there are some grammatical lapses that make the text difficult to read. More importantly, the authors are not consistent in their use of some terms; for example, uncertainty and information gain are sometimes conflated in a way that might confuse readers. Furthermore, the descriptions of the modelling and data analysis are incomplete. These shortcomings could be addressed in the following way.

      First, it would be useful to unpack the various interpretations of information and goal-seeking offered in the (active inference) framework examined in this study. For example, it will be good to include the following paragraph:

      "In contrast to behaviourist approaches to planning and decision-making, active inference formulates the requisite cognitive processing in terms of belief updating in which choices are made based upon their expected free energy. Expected free energy can be regarded as a universal objective function, specifying the relative likelihood of alternative choices. In brief, expected free energy can be regarded as the surprise expected following some action, where the expected surprise comes in two flavours. First, the expected surprise is uncertainty, which means that policies with a low expected free energy resolve uncertainty and promote information seeking. However, one can also minimise expected surprise by avoiding surprising, aversive outcomes. This leads to goal-seeking behaviour, where the goals can be regarded as prior preferences or rewarding outcomes.

      Technically, expected free energy can be expressed in terms of risk plus ambiguity - or rearranged to be expressed in terms of expected information gain plus expected value, where value corresponds to (log) prior preferences. We will refer to both decompositions in what follows; noting that both decompositions accommodate information and goal-seeking imperatives. That is, resolving ambiguity and maximising information gain have epistemic value, while minimising risk or maximising expected value have pragmatic or instrumental value. These two kinds of values are sometimes referred to in terms of intrinsic and extrinsic value, respectively [1-4]."

      Response 1: We deeply thank you for your comments and corresponding suggestions about our interpretations of active inference. In response to your identified weaknesses and suggestions, we have added corresponding paragraphs in the Methods section (The free energy principle and active inference, line 95-106):

      “Active inference formulates the necessary cognitive processing as a process of belief updating, where choices depend on agents' expected free energy. Expected free energy serves as a universal objective function, guiding both perception and action. In brief, expected free energy can be seen as the expected surprise following some policies. The expected surprise can be reduced by resolving uncertainty, so selecting policies with lower expected free energy encourages information-seeking and resolves uncertainty. Additionally, one can minimize expected surprise by avoiding surprising or aversive outcomes (Oudeyer et al., 2007; Schmidhuber et al., 2010). This leads to goal-seeking behavior, where goals can be viewed as prior preferences or rewarding outcomes.

      Technically, expected free energy can also be expressed as expected information gain plus expected value, where the value corresponds to (log) prior preferences. We will refer to both formulations in what follows. Resolving ambiguity, minimizing risk, and maximizing information gain have epistemic value, while maximizing expected value has pragmatic or instrumental value. These two types of values can be referred to in terms of intrinsic and extrinsic value, respectively (Barto et al., 2013; Schwartenbeck et al., 2019).”

      Oudeyer, P. Y., & Kaplan, F. (2007). What is intrinsic motivation? A typology of computational approaches. Frontiers in neurorobotics, 1, 108.

      Schmidhuber, J. (2010). Formal theory of creativity, fun, and intrinsic motivation (1990–2010). IEEE transactions on autonomous mental development, 2(3), 230-247.

      Barto, A., Mirolli, M., & Baldassarre, G. (2013). Novelty or surprise?. Frontiers in psychology, 4, 61898.

      Schwartenbeck, P., Passecker, J., Hauser, T. U., FitzGerald, T. H., Kronbichler, M., & Friston, K. J. (2019). Computational mechanisms of curiosity and goal-directed exploration. elife, 8, e41703.
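      For orientation, the two decompositions of expected free energy mentioned in Comment 1 and Response 1 are commonly written as below. This is the generic textbook form from the active inference literature, not a reproduction of the manuscript's own equation (which, per Response 2, combines three weighted terms). The notation is an assumption: $\pi$ is a policy, $s$ hidden states, $o$ outcomes, $P(o)$ the prior preferences over outcomes, and the predictive distribution is taken to factorise as $Q(o, s \mid \pi) = P(o \mid s)\, Q(s \mid \pi)$.

```latex
\begin{aligned}
G(\pi)
  &= \underbrace{D_{\mathrm{KL}}\big[\, Q(o \mid \pi) \,\big\|\, P(o) \,\big]}_{\text{risk}}
   \;+\; \underbrace{\mathbb{E}_{Q(s \mid \pi)}\big[\, \mathrm{H}[P(o \mid s)] \,\big]}_{\text{ambiguity}} \\[4pt]
  &= -\underbrace{\mathbb{E}_{Q(o \mid \pi)}\big[\, D_{\mathrm{KL}}[\, Q(s \mid o, \pi) \,\|\, Q(s \mid \pi) \,] \,\big]}_{\text{expected information gain}}
   \;-\; \underbrace{\mathbb{E}_{Q(o \mid \pi)}\big[\, \ln P(o) \,\big]}_{\text{expected value}}
\end{aligned}
```

      Under these assumptions the two lines are equal, and both terms in the second line enter with a negative sign, so policies with low expected free energy are those with high expected information gain and high expected value.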

      Comment 2:

      The description of the modelling of choice behaviour needs to be unpacked and motivated more carefully. Perhaps along the following lines:

      "To assess the evidence for active inference over reinforcement learning, we fit active inference and reinforcement learning models to the choice behaviour of each subject. Effectively, this involved optimising the free parameters of active inference and reinforcement learning models to maximise the likelihood of empirical choices. The resulting (marginal) likelihood was then used as the evidence for each model. The free parameters for the active inference model scaled the contribution of the three terms that constitute the expected free energy (in Equation 6). These coefficients can be regarded as precisions that characterise each subjects' prior beliefs about contingencies and rewards. For example, increasing the precision or the epistemic value associated with model parameters means the subject would update her beliefs about reward contingencies more quickly than a subject who has precise prior beliefs about reward distributions. Similarly, subjects with a high precision over prior preferences or extrinsic value can be read as having more precise beliefs that she will be rewarded. The free parameters for the reinforcement learning model included..."

      Response 2: We deeply thank you for your comments and corresponding suggestions about our description of the behavioral modelling. In response to your identified weaknesses and suggestions, we have added corresponding content in the Results section (Behavioral results, line 279-293):

      “To assess the evidence for active inference over reinforcement learning, we fit active inference (Eq.9), model-free reinforcement learning, and model-based reinforcement learning models to the behavioral data of each participant. This involved optimizing the free parameters of active inference and reinforcement learning models. The resulting likelihood was used to calculate the Bayesian Information Criterion (BIC) (Vrieze 2012) as the evidence for each model. The free parameters for the active inference model (AL, AI, EX, prior, and α) scaled the contribution of the three terms that constitute the expected free energy in Eq.9. These coefficients can be regarded as precisions that characterize each participant's prior beliefs about contingencies and rewards. For example, increasing α means participants would update their beliefs about reward contingencies more quickly, increasing AL means participants would like to reduce ambiguity more, and increasing AI means participants would like to learn the hidden state of the environment and avoid risk more. The free parameters for the model-free reinforcement learning model are the learning rate α and the temperature parameter γ, and the free parameters for the model-based reinforcement learning model are the learning rate α, the temperature parameter γ, and prior (the details for the model-free reinforcement learning model can be seen in Eq.S1-11 and the details for the model-based reinforcement learning model can be seen in Eq.S12-23 in the Supplementary Method). The parameter fitting for these three models was conducted using the 'BayesianOptimization' package in Python (Frazier 2018), first randomly sampling 1000 times and then iterating for an additional 1000 times.”

      Vrieze, S. I. (2012). Model selection and psychological theory: a discussion of the differences between the Akaike information criterion (AIC) and the Bayesian information criterion (BIC). Psychological methods, 17(2), 228.

      Frazier, P. I. (2018). A tutorial on Bayesian optimization. arXiv preprint arXiv:1807.02811.
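      The BIC-based model comparison described in Response 2 can be illustrated with a small sketch. It assumes a softmax choice rule with a temperature parameter and uses the standard definition BIC = k ln(n) - 2 ln(L); the function names, the toy trials, and the candidate log-likelihoods are hypothetical, and the models' actual likelihoods are given in the manuscript's equations rather than here.

```python
import math


def softmax_choice_prob(values, chosen, temperature):
    """Probability of the chosen option under a softmax rule (assumed form)."""
    scaled = [v / temperature for v in values]
    top = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scaled]
    return exps[chosen] / sum(exps)


def log_likelihood(trials, temperature):
    """Sum of log choice probabilities over (option_values, chosen_index) trials."""
    return sum(
        math.log(softmax_choice_prob(values, chosen, temperature))
        for values, chosen in trials
    )


def bic(log_lik, n_params, n_obs):
    """Bayesian Information Criterion: k * ln(n) - 2 * ln(L). Lower is better."""
    return n_params * math.log(n_obs) - 2.0 * log_lik


# Hypothetical data: each trial lists option values and the chosen option.
toy_trials = [([1.0, 0.0], 0), ([0.2, 0.8], 1), ([0.5, 0.5], 0)]
toy_ll = log_likelihood(toy_trials, temperature=0.5)

# Hypothetical comparison of two fitted models on 100 observations.
bic_model_a = bic(log_lik=-50.0, n_params=5, n_obs=100)
bic_model_b = bic(log_lik=-55.0, n_params=2, n_obs=100)
preferred = "A" if bic_model_a < bic_model_b else "B"
```

      The example also shows why BIC is used rather than raw likelihood: a model with more free parameters must improve the fit enough to offset the k ln(n) penalty.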

      Comment 3:

      In terms of the time-dependent correlations with expected free energy - and its constituent terms - I think the report would benefit from overviewing these analyses with something like the following:

      "In the final analysis of the neuronal correlates of belief updating - as quantified by the epistemic and intrinsic values of expected free energy - we present a series of analyses in source space. These analyses tested for correlations between constituent terms in expected free energy and neuronal responses in source space. These correlations were over trials (and subjects). Because we were dealing with two-second timeseries, we were able to identify the periods of time during decision-making when the correlates were expressed.

      In these analyses, we focused on the induced power of neuronal activity at each point in time, at each brain source. To illustrate the functional specialisation of these neuronal correlates, we present whole-brain maps of correlation coefficients and pick out the most significant correlation for reporting fluctuations in selected correlations over two-second periods. These analyses are presented in a descriptive fashion to highlight the nature and variety of the neuronal correlates, which we unpack in relation to the existing EEG literature in the discussion. Note that we did not attempt to correct for multiple comparisons; largely, because the correlations observed were sustained over considerable time periods, which would be almost impossible under the null hypothesis of no correlations."

      Response 3: We deeply thank you for your comments and corresponding suggestions about our description of the regression analysis in the source space. In response to your suggestions, we have added corresponding content in the Results section (EEG results at source level, line 331-347):

      “In the final analysis of the neural correlates of the decision-making process, as quantified by the epistemic and intrinsic values of expected free energy, we presented a series of linear regressions in source space. These analyses tested for correlations over trials between constituent terms in expected free energy (the value of avoiding risk, the value of reducing ambiguity, extrinsic value, and expected free energy itself) and neural responses in source space. Additionally, we also investigated the neural correlate of (the degree of) risk, (the degree of) ambiguity, and prediction error. Because we were dealing with a two-second time series, we were able to identify the periods of time during decision-making when the correlates were expressed. The linear regression was run by the "mne.stats.linear_regression" function in the MNE package (Activity ~ Regressor + Intercept). Activity is the activity amplitude of the EEG signal in the source space and Regressor is one of the regressors that we mentioned (e.g., expected free energy, the value of reducing ambiguity, etc.).

      In these analyses, we focused on the induced power of neural activity at each time point, in the brain source space. To illustrate the functional specialization of these neural correlates, we presented whole-brain maps of correlation coefficients and picked out the brain region with the most significant correlation for reporting fluctuations in selected correlations over two-second periods. These analyses were presented in a descriptive fashion to highlight the nature and variety of the neural correlates, which we unpacked in relation to the existing EEG literature in the discussion. Note that we did not attempt to correct for multiple comparisons; largely, because the correlations observed were sustained over considerable time periods, which would be almost impossible under the null hypothesis of no correlations.”
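      The regression formula quoted above (Activity ~ Regressor + Intercept), applied over trials at each time point, can be sketched in plain Python as follows. This is a simplified stand-in for one source location, not the MNE routine used by the authors; the function names and the toy data are hypothetical.

```python
def ols_slope_intercept(x, y):
    """Ordinary least squares fit of y ~ slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("regressor has zero variance across trials")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def regress_over_time(activity, regressor):
    """Fit the regression across trials separately at every time point.

    activity[trial][time] holds the source amplitude on each trial;
    regressor[trial] holds one model-derived value per trial
    (e.g. expected free energy). Returns one (slope, intercept) per time point.
    """
    n_times = len(activity[0])
    return [
        ols_slope_intercept(regressor, [trial[t] for trial in activity])
        for t in range(n_times)
    ]


# Toy data: 4 trials x 2 time points. Activity tracks the regressor at the
# first time point and is constant at the second.
toy_regressor = [0.0, 1.0, 2.0, 3.0]
toy_activity = [[1.0, 5.0], [3.0, 5.0], [5.0, 5.0], [7.0, 5.0]]
fits = regress_over_time(toy_activity, toy_regressor)
```

      Plotting the slope (or its test statistic) against time is what allows the periods during decision-making in which a correlate is expressed to be identified.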

      Comment 4:

      There was a slight misdirection in the discussion of priors in the active inference framework. The notion that active inference requires a pre-specification of priors is a common misconception. Furthermore, it misses the point that the utility of Bayesian modelling is to identify the priors that each subject brings to the table. This could be easily addressed with something like the following in the discussion:

      "It is a common misconception that Bayesian approaches to choice behaviour (including active inference) are limited by a particular choice of priors. As illustrated in our fitting of choice behaviour above, priors are a strength of Bayesian approaches in the following sense: under the complete class theorem [5, 6], any pair of choice behaviours and reward functions can be described in terms of ideal Bayesian decision-making with particular priors. In other words, there always exists a description of choice behaviour in terms of some priors. This means that one can, in principle, characterise any given behaviour in terms of the priors that explain that behaviour. In our example, these were effectively priors over the precision of various preferences or beliefs about contingencies that underwrite expected free energy."

      Response 4: We deeply thank you for your comments and corresponding suggestions about the prior of Bayesian methods. In response to your suggestions, we have added corresponding content in the Discussion section (The strength of the active inference framework in decision-making, line 447-453):

      “However, it may be the opposite. As illustrated in our fitting results, priors can be a strength of Bayesian approaches. Under the complete class theorem (Wald 1947; Brown 1981), any pair of behavioral data and reward functions can be described in terms of ideal Bayesian decision-making with particular priors. In other words, there always exists a description of behavioral data in terms of some priors. This means that one can, in principle, characterize any given behavioral data in terms of the priors that explain that behavior. In our example, these were effectively priors over the precision of various preferences or beliefs about contingencies that underwrite expected free energy.”

      Wald, A. (1947). An essentially complete class of admissible decision functions. The Annals of Mathematical Statistics, 549-555.

      Brown, L. D. (1981). A complete class theorem for statistical problems with finite sample spaces. The Annals of Statistics, 1289-1300.

      Reviewer #2 (Public Review):

      Summary:

      Zhang and colleagues use a combination of behavioral, neural, and computational analyses to test an active inference model of exploration in a novel reinforcement learning task.

      Strengths:

      The paper addresses an important question (validation of active inference models of exploration). The combination of behavior, neuroimaging, and modeling is potentially powerful for answering this question.

      Response: We want to express our sincere gratitude for your thorough review of our work and for the valuable comments you have provided. Your attention to detail and dedication to improving the quality of the work are truly commendable. Your feedback has been invaluable in guiding us towards revisions that will strengthen the work. We have made targeted modifications based on most of the comments. However, due to factors such as time and energy constraints, we have not added corresponding analyses for several comments.

      Comment 1:

      The paper does not discuss relevant work on contextual bandits by Schulz, Collins, and others. It also does not mention the neuroimaging study of Tomov et al. (2020) using a risky/safe bandit task.

      Response 1:

      We deeply thank you for your suggestions about the relevant work. We now discuss and cite these representative papers in the Introduction section (line 42-55):

      “The decision-making process frequently involves grappling with varying forms of uncertainty, such as ambiguity - the kind of uncertainty that can be reduced through sampling, and risk - the inherent uncertainty (variance) presented by a stable environment. Studies have investigated these different forms of uncertainty in decision-making, focusing on their neural correlates (Daw et al., 2006; Badre et al., 2012; Cavanagh et al., 2012).

      These studies utilized different forms of multi-armed bandit tasks, e.g., the restless multi-armed bandit tasks (Daw et al., 2006; Guha et al., 2010), risky/safe bandit tasks (Tomov et al., 2020; Fan et al., 2023; Payzan-LeNestour et al., 2013), and contextual multi-armed bandit tasks (Schulz et al., 2015; Schulz et al., 2015; Molinaro et al., 2023). However, these tasks either separate risk from ambiguity in uncertainty, or separate action from state (perception). In our work, we develop a contextual multi-armed bandit task to enable participants to actively reduce ambiguity, avoid risk, and maximize rewards using various policies (see Section 2.2 and Figure 4(a)). Our task makes it possible to study whether the brain represents these different types of uncertainty distinctly (Levy et al., 2010) and whether the brain represents both the value of reducing uncertainty and the degree of uncertainty. The active inference framework presents a theoretical approach to investigate these questions. Within this framework, uncertainties can be reduced to ambiguity and risk. Ambiguity is represented by the uncertainty about model parameters associated with choosing a particular action, while risk is signified by the variance of the environment's hidden states. The value of reducing ambiguity, the value of avoiding risk, and extrinsic value together constitute expected free energy (see Section 2.1).”

      Daw, N. D., O'doherty, J. P., Dayan, P., Seymour, B., & Dolan, R. J. (2006). Cortical substrates for exploratory decisions in humans. Nature, 441(7095), 876-879.

      Badre, D., Doll, B. B., Long, N. M., & Frank, M. J. (2012). Rostrolateral prefrontal cortex and individual differences in uncertainty-driven exploration. Neuron, 73(3), 595-607.

      Cavanagh, J. F., Figueroa, C. M., Cohen, M. X., & Frank, M. J. (2012). Frontal theta reflects uncertainty and unexpectedness during exploration and exploitation. Cerebral cortex, 22(11), 2575-2586.

      Guha, S., Munagala, K., & Shi, P. (2010). Approximation algorithms for restless bandit problems. Journal of the ACM (JACM), 58(1), 1-50.

      Tomov, M. S., Truong, V. Q., Hundia, R. A., & Gershman, S. J. (2020). Dissociable neural correlates of uncertainty underlie different exploration strategies. Nature communications, 11(1), 2371.

      Fan, H., Gershman, S. J., & Phelps, E. A. (2023). Trait somatic anxiety is associated with reduced directed exploration and underestimation of uncertainty. Nature Human Behaviour, 7(1), 102-113.

      Payzan-LeNestour, E., Dunne, S., Bossaerts, P., & O’Doherty, J. P. (2013). The neural representation of unexpected uncertainty during value-based decision making. Neuron, 79(1), 191-201.

      Schulz, E., Konstantinidis, E., & Speekenbrink, M. (2015, April). Exploration-exploitation in a contextual multi-armed bandit task. In International conference on cognitive modeling (pp. 118-123).

      Schulz, E., Konstantinidis, E., & Speekenbrink, M. (2015, November). Learning and decisions in contextual multi-armed bandit tasks. In CogSci.

      Molinaro, G., & Collins, A. G. (2023). Intrinsic rewards explain context-sensitive valuation in reinforcement learning. PLoS Biology, 21(7), e3002201.

      Levy, I., Snell, J., Nelson, A. J., Rustichini, A., & Glimcher, P. W. (2010). Neural representation of subjective value under risk and ambiguity. Journal of neurophysiology, 103(2), 1036-1047.

      Comment 2:

      The statistical reporting is inadequate. In most cases, only p-values are reported, not the relevant statistics, degrees of freedom, etc. It was also not clear if any corrections for multiple comparisons were applied. Many of the EEG results are described as "strong" or "robust" with significance levels of p<0.05; I am skeptical in the absence of more details, particularly given the fact that the corresponding plots do not seem particularly strong to me.

      Response 2: We deeply thank you for your comments about our statistical reporting. We have optimized the fitting model and rerun all the statistical analyses. As can be seen in Figures 6, 7, 8, S3, S4, and S5, the new regression results are significantly improved compared to the previous ones. Owing to space limitations, we have placed the other relevant statistical results, including t-values and standard errors, on our GitHub repository (https://github.com/andlab-um/FreeEnergyEEG). We have not applied corrections for multiple comparisons, following Reviewer 1’s Comment 3: “Note that we did not attempt to correct for multiple comparisons; largely, because the correlations observed were sustained over considerable time periods, which would be almost impossible under the null hypothesis of no correlations”.

      Author response image 1.

      Comment 3:

      The authors compare their active inference model to a "model-free RL" model. This model is not described anywhere, as far as I can tell. Thus, I have no idea how it was fit, how many parameters it has, etc. The active inference model fitting is also not described anywhere. Moreover, you cannot compare models based on log-likelihood, unless you are talking about held-out data. You need to penalize for model complexity. Finally, even if active inference outperforms a model-free RL model (doubtful given the error bars in Fig. 4c), I don't see how this is strong evidence for active inference per se. I would want to see a much more extensive model comparison, including model-based RL algorithms which are not based on active inference, as well as model recovery analyses confirming that the models can actually be distinguished on the basis of the experimental data.

      Response 3: We deeply thank you for your comments about the model comparison details. We previously placed the details of the comparison model in the supplementary materials, as classical reinforcement learning is not the focus of our work. We have now added the relevant information regarding the model comparison to the main text (highlighted in yellow) in the Results section (Behavioral results, lines 279-293):

      “To assess the evidence for active inference over reinforcement learning, we fit active inference (Eq.9), model-free reinforcement learning, and model-based reinforcement learning models to the behavioral data of each participant. This involved optimizing the free parameters of active inference and reinforcement learning models. The resulting likelihood was used to calculate the Bayesian Information Criterion (BIC) as the evidence for each model. The free parameters for the active inference model (AL, AI, EX, prior, and α) scaled the contribution of the three terms that constitute the expected free energy in Eq.9. These coefficients can be regarded as precisions that characterize each participant's prior beliefs about contingencies and rewards. For example, increasing α means participants would update their beliefs about reward contingencies more quickly, increasing AL means participants would like to reduce ambiguity more, and increasing AI means participants would like to learn the hidden state of the environment and avoid risk more. The free parameters for the model-free reinforcement learning model are the learning rate α and the temperature parameter γ, and the free parameters for the model-based reinforcement learning model are the learning rate α, the temperature parameter γ, and the prior (the details for the model-free reinforcement learning model can be found in Eq.S1-11 and the details for the model-based reinforcement learning model can be found in Eq.S12-23 in the Supplementary Method). The parameter fitting for these three models was conducted using the 'BayesianOptimization' package in Python, first randomly sampling 1000 times and then iterating for an additional 1000 times.”

      We have now incorporated model-based reinforcement learning into our comparison models and placed the descriptions of both model-free and model-based reinforcement learning algorithms in the supplementary materials. We have also changed the criterion for model comparison to the Bayesian Information Criterion. As indicated by the results, the active inference model significantly outperforms both comparison models.
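
      As a minimal sketch of the criterion named above, assuming the standard definition BIC = k ln(n) - 2 ln(L), where k is the number of free parameters, n the number of observations, and ln(L) the maximized log-likelihood. The parameter counts follow the text (5 for active inference, 2 for model-free, 3 for model-based reinforcement learning); taking n as the 120 trials is an assumption, and the log-likelihood values are hypothetical placeholders, not results from the study.

```python
import math

def bic(log_likelihood: float, n_params: int, n_obs: int) -> float:
    """Bayesian Information Criterion; lower values indicate a better model."""
    return n_params * math.log(n_obs) - 2.0 * log_likelihood

# Assumption: one observation per trial, 120 trials per participant.
N_TRIALS = 120

# Parameter counts follow the text; log-likelihoods are hypothetical placeholders.
models = {
    "active_inference": {"k": 5, "loglik": -60.0},
    "model_free_rl": {"k": 2, "loglik": -75.0},
    "model_based_rl": {"k": 3, "loglik": -70.0},
}

scores = {name: bic(m["loglik"], m["k"], N_TRIALS) for name, m in models.items()}
best = min(scores, key=scores.get)  # model with the lowest BIC
```

      Unlike a raw log-likelihood comparison, the k ln(n) term penalizes the extra parameters of the larger model.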

      We had not performed model recovery previously; we have now placed the relevant results in the supplementary materials. The result figures show that each model fits its own simulated data well:

      “To demonstrate how reliable our models are (the active inference model, model-free reinforcement learning model, and model-based reinforcement learning model), we ran simulation experiments for model recovery. We used these three models, with their own fitted parameters, to generate simulated data. We then fit all three sets of data using these three models.

      The model recovery results are shown in Fig.S6. This is the confusion matrix of models: the percentage of all subjects simulated based on a certain model that is fitted best by a certain model. The goodness-of-fit was compared using the Bayesian Information Criterion. We can see that the result of model recovery is very good, and the simulated data generated by a model can be best explained by this model.”
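
      The confusion matrix described above can be sketched as follows, assuming a table that holds one BIC value per fitted model for each simulated subject. The function name, the table layout, and all BIC values are hypothetical and serve only to illustrate the bookkeeping.

```python
from collections import Counter

MODELS = ["active_inference", "model_free_rl", "model_based_rl"]

def confusion_matrix(bic_table):
    """Fraction of simulated subjects best fit by each model.

    bic_table[generating_model] is a list with one dict per simulated
    subject, mapping fitted-model name -> BIC (lower is better).
    """
    matrix = {}
    for gen_model, subjects in bic_table.items():
        # The winning model for a subject is the one with the lowest BIC.
        winners = Counter(min(s, key=s.get) for s in subjects)
        matrix[gen_model] = {m: winners[m] / len(subjects) for m in MODELS}
    return matrix

# Hypothetical BIC values for two simulated subjects per generating model.
example = {
    "active_inference": [
        {"active_inference": 140.0, "model_free_rl": 160.0, "model_based_rl": 155.0},
        {"active_inference": 150.0, "model_free_rl": 149.0, "model_based_rl": 158.0},
    ],
    "model_free_rl": [
        {"active_inference": 170.0, "model_free_rl": 150.0, "model_based_rl": 152.0},
        {"active_inference": 165.0, "model_free_rl": 148.0, "model_based_rl": 151.0},
    ],
    "model_based_rl": [
        {"active_inference": 160.0, "model_free_rl": 158.0, "model_based_rl": 150.0},
        {"active_inference": 162.0, "model_free_rl": 159.0, "model_based_rl": 149.0},
    ],
}
recovery = confusion_matrix(example)
```

      Good recovery corresponds to a matrix whose diagonal entries are close to 1.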

      Author response image 2.

      Comment 4:

      Another aspect of the behavioral modeling that's missing is a direct descriptive comparison between model and human behavior, beyond just plotting log-likelihoods (which are a very impoverished measure of what's going on).

      Response 4: We deeply thank you for your comments about the comparison between the model and human behavior. Due to the slight differences between our simulation experiments and real behavioral experiments (the "you can ask" stage), we cannot directly compare the model and participants' behaviors. However, we can observe that in the main text's simulation experiment (Figure 3), the active inference agent's behavior is highly consistent with humans (Figure 4), exhibiting an effective exploration strategy and a desire to reduce uncertainty. Moreover, we have included two additional simulation experiments in the supplementary materials, which demonstrate that active inference may potentially fit a wide range of participants' behavioral strategies.

      Author response image 3.

      (An active inference agent with AL=AI=EX=0. It can accomplish tasks efficiently like a human being, reducing the uncertainty of the environment and maximizing the reward.)

      Author response image 4.

      (An active inference agent with AL=AI=0, EX=10. It will only pursue immediate rewards (not choosing the "Cue" option due to additional costs), but it can also gradually optimize its strategy due to random effects.)

      Author response image 5.

      (An active inference agent with EX=0, AI=AL=10. It will only pursue environmental information to reduce the uncertainty of the environment. Even in "Context 2" where immediate rewards are scarce, it will continue to explore.) (a) shows the decision-making of active inference agents in the Stay-Cue choice. Blue corresponds to agents choosing the "Cue" option and acquiring "Context 1"; orange corresponds to agents choosing the "Cue" option and acquiring "Context 2"; purple corresponds to agents choosing the "Stay" option and not knowing the information about the hidden state of the environment. The shaded areas below correspond to the probability of the agents making the respective choices. (b) shows the decision-making of active inference agents in the Safe-Risky choice. The shaded areas below correspond to the probability of the agents making the respective choices. (c) shows the rewards obtained by active inference agents. (d) shows the reward prediction errors of active inference agents. (e) shows the reward predictions of active inference agents for the "Risky" path in "Context 1" and "Context 2".

      Comment 5:

      The EEG results are intriguing, but it wasn't clear that these provide strong evidence specifically for the active inference model. No alternative models of the EEG data are evaluated.

      Overall, the central claim in the Discussion ("we demonstrated that the active inference model framework effectively describes real-world decision-making") remains unvalidated in my opinion.

      Response 5: We deeply thank you for your comments. We applied the active inference model to analyze EEG results because it best fit the participants' behavioral data among our models, including the newly added results. Further, our EEG results serve only to verify that the active inference model can be used to analyze the neural mechanisms of decision-making in uncertain environments (if possible, we could certainly design a better reinforcement learning model with a similar exploration strategy). We aim to emphasize the consistency between active inference and human decision-making in uncertain environments, as we have discussed in the article. Active inference emphasizes both perception and action, which is also what we wish to highlight: during the decision-making process, participants not only passively receive information, but also actively adopt different strategies to reduce uncertainty and maximize rewards.

      Reviewer #3 (Public Review):

      Summary:

      This paper aims to investigate how the human brain represents different forms of value and uncertainty that participate in active inference within a free-energy framework, in a two-stage decision task involving contextual information sampling, and choices between safe and risky rewards, which promotes a shift from exploration to exploitation. They examine neural correlates by recording EEG and comparing activity in the first vs second half of trials and between trials in which subjects did and did not sample contextual information, and perform a regression with free-energy-related regressors against data "mapped to source space." Their results show effects in various regions, which they take to indicate that the brain does perform this task through the theorised active inference scheme.

      Strengths:

      This is an interesting two-stage paradigm that incorporates several interesting processes of learning, exploration/exploitation, and information sampling. Although scalp/brain regions showing sensitivity to the active-inference-related quantities do not necessarily suggest what role they play, it can be illuminating and useful to search for such effects as candidates for further investigation. The aims are ambitious, and methodologically it is impressive to include extensive free-energy theory, behavioural modelling, and EEG source-level analysis in one paper.

      Response: We would like to express our heartfelt thanks to you for carefully reviewing our work and offering insightful feedback. Your attention to detail and commitment to enhancing the overall quality of our work are deeply admirable. Your input has been extremely helpful in guiding us through the necessary revisions to enhance the work. We have implemented focused changes based on a majority of your comments. Nevertheless, owing to limitations such as time and resources, we have not included corresponding analyses for a few comments.

      Comment 1:

      Though I could surmise the above general aims, I could not follow the important details of what quantities were being distinguished and sought in the EEG and why. Some of this is down to theoretical complexity - the dizzying array of constructs and terms with complex interrelationships, which may simply be part and parcel of free-energy-based theories of active inference - but much of it is down to missing or ambiguous details.

      Response 1: We deeply thank you for your comments about our work’s readability. We have significantly revised the descriptions of active inference, models, research questions, etc. Focusing on active inference and the free energy principle, we have added relevant basic descriptions and unified the terminology. We have added information related to model comparison in the main text and supplementary materials. We presented our regression results in clearer language. Our research focused on the brain's representation of decision-making in uncertain environments, including expected free energy, the value of reducing ambiguity, the value of avoiding risk, extrinsic value, ambiguity, and risk.

      Comment 2:

      In general, an insufficient effort has been made to make the paper accessible to readers not steeped in the free energy principle and active inference. There are critical inconsistencies in key terminology; for example, the introduction states that aim 1 is to distinguish the EEG correlates of three different types of uncertainty: ambiguity, risk, and unexpected uncertainty. But the abstract instead highlights distinctions in EEG correlates between "uncertainty... and... risk" and between "expected free energy .. and ... uncertainty." There are also inconsistencies in mathematical labelling (e.g. in one place 'p(s|o)' and 'q(s)' swap their meanings from one sentence to the very next).

      Response 2: We deeply thank you for your comments about the problem of inconsistent terminology. First, we have unified the symbols and letters (P, Q, s, o, etc.) that appeared in the article and described their respective meanings more clearly. We have also revised the relevant expressions of "uncertainty" throughout the text. In our work, uncertainty refers to ambiguity and risk. Ambiguity can be reduced through continuous sampling and is referred to as uncertainty about model parameters in our work. Risk, on the other hand, is the inherent variance of the environment and cannot be reduced through sampling, which is referred to as uncertainty about hidden states in our work. In the analysis of the results, we focused on how the brain encodes the value of reducing ambiguity (Figure 8), the value of avoiding risk (Figure 6), and (the degree of) ambiguity (Figure S5) during action selection. We also analyzed how the brain encodes reducing ambiguity and avoiding risk during belief update (Figure 7).

      Comment 3:

      Some basic but important task information is missing, and makes a huge difference to how decision quantities can be decoded from EEG. For example:

      - How do the subjects press the left/right buttons - with different hands or different fingers on the same hand?

      Response 3: We deeply thank you for your comments about the missing task information. We have added the relevant content in the Methods section (Contextual two-armed bandit task and Data collection, line 251-253):

      “Each stage was separated by a jitter ranging from 0.6 to 1.0 seconds. The entire experiment consists of a single block with a total of 120 trials. The participants are required to use any two fingers of one hand to press the buttons (left arrow and right arrow on the keyboard).”

      Comment 4:

      - Was the presentation of the Stay/cue and safe/risky options on the left/right sides counterbalanced? If not, decisions can be formed well in advance especially once a policy is in place.

      Response 4: The presentation of the Stay/cue and safe/risky options on the left/right sides was not counterbalanced. It is true that participants may have made decisions ahead of time. However, to better study the state of participants during decision-making, our choice stages consist of two parts. In the first two seconds, we ask participants to consider which option they would choose, and after these two seconds, participants are allowed to make their choice (by pressing the button).

      We also updated the figure of the experiment procedure as below (We circled the time that the participants spent on making decisions).

      Author response image 6.

      Comment 5:

      - What were the actual reward distributions ("magnitude X with probability p, magnitude y with probability 1-p") in the risky option?

      Response 5: We deeply thank you for your comments about the missing task information. We have placed the relevant content in the Methods section (Contextual two-armed bandit task and Data collection, line 188-191):

      “The actual reward distribution of the risky path in "Context 1" was [+12 (55%), +9 (25%), +6 (10%), +3 (5%), +0 (5%)] and the actual reward distribution of the risky path in "Context 2" was [+12 (5%), +9 (5%), +6 (10%), +3 (25%), +0 (55%)].”
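
      As a worked check of these distributions, the sketch below computes the mean and variance of the risky path in each context and compares the mean with the fixed 6-apple reward of the safe path stated in the task instructions. The variable and function names are illustrative only, and the cost of one apple for asking the ranger is not included.

```python
SAFE_REWARD = 6  # fixed yield of the safe (left) path, per the task instructions

# Reward distributions of the risky path as (reward, probability) pairs, from the text.
RISKY = {
    "Context 1": [(12, 0.55), (9, 0.25), (6, 0.10), (3, 0.05), (0, 0.05)],
    "Context 2": [(12, 0.05), (9, 0.05), (6, 0.10), (3, 0.25), (0, 0.55)],
}

def mean_and_variance(dist):
    """Expected reward and reward variance of a discrete distribution."""
    mean = sum(r * p for r, p in dist)
    var = sum(p * (r - mean) ** 2 for r, p in dist)
    return mean, var

for context, dist in RISKY.items():
    mean, var = mean_and_variance(dist)
    print(f"{context}: mean={mean:.2f}, variance={var:.2f}, "
          f"better than safe path: {mean > SAFE_REWARD}")
```

      Under these distributions the risky path averages 9.6 apples in "Context 1" and 2.4 apples in "Context 2", against 6 apples for the safe path; the two contexts mirror each other and therefore share the same reward variance.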

      Comment 6:

      The EEG analysis is not sufficiently detailed and motivated.

      For example,

      - why the high lower-filter cutoff of 1 Hz, and shouldn't it be acknowledged that this removes from the EEG any sustained, iteratively updated representation that evolves with learning across trials?

      Response 6: We deeply thank you for your comments about our EEG analysis. The 1Hz high-pass filter may indeed filter out some useful information. We chose a 1Hz high-pass filter to filter out most of the noise and prevent the noise from affecting our results analysis. Additionally, there are also many decision-related works that have applied 1Hz high-pass filtering in EEG data preprocessing (Yau et al., 2021; Cortes et al., 2021; Wischnewski et al., 2022; Schutte et al., 2017; Mennella et al., 2020; Giustiniani et al., 2020).

      Yau, Y., Hinault, T., Taylor, M., Cisek, P., Fellows, L. K., & Dagher, A. (2021). Evidence and urgency related EEG signals during dynamic decision-making in humans. Journal of Neuroscience, 41(26), 5711-5722.

      Cortes, P. M., García-Hernández, J. P., Iribe-Burgos, F. A., Hernández-González, M., Sotelo-Tapia, C., & Guevara, M. A. (2021). Temporal division of the decision-making process: An EEG study. Brain Research, 1769, 147592.

      Wischnewski, M., & Compen, B. (2022). Effects of theta transcranial alternating current stimulation (tACS) on exploration and exploitation during uncertain decision-making. Behavioural Brain Research, 426, 113840.

      Schutte, I., Kenemans, J. L., & Schutter, D. J. (2017). Resting-state theta/beta EEG ratio is associated with reward-and punishment-related reversal learning. Cognitive, Affective, & Behavioral Neuroscience, 17, 754-763.

      Mennella, R., Vilarem, E., & Grèzes, J. (2020). Rapid approach-avoidance responses to emotional displays reflect value-based decisions: Neural evidence from an EEG study. NeuroImage, 222, 117253.

      Giustiniani, J., Nicolier, M., Teti Mayer, J., Chabin, T., Masse, C., Galmès, N., ... & Gabriel, D. (2020). Behavioral and neural arguments of motivational influence on decision making during uncertainty. Frontiers in Neuroscience, 14, 583.

      Comment 7:

      - Since the EEG analysis was done using an array of free-energy-related variables in a regression, was multicollinearity checked between these variables?

      Response 7: We deeply thank you for your comments about our regression. Indeed, we didn't specify our regression formula in the main text. We conducted regression on one variable each time, so there was no need for a multicollinearity check. We have now added the relevant content in the Results section (“EEG results at source level” section, line 337-340):

      “The linear regression was run by the "mne.stats.linear_regression" function in the MNE package (Activity ~ Regressor + Intercept). Activity is the activity amplitude of the EEG signal in the source space and regressor is one of the regressors that we mentioned (e.g., expected free energy, the value of reducing ambiguity, etc.).”
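
      The single-regressor model above (Activity ~ Regressor + Intercept) can be sketched for one source point and one time sample as an ordinary least-squares fit across trials. This is a plain-Python illustration of the model form, not the MNE implementation, and the trial-wise values are synthetic.

```python
def simple_ols(regressor, activity):
    """Fit activity = slope * regressor + intercept by ordinary least squares."""
    n = len(regressor)
    mean_x = sum(regressor) / n
    mean_y = sum(activity) / n
    sxx = sum((x - mean_x) ** 2 for x in regressor)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(regressor, activity))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Synthetic example: one value per trial for a single source point and time sample.
expected_free_energy = [0.0, 1.0, 2.0, 3.0, 4.0]  # hypothetical trial-wise regressor
source_amplitude = [1.0, 3.1, 4.9, 7.0, 9.0]      # hypothetical source activity
slope, intercept = simple_ols(expected_free_energy, source_amplitude)
```

      Repeating this fit at every source point and time sample, one regressor at a time, would yield the kind of regression maps described in the text.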

      Comment 8:

      - In the initial comparison of the first/second half, why just 5 clusters of electrodes, and why these particular clusters?

      Response 8: We deeply thank you for your comments about our sensor-level analysis. These five clusters are relatively common scalp EEG regions to analyze (left frontal, right frontal, central, left parietal, and right parietal), and we followed previous work that analyzed these five clusters of electrodes (Laufs et al., 2006; Ray et al., 1985; Cole et al., 1985). In addition, our work pays more attention to the analysis in source space, exploring the corresponding functions of specific brain regions based on active inference models.

      Laufs, H., Holt, J. L., Elfont, R., Krams, M., Paul, J. S., Krakow, K., & Kleinschmidt, A. (2006). Where the BOLD signal goes when alpha EEG leaves. Neuroimage, 31(4), 1408-1418.

      Ray, W. J., & Cole, H. W. (1985). EEG activity during cognitive processing: influence of attentional factors. International Journal of Psychophysiology, 3(1), 43-48.

      Cole, H. W., & Ray, W. J. (1985). EEG correlates of emotional tasks related to attentional demands. International Journal of Psychophysiology, 3(1), 33-41.

      Comment 9:

      How many different variables are systematically different in the first vs second half, and how do you rule out less interesting time-on-task effects such as engagement or alertness? In what time windows are these amplitudes being measured?

      Response 9 (and the Response for Weaknesses 11): There were no systematic differences between the first half and the second half of the trials, with the only difference being the participants' experience. In the second half, participants had a better understanding of the reward distribution of the task (less ambiguity). The simulation results can well describe these.

      Author response image 7.

      As shown in Figure (a), agents can only learn about the hidden state of the environment ("Context 1" (green) or "Context 2" (orange)) by choosing the "Cue" option. If agents choose the "Stay" option, they will not be able to know the hidden state of the environment (purple). The risk of agents is only related to whether they choose the "Cue" option, not the number of rounds. Figure (b) shows the Safe-Risky choices of agents, and Figure (e) is the reward prediction of agents for the "Risky" path in "Context 1" and "Context 2". We can see that agents update the expected reward and reduce ambiguity by sampling the "Risky" path. The ambiguity of agents is not related to the "Cue" option, but to the number of times they sample the "Risky" path (rounds).

      In our choice stages, participants were required to think about their choices for the first two seconds (during which they could not press buttons). Then, they were asked to make their choices (press buttons) within the next two seconds. This setup effectively kept participants' attention focused on the task. The sensor-level results were computed from the two seconds of the “Second choice” stage during which participants decide which option to choose (and cannot press buttons).

      Comment 10:

      In the comparison of asked and not-asked trials, what trial stage and time window is being measured?

      Response 10: We have added relevant descriptions in the main text. The sensor-level results were computed from the two seconds of the “Second choice” stage during which participants decide which option to choose (and cannot press buttons).

      Author response image 8.

      Comment 11:

      Again, how many different variables, of the many estimated per trial in the active inference model, are different in the asked and not-asked trials, and how can you know which of these differences is the one reflected in the EEG effects?

      Response 11: The difference between asked trials and not-asked trials lies only in whether participants know the specific context of the risky path (the level of risk for the participants). A simple comparison indeed cannot tell us which of these differences is reflected in the EEG effects. Therefore, we subsequently conducted model-based regression analysis in the source space.

      Comment 12:

      The authors choose to interpret that on not-asked trials the subjects are more uncertain because the cue doesn't give them the context, but you could equally argue that they don't ask because they are more certain of the possible hidden states.

      Response 12: Our task design involves randomly varying the context of the risky path. Only by choosing to inquire can participants learn about the context. Participants can only become increasingly certain about the reward distribution of different contexts of the risky path, but cannot determine which specific context it is. The task instructions given to the participants are as follows (lines 226-231).

      "You are on a quest for apples in a forest, beginning with 5 apples. You encounter two paths: 1) The left path offers a fixed yield of 6 apples per excursion. 2) The right path offers a probabilistic reward of 0/3/6/9/12 apples, and it has two distinct contexts, labeled "Context 1" and "Context 2," each with a different reward distribution. Note that the context associated with the right path will randomly change in each trial. Before selecting a path, a ranger will provide information about the context of the right path ("Context 1" or "Context 2") in exchange for an apple. The more apples you collect, the greater your monetary reward will be."

      Comment 13:

      - The EEG regressors are not fully explained. For example, an "active learning" regressor is listed as one of the 4 at the beginning of section 3.3, but it is the first mention of this term in the paper and the term does not arise once in the methods.

      Response 13: We have accordingly revised the relevant content in the main text (as in Eq.8). Our regressors now include expected free energy, the value of reducing ambiguity, the value of avoiding risk, extrinsic value, prediction error, (the degree of) ambiguity, reducing ambiguity, and avoiding risk.

      Comment 14:

      - In general, it is not clear how one can know that the EEG results reflect that the brain is purposefully encoding these very parameters while implementing this very mechanism, and not other, possibly simpler, factors that correlate with them since there is no engagement with such potential confounds or alternative models. For example, a model-free reinforcement learning model is fit to behaviour for comparison. Why not the EEG?

      Response 14: We deeply thank you for your comments. Due to constraints on time and effort, and because the active inference model best fits the behavioral data of the participants, we did not use other models to analyze the EEG data. At both the sensor and source levels, we observed EEG signals and brain regions that can encode different types of uncertainty (risk and ambiguity). The brain's uncertainty-driven exploration mechanism cannot be explained solely by a simple model-free reinforcement learning approach.

      Recommendations for the authors:

      Response: We have made point-to-point revisions according to the reviewer's recommendations, and as these revisions are relatively minor, we have only responded to the longer recommendations here.

      Reviewer #1 (Recommendations For The Authors)

      I enjoyed reading this sophisticated study of decision-making. I thought your implementation of active inference and the subsequent fitting to choice behaviour - and study of the neuronal (EEG) correlates - was impressive. As noted in my comments on strengths and weaknesses, some parts of your manuscript were difficult to read because of slight collapses in grammar and an inconsistent use of terms when referring to the mathematical quantities. In addition to the paragraphs I have suggested, I would recommend the following minor revisions to your text. In addition, you will have to fill in some of the details that were missing from the current version of the manuscript. For example:

      Recommendation 1:

      Which RL model did you use to fit the behavioural data? What were its free parameters?

      Response 1: We have now added information related to the comparison models in the behavioral results and supplementary materials. We applied both simple model-free reinforcement learning and model-based reinforcement learning. The free parameters for the model-free reinforcement learning model are the learning rate α and the temperature parameter γ, while the free parameters for the model-based approach are the learning rate α, the temperature parameter γ, and the prior.
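
      To illustrate how a learning rate α and a temperature γ typically enter a model-free reinforcement learning model, the sketch below uses a delta-rule value update and a softmax choice rule. The exact equations used in the study are in Eq.S1-11 of the Supplementary Method and are not reproduced in this response, so the functional forms, the initial values, and all names here are assumptions.

```python
import math

def softmax(values, temperature):
    """Choice probabilities; a higher temperature gives more random choices."""
    m = max(values)
    # Subtract the maximum before exponentiating for numerical stability.
    exps = [math.exp((v - m) / temperature) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def update_value(q, reward, alpha):
    """Delta-rule update: move the value estimate toward the reward by a fraction alpha."""
    return q + alpha * (reward - q)

# Hypothetical two-option example: index 0 = safe path, index 1 = risky path.
q_values = [0.0, 0.0]  # assumed initial value estimates
q_values[1] = update_value(q_values[1], reward=12, alpha=0.5)
probs = softmax(q_values, temperature=2.0)
```

      In this form, α controls how quickly value estimates track observed rewards, and γ controls how deterministically the higher-valued option is chosen.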

      Recommendation 2:

      When you talk about neuronal activity in the final analyses (of time-dependent correlations) what was used to measure the neuronal activity? Was this global power over frequencies? Was it at a particular frequency band? Was it the maximum amplitude within some small window et cetera? In other words, you need to provide the details of your analysis that would enable somebody to reproduce your study at a certain level of detail.

      Response 2: In the final analyses, we used the activity amplitude at each point in the source space for our analysis. Previously, we had planned to make our data and models available on GitHub to facilitate easier replication of our work.

      Reviewer #3 (Recommendations For The Authors)

      Recommendation 1:

      It might help to explain the complex concepts up front, to use the concrete example of the task itself - presumably, it was designed so that the crucial elements of the active inference framework come to the fore. One could use hypothetical choice patterns in this task to exemplify different factors such as expected free energy and unexpected uncertainty at work. It would also be illuminating to explain why behaviour on this task is fit better by the active inference model than a model-free reinforcement learning model.

      Response 1: Thank you for your suggestions. We have given clearer explanations to the three terms in the active inference formula: the value of reducing ambiguity, the value of avoiding risk, and the extrinsic value (Eq.8), which makes it easier for readers to understand active inference.

      In addition, we can simply view active inference as a computational model similar to model-based reinforcement learning, where the expected free energy represents a subjective value, without needing to understand its underlying computational principles or neurobiological background. In our discussion, we have argued why the active inference model fits the participants' behavior better than our reinforcement learning model, as the active inference model has an inherent exploration mechanism that is consistent with humans, who instinctively want to reduce environmental uncertainty (line 435-442).

      “Active inference offers a superior exploration mechanism compared with basic model-free reinforcement learning (Figure 4(c)). Since traditional reinforcement learning models determine their policies solely on the state, this setting leads to difficulty in extracting temporal information (Laskin et al., 2020) and increases the likelihood of entrapment within local minima. In contrast, the policies in active inference are determined by both time and state. This dependence on time (Wang et al., 2016) enables policies to adapt efficiently, such as emphasizing exploration in the initial stages and exploitation later on. Moreover, this mechanism prompts more exploratory behavior in instances of state ambiguity. A further advantage of active inference lies in its adaptability to different task environments (Friston et al., 2017). It can configure different generative models to address distinct tasks, and compute varied forms of free energy and expected free energy.”

      Laskin, M., Lee, K., Stooke, A., Pinto, L., Abbeel, P., & Srinivas, A. (2020). Reinforcement learning with augmented data. Advances in neural information processing systems, 33, 19884-19895.

      Wang, J. X., Kurth-Nelson, Z., Tirumala, D., Soyer, H., Leibo, J. Z., Munos, R., ... & Botvinick, M. (2016). Learning to reinforcement learn. arXiv preprint arXiv:1611.05763.

      Friston, K., FitzGerald, T., Rigoli, F., Schwartenbeck, P., & Pezzulo, G. (2017). Active inference: a process theory. Neural computation, 29(1), 1-49.

      Recommendation 2:

      Figure 1A provides a key example of the lack of effort to help the reader understand. It suggests the possibility of a concrete example but falls short of providing one. From the caption and text, applied to the figure, I gather that by choosing either to run or to raise one's arms, one can control whether it is daytime or nighttime. This is clearly wrong but it is what I am led to think by the paper.

      Response 2: Thank you for your suggestion, which we had not considered before. In this figure, we aim to illustrate that "the agent receives observations and optimizes his cognitive model by minimizing variational free energy → the agent makes the optimal action by minimizing expected free energy → the action changes the environment → the environment generates new observations for the agent." We have now modified the image to be simpler to prevent any possible confusion for readers. Correspondingly, we removed the figure of a person raising their hand and the shadowed house in Figure a.

      Author response image 9.

      Recommendation 3:

      I recommend an overhaul in the labelling and methodological explanations for consistency and full reporting. For example, line 73 says sensory input is 's' and the cognitive model is 'q(s),' and the cause of the sensory input is 'p(s|o)' but on the very next line, the cognitive model is 'p(s|o)' and the causes of sensory input are 'q(s).' How this sensory input s relates to 'observations' or 'o' is unclear, and meanwhile, capital S is the set of environmental states. P seems to refer to the generative distribution, but it also means probability.

      Response 3: Thank you for your advice. Now we have revised the corresponding labeling and methodological explanations in our work to make them consistent. However, we are not sure how to make a good modification to P here. In many works, P can refer to a certain probability distribution or some specific probabilities.

      Recommendation 4:

      Even the conception of a "policy" is unclear (Figure 2B). They list 4 possible policies, which are simply the 4 possible sequences of steps, stay-safe, cue-risky, etc, but with no contingencies in them. Surely a complete policy that lists 'cue' as the first step would entail a specification of how they would choose the safe or risky option BASED on the information in that cue.

      Response 4: Thank you for your suggestion. In active inference, a policy actually corresponds to a sequence of actions. The policy of "first choosing 'Cue' and then making the next decision based on specific information" differs from the meaning of policy in active inference.
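
      The distinction drawn in this exchange can be sketched as follows. The option labels follow the reviewer's wording ("stay-safe", "cue-risky"); the tuple representation, the function names, and the cue outcome labels are hypothetical and only serve to contrast a fixed action sequence with a cue-contingent rule.

```python
from itertools import product

# Labels taken from the reviewer's description of Figure 2B; the data layout
# is an assumption made for illustration.
FIRST_STEP = ("stay", "cue")
SECOND_STEP = ("safe", "risky")


def enumerate_policies():
    """Return every fixed two-step action sequence (one policy per sequence)."""
    return list(product(FIRST_STEP, SECOND_STEP))


def contingent_rule(cue_outcome):
    """A cue-dependent rule of the kind the reviewer describes.

    The outcome labels are hypothetical. This maps an observation to an
    action, so it is not one of the fixed sequences enumerated above.
    """
    return "risky" if cue_outcome == "favourable" else "safe"


policies = enumerate_policies()
print(policies)
```

      Under this reading, the four policies are exactly the four sequences, and any dependence on the cue has to arise from re-evaluating those sequences after the cue is observed rather than from a branch stored inside a single policy.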

      Recommendation 5:

      I assume that the heavy high pass filtering of the EEG (1 Hz) is to avoid having to baseline-correct the epochs (of which there is no mention), but the authors should directly acknowledge that this eradicates any component of decision formation that may evolve in any way gradually within or across the stages of the trial. To take an extreme example, as Figure 3E shows, the expected rewards for the risky path evolve slowly over the course of 60 trials. The filter would eliminate this.

      Response 5: Thank you for your suggestion. The heavy high pass filtering of the EEG (1 Hz) is to minimize the noise in the EEG data as much as possible.
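
      The reviewer's concern about slow components can be illustrated with a short sketch. It uses an idealized FFT-based high-pass filter in NumPy, which is an assumption made for clarity and not the filter applied in the study; the sampling rate and the 0.1 Hz and 10 Hz test components are likewise hypothetical.

```python
import numpy as np


def ideal_highpass(signal, sfreq, cutoff_hz):
    """Zero every frequency bin below cutoff_hz (idealized brick-wall filter)."""
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(signal.size, d=1.0 / sfreq)
    spectrum[freqs < cutoff_hz] = 0.0
    return np.fft.irfft(spectrum, n=signal.size)


sfreq = 100.0                       # hypothetical sampling rate in Hz
t = np.arange(2000) / sfreq         # 20 s of samples
slow = np.sin(2 * np.pi * 0.1 * t)  # slowly evolving component (0.1 Hz)
fast = 0.5 * np.sin(2 * np.pi * 10.0 * t)  # faster component (10 Hz)

filtered = ideal_highpass(slow + fast, sfreq, cutoff_hz=1.0)

# After filtering, only the fast component should remain.
print(np.max(np.abs(filtered - fast)))
```

      Any activity that evolves more slowly than the cutoff, within a trial or across trials, is removed in the same way, which is the trade-off the reviewer asks the authors to acknowledge.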

      Recommendation 6:

      There is no mention of the regression itself in the Methods section - the section is incomplete.

      Response 6: Thank you for your suggestion. We have now added the relevant content in the Results section (EEG results at source level, line 337-340):

      “The linear regression was run by the "mne.stats.linear_regression" function in the MNE package (Activity ∼ Regressor + Intercept, where Activity is the activity amplitude of the EEG signal in the source space and Regressor is one of the regressors that we mentioned).”
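
      The model Activity ∼ Regressor + Intercept can be sketched as an ordinary least-squares fit performed independently at each source point. The sketch below is a plain NumPy stand-in and not the MNE call used by the authors; the function name, array shapes, and synthetic data are assumptions.

```python
import numpy as np


def fit_activity_regression(activity, regressor):
    """Fit activity ~ regressor + intercept separately for each source point.

    activity:  array of shape (n_trials, n_sources)
    regressor: array of shape (n_trials,)
    Returns (slopes, intercepts), each of shape (n_sources,).
    """
    activity = np.asarray(activity, dtype=float)
    regressor = np.asarray(regressor, dtype=float)
    # Design matrix with one regressor column and one intercept column.
    design = np.column_stack([regressor, np.ones_like(regressor)])
    coefs, *_ = np.linalg.lstsq(design, activity, rcond=None)
    return coefs[0], coefs[1]


# Synthetic example: two source points with known slopes and intercepts.
x = np.arange(10.0)
y = np.outer(x, [2.0, -1.0]) + np.array([0.5, 3.0])
slopes, intercepts = fit_activity_regression(y, x)
print(slopes, intercepts)
```

      The slope at each source point is the quantity that would then be tested for a relationship between the regressor and the source-space amplitude.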

      Recommendation 7:

      On Lines 260-270 the same results are given twice.

      Response 7: Thank you for your suggestion. We have now deleted redundant content.

      Recommendation 8:

      Frequency bands are displayed in Figure 5 but there is no mention of those in the Methods. In Figure 5b Theta in the 2nd half is compared to Delta in the 1st half- is this an error?

      Response 8: Thank you for your suggestion. It indeed was an error (they should all be Theta) and now we have corrected it.

      Author response image 10.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer 1

      Major points:

      R1C1: I appreciate that the data are aligned, in some points, with related studies of this niche. However, it would help the reader to have this alignment explored more extensively in the Discussion as well.

      Answer: We acknowledge that the discussion would benefit from additional comparisons to the available datasets. We have thus added the following comment after the first paragraph of the discussion: “Previous studies of the different sub-populations of SVZ progenitors were carried out using transcriptomic approaches based on the expression of various more or less specific markers. These approaches have made it possible to identify quiescent and activated neural stem cells as well as mature neuroblasts, but have been faced with the strong influence of the cell cycle on cell clustering. Indeed, cycling neural progenitors in these studies have been gathered in either “mitotic” clusters (Llorens et al. 2015, Zywitza et al. 2018, Cebrian et al. 2021) or “neural progenitor cells” clusters (Dulken et al. 2017) that had no clear biological significance, hindering the identification of subtypes of SVZ cycling progenitors. Our study, combining, for the first time, characterization of FACS-isolated cells and an irradiation-based model of sequential regeneration, made it possible to clearly distinguish the molecular profiles of TAP and iNB among cycling progenitors, reflecting differences in their respective in vitro and in vivo potentials”.

      R1C2: The data on multilineage differentiation, both in culture and upon engraftment, would be greatly strengthened by quantification. What is the relative yield of TUJ1/DCX-positive cells versus the other marker combinations? Specifically regarding the multilineage differentiation in vitro - because different media conditions are used to generate each lineage, it may be difficult to determine relative yield. Could a differentiation system that allows production of all 3 lineages be used instead?

      If the fraction of non-DCX/TUJ1-labeled progeny is low, particularly in vivo, this might suggest that while multilineage differentiation is possible, it is a much less likely cellular state outcome than production of mature neuroblasts. Some suggested references with examples of the culture conditions, experimental conditions, and discussions highlighted in the public review: Culture conditions that allow simultaneous trilineage differentiation. PMID: 17615304 Influence of culture conditions on potency: similar to issues covered in PMID: 21549325.

      Answer: We agree with the reviewer that quantification of multilineage differentiation in vitro would improve the characterization of the relative potencies of the different SVZ progenitors.

      According to PMID: 17615304 and PMID: 21549325, and in agreement with our own experience, the only culture condition that allows neurosphere-derived neural progenitors to differentiate in vitro into the three lineages is the removal of mitogens from the culture medium. However, this does not work on freshly isolated SVZ cells, which remain in an undifferentiated state in this condition.

      This is why we chose to use specific differentiation media for each of the 3 lineages, as in Figure 1C. It is also for this reason that we performed as many experiments as possible in vivo rather than in vitro, as in Figure S2. In the new version, we have added a quantitative analysis of staining with antibodies against GFAP, CNPase or DCX in GFP-positive cells persisting at the IS, where high numbers of grafted cells were found (Figure S2B). This was performed using the NIS software to measure eGFP-, GFAP-, CNPase- and DCX-positive areas. The intersection between each marker and eGFP areas was then determined as a percentage of staining (Figure S2C). The results showed that approximately one third of GFP+ cells expressed GFAP or DCX. The quantitative analysis of CNPase expression was complicated by CNPase-positive host cells, but the stronger CNPase staining in eGFP-positive areas clearly revealed the expression of CNPase by a significant proportion of eGFP-positive cells.
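
      The area-intersection measure described above can be sketched with boolean masks. This is not the NIS software procedure; the function name is hypothetical, and the choice of the eGFP-positive area as the denominator is an assumption, since the response does not state which area the percentage refers to.

```python
import numpy as np


def percent_overlap(marker_mask, egfp_mask):
    """Percentage of the eGFP-positive area that is also marker-positive.

    Both inputs are boolean images of the same shape. Using the eGFP area as
    the denominator is an assumed convention.
    """
    marker_mask = np.asarray(marker_mask, dtype=bool)
    egfp_mask = np.asarray(egfp_mask, dtype=bool)
    egfp_area = egfp_mask.sum()
    if egfp_area == 0:
        return float("nan")  # no grafted cells in this field
    shared_area = np.logical_and(marker_mask, egfp_mask).sum()
    return 100.0 * shared_area / egfp_area


# Toy 2x2 field: three eGFP-positive pixels, one of which is marker-positive.
egfp = [[1, 1], [1, 0]]
marker = [[1, 0], [0, 1]]
overlap = percent_overlap(marker, egfp)
print(overlap)
```

      Marker-positive pixels outside the eGFP mask, such as host cells, do not contribute to the numerator under this convention.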

      R1C3: Additionally, for claims similar to what is currently made in the text, it would be extremely valuable to confirm the purity of the sort for each population - for example by fixing and staining the sorted fraction with additional antibodies that confirm cell identity.

      Answer: We have previously shown in Daynac et al. 2013 that s-iNB expressed the neuroblast markers CD24 and DCX, but also markers of neural progenitors such as Mash1, a basic helix-loop-helix transcription factor. As suggested by the reviewer, we have further investigated the expression of other markers of neural progenitors by sorted cells. The results showed that the proportion of cells positive for DLX2, a marker of proliferating progenitors (Doetsch et al. 2002), was very high in aNSC/TAP (98%) and progressively decreased in iNB (82%) and mNB (25%). Similarly, the transcription factor SOX2, which plays an essential role in the maintenance of neural progenitors (PMID: 25126380), was expressed by 78% of aNSC/TAP, 70% of iNB and 17% of mNB.

      Altogether, these new data confirmed the identity of the different cell populations and particularly that of iNB. They are discussed at the beginning of the Results and shown in Figure S1.

      R1C4: Line 125: GFAP alone doesn't necessarily indicate a "conversion to NSCs" - this conclusion could be greatly strengthened by inclusion of more markers, particularly at the protein level, or cyto-architectural studies.

      Answer: We agree with the reviewer that GFAP expression alone is not sufficient to evidence the presence of NSC in the SVZ. We have thus modified the text accordingly: “Importantly, eGFP+ cells were present in the SVZ of all the animals transplanted with eGFP+s-iNB and eGFP+s-NSC/TAP (Fig. 1Db, Fig. 1Dc), some of them expressing GFAP indicating the generation of astrocytes, and therefore possibly NSC”.

      R1C5: Could these cellular states be reflective of preferential translation of DCX? It would be very helpful to see the flow cytometry sort data for iNBs / mNBs used in Figure 6, particularly if these cells were also fixed and stained directly for DCX protein.

      Answer: As suggested by the reviewer, freshly FACS-sorted iNB and mNB were fixed and labelled with an anti-DCX monoclonal antibody after permeabilization. As shown in the figure below, we found a higher level of DCX expression in mNB than in iNB. Therefore, this result tends to indicate that the proliferation capacity is somehow related to the level of DCX expression. However, because of the relatively low importance of this result, we decided not to include it in the manuscript.

      Author response image 1.

      Modal histogram representation of DCX expression level in unstained, iNB and mNB cells determined by flow cytometry (FlowJo).

      R1C6: Figure S8 is all zeroes, showing the GFP+Dcxhigh NBs do not retain proliferative capacity. But we don't get a direct experimental comparison to EGFPnegative/lowDcxlow iNB engraftment, which would strengthen the conclusions of the paper.

      Answer: Unfortunately, there is no method available to analyse the engraftment of eGFPnegative/lowDcxlow iNB: by definition, these cells do not express eGFP, and the use of a tracker is not appropriate for long periods of time (and thus a high number of cell divisions) after engraftment. However, in our view, this control is not needed to conclude that GFP+Dcxhigh iNB have no (or at least a lower) stem cell potential in vivo, considering that we have shown in Figure 1 and Table 1 that the whole iNB population is able to generate the different types of neural cells.

      R1C7: Transplant data in Table 1 - a relatively small proportion of transplant derived cells are in OB, etc. Given that A cells are thought to cycle at least once in vivo, is this expected?

      Answer: The reviewer is right that a relatively small proportion of transplant-derived cells were found in the OB. However, we should consider that we used immunocompetent mice as recipients, which could have significantly reduced the engraftment efficiency and the migration of engrafted cells outside the injection site.

      R1C8: A caveat is that there is not much functional testing of the proposed model, especially for the interconversion of iNB states suggested by the diagram in Figure 7. The text is relatively restrained in proposing this model, so it is reasonable to keep - but perhaps should be noted that this part of the model will need additional testing.

      Answer: Data presented in Figure 6 clearly suggest that Dcxhigh iNB have an in vitro potential similar to that of Dcxlow iNB, whereas they do not have such potential in vivo (Figure S10). This suggests that, provided they are in appropriate conditions, Dcxhigh iNB could reacquire stem/progenitor properties. However, we agree that this hypothesis requires further investigation. Therefore, as suggested by the reviewer, we have added in the Figure 7 legend: “Possible interconversion of iNB states would require further experimental confirmation.”

      Additional minor points:

      R1C9: Introduction: the SVZ is described as "the lateral wall" - however, several works in the mouse have also examined the medial wall and callosal roof, as cited later in the intro. Suggest rephrasing the second sentence (line 48) and later sentence (line 66) to clarify that "the SVZ" encompasses all of these subregions, they are not necessarily separate niches.

      Answer: As indicated by the reviewer, the SVZ encompasses distinct subdomains, with NSCs having a regional identity based on their location in the lateral or septal wall of the ventricle and generating different types of neuronal and glial progeny (PMID: 34259628). To address the reviewer's concern about possible confusion and clearly indicate that the SVZ encompasses several subdomains, we have modified the sentence at line 66 as follows: “Since then, single cell RNA-sequencing has revolutionized the field and has made it possible to precisely elucidate the transcriptome of SVZ cells present in the LW and in the septal wall which also harbors NSC niches”.

      However, we did not modify line 48, since in this sentence we just indicate that the largest neurogenic niche in the adult brain resides in the LW of the SVZ.

      R1C10: Line 77: "exposure" not "exposition"

      Answer: The error has been corrected in the revised manuscript.

      R1C11: As noted in the Public Review - the use of the term "D1/D2" cells seems likely to confuse readers who are also versed in dentate gyrus neurogenesis. Recommend removing this term from the manuscript.

      Answer: We agree that the D1/D2 terminology could bring confusion, D cells referring to Tanycytes in the hypothalamus. We now refer to iNB1 for DcxLow iNB and iNB2 for DcxHigh iNB in the revised manuscript.

      Reviewer 2

      Major comments:

      Lack of rigor

      R2C1: There is a lack of appropriate normalization controls for the microarray data. As there is a decreased level of transcription in quiescent NSCs, there needs to be a cell number control (spike-ins based on cell numbers). Without this normalization, the readout can be greatly skewed.

      Answer: We agree that qNSC are marked by a decreased level of transcription due to quiescence. To overcome this problem in the Clariom assays, we chose to calibrate each population with a fixed amount of cRNA and cDNA, using HeLa cells as an internal control. We fully agree that this method is not optimal, but it appears to be effective. Indeed, it has been adopted, with the same rigor, in other microarray studies published in the field (PMID: 24811379) and in studies of skeletal muscle cells (PMID: 29273087). Moreover, the transcriptomic signature of qNSC matches perfectly with those from other studies, particularly those of related clusters in single-cell experiments (including ours, Figure S5). This is probably because the main characteristic of these cells, more than their number, is the lack of expression of genes involved in cell proliferation and metabolism. In any case, these data, which confirm previously published results, are not the main finding of our manuscript, which is mainly dedicated to the characterization of proliferating cells, and this characterization is not impaired by our choice of normalization.

      R2C2: The absolute segregation of clusters in the single-cell analysis is currently entirely in agreement with the cell cycle stage. This suggests that in the author's analysis, the clustering in 3F is entirely shaped by the cell cycle, making that the defining characteristic of the author's definitions for their cell types. Has an analysis been done that regresses out cell cycle-associated genes to see if there are clusters for different cell states/types that are identified in the absence of cell cycle stage being the defining factor? (Barron and Li, 2016). For example, just as you would see a difference in cluster if you are a quiescent or activated NSC as compared to a neuroblast for example, even without the contribution of cell cycle. These are different cell types.

      Answer: We agree that cell cycle regression would theoretically allow for further discrimination between cycling cells along successive neurogenic stages. We have already performed regression using several methods, including S- and G2/M-score regression as indicated in the Seurat workflow, removal of cell cycle-related PCs from the UMAP calculation as used in the Cebrian-Sylla study, and alternative gene sets such as the ones provided by the tricycle method (PMID: 35101061). These regression methods have all been applied to our datasets, to the original Cebrian-Sylla datasets, and to a combination of both to increase cell number and clustering resolution. However, none of these methods modified the clustering of cycling cells.
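
      The score-based regression mentioned above can be sketched as a per-gene linear model whose residuals replace the original expression values. The sketch below is a NumPy illustration of that general idea and not the Seurat or tricycle code run by the authors; the function name, array shapes, and synthetic scores are assumptions.

```python
import numpy as np


def regress_out(expression, covariates):
    """Remove the linear effect of covariates from each gene.

    expression: array of shape (n_cells, n_genes)
    covariates: array of shape (n_cells, n_covariates), for example
                hypothetical S and G2/M scores
    Returns the residuals, with the same shape as expression.
    """
    expression = np.asarray(expression, dtype=float)
    covariates = np.asarray(covariates, dtype=float)
    # Intercept column plus one column per covariate.
    design = np.column_stack([np.ones(covariates.shape[0]), covariates])
    coefs, *_ = np.linalg.lstsq(design, expression, rcond=None)
    return expression - design @ coefs


# Synthetic check: expression fully explained by the two scores.
rng = np.random.default_rng(0)
scores = rng.normal(size=(50, 2))
expr = scores @ np.array([[1.0, 0.0], [0.5, 2.0]]) + 3.0
residuals = regress_out(expr, scores)
print(np.abs(residuals).max())
```

      Clustering on such residuals only separates cycling cells if cell-type differences remain after the linear cell-cycle component is removed, which the authors report was not the case in their data.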

      In fact, the strong influence of the cell cycle over clustering highlights the relevance of our depletion/replenishment approaches to decipher the molecular changes masked by the cell cycle, as discussed below.

      R2C3: The use of the DCX-CreERT2 line is a lineage tracing line. Once DCX is expressed, Cre recombines the DNA to allow for fluorescence. It is binary, on or off associated with DCX expression. And once on, it is always on, whether the cell is currently expressing DCX or not. As the authors had previously described a DCXlow condition, the eGFP- cells would not reflect DCXlow, but no DCX at all. And the eGFP+ cells may not be currently expressing DCX anymore. The authors should have used a system where the DCX promoter itself drives fluorescence.

      Answer: We took advantage of the DCX-CreERT2 line to demonstrate that some neural cells that have recently acquired DCX expression (i.e. eGFP+ iNB) could keep (or recover) the potential of neural progenitors in vitro. Of course, some of these GFP+ cells could have stopped expressing DCX. This is probably the case when they differentiate into astrocytes and oligodendrocytes in vitro, as shown in Figure 6.

      In any case, the use of the Dcx promoter as a direct driver of eGFP fluorescence would have prevented us from demonstrating such changes in cell fate in vivo, since oligodendrocytes or astrocytes derived from iNB lose Dcx expression and could therefore not be tracked.

      R2C4: The lack of analysis of images (differentiation, for example) limits the conclusions of the in-vitro data, and the images with unclear staining, limit the conclusions of the in-vivo experiments.

      Answer: This comment is similar to that of R1C2. We have now added a quantification in Figure S2.

      R2C5: The cited difference in splicing differences in cell types was interesting (though did not show up in the transcriptome enrichment analyses Fig S2) and would be something to further pursue, however, this was a very limited analysis. There was no further study of these splicing mediators beyond single-cell data.

      Answer: We now show enrichments of GO terms corresponding to mRNA splicing isoforms in the different types of sorted SVZ cells (Figure S4). This analysis clearly revealed that spliced genes in SVZ cells are mainly involved in neuron development and neurogenesis. Interestingly, it also showed that qNSC differed from the other cell types in the splicing of genes involved in mitosis and cell cycle, consistent with their quiescent state. More importantly, GO annotations of differentially spliced isoforms further confirmed that s-TAP and s-iNB have distinct features. We agree with the reviewer that further analysis of splicing mediators would be very important for understanding molecular changes involved in neurogenesis. However, we think that it is largely beyond the scope of this study.

      R2C6: Fig 1C - Show values, not just pictures. You may need to shift your current differentiation paradigm to do so by removing growth factors instead of unique differentiation conditions.

      Answer: See the answer to R1C2.

      R2C7: Fig S1A - Stainings for GFAP and DCX are not clear. It is very hard to distinguish which cells are associated with these signals.

      Answer: This figure (now Figure S2A) shows an eGFP+iNB cell (white arrow) that has reached the rostral migratory stream and expressed DCX (inset a3), but not GFAP (inset a2). This is now indicated in the figure legend. We have also moved the arrow for more clarity.

      R2C8: Fig S1B2 - There is red staining everywhere, so it is very hard to see a specific CNPase signal.

      Answer: We have added a new figure (Fig S2B) distinguishing eGFP+CNPase+ cells (yellow arrows) from eGFP+CNPase- cells (white arrow).

      R2C9: Line 174 - It's the mRNA that you are detecting is being downregulated - be more specific as you are not showing protein downregulation.

      Answer: We added "encoding" to the text at line 174 to make clear that we refer to the mRNA: “Interestingly, Ptbp1, encoding a major splicing repressor”.

      R2C10: Line 189 - text in this line have some clusters not shown in the figure - (clusters 6 and 15, DCX+ Ki67+ neuroblasts) - which would be an important thing to visualize. As is shown now, the authors are only showing that iNBs are similar to mitotic TAPs.

      Answer: Clusters 6 and 15 have been added to Figure S5.

      R2C11: Fig 3D-E - Why is cluster 17 called aNSCs (3E) when it has the highest GFAP (Fig 3D). Typically, the highest GFAP cells are qNSCs or astrocytes, not aNSCs.

      Answer: We previously reported that the level of gfap mRNA expression in neural stem cells (quiescent and activated) did not exactly reflect the amount of protein in these cells. This is the reason why we also used the Slc1a3 marker (Glast), which is highly expressed both at the RNA and protein levels in quiescent NSCs (Daynac et al. 2013).

      R2C12: Line 216 - You said in line 216 cluster 13 were astrocytes, then you said in line 227 that cluster 13 was s-qNSC. Which is it?

      Answer: This is due to the fact that we performed two distinct analyses.

      In the first one (line 216), cells were scored based on datasets provided by Cebrian et al. with one dataset containing genes enriched in astrocytes, and another one, genes enriched in quiescent B-cells. Therefore, cluster 13 was shown to contain 73% cells expressing astrocyte markers, whereas cluster 4 gathered cells expressing both qNSC (B-cells, 48%) and astrocyte (52%) genes.

      In the second one (line 227), cells were scored using our transcriptomic signatures of FAC-sorted SVZ cells, which do not include differentiated astrocytes. We demonstrated that the cluster 13 cells only expressed s-qNSC genes.

      R2C13: Line 214 - While other clusters were all named in lines 214-221 that were then further discussed in lines 227-230, clusters 15 and 19 were not. You associate both of those clusters with s-iNB - what was it associated with in the above section?

      Answer: Lines 219-221 have been reworded as follows: Clusters 10, 5, 15, 12, and 8 were defined as cycling progenitors based on the expression of proliferative markers such as Top2a, Mki67 and Ascl1. Clusters 1, 3, 7 and 9 were identified as mNB due to the loss of Mki67, Top2a and Ascl1 expression and the expression of Robo2 and Dcx. Cluster 19, which has lost Ascl1 but still expresses Top2a and Mki67 together with Robo2 and Dcx, appears to be at the transition between iNB and mNB.

      R2C14: Fig 3I-J - 5 days after irradiation, I would like to see from tissue slices how many cells are dividing compared to 1 day post-irradiation and controls. In other paradigms, such as temozolomide experiments (Kalamakis et al), by 5 days we should see fewer cells in quiescence and more of those quiescent cells exiting quiescence into the cell cycle. Why would there be more cells in quiescence in the irradiated brain? Even if they are radiation resistant, the base number should be comparable between controls and irradiated, which is not what you show in Fig 3I-J.

      Line 234-235 - the text says normalized to numbers of qNSCs which is supposed to be the same (which I agree should be the same). However, your graph in 3I and J shows more qNSCs in irradiated conditions, which would influence greatly and is currently hard to interpret.

      Answer: As stated by the reviewer, there is no increase in the absolute number of quiescent cells in the irradiated SVZ. The reconstitution of SVZ cell populations after 4 Gy irradiation has already been studied by our group (Daynac et al. 2013, see Fig. 3F), showing that s-iNB and s-mNB are still under-represented after 5 days, while qNSC are in similar numbers as in unirradiated SVZ. Therefore, this led to an over-representation of quiescent cells and early SVZ progenitors in Figure 3J as compared to Figure 3I.

      R2C15: Fig 6A - the authors show a significant difference in neurospheres between eGFP- (DCX-) and eGFP+ (DCX+) iNBs - as would be expected as DCX suggests a further commitment towards neurogenic fates, yet your population doubling is the same.

      Answer: To determine the population doublings, the medium was changed and cells were counted every 7 days. This condition masked the differences between two cell populations reaching the plateau phase at different times, explaining why eGFP-iNB and eGFP+iNB could not be clearly distinguished by this technique.
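
      The masking effect described in this answer can be illustrated with a toy calculation. The formula log2(final count / initial count) is the usual definition of population doublings and is assumed here, as the response does not state it; the seeding density, plateau, and doubling times are entirely hypothetical.

```python
import math


def population_doublings(n_start, n_end):
    """Number of doublings between two cell counts (assumed standard formula)."""
    return math.log2(n_end / n_start)


def count_at(day, doubling_time, n_start=1000.0, plateau=64000.0):
    """Exponential growth capped at a plateau; all parameters are hypothetical."""
    return min(plateau, n_start * 2.0 ** (day / doubling_time))


# Two populations with different growth rates, both counted on day 7.
fast_day7 = count_at(7, doubling_time=0.5)
slow_day7 = count_at(7, doubling_time=1.0)
print(population_doublings(1000.0, fast_day7),
      population_doublings(1000.0, slow_day7))

# An earlier count (day 3) would have separated them.
print(population_doublings(1000.0, count_at(3, 0.5)),
      population_doublings(1000.0, count_at(3, 1.0)))
```

      In this toy case both populations have reached the plateau by day 7 and therefore yield the same number of doublings, whereas a count taken before the slower population plateaus would reveal the difference.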

      R2C16: Fig 6C - Differentiation data (in-vitro) should be quantified in 6C, just as was mentioned for 1C. These values should be done for both of the populations (eGFP-iNB, and eGFP+iNB) and not just compared to the previous pictures which were on total iNB. Again, numbers are required, not just picture examples.

      Answer: Quantitative data have been given in Figure 6D, showing that approximately 60-80% of eGFP+iNB cells are able to differentiate into either neurons, oligodendrocytes or astrocytes. We did not analyze the differentiation of eGFP-iNB since it would not add any supplementary information.

      R2C17: Fig S8 - The authors did not show if the lack of engraftment of eGFP+ cells is due to the transplant (previously you showed only 2/3 worked in a similar paradigm). It would be helpful if the authors would have some means to visualize the DCX low cells to confirm they worked as before in the transplantation (another color? Another type of mouse (Thy1 antigen differences)?)

      Answer: Unfortunately, the Thy1 antigen has not been documented in mouse subventricular zone progenitors, but only in neurons (PMID: 10813783). Thy1 antigen has also been described in bipotent glial progenitor cells (GCP) from the developing human brain that give rise to oligodendrocytes (PMID: 36931245).

      As shown in Figure S10, we performed 5 grafts with s-iNB eGFP+ cells, 2 alone and 3 mixed with eGFP- cells, and never found any eGFP+ cells 5 weeks after grafting. Moreover, we did not find any eGFP+ cells in the brains of 3 other animals 2 weeks after grafting with s-iNB eGFP+ cells (these data have been added to Figure S10). Compared to the results described in Figure 1, this clearly shows that, like mNB, iNB DCXhigh are not able to generate persistent cells in the grafted brains.

      R2C18: Fig S8 - Why were there no eGFP cells even at the injection site? DCX expression promotes migration, indeed DCX expression becomes very high in cells in the SVZ as they begin to exit to go to the migratory stream. If one didn't see migration, one would expect you would still have survival. Currently, the authors show no cells at 5 weeks, however, they would need to show earlier timepoints as well to determine what is happening with these cells. It is possible these GFP+ cells are not even expressing DCX anymore (see above).

      Answer: As stated above, we did not find any GFP+ cells in the brains of 3 other animals 2 weeks after grafting with s-iNB eGFP+ cells (see Figure S10).

      R2C19: Line 320 - the authors suggest a subpopulation of NEURONS continues to divide and cite 2 works from the 1990s showing proliferating SVZ cells can differentiate. Our knowledge of this system has come dramatically forward since the 1990s as well as technologically, and to date, neurons have not been shown to divide.

      Answer: We apologize for this lack of clarity. We agree that neurons are differentiated, non-cycling cells; we had simply used the terminology of the cited articles. The incorrect part of the sentence at Line 320 has therefore been deleted from the text.

      R2C20: Fig 7 - The whole figure is based on changing levels of RSR genes which were not confirmed in any way to be involved in any of these stages, only descriptively in single-cell analyses.

      Answer: As stated above, in our opinion, further characterization of the involvement of RSR genes in neurogenesis is largely beyond the scope of our manuscript. Nevertheless, we think that the role of RSR genes in neurogenesis is an important question that should be addressed in further studies.

      Overstatement of findings

      R2C21: Fig 1 - Authors did not compare all cell types in each condition but made overstatements about their relationships to each other between graphs. There should also be separate graphs showing all cell types at 4% and a separate one at 20%.

      Answer: In the revised version, Figure 1 shows one graph comparing all cell types at 4% O2 and a separate one at 20% O2, as requested by the reviewer. The graphs clearly show that 4% O2 promotes iNB proliferation compared to the 20% condition.

      R2C22: Fig 1D-b2 - Why does DCX look nuclear? One can't say they are only NSCs if they are GFAP as astrocytes also express GFAP. The authors would need another marker to separate those populations. In the text, the authors say expressing GFAP (line 124) which means NSC, but then in line 127 expressing GFAP means astrocytes - which further shows you need additional markers to validate those 2 different cell types.

      Answer: DCX nuclear translocation has been shown to improve cellular proliferation (PMID: 32050972).

      As indicated in R1C4, the text has been modified as follows: “Importantly, eGFP+ cells were present in the SVZ of all the animals transplanted with s-iNB eGFP+ and s-NSC/TAP eGFP+ (Fig. 1Db, 1Dc), some of them expressing GFAP indicating the generation of astrocytes, and therefore possibly NSC”.

      R2C23: Fig S2 - The transcriptome signature for s-iNBs is very similar to s-TAP, basically suggesting the iNBs are further along in cell cycle.

      Answer: This is now Figure S3. Functional enrichment analysis of individual transcriptome signatures revealed that both s-TAP and s-iNB are enriched in genes related to the cell cycle, although with different GO term enrichments. Indeed, s-TAP are enriched in genes related to G1, G1/S and S phase (but with low -log10 adjusted p-values) and s-iNB in genes related to cell cycle mitosis and M phase (with high -log10 adjusted p-values).

      We have previously shown that around 33% of s-iNB have DNA content >2N, versus around 26% of s-TAP and s-aNSC (Daynac et al. 2013), which is in accordance with the GO term enrichments. However, these data have also shown that most s-iNB and s-TAP are in G1, indicating that s-iNB are not just further along mitosis than TAP.

      Moreover, our transcriptomic data clearly show that s-iNB are distinct from s-TAP: 1) according to principal component analyses (Figure 2B and C), the whole transcriptome of s-TAP is closer to that of s-aNSCs than to that of s-iNB (10% variations in PCA2), 2) the heatmap in Figure 2D shows that they have different RSR gene expression profiles, 3) the new Figure S4 shows that GO annotations of differentially spliced isoforms further confirm that s-TAP and s-iNB have distinct features, and 4) Figure S5 shows that s-iNB expressed genes associated with either TAP or NB that have been described in previous studies, whereas s-TAP did not express genes associated with NB, but look closer to aNSC. Finally, scRNA-seq cell clusters related to s-iNB are distinct from the cluster related to s-TAP, as shown 1) in Figure 3D and 2) in Figure 4.

      R2C24: Fig 3 - The lack of information about timepoint 0 after irradiation, and when proliferation and cell cycle entry begins again following irradiation, limits our interpretation of the single-cell irradiated data.

      Answer: We have previously reported the relative abundance of each SVZ neural progenitor population in the young adult mouse brain in several papers. In particular, we based our interpretation on our SVZ irradiation model reported in Daynac et al. 2013, which demonstrated the radioresistance of qNSC: these cells re-enter the cell cycle as early as 2 days after 4 Gy irradiation and successively regenerate aNSC, TAP, then iNB and mNB.

      R2C25: Fig S3 - These results effectively show that the s-aNSCs and s-TAPs are actually less specific when compared to that same identity in other studies, and that the iNBs are most similar to mitotic TAPs. This supports what was mentioned above, which is that the transcriptional signatures are very similar between the s-TAPs and i-NBs, showing these are not a unique cell state, but just a bit further along mitosis within the TAP cell state.

      Answer: This is now Figure S5. In this figure, we show that s-iNB expressed genes associated with either TAP or NB that have been described in previous studies, whereas s-TAP did not express genes associated with NB, but look closer to aNSC. As indicated above in R2C23, s-iNB are not just a bit further along mitosis within the TAP cell state. Indeed, we provide several lines of evidence showing that s-iNB and s-TAP have different transcriptomic profiles.

      R2C26: Fig 4B - The focus on Ptbp1 as being associated with the iNB cluster border to mNB is expected as all previous studies of Ptbp1 have focused on its role in the progression of other cell types through the cell cycle, its control of cell cycle regulators, and a cell cycle mRNA regulon (Monzon-Casanova et al, 2018, 2019, 2020). This further supports these analyses are specifically defined by cell cycle stages.

      Answer: We fully agree that Ptbp1 expression distinguishes cycling cells from postmitotic neuroblasts, in accordance with previously published papers, and that this single gene cannot distinguish among cycling cells, i.e. aNSC, TAP and iNB. However, as shown in the manuscript and stated above (R2C23 and R2C25), these cells can be distinguished by their respective expression of many other genes, including other RSR genes.

      R2C27: Line 281-282 is an overstatement - the authors suggest that this is a new type of cycling neural progenitor - when all studies point to it being the end of mitosis TAPs as they go on their way to mNBs. This clearly shows a trajectory and not a defined, binary cell type.

      Answer: We agree with this statement that the use of the word "type" was misleading, and changed it to "stage" to better reflect that s-iNB are a distinct stage along the differentiation process according to our pseudotime cell-trajectory analysis.

      Author response image 2.

      Pseudotime analysis using Monocle 3 (excluding cluster 13, which corresponds to astrocytes, and starting from s-qNSC) revealed two branches starting from s-TAP, one towards the cell cycle and the other towards neuronal differentiation.

      minor comments:

      R2C28: Fig 3D - For ease, please define what you called the clusters in 3D - not just cluster numbers

      Answer: We chose not to call the clusters in 3D because their identification (Group names) is based on data presented after in Figures 3E, F and G.

      R2C29: Fig 3E-F - Show astrocytes by text in 3E and F

      Answer: As discussed above, astrocytes cannot be shown in these figures because the figures are based on our signatures, which did not include an astrocyte signature.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this study from Belato, Knight, and co-workers, the authors investigated the Rec domain of a thermophilic Cas9 from Geobacillus stearothermophilus (GeoCas9). The authors investigated three constructs, two individual subdomains of Rec (Rec1 and Rec2) and the full Rec domain. This domain is involved in binding to the guide RNA of Cas9, as well as the RNA-DNA duplex that is formed upon target binding. The authors performed RNA binding and relaxation experiments using NMR for the wild-type domain as well as two point mutants. They observed differences in RNA binding activities as well as the flexibility of the domain. The authors also performed experiments on full-length GeoCas9 to determine whether these biophysical differences affect the RNA binding or cleavage activity. Although the authors observed some changes in the thermal stability of the mutant GeoCas9-gRNA complex, they did not observe substantial differences in the cleavage activities of the mutant GeoCas9 variants.

      Overall, this manuscript provides a detailed biophysical analysis of the GeoCas9 Rec domain. The NMR assignments for this construct should prove very useful, and the results may provide the grounds for future engineering of higher fidelity variants of GeoCas9. While the NMR results are generally well presented, it is unclear how the results on the isolated Rec domain relate to the overall function of full-length GeoCas9. In addition, some conclusions are overstated and not fully supported by the evidence provided. The following major points should be addressed by the authors.

      (1) Many of the results rely on the backbone resonance assignments of the three constructs that were used, and the authors have done an excellent job of assigning the Rec1 and Rec2 constructs. However, it is unclear from the descriptions in the text how the full-length Rec construct was assigned. Were these assignments made based on assignments for the individual domains? The authors state that the spectra of individual domains and RecFL overlay very well, but there appear to be many resonances that have chemical shift differences or are only present in one construct. As it stands, it is unclear how the resonances were assigned for residues whose chemical shifts were perturbed, making it difficult to interpret many of the results.

      The Reviewer raises an important oversight. In Lines 491-493, we clarify that we were able to transfer the assignments using spectral overlays of the individual domains with GeoRec (i.e. careful analysis of the data in Figure S3). We also cite two new references where a similar approach was applied to Cas9.

      (2) The minimal gRNA that was used for the Rec-gRNA binding experiments is unlikely to be a good mimic for the full-length gRNA, as it lacks any of the secondary structure that is most specifically recognized by the REC lobe and the rest of the Cas9 protein. The majority of this RNA is a "spacer" sequence, but spacers are variable, so this sequence is arbitrary. Thus, the interactions that the authors are observing most likely represent non-specific interactions between the Rec domains and RNA. The authors also map chemical shift perturbations and line broadening on structural models with an RNA-DNA duplex bound, but this is not an accurate model for how the Rec domain binds to a single-stranded RNA (for which there is no structural model). Thus, many of the conclusions regarding the RNA binding interface are overstated.

      The Reviewer again raises an important point. We have added a section of text explaining the rationale for truncating the gRNA for binding experiments with NMR (Lines 223-235). We chose the 5’ end of the gRNA containing the spacer sequence based on crystal structures of NmeCas9 and SpCas9 that show the Rec lobe interacting with this section of nucleic acid. The newly published GeoCas9 cryo-EM structure bound to gRNA, which overlaid well with the NmeCas9 structure, also suggested that this portion of the gRNA could interact with Rec.

      Figures S11 and S12 show our gradual truncation of the gRNA and Rec construct to achieve useful atomic detail. Ultimately, a 39nt gRNA containing a 23 base pair spacer sequence was chosen for this study to retain the NMR signal of the complex and because several structures suggested this 39nt sequence would be long enough to interact with the entire Rec lobe.

      To investigate the effect of the spacer sequence, we have now measured binding affinities via MST between GeoRec and a 39nt Tnnt2 gRNA and a 39nt gRNA from PDB: 8UZA, containing a different spacer sequence used in the very recent GeoCas9 cryo-EM structure. The observed trends for each gRNA are consistent across the samples. We also measured WT, K267E, and R332A GeoCas9 affinity for the full-length Tnnt2 and PDB:8UZA gRNAs.

      Lastly, we used a new cryo-EM structure of GeoCas9 bound to gRNA (PDB: 8JTR) to better define the interface for NMR CSPs and line broadening and have adjusted the language in this section.

      (3) The authors include microscale thermophoresis (MST) data for the Rec constructs binding to the minimal gRNA. These data suggest that all three Rec variants have very similar Kd's for the RNA. Given these similarities, it is unclear why the RNA titration experiments by NMR yielded such different results. Moreover, in the Discussion, the authors state that the NMR titration data are consistent with the MST-derived Kd values. This conclusion appears to be overstated given the very small differences in affinities measured by MST.

      MST and NMR experiments describing the weakened binding affinity of GeoRec and GeoRec2 for the Tnnt2 gRNA agree with each other (Figure 5). However, additional MST experiments with a different gRNA sequence (from PDB: 8UZA) and with full-length GeoCas9 (new Figure 7) have provided new insight, leading us to soften and reframe the Discussion to avoid overstatement. See Lines 263-282 and 375-385.

      (4) While the authors have performed some experiments to help place their findings on the isolated Rec domain in the context of the full-length protein, these experiments do not fully support the conclusions that the authors draw about the meaning of their NMR results. The two Cas9 variants that were explored via NMR have no effect on Cas9 cleavage activity, and it is unclear from the data provided whether they have any effect on GeoCas9 binding to the full sgRNA. This makes it difficult to determine whether the observed differences in RNA binding and dynamics of the isolated Rec domain have any consequence in the context of the full protein.

      We have since measured the binding affinities of full-length GeoCas9 to full-length gRNA (new Figure 7). We have also added a comment in the Discussion section describing how both GeoRec and GeoRec2 domain variants bind the truncated RNA with weaker affinity than the WT, but this biophysical effect does not translate to GeoCas9 with its full-length gRNA. We describe this finding as an explanation for why the single-point mutants have minimal effect on GeoCas9 cleavage activity. See Lines 375-385.

      (5) The authors state in multiple places that the K267E/R332A mutant enhanced GeoCas9 specificity. Improved specificity refers to a situation in which the efficiency of cleavage of a perfectly matched target improves in comparison to a mismatched target. This is not what the authors observed for the double mutant. Instead, the cleavage of the perfect target was drastically reduced, in some cases to a larger degree than for the mismatched target. The double mutant does not appear to have improved specificity, it has simply decreased cleavage efficiency of the enzyme.

      The conclusion has been reframed to suggest that the K267E/R332A double mutant has decreased cleavage efficiency of the enzyme but does not enhance GeoCas9 specificity. We discuss an interesting contrast, namely that mutations in the SpCas9 Rec lobe alter its specificity, which is at times accompanied by a loss of overall activity. We also speculate on why this may not be the case in GeoCas9, considering some very recent (unpublished at the time of initial submission) structural and biochemical data. See Lines 414-418.
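
      The distinction drawn in point (5) can be illustrated with a small calculation. The sketch below is not taken from the manuscript: the function name and all cleavage percentages are hypothetical, chosen only to contrast a variant that loses activity uniformly with one that would genuinely gain specificity.

```python
def specificity_ratio(on_target_pct: float, mismatched_pct: float) -> float:
    """Ratio of on-target to mismatched-target cleavage (higher = more specific)."""
    return on_target_pct / mismatched_pct

# Hypothetical cleavage percentages, for illustration only.
wild_type = specificity_ratio(80, 40)
# Both substrates are cleaved four times less: activity drops, the ratio does not move.
less_active = specificity_ratio(20, 10)
# On-target cleavage is largely retained while mismatched cleavage falls: the ratio rises.
more_specific = specificity_ratio(75, 15)

print(wild_type, less_active, more_specific)
```

      Under this simple definition, a mutant whose on-target cleavage falls as much as (or more than) its mismatched cleavage shows no gain in the ratio, which is the situation the Reviewer describes for K267E/R332A.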

      Reviewer #2 (Public Review):

      Summary:

      The manuscript from Belato et al. used advanced NMR approaches and a mutagenesis campaign to probe the conformational dynamics of the recognition lobe (Rec) of the CRISPR Cas9 enzyme from G. stearothermophilus (GeoCas9). Using truncated and full-length constructs, they assess the impacts that two different point mutations have on the redistribution and timescale of these motions and assess gRNA recognition and specificity. Single point mutations in the Rec domain in a Cas9 from a related species had profound impacts on on- and off-target DNA editing; therefore, the authors reasoned that analogous mutations in GeoCas9 would have similar effects. However, despite a redistribution of local motions and changes in global stability, their chosen mutations had little impact on DNA editing in the context of the full-length enzyme. Their studies highlight the species-specific complexity of interdomain communication and allosteric mechanisms used by these multi-domain endonucleases. Despite these negative results, their study is highly rigorous, and their approach will broadly support understanding how the activity and specificity of these enzymes can be engineered to tune activity and limit off-target cleavage by these enzymes.

      Strengths:

      (1) Atomistic investigation of the conformational dynamics of the GeoCas9 gRNA recognition lobe (GeoRec), probing dynamics on a broad range of timescales from ps to ms using advanced NMR approaches will be broadly interesting to both the structural biology and CRISPR engineering communities.

      (2) Highly rigorous biophysical studies that push the boundaries of current techniques, provide insight into local dynamics of the GeoRec domain that serve to propagate allosteric information and potentially regulate enzymatic activity.

      (3) The study highlights the complexities of understanding interdomain communication in Cas9 enzymes since analogous mutations in different species have different effects on target recognition and cleavage.

      (4) The type of structural and dynamic insights derived from this study design could serve as foundational information to guide a rational design strategy aimed at improving the selectivity and reducing the off-target effects of Cas9 enzymes.

      Weaknesses:

      (1) Despite the rigor of the experiments, the mutations chosen by the authors do not have a profound effect on the overall substrate affinity or activity of GeoCas9 rendering little mechanistic insight into allosteric communication in this particular Cas9. However, the double mutant K267E/R332A has a more pronounced effect on the cleavage of WT and mismatched (at nucleotides 19 and 20) DNA substrates while minimally affecting the cleavage of mismatched (at nucleotides 5 and 6), suggesting more could be learned about the allosteric mechanism from the detailed characterization of this mutant.

      We thank the Reviewer for this comment. While we have included new binding experiments with full-length GeoCas9 and gRNAs (new Figure 7), the addition of new MD simulations (new Figure 6) better addresses this point. The MD simulations examined our single and double mutants, as well as the recently published high-specificity iGeoCas9, and reported the degree of conformational sampling, nucleic acid contacts, and binding energies.

      The simulations show that our mutations induce some, but not the full extent, of the effect of iGeoCas9 (with one mutation in GeoRec and many others in the adjacent WED domain), implying that further engineering of GeoRec to mimic iGeoCas9’s properties can have profound functional outcomes. Future efforts to mutate GeoRec will leverage this strategy. See Lines 309-342.

      (2) Follow-up experiments with other residues that were identified as being highly dynamic might affect substrate recognition and cleavage activity in different ways providing additional insight.

      The Reviewer is correct. While this is beyond the scope of the initial study, new MD simulations (see the response directly above) and NMR resonances distally affected by gRNA (via CSP or relaxation dispersion) will be used to identify the primary targets for this analysis.

      (3) Details regarding the authors' experimental approach are incomplete such as a description of the model used to fit the CD data, a detailed explanation of the global fitting of the relaxation dispersion data describing how the best-fit model was selected, and the description of the ModelFree fitting of fast timescale dynamics is incomplete.

      We thank the Reviewer for pointing out these oversights. We have now included the fitting equation in the CD Methods section.

      We included new Figures S8-S10 with the individual relaxation dispersion curves and note in the Methods that global fits were deemed superior based on the Akaike Information Criterion. For WT, the AIC showed the global fit to be ~10-fold better. For K267E, the global model was 4-fold better, and for R332A, the global model was 6-fold better.
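
      For readers unfamiliar with this kind of model selection, the comparison can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure used in the manuscript: it uses the least-squares form of the AIC, assumes a two-state exchange model in which the global fit shares the exchange rate and population across residues, and all residual sums of squares, point counts, and parameter counts are hypothetical.

```python
import math

def aic_least_squares(rss: float, n_points: int, n_params: int) -> float:
    """AIC for a least-squares fit with Gaussian errors (up to an additive constant)."""
    return n_points * math.log(rss / n_points) + 2 * n_params

def relative_likelihood(aic_candidate: float, aic_best: float) -> float:
    """Likelihood of a candidate model relative to the lowest-AIC model (between 0 and 1)."""
    return math.exp((aic_best - aic_candidate) / 2.0)

# Hypothetical data set: 5 residues x 20 dispersion points each.
n_points = 100

# Individual fits (assumed): every residue has its own four parameters -> 4 x 5 = 20.
aic_individual = aic_least_squares(rss=40.0, n_points=n_points, n_params=20)

# Global fit (assumed): two shared parameters plus two per residue -> 2 + 2 x 5 = 12.
# The residuals are slightly larger, but far fewer parameters are used.
aic_global = aic_least_squares(rss=44.0, n_points=n_points, n_params=12)

print(f"AIC individual = {aic_individual:.2f}, AIC global = {aic_global:.2f}")
print(f"relative likelihood of individual fits = "
      f"{relative_likelihood(aic_individual, aic_global):.3f}")
```

      The lower AIC identifies the preferred model; the AIC penalizes each added parameter, so a global fit can be preferred even when its residuals are marginally worse.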

      We have included a more detailed description of CPMG and Model-free fitting. See Lines 520-526.

      Reviewer #3 (Public Review):

      The authors explore the role of Rec domains in a thermophilic Cas9 enzyme. They report on the crystal structure of part of the recognition lobe, its dynamics from NMR spin relaxation and relaxation-dispersion data, its interaction mode with guide RNA, and the effect of two single-point mutations hypothesised to enhance specificity. They find that mutations have small effects on Rec domain structure and stability but lead to significant rearrangement of micro- to milli-second dynamics which does not translate into major changes in guide RNA affinity or DNA cleavage specificity, illustrating the inherent tolerance of GeoCas9. The work can be considered as a first step towards understanding motions in GeoCas9 recognition lobe, although no clear hotspots were discovered with potential for future rational design of enhanced Cas9 variants.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Suggestions for improved or additional experiments, data, or analyses

      (1) Please update the sentences on lines 100-105 and the Methods to clarify how the RecFL assignments were obtained. If RecFL was assigned based on the assignments for Rec1 and Rec2, please describe in the Methods how the shifted resonances were handled. Please also provide chemical shift perturbation profiles for the truncated constructs versus the full-length Rec construct.

      We have now added text (Lines 491-493) and two new references explaining the GeoRec (full-length) assignment.

      We appreciate this point. We have now provided a new Figure S9 with analysis of CSPs and line broadening in truncated constructs (GeoRec2 only). See also Lines 263-282. We also show a similar structural response to mutation in full-length GeoRec and GeoRec2 NMR CSPs (Figure 2 and Figure S5).

      We have provided the CSPs for each construct, relative to the full-length GeoRec domain, in Author response image 1. In most cases, the largest CSPs occur at resonances on the periphery of the spectra, so these resonances can still be assigned unambiguously.

      Author response image 1.

       

      (2) It is unclear whether the differences in Kd's for the Rec-gRNA interactions are statistically significant, given the errors associated with the values. Can the authors further analyze these data to determine statistical significance? If they are not found to be significantly different, the authors should soften all conclusions related to the observed differences.

      Statistical significance was calculated for all MST data, and Figures 5 and 7 have been updated to reflect this.

      (3) As mentioned above, it seems likely that the Rec-RNA binding that is observed is non-specific. Have the authors tried MST with another 39 nt RNA? Are there differences in affinities for the Rec constructs?

      We have done MST with another 39nt RNA. The affinity for each gRNA (Tnnt2 vs 8UZA) is similar for WT and K267E, and a factor of ~4 weaker for R332A with 8UZA gRNA. The trend is the same, that WT Rec has a (statistically significant) stronger affinity for the gRNA compared to the mutants.
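
      To make concrete what a several-fold change in Kd means for a titration, the sketch below evaluates a generic 1:1 binding isotherm that accounts for depletion of the titrant. It is an illustration only: the function name, concentrations, and Kd values are hypothetical and are not the values measured in this study, and the fitting model used for the MST data may differ.

```python
import math

def fraction_bound(titrant: float, labeled: float, kd: float) -> float:
    """Fraction of the fixed-concentration labeled partner in complex (1:1 binding).

    All three arguments must be in the same concentration units.
    """
    if titrant == 0.0:
        return 0.0
    s = titrant + labeled + kd
    complex_conc = (s - math.sqrt(s * s - 4.0 * titrant * labeled)) / 2.0
    return complex_conc / labeled

# Hypothetical values: labeled partner at 0.02 units, two Kd values differing 4-fold.
labeled = 0.02
kd_tight, kd_weak = 1.0, 4.0

for titrant in (0.25, 1.0, 4.0, 16.0):
    tight = fraction_bound(titrant, labeled, kd_tight)
    weak = fraction_bound(titrant, labeled, kd_weak)
    print(f"titrant={titrant:6.2f}  bound(Kd=1)={tight:.2f}  bound(Kd=4)={weak:.2f}")
```

      When the labeled partner is far below Kd, half-saturation occurs at a titrant concentration close to Kd, so a 4-fold weaker affinity shifts the curve 4-fold to the right without changing its shape.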

      (4) Have the authors tried MST with full-length GeoCas9 and the sgRNA? The current data on the thermal stability of the RNP's is interesting, but a more direct measurement of the affinity of the Cas9-sgRNA complexes would provide stronger evidence of the effects of the mutations.

      The Reviewer makes an excellent suggestion. We have now generated Cy5-labeled full-length gRNAs and conducted MST with full-length GeoCas9 (new Figure 7). The binding affinities to multiple guides do not vary significantly. We have discussed this, and its implications, in Lines 376-385.

      (5) One potential issue with not observing differences between the three Cas9 variants' cleavage activity is that the activity of these purified proteins appears to be very low in comparison to previous studies of GeoCas9. There are significant differences in the expression protocol used by the authors of the current study and previous studies. Have the authors attempted to replicate the expression and purification protocol of previous reports? This may improve the enzymatic activity and allow for a more detailed investigation of cleavage between the three variants (e.g. by performing time-course cleavage assays).

      The expression protocol for GeoCas9 is identical to that of previous studies. The discrepancy was an error in our written description, which has now been corrected in the Methods section. We apologize for this oversight.

      Recommendations for improving the writing and presentation

      The introduction of the manuscript is reasonable for specialists who are very familiar with Cas9 function, but it does not contain important details that may be unknown to most readers. The authors do not introduce the domains of Cas9 in the Introduction section. A brief description of the domains that are important to this work should be provided. For example, what is the role of the Rec lobe? This is not introduced until lines 110-111, after some discussion of the authors' initial work on these domains. For a broad audience, it would also be helpful to define the two catalytic domains of the protein. A paragraph describing the general architecture of Cas9 and the overall mechanism of Cas9, including allostery and domain movement, would be very helpful to a general audience. There are elements of this throughout the manuscript, but it would be better to have everything described in a single location at the beginning of the Introduction.

      The Reviewer makes an excellent point. We have added significant clarifying text to the Introduction (Lines 42-47, 52-58, and 61-66). We have also amended Figure 1 to highlight the domain arrangement of GeoCas9 and construct domain boundaries.

      Minor corrections to the text

      (1) Lines 37-38: The statement about GeoCas9 activity should reference citation.

      We have added two references here.

      (2) Line 39-40: "The widely-studied SpCas9, as well as GeoCas9, are Type-II CRISPR systems". Cas9 is only a single component of a larger system that contains other proteins and DNA elements, so it would be more appropriate to say "are effectors of type II CRISPR systems" or "are signature proteins of type II CRISPR systems". Also, please define the organism from which SpCas9 is derived. It may be more appropriate to use the three-letter abbreviation "SpyCas9" to be consistent with the abbreviation used for GeoCas9.

      We have revised the text per the first suggestion and specified the organisms. We have, however, chosen to keep “SpCas9” for consistency with our prior work and the work of many others, including Doudna et al. and Zhang et al.

      (3) Lines 39-42: "only the Type II-C class to which GeoCas9 belongs has been rigorously validated for mammalian genome editing". SpCas9 is from a type II-A system and is by far the most commonly used ortholog for genome editing, including in ongoing clinical trials. It is unlikely that any of the type II-C Cas9 orthologs have been more rigorously validated than SpCas9. The reference cited in this sentence also does not support this statement and is a review written in 2017, so would be unlikely to reflect the current state of the art. Please revise this sentence.

      We have softened and revised this text (Lines 42-47).

      (4) Lines 48-52: It would be helpful to describe the dynamic movement of the HNH domain (and cite appropriate references) prior to describing the authors' previous work. As it stands, it is unclear how this sentence would be understood by a non-specialist.

      We have added text in Lines 61-68.

      (5) Lines 44-45: The wording is a little unclear, as it sounds like the guide RNA, rather than the nuclease domains, is responsible for dsDNA cleavage. The sentence could be adjusted to remove "and cleave". Cleavage by the HNH and RuvC domains could be described in a separate sentence.

      We have revised this text. See Lines 49-50.

      (6) Lines 46-48: This segment of the sentence suggests that PAM recognition triggers the allosteric events that result in the movement of the nuclease domain (HNH). This is misleading, as HNH movement is triggered by the complete formation of an R-loop, rather than initial PAM recognition. Please revise this sentence.

      We have revised the text in Lines 52-58.

      (7) Lines 62-65: The first sentence is unclear. The specificity of many protein-nucleic acid complexes is well understood and is also readily quantified by several well-established methods. Are the authors specifically referring to the structural basis for Cas9 specificity? Although Cas9 specificity is highly complex, it has been studied structurally in great detail and should not be described as "poorly understood" without some discussion of what is already known. These sentences also elide the fact that Cas9 specificity has been successfully altered via rational design, based on our general framework for understanding protein-nucleic acid interactions. Please clarify these statements.

      The Reviewer makes an important point. We have softened this statement (Lines 80-81). We have clarified that we intended to refer to structural characterization of large, multidomain proteins and nucleic acid complexes via NMR. We agree that many critical structural studies comment on Cas9 dynamics and specificity in great detail, including at the domain level.

      (8) Lines 62-68: It seems like the citations do not match up with the references in this section. The references for citations 8-10 are not about DNA repair complexes, references 11-14 are not papers about the directed evolution of Cas9 (should these be 16-17?), and the references for the HNH domain movements should be for citations 18-21.

      We apologize for the confusion; the references have been updated.

      (9) Lines 116-119: The description of the RNAs used is unclear, as the segments that are described add up to 141 not 101. Also, what is meant by "110-nt guide sequence intrinsic to GeoCas9"? Is this referring to the tracrRNA segment? It may be helpful if the RNA sequences shown in the accompanying figures were replaced with cartoons of the RNAs that were used, with the different segments labeled.

      We now describe the gRNA sequences in detail in new Table S4. We also expanded a bit in the text (Lines 224-235).

      (10) Line 121-123: This sentence should contain reference(s).

      We have changed the sentence.

      (11) Line 156-158: Reference 19 did not report or investigate any higher specificity SpCas9 variants, is this citation correct?

      We have removed the reference from this line. Ref. 19 (now Ref 23, Slaymaker et al) should be correct.

      (12) Lines 162-166: Please provide a sequence and structural alignment for SpCas9 and GeoCas9 to support the claim that the amino acid substitutions are equivalent between the two orthologs.

      We have updated Figure 1 to display the similarity in domain arrangement between SpCas9 and GeoCas9 and have noted similarity in structure and sequence of these proteins in Figure S1.

      (13) Lines 234-236: There is insufficient evidence to conclude that the alterations in protein dynamics caused the changes in gRNA interaction. The substitutions are charge swap substitutions, and it is equally (if not more) feasible that these substitutions decrease the potential for favorable electrostatic interactions.

      (14) Lines 261-265: While the RNP stability for R332A is clearly decreased in comparison to WT, the authors' conclusions regarding K267E seem overstated. The difference in Tm for the K267E mutant and WT RNPs is not very large and may be within error, especially given that the CD data are noisy. Similarly, on lines 321-322, only one of the mutations really impacted the stability of the full-length RNP.

      We have softened this text in Lines 303-305.

      (15) Lines 336-338: HiFi-SpCas9 does not contain four mutations, it is a single R691A point mutation, as reported in reference 17. This sentence and subsequent sentences should be updated.

      Here, the “final form” of HiFi SpCas9 contains the R691A and three additional mutations. The Reviewer is correct, though, that the R691A mutation alone was enough to enhance the specificity of WT SpCas9. We have clarified this point on Line 156.

      Minor corrections to the figures

      (16) The cryo-EM structures of GeoCas9 have recently been released on the PDB. The authors may now update figures to include the experimentally determined structure, rather than an AlphaFold model and update the text accordingly.

      We have made this change.

      (17) For Figure S4, please describe what the red dashed lines are in the top three graphs. Are these the Tm values determined for the two individual Rec domains? How do these compare to the inflection points for the two transitions in the full Rec construct (could be determined by plotting the first derivative data)? Please provide information in the Methods on how the temperature-dependent CD spectral data were fit and Tm's were determined.

      We have made these changes in the Figure S4 caption and Methods section.

      (18) The blue box denoting the unassigned region is missing from Figure 2C-D, although it is mentioned in the figure legend.

      We have added the blue box denoting the unassigned linker.

      Reviewer #2 (Recommendations For The Authors):

      The manuscript is well-written and generally clear and concise. The following recommendations will help improve the readability and include details important for interpreting the results.

      (1) In general, the figures are too small and difficult to interpret, it was hard to discern the differences described in the text (e.g. Figure 1A, E, 4A, etc.), the text labels are illegible in several panels (e.g. Figure 4A, S8B, C, etc.), the chosen colors were difficult to interpret in the structures (Figure 4C, S8G, H, etc.), as well as residues with motion (as balls) were difficult to make out due to size and color usage. Similar story for the dispersion curves (Fig 3A), the plots are chaotically crowded, and it is impossible to interpret (or see) the underlying data.

      We apologize for these difficulties. We have now revised the Figures in several ways. First, we greatly simplified Figure 1, such that it now includes only the domain arrangement, structure, and initial NMR details for GeoRec (essentially A-B of the old Figure 1).

      Second, we have reformatted Figure 3 to make the structure maps a bit easier to see.

      We certainly appreciate the point made by the Reviewer about the dispersion curves. Our intent here is to illustrate the number of curves that can be fit globally, which substantially increases for K267E and R332A GeoRec2, versus WT. As a compromise, we have included the individual dispersion curves in the SI for each variant. We have also thinned the line weights for each fit, and added NMR order parameters to the main figure to round out the discussion of dynamics.

      Third, we have compiled the gRNA titration into Figure 4, removing the CD analysis (to SI), MST data (new Fig 5), and unclear structure maps to focus only on the NMR spectra here.

      Fourth, we have created a new Figure 5 focusing on MST studies of two gRNAs with GeoRec, which now include bar charts of affinities with appropriate statistics.

      Much of the data trimmed from the prior version of the manuscript figures has been moved to Supporting Information. We have also created two new main text Figures (6 & 7) based on MD simulations and MST studies of full-length GeoCas9 and gRNAs to provide additional context for interpreting the results in prior figures.

      (2) Line 39 - this sentence is awkward, could you rephrase it?

      We have rephrased this sentence.

      (3) There is inconsistent labeling, in Figure S2 the full-length construct is referred to as GeoRecFL while in other places in the text and in Figure 1 it is called GeoRec.

      We have changed all references to the intact Rec lobe to “GeoRec.”

      (4) It would be helpful to include a cartoon of the domain organization of GeoCas9 and indicate the truncation mutants that were studied in this manuscript.

      We included the domain organization in Figure 1A and indicated the amino acid boundaries for each construct on the figure and in the Methods section.

      (5) There is significant line broadening that occurs during the titration, not all line broadening is due to changes in rotational correlation time, and differential line broadening may reveal interactions of residues that are in the intermediate regime, certainly, uM affinities measured by the authors, would suggest this, therefore, a plot of I/Io might inform on binding sites, and it might be useful to look at differential broadening as a function of titrant added.

      The Reviewer makes a very good point. In addition to the data in Figure 4, which show a clear reduction in gRNA-induced line broadening in larger GeoRec constructs, we included new titration data on smaller GeoRec2 domains (Figure S12). Here, we conducted an I/I0 analysis and added some clarifying language about the possible nature of line broadening in these samples. See new Figure S12 and Lines 268-274.
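      As a minimal illustration of the I/I0 analysis described above, the sketch below computes per-residue intensity ratios between a titration point and the free spectrum and lists residues whose ratio falls below a cutoff. The function names, the dictionary layout, and the cutoff value are hypothetical and chosen for illustration only; they are not taken from the manuscript.

```python
def intensity_ratios(free_intensities, bound_intensities):
    """Return I/I0 per residue for residues present in both spectra.

    Both arguments map residue number -> peak intensity (arbitrary units).
    """
    ratios = {}
    for residue, i0 in free_intensities.items():
        if residue in bound_intensities and i0 > 0:
            ratios[residue] = bound_intensities[residue] / i0
    return ratios


def broadened_residues(ratios, cutoff):
    """List residues whose I/I0 is below `cutoff` (an assumed threshold)."""
    return sorted(res for res, ratio in ratios.items() if ratio < cutoff)


if __name__ == "__main__":
    free = {267: 100.0, 268: 200.0, 269: 50.0}
    bound = {267: 90.0, 268: 40.0, 269: 45.0}
    ratios = intensity_ratios(free, bound)
    print(broadened_residues(ratios, cutoff=0.5))
```

      Plotting these ratios against the amount of titrant added, one curve per residue, is one way to separate uniform broadening (slower tumbling of the complex) from site-specific intermediate exchange.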

      (6) Line 126 "Importantly, many resonances are also minimally impacted." This statement is unclear since from the plots shown in Figure 1D, it seems that many of the residues are impacted by RNA titration, see the point about differential broadening above, this sort of plot may help pick apart residues that broaden due to RNA contacts (rather than changing rotational correlation).

      We have removed this statement, in addition to our revisions above regarding the line broadening.

      (7) Line 137 - I am not sure that a max chemical shift of 0.15 ppm constitutes "strong chemical shift perturbations"

      The Reviewer makes a good point. We have changed “strong” to “significant” which refers to 1 standard deviation above the 10% trimmed mean of the data. See Line 237.
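      To make the "significant" criterion concrete, the following minimal sketch computes a cutoff of one standard deviation above the 10% trimmed mean. Several details are assumptions made for illustration: the 15N scaling factor in the combined shift, trimming 10% of the points from each tail, and taking the standard deviation over the trimmed set. The response does not specify these choices.

```python
import math
import statistics


def combined_csp(delta_h, delta_n, n_weight=0.2):
    """Combined 1H-15N perturbation in ppm; n_weight is an assumed scaling."""
    return math.sqrt(delta_h ** 2 + (n_weight * delta_n) ** 2)


def trimmed(values, fraction=0.10):
    """Drop `fraction` of the points from each tail of the sorted data."""
    ordered = sorted(values)
    k = int(len(ordered) * fraction)
    return ordered[k:len(ordered) - k] if k else ordered


def significance_threshold(csps, fraction=0.10):
    """Trimmed mean plus one (population) standard deviation of the trimmed set."""
    core = trimmed(csps, fraction)
    return statistics.mean(core) + statistics.pstdev(core)


def significant_residues(csp_by_residue, fraction=0.10):
    """Residues whose CSP exceeds the threshold, in ascending order."""
    cutoff = significance_threshold(list(csp_by_residue.values()), fraction)
    return sorted(res for res, csp in csp_by_residue.items() if csp > cutoff)
```

      Trimming before averaging keeps a handful of very large perturbations near the mutation site from inflating the cutoff.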

      (8) Line 144 - change to "...experimentally determined structure...".

      We have added new lines 135-136 to make this point clear. We reinforced that initial predictions were based on the AlphaFold2 model, since an experimental structure was lacking, but we have now discussed the mutations in the context of the new structural data.

      (9) The section from lines 150 - 166, comparison of the effect of different mutations in different Cas9 seems more appropriate for the discussion section.

      We have added additional text on this point in the Discussion section, within several new paragraphs.

      (10) In Figure S6, chemical shifts are observed at the distal site away from the mutations, could the authors discuss?

      The Reviewer makes an important observation. Indeed, the CSPs caused by K267E and R332A extend beyond the mutation site. These shifts are mostly close in 3D space to the mutation, and consistent in Figures 2 and S5. New titrations of gRNA into isolated GeoRec2 also activate some distal sites, and new MD simulations suggest the mutations disrupt RNA and DNA contacts, where these distal effects may play a role with full-length gRNAs.

      We agree it would be worth mutating distal sites undergoing CSPs to examine their impact on function, but two complicating factors are 1) the lack of substantial gRNA affinity differences in experiments with full-length GeoCas9 and 2) the lack of functional changes in the mutants. In this initial study, it appears difficult to assign an effect to these distal sites in GeoCas9 (beyond speculation). We do have a brief discussion of the distal sites (Lines 293-298) and will follow up this work with more comprehensive mutagenesis studies of these sites.

      (11) It appears that the authors fitted the Tm data to some model although this is not mentioned in the text, figure captions, or methods. In the caption for Figure 4D the authors refer to "Fitted thermal denaturation profiles...".

      We have added the relevant Equation in the Methods and referenced it in Figure S6 and S14 captions.

      (12) Details of the ModelFree fitting are needed, how many residues fit with the minimal models, and how many invoked Rex and other terms? How does the statement in line 191 about the elevated S2 values arising from global tumbling compare with an experimental estimation of rotational correlation eg. from R2/R1 ratios?

      We have included an expanded description of the Model-free protocol (Lines 521-527). The best diffusion tensor was an ellipsoid model. The number of residues utilizing Rex was 81, though Rex contribution was very small. The mean and errors for the fast motion (S<sup>2</sup><sub>f</sub>), slow motion (S<sup>2</sup><sub>z</sub>) and generalized order parameter were 0.97 ± 0.15, 0.84 ± 0.14, and 0.91 ± 0.20, respectively.

      R2/R1 ratios for each of the samples (relaxation conducted on GeoRec2 in isolation) corresponded to an estimated τ<sub>c</sub> of 16.3 ns for all data sets. This value is a bit larger than would be expected for a compact globular protein of 25 kDa, though our X-ray structure of GeoRec2 shows a somewhat elongated domain.
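      For readers who want to reproduce this kind of estimate, the sketch below applies the commonly used slow-tumbling approximation, in which the rotational correlation time is sqrt(6·R2/R1 − 7) / (4π·ν_N), with ν_N the 15N resonance frequency in Hz. This is an illustrative sketch rather than the exact procedure used in the manuscript; the spectrometer field is not stated in this response, so the 15N frequency in the usage example is an assumption.

```python
import math


def tau_c_from_r2_r1(r1, r2, n15_freq_hz):
    """Estimate the rotational correlation time (seconds) from 15N R1 and R2.

    Uses the slow-tumbling approximation, which neglects internal motion
    and chemical exchange contributions to R2.
    """
    ratio = r2 / r1
    argument = 6.0 * ratio - 7.0
    if argument <= 0:
        raise ValueError("R2/R1 is too small for this approximation")
    return math.sqrt(argument) / (4.0 * math.pi * n15_freq_hz)


if __name__ == "__main__":
    # Assumed values for illustration: R2/R1 = 27 at a 15N frequency of 60.8 MHz.
    tau = tau_c_from_r2_r1(r1=1.0, r2=27.0, n15_freq_hz=60.8e6)
    print(f"tau_c = {tau * 1e9:.1f} ns")
```

      Residues with exchange broadening or large-amplitude fast motion are normally excluded before averaging R2/R1, because both bias the ratio.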

      (13) Line 221 - referring to two different figures at the end of the sentence is confusing, maybe place the figure references immediately after the referral in the sentence.

      This has been resolved by the reshuffling of the Figures.

      (14) Line 234 - Fig 4E is mentioned before fig 4D, in fact Fig 4D is not mentioned in the text.

      We have reordered and edited many of the Figures; this is now resolved.

      (15) Line 243 - what is the saturating concentration to which the authors are referring?

      We have amended the Results section to more clearly discuss the effect of gRNA on the GeoRec and (now) GeoRec2 domains. We meant 3-fold excess gRNA-to-protein by “saturating” in the prior version. At that point, CSPs held stable and the degree of line broadening at certain sites had completely obscured the resonance from view.

      (16) Fig 4E caption - mentions error of 1.34 while the figure is labeled 1.1 for the R332A GeoRec mutant.

      This has been resolved due to additional MST trials as well as the editing and reordering of many Figures.

      (17) Line 253 - the authors are discussing regions of allosteric hotspots, how do the motions of these predicted hotspots compare with the relaxation dispersion data? There seems to be some overlap.

      The Reviewer makes a keen observation. Yes, there is overlap in these data. For example, hotspot residue R269 is bracketed by L268 and L270 with relaxation dispersion. Also, hotspot L279 is surrounded by C275, A276, R277, and D281 with dispersion in both variants. Further, D403 and E408 reside in a stretch of ms timescale flexibility comprised of N404, L406, N412, and L413. We have yet to fully understand the functional significance of this overlap, but have added a note in Line 298 to draw the reader’s attention to it.

      Reviewer #3 (Recommendations For The Authors):

      Although the scope of the manuscript is rather limited due to the minor effects observed for the selected mutations, it is clear that a lot of work was done in spearheading the investigation of dynamic modes in GeoCas9 Rec2. In my view, the data will still be of relevance and interest to the general structural and chemical biology communities.

      However, there are a few technical shortcomings that need to be addressed and some statements that are poorly supported by data, necessitating either more experimental proofs or rephrasing of the conclusions.

      Major points:

      X-ray structure - No PDB ID, structural statistics, or validation report is given for the structure, so it is impossible to judge the quality. Please provide these. Furthermore, it would be commendable to determine the structure of the point mutant Rec2 domains, this would greatly strengthen the claim that mutations affect only dynamics and do not change structure.

      We apologize for this oversight. We absolutely had these data at the time of submission but must have forgotten to upload them. The validation report is now attached.

      Regarding the mutant structures, the Reviewer’s point is well taken. In the absence of these structures, we have adjusted the language to include the possibility of structural change. We have also included new MD simulations (new Figure 6 and associated text) that provide comment on possible structural and dynamic changes due to mutation. We note that NMR spectral changes are quite modest, beyond the site of mutation. Further, the new binding data with full-length GeoCas9 (new Figure 7) shows very little change in gRNA affinity with mutations, implying that a profound structural rearrangement does not take place.

      Translating isolated Rec2 findings to FL GeoCas9 - This is an important point and I do appreciate that the authors discuss this. I agree that working on FL samples for NMR would not be feasible, but I am not convinced by the statement that "GeoRec2 in isolation represents the structure of the subdomain within full-length GeoCas9 very well". The chemical shift perturbations observed between isolated Rec2 and FL Cas9 are relatively sizable. This should be discussed in further detail. Figure 1B should showcase peaks having the highest perturbations. Are they located at termini or interaction interfaces?

      We have provided the combined <sup>1</sup>H-<sup>15</sup>N CSPs for each construct, relative to the full-length GeoRec domain, in Author response image 1. In most cases, the largest CSPs occur at resonances on the periphery of the spectra, which retains our ability to assign them unambiguously. The largest CSPs do appear to occur at the termini.

      The Rec1 and Rec2 subdomains are connected by a short, but flexible unstructured linker in full-length GeoRec. Thus, the two subdomains do not form a particularly tight non-covalent interface and behave somewhat independently (see Figure S4, for example).

      Regarding the statement of “GeoRec2 in isolation...,” we apologize for this confusion.

      We were referring to our solved crystal structure in relation to the AlphaFold model. With the new cryo-EM structure of GeoCas9 having been recently published, our X-ray structure of GeoRec2 is still in excellent agreement, but we have clarified our intent on Line 111.

      Dynamics and effect of mutations - K267E is more destabilizing and leads to more spread chemical shift perturbations throughout Rec2 and to faster-correlated dynamics but not in significantly lower affinity or cleavage. How do the authors explain this?

      The Reviewer raises an interesting question. Regarding the impact of the K267E mutation, new MD simulations also suggest K267E to be quite disruptive of the GeoCas9 structure and dynamics, modulating contacts with the nucleic acids. However, further MD analysis of the recently published (bona fide high specificity) iGeoCas9 variant shows that K267E only imparts a portion of the effect of iGeoCas9, suggesting that even further modulation of GeoRec would be required for substantial functional impact. In addition, new MST binding studies with full-length variants and gRNAs show K267E does not dramatically alter gRNA binding, suggesting that any functional impact of the biophysical change is suppressed by the surrounding GeoCas9 domains. We comment on this in the Discussion.

      Moreover, the time regime for the fit of the CPMG curves is surprisingly slow given the profiles, how were the minor state populations? Were the dynamics really correlated? Please provide numbers (also see minor points below). In that regime CEST experiments should work, was that done?

      The minor state populations were very low in the analysis, <1%.

      To examine the correlated dynamics, we compared the global fits to those of the individual fits for each residue and found them to be better for the global fit, based on the Akaike Information Criterion. For WT, the AIC showed the global fit to be ~10-fold better. For K267E, the global model was 4-fold better, and for R332A, the global model was 6-fold better. We have added language clarifying the use of AIC to the Methods section.
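      The comparison can be sketched as follows, under stated assumptions: AIC is taken as chi-squared plus twice the number of fitted parameters, the global model shares two exchange parameters across all residues, and each residue keeps two local parameters. These parameter counts and function names are hypothetical, and the sketch does not attempt to reproduce how the fold-improvement values quoted above were derived.

```python
def aic(chi2, n_params):
    """Akaike Information Criterion for a least-squares fit with known errors."""
    return chi2 + 2 * n_params


def compare_global_vs_individual(chi2_global, chi2_individual, n_residues,
                                 n_shared=2, n_local=2):
    """Compare a global fit (shared exchange parameters) with per-residue fits.

    chi2_individual is the summed chi-squared over all per-residue fits.
    Returns both AIC values and the name of the preferred model (lower AIC).
    """
    k_global = n_shared + n_local * n_residues
    k_individual = (n_shared + n_local) * n_residues
    aic_global = aic(chi2_global, k_global)
    aic_individual = aic(chi2_individual, k_individual)
    preferred = "global" if aic_global < aic_individual else "individual"
    return {
        "aic_global": aic_global,
        "aic_individual": aic_individual,
        "delta_aic": aic_individual - aic_global,
        "preferred": preferred,
    }
```

      Because the individual fits carry many more parameters, the global model is preferred whenever its chi-squared is not much worse, which is the sense in which a lower AIC supports correlated motion.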

      We have done CEST experiments on GeoHNH (we did not see overly clear evidence for a minor state), but we did not perform these experiments on GeoRec. However, we strongly agree that a detailed follow-up study focusing on CEST and new GeoRec variants should investigate this further.

      Since the binding effects with gRNAs differ in the isolated domain and the full-length protein, we have tried not to over-analyze the impact of the relaxation data in this specific context. These data still provide useful information regarding the impact of point mutants on GeoCas9 domain biophysics, and MD simulations support the enhanced dynamics seen in CPMG and other relaxation data. However, the functional implication is clearly more complicated and requires further study.

      Mutations affect gRNA affinity - I am not convinced that affinity itself is significantly affected based on the MST data. This data could be reproduced as technical replicates to reduce the error bars, or another technique with less intrinsic noise (ITC, SPR) could be used to better support this claim. However, a 3-fold difference seen from NMR titrations could indicate a change in binding mode, for instance in koff. It would be interesting to obtain SPR or BLI data quantifying the kinetics of the interactions. Anyhow, this point should be more carefully discussed.

      We agree with the Reviewer on this point. We conducted additional replicates of MST trials, as well as new MST with a different gRNA sequence. Our updated analysis, including statistics, provides a better measure for “significance” in these data, which is now reported. We have also added some text discussing a possible change in binding mode, see Lines 256-259.

      We also carried out MST on full-length GeoCas9 with full-length gRNAs (the same two RNAs used as truncated constructs). We report these data in new Figure 7 and note there is essentially no difference between the gRNAs or the GeoCas9 variants under these conditions.

      Further, MD simulations suggest a change in binding energy associated with the gRNA interaction in the context of full-length GeoCas9. Since experimental studies are not able to parse these differences, collectively, we describe a scenario where the highly stable structure of GeoCas9 resists substantial mutation-induced change seen for analogous perturbations in SpCas9. See Lines 309-342, 414-418, and 448-461.

      Minor points:

      • Please detail how the error on R1 and R2 rates was calculated.

      We have included new text in Lines 514-518.

      • Please detail how hetNOE values were calculated (simply Isat/Iref?) and what values were used for Model Free.

      Yes, the Reviewer is correct. We have added specifically that we used Isat/Iref on Line 518.
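      A minimal sketch of the Isat/Iref calculation is shown below, together with a standard first-order error propagation from the spectral noise. The noise-based uncertainty is an assumption added for illustration; the response does not state how hetNOE errors were obtained.

```python
import math


def het_noe(i_sat, i_ref, noise_sat, noise_ref):
    """Return (NOE, uncertainty) from saturated and reference peak intensities.

    The uncertainty combines the relative noise of both spectra in quadrature.
    """
    noe = i_sat / i_ref
    rel_err = math.sqrt((noise_sat / i_sat) ** 2 + (noise_ref / i_ref) ** 2)
    return noe, abs(noe) * rel_err


if __name__ == "__main__":
    value, error = het_noe(i_sat=80.0, i_ref=100.0, noise_sat=2.0, noise_ref=2.0)
    print(f"NOE = {value:.2f} +/- {error:.3f}")
```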

      • Please elaborate on the Model Free analysis. What tensor was used for tumbling? What was the correlation time? This is needed to judge the trustworthiness of S2 parameters.

      We have included new text on Lines 520-526. The diffusion tensor used was an ellipsoid and the correlation time was 15.4 ns. The correlation time estimated from R2/R1 ratios was 16.3 ns.

      • Figure 1: Please indicate where Rec1 and Rec2 are located on panel A and indicate the residue assignments for each peak showcased in panel B.

      We have indicated the boundary of Rec1 and Rec2 in the new cartoon of Figure 1A. We have also noted the exact amino acids used for each construct in the Methods. We also added resonance labels to the spectral overlays in Figure 1B. We have done the same

      • Line 187: I believe this should refer to Figure S8C rather than Figure 3A.

      We have made this change.

      • Some fits of the CPMG curves look strange, e.g. R343 in Fig. 3B WT definitely does not contain significant μs-ms dynamics and should be excluded from the analysis. Please double-check each profile. Were other models besides CR72 not providing better fits?

      The Reviewer has made a very careful observation. Our intent was to highlight these sites on purpose to show differences in CPMG relaxation dispersion between WT and variant samples. This was provided as some evidence for the redistribution of dynamics between samples, as many different sites found to be “rigid” on the ms timescale in WT GeoRec2 were flexible in GeoRec2 variants. We agree, however, that this Figure panel was confusing and have therefore removed it in favor of simple discussion in the text.

      • To what degree are the CPMG dynamics correlated, can you provide statistical measures for the global fits?

      We compared the global fits to those of the individual fits for each residue and found them to be better for the global fit, based on the Akaike Information Criterion. For WT, the AIC showed the global fit to be ~10-fold better. For K267E, the global model was 4-fold better, and for R332A, the global model was 6-fold better.

      We have added language clarifying the use of AIC to the Methods section.

      • Error measured from replicates and p-values should be reported for DNA cleavage assays.

      We thank the Reviewer for pointing out this omission. We have included error bars on these plots.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      This work provides a new Python toolkit for combining generative modeling of neural dynamics and inversion methods to infer likely model parameters that explain empirical neuroimaging data. The authors provided tests to show the toolkit's broad applicability and accuracy; hence, it will be very useful for people interested in using computational approaches to better understand the brain.

      Strengths:

      The work's primary strength is the tool's integrative nature, which seamlessly combines forward modelling with backward inference. This is important as available tools in the literature can only do one and not the other, which limits their accessibility to neuroscientists with limited computational expertise. Another strength of the paper is the demonstration of how the tool can be applied to a broad range of computational models popularly used in the field to interrogate diverse neuroimaging data, ensuring that the methodology is not optimal to only one model. Moreover, through extensive in-silico testing, the work provided evidence that the tool can accurately infer ground-truth parameters, which is important to ensure results from future hypothesis testing are meaningful.

      We are happy to hear the positive feedback on our effort to provide an open-source and widely accessible tool for both fast forward simulations and flexible model inversion, applicable across popular models of large-scale brain dynamics.

      Weaknesses:

      Although the tool itself is the main strength of the work, the paper lacked a thorough analysis of issues concerning robustness and benchmarking relative to existing tools.

      The first issue is the robustness to the choice of features to be included in the objective function. This choice significantly affects the training and changes the results, as the authors even acknowledged themselves multiple times (e.g., Page 17 last sentence of first paragraph or Page 19 first sentence of second paragraph). This brings the question of whether the accurate results found in the various demonstrations are due to the biased selection of features (possibly from priors on what worked in previous works). The robustness of the neural estimator and the inference method to noise was also not demonstrated. This is important as most neuroimaging measurements are inherently noisy to various degrees.

      The second issue is on benchmarking. Because the tool developed is, in principle, only a combination of existing tools specific to modeling or Bayesian inference, the work failed to provide a more compelling demonstration of its added value. This could have been demonstrated through appropriate benchmarking relative to existing methodologies, specifically in terms of accuracy and computational efficiency.

      We fully agree with the reviewer that the VBI estimation heavily depends on the choice of data features, and this is the core of the inference procedure, not its weakness. We have demonstrated different scenarios showing how the informativeness of features (commonly used in the literature) results in varying uncertainty quantification. For instance, using summary statistics of functional connectivity (FC) and functional connectivity dynamics (FCD) matrices to estimate the global coupling parameter leads to fast convergence; however, it is not sufficient to accurately estimate the whole-brain heterogeneous excitability parameter, which requires features such as statistical moments of time series. VBI provides a taxonomy of data features that users can employ to test their hypotheses. It is important to note that one major advantage of VBI is its ability to make estimates using a battery of data features, rather than relying on a limited set (such as only FC or FCD) as is often the case in the literature. In the revised version, we will elaborate further by presenting additional scenarios to demonstrate the robustness of the estimation. We will also evaluate the robustness of the neural density estimators to (dynamical/additive) noise.

      More importantly, relative to benchmarking, we would like to draw attention to a key point regarding existing tools and methods. The literature often uses optimization for fitting whole-brain network models, and its limitations for reliable causal hypothesis testing have been pointed out in the Introduction/Discussion. As also noted by the reviewer under strengths, and to the best of our knowledge, there are no existing tools other than VBI that can scale and generalize to operate across whole-brain models for Bayesian model inversion. Previously, we developed Hamiltonian Monte Carlo (HMC) sampling for the Epileptor model in epilepsy (Hashemi et al., 2020, Jha et al., 2022). This phenomenological model is very well-behaved in terms of numerical integration, gradient calculation, and dynamical system properties (Jirsa et al., 2014). However, this does not directly generalize to other models, particularly the Montbrió model for resting-state, which exhibits bistability with noise driving transitions between states. As shown in Baldy et al., 2024, even at the level of a single neural mass model (i.e., one brain region), gradient-based HMC failed to capture such switching behaviour, particularly when only one state variable (membrane potential) was observed while the other (firing rate) was missing. Our attempts to use other methods (e.g., the second-derivative-based Laplace approximation used in Dynamic Causal Modeling) also failed, due to divergence in gradient calculation. Nevertheless, reparameterization techniques (Baldy et al., 2024) and hybrid algorithms (Gabrié et al., 2022) could offer improvements, although this remains an open problem for these classes of computational models.

      In sum, for oscillatory systems, it has been shown previously that the SBI approach used in VBI substantially outperforms both gradient-based and gradient-free alternative methods (Gonçalves et al., 2020, Hashemi et al., 2023, Baldy et al., 2024). Importantly, for bistable systems with switching dynamics, gradient-based methods fail to converge, while gradient-free methods do not scale to the whole-brain level (Hashemi et al., 2020). Hence, the generalizability of VBI relies on the fact that neither the model nor the data features need to be differentiable. We will clarify this point in the revised version. Moreover, we will provide better explanations for some terms mentioned by the reviewer in Recommendations.

      Hashemi, M., Vattikonda, A. N., Sip, V., Guye, M., Bartolomei, F., Woodman, M. M., & Jirsa, V. K. (2020). The Bayesian Virtual Epileptic Patient: A probabilistic framework designed to infer the spatial map of epileptogenicity in a personalized large-scale brain model of epilepsy spread. NeuroImage, 217, 116839.

      Jha, J., Hashemi, M., Vattikonda, A. N., Wang, H., & Jirsa, V. (2022). Fully Bayesian estimation of virtual brain parameters with self-tuning Hamiltonian Monte Carlo. Machine Learning: Science and Technology, 3(3), 035016.

      Jirsa, V. K., Stacey, W. C., Quilichini, P. P., Ivanov, A. I., & Bernard, C. (2014). On the nature of seizure dynamics. Brain, 137(8), 2210-2230.

      Baldy, N., Breyton, M., Woodman, M. M., Jirsa, V. K., & Hashemi, M. (2024). Inference on the macroscopic dynamics of spiking neurons. Neural Computation, 36(10), 2030-2072.

      Baldy, N., Woodman, M., Jirsa, V., & Hashemi, M. (2024). Dynamic Causal Modeling in Probabilistic Programming Languages. bioRxiv, 2024-11.

      Gabrié, M., Rotskoff, G. M., & Vanden-Eijnden, E. (2022). Adaptive Monte Carlo augmented with normalizing flows. Proceedings of the National Academy of Sciences, 119(10), e2109420119.

      Gonçalves, P. J., Lueckmann, J. M., Deistler, M., Nonnenmacher, M., Öcal, K., Bassetto, G., ... & Macke, J. H. (2020). Training deep neural density estimators to identify mechanistic models of neural dynamics. eLife, 9, e56261.

      Hashemi, M., Vattikonda, A. N., Jha, J., Sip, V., Woodman, M. M., Bartolomei, F., & Jirsa, V. K. (2023). Amortized Bayesian inference on generative dynamical network models of epilepsy using deep neural density estimators. Neural Networks, 163, 178-194.

      Reviewer #2 (Public review):

      Summary:

      Whole-brain network modeling is a common type of dynamical systems-based method to create individualized models of brain activity incorporating subject-specific structural connectome inferred from diffusion imaging data. This type of model has often been used to infer biophysical parameters of the individual brain that cannot be directly measured using neuroimaging but may be relevant to specific cognitive functions or diseases. Here, Ziaeemehr et al. introduce a new toolkit, named "Virtual Brain Inference" (VBI), offering a new computational approach for estimating these parameters using Bayesian inference powered by artificial neural networks. The basic idea is to use simulated data, given known parameters, to train artificial neural networks to solve the inverse problem, namely, to infer the posterior distribution over the parameter space given data-derived features. The authors have demonstrated the utility of the toolkit using simulated data from several commonly used whole-brain network models in case studies.
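      The simulate-then-invert idea in this summary can be illustrated with a deliberately simplified toy example. The sketch below does not use the VBI API or a neural network: it swaps in a linear-Gaussian conditional estimator, fitted by ordinary least squares, as a stand-in for the neural density estimator, and the one-parameter simulator and all function names are hypothetical.

```python
import math
import random
import statistics


def simulate_feature(theta, rng, n_samples=50):
    """Toy forward model: noisy samples around theta, summarized by their mean."""
    samples = [theta + rng.gauss(0.0, 1.0) for _ in range(n_samples)]
    return statistics.mean(samples)


def train_estimator(n_sims, rng, prior=(-2.0, 2.0)):
    """Draw parameters from a uniform prior, simulate features, and fit
    theta ~ Normal(intercept + slope * feature, sd) by least squares."""
    thetas = [rng.uniform(*prior) for _ in range(n_sims)]
    feats = [simulate_feature(t, rng) for t in thetas]
    mean_f, mean_t = statistics.mean(feats), statistics.mean(thetas)
    sxx = sum((f - mean_f) ** 2 for f in feats)
    sxy = sum((f - mean_f) * (t - mean_t) for f, t in zip(feats, thetas))
    slope = sxy / sxx
    intercept = mean_t - slope * mean_f
    residuals = [t - (intercept + slope * f) for f, t in zip(feats, thetas)]
    sd = math.sqrt(sum(r * r for r in residuals) / (n_sims - 2))
    return slope, intercept, sd


def posterior(estimator, observed_feature):
    """Amortized inference: no new simulations are needed at this step."""
    slope, intercept, sd = estimator
    return intercept + slope * observed_feature, sd


if __name__ == "__main__":
    rng = random.Random(0)
    estimator = train_estimator(2000, rng)
    observed = simulate_feature(1.0, rng)  # pretend this is empirical data
    mean, sd = posterior(estimator, observed)
    print(f"posterior mean = {mean:.2f}, posterior sd = {sd:.2f}")
```

      The trained estimator returns a posterior mean and width for any new feature value without further simulation, which is the amortization property noted under Strengths; a neural density estimator plays the same role when the posterior is not Gaussian.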

      Strengths:

      (1) Model inversion is an important problem in whole-brain network modeling. The toolkit presents a significant methodological step up from common practices, with the potential to broadly impact how the community infers model parameters.

      (2) Notably, the method allows the estimation of the posterior distribution of parameters instead of a point estimation, which provides information about the uncertainty of the estimation, which is generally lacking in existing methods.

      (3) The case studies were able to demonstrate the detection of degeneracy in the parameters, which is important. Degeneracy is quite common in this type of model. If not handled mindfully, it may lead to spurious or unstable parameter estimation. Thus, the toolkit can potentially be used to improve feature selection or to simply indicate the uncertainty.

      (4) In principle, the posterior distribution can be directly computed given new data without doing any additional simulation, which could improve the efficiency of parameter inference on the artificial neural network if well-trained.

      We thank the reviewer for the careful consideration of important aspects of the VBI tool, such as uncertainty quantification, degeneracy detection, parallelization, and amortization strategy.

      Weaknesses:

      (1) While the posterior estimator was trained with a large quantity of simulated data, the testing/validation is only demonstrated with a single case study (one point in parameter space) per model. This is not sufficient to demonstrate the method's accuracy and reliability, but only its feasibility. Demonstrating the accuracy and reliability of the posterior estimation in large test sets would inspire more confidence.

      (2) The authors have only demonstrated validation of the method using simulated data, but not features derived from actual EEG/MEG or fMRI data. So, it is unclear if the posterior estimator, when applied to real data, would produce results as sensible as using simulated data. Human data can often look quite different from the simulated data, which may be considered out of distribution. Thus, the authors should consider using simulated test data with out-of-distribution parameters to validate the method and using real human data to demonstrate, e.g., the reliability of the method across sessions.

      (3) The z-scores used to measure prediction error are generally between 1-3, which seems quite large to me. It would give readers a better sense of the utility of the method if comparisons to simpler methods, such as k-nearest neighbor methods, are provided in terms of accuracy.

      (4) A lot of simulations are required to train the posterior estimator, which seems much more than existing approaches. Inferring from Figure S1, at the required order of magnitudes of the number of simulations, the simulation time could range from days to years, depending on the hardware. Although once the estimator is well-trained, the parameter inverse given new data will be very fast, it is not clear to me how often such use cases would be encountered. Because the estimator is trained based on an individual connectome, it can only be used to do parameter inversion for the same subject. Typically, we only have one session of resting state data from each participant, while longitudinal resting state data where we can assume the structural connectome remains constant, is rare. Thus, the cost-efficiency and practical utility of training such a posterior estimator remains unclear.

      We agree with the reviewer that it is necessary to show results on larger synthetic test sets, and we will elaborate further by presenting additional scenarios to demonstrate the robustness of the estimation. However, there are some points raised by the reviewer that we need to clarify.

      The validation on empirical data was beyond the scope of this study, as it relates to model validation rather than the inversion algorithms. This is also because we aimed to avoid repetition, given that we have previously demonstrated model validation on empirical data using these techniques, for invasive sEEG (Hashemi et al., 2023), MEG (Sorrentino et al., 2024), EEG (Angiolelli et al., 2025) and fMRI (Lavanga et al., 2023; Rabuffo et al., 2025). Note that if the features of the observed data are not included during training, VBI ignores them, as it requires an invertible mapping function between parameters and data features.

      We have used z-scores and posterior shrinkage to measure prediction performance, as these are Bayesian metrics that take into account the variance of both prior and posterior rather than only the mean value or thresholding for ranking of the prediction used in k-NN or confusion matrix methods. This helps avoid biased accuracy estimation, for instance, if the mean posterior is close to the true value but there is no posterior shrinkage. Although shrinkage is bounded between 0 and 1, we agree that z-scores have no upper bound for such diagnostics.
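
      The two diagnostics mentioned above can be sketched as follows. This is a minimal illustration using commonly used definitions (the z-score as the distance between posterior mean and true value in units of posterior standard deviation, and the shrinkage as one minus the ratio of posterior to prior variance); the function names and the example distributions are hypothetical and are not taken from the VBI code.

```python
import numpy as np


def posterior_zscore(posterior_samples, true_value):
    """Distance of the posterior mean from the true value, in posterior std units."""
    samples = np.asarray(posterior_samples, dtype=float)
    return abs(samples.mean() - true_value) / samples.std()


def posterior_shrinkage(posterior_samples, prior_samples):
    """One minus the ratio of posterior variance to prior variance."""
    posterior = np.asarray(posterior_samples, dtype=float)
    prior = np.asarray(prior_samples, dtype=float)
    return 1.0 - posterior.var() / prior.var()


# Hypothetical example: a wide uniform prior and a narrow posterior
# whose mean sits one posterior standard deviation from the true value.
rng = np.random.default_rng(0)
prior = rng.uniform(0.0, 1.0, size=10_000)
posterior = rng.normal(loc=0.52, scale=0.02, size=10_000)

z = posterior_zscore(posterior, true_value=0.5)
s = posterior_shrinkage(posterior, prior)
print(f"z-score = {z:.2f}, shrinkage = {s:.3f}")
```

      Under these definitions, a well-identified parameter combines a small z-score with a shrinkage close to 1, whereas a posterior mean near the true value with shrinkage near 0 indicates that the data added little information beyond the prior.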

      Finally, the number of required simulations depends on the dimensionality of the parameter space and the informativeness of the data features. For instance, estimating a single global scaling parameter requires around 100 simulations, whereas estimating whole-brain heterogeneous parameters requires substantially more simulations. Nevertheless, we have provided fast simulations, and one key advantage of VBI is that simulations can be run in parallel (unlike MCMC sampling, which is more limited in this regard). Hence, with commonly accessible CPUs/GPUs, the fast simulations and parallelization capabilities of the VBI tool allow us to run on the order of 1 million simulations within 2–3 days on desktops, or in less than half a day on supercomputers at cohort level, rather than over several years! It has been previously shown that the SBI method used in VBI provides an order-of-magnitude faster inversion than HMC for whole-brain epilepsy spread (Hashemi et al., 2023). Moreover, after training, the amortized strategy is critical for enabling hypothesis testing within seconds to minutes. We agree that longitudinal resting-state data under the assumption of a constant structural connectome is rare; however, this strategy is essential in brain diseases such as epilepsy, where experimental hypothesis testing is prohibitive.

      We will clarify these points and better explain some terms mentioned by the reviewer in the revised manuscript.

      Hashemi, M., Vattikonda, A. N., Jha, J., Sip, V., Woodman, M. M., Bartolomei, F., & Jirsa, V. K. (2023). Amortized Bayesian inference on generative dynamical network models of epilepsy using deep neural density estimators. Neural Networks, 163, 178-194.

      Sorrentino, P., Pathak, A., Ziaeemehr, A., Lopez, E. T., Cipriano, L., Romano, A., ... & Hashemi, M. (2024). The virtual multiple sclerosis patient. IScience, 27(7).

      Angiolelli, M., Depannemaecker, D., Agouram, H., Regis, J., Carron, R., Woodman, M., ... & Sorrentino, P. (2025). The virtual parkinsonian patient. npj Systems Biology and Applications, 11(1), 40.

      Lavanga, M., Stumme, J., Yalcinkaya, B. H., Fousek, J., Jockwitz, C., Sheheitli, H., ... & Jirsa, V. (2023). The virtual aging brain: Causal inference supports interhemispheric dedifferentiation in healthy aging. NeuroImage, 283, 120403.

      Rabuffo, G., Lokossou, H. A., Li, Z., Ziaee-Mehr, A., Hashemi, M., Quilichini, P. P., ... & Bernard, C. (2025). Mapping global brain reconfigurations following local targeted manipulations. Proceedings of the National Academy of Sciences, 122(16), e2405706122.

      Recommendations for the authors:

      We appreciate the time and effort of the reviewers, and their insightful and constructive comments to improve the paper. We have now addressed the reviewers’ comments in our revised manuscript and provide here below detailed explanations of the changes.

      We have adapted the Wilson-Cowan model to follow the same brain network modeling notation as the other models (Fig. 3 in the main text and Figs. S2–S4 in the supplementary materials). Additionally, we have included multiple figures in the supplementary material presenting extensive in-silico testing to demonstrate the accuracy and reliability of the estimations across different configurations, as well as the sensitivity to both additive and dynamical noise.

      Reviewer #1 (Recommendations for the authors):

      (1) There were some inaccurate statements throughout the text that need to be corrected.

      a) In section 2.1, paragraph 1, the authors mentioned that they would describe network models corresponding to different types of neuroimaging recordings. This is inaccurate. The models were developed to approximate various aspects of the architecture of neural circuits. They were not developed per se to solely describe a specific neuroimaging modality.

      Thank you for pointing this out. We agree that our phrasing in Section 2.1, paragraph 1, was not clear that the network models were developed to generate neural activity at the source level, and that a projection needs to be established to transform the simulated neural activity into empirically measurable quantities, such as BOLD fMRI, EEG, or MEG. We have revised the wording in the revised manuscript to clarify this point accordingly.

      b) The use of the term "spatio-temporal data features" is misleading as there are no true spatial features extracted.

      We have clarified that, following Hashemi et al., 2024, we use the term spatio-temporal data features to refer to both statistical and temporal features derived from time series. In contrast, we refer to the connectivity features extracted from FC/FCD matrices as functional data features. We would like to retain this term, as it is used consistently in the code.

      (2) The authors need to improve the model descriptions in Equations (1)-(10). Several variables/parameters were not explained, limiting the accessibility of the work to those without prior experience in computational modeling.

      Thank you for pointing this out. In the revised manuscript, we have improved the model descriptions and now define all variables and parameters used in these equations.

      (3) Various things need further clarification and/or explanation:

      a) There is a need to highlight that the models section only provides examples of one of the many possible variants of the models. For example, the Wilson-Cowan model described is not your typical and more popular cortico-cortical-based Wilson-Cowan model. This is important to ensure that the work reflects an accurate account of the literature, avoiding future references that the models presented are THE models.

      This is a very important point. We have now highlighted that each model represents one of many possible variants. Moreover, we adapted the Wilson-Cowan model as a whole-brain network modeling approach to harmonize with all other models.

      b) In Figure 1, it is unclear where the empirical data come into play. The neural density estimator also sounds like a black box and needs further explanation (e.g., its architecture).

      Thank you for the careful reading. This is correct. We have now clarified where the empirical data enters as input to the neural density estimator and have added further explanation in section 2.2.

      c) There is also a need to better explain what shrinkage means and what the z-score vs shrinkage implies.

      We have elaborated on the definition of posterior z-score and shrinkage.

      d) It is unclear how the authors decided on the number of training samples to use.

      There is no specific rule for determining the optimal number of simulations required for training. In general, the larger the number of simulations, within the available computational budget, the better the posterior estimation is likely to be. In the case of synthetic data, we have monitored the z-score and posterior shrinkage to assess the quality and reliability of the inferred parameters. The required number also depends critically on the parameter dimensionality. For instance, when estimating only the global coupling parameter, a maximum of 300 simulations was used, demonstrating accurate estimation across models and different realizations (Fig S20), except for the Jansen-Rit model, where coupling did not induce a significant change in the intrinsic frequency of regional activity. We have now pointed this out in the discussion.

      e) In the Results section, paragraph 1, there is a need to clarify that "ground truth" is available because you simulate data using predefined parameters. In fact, these predefined parameters and how they were chosen to generate the observed data were never described in the text.

      The "ground truth" is often chosen randomly within biologically plausible ranges, typically with some level of heterogeneity, and this has now been highlighted.

      f) Can the authors comment on why the median of the posterior distributions (e.g., in Figure 4E) is actually far off from the ground truth parameters? This is probably understandable in the Jansen-Rit model due to complexity, but not obvious in the very low-dimensional Stuart-Landau oscillator model.

      This can happen due to non-identifiability in high-dimensional settings. Figure 4E represents the posterior estimation using the Jansen-Rit model with high-dimensional parameters. An accurate estimation close to the true values can be observed in the low-dimensional Stuart-Landau model, as shown in Figure 5.

      g) In Figure 7, the FC and FCD matrices look weird relative to those typically seen in other works.

      We have updated Figure 7. To the best of our ability, we have followed the code and parameters from Kong et al., Nat Commun 12, 6373 (2021), and the following repository: https://github.com/ThomasYeoLab/CBIG/blob/master/stable_projects/fMRI_dynamics/Kong2021_pMFM/examples/scripts/CBIG_pMFM_parameter_estimation_example.py

      We used 300 iterations of the CMA-ES method to optimize the parameters, with a window length of 60 s and TR = 0.72 s, yielding a 1118 × 1118 FCD matrix for each run. Nevertheless, some discrepancy with the shown FC/FCD can occur, depending on the convergence of the optimization process and on other model parameters.
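
      The reported matrix size is consistent with a sliding window of 83 volumes (60 s / 0.72 s, rounded down) moved one volume at a time over a run of 1200 volumes. The run length and the one-volume step are assumptions made for this illustration, as neither is stated above. A minimal sketch of the window count and of a generic sliding-window FCD computation (not the CBIG implementation) could look like this:

```python
import numpy as np


def n_windows(n_timepoints, window_len, step=1):
    """Number of sliding windows that fit in a time series."""
    return (n_timepoints - window_len) // step + 1


def fcd_matrix(bold, window_len, step=1):
    """Generic sliding-window FCD.

    bold: array of shape (n_timepoints, n_regions).
    Returns an (n_windows, n_windows) matrix of correlations between
    the upper-triangular FC entries of each pair of windows.
    """
    n_t, n_r = bold.shape
    upper = np.triu_indices(n_r, k=1)
    fc_vectors = []
    for start in range(0, n_t - window_len + 1, step):
        fc = np.corrcoef(bold[start:start + window_len].T)
        fc_vectors.append(fc[upper])
    return np.corrcoef(np.array(fc_vectors))


# Window length in volumes for a 60 s window at TR = 0.72 s.
window_len = int(60 / 0.72)
# Assumed run length of 1200 volumes (hypothetical, not stated in the text).
print(window_len, n_windows(1200, window_len))

# Small synthetic example to keep the demonstration fast.
rng = np.random.default_rng(0)
demo_bold = rng.standard_normal((50, 4))
demo_fcd = fcd_matrix(demo_bold, window_len=10)
print(demo_fcd.shape)
```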

      h) In Figure 8, results for the J parameter are missing. Also, the BOLD signal time series of some regions in Figure 8B looks very weird, with some having very large deflections.

      We have updated Figure 8. In this figure, the parameter J is not inferred; it is instead presented in the appendix (S18). Please note that the system is in a bistable regime. We have implemented the full Wong-Wang model (Deco, 2014, Journal of Neuroscience), with the external current and global coupling optimized (using CMA-ES) to maximize the fluidity of the FCD, as typically seen in other works:

      Author response image 1.

      i) On page 14, the authors mentioned that they perform a PCA on the FC/FCD matrices. Can the authors explain this step further and what it specifically gives out, as this is something unusual in the generative model fitting literature?

      Indeed, PCA is a widely used dimension reduction method in machine learning. Please note that in SBI, any dimensionality reduction technique, such as PCA, can be used, as long as it preserves information relevant to the target parameters.
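
      As an illustration of this step, the following minimal sketch projects vectorized connectivity features onto their leading principal components using a singular value decomposition. The function name, the feature dimensions, and the random input are hypothetical; this is not the VBI implementation.

```python
import numpy as np


def pca_features(features, n_components):
    """Project features onto their leading principal components.

    features: array of shape (n_simulations, n_features), e.g. the
    vectorized upper triangle of an FC matrix per simulation.
    Returns the component scores and the explained variance ratios.
    """
    centered = features - features.mean(axis=0)
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    scores = centered @ vt[:n_components].T
    explained = singular_values**2 / np.sum(singular_values**2)
    return scores, explained[:n_components]


# Hypothetical input: 200 simulations, each with the 45 upper-triangular
# entries of a 10-region FC matrix.
rng = np.random.default_rng(0)
fc_features = rng.standard_normal((200, 45))
scores, explained = pca_features(fc_features, n_components=3)
print(scores.shape, explained.round(3))
```

      The low-dimensional scores can then be passed to the density estimator in place of the full matrices; the explained variance ratios give a rough indication of how much of the feature variability is retained, although they do not by themselves guarantee that information about the target parameters is preserved.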

      j) On page 3, what does ABC in ABC methods stand for?

      ABC stands for Approximate Bayesian Computation, which is now spelled out in the text.

      Reviewer #2 (Recommendations for the authors):

      Overall, I found the paper well-written. These are basically just minor comments:

      We appreciate your positive feedback.

      (1) P3:

      - Amortization requires more explanation for the neuroscience audience.

      - What does ABC stand for?

      We have elaborated on amortization. ABC stands for Approximate Bayesian Computation, which is now spelled out in the text.

      (2) Section 2.1:

      Should clarify the parcellation used

      In section 2.1, we now mention that: “The structural connectome was built with TVB-specific reconstruction pipeline using generally available neuroimaging software (Schirner et al., Neuroimage 2015)”.

      (3) P20: The method for sensitivity analysis (Figure 5F) is not clearly described.

      We have now added a subsection in the Methods section to explain the sensitivity analysis.

      (4) P21: statement that 10k simulations took less than 1 min doesn't match info shown in Figure S1. Please clarify.

      This is correct, as for the Epileptor model, the total integration time is less than 100 ms. Due to the model’s stable behavior with a large time step and the use of 10 CPU cores, all simulations were completed in less than a minute. It has previously been reported (Hashemi et al., 2023) that each VEP run to simulate 100 s of whole-brain epileptic patterns takes only 0.003 s using a JIT compiler. The other models incur a higher computational cost due to longer integration durations and smaller time steps. We have clarified this point.

      (5) P23-24: the distribution of FCDs also doesn't match well even if we don't consider element-wise correspondence. Please clarify.

      This is correct, as we used summary statistics of the FCD, such as fluidity, and due to noise, each realization of the FCD matrix exhibits different element-wise correspondence. We have already mentioned this point.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Liang et al. have conducted a small-scale pilot study focusing on the feasibility and tolerability of low-dose chemotherapy combined with delayed immunotherapy in the neoadjuvant treatment of non-small cell lung cancer. The design of delayed immunotherapy after chemotherapy is relatively novel, while the reduced chemotherapy, although somewhat lacking in innovation, still serves as an early clue for exploring future feasible strategies. Also, the dynamic ctDNA and TCR profiles could give some important hints of intrinsic tumor reaction.

      However, as the author mentioned in the limitation part, due to the small sample size and lack of a control group, we cannot fully understand the advantages and disadvantages of this approach compared to standard treatment. Compared to standard immunotherapy, the treatment group in this study has three differences: (1) reduced chemotherapy, (2) the use of cisplatin instead of the commonly used carboplatin in neoadjuvant therapy trials, and (3) delayed immunotherapy. Generally, in the exploration of updated treatment strategies, the design should follow the principle of "controlling variables." If there are too many differences at once, it becomes difficult to determine which variable is responsible for the effects, leading to confusion in the interpretation of the results. Moreover, the therapeutic strategy may lack practical clinical operability due to the long treatment duration.

      Thank you for your advice. As you pointed out, incorporating too many variables can obscure research findings. Our study focuses on two primary objectives: (1) to demonstrate that our approach is less toxic than the standard regimen; and (2) to fully activate the immune system in order to achieve better therapeutic outcomes. Based on these two objectives, we reduced the chemotherapy dosage to alleviate toxicity, and delayed immunotherapy administration to limit the killing of activated immune cells by chemotherapy and thereby maximize the immune response. Therefore, the two variables of reduced chemotherapy and delayed immunotherapy are unified in this study. The reduction of cisplatin to 60mg/m2 is supported by data for Chinese patients, and a retrospective study conducted by our center found that delayed immunotherapy also has great therapeutic effects. Considering the hematological toxicity previously observed with carboplatin and albumin-bound paclitaxel, we replaced carboplatin with cisplatin to alleviate bone marrow suppression. Usually, our patients are hospitalized for 4-7 days to receive treatment and to allow observation and management of potential side effects, including nausea, vomiting, diarrhea, and bone marrow suppression. Therefore, administering immunotherapy on the 5th day is convenient and feasible.

      Furthermore, in the exploration of biomarkers, the authors emphasized the procedure of whole RNA sequencing in tumor tissues in the method section, and this was also noted in the flowchart in Figure 1. However, I didn't find any mention of RNA-related analyses in the Results section, which raises some concerns about the quality of this paper for me. If the authors have inadvertently omitted some results, they should supplement the RNA-related analyses so that I can re-evaluate the paper.

      Thanks for your comment. In this study, we employed a multi-omics approach involving whole transcriptome, ctDNA, and TCR sequencing to investigate the effects of a neoadjuvant treatment on NSCLC. The sequencing details are described in the Materials and Methods section. RNA-related analyses are presented in Figure S3. Given that our primary focus is on the impact of this modified treatment on immune cells, we estimated immune cell compositions using the xCell and ImmuCellAI algorithms based on the RNA sequencing results. The estimated immune cell profiles have been added to Supplementary Tables 5 and 6.

      To sum up, this article exhibits a certain degree of innovation. However, due to its intrinsic design defects and data omissions, the quality of the research warrants further improvement.

      Thanks for your comment. We have provided a more detailed explanation of the administration for all patients. Additionally, we have clarified and supplemented the sequencing results to enhance the clarity and overall quality of the article.

      Reviewer #2 (Public review):

      Summary:

      In this single center, single arm, open label non-randomised study the authors tested the use of paclitaxel at 180-220 mg/m2 and cisplatin at 60mg/m2 in patients with squamous NSCLC and pemetrexed at 500mg/m2 and cisplatin at 60mg/m2 in adenocarcinoma of lung origin in the neoadjuvant setting. The chemotherapy appears to have been given at a relatively standard dose; though the platin dose at 60mg/m2 is somewhat lower than has been used in the checkmate 816 trial (75mg/m2/dose), this is a well-established dose for NSCLC.

      Key differences to currently approved neoadjuvant chemo-ICI treatment is that anti-PD1 antibody sintilimab (at 200mg/dose) was given on day 5 and that only 2 cycles of chemotherapy were given pre surgery, but then repeated on two occasions post surgery. Between May/2020 and Nov/2023 50 patients were screened, 38 went on to have this schedule of tx, 31 (~82%) went on to have surgery and 27 had the adjuvant treatment. The rate of surgery is entirely consistent with the checkmate 816 data.

      Question to the authors:

      It would be very helpful to understand why 7 (~18% of the population) patients did not make it to surgery and whether this is related to disease progression, toxicity or other reasons for withdrawal.

      Thank you for your comment. No patients were denied surgery due to disease progression or side effects. Seven patients did not undergo surgery: 3 declined to undergo total pneumonectomy, 2 were unable to come to our hospital for treatment because of the COVID-19 pandemic, and 2 were ineligible for radical surgery due to tumor invasion of the arteries.

      The key clinical endpoints were pCR and mPR rates. 2/38 patients are reported to have achieved a radiological pCR but only 31 patients underwent surgery with histological verification. Supp table2 suggests that 10/31 patients achieved a pCR, 6/31 additional patients achieved a major pathological response and that 13/31 did not achieve a major pathological response.

      It would be really helpful for understanding the clinical outcome to present the histopathological findings in the text in a bit more detail and to refer the outcome to the radiological findings. I note that the reference for pathological responses incorrectly is 38 patients as only 31 patients underwent surgery and were evaluated histologically.

      Thanks for your comment. The ITT population consisted of 38 individuals, of whom 31 underwent surgery. After surgery, 18 patients achieved MPR (12 of whom achieved pCR) and 13 did not achieve MPR. For the ITT population, the pCR and MPR rates are therefore 12/38 (31.6%) and 18/38 (47.4%), respectively; for patients who completed surgery, the pCR and MPR rates are higher, at 12/31 (38.7%) and 18/31 (58.1%), respectively (Results, line 268 to 269).
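
      The percentages above follow directly from the reported counts, as the short check below illustrates. The variable names are chosen for illustration only.

```python
def rate(events, total):
    """Percentage of events in a population, rounded to one decimal place."""
    return round(100.0 * events / total, 1)


itt_population = 38   # intention-to-treat population
resected = 31         # patients who underwent surgery
pcr, mpr = 12, 18     # pCR cases are a subset of the MPR cases

rates = {
    "pCR (ITT)": rate(pcr, itt_population),
    "MPR (ITT)": rate(mpr, itt_population),
    "pCR (resected)": rate(pcr, resected),
    "MPR (resected)": rate(mpr, resected),
}
for name, value in rates.items():
    print(f"{name}: {value}%")
```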

      Author response image 1.

      The treatment was very well tolerated with only 1 grade 3 AE reported. The longer term outcome will need to be assessed over time as the cohort is very 'young'. It is not clear what the adjuvant chemo-ICI treatment would add and how this extra treatment would be evaluated for benefit - if all the benefit is in the neoadjuvant treatment then the extra post-operative tx would only add toxicity.

      Please consider what the two post-operative chemo-ICI cycles might add to the outcome and how the value of these cycles would be assessed. Would there be a case for a randomised assessment in the patients who have NOT achieved a mPR histologically?

      Thanks for your comment. The purpose of postoperative adjuvant therapy is to prevent recurrence and metastasis. Both the KEYNOTE-091 and IMpower010 clinical trials achieved positive results. The CheckMate-77T trial was designed as neoadjuvant therapy followed by surgery and adjuvant therapy, and resulted in significantly longer event-free survival than chemotherapy in patients with resectable NSCLC. We therefore designed this perioperative treatment, which is currently a common approach, hoping to reduce tumor burden and improve the surgical remission rate through neoadjuvant therapy, and to kill residual tumor cells and prolong DFS through adjuvant therapy. As for DFS, follow-up shows that there are currently 3 cases of recurrence, but the overall data are not yet mature (updated in Table S1). The reported side effects cover all patients who received neoadjuvant and adjuvant therapy, and the addition of immunotherapy shows no new safety signals.

      While the clinical dataset identifies that the proposed reduced chemo-ICI therapy has clinical merit and should be assessed in a randomized study, the translational work is less informative.

      Thanks for your comment. As mentioned in the limitations of the article, our research is preliminary and exploratory, and larger randomized studies are needed in the future.

      The authors suggest that the treatment has a positive impact on T lymphocytes. Blood sampling was done at day 0 and day 5 of each of the four cycles of chemotherapy with an additional sample post cycle 4. The authors state that data were analysed at each stage.

      The data in Figure 3B are reported for three sets of pairs: baseline to pre day 5 in cycle 1, day 5 to day 21 in cycle 1, baseline of cycle 2 to day 5. It remains unclear whether the datasets contain the same top 20 clones and it would be very helpful to show kinetic change for the individual 'top 20 clones' throughout the events in individual patients; as it stands the 'top20 clones' may vary widely from timepoint to timepoint. Of note, the figures do not demonstrate that the top 20 TCR clones were 'continuously increased'.

      Thanks for your comment. The data in Fig. 3B do not represent the overlapping top 20 clones across all samples but rather illustrate the changes in the individual top 20 clones for each patient. The changes in the top 20 TCR clones during neoadjuvant treatment for specific samples are shown in Fig. S1. Due to tumor heterogeneity, both within and between samples, the top 20 clones for each patient at the same time point may differ. Additionally, since the top 20 TCR clones can vary between stages as a result of antigen exposure over time, the top 20 clones for the same patient may also differ across different time points. Indeed, when analyzing the data, we measured the dynamic changes of the top 20 TCR clones across three stages in cycle 1, and describing these changes as "continuously increased" may not be entirely accurate. Therefore, we believe it is more accurate to describe it as a phased increase (Results, line 293).

      Instead, the data suggest that there are fluctuations in the relative distributions over time but that may simply be a reflection of shifts in T cell populations following chemotherapy rather than of immunological effects in the cancer tissue.

      Consistent with this the authors conclude (line 304/5): "No significant difference was observed in the diversity, evenness, and clonality of TCR clones across the whole treatment procedure" and this seems to be a more persuasive conclusion than the statement 'that a positive effect on T lymphocytes was observed' - where it is also not clear what 'positive' means.

      Thanks for your comment. The scores for diversity, evenness, and clonality assess changes in the overall TCR repertoire. In our cohort, we did not observe significant changes in these three metrics throughout the treatment process, indicating the overall stability of the TCR repertoire. Despite this overall stability, we observed a significant increase in the top 20 and large clones, which are representative of major TCR clone dynamics, during the treatment period. Additionally, integrating RNA results (Table S5-S6 and Fig. S3) from baseline and surgical samples, we found an increasing trend in the proportion of T cells following neoadjuvant therapy. Therefore, we suggest that the treatment has a positive effect on T lymphocytes.
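
      For readers unfamiliar with these repertoire-level metrics, the sketch below shows one common way to compute them from clone counts: Shannon entropy for diversity, Pielou's index for evenness, and one minus evenness for clonality. These definitions are an assumption made for illustration, since the exact formulas used in the study are not stated here, and the clone counts are hypothetical.

```python
import numpy as np


def repertoire_metrics(clone_counts):
    """Shannon diversity, Pielou evenness, and clonality from clone counts."""
    counts = np.asarray(clone_counts, dtype=float)
    counts = counts[counts > 0]
    freqs = counts / counts.sum()
    shannon = float(-np.sum(freqs * np.log(freqs)))
    # Evenness is undefined for a single clone; treat it as fully clonal.
    evenness = shannon / float(np.log(len(freqs))) if len(freqs) > 1 else 0.0
    return {
        "diversity": shannon,
        "evenness": evenness,
        "clonality": 1.0 - evenness,
    }


# Hypothetical repertoires: one perfectly even, one dominated by a single clone.
even_repertoire = repertoire_metrics([10, 10, 10, 10])
skewed_repertoire = repertoire_metrics([97, 1, 1, 1])
print(even_repertoire)
print(skewed_repertoire)
```

      Because these metrics summarize the whole repertoire, they can remain stable even when the combined frequency of the largest clones changes, which is why the two observations in the response are not contradictory in principle.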

      The text needs a more balanced representation of the data: only a small subset of four patients appear to have been evaluated to generate the data for figure 3B and only three patients (P5, P6, P7) can have contributed to figure 3C if the sample collection is represented accurately in Figure 3A.

      Thanks for your comment. In Fig. 3B, we utilized TCR data from six patients (P1, P2, P3, P10, P11, P12) for the period from day 1 to day 5 of cycle 1. For the period from day 5 of cycle 1 to day 1 of cycle 2, we used data from six patients (P1, P2, P5, P10, P11, P12). For the period from day 1 of cycle 2 to day 5 of cycle 2, we included data from five patients (P2, P4, P10, P11, P12). In Fig. 3C, we used TCR data from eight patients (P1, P2, P4, P6, P7, P10, P11, P12) to generate the images for cycle 1, and data from two patients (P6, P7) to create the images for cycle 3. Therefore, the sampling illustration in Fig. 3A is accurate.

      The text refers to flow cytometric results in SF3. However, no information is given on the flow cytometry in M&M, markers or gating strategy.

      Thanks for your comment. In this study, we performed tissue sampling and whole transcriptome sequencing at both the baseline and surgical stages. Based on the sequencing results, we evaluated T cell populations using two algorithms, xCell and ImmuCellAI, and detailed the analysis procedures in the Materials and Methods section. Additionally, we have included the assessment results from both algorithms in Supplementary Tables 5 and 6.

      Please consider changing the terminology of the 'phases' into something that is easier to understand. One option would be to use a reference to a more standard unit (cycle 1-4 of chemotherapy and then d0/d5/d21).

      Thanks for your advice. Since each treatment cycle consists of both chemotherapy and immunotherapy, with chemotherapy administered on day 1 and immunotherapy on day 5 of each cycle, blood samples are collected at these two time points. Following your suggestion, we will use the notation d0/d5 within each treatment cycle to better clarify this process for the readers.

      Please make it explicit in the text that molecular analyses were undertaken for some patients only, and how many patients contribute to the data in figures 3B-F. Figure 3A suggests paired mRNA data were obtained in 2 patients (P2 and P5) but I cannot find the results on these analyses; four individual blood samples to assess TCR changes in PH1/PH2/PH3 and PH4 were only available in four patients (P4, P5, P7, P9). Only three patients seem to have the right samples collected to allow the analysis for 'C3' in figure 3C.

      Thanks for your comment. In Fig. 3B and 3D, we used TCR data from six patients (P1, P2, P3, P10, P11, P12) for the period from day 0 to day 5 of cycle 1. For the period from day 5 of cycle 1 to day 0 of cycle 2, data from six patients (P1, P2, P5, P10, P11, P12) were used. For the period from day 0 of cycle 2 to day 5 of cycle 2, we included data from five patients (P2, P4, P10, P11, P12). In Fig. 3C and 3E, TCR data from eight patients (P1, P2, P4, P6, P7, P10, P11, P12) were used to generate the images for cycle 1, while data from two patients (P6, P7) were used to create the images for cycle 3. In Fig. 3F, all patients who underwent sequencing are included in the analysis, with each patient's data represented by dots of different colors.

      For the mRNA data, we sampled and sequenced five patients (P1, P2, P4, P5, P7) before treatment. During the surgical phase, we sampled and sequenced three patients (P2, P5, P6). The T cell assessments and comparisons based on the mRNA sequencing results are presented in Fig. S3 and Tables S5-S6.

      Please display for each of the 'top 20 clones' at any one timepoint how these clones evolve throughout the study; I expect that a clone that is 'top 20' at a given timepoint may not be among the 'top twenty' at all timepoints.

      Thanks for your comment. Yes, due to the heterogeneity of tumors, a variety of different antigens are exposed during the course of cancer treatment. As a result, the formation of TCR dominant clones is a dynamic process, with new dominant clones emerging at each stage. Therefore, the top 20 clones at each time point do not necessarily represent the overall top 20 clones across all time points. However, there is still some overlap in the dominant TCR clones. We have chosen to present the data from P2, which provides the most complete results throughout the entire treatment process.

      Author response image 2.

      Please also assess if the expanded clonotypes are present (and expanded) in the cancer tissue at resection, to link the effect in blood to the tumour. Given that tissue was collected for 31 patients, mRNA sequencing to generate TCR data should be possible to add to the blood analyses in the 12 patients in Figure 3A. Without this data no clear link can be made to events in the cancer.

      Thanks for your comment. Due to limitations in sampling conditions, we were unable to collect samples from all patients at every time point. As shown in Fig. 3A, we performed tissue sampling and RNA sequencing on five patients (P1, P2, P4, P5, P7) before treatment. During the surgical phase, we sampled and conducted RNA sequencing on three patients (P2, P5, P6). This study primarily focuses on TCR analysis in peripheral blood. The relationship between peripheral blood TCR and tissue TCR clones will be addressed in future research.

      Please provide in M&M the missing information on the flow cytometry methodology (instrument, antibody clones, gating strategy) and what markers were used to define T cell subsets (naïve, memory, central memory, effector memory).

      Thanks for your comment. In this study, we evaluated immune cells based on RNA sequencing results rather than using flow cytometry. Subsequently, we compared T cell subsets between the baseline and post-neoadjuvant treatment stages. The steps for RNA sequencing and the evaluation of immune cells using the xCell and ImmuCellAI algorithms are detailed in the Methods and Materials section. The comparison of T cell subsets is presented in Fig. S3. The estimated immune cell data have been added to Tables S5 and S6.

      The authors also describe that ctDNA reduces after chemo-ICI treatment. This is well documented in their data but ultimately irrelevant: if the cancer volume is reduced to the degree of a radiological or pathological response/complete response then the quantity of circulating DNA from the cancer cells must reduce. More interesting would be the question whether early changes predict clinical outcome and whether recurrent ctDNA elevations herald recurrence.

      Thanks for your comment. If the tumor responds to treatment, its volume will decrease. Over the long term, ctDNA levels in the blood are expected to decline. However, in the short term, as tumor cells are killed, there may be a surge of ctDNA released into the patient's bloodstream, potentially causing a rise in the maxVAF. Based on the current follow-up data, the ctDNA maxVAF for patient P8 has increased compared to baseline levels. However, given the relatively short follow-up period, no recurrence has been observed yet.
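
      For readers less familiar with the metric, maxVAF can be read as the largest variant allele frequency among the variants detected in one plasma sample. The following is only a minimal sketch under that assumption; the function name, the input layout, and the read counts are hypothetical and are not study data.

```python
def max_vaf(variants):
    """Return the largest variant allele frequency in one sample.

    `variants` is a list of (alt_reads, total_reads) pairs, a hypothetical
    input layout chosen for illustration only.
    """
    vafs = [alt / total for alt, total in variants if total > 0]
    return max(vafs) if vafs else 0.0


# Illustrative read counts (not taken from the study)
baseline_sample = [(30, 1000), (12, 800)]
followup_sample = [(5, 1000), (2, 800)]

baseline_max_vaf = max_vaf(baseline_sample)
followup_max_vaf = max_vaf(followup_sample)
```

      A rise or fall in this single number between two time points is what the text describes as a change in maxVAF.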

      Please probe whether the molecular data identify good radiological or pathological outcomes before cycle 2 is started and whether the ctDNA levels identify patients who will have a poor response and/or who relapse early.

      Thanks for your comment. Before initiating cycle 2 of treatment, we reviewed all patients who underwent ctDNA sequencing. Among them, Patients P1 to P4 were classified as MPR, whereas Patients P5 to P9 were categorized as non-MPR. Patients P7 and P8 showed a trend of increasing maximum variant allele frequency (maxVAF) in their ctDNA. Thus, 50% (2 out of 4) of the MPR patients could be identified as having potential issues through molecular testing before cycle 2. Additionally, only P3 experienced a recurrence, which was predicted by molecular testing prior to starting cycle 2.

      Author response image 3.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      I have some detailed comments for the authors:

      (1) Please explain the reason for putting forward the opinion that "cytotoxic drugs with standard doses and anti-PD1 antibody were administrated on the same day (9), which may result in unsatisfactory eradication rates and relatively high incidence of severe treatment-related adverse events (TRAEs)" (Page 3 Line 76), especially "unsatisfactory eradication rates". Is this based on actual evidence, or is it purely theoretical speculation?

      Thanks for your comment. Our team has conducted related research on the impact of the combined timing of PD-1/PD-L1 inhibitors and chemotherapy on outcomes in patients with refractory lung cancer. Our findings suggest that administering PD-1/PD-L1 inhibitors 1-10 days (especially 3-5 days) after chemotherapy is superior to administering them before or concurrently with chemotherapy in these patients, although this result needs to be further explored in prospective studies. We therefore infer that administering standard-dose cytotoxic drugs and anti-PD-1 antibody on the same day may lead to unsatisfactory eradication rates and more side effects.

      Yao W, Zhao X, Gong Y, Zhang M, Zhang L, Wu Q, et al. Impact of the combined timing of PD-1/PD-L1 inhibitors and chemotherapy on the outcomes in patients with refractory lung cancer. ESMO Open. 2021;6(2):100094.

      (2) Due to the lack of a control group, we cannot assess the advantages and disadvantages of this treatment strategy compared to standardized neoadjuvant immuno-chemotherapy. We can refer to historical data. In the current clinical trials on neoadjuvant chemotherapy combined with immunotherapy (CheckMate-816, etc), what is the proportion of patients who had their chemotherapy reduced due to adverse reactions? Is there a difference in their efficacy? This could serve as a good historical reference.

      Thanks for your comment. In CheckMate-816, the rates of discontinuation of neoadjuvant treatment in the treatment and control groups were 5.7% and 6.8%, respectively. No patients had their chemotherapy dose reduced due to intolerable side effects. However, finding a historical reference is an excellent suggestion, so we will check the details of other clinical trials.

      (3) Among the 38 patients, there are 21 cases of SCC and 17 cases of LUAD. From the protocol, it can be seen that SCC patients had both albumin-bound paclitaxel and cisplatin reduced, whereas LUAD patients did not have a reduction in pemetrexed, only in cisplatin. Considering the different pathological subtypes and treatment strategies, I suggest the author to present the efficacy data for SCC and LUAD separately rather than combining them together.

      Thanks for your comment. In this cohort of 31 patients who underwent pathological evaluation, the ratio of squamous cell carcinoma (SCC) to lung adenocarcinoma (LUAD) was 16 vs 15. Upon comparing the groups, no statistically significant difference was observed in the treatment efficacy between SCC and LUAD patients.

      Author response table 1.

      (4) In the discussion, the authors mention that during the adjuvant treatment phase, "no significant change was observed in evenness or clonality of TCR" (Page 13, Line 364). However, in Figure 3E, it can be seen that the evenness and clonality of TCR during the adjuvant treatment phase (i.e., C3) are significantly increased (P < 0.05).

      Thanks for your comment. For the TCR repertoire evenness and clonality, we present these metrics in Fig. S2 B-C. Throughout the treatment process of all patients, there were no significant changes in the Pielou index (representing evenness) or clonality. In Fig. 3E, we defined TCR clones with a frequency greater than 0.001 as "large clones" and examined their changes during cycle 1 and cycle 3. Therefore, although there was a significant increase in large clones during cycle 3, the overall TCR evenness and clonality did not show notable changes.
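
      The distinction drawn above between repertoire-wide metrics and "large clones" can be illustrated with a short sketch. It assumes Pielou evenness is Shannon entropy divided by the log of the number of clonotypes, and that clonality is one minus that value; these are common conventions but are not confirmed here as the exact formulas used in the study. The function name and the example counts are hypothetical.

```python
import math


def repertoire_metrics(clone_counts, large_clone_cutoff=0.001):
    """Summarise a TCR repertoire from per-clonotype read counts."""
    counts = [c for c in clone_counts if c > 0]
    total = sum(counts)
    freqs = [c / total for c in counts]
    # Shannon entropy (natural log) of the clonotype frequency distribution
    shannon = -sum(p * math.log(p) for p in freqs)
    # Pielou evenness: entropy normalised by its maximum, ln(number of clonotypes).
    # With fewer than two clonotypes evenness is undefined; 0.0 is returned by convention.
    pielou = shannon / math.log(len(freqs)) if len(freqs) > 1 else 0.0
    # Assumed definition: clonality = 1 - Pielou evenness
    clonality = 1.0 - pielou
    # "Large clones": frequency strictly greater than the cutoff (0.001 in the text)
    large = [p for p in freqs if p > large_clone_cutoff]
    return {
        "shannon": shannon,
        "pielou": pielou,
        "clonality": clonality,
        "n_large_clones": len(large),
        "large_clone_fraction": sum(large),
    }


# Illustrative repertoires (not study data)
even = repertoire_metrics([1, 1, 1, 1])
skewed = repertoire_metrics([9990] + [1] * 10)
```

      Because the large-clone count depends only on clonotypes above the cutoff, it can change between time points while evenness and clonality, which are computed over the whole repertoire, stay roughly constant.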

      (5) The authors indicated that low-dose chemotherapy does not inhibit TCR expansion; however, due to the lack of a control group, we cannot conclude that "standard doses would affect TCR expansion." To better explore this possibility, please analyze the differences in TCR expansion between patients with bone marrow suppression and those without.

      We analyzed the incidence of bone marrow suppression in patients who underwent blood TCR testing. The statistical results are shown in the figure below. Patients were grouped based on the presence or absence of bone marrow suppression to compare differences in TCR clonal dynamics between the two groups during neoadjuvant therapy. As shown in the figure below, patients in the non-bone marrow suppression group exhibited higher TCR diversity (SW score) during treatment compared to those in the bone marrow suppression group. During neoadjuvant therapy, the dominant clones in both groups significantly increased from c2d0 to c2d5. However, from c1d0 to c2d0, there was no significant change observed in the non-bone marrow suppression group, possibly due to the limited sample size. Additionally, Patient P11 in the non-bone marrow suppression group showed a downward trend in dominant clones from c1d5 to c2d0, which may have influenced the overall results for this group during this phase.

      Author response table 2.

      Author response image 4.

      (6) In the analysis of ctDNA maxVAF, I noticed that one patient showed a significant drop at T1 (after C1 chemotherapy), followed by a notable rebound at T2 (after C1 delayed immunotherapy), and then a decline again at T3 (after C2 chemotherapy). Theoretically, maxVAF can reflect tumor burden and should change in accordance with treatment response. Could this indicate that the patient has a poor response to the delayed immunotherapy without concurrent chemotherapy? Additionally, please examine this patient's efficacy separately. What is the status of dynamic TCR? Does it show a trend opposite to that of maxVAF?

      Thanks for your comment. For Patient P7, the radiological evaluation reached PR, while the pathological assessment was non-MPR. The naming of time points has been revised according to the requirements: T0, T1, T2, and T3 were changed to c1d0, c1d5, c2d0, and c2d5, respectively. Combining both radiological and pathological evaluations, the patient experienced a certain degree of tumor shrinkage during neoadjuvant therapy but still retained some residual tumor cells. Theoretically, maxVAF can reflect the tumor burden in the bloodstream as a real-time indicator of treatment response. For patients with long-term benefits, maxVAF is expected to decrease as tumors are eliminated. However, in the short term, the release of large amounts of clonal ctDNA from destroyed tumor cells may lead to a temporary increase in maxVAF. Therefore, it is not possible to conclude that this patient had an adverse response to delayed immunotherapy based on individual cases. The increase in maxVAF from c1d5 to c2d0 might result from the extensive release of newly exposed antigens. During this period, the top 20 and large clone TCRs did not show significant changes, suggesting that the patient's immune response was insufficient, leading to suboptimal neoadjuvant treatment efficacy and failure to achieve MPR. Additionally, there were no noticeable changes in maxVAF or TCR metrics from c1d0 to c2d0 for this patient, indicating that there is no evidence to suggest an inverse trend between TCR and maxVAF.

      Author response image 5.

      (7) In line with the previous question, another patient's maxVAF shows a significant rebound at T3. Please examine this patient's efficacy as well as the status of dynamic TCR.

      Thanks for your comment. For Patient P4, the radiographic assessment showed SD, while the pathological assessment indicated an MPR. Although the reduction rate of the tumor volume in this patient was low, the tumor cell content within the lesion was less than 10%, which suggests that this patient had a good response to neoadjuvant therapy. From c1d0 to c2d0, the maxVAF of this patient showed a downward trend, while there was no significant change in the dominant clone indices of the TCR. From c2d0 to c2d5, both the maxVAF and the TCR dominant clone indices increased significantly. This implies that this patient had a stronger immune response level compared to Patient P7.

      Author response image 6.

      Minor Comments:

      (1) Figure 2E shows only OS, but the corresponding description in the text mentions that OS and DFS are not reached.

      Thanks for your comment. Both OS and disease-free survival (DFS) records are available in Table S1. By January 31, 2025, the follow-up data were updated for 31 patients in Supplementary Table 1. Among them, three patients experienced tumor recurrence, one of whom passed away. Additionally, seven patients were lost to follow-up. As a result, neither OS nor DFS has reached the median number of events required for analysis, so we opted to display only OS in Fig. 2E.

      (2) In the Discussion section, it is mentioned that there is controversy regarding chemotherapy combined with immunotherapy. I disagree with this statement. I believe that chemotherapy combined with immunotherapy is a consensus. The wording should be revised accordingly.

      Thanks for your comment. Yes, as you said, the combination of chemotherapy and immunotherapy has become a consensus. What we intended to convey is that how best to optimize the administration timing and dosage merits further exploration. We will revise the text accordingly (Discussion, lines 328-331).

      (3) The authors mentioned that the study involves multi-omics, but only ctDNA and TCR levels are included, with no RNA-related content observed. Perhaps a different term could be used.

      Thanks for your comment. In this study, we employed a multi-omics approach involving whole transcriptome, ctDNA, and TCR sequencing. RNA-related analyses are presented in Figure S3. Given that our primary focus is on the impact of this modified treatment on immune cells, we used the RNA sequencing results to estimate immune cell compositions with the xCell and ImmuCellAI algorithms. The estimated immune cell profiles have been added to Supplementary Tables 5 and 6.

      Reviewer #2 (Recommendations for the authors):

      Additional comment to the authors:

      The methods section refers to mRNA sequencing of the tumour tissue to define immune cell populations. Figure 3A also identifies that up to two timepoints were to be sequenced for individual patients. I could not find the results in the document.

      Please review the methods section and remove experimental methods where no data are presented.

      Thanks for your comment. As shown in Fig. 3A, for the mRNA data, we sampled and sequenced five patients (P1, P2, P4, P5, P7) before treatment. During the surgical phase, we sampled and sequenced three patients (P2, P5, P6). We then used the RNA sequencing results to estimate immune cell compositions with the xCell and ImmuCellAI algorithms. The estimated immune cell data have been added to Supplementary Tables 5 and 6. The T cell proportion comparisons are shown in Fig. S3. Whole transcriptome sequencing and immune cell abundance estimation are described in detail in the Methods section.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer 1

      Since they used PBMCs, without other assays to confirm the cell subtypes, I am not sure if any of the heterogeneity they detected in 6 cytokine secretion would be able to relate back to biology.

      We agree with the reviewer that we cannot relate cytokine secretion back to specific cell populations and that part of the heterogeneity observed is due to various cellular populations and subpopulations. However, we would argue that the results obtained from measuring PBMCs especially relate to biology, not cellular identity, and provide useful information on how PBMCs will respond to a specific challenge since they offer more clinical relevance in patient stratification and monitoring. Thus, the possibility of identifying trends in polyfunctional cytokine secretion is not hindered by the isolated view of one specific cellular subpopulation. However, we agree that future experiments must identify the polyfunctional cells and decipher the extent of heterogeneity within the population.

      In addition, the two panels were measured on separate cells, I am not sure it is meaningful to make any comparisons of the two panels as they are on different cells.

      Thank you for mentioning this point. If this refers to Figure 3, where we compare the percentage of secreting cells across incubation times, each data point is an individual cell, and the cells were then pooled. It is true that, potentially, these could be similar cell types (a cell co-secreting TNFα/IL-6 could also co-secrete IL-8/MIP-1α). Since the cells originate from the same batch and stimulation and were only divided before encapsulation, we think it is a valid comparison, as this would also be done in ELISpot or similar techniques.

      Reviewer 2

      The conclusions of the study are based on samples from a single donor, which makes the conclusions on secretion patterns difficult to interpret. The choice of cytokines is explained, but the justification of the groupings of the antibodies into the two panels is missing.

      Thank you for highlighting this valid criticism. We chose to use cells from one donor to examine the secretion patterns observed in one individual, as cells from different individuals might respond differently. The focus of the experiments in this study was to describe secretion patterns with respect to incubation time and secreted cytokine; including multiple donors would address a different question (i.e., how polyfunctionality differs between individuals). The cytokines were grouped according to expected secretion to observe overlaps between different cell types (to increase the chance of seeing secretion from both panels simultaneously). We have added complementary text discussing the justification of cytokine grouping in the updated manuscript.

      It would further be helpful to discuss how the single cell incubation might affect the secretion dynamics vs. the influence of co-culture of all cell types during the 24 h activation.

      Thank you for this input. We discussed this potential limitation in detail in a previous publication (Portmann et al., Cell Reports Methods, 2023) and have added sentences addressing it to the discussion.

      The authors compare average secretion rates and levels. However, the right panel in Fig. 6 looks like there might be two different populations of mono- or polyfunctional cells that have two secretion rates. As the authors have single-cell data, I would find the separation into these populations more meaningful than comparing the mean values. In line with this comment, comparing the mean values for these cytokines instead of the mean of the populations with distinct secretion properties might actually show stronger differences than the authors report here.

      Thank you for this addition. This plot focuses on describing the relationship between secretion and incubation times. We agree that the data can be further divided into high and low secretion, with the respective averages plotted. However, we finally decided against such a solution to avoid bias due to small event counts in certain high- and low-polysecreting populations. We checked whether the dynamics differ between these populations, and the individual averages largely follow the overall trend, although on different plateaus; indeed, high-secreting cells will reach a plateau due to saturation. We have added the plot for IFNγ here to visualize this point.

      Author response image 1.

      Is the plateau of the cytokine concentration caused by the fluorescence signal saturating the camera, saturation of the magnetic beads, exhaustion of the fluorescent antibodies, or constant cytokine concentrations?

      Thank you for raising this point. At the individual-cell level, the cause of the plateau depends on the population. For high-secreting cells, the plateau is caused by assay capacity limitations, i.e., the capacity of the nanoparticles. For low-secreting cells, the plateau is caused by a cessation of secretion. This has been extensively discussed in Portmann et al., Cell Reports Methods, 2023.

      The high number of non-CSCs and the limited number of droplets decrease the statistical power of the method. The authors discuss their choice to use PBMCs and not solely T cells, but this aspect is missing in the discussion.

      As mentioned above, we chose PBMCs for their better representativeness and heterogeneity in clinical settings. Indeed, focusing on secreting cell subpopulations would increase the percentage and number of CSCs, but we found the method to be sufficiently statistically powerful for our measurements. However, we also agree with the comment raised by reviewer 1 that a focus on a specific cell population might be interesting for many questions and applications. We have added respective text to the discussion section.

      The absolute cell number is missing. This might also answer the question of whether polyfunctional cells turn into monofunctional cells after stimulation for 24 hours or if the monofunctional population expands more.

      We are unsure how to interpret this comment. If the reviewer refers to a potential expansion ex vivo over 24 h, we checked this for different conditions and did not observe cellular expansion within this timeframe; the numbers remained mostly stable, sometimes decreasing and only increasing with CD3/CD28 stimulation. However, an overall change in cell counts does not necessarily relate to the functionalities of individual cells. This observation, combined with our results, hints towards a dynamic cellular restriction of polyfunctionality, but it is not direct evidence for such a hypothesis, as individual cells would need to be followed in such an experiment over a much longer time frame.

      Fig. 4: Using a divergent colour scheme would be helpful. Fig. 6: Adding labels with the stimulation next to the plots would be helpful.

      We have changed the figures accordingly.

      A limitation of the approach is that the detection of polyfunctionality relies on how the three cytokines in each panel are selected and comparisons between the two panels are not otherwise helpful. Can the authors discuss how many panels would be needed to fully explore polyfunctionality among the six cytokines?

      Thank you for this comment. We agree that the identification of polyfunctional cells depends on panel selection and composition. We had to select panels and based our initial choice for this study on the expected secretion behavior of PBMCs, instead of engineering panels specific for one cell type. However, these panels can be adapted to study additional questions. Regarding the number of panels needed, dividing 6 cytokines into groups of 3 allows for 20 possible combinations. However, we very rarely see triple-positive polyfunctional cells, and not all combinations would make sense due to cellular restrictions and differences in stimulations.
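
      The count of 20 mentioned above is the binomial coefficient C(6, 3) and can be checked by enumeration. The sketch below uses placeholder labels C1 to C6 rather than the actual cytokine names, and it also counts, as an additional illustration, how many ways the six cytokines can be split into two complementary three-cytokine panels.

```python
from itertools import combinations
from math import comb

# Placeholder labels; the actual cytokine names are not used here
cytokines = ["C1", "C2", "C3", "C4", "C5", "C6"]

# Every possible three-cytokine panel
panels = list(combinations(cytokines, 3))
n_panels = len(panels)

# Unordered splits of all six cytokines into two complementary panels of three
splits = {
    frozenset([frozenset(panel), frozenset(set(cytokines) - set(panel))])
    for panel in panels
}
n_splits = len(splits)
```

      Each of the 20 panels has exactly one complement, so the 20 panels pair up into 10 distinct two-panel designs that cover all six cytokines.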

      Is there any way to increase the number of cytokines that could be detected in one droplet?

      This can be done on a lower throughput scale by removing the Cell Trace violet stain. This would allow the current method to measure up to 4 cytokines. An alternative would be adding different fluorophores without spectral overlap so that the throughput could increase to around 6-7 max, allowing us to measure polyfunctionality in a less biased manner. Other solutions are needed if >6-7 cytokines should be measured. Our experiments (with high-throughput cytokine detection systems, Fireplex and Isoplexis, i.e., 17-18 cytokines) showed that cells rarely secreted more than three cytokines at a time.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Ma, Yang et al. report a new investigation aimed at elucidating one of the key nutrients S. Typhimurium (STM) utilizes with the nutrient-poor intracellular niche within the macrophage, focusing on the amino acid beta-alanine. From these data, the authors report that beta-alanine plays an important role in mediating STM infection and virulence. The authors employ a multidisciplinary approach that includes some mouse studies and ultimately propose a mechanism by which panD, involved in B-Ala synthesis, mediates the regulation of zinc homeostasis in Salmonella. The impact of this work is questionable. There are already many studies reporting Salmonella-effector interactions, and while this adds to that knowledge it is not a significant advance over previous studies. While the authors are investigating an interesting question, the work has two important weaknesses; if addressed, the conclusions of this work and broader relevance to bacterial pathogenesis would be enhanced.

      Strengths:

      This reviewer appreciates the multidisciplinary nature of the work. The overall presentation of the figure graphics are clear and organized.

      Weaknesses:

      First, this study is very light on mechanistic investigations, even though a mechanism is proposed. Zinc homeostasis in cells, and roles in bacteria infections, are complex processes with many players. The authors have not thoroughly investigated the mechanisms underlying the roles of B-Ala and panD in impacting STM infection such that other factors cannot be ruled out. Defining the cellular content of Zn2+ STM in vivo would be one such route. Without further mechanistic studies, the possibility cannot be ruled out that the authors have simply deleted two important genes and seen an infection defect - this may not relate directly to Zn2+ acquisition.

      Thank you for your patient and thoughtful reading, as well as the constructive comments and advice regarding our manuscript. We have revised the manuscript based on your comments and suggestions.

      You are correct that this work has not thoroughly investigated the mechanisms underlying the roles of β-alanine, panD, and zinc in impacting Salmonella infection. It is challenging to isolate sufficient amounts of Salmonella from infected cells or tissues and then measure the zinc concentration in the bacteria, and we have attempted to do so without success. Therefore, we investigated the zinc content in mouse liver and RAW264.7 cells infected with Salmonella Typhimurium 14028s wild-type (WT) and panD mutant (ΔpanD), which can indirectly reflect zinc acquisition by intracellular Salmonella. We observed that the zinc content in ΔpanD-infected mouse liver macrophages and RAW264.7 cells was increased compared with that in WT-infected mouse liver macrophages and RAW264.7 cells, respectively (Figures 5E and 6A). This implies that the panD gene and β-alanine are important for Salmonella to absorb zinc from host cells. This information has been added to the revised manuscript (lines 325-329, 344-348).

      Meanwhile, we concur that additional, unknown mechanisms are involved in the virulence regulation by β-alanine in Salmonella. Our findings indicate that the double mutant ΔpanDΔznuA, which can neither synthesize β-alanine nor take up zinc, is more attenuated than the single mutant ΔznuA (Figures 5D and 6B). This suggests that the contribution of β-alanine to Salmonella's virulence is partially dependent on zinc acquisition. We have revised the related descriptions throughout the manuscript for clarity (lines 31, 304, 341, 1056, 1068).

      Second, the authors hint at their newly described mechanism/pathway being important for disease and possibly a target for therapeutics. This claim is not justified given that they have employed a single STM strain, which was isolated from chickens and is not even a clinical isolate. The authors could enhance the impact of their findings and relevance to human disease by demonstrating it occurs in human clinical isolates and possibly other serovars. Further, the use of mouse macrophage as a model, and mice, have limited translatability to human STM infections.

      We thank you for your comments and advice on our manuscript and are delighted to accept them. Salmonella Typhimurium causes systemic disease in mice, which is similar to the symptoms of typhoid fever in humans and has been widely used to explore the pathogenesis of Salmonella. Based on your comment, we have now performed additional experiments to confirm several key points of our findings in another typical Salmonella serovar, Salmonella enterica serovar Typhi, which is a human-limited serovar and the cause of typhoid fever in humans (PLoS Pathog. 2012, 8(10):e1002933).

      We constructed the panD mutant strain (ΔpanD) in the S. Typhi strain Ty2 and subsequently compared the replication of ΔpanD with that of the Ty2 wild-type in the human THP-1 monocyte-like cell line (ATCC TIB-22) using gentamicin protection assays. The results showed that the replication of ΔpanD in THP-1 cells was reduced by 2.6-fold at 20 h post-infection compared to the Ty2 wild-type strain (P < 0.01) (Figure 2-figure supplement 3), suggesting that panD also facilitates S. Typhi replication in human macrophages and may be involved in the systemic infection of S. Typhi in humans. This result has been included in the revised manuscript (lines 203-210).

      Based on these results, we speculate that PanD may serve as a potential target for treating Salmonella infection.

      Reviewer #1 (Recommendations for the authors):

      (1) Line 28. Latin phrases like de novo should be italicized.

      Thank you for your careful review. We have revised the manuscript thoroughly (Lines 28, 65, 77, 106, 171, 173, 214, 1002, 1023, 1078).

      (2) Line 45. 'survival' typo.

      We have corrected it in the revised manuscript (Line 45).

      (3) Line 57. What evidence or prior work supports the SCV of macrophages in a nutrient-poor environment? Citation needed.

      The relevant reference has now been added (lines 62-63).

      (4) Lines 65-68. If an 'increasing number of studies have focused' on this topic, please cite them here.

      The relevant reference has now been added (lines 72-73).

      (5) Lines 69-71. Citations are needed for these claims.

      The relevant reference has now been added (lines 76-77, 79-80).

      (6) Line 76-77. Citation needed for this claim.

      The relevant reference has now been added (lines 84, 86).

      (7) Line 116-122, and Figure 1C, and Figure 1 legend. An important claim in this work is that the amino acid content of the macrophage cytoplasm is different +/- STM infection. The authors need to explain this result more carefully and define their acronyms. What is VIP, Log2 FC, etc.? What do the colors in Figure 1C mean? They are not defined. If possible, it would be more approachable to list these as molar concentrations, weight/cell, or number of molecules/cell. The authors should calculate an effect size for each of these data to help assess if the differences are meaningful. Without this information, and a clearer explanation of what these data are, it is difficult to evaluate the authors' claim that "8 [amino acids] showed significant differences in abundance."

      Thank you for the comment. The full names of VIP (Variable Importance in the Projection) and FC (fold change) have been included in the revised manuscript. In Figure 1C of the original manuscript, pink represents the content of amino acids that increased following Salmonella infection, whereas blue signifies the content of amino acids that decreased after Salmonella infection.

      Based on your suggestion, we have revised Figure 1C (now Figure 1C, D in the revised manuscript) and the content of amino acids is now expressed as weight per cell (ng/10^7 cells). The legend has been updated accordingly (lines 9931-997).

      (8) Line 134-138. Additional controls are required for this experiment. By adding a nutrient (B-Ala) you have increased the nutrient availability and growth potential of the bacteria. This may not relate to anything special to B-Ala. Perhaps the addition of another amino acid, or sugar, would have a similar impact. Further, this result would be more compelling if the authors demonstrated a dose-dependent effect of B-Ala addition.

      Thank you for the comment. To further confirm that host-derived β-alanine can promote intracellular Salmonella replication, we have added varying concentrations of β-alanine (0.5, 1, 2, and 4 mM) to the culture medium (RPMI) of RAW264.7 cells. Subsequently, we infected these cells with Salmonella to assess the impact of β-alanine supplementation on the bacterium's replication within macrophages. Our observations indicate that the addition of 1, 2, and 4 mM β-alanine significantly (P < 0.001) enhanced Salmonella replication in RAW264.7 cells. Furthermore, the increase in Salmonella intracellular replication was dose-dependent, as illustrated in the revised Figure 1E. These findings suggest that host-derived β-alanine facilitates Salmonella replication inside macrophages. We have included these results in the revised manuscript (lines 141-149).

      (9) Lines 181-184, and Figure 2E. In addition to the fold-change replication data, here and elsewhere the authors should provide raw CFU counts for data transparency.

      Thank you for bringing this to our attention. In this work, we have utilized “fold intracellular replication (20 h intracellular bacterial CFU/ 2 h intracellular bacterial CFU)” to illustrate the differences in intracellular replication of different Salmonella strains in macrophages. The term “fold intracellular replication” is commonly employed in recently published reports (e.g., FEMS Microbiol Lett. 2024, 9;371:fnae067; mBio. 2024, 15(7):e0112824; Front Microbiol. 2024, 14:1340143). To ensure data transparency, we have included the raw CFU counts in the source data file.
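
      As a minimal sketch of the ratio defined above, fold intracellular replication can be computed as follows. The strain labels and CFU values are hypothetical placeholders chosen for illustration; they are not taken from the source data file, and the function name is our own.

```python
def fold_replication(cfu_2h: float, cfu_20h: float) -> float:
    """Fold intracellular replication = 20 h intracellular CFU / 2 h intracellular CFU."""
    if cfu_2h <= 0:
        raise ValueError("2 h CFU count must be positive")
    return cfu_20h / cfu_2h


# Hypothetical (2 h, 20 h) CFU counts per strain; illustrative values only.
counts = {
    "WT": (1.0e4, 2.0e5),
    "mutant": (1.0e4, 5.0e4),
}

# Compute the fold replication for each strain.
folds = {strain: fold_replication(*pair) for strain, pair in counts.items()}

for strain, fold in folds.items():
    print(f"{strain}: {fold:.1f}-fold")
```

      Reporting the ratio normalizes each strain to its own 2 h uptake, which is why the raw counts are still needed to judge absolute bacterial numbers.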

      (10) Line 197. Why employ i.p. injection of STM? As a non-typhoidal serovar, STM infection is enteric, and so i.p. injection seems very artificial if the goal is to understand the role B-Ala synthesis in disease.

      Thank you for the comment. Salmonella can induce gastroenteritis or systemic infection, which are associated with its capacity to invade intestinal epithelial cells and replicate within macrophages, respectively. In this study, using gentamicin protection assays and immunofluorescence analysis, we demonstrated that β-alanine is crucial for Salmonella replication inside macrophages. Since replication in macrophages is a key determinant of systemic Salmonella infection, we hypothesized that β-alanine also affects Salmonella systemic infection in vivo. Intraperitoneal (i.p.) injection enables Salmonella to disseminate directly to systemic sites via the lymphatic system and bloodstream, bypassing the need for intestinal invasion (Microbiol Res. 2023, 275:127460; Int Immunopharmacol. 2016, 31:233-8). Thus, we conducted the mouse infection assays via i.p. injection to ascertain whether β-alanine affects systemic Salmonella infection. We have included the description in the revised manuscript to enhance clarity (lines 217-221).

      Whether β-alanine influences Salmonella invasion of intestinal epithelial cells and intestinal colonization has not been investigated in this work; this issue will be explored in our future studies.

      (11) Line 207-214 and Figure 3. If the hypothesis is that B-Ala mediates STM survival/virulence through enhancing metabolism in the SCV and intracellular niche, why did the authors not investigate/enumerate STM in this niche in their in vivo studies?

      Thank you for the comment. Through immunofluorescence staining, we have investigated the bacterial count of Salmonella wild-type (WT), panD mutant (Δ_panD_), and complemented strain (c_panD_) within the macrophages of the mouse liver. The findings indicated that the number of Δ_panD_ in each liver macrophage was significantly (P < 0.0001) lower than that of WT, and the complementation of Δ_panD_ increased the bacterial count in each liver macrophage to the level of WT (refer to Figure 3E in the revised manuscript). These results have been included in the revised manuscript (lines 234-239).

      (12) Figure 4B - the down genes label is cut off.

      Thank you for your careful review. We have corrected it in the revised Figure 4B.

      (13) Line 260-265. SPI-2 needs to be defined and introduced, as do other terms here, to make the work approachable to non-STM specialists.

      The introduction of SPI-2 has been added to the revised manuscript (lines 290-292).

      (14) Line 300-301. Additional experiments are needed to support the claim that "data indicate that β-alanine promotes in vivo virulence of Salmonella, partially by increasing the expression of zinc transporter genes." Gene up- or down-regulation does not necessarily have any meaningful impact on function or activity. The authors here need an assay that confirms that the function of znuA is disrupted, such as examining the cell Zn2+ content in vivo at different levels of B-Ala exposure and/or panD activity. Moreover, more Zn2+ is not necessarily beneficial for STM, at levels too high zinc can exert cell toxicity. So, the authors have a correlation but no data supporting this mechanism explains their observations of virulence and infection. How much Zn2+ is ideal for STM growth?

      Thank you for the comment. It is challenging to isolate sufficient amounts of Salmonella from infected cells or tissues and then measure the zinc concentration in the bacteria, and we have attempted to do so without success. Therefore, we investigated the zinc content in mouse liver and RAW264.7 cells infected with Salmonella Typhimurium 14028s wild-type (WT) and panD mutant (ΔpanD), which can indirectly reflect zinc acquisition by intracellular Salmonella. We observed that the zinc content in Δ_panD_-infected mouse liver macrophages and RAW264.7 cells was increased compared with that in WT-infected mouse liver macrophages and RAW264.7 cells, respectively (Figures 5E and 6A). This implies that the panD gene and β-alanine are important for Salmonella to absorb zinc from host cells. This information has been added to the revised manuscript (lines 325-329, 344-348).

      Zinc is essential for bacterial survival and growth, as zinc-binding proteins constitute approximately 5% of the bacterial proteome and play crucial roles in bacterial metabolism and growth (J Proteome Res. 2006, 5(11):3173-8; Future Med Chem. 2017, 9(9):899-910). Regarding Salmonella, zinc is also employed to undermine the antimicrobial host defense mechanisms of macrophages, by inhibiting NF-κB activation and impairing NF-κB-dependent bacterial clearance (J Biol Chem. 2018, 293(39):15316-15329; Infect Immun. 2017, 85(12):e00418-17). Thus, the efficient acquisition of zinc may play a crucial role in the survival and replication of Salmonella within macrophages, where zinc availability is extremely limited (Infect Immun. 2007, 75(12):5867-76; Biochim Biophys Acta. 2016, 1860(3):534-41). It has been reported that Salmonella utilizes the high-affinity ZnuABC zinc transporter to maximize zinc availability within host cells (Infect Immun. 2007, 75(12):5867-76). Here, we discovered that β-alanine can enhance the expression of the zinc transporter genes znuABC, which might serve as a supplementary mechanism for the efficient uptake of zinc by Salmonella within macrophages.

      You are correct that more zinc is not necessarily beneficial for Salmonella, as excessive zinc can inhibit the growth of Salmonella. Considering that zinc availability is limited within macrophages and the znuABC genes are significantly upregulated when Salmonella resides inside macrophages (PLoS Pathog. 2015, 11(11):e1005262; Science. 2018, 362(6419):1156-1160), it is likely that zinc acts as a limiting factor and may not attain very high concentrations during Salmonella's growth within macrophages. We have included a discussion on this matter in the revised manuscript (lines 459-466).

      (15) Figure 6B. Related to the above, these data would be more compelling with higher n and a dose-dependent response demonstrated for Zn2+ addition. This is a central point of the manuscript, and effectively what the authors propose as the underlying mechanism, and it should be more robustly substantiated.

      Thank you for the comment. As stated in the previous response, we were unable to directly assess the bacterial zinc concentration during Salmonella growth within macrophages. Instead, we investigated the zinc content in mouse liver and RAW264.7 cells infected with Salmonella Typhimurium 14028s wild-type (WT) and panD mutant (ΔpanD), which can indirectly reflect zinc acquisition by intracellular Salmonella. We observed that the zinc content in Δ_panD_-infected mouse liver macrophages and RAW264.7 cells was increased compared with that in WT-infected mouse liver macrophages and RAW264.7 cells, respectively (Figures 5E and 6A). This implies that the panD gene and β-alanine are important for Salmonella to absorb zinc from host cells. Moreover, considering that zinc availability is limited within macrophages and the znuABC genes are significantly upregulated when Salmonella resides inside macrophages (PLoS Pathog. 2015, 11(11):e1005262; Science. 2018, 362(6419):1156-1160), it is likely that zinc acts as a limiting factor and may not attain very high concentrations during Salmonella's growth within macrophages.

      Reviewer #2 (Public review):

      Summary:

      Salmonella exploits host- and bacteria-derived β-alanine to efficiently replicate in host macrophages and cause systemic disease. β-alanine executes this by increasing the expression of zinc transporter genes and therefore the uptake of zinc by intracellular Salmonella.

      Strengths:

      The experiments designed are thorough and the claims made are directly related to the outcome of the experiments. No overreaching claims were made.

      Weaknesses:

      A little deeper insight was expected, particularly towards the mechanistic aspects. For example, zinc transport was found to be the cause of the b-alanine-mediated effect on Salmonella intracellular replication. It would have been very interesting to see which are the governing factors that may get activated or inhibited due to Zn accumulation that supports such intracellular replication.

      We appreciate your review and advice. To further investigate the mechanisms by which β-alanine, panD, and zinc influence Salmonella infection, we have conducted additional experiments as suggested. For instance, we examined the zinc content in mouse liver and RAW264.7 cells infected with Salmonella Typhimurium 14028s wild-type (WT) and panD mutant (Δ_panD_). This approach indirectly reflects zinc acquisition by intracellular Salmonella, as it is challenging to isolate sufficient amounts of the bacteria from infected cells or tissues for zinc concentration measurement. We observed that the zinc content in Δ_panD_-infected mouse liver macrophages and RAW264.7 cells was increased compared to that in WT-infected counterparts (Figures 5E and 6A). This suggests that the panD gene and β-alanine are crucial for Salmonella to absorb zinc from host cells. This new information has been included in the revised manuscript (lines 325-329, 344-348).

      Zinc is essential for bacterial survival and growth, as zinc-binding proteins constitute approximately 5% of the bacterial proteome and play crucial roles in bacterial metabolism and growth (J Proteome Res. 2006, 5(11):3173-8; Future Med Chem. 2017, 9(9):899-910). Regarding Salmonella, zinc is also employed to undermine the antimicrobial host defense mechanisms of macrophages, by inhibiting NF-κB activation and impairing NF-κB-dependent bacterial clearance (J Biol Chem. 2018, 293(39):15316-15329; Infect Immun. 2017, 85(12):e00418-17). Thus, efficient zinc uptake could be crucial for Salmonella survival and replication within macrophages, where zinc availability is extremely limited (Infect Immun. 2007, 75(12):5867-76; Biochim Biophys Acta. 2016, 1860(3):534-41). It has been reported that Salmonella exploits the high-affinity ZnuABC zinc transporter to maximize zinc availability in host cells (Infect Immun. 2007, 75(12):5867-76). Here, we discovered that β-alanine can enhance the expression of the zinc transporter genes znuABC, which might serve as a supplementary mechanism for the efficient uptake of zinc by Salmonella within macrophages. We have addressed this issue in the revised manuscript (lines 459-466).

      Reviewer #2 (Recommendations for the authors):

      A few general clarifications and suggested experiments:

      (1) Metabolome analysis: Salmonella can itself produce b-alanine. Given that it is isolated from infected cells where salmonella has scavenged b-alanine from host cytosol as well as produced it, how b-alanine levels went down in metabolome analysis is confusing.

      Thank you for the comment. The method for targeted metabolic profiling is conducted as outlined in a recently published paper by our group (Nat Commun. 2021, 12(1):879). To prevent delays and changes in metabolite concentrations during the separation of bacterial contents from macrophages, we determined the combined metabolite concentrations directly from infected cells and Salmonella. We observed that each Salmonella cell contained only 0.01%-0.02% of the concentration of each corresponding combined metabolite. Approximately 94% of the infected macrophages contained no more than ten bacteria at 8 hours post-infection, confirming that the combined metabolites were predominantly from the host. We have included an explanation of this issue in the method section (lines 557-560).

      (2) What is the basal level of b-alanine produced by macrophages? How was 1 mM conc. chosen?

      According to our results, the content of β-alanine in uninfected RAW264.7 cells is 26-33 μM/10^7 cells (700-900 ng/10^7 cells). The 1 mM concentration was chosen based on a published report (Appl Microbiol Biotechnol. 2004, 65(5):576-82).

      Additionally, we have supplemented the culture medium (RPMI) of RAW264.7 cells with 0.5, 1, 2, and 4 mM β-alanine and subsequently infected them with Salmonella to assess the impact of β-alanine supplementation on the bacterium's replication within macrophages. Our observations revealed that the supplementation with 1, 2, and 4 mM β-alanine significantly (P < 0.001) enhanced Salmonella replication in RAW264.7 cells. Furthermore, the addition of β-alanine to the infected cells resulted in a dose-dependent increase in Salmonella intracellular replication, as depicted in Figure 1E. These findings further support the notion that host-derived β-alanine facilitates Salmonella replication within macrophages. This data has been incorporated into the revised manuscript (lines 141-149).

      (3) The antimicrobial activity of macrophages preventing the growth of intracellular Salmonella will primarily be governed by genes such as GBPs, defensins, nitric oxide, etc. The expression of these genes should be tested rather than cytokines which are secreted with little effect on intracellular Salmonella.

      Thank you for the suggestion. We have investigated the levels of ROS (reactive oxygen species) and RNS (reactive nitrogen species) in Salmonella-infected RAW264.7 cells, both in the presence and absence of 1 mM β-alanine. The results indicated that β-alanine did not affect the ROS and RNS levels in RAW264.7 cells (Figure 1-figure supplement 1), suggesting that β-alanine does not influence the antimicrobial activity of macrophages. We have included these results in the revised manuscript (lines 150-153).

      (4) For animal experiments, how many times was the experiment repeated? Can the animal experiment be done with b-alanine supplementation and panD mutant? Can the liver be stained to detect the bacteria?

      Thank you for the comment.

      i) Mouse infection assays were conducted twice, with at least 2 mice (n ≥ 2) in each injection group. The combined data from the two experiments was used for statistical analysis. This information has been added to the revised manuscript (lines 678-681).

      ii) As suggested, mice infected with the panD mutant (Δ_panD_) were administered β-alanine (500 mg/kg/day, Behav Brain Res. 2014, 272:131-40; Physiol Behav. 2015, 145:29-37) orally on a daily basis. On the third day post-infection, the bacterial burden in the liver and spleen and the body weight of the infected mice were measured. The results indicated that administering β-alanine to mice did not affect the bacterial burden of ΔpanD in the liver and spleen nor did it influence the body weight of the infected mice (please refer to Author response image 1 below). It has been reported that β-alanine is a rate-limiting precursor for the biosynthesis of carnosine in mammals (Med Sci Sports Exerc. 2010, 42(6):1162-73; Neurochem Int. 2010, 57(3):177-88). Following supplementation, β-alanine may be rapidly synthesized into carnosine in mice, and the free β-alanine, particularly that which enters the macrophages of the liver and spleen, may be limited and insufficient to enhance Salmonella replication.

      Author response image 1.

      iii) Through immunofluorescence staining, we have investigated the bacterial count of Salmonella wild-type (WT), panD mutant (Δ_panD_), and complemented strain (c_panD_) within the macrophages of the mouse liver. The findings indicate that the number of Δ_panD_ in each liver macrophage was significantly (P < 0.0001) lower than that of WT, and the complementation of Δ_panD_ increased the bacterial count in each liver macrophage to the level of WT (Figure 3E in the revised manuscript). These results have been included in the revised manuscript (lines 234-239).

      Reviewer #3 (Public review):

      Summary:

      Salmonella is interesting due to its life within a compact compartment, which we call SCV or Salmonella containing vacuole in the field of Salmonella. SCV is a tight-fitting vacuole where the acquisition of nutrients is a key factor by Salmonella. The authors among many nutrients, focussed on beta-alanine. It is also known from many other studies that Salmonella requires beta-alanine. The authors have done in vitro RAW macrophage infection assays and In vivo mouse infection assays to see the life of Salmonella in the presence of beta-alanine. They concluded by comprehending that beta-alanine modulates the expression of many genes including zinc transporters which are required for pathogenesis.

      Strengths:

      This study made a couple of knockouts in Salmonella and did a transcriptomic investigation to understand the global gene expression pattern.

      Weaknesses:

      The following questions are unanswered:

      (1) It is not clear how the exogenous beta-alanine is taken up by macrophages.

      We thank the reviewer for the question. It has been reported that β-alanine is transported into eukaryotic cells via the TauT (SLC6A6) and PAT1 (SLC36A1) transporters (Acta Physiol (Oxf). 2015, 213(1):191-212; Am J Physiol Cell Physiol. 2020, 318(4):C777-C786; Biochim Biophys Acta. 1994, 1194(1):44-52).

      (2) It is not clear how the Beta-alanine from the cytosol of the macrophage enters the SCV.

      According to published reports, translocation of SPI2 effector proteins induces the formation of specific tubular membrane compartments that extend from the SCV, known as Salmonella-induced filaments (SIFs) (Traffic. 2001, 2(9):643-53; Traffic. 2007, 8(3):212-25; Traffic. 2008, 9(12):2100-16; Microbiology (Reading). 2012, 158(Pt 5):1147-1161). The membranes and lumens of both SIFs and SCVs form a continuous network, allowing vacuolar Salmonella to access various types of endocytosed materials (Front Cell Infect Microbiol. 2021, 11:624650; Cell Host Microbe. 2017, 21(3):390-402). We hypothesize that β-alanine may enter SCVs from the cytoplasm of macrophages via SIFs. This information has been included in the revised manuscript (lines 56-61).

      (3) It is not clear how the beta-alanine from SCV enters the bacterial cytosol.

      Thank you for the question. We have attempted to identify the transporter of β-alanine in Salmonella, but we found that the CycA transporter, which transports β-alanine in Escherichia coli, does not function in the same manner in Salmonella, despite Salmonella being closely related to E. coli.

      BasC is a bacterial LAT (L-Amino acid transporter) with an APC fold (J Gen Physiol. 2019, 151(4):505-517). The basC gene is reported to be present in the genomes of Pseudomonas, Acinetobacter, and Aeromonas, etc. Following your suggestion, we searched the genome of Salmonella Typhimurium at NCBI and did not find any basC gene or genes with a sequence similar to basC. Unfortunately, we have yet to identify the β-alanine transporter in Salmonella, and we will persist in our search in future work.

      (4) There is no clarity on the utilization of exogenous beta-alanine of the host and the de novo synthesis of beta-alanine by panD of Salmonella.

      Thank you for the comment. Our findings indicated that β-alanine levels were reduced in Salmonella-infected RAW264.7 cells. Furthermore, the addition of β-alanine to the culture medium (RPMI) of RAW264.7 cells significantly enhanced Salmonella replication, suggesting that the intracellular Salmonella utilize host-derived β-alanine for their growth. However, to date, we have not identified the transporter responsible for the uptake of exogenous β-alanine into the Salmonella cytosol.

      Moreover, we have discovered that the replication of the Salmonella panD mutant within macrophages and its virulence in mice are significantly reduced compared to the wild type (WT), indicating that the de novo synthesis of β-alanine is crucial for Salmonella's intracellular replication and virulence.

      These results indicate that either acquisition from the host or de novo synthesis of β-alanine is critical for Salmonella replication inside macrophages.

      Reviewer #3 (Recommendations for the authors):

      (1) Cite this paper from 1985, which talks about the role of beta-alanine in Salmonella infection: J Gen Microbiol. 1985 May;131(5):1083-90. doi: 10.1099/00221287-131-5-1083. A Salmonella typhimurium strain defective in uracil catabolism and beta-alanine synthesis. T P West, T W Traut, M S Shanley, G A O'Donovan.

      We have now cited this paper in the revised manuscript (lines 82-83).

      (2) BasC can be important for beta-alanine transport. CycA transporter was not found to be involved in beta-alanine transport. However, it is important to find out which transporter is required for the uptake of beta-alanine.

      Thank you for pointing it out. We agree that it is important to determine which transporter is necessary for the uptake of β-alanine in Salmonella. BasC is a bacterial LAT (L-Amino acid transporter) with an APC fold (J Gen Physiol. 2019, 151(4):505-517). The basC gene is reported to be present in the genomes of Pseudomonas, Acinetobacter, and Aeromonas, etc. Following your suggestion, we searched the genome of Salmonella Typhimurium at NCBI and did not find any basC gene or genes with a sequence similar to basC. Unfortunately, we have yet to identify the β-alanine transporter in Salmonella, and we will persist in our search in future work.

      (3) Bacteria being quite stringent with its energy resources, it is unlikely that it will use de novo synthesis if the host resources are available. Only if the host resources are depleted, can it turn on the de novo synthesis involving panD. What is the status of fold-replication of panD mutant in the presence of exogenous addition of beta-alanine?

      Thank you for the comment. The addition of 1 to 4 mM of β-alanine increased the replication of the panD mutant (Δ_panD_) in RAW264.7 cells by 1.7- to 3.1-fold. This increase in Salmonella intracellular replication was dose-dependent, as shown in Figure 2H of the revised manuscript, further illustrating that host-derived β-alanine promotes Salmonella replication inside macrophages.

      We agree that bacteria are quite stringent with their energy resources. The results of this work indicate that either acquisition from the host or de novo synthesis of β-alanine is critical for Salmonella replication inside macrophages. We speculate that Salmonella relies on a large amount of β-alanine to efficiently replicate in macrophages, thereby highlighting the importance of β-alanine for Salmonella intracellular growth. We have discussed this issue in the revised manuscript (lines 392-396).

      (4) 100% survival of animals infected with panD mutant is a bit of concern. What happens when beta-alanine is fed to mice and infected with panD mutant?

      Thank you for the comment. As suggested, mice infected with the panD mutant (ΔpanD) were administered β-alanine (500 mg/kg/day, as reported in Behav Brain Res. 2014, 272:131-40; Physiol Behav. 2015, 145:29-37) orally on a daily basis. On the third day post-infection, the bacterial load in the liver and spleen, as well as the body weight of the infected mice, were measured. The results indicated that administering β-alanine did not affect the bacterial load of Δ_panD_ in the liver and spleen nor did it influence the body weight of the infected mice (refer to Author response image 1). It has been reported that β-alanine is a rate-limiting precursor for the biosynthesis of carnosine in mammals (Med Sci Sports Exerc. 2010, 42(6):1162-73; Neurochem Int. 2010, 57(3):177-88). Following supplementation, β-alanine may be rapidly converted into carnosine in mice, and the free β-alanine, particularly that which enters the macrophages of the liver and spleen, may be limited and insufficient to enhance Salmonella replication.

      (5) How does beta-alanine from macrophages' cytosol enter the SCV?

      Thank you for pointing it out. According to published reports, the translocation of SPI2 effectors triggers the formation of specialized tubular membrane compartments, known as Salmonella-induced filaments (SIFs), which extend from the SCV (Traffic. 2001, 2(9):643-53; Traffic. 2007, 8(3):212-25; Traffic. 2008, 9(12):2100-16; Microbiology. 2012, 158:1147-1161). The membranes and lumens of SIFs and SCVs create a continuous network, allowing vacuolar Salmonella to access various types of endocytosed materials (Front Cell Infect Microbiol. 2021, 11:624650; Cell Host Microbe. 2017, 21(3):390-402). Consequently, it is plausible that β-alanine enters SCVs from the macrophage cytoplasm via SIFs. This information has been included in the revised manuscript (lines 56-61).

      (6) It would be essential to dissect the role of exogenous beta-alanine and the use of de novo synthesized beta-alanine.

      We agree that it is essential to dissect the role of exogenous β-alanine and the use of de novo synthesized β-alanine. Our results indicate that Salmonella-infected macrophages exhibited lower levels of β-alanine compared to mock-infected macrophages. Furthermore, β-alanine supplementation in the cell medium enhanced Salmonella replication within macrophages in a dose-dependent manner, revealing that Salmonella utilizes host-derived β-alanine to promote intracellular replication. Additionally, a deficiency in the biosynthesis of β-alanine, resulting from mutation of the rate-limiting gene panD, led to reduced Salmonella replication in macrophages and systemic infection in mice. This suggests that Salmonella also employs bacterial-derived β-alanine to enhance intracellular replication and pathogenicity.

      We sought to identify the main transporters responsible for β-alanine uptake in Salmonella. Unfortunately, we have not yet found the transporter. We will address this issue in our future work.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This study investigated the factors related to understudied genes in biomedical research. It showed that understudied genes are largely abandoned at the writing stage, and it identified a number of biological and experimental factors that influence which genes are selected for investigation. The study is a valuable contribution to this branch of meta-research, and while the evidence in support of the findings is solid, the interpretation and presentation of the results (especially the figures) needs to be improved.

      We thank the editor and reviewers for their detailed and thoughtful assessment of our work. Below, we present detailed responses to reviewers’ comments and suggestions. We are also submitting a version edited for clarity of presentation and precision of interpretation.

      Following the eLife assessment, we also tried to identify further statements where results could be presented in a more precise way.

      First, in the section Subsequent reception by other scientists does not penalize studies on understudied genes, we now state “This result again opposes the hypothesis that less-investigated genes will yield articles with lower impact.”

      Second, in section Identification of biological and experimental factors associated with selection of highlighted genes, we now state:

      “We cautiously hypothesize that this might reflect on many different research groups producing reagents surrounding the genes that they actively study. The most informative continuous factor is the number of research articles about a gene (Figure 1B).”, removing claims of causality.

      Finally, for improved readability, we have moved all supplemental tables into separate .xlsx files.

      Reviewer #1 (Public Review):

      Summary and strengths

      The authors tried to address why only a subset of genes are highlighted in many publications. Is it because these highlighted genes are more important than others? Or is it because there are non-genetic reasons? This is a critical question because in the effort to discover new genes for drug targets and clinical benefit, we need to expand a pool of genes for deep analyses. So I appreciate the authors' efforts in this study, as it is timely and important. They also provided a framework called FMUG (short for Find My Understudied Gene) to evaluate genes for a number of features for subsequent analyses.

      We thank the reviewer for their insightful comments and are pleased that the reviewer shares our appreciation for the gravity of these questions. As the reviewer emphasizes, it is critical to understand whether the choice of genes reflects their importance or non-genetic reasons. Previously we and others demonstrated that this choice does not reflect biological importance, when the latter is assessed through unbiased genome-wide data (e.g.: Haynes et al., 2018; Stoeger et al. 2018). Now we contribute to this critical question by systematically evaluating individual non-genetic reasons. We address the reviewer’s comments below.

      Weaknesses

      Many of the figures are hard to comprehend, and the figure legends do not sufficiently explain them.

      For example, what was plotted in Fig 1b? The number of articles increased from results -> write-ups -> follow-ups in all four categories with different degrees. But it does not seem to match what the authors meant to deliver.

      We apologize for the lack of clarity. We identified two interrelated elements that we have now fixed: i) the prior figure legend provided for each genomics approach n number of articles, such as “GWAS (n=450 articles)”; ii) the prior y-axis was labelled “Number of articles”.

      Addressing the first element, we now rephrased the legend for clarity:

      “b, We identified articles reporting on genome-wide CRISPR screens (CRISPR, 15 focus articles and 18 citing articles), transcriptomics (T-omics, 148 focus articles and 1,678 citing articles), affinity purification–mass spectrometry (AP-MS, 296 focus articles and 1,320 citing articles), and GWAS (450 focus articles and 3,524 citing articles). Focusing only on protein-coding genes (white box plot), we retrieved data uploaded to repositories describing which genes came up as “hits” in each experiment (first colored box plot). We then retrieved the hits mentioned in the titles and abstracts of those articles (second colored box plot) and hits mentioned in the titles and abstracts of articles citing those articles (third colored box plot). Unique hit genes are only counted once.”

      The number of genes in each box plot is now reported in the x-axis labels for each step. For example, the results for CRISPR were obtained from 15 focus studies (original research) and 18 subsequent studies (papers citing focus articles). Those 15 studies identified 9,268 genes where loss-of-function changed phenotypes but, in their titles and abstracts, mentioned only 18 of those 9,268 genes. While the 9,268 hit genes have received similar research attention to the entirety of protein-coding genes, the 18 hit genes mentioned in the title or abstract are significantly more well studied. The articles citing the focus articles also only mentioned in their titles and abstracts 19 highly studied hit genes.

      Addressing the second element, we updated the axis label to “Number of articles about gene” to distinguish it from the number of articles mentioned in the legend and to convey that this is the number of articles about each gene that were published independently of the genomics assays we inspect. To further underscore this point, we now label the “20% highest-studied genes” that we mention in the main text, and reworded the figure caption to better capture where the critical increase occurs: “A shift in focus towards well-studied genes occurs during the summarization and write-up of results and remains in subsequent studies.”

      Fig 4 is also confusing. It appears that the genes were clustered by many features that the authors developed. But does it have any relationship with genes being under- or over-studied?

      We again apologize for the lack of clarity. As is described in the main text, while the results of Figs. 1-2 suggest that gene popularity may predict the highlighting of a differentially expressed gene in the title or abstract, we want to conduct a systematic analysis of the factors that correlate with such a decision. We thus build a set of 45 factors that have been discussed as explaining why some genes receive increased research attention.

      The data in Fig. 4 shows that those 45 factors are not independent but that some are highly correlated. Because of those correlations, we are able to select a smaller number as representative of the full set. Those are the default factors shown to users of FMUG. While users can choose all factors that are significantly correlated with the highlighting in title or abstract, the default of presenting factors representing different clusters of factors enabled us to limit the number of factors that are initially displayed.

      Please note that following the suggestion of Reviewer 3, we have now moved this Figure to the supplemental material, as Figure S11.

      Reviewer #2 (Public Review)

      Summary and strengths

      In this manuscript the authors analyse the trajectory of understudied genes (UGs) from experiment to publication and study the reasons for why UGs remain underrepresented in the scientific literature. They show that UGs are not underrepresented in experimental datasets, but in the titles and abstracts of the manuscripts reporting experimental data as well as subsequent studies referring to those large-scale studies. They also develop an app that allows researchers to find UGs and their annotation state. Overall, this is a timely article that makes an important contribution to the field. It could help to boost the future investigation of understudied genes, a fundamental challenge in the life sciences. It is concise and overall well-written, and I very much enjoyed reading it. However, there are a few points that I think the authors should address.

      We thank the reviewer for their kind assessment.

      Weaknesses

      The authors conclude that many UGs "are lost" from genome-wide assay at the manuscript writing stage. If I understand correctly, this is based on gene names not being reported in the title or abstract of these manuscripts. However, for genome-wide experiments, it would be quite difficult for authors to mention large numbers of understudied genes in the abstract. In contrast, one might highlight the expected behaviour of a well-studied protein simply to highlight that the genome-wide study provides credible results.

      We agree that it is not reasonable to expect a title or abstract to highlight hundreds or even thousands of differentially expressed genes. We’ve now extended our Study Limitations section to address this:

      “we take a gene being mentioned in the title or abstract of an article as a proxy for a gene receiving attention by the article’s authors. The title and abstract are space-limited and thus cannot accommodate discussion of large numbers of genes.”

      We also agree that highlighting the expected behavior of a well-studied protein may provide credibility to a study and increase confidence in its other results. The soundness of such a strategy was quantitatively examined by Uzzi et al. (Science 2013), which we now include in the section on study limitations as:

      “authors beginning manuscripts with something familiar before introducing something new”.

      To convey the practical limitation of abstracts needing to be concise, we added the following sentence to our discussion section, when suggesting controlled trials that add genes to abstracts:

      “This intervention would need to be carefully designed since abstracts are limited in their size.”

      To avoid over-interpretation we have in the discussion also extended the sentence on “lost in a leaky pipeline” to “lost to titles and abstracts of research articles in a leaky pipeline”.

      Our focus on titles and abstracts has been equally motivated by their availability (full text still is often behind paywalls and/or not accessible for bulk-download and text-mining) and by abstracts being the most visible and most read parts of research articles (e.g.: bioRxiv estimates that for the preprint for the present manuscript, the abstract was read ~10 times more frequently than full-text HTML and 4 times more frequently than the pdf).

      Could this bias the authors' conclusions and, if so, how could this be addressed? For example, would it be worth to normalise studies based on the total number of genes they cover?

      We previously described that – in line with the reviewer’s expectations – unstudied genes are preferentially added to the title or abstract of articles that feature more genes in the title or abstract (Stoeger et al., PLOS Biology, 2022; Fig. 2B). Normalizing by the total number of genes should thus preserve the pronounced division between well-studied genes and unstudied genes shown in Figure 1B. In line with these predictions, we randomly select one gene per title/abstract and find that the effect remains (see new Figure S7).
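
      The random selection described above can be sketched as follows. This is a minimal illustration rather than the authors' code: the function name, the fixed seed, and the gene and article identifiers are hypothetical, and uniform sampling within each article is an assumption.

```python
import random


def sample_one_gene_per_article(article_genes, seed=0):
    """Pick one gene uniformly at random from each article's title/abstract genes.

    `article_genes` maps an article identifier to the genes mentioned in its
    title or abstract. Articles that mention no gene are skipped.
    """
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    sampled = {}
    for article_id, genes in sorted(article_genes.items()):
        if genes:
            sampled[article_id] = rng.choice(sorted(genes))
    return sampled


# Hypothetical input, for illustration only.
example = {
    "article_A": ["GENE_1", "GENE_2", "GENE_3"],
    "article_B": ["GENE_4"],
    "article_C": [],
}
picked = sample_one_gene_per_article(example)
```

      Each article then contributes exactly one gene, so articles that list many genes no longer weigh more heavily than articles that list one.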

      Author response image 1.

      Figure 1B is confusing in its present form. I think the plot and/or the legend need revising. For example, what "numbers to the right of each box plot" are the authors referring to? Also, I assume that the filled boxes are understudied genes and the empty/white box is "all genes", but that's not explained in the legend. In the main text, the figure is referred to with the sentence "we found that hit genes that are highlighted in the title or abstract are strongly over-represented among the 20% highest-studied genes in all biomedical literature ". I cannot follow how the figure shows this. My interpretation is that the y-axis is not showing the number of articles, but represents the percentage of articles mentioning a gene in the title/abstract, displayed on a log scale. If so, perhaps a better axis labels and legend text could be sufficient. But then one would also need to somehow connect this to the statement in the main text about the 20% highest-studied genes (a dashed line?). Alternatively, the authors could consider other ways of plotting these data, e.g. simply plotting the "% of publication in which a gene appears" from 0-100% or so.

      Reviewer 1 raised a similar point on overall figure clarity. We identified two interrelated elements that contribute to overall confusion and have now fixed them (see response to Reviewer 1 beginning on page 2 of this document).

      We attempted an alternative plotting of Fig 1B according to the reviewer’s suggestion. In the version below, the y-axis instead shows the percent of gene-related articles that are about each gene. We chose to keep the original y-axis (showing number of articles about each gene) as it additionally conveys the absolute scale of scholarship on individual genes.

      Author response image 2.

      Reviewer #3 (Public Review):

      Summary and strengths

      The manuscript investigated the factors related to understudied genes in biomedical research. It showed that understudied genes are largely abandoned at the writing stage and identified biological and experimental factors associated with selection of highlighted genes.

      It is very important for the research community to recognize the systematic bias in research of human genes and take precautions when designing experiments and interpreting results. The authors have tried to profile this issue comprehensively and promoted more awareness and investigation of understudied genes.

      We thank the reviewer for their kind assessment of our work.

      Weaknesses

      Regarding result section 1 "Understudied genes are abandoned at synthesis/writing stage", the figures are not clear and do not convey the messages written in the main text. For example, in Figure 1B, figure S5 and S6,

      • There is no "numbers to the right of each box plot".

      The “numbers to the right” statement in the caption was an erroneous inclusion from an earlier version of the figure. We apologize for our error and have now removed this statement.

      • Do these box plots only show understudied genes? How many genes are there in each box plot? The definition and numbers of understudied genes are not clear.

      The x-axis describes genes featured in each stage of the publication process (from all protein-coding genes to genes found as hits in genome-wide screen to genes found in the title/abstract to genes found in the title/abstract of citing articles) and the y-axis describes the number of articles annotated to those genes. We have also now added the number of genes in each box plot to the figure. This information is also in Materials and Methods under each technology’s heading (see also response to Reviewer 1 beginning on page 2 of this document).

      Author response image 3.

      • "We found that hit genes that are highlighted in the title or abstract are strongly over-represented among the 20% highest-studied genes in all biomedical literature (Figure 1B)". This is not clear from the figure.

      We have revised Figure 1B and its caption to better communicate the main point of the figure: that genes which make it to the title/abstract of the reporting article tend to be more popular than genes which are hits in genome-wide experiments from those articles. We have added a horizontal line that shows the cutoff for the top 20% most popular genes.
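
      A cutoff of this kind can be computed as in the sketch below. It assumes that "top 20%" means the 20% of genes with the most articles and that the cutoff is the smallest article count within that group; the function name and the counts are hypothetical.

```python
def top_fraction_cutoff(article_counts, top_percent=20):
    """Return the smallest article count that still falls in the top `top_percent` percent."""
    if not article_counts:
        raise ValueError("article_counts must not be empty")
    ranked = sorted(article_counts, reverse=True)
    # Integer arithmetic avoids floating-point rounding; at least one gene is kept.
    n_top = max(1, len(ranked) * top_percent // 100)
    return ranked[n_top - 1]


# Hypothetical per-gene article counts, for illustration only.
counts = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
cutoff = top_fraction_cutoff(counts)
```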

      Regarding result section 2 "Subsequent reception by other scientists does not penalize studies on understudied genes", the authors showed in figure 2 that there is a negative correlation between articles per gene before 2015 and median citations to articles published in 2015. Another explanation could be that for popular genes, there are more low-quality articles that didn't get citations, not necessarily that less popular genes attract more citations.

      We believe that the two explanations for the observed phenomenon are not mutually exclusive. Previously, we focused on the median of citations to articles about a gene to capture the typical effect. In a new analysis, we also find support for the possibility outlined by the reviewer and believe that adding this to our manuscript complements and balances our analysis of citations. Specifically, in the new Figure S8B we find that articles about the most popular genes are slightly more likely to be among the least cited papers (and in Figure S8A that articles about the least studied genes have been much more likely to be among the most cited papers). In-text, we state:

      “Further, since 1990, articles about the least popular genes have at times been 3 to 4 times more likely to be among the most cited articles than articles on the most popular genes, whereas articles on the most popular genes have been slightly less likely to be highly cited than lowly cited (Figure S8)”.

      We thank the reviewer for their suggestion, which strengthens our manuscript. The figure caption reads:

      “Figure S8: Likelihoods of being highly cited (top 5% of citations among all articles about genes, panel a) or lowly cited (bottom 5% of citations among all articles about genes, panel b) for articles about the most popular genes (top 5% accumulated articles) versus articles about the least popular genes (bottom 5% accumulated articles) by year of publication. Only articles with a single gene in the title/abstract are considered. Shaded regions show ±1 standard error of the proportion.”
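
      For readers unfamiliar with the shaded band, the standard error of a proportion can be sketched as below. This assumes the usual binomial formula, sqrt(p(1 - p)/n); the response does not state which estimator was used, and the function name and counts are hypothetical.

```python
import math


def proportion_with_se(successes, total):
    """Return a proportion and its binomial standard error, sqrt(p * (1 - p) / n)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    se = math.sqrt(p * (1 - p) / total)
    return p, se


# Hypothetical counts: 50 highly cited articles out of 100 in one publication year.
p, se = proportion_with_se(50, 100)
lower, upper = p - se, p + se  # edges of a +/- 1 standard error band
```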

      Author response image 4.

      Regarding result section 3 "Identification of biological and experimental factors associated with selection of highlighted genes", in Figure 3 and table s2, the authors stated that "hits with a compound known to affect gene activity are 5.114 times as likely to be mentioned in the title/abstract in an article using transcriptomics". The number 5.144 comes out of nowhere both in the figure and the table. In addition, figure 4 is not informative enough to be included as a main figure.

      This is the result of both a typo and imprecise terminology. The number should read 4.262 (the likelihood ratio of being mentioned in the title/abstract between genes with and without a compound), which corresponds to an odds ratio of 4.331. We have clarified this in the table caption, stating:

      “e.g. hits with a compound known to affect gene activity are 4.262 times as likely to be mentioned in the title/abstract in an article using transcriptomics, corresponding to an odds ratio of 4.331”.
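
      The difference between "times as likely" and an odds ratio can be illustrated with a 2x2 table. The sketch below assumes that the likelihood ratio reported above is a ratio of proportions (a risk ratio); the function name and the counts are hypothetical and are not the study's data.

```python
def risk_ratio_and_odds_ratio(a, b, c, d):
    """Compute the ratio of proportions and the odds ratio from a 2x2 table.

    a: hits with the factor that are mentioned in the title/abstract
    b: hits with the factor that are not mentioned
    c: hits without the factor that are mentioned
    d: hits without the factor that are not mentioned
    """
    risk_with = a / (a + b)
    risk_without = c / (c + d)
    risk_ratio = risk_with / risk_without
    odds_ratio = (a * d) / (b * c)
    return risk_ratio, odds_ratio


# Hypothetical counts, for illustration only.
rr, odds = risk_ratio_and_odds_ratio(20, 980, 5, 995)
```

      When being mentioned is rare in both groups, as in these made-up counts, the two measures are close but not identical, which is consistent with the small gap between the two values quoted above.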

      We have removed Figure 4 as a main-text figure and added a version, with a revised color scheme following the comments of Reviewer 1, as Figure S11. We added to the figure caption “Bold indicates FMUG’s default factors, which we selected based on this clustering and based on their strength of association with gene selection (Figure 3, Table S2 and Table S3).”

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      • Fig 2a shows that papers highlighting understudied genes are actually cited more. I wonder why authors only looked at data before 2015. Fig 2b shows an increased correlation since 2015. Please consider redrawing Fig 2a to include data from 2015-2020?

      We highlight data from 2015 since, from our used version of iCite (v32, released July 2022, covering citations made through most of 2021), papers published in 2015 have had about 6 years to accumulate citations. With fewer years to accumulate citations, insufficient signal may cause correlation to converge toward zero. Below, we repeat the analysis in Figure 2 but only considering citations made within a year of an article’s publication, which substantially reduces correlation (although remaining significant).

      Author response image 5.

      We added a note to the figure caption:

      “We forgo depicting more recent years than 2015 to allow for citations to accumulate over multiple years, providing a more sensitive and robust readout of long-term impact.”

      For Figure 2B, we add:

      “For more recent years, where articles have had less time to accumulate citations, insufficient signal may cause correlation to converge toward zero.”

      • Can FMUG be posted on the web for easy access by researchers with non-computational backgrounds?

      Regrettably, we do not presently have the resources to create or maintain a web-based version. We hope that the publication of this manuscript will enable us to attract resources to create and maintain a web-based version.

      Reviewer #2 (Recommendations for the authors):

      • Related to the first weakness in my public review: The observed disparity between CRISPR and GWAS studies in terms of which genes they promote to the abstract is interesting. I wonder if this has to do with the application of these techniques. GWAS studies will often highlight that they retrieve known associations between a gene and a phenotype, to show that a screen is working. I guess often the point is to subsequently identify more genes associated with a particular phenotype, but often it is unclear how to validate/verify newly found associations. In contrast, CRISPR screens might be more focussed on functionally/mechanistically understanding unknown processes, e.g. observing a phenotype that appears/disappears in response to a gene deletion. In such studies, the follow-up of a previously unknown gene could be more straightforward and relevant to the outcome. Does that mean CRISPR screens are better than GWAS studies for addressing the UG problem? Perhaps the authors could briefly discuss this issue.

      The number of studies we included featuring CRISPR screens is relatively small (n = 15 compared to n = 450 for GWAS). Thus, it is not possible to conclude in a statistically sound manner whether authors of CRISPR screens are truly more likely to highlight understudied genes.

      However, the reviewer raises compelling reasons for why this might be the case, and we now embed the broader discussion point that some techniques might be more powerful toward understudied genes.

      The discussion now includes:

      “Further, the observed discrepancy between the popularity of hits highlighted by GWAS versus other technologies suggests that some -omics technologies may be more powerful than others for characterizing understudied genes. This possibility merits further research and researchers participating in unknomics should consider the relative strengths of each technology towards providing tractable results for follow-up.”

      • Affinity capture mass spectrometry (Aff-MS): Perhaps I misunderstood this, but typically this is referred to as affinity purification MS (AP-MS)

      Thank you for the clarification. We have changed ‘Aff-MS’ to ‘AP-MS’ throughout the manuscript.

      • Page 3, line 96. The sentence "The first possibility is that seemingly understudied genes are, in fact, not understudied as they would rarely be identified through experiments.". Would they not still be understudied, just not intentionally?

      We have rephrased this sentence to:

      “The first possibility is that some genes are less studied because they are rarely identified as hits in experiments.”

      • Fig 4 is very interesting, but I also found it a bit confusing. First, the choice of colour scheme, where blue shows the absence and white shows the presence of something, seems counterintuitive, especially on a white background. Second, I find it confusing that only some of the experiments are labelled in the heatmap. Could the authors not simply use Fig S9 as Fig 4? Or alternatively, only include the 8 labelled factors in the simplified figure.

      In line with this feedback and that of Reviewers #1 and #3, we have removed Figure 4 as a main-text figure and instead include this figure as Supplementary Figure S11. We have reversed the color scheme so that purple indicates one and white indicates zero. We also now label all factors. Previously we had only listed the default features of FMUG. We have also updated the figure legend to convey how it assisted the choice of default factors in FMUG. It reads:

      “Bold indicates FMUG’s default factors, which we selected based on this clustering and based on their strength of association with gene selection (Figure 3, Table S2 and Table S3)”.

      • The FMUG app is fantastic and sounds exactly like something that is required to boost the visibility of understudied genes and overcome the understudied gene bias. However, I did not understand the choice of reporting this in the Discussion section.

      We thank the reviewer for their enthusiasm, and have now moved FMUG into the results section.

      • To further increase usability of the FMUG app, is there a way it could be deployed online? I appreciate this could require a major amount of coding work, which would not be reasonable to demand. So please consider this a suggestion, potentially for a future implementation.

      Regrettably, we do not presently have the resources to create or maintain a web-based version. We hope that the publication of this manuscript will enable us to attract resources to create and maintain a web-based version.

      Reviewer #3 (Recommendations for the authors):

      Table s2 and s3: p values are indicated by star signs. However, with so many hypothesis tests, the p values should be corrected for multiple tests.

      We have now applied Benjamini-Hochberg multiple hypothesis correction to these tables, correcting p-values within each of the four technologies. We update our significance calling to read:

      “We identified 45 factors that relate to genes and found 33 (12 out of 23 binary factors and 21 out of 22 continuous factors) associated with selection in at least one assay type at Benjamini-Hochberg FDR < 0.001.”
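
      For reference, the Benjamini-Hochberg adjustment applied separately within each group of tests can be sketched as follows. This is a minimal pure-Python illustration, not the authors' code: the function names are hypothetical and the p-values are made up.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices, ascending by p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonic adjusted values.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, pvalues[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


def adjust_within_groups(pvalues_by_group):
    """Apply the correction independently within each group (e.g. each technology)."""
    return {group: benjamini_hochberg(p) for group, p in pvalues_by_group.items()}


# Hypothetical raw p-values, for illustration only.
raw = {"CRISPR": [0.01, 0.04, 0.03, 0.005], "GWAS": [0.2, 0.0001]}
adjusted = adjust_within_groups(raw)
```

      A factor would then be called significant in a given technology when its adjusted value falls below the chosen FDR threshold.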

      Figure S1 - S4

      These figures contain too many noninformative boxes. In all the figures, only the last three boxes are informative (reports assessed for eligibility, reports excluded, and studies included in review). The remaining boxes convey little information and should be simplified.

      We have simplified these diagrams, removing boxes which contained no information.

      Figure S6: what does it mean by "prior to the publication of the first article represented in this sample"? What is "this sample"?

      “This sample” refers to the collection of 450 GWAS articles, 296 articles using AP-MS, 148 transcriptomics articles, and 15 genome-wide CRISPR screen articles. We have rephrased this sentence to make this clear. It now reads:

      “Variant of Figure 1B only considering articles published in 2002 or before, prior to the publication of any of the articles featuring -omics experiments which we considered for this analysis.”

    1. Author response:

      The following is the authors’ response to the current reviews.

      eLife Assessment

      This neuroimaging and electrophysiology study in a small cohort of congenital cataract patients with sight recovery aims to characterize the effects of early visual deprivation on excitatory and inhibitory balance in visual cortex. While contrasting sight-recovery with visually intact controls suggested the existence of persistent alterations in Glx/GABA ratio and aperiodic EEG signals, it provided only incomplete evidence supporting claims about the effects of early deprivation itself. The reported data were considered valuable, given the rare study population. However, the small sample sizes, lack of a specific control cohort and multiple methodological limitations will likely restrict usefulness to scientists working in this particular subfield.

      We thank the reviewing editors for their consideration and updated assessment of our manuscript after its first revision.

      In order to assess the effects of early deprivation, we included an age-matched, normally sighted control group recruited from the same community, measured in the same scanner and laboratory. This study design is analogous to numerous studies in permanently congenitally blind humans, which typically recruited sighted controls, but hardly ever individuals with a different, e.g. late blindness history. In order to improve the specificity of our conclusions, we used a frontal cortex voxel in addition to a visual cortex voxel (MRS). Analogously, we separately analyzed occipital and frontal electrodes (EEG).

      Moreover, we relate our findings in congenital cataract reversal individuals to findings in the literature on permanent congenital blindness. Note, there are, to the best of our knowledge, neither MRS nor resting-state EEG studies in individuals with permanent late blindness.

      Our participants necessarily have nystagmus and low visual acuity due to their congenital deprivation phase, and the existence of nystagmus is a recruitment criterion to diagnose congenital cataracts.

      It might be interesting for future studies to investigate individuals with transient late blindness. However, such a study would be ill-motivated had we not found differences between the most “extreme” of congenital visual deprivation conditions and normally sighted individuals (analogous to why earlier research on permanent blindness investigated permanently congenitally blind humans first, rather than permanently late blind humans, or both in the same study). Any result of such future work would need to refer to our study, and results in these additional groups would not invalidate our findings.

      Since all our congenital cataract reversal individuals by definition had visual impairments, we included an eyes closed condition, both in the MRS and EEG assessment. Any group effect during the eyes closed condition cannot be due to visual acuity deficits changing the bottom-up driven visual activation.

      As we detail in response to review 3, our EEG analyses followed the standards in the field.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary

      In this human neuroimaging and electrophysiology study, the authors aimed to characterise effects of a period of visual deprivation in the sensitive period on excitatory and inhibitory balance in the visual cortex. They attempted to do so by comparing neurochemistry conditions ('eyes open', 'eyes closed') and resting state, and visually evoked EEG activity between ten congenital cataract patients with recovered sight (CC), and ten age-matched control participants (SC) with normal sight.

      First, they used magnetic resonance spectroscopy to measure in vivo neurochemistry from two locations, the primary location of interest in the visual cortex, and a control location in the frontal cortex. Such voxels are used to provide a control for the spatial specificity of any effects, because the single-voxel MRS method provides a single sampling location. Using MR-visible proxies of excitatory and inhibitory neurotransmission, Glx and GABA+ respectively, the authors report no group effects in GABA+ or Glx, no difference in the functional conditions 'eyes closed' and 'eyes open'. They found an effect of group in the ratio of Glx/GABA+ and no similar effect in the control voxel location. They then perform multiple exploratory correlations between MRS measures and visual acuity, and report a weak positive correlation between the 'eyes open' condition and visual acuity in CC participants.

      The same participants then took part in an EEG experiment. The authors selected two electrodes placed in the visual cortex for analysis and report a group difference in an EEG index of neural activity, the aperiodic intercept, as well as the aperiodic slope, considered a proxy for cortical inhibition. Control electrodes in the frontal region did not present with the same pattern. They report an exploratory correlation between the aperiodic intercept and Glx in one out of three EEG conditions.

      The authors report the difference in E/I ratio, and interpret the lower E/I ratio as representing an adaptation to visual deprivation, which would have initially caused a higher E/I ratio. Although intriguing, the strength of evidence in support of this view is not strong. Amongst the limitations are the low sample size, a critical control cohort that could provide evidence for higher E/I ratio in CC patients without recovered sight for example, and lower data quality in the control voxel. Nevertheless, the study provides a rare and valuable insight into experience-dependent plasticity in the human brain.

      Strengths of study

      How sensitive period experience shapes the developing brain is an enduring and important question in neuroscience. This question has been particularly difficult to investigate in humans. The authors recruited a small number of sight-recovered participants with bilateral congenital cataracts to investigate the effect of sensitive period deprivation on the balance of excitation and inhibition in the visual brain using measures of brain chemistry and brain electrophysiology. The research is novel, and the paper was interesting and well written.

      Limitations

      Low sample size. Ten for CC and ten for SC, and a further two SC participants were rejected due to lack of frontal control voxel data. The sample size limits the statistical power of the dataset and increases the likelihood of effect inflation.

      In the updated manuscript, the authors have provided justification for their sample size by pointing to prior studies and the inherent difficulties in recruiting individuals with bilateral congenital cataracts. Importantly, this highlights the value the study brings to the field while also acknowledging the need to replicate the effects in a larger cohort.

      Lack of specific control cohort. The control cohort has normal vision. The control cohort is not specific enough to distinguish between people with sight loss due to different causes and patients with congenital cataracts with co-morbidities. Further data from more specific populations, such as patients whose cataracts have not been removed, patients with developmental cataracts, or congenitally blind participants, would greatly improve the interpretability of the main finding. The lack of a more specific control cohort is a major caveat that limits a conclusive interpretation of the results.

      In the updated version, the authors have indicated that future studies can pursue comparisons between congenital cataract participants and cohorts with later sight loss.

      MRS data quality differences. Data quality in the control voxel appears worse than in the visual cortex voxel. The frontal cortex MRS spectrum shows far broader linewidth than the visual cortex (Supplementary Figures). Compared to the visual voxel, the frontal cortex voxel has less defined Glx and GABA+ peaks, lower GABA+ and Glx concentrations, lower NAA SNR values, and lower NAA concentrations. If the data quality is a lot worse in the FC, then small effects may not be detectable.

      In the updated version, the authors have added more information that informs the reader of the MRS quality differences between voxel locations. This increases the transparency of their reporting and enhances the assessment of the results.

      Because of the direction of the difference in E/I, the authors interpret their findings as representing signatures of sight improvement after surgery without further evidence, either within the study or from the literature. However, the literature suggests that plasticity and visual deprivation drive the E/I index up rather than down. Decreasing GABA+ is thought to facilitate experience-dependent remodelling. What evidence is there that cortical inhibition increases in response to a visual cortex that is over-sensitised due to congenital cataracts? Without further experimental or literature support this interpretation remains very speculative.

      The updated manuscript contains key references from non-human work to justify their interpretation.

      Heterogeneity in patient group. Congenital cataract (CC) patients experienced a variety of duration of visual impairment and were of different ages. They presented with co-morbidities (absorbed lens, strabismus, nystagmus). Strabismus has been associated with abnormalities in GABAergic inhibition in the visual cortex. The possible interactions with residual vision and confounds of co-morbidities are not experimentally controlled for in the correlations, and not discussed.

      The updated document has addressed this caveat.

      Multiple exploratory correlations were performed to relate MRS measures to visual acuity (shown in Supplementary Materials), and only specific ones shown in the main document. The authors describe the analysis as exploratory in the 'Methods' section. Furthermore, the correlation between visual acuity and E/I metric is weak, not corrected for multiple comparisons. The results should be presented as preliminary, as no strong conclusions can be made from them. They can provide a hypothesis to test in a future study.

      This has now been done throughout the document and increases the transparency of the reporting.

      P.16 Given the correlation of the aperiodic intercept with age ("Age negatively correlated with the aperiodic intercept across CC and SC individuals, that is, a flattening of the intercept was observed with age"), age needs to be controlled for in the correlation between neurochemistry and the aperiodic intercept. Glx has also been shown to correlate negatively with age.

      This caveat has been addressed in the revised manuscript.

      Multiple exploratory correlations were performed to relate MRS to EEG measures (shown in Supplementary Materials), and only specific ones shown in the main document. Given the multiple measures from the MRS, the correlations with the EEG measures were exploratory, as stated in the text (p. 16) and in Fig. 4, yet the introduction stated a prior hypothesis: "We further hypothesized that neurotransmitter changes would relate to changes in the slope and intercept of the EEG aperiodic activity in the same subjects." It would be great if the text could be revised for consistency and the analysis described as exploratory.

      This has been done throughout the document and increases the transparency of the reporting.

      The analysis for the EEG needs to take more advantage of the available data. As far as I understand, only two electrodes were used, yet far more were available as seen in their previous study (Ossandon et al., 2023). The spatial specificity is not established. The authors could use the frontal cortex electrode (FP1, FP2) signals as a control for spatial specificity in the group effects, or even better, all available electrodes and correct for multiple comparisons. Furthermore, they could use the aperiodic intercept vs Glx in SC to evaluate the specificity of the correlation to CC.

      This caveat has been addressed. The authors have added frontal electrodes to their analysis, providing an essential regional control for the visual cortex location.

      Comments on the latest version:

      The authors have made reasonable adjustments to their manuscript that addressed most of my comments by adding further justification for their methodology, essential literature support, pointing out exploratory analyses, limitations and adding key control analyses. Their revised manuscript has overall improved, providing valuable information, though the evidence that supports their claims is still incomplete.

      We thank the reviewer for suggesting ways to improve our manuscript and carefully reassessing our revised manuscript.

      Reviewer 2 (Public review):

      Summary:

      The study examined 10 congenitally blind patients who recovered vision through the surgical removal of bilateral dense cataracts, measuring neural activity and neurochemical profiles from the visual cortex. The declared aim is to test whether restoring visual function after years of complete blindness impacts excitation/inhibition balance in the visual cortex.

      Strengths:

      The findings are undoubtedly useful for the community, as they contribute towards characterising the many ways in which this special population differs from normally sighted individuals. The combination of MRS and EEG measures is a promising strategy to estimate a fundamental physiological parameter - the balance between excitation and inhibition in the visual cortex, which animal studies show to be heavily dependent upon early visual experience. Thus, the reported results pave the way for further studies, which may use a similar approach to evaluate more patients and control groups.

      Weaknesses:

      The main methodological limitation is the lack of an appropriate comparison group or condition to delineate the effect of sight recovery (as opposed to the effect of congenital blindness). A few previous studies suggested that the Excitation/Inhibition ratio in the visual cortex is increased in congenitally blind patients; the present study reports that the E/I ratio decreases instead. The authors claim that this implies a change of E/I ratio following sight recovery. However, supporting this claim would require showing a shift of E/I after vs. before the sight-recovery surgery, or at least it would require comparing patients who did and did not undergo the sight-recovery surgery (as common in the field).

      We thank the reviewer for suggesting ways to improve our manuscript and carefully reassessing our revised manuscript.

      Since we have not been able to acquire longitudinal data with the experimental design of the present study in congenital cataract reversal individuals, we compared the MRS and EEG results of congenital cataract reversal individuals to published work in permanently congenitally blind individuals. We consider this a resource-saving approach. We think that the results of our cross-sectional study now justify the costs and enormous efforts (and time for the patients who often have to travel long distances) associated with longitudinal studies in this rare population.

      There are also more technical limitations related to the correlation analyses, which are partly acknowledged in the manuscript. A bland correlation between GLX/GABA and the visual impairment is reported, but this is specific to the patient group (N=10) and would not hold across groups (the correlation is positive, predicting the lowest GLX/GABA ratio values for the sighted controls - opposite of what is found). There is also a strong correlation between GLX concentrations and the EEG power at the lowest temporal frequencies. Although this relation is intriguing, it only holds for a very specific combination of parameters (of the many tested): only with eyes open, only in the patient group.

      Given the exploratory nature of the correlations, we do not base the majority of our conclusions on this analysis. There are no doubts that the reported correlations need replication; however, replication is only possible after a first report. Thus, we hope to motivate corresponding analyses in further studies.

      It has to be noted that in the present study significance testing for correlations was corrected for multiple comparisons, and that some findings replicate earlier reports (e.g. effects on EEG aperiodic slope, alpha power, and correlations with chronological age).

      Conclusions:

      The main claim of the study is that sight recovery impacts the excitation/inhibition balance in the visual cortex, estimated with MRS or through indirect EEG indices. However, due to the weaknesses outlined above, the study cannot distinguish the effects of sight recovery from those of visual deprivation. Moreover, many aspects of the results are interesting but their validation and interpretation require additional experimental work.

      We interpret the group differences between individuals tested years after congenital visual deprivation and normally sighted individuals as supportive of the E/I ratio being impacted by congenital visual deprivation. In the absence of a sensitive period for the development of an E/I ratio, individuals with a transient phase of congenital blindness might have developed a visual system indistinguishable from normally sighted individuals. As we demonstrate, this is not so. Comparing the results of congenitally blind humans with those of congenitally permanently blind humans (from previous studies) allowed us to identify changes of E/I ratio, which add to those found for congenital blindness.

      We thank the reviewer for the helpful comments and suggestions related to the first submission and first revision of our manuscript. We are keen to translate some of them into future studies.

      Reviewer 3 (Public review):

      This manuscript examines the impact of congenital visual deprivation on the excitatory/inhibitory (E/I) ratio in the visual cortex using Magnetic Resonance Spectroscopy (MRS) and electroencephalography (EEG) in individuals whose sight was restored. Ten individuals with reversed congenital cataracts were compared to age-matched, normally sighted controls, assessing measures of the cortical E/I balance, their interrelationship, and their relation to visual acuity. The study reveals that the Glx/GABA ratio in the visual cortex and the intercept and aperiodic signal are significantly altered in those with a history of early visual deprivation, suggesting persistent neurophysiological changes despite visual restoration.

      First of all, I would like to disclose that I am not an expert in congenital visual deprivation, nor in MRS. My expertise is in EEG (particularly in the decomposition of periodic and aperiodic activity) and statistical methods.

      Although the authors addressed some of the concerns of the previous version, major concerns and flaws remain in terms of methodological and statistical approaches along with the (over)interpretation of the results. Specific concerns include:

      (1) 3.1 Response to Variability in Visual Deprivation

      Rather than listing the advantages and disadvantages of visual deprivation, I recommend providing at least a descriptive analysis of how the duration of visual deprivation influenced the measures of interest. This would enhance the depth and relevance of the discussion.

      Although Reviewer 2 and Reviewer 3 (see below) pointed out problems in interpreting multiple correlational analyses in small samples, we addressed this request by reporting such correlations between visual deprivation history and measured EEG/MRS outcomes.

      Calculating the correlation between duration of visual deprivation and behavioral or brain measures is, in fact, a common suggestion. The existence of sensitive periods, which are typically assumed not to follow a linear gradual decline of neuroplasticity, does not necessarily allow predicting a correlation with duration of blindness. Daphne Maurer has additionally worked on the concept of “sleeper effects” (Maurer et al., 2007), that is, effects of early deprivation on the brain and behavior which are observed only later in life when the function/neural circuits mature.

      In accordance with this reasoning, we did not observe a significant correlation between duration of visual deprivation and any of our dependent variables.

      (2) 3.2 Small Sample Size

      The issue of small sample size remains problematic. The justification that previous studies employed similar sample sizes does not adequately address the limitation in the current study. I strongly suggest that the correlation analyses should not feature prominently in the main manuscript or the abstract, especially if the discussion does not substantially rely on these correlations. Please also revisit the recommendations made in the section on statistical concerns.

      In the revised manuscript, we explicitly mention that our sample size is not atypical for the special group investigated, but that a replication of our results in larger samples would foster their impact. We only explicitly mention correlations that survived stringent testing for multiple comparisons in the main manuscript.

      Given the exploratory nature of the correlations, we have not based the majority of our claims on this analysis.

      (3) 3.3 Statistical Concerns

      While I appreciate the effort of conducting an independent statistical check, it merely validates whether the reported statistical parameters, degrees of freedom (df), and p-values are consistent. However, this does not address the appropriateness of the chosen statistical methods.

      We did not intend for the statcheck report to justify the methods used for statistics, which we have done in a separate section with normality and homogeneity testing (Supplementary Material S9), and references to it in the descriptions of the statistical analyses (Methods, Page 13, Lines 326-329 and Page 15, Lines 400-402).

      Several points require clarification or improvement:

      (4) Correlation Methods: The manuscript does not specify whether the reported correlation analyses are based on Pearson or Spearman correlation.

      The depicted correlations are Pearson correlations. We will add this information to the Methods.

      (5) Confidence Intervals: Include confidence intervals for correlations to represent the uncertainty associated with these estimates.

      We will add the confidence intervals to the second revision of our manuscript.
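
      As an illustration of the reviewer's request (not the analysis code used in the study), a confidence interval for a Pearson correlation is commonly obtained with the Fisher z-transformation. The sketch below assumes approximately bivariate-normal data; the function names and the example value r = 0.6 are hypothetical and chosen only for demonstration.

```python
import math
from statistics import NormalDist


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fisher_ci(r, n, level=0.95):
    """Approximate confidence interval for a Pearson r via Fisher's z.

    Assumes bivariate normality; requires n > 3.
    """
    z = math.atanh(r)                 # transform r to the z scale
    se = 1.0 / math.sqrt(n - 3)       # standard error on the z scale
    z_crit = NormalDist().inv_cdf(0.5 + level / 2.0)
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


# Hypothetical example: r = 0.6 observed in a group of n = 10.
lower, upper = fisher_ci(0.6, 10)
print(f"95% CI: [{lower:.2f}, {upper:.2f}]")
```

      With only ten participants the interval is wide, which is the uncertainty the reviewer asks to have displayed alongside each correlation.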

      (6) Permutation Statistics: Given the small sample size, I recommend using permutation statistics, as these are exact tests and more appropriate for small datasets.

      Our study focuses on a rare population, with a sample size limited by the availability of participants. Our findings provide exploratory insights rather than make strong inferential claims. To this end, we have ensured that our analysis adheres to key statistical assumptions (Shapiro-Wilk as well as Levene’s tests, Supplementary Material S9), and reported our findings with effect sizes, appropriate caution and context.
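
      For readers unfamiliar with the reviewer's suggestion, a permutation test for a correlation can be sketched as follows. This is a Monte Carlo approximation (random shuffles rather than full enumeration of all orderings), it is not an analysis performed in the study, and the function names, the number of permutations, and the seed are hypothetical choices.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def permutation_p_value(x, y, n_perm=10000, seed=0):
    """Two-sided Monte Carlo permutation p-value for a Pearson correlation.

    The pairing between x and y is broken by shuffling y; the p-value is the
    share of shuffles whose |r| is at least as large as the observed |r|.
    """
    rng = random.Random(seed)
    observed = abs(pearson_r(x, y))
    y_shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(y_shuffled)
        # Small tolerance guards against floating-point ties.
        if abs(pearson_r(x, y_shuffled)) >= observed - 1e-12:
            hits += 1
    # The +1 terms count the observed arrangement as one permutation.
    return (hits + 1) / (n_perm + 1)
```

      The approach makes no normality assumption, which is why it is often recommended for samples of this size.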

      (7) Adjusted P-Values: Ensure that reported Bonferroni corrected p-values (e.g., p > 0.999) are clearly labeled as adjusted p-values where applicable.

      In the revised manuscript, we will change Figure 4 to say ‘adjusted p,’ which we indeed reported.
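
      As a brief aside on why adjusted values such as p > 0.999 arise: a Bonferroni-adjusted p-value multiplies each raw p-value by the number of tests and caps the result at 1. The sketch below shows that arithmetic only; the function name and the example p-values are hypothetical.

```python
def bonferroni_adjust(p_values):
    """Bonferroni-adjusted p-values: raw p times the number of tests, capped at 1."""
    n_tests = len(p_values)
    return [min(1.0, p * n_tests) for p in p_values]


# Hypothetical raw p-values from three tests.
adjusted = bonferroni_adjust([0.01, 0.40, 0.20])
print(adjusted)
```

      Because of the cap, any raw p-value of at least 1 divided by the number of tests is reported as 1 after adjustment, which is why labelling the values as adjusted avoids confusion.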

      (8) Figure 2C

      Figure 2C still lacks crucial information that the correlation between Glx/GABA ratio and visual acuity was computed solely in the control group (as described in the rebuttal letter). Why was this analysis restricted to the control group? Please provide a rationale.

      Figure 2C depicts the correlation between Glx/GABA+ ratio and visual acuity in the congenital cataract reversal group, not the control group. This is mentioned in the Figure 2 legend, as well as in the main text where the figure is referred to (Page 18, Line 475).

      The correlation analyses between visual acuity and MRS/EEG measures were only performed in the congenital cataract reversal group since the sighted control group comprised individuals with vision in the normal range; thus these analyses would not make sense. Table 1 with the individual visual acuities for all participants, including the normally sighted controls, shows the low variance in the latter group.

      For variables in which no a priori group differences in variance were predicted, we performed the correlation analyses across groups (see Supplementary Material S12, S15).

      We will highlight these motivations more clearly in the Methods of the revised manuscript.

      (9) 3.4 Interpretation of Aperiodic Signal

      Relying on previous studies to interpret the aperiodic slope as a proxy for excitation/inhibition (E/I) does not make the interpretation more robust.

      How to interpret aperiodic EEG activity has been subject of extensive investigation. We cite studies which provide evidence from multiple species (monkeys, humans) and measurements (EEG, MEG, ECoG), including studies which pharmacologically manipulated E/I balance.

      Whether our findings are robust, in fact, requires a replication study. Importantly, we analyzed the intercept of the aperiodic activity fit as well, and discuss results related to the intercept.

      Quote:

      “3.4 Interpretation of aperiodic signal:

      - Several recent papers demonstrated that the aperiodic signal measured in EEG or ECoG is related to various important aspects such as age, skull thickness, electrode impedance, as well as cognition. Thus, currently, very little is known about the underlying effects which influence the aperiodic intercept and slope. The entire interpretation of the aperiodic slope as a proxy for E/I is based on a computational model and simulation (as described in the Gao et al. paper).

      Response: Apart from the modeling work from Gao et al., multiple papers, which have also been cited, used ECoG, EEG and MEG and showed concomitant changes in aperiodic activity with pharmacological manipulation of the E/I ratio (Colombo et al., 2019; Molina et al., 2020; Muthukumaraswamy & Liley, 2018). Further, several prior studies have interpreted changes in the aperiodic slope as reflective of changes in the E/I ratio, including studies of developmental groups (Favaro et al., 2023; Hill et al., 2022; McSweeney et al., 2023; Schaworonkow & Voytek, 2021) as well as patient groups (Molina et al., 2020; Ostlund et al., 2021).

      - The authors further wrote: We used the slope of the aperiodic (1/f) component of the EEG spectrum as an estimate of E/I ratio (Gao et al., 2017; Medel et al., 2020; Muthukumaraswamy & Liley, 2018). This is a highly speculative interpretation with very little empirical evidence. These papers were conducted with ECoG data (mostly in animals) and mostly under anesthesia. Thus, these studies only allow an indirect interpretation by what the 1/f slope in EEG measurements is actually influenced.

      Response: Note that Muthukumaraswamy et al. (2018) used different types of pharmacological manipulations and analyzed periodic and aperiodic MEG activity in humans, in addition to monkey ECoG (Muthukumaraswamy & Liley, 2018). Further, Medel et al. (now published as Medel et al., 2023) compared EEG activity in addition to ECoG data after propofol administration. The interpretation of our results is in line with a number of recent studies in developing (Hill et al., 2022; Schaworonkow & Voytek, 2021) and special populations using EEG. As mentioned above, several prior studies have used the slope of the 1/f component/aperiodic activity as an indirect measure of the E/I ratio (Favaro et al., 2023; Hill et al., 2022; McSweeney et al., 2023; Molina et al., 2020; Ostlund et al., 2021; Schaworonkow & Voytek, 2021), including studies using scalp-recorded EEG from humans.

      In the introduction of the revised manuscript, we have made more explicit that this metric is indirect (Page 3, Line 91), (additionally see Discussion, Page 24, Lines 644-645, Page 25, Lines 650-657).

      While a full understanding of aperiodic activity needs to be provided, some convergent ideas have emerged. We think that our results contribute to this enterprise, since our study is, to the best of our knowledge, the first which assessed MRS measured neurotransmitter levels and EEG aperiodic activity.”

      (10) Additionally, the authors state:

      "We cannot think of how any of the exploratory correlations between neurophysiological measures and MRS measures could be accounted for by a difference e.g. in skull thickness."

      (11) This could be addressed directly by including skull thickness as a covariate or visualizing it in scatterplots, for instance, by representing skull thickness as the size of the dots.

      We are not aware of any study that would justify such an analysis.

      Our analyses were based on previous findings in the literature.

      Since to the best of our knowledge, no evidence exists that congenital cataracts go together with changes in skull thickness, and that skull thickness might selectively modulate visual cortex Glx/GABA+ but not NAA measures, we decided against following this suggestion.

      Notably, the neurotransmitter concentration reported here is after tissue segmentation of the voxel region. The tissue fraction was shown to not differ between groups in the MRS voxels (Supplementary Material S4). The EEG electrode impedance was lowered to <10 kOhm in every participant (Methods, Page 13, Line 344), and preparation was identical across groups.

      (12) 3.5 Problems with EEG Preprocessing and Analysis

      Downsampling: The decision to downsample the data to 60 Hz "to match the stimulation rate" is problematic. This choice conflates subsequent spectral analyses due to aliasing issues, as explained by the Nyquist theorem. While the authors cite prior studies (Schwenk et al., 2020; VanRullen & MacDonald, 2012) to justify this decision, these studies focused on alpha (8-12 Hz), where aliasing is less of a concern compared to analyzing the aperiodic signal. Furthermore, the current study analyzes the frequency range from 1-20 Hz, which is too narrow for interpreting the aperiodic signal as E/I. Typically, this analysis should include higher frequencies, spanning at least 1-30 Hz or even 1-45 Hz (not 20-40 Hz).

      As mentioned in the Methods (Page 15, Line 376) and the previous response, the pop_resample function used by EEGLAB applies an anti-aliasing filter at half the resampling frequency (as per the Nyquist theorem; https://eeglab.org/tutorials/05_Preprocess/resampling.html). The upper cutoff of the low-pass filter set by EEGLAB prior to downsampling (30 Hz) is still far above the frequency range of interest in the current study (1-20 Hz), thus allowing us to derive valid results.
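
      To make the aliasing argument concrete, the short sketch below applies the standard frequency-folding rule. It is not the EEGLAB implementation; it only shows where a component above the new Nyquist frequency (30 Hz at a 60 Hz sampling rate) would reappear if no anti-aliasing filter were applied. The 50 Hz line-noise value is an assumption for illustration, and the function names are hypothetical.

```python
import math


def alias_frequency(f_hz, fs_hz):
    """Apparent frequency of a tone at f_hz after sampling at fs_hz with no filtering."""
    f_mod = f_hz % fs_hz
    return min(f_mod, fs_hz - f_mod)


def sampled_cosine(f_hz, fs_hz, n_samples):
    """Samples of cos(2*pi*f*t) taken at rate fs_hz."""
    return [math.cos(2 * math.pi * f_hz * n / fs_hz) for n in range(n_samples)]


line_noise_hz = 50.0   # assumed mains frequency, for illustration only
new_rate_hz = 60.0     # target sampling rate after downsampling

folded_hz = alias_frequency(line_noise_hz, new_rate_hz)

# The 50 Hz tone and its alias produce identical samples at 60 Hz.
original = sampled_cosine(line_noise_hz, new_rate_hz, 120)
alias = sampled_cosine(folded_hz, new_rate_hz, 120)
max_difference = max(abs(a - b) for a, b in zip(original, alias))

print(folded_hz, max_difference)
```

      Under these assumptions an unfiltered 50 Hz component would fold to 10 Hz, inside the 1-20 Hz fit range, which is why low-pass filtering below 30 Hz before resampling matters for this analysis.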

      Quote:

      “- The authors downsampled the data to 60Hz to "to match the stimulation rate". What is the intention of this? Because the subsequent spectral analyses are conflated by this choice (see Nyquist theorem).

      Response: These data were collected as part of a study designed to evoke alpha activity with visual white-noise, which ranged in luminance with equal power at all frequencies from 1-60 Hz, restricted by the refresh rate of the monitor on which stimuli were presented (Pant et al., 2023). This paradigm and method were developed by VanRullen and colleagues (Schwenk et al., 2020; VanRullen & MacDonald, 2012), wherein the analysis requires the same sampling rate between the presented frequencies and the EEG data. The downsampling function used here automatically applies an anti-aliasing filter (EEGLAB 2019).”

      Moreover, the resting-state data were not resampled to 60 Hz. We will make this clearer in the Methods of the revised manuscript.

      Our consistent results of group differences across all three EEG conditions, thus, exclude any possibility that they were driven by aliasing artifacts.

      The expected effects of this anti-aliasing filter can be seen in the attached Figure R1, showing an example participant’s spectrum in the 1-30 Hz range (as opposed to the 1-20 Hz plotted in the manuscript), clearly showing a 30-40 dB drop at 30 Hz. Any aliasing due to, for example, remaining line noise, would additionally be visible in this figure (as well as Figure 3) as a peak.

      Author response image 1.

      Power spectral density of one congenital cataract-reversal (CC) participant in the visual stimulation condition across all channels. The reduced power at 30 Hz shows the effects of the anti-aliasing filter applied by EEGLAB’s pop_resample function.

      As we stated in the manuscript, and in previous reviews, so far there has been no consensus on the exact range of measuring aperiodic activity. We made a principled decision based on the literature (showing a knee in aperiodic fits of this dataset at 20 Hz) (Medel et al., 2023; Ossandón et al., 2023), data quality (possible contamination by line noise at higher frequencies) and the purpose of the visual stimulation experiment (to look at the lower frequency range by stimulating up to 60 Hz, thereby limiting us to quantifying below 30 Hz), that 1-20 Hz would be the fit range in this dataset.
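
      To illustrate what a fit range means in practice, the sketch below estimates an aperiodic slope and intercept by ordinary least squares in log-log space over a chosen frequency window. It is a simplified stand-in and not the fitting procedure used in the study: dedicated spectral parameterization methods additionally model oscillatory peaks (for example alpha), which this sketch ignores. The function name and the synthetic spectrum are hypothetical.

```python
import math


def fit_aperiodic(freqs_hz, power, f_low=1.0, f_high=20.0):
    """Least-squares line through log10(power) vs log10(frequency).

    Returns (slope, intercept) using only frequencies in [f_low, f_high].
    The intercept is log10(power) at 1 Hz.
    """
    points = [
        (math.log10(f), math.log10(p))
        for f, p in zip(freqs_hz, power)
        if f_low <= f <= f_high
    ]
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Synthetic 1/f-like spectrum: power = 100 * f^(-1.5), sampled at 1 Hz steps.
freqs = [float(f) for f in range(1, 46)]
spectrum = [100.0 * f ** -1.5 for f in freqs]

slope, intercept = fit_aperiodic(freqs, spectrum, f_low=1.0, f_high=20.0)
print(round(slope, 3), round(intercept, 3))
```

      For a spectrum that follows a single power law, the estimate does not depend on the window. When a knee is present, as reported for this dataset at 20 Hz, fits above and below the knee give different slopes, which is why the choice of range has to be stated and justified.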

      Quote:

      “(3) What's the underlying idea of analyzing two separate aperiodic slopes (20-40 Hz and 1-19 Hz)? It is very unusual to compute the slope between 20-40 Hz, where the SNR is rather low.

      "Ossandón et al. (2023), however, observed that in addition to the flatter slope of the aperiodic power spectrum in the high frequency range (20-40 Hz), the slope of the low frequency range (1-19 Hz) was steeper in both, congenital cataract-reversal individuals, as well as in permanently congenitally blind humans."

      Response: The present manuscript computed the slope between 1-20 Hz. Ossandón et al. as well as Medel et al. (2023) found a “knee” of the 1/f distribution at 20 Hz and describe further the motivations for computing both slope ranges. For example, Ossandón et al. used a data driven approach and compared single vs. dual fits and found that the latter fitted the data better. Additionally, they found the best fit if a knee at 20 Hz was used. We would like to point out that no standard range exists for the fitting of the 1/f component across the literature and, in fact, very different ranges have been used (Gao et al., 2017; Medel et al., 2023; Muthukumaraswamy & Liley, 2018).”

      (13) Baseline Removal: Subtracting the mean activity across an epoch as a baseline removal step is inappropriate for resting-state EEG data. This preprocessing step undermines the validity of the analysis. The EEG dataset has fundamental flaws, many of which were pointed out in the previous review round but remain unaddressed. In its current form, the manuscript falls short of standards for robust EEG analysis. If I were reviewing for another journal, I would recommend rejection based on these flaws.

      The baseline removal step from each epoch serves to remove the DC component of the recording and detrend the data. This is a standard preprocessing step (included as an option in preprocessing pipelines recommended by the EEGLAB toolbox, FieldTrip toolbox and MNE toolbox), additionally necessary to improve the efficacy of ICA decomposition (Groppe et al., 2009).

      In the previous review round, a clarification of the baseline timing was requested, which we added. Beyond this request, there was no mention of the appropriateness of the baseline removal and/or a request to provide reasons for why it might not undermine the validity of the analysis.

      Quote:

      “- "Subsequently, baseline removal was conducted by subtracting the mean activity across the length of an epoch from every data point." The actual baseline time segment should be specified.

      Response: The time segment was the length of the epoch, that is, 1 second for the resting state conditions and 6.25 seconds for the visual stimulation conditions. This has been explicitly stated in the revised manuscript (Page 13, Line 354).”

      Prior work in the time (not frequency) domain on event-related potential (ERP) analysis has suggested that the baselining step might cause spurious effects (Delorme, 2023) (although see (Tanner et al., 2016)). We did not perform ERP analysis at any stage. One recent study suggests spurious group differences in the 1/f signal might be driven by an inappropriate dB division baselining method (Gyurkovics et al., 2021), which we did not perform.

      Any effect of our baselining procedure on the FFT spectrum would be below the 1 Hz range, which we did not analyze.  
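
      The statement about the baseline step can be checked with a small numerical sketch: subtracting an epoch's mean changes only the 0 Hz (DC) bin of a discrete Fourier transform computed over that same epoch. The sketch assumes an unwindowed DFT over a single 1-second epoch; the sampling rate of 64 Hz, the test signal, and the function names are hypothetical and are not taken from the study.

```python
import cmath
import math


def dft_magnitudes(samples):
    """Magnitudes of the DFT for bins 0 .. N/2 (plain O(N^2) implementation)."""
    n = len(samples)
    return [
        abs(sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
        for k in range(n // 2 + 1)
    ]


def remove_mean(samples):
    """Baseline removal: subtract the epoch mean from every data point."""
    mean = sum(samples) / len(samples)
    return [v - mean for v in samples]


fs = 64  # hypothetical sampling rate in Hz; one 1-second epoch gives 1 Hz bins
epoch = [
    5.0                                            # DC offset
    + math.sin(2 * math.pi * 3 * t / fs)           # 3 Hz component
    + 0.5 * math.cos(2 * math.pi * 10 * t / fs)    # 10 Hz component
    for t in range(fs)
]

before = dft_magnitudes(epoch)
after = dft_magnitudes(remove_mean(epoch))

# Largest change in any bin at or above 1 Hz.
max_change_nonzero_bins = max(abs(b - a) for b, a in zip(before[1:], after[1:]))
print(before[0], after[0], max_change_nonzero_bins)
```

      In this setting the DC bin drops to zero while every bin from 1 Hz upward is unchanged up to floating-point error. With windowing or spectral averaging, a small amount of leakage near 0 Hz could behave differently, so the sketch should be read as an idealised case.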

      Each of the preprocessing steps in the manuscript matches pipelines described and published in extensive prior work. We document how multiple aspects of our EEG results replicate prior findings (Supplementary Material S15, S18, S19), reports of other experimenters, groups and locations, validating that our results are robust.

      We therefore reject the claim of methodological flaws in our EEG analyses in the strongest possible terms.

      Quote:

      “3.5 Problems with EEG preprocessing and analysis:

      - It seems that the authors did not identify bad channels nor address the line noise issue (even a problem if a low pass filter of below-the-line noise was applied).

      Response: As pointed out in the methods and Figure 1, we only analyzed data from two occipital channels, O1 and O2, neither of which was rejected for any participant. Channel rejection was performed for the larger dataset, published elsewhere (Ossandón et al., 2023; Pant et al., 2023). As control sites we added the frontal channels Fp1 and Fp2 (see Supplementary Material S14).

      Neither Ossandón et al. (2023) nor Pant et al. (2023) considered frequency ranges above 40 Hz to avoid any possible contamination with line noise. Here, we focused on activity between 0 and 20 Hz, definitely excluding line noise contaminations (Methods, Page 14, Lines 365-367). The low pass filter (FIR, 1-45 Hz) guaranteed that any spill-over effects of line noise would be restricted to frequencies just below the upper cutoff frequency.

      Additionally, a prior version of the analysis used spectrum interpolation to remove line noise; the group differences remained stable (Ossandón et al., 2023). We have reported this analysis in the revised manuscript (Page 14, Lines 364-357).

      Further, both groups were measured in the same lab, making line noise (~ 50 Hz) as an account for the observed group effects in the 1-20 Hz frequency range highly unlikely. Finally, any of the exploratory MRS-EEG correlations would be hard to explain if the EEG parameters were contaminated with line noise.

      - What was the percentage of segments that needed to be rejected due to the 120μV criteria? This should be reported specifically for EO & EC and controls and patients.

      Response: The mean percentage of 1 second segments rejected for each resting state condition and the percentage of 6.25 second long segments rejected in each group for the visual stimulation condition have been added to the revised manuscript (Supplementary Material S10), and are referred to in the Methods (Page 14, Lines 372-373).
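      For illustration, the requested percentage could be computed as in the following sketch. It assumes that a segment is rejected when any sample exceeds 120 μV in absolute value; the exact form of the criterion (absolute amplitude versus peak-to-peak) is not specified here, and the function name `rejection_percentage` as well as the example values are hypothetical.

```python
def rejection_percentage(segments, threshold_uv=120.0):
    """Percentage of segments containing at least one sample above the threshold.

    Assumption: the criterion is applied to the absolute amplitude (in microvolts).
    """
    if not segments:
        raise ValueError("at least one segment is required")
    rejected = sum(
        1 for segment in segments if max(abs(value) for value in segment) > threshold_uv
    )
    return 100.0 * rejected / len(segments)

# Made-up example: four short segments, two of which cross the threshold.
example_segments = [
    [10.0, -30.0, 50.0],
    [5.0, 130.0, -20.0],
    [-125.0, 0.0, 3.0],
    [119.9, -119.9, 0.0],
]
example_percentage = rejection_percentage(example_segments)
```

      Applied separately per condition and group, such a function would yield the kind of values summarized in Supplementary Material S10.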

      - The authors downsampled the data to 60Hz to "to match the stimulation rate". What is the intention of this? Because the subsequent spectral analyses are conflated by this choice (see Nyquist theorem).

      Response: These data were collected as part of a study designed to evoke alpha activity with visual white-noise, which changed in luminance with equal power at all frequencies from 1-60 Hz, restricted by the refresh rate of the monitor on which stimuli were presented (Pant et al., 2023). This paradigm and method were developed by VanRullen and colleagues (Schwenk et al., 2020; VanRullen & MacDonald, 2012), wherein the analysis requires the same sampling rate between the presented frequencies and the EEG data. The downsampling function used here automatically applies an anti-aliasing filter (EEGLAB 2019).
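      The Nyquist consideration raised by the reviewer can be made concrete with a short worked example. This is a sketch under stated assumptions rather than a description of the EEGLAB implementation: the helper `aliased_frequency` is hypothetical and only computes where a pure sinusoid would appear if it were sampled at 60 Hz without prior filtering.

```python
def aliased_frequency(freq_hz, sampling_rate_hz):
    """Apparent frequency of a sinusoid after sampling without an anti-aliasing filter."""
    nearest_multiple = sampling_rate_hz * round(freq_hz / sampling_rate_hz)
    return abs(freq_hz - nearest_multiple)

sampling_rate_hz = 60
nyquist_hz = sampling_rate_hz / 2  # highest frequency that is represented unambiguously

# Frequencies at or below Nyquist are unchanged, for example the upper edge of the 1-20 Hz range.
upper_analysis_edge = aliased_frequency(20, sampling_rate_hz)

# A 50 Hz component would fold back into the analyzed range if it were not filtered out first.
folded_line_noise = aliased_frequency(50, sampling_rate_hz)
```

      The second case shows why the anti-aliasing filter applied before downsampling matters: it attenuates components above the new Nyquist frequency (30 Hz) before they can fold into lower frequencies.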

      - "Subsequently, baseline removal was conducted by subtracting the mean activity across the length of an epoch from every data point." The actual baseline time segment should be specified.

      The time segment was the length of the epoch, that is, 1 second for the resting state conditions and 6.25 seconds for the visual stimulation conditions. This has now been explicitly stated in the revised manuscript (Page 14, Lines 379-380).

      - "We excluded the alpha range (8-14 Hz) for this fit to avoid biasing the results due to documented differences in alpha activity between CC and SC individuals (Bottari et al., 2016; Ossandón et al., 2023; Pant et al., 2023)." This does not really make sense, as the FOOOF algorithm first fits the 1/f slope, for which the alpha activity is not relevant.

      Response: We did not use the FOOOF algorithm/toolbox in this manuscript. As stated in the Methods, we used a 1/f fit to the 1-20 Hz spectrum in the log-log space, and subtracted this fit from the original spectrum to obtain the corrected spectrum. Given the pronounced difference in alpha power between groups (Bottari et al., 2016; Ossandón et al., 2023; Pant et al., 2023), we were concerned it might drive differences in the exponent values. Our analysis pipeline had been adapted from previous publications of our group and other labs (Ossandón et al., 2023; Voytek et al., 2015; Waschke et al., 2017).

      We have conducted the analysis with and without the exclusion of the alpha range, as well as using the FOOOF toolbox both in the 1-20 Hz and 20-40 Hz ranges (Ossandón et al., 2023). The findings of a steeper slope in the 1-20 Hz range as well as lower alpha power in CC vs SC individuals remained stable. In Ossandón et al., the comparison between the piecewise fits and FOOOF fits led the authors to use the former, as it outperformed the FOOOF algorithm for their data.
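      The fitting procedure described in this response can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the authors' code: the spectrum is synthetic (a power law with exponent 1.5 plus an alpha bump near 10 Hz), the alpha range is treated as inclusive of 8 and 14 Hz, the subtraction is carried out in log10 space, and the names `fit_aperiodic` and `corrected_spectrum` are hypothetical.

```python
import math

def fit_aperiodic(freqs, power, exclude=(8.0, 14.0)):
    """Least-squares line in log-log space, skipping the excluded (alpha) range.

    Returns (slope, intercept) of log10(power) = intercept + slope * log10(freq).
    """
    points = [
        (math.log10(f), math.log10(p))
        for f, p in zip(freqs, power)
        if not (exclude[0] <= f <= exclude[1])
    ]
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def corrected_spectrum(freqs, power, slope, intercept):
    """Log-power spectrum after subtracting the fitted aperiodic component."""
    return [
        math.log10(p) - (intercept + slope * math.log10(f))
        for f, p in zip(freqs, power)
    ]

# Synthetic 1-20 Hz spectrum (assumption): 1/f^1.5 background with an alpha peak at 10 Hz.
freqs = [float(f) for f in range(1, 21)]
power = [
    10 ** 2.0 * f ** -1.5 * (1 + 3 * math.exp(-((f - 10) ** 2) / 2))
    for f in freqs
]

slope, intercept = fit_aperiodic(freqs, power)
residual = corrected_spectrum(freqs, power, slope, intercept)
peak_frequency = freqs[residual.index(max(residual))]
```

      Because the alpha bump lies almost entirely inside the excluded range, the recovered slope stays close to the simulated exponent, and the corrected spectrum peaks at the simulated alpha frequency.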

      - The model fits of the 1/f fitting for EO, EC, and both participant groups should be reported.

      Response: In Figure 3 of the manuscript, we depicted the mean spectra and 1/f fits for each group.

      In the revised manuscript, we added the fit quality metrics (average R² values > 0.91 for each group and condition) (Methods Page 15, Lines 395-396; Supplementary Material S11) and additionally show individual subjects’ fits (Supplementary Material S11).”
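      As a reminder of what the reported fit quality metric expresses, the following sketch computes the coefficient of determination from observed and fitted values. Whether R² was computed on log-transformed or raw power is not stated here, so the function is generic; the name `r_squared` and the example numbers are hypothetical.

```python
def r_squared(observed, fitted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_observed = sum(observed) / len(observed)
    ss_residual = sum((o - f) ** 2 for o, f in zip(observed, fitted))
    ss_total = sum((o - mean_observed) ** 2 for o in observed)
    return 1.0 - ss_residual / ss_total

# Made-up example values, not data from the study.
example_observed = [1.0, 2.0, 3.0, 4.0]
example_fitted = [1.1, 1.9, 3.2, 3.8]
example_r2 = r_squared(example_observed, example_fitted)
```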

      (14) The authors mention:

      "The EEG data sets reported here were part of data published earlier (Ossandón et al., 2023; Pant et al., 2023)." Thus, the statement "The group differences for the EEG assessments corresponded to those of a larger sample of CC individuals (n=38) " is a circular argument and should be avoided."

      The authors addressed this comment and adjusted the statement. However, I do not understand, why not the full sample published earlier (Ossandón et al., 2023) was used in the current study?

      The recording of EEG resting state data started in 2013, while MRS testing could only be set up by the end of 2019. Moreover, not all subjects who qualify for EEG recording qualify for being scanned (e.g. due to MRI safety, claustrophobia).

      References

      Bottari, D., Troje, N. F., Ley, P., Hense, M., Kekunnaya, R., & Röder, B. (2016). Sight restoration after congenital blindness does not reinstate alpha oscillatory activity in humans. Scientific Reports. https://doi.org/10.1038/srep24683

      Colombo, M. A., Napolitani, M., Boly, M., Gosseries, O., Casarotto, S., Rosanova, M., Brichant, J. F., Boveroux, P., Rex, S., Laureys, S., Massimini, M., Chieregato, A., & Sarasso, S. (2019). The spectral exponent of the resting EEG indexes the presence of consciousness during unresponsiveness induced by propofol, xenon, and ketamine. NeuroImage, 189, 631–644. https://doi.org/10.1016/j.neuroimage.2019.01.024

      Delorme, A. (2023). EEG is better left alone. Scientific Reports, 13(1), 2372. https://doi.org/10.1038/s41598-023-27528-0

      Favaro, J., Colombo, M. A., Mikulan, E., Sartori, S., Nosadini, M., Pelizza, M. F., Rosanova, M., Sarasso, S., Massimini, M., & Toldo, I. (2023). The maturation of aperiodic EEG activity across development reveals a progressive differentiation of wakefulness from sleep. NeuroImage, 277. https://doi.org/10.1016/J.NEUROIMAGE.2023.120264

      Gao, R., Peterson, E. J., & Voytek, B. (2017). Inferring synaptic excitation/inhibition balance from field potentials. NeuroImage, 158, 70–78. https://doi.org/10.1016/j.neuroimage.2017.06.078

      Groppe, D. M., Makeig, S., & Kutas, M. (2009). Identifying reliable independent components via split-half comparisons. NeuroImage, 45(4), 1199–1211. https://doi.org/10.1016/j.neuroimage.2008.12.038

      Gyurkovics, M., Clements, G. M., Low, K. A., Fabiani, M., & Gratton, G. (2021). The impact of 1/f activity and baseline correction on the results and interpretation of time-frequency analyses of EEG/MEG data: A cautionary tale. NeuroImage, 237. https://doi.org/10.1016/j.neuroimage.2021.118192

      Hill, A. T., Clark, G. M., Bigelow, F. J., Lum, J. A. G., & Enticott, P. G. (2022). Periodic and aperiodic neural activity displays age-dependent changes across early-to-middle childhood. Developmental Cognitive Neuroscience, 54, 101076. https://doi.org/10.1016/J.DCN.2022.101076

      Maurer, D., Mondloch, C. J., & Lewis, T. L. (2007). Sleeper effects. In Developmental Science. https://doi.org/10.1111/j.1467-7687.2007.00562.x

      McSweeney, M., Morales, S., Valadez, E. A., Buzzell, G. A., Yoder, L., Fifer, W. P., Pini, N., Shuffrey, L. C., Elliott, A. J., Isler, J. R., & Fox, N. A. (2023). Age-related trends in aperiodic EEG activity and alpha oscillations during early- to middle-childhood. NeuroImage, 269, 119925. https://doi.org/10.1016/j.neuroimage.2023.119925

      Medel, V., Irani, M., Crossley, N., Ossandón, T., & Boncompte, G. (2023). Complexity and 1/f slope jointly reflect brain states. Scientific Reports, 13(1), 21700. https://doi.org/10.1038/s41598-023-47316-0

      Molina, J. L., Voytek, B., Thomas, M. L., Joshi, Y. B., Bhakta, S. G., Talledo, J. A., Swerdlow, N. R., & Light, G. A. (2020). Memantine Effects on Electroencephalographic Measures of Putative Excitatory/Inhibitory Balance in Schizophrenia. Biological Psychiatry: Cognitive Neuroscience and Neuroimaging, 5(6), 562–568. https://doi.org/10.1016/j.bpsc.2020.02.004

      Muthukumaraswamy, S. D., & Liley, D. T. (2018). 1/F electrophysiological spectra in resting and drug-induced states can be explained by the dynamics of multiple oscillatory relaxation processes. NeuroImage, 179, 582–595. https://doi.org/10.1016/j.neuroimage.2018.06.068

      Ossandón, J. P., Stange, L., Gudi-Mindermann, H., Rimmele, J. M., Sourav, S., Bottari, D., Kekunnaya, R., & Röder, B. (2023). The development of oscillatory and aperiodic resting state activity is linked to a sensitive period in humans. NeuroImage, 275, 120171. https://doi.org/10.1016/J.NEUROIMAGE.2023.120171

      Ostlund, B. D., Alperin, B. R., Drew, T., & Karalunas, S. L. (2021). Behavioral and cognitive correlates of the aperiodic (1/f-like) exponent of the EEG power spectrum in adolescents with and without ADHD. Developmental Cognitive Neuroscience, 48, 100931. https://doi.org/10.1016/j.dcn.2021.100931

      Pant, R., Ossandón, J., Stange, L., Shareef, I., Kekunnaya, R., & Röder, B. (2023). Stimulus-evoked and resting-state alpha oscillations show a linked dependence on patterned visual experience for development. NeuroImage: Clinical, 103375. https://doi.org/10.1016/J.NICL.2023.103375

      Schaworonkow, N., & Voytek, B. (2021). Longitudinal changes in aperiodic and periodic activity in electrophysiological recordings in the first seven months of life. Developmental Cognitive Neuroscience, 47. https://doi.org/10.1016/j.dcn.2020.100895

      Schwenk, J. C. B., VanRullen, R., & Bremmer, F. (2020). Dynamics of Visual Perceptual Echoes Following Short-Term Visual Deprivation. Cerebral Cortex Communications, 1(1). https://doi.org/10.1093/TEXCOM/TGAA012

      Tanner, D., Norton, J. J. S., Morgan-Short, K., & Luck, S. J. (2016). On high-pass filter artifacts (they’re real) and baseline correction (it’s a good idea) in ERP/ERMF analysis. Journal of Neuroscience Methods, 266, 166–170. https://doi.org/10.1016/j.jneumeth.2016.01.002

      VanRullen, R., & MacDonald, J. S. P. (2012). Perceptual echoes at 10 Hz in the human brain. Current Biology. https://doi.org/10.1016/j.cub.2012.03.050

      Voytek, B., Kramer, M. A., Case, J., Lepage, K. Q., Tempesta, Z. R., Knight, R. T., & Gazzaley, A. (2015). Age-related changes in 1/f neural electrophysiological noise. Journal of Neuroscience, 35(38). https://doi.org/10.1523/JNEUROSCI.2332-14.2015

      Waschke, L., Wöstmann, M., & Obleser, J. (2017). States and traits of neural irregularity in the age-varying human brain. Scientific Reports, 7(1), 1–12. https://doi.org/10.1038/s41598-017-17766-4


      The following is the authors’ response to the original reviews.

      eLife Assessment

      This potentially useful study involves neuro-imaging and electrophysiology in a small cohort of congenital cataract patients after sight recovery and age-matched control participants with normal sight. It aims to characterize the effects of early visual deprivation on excitatory and inhibitory balance in the visual cortex. While the findings are taken to suggest the existence of persistent alterations in Glx/GABA ratio and aperiodic EEG signals, the evidence supporting these claims is incomplete. Specifically, small sample sizes, lack of a specific control cohort, and other methodological limitations will likely restrict the usefulness of the work, with relevance limited to scientists working in this particular subfield.

      As pointed out in the public reviews, there are very few human models which allow for assessing the role of early experience on neural circuit development. While the prevalent research in permanent congenital blindness reveals the response and adaptation of the developing brain to an atypical situation (blindness), research in sight restoration addresses the question of whether and how atypical development can be remediated if typical experience (vision) is restored. The literature on the role of visual experience in the development of E/I balance in humans, assessed via Magnetic Resonance Spectroscopy (MRS), has been limited to a few studies on congenital permanent blindness. Thus, we assessed sight recovery individuals with a history of congenital blindness, as limited evidence from other researchers indicated that the visual cortex E/I ratio might differ compared to normally sighted controls.

      Individuals with total bilateral congenital cataracts who remained untreated until later in life are extremely rare, particularly if only carefully diagnosed patients are included in a study sample. A sample size of 10 patients is, at the very least, typical of past studies in this population, even for exclusively behavioral assessments. In the present study, in addition to behavioral assessment as an indirect measure of sensitive periods, we investigated participants with two neuroimaging methods (Magnetic Resonance Spectroscopy and electroencephalography) to directly assess the neural correlates of sensitive periods in humans. The electroencephalography data allowed us to link the results of our small sample to findings documented in large cohorts of both sight recovery individuals and permanently congenitally blind individuals. As pointed out in a recent editorial recommending an “exploration-then-estimation procedure” (“Consideration of Sample Size in Neuroscience Studies,” 2020), exploratory studies like ours provide crucial direction and specific hypotheses for future work.

      We included an age-matched sighted control group recruited from the same community, measured in the same scanner and laboratory, to assess whether early experience is necessary for a typical excitatory/inhibitory (E/I) ratio to emerge in adulthood. The present findings indicate that this is indeed the case. Based on these results, a possible question to answer in future work, with individuals who had developmental cataracts, is whether later visual deprivation causes similar effects. Note that even if visual deprivation at a later stage in life caused similar effects, the current results would not be invalidated; on the contrary, they are essential for understanding future work on late (permanent or transient) blindness.

      Thus, we think that the present manuscript has far-reaching implications for our understanding of the conditions under which E/I balance, a crucial characteristic of brain functioning, emerges in humans.

      Finally, our manuscript is one of the first few studies that relate MRS neurotransmitter concentrations to parameters of EEG aperiodic activity. Since present research has been using aperiodic activity as a correlate of the E/I ratio, and partially of higher cognitive functions, we think that our manuscript additionally contributes to a better understanding of what might be measured with aperiodic neurophysiological activity.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this human neuroimaging and electrophysiology study, the authors aimed to characterize the effects of a period of visual deprivation in the sensitive period on excitatory and inhibitory balance in the visual cortex. They attempted to do so by comparing neurochemistry conditions ('eyes open', 'eyes closed') and resting state, and visually evoked EEG activity between ten congenital cataract patients with recovered sight (CC), and ten age-matched control participants (SC) with normal sight.

      First, they used magnetic resonance spectroscopy to measure in vivo neurochemistry from two locations, the primary location of interest in the visual cortex, and a control location in the frontal cortex. Such voxels are used to provide a control for the spatial specificity of any effects because the single-voxel MRS method provides a single sampling location. Using MR-visible proxies of excitatory and inhibitory neurotransmission, Glx and GABA+ respectively, the authors report no group effects in GABA+ or Glx, no difference in the functional conditions 'eyes closed' and 'eyes open'. They found an effect of the group in the ratio of Glx/GABA+ and no similar effect in the control voxel location. They then performed multiple exploratory correlations between MRS measures and visual acuity, and reported a weak positive correlation between the 'eyes open' condition and visual acuity in CC participants.

      The same participants then took part in an EEG experiment. The authors selected only two electrodes placed in the visual cortex for analysis and reported a group difference in an EEG index of neural activity, the aperiodic intercept, as well as the aperiodic slope, considered a proxy for cortical inhibition. They report an exploratory correlation between the aperiodic intercept and Glx in one out of three EEG conditions.

      The authors report the difference in E/I ratio, and interpret the lower E/I ratio as representing an adaptation to visual deprivation, which would have initially caused a higher E/I ratio. Although intriguing, the strength of evidence in support of this view is not strong. Amongst the limitations are the low sample size, a critical control cohort that could provide evidence for a higher E/I ratio in CC patients without recovered sight for example, and lower data quality in the control voxel.

      Strengths of study:

      How sensitive period experience shapes the developing brain is an enduring and important question in neuroscience. This question has been particularly difficult to investigate in humans. The authors recruited a small number of sight-recovered participants with bilateral congenital cataracts to investigate the effect of sensitive period deprivation on the balance of excitation and inhibition in the visual brain using measures of brain chemistry and brain electrophysiology. The research is novel, and the paper was interesting and well-written.

      Limitations:

      (1.1) Low sample size. Ten for CC and ten for SC, and a further two SC participants were rejected due to a lack of frontal control voxel data. The sample size limits the statistical power of the dataset and increases the likelihood of effect inflation.

      Applying strict criteria, we only included individuals who were born with no patterned vision in the CC group. The population of individuals who have remained untreated past infancy is small in India, despite a higher prevalence of childhood cataract than in Germany. Indeed, from the original 11 CC and 11 SC participants tested, one participant each from the CC and SC group had to be rejected, as their data had been corrupted, resulting in 10 participants in each group.

      It was a challenge to recruit participants from this rare group with no history of neurological diagnosis/intake of neuromodulatory medications, who were able and willing to undergo both MRS and EEG. For this study, data collection took more than 2.5 years.

      We addressed the validity of our results with two measures. First, we assessed not just MRS, but additionally, EEG measures of E/I ratio. The latter allowed us to link results to a larger population of CC individuals, that is, we replicated the results of a larger group of 28 additional individuals (Ossandón et al., 2023) in our sub-group.

      Second, we included a control voxel. As predicted, all group effects were restricted to the occipital voxel.

      (1.2) Lack of specific control cohort. The control cohort has normal vision. The control cohort is not specific enough to distinguish between people with sight loss due to different causes and patients with congenital cataracts with co-morbidities. Further data from more specific populations, such as patients whose cataracts have not been removed, with developmental cataracts, or congenitally blind participants, would greatly improve the interpretability of the main finding. The lack of a more specific control cohort is a major caveat that limits a conclusive interpretation of the results.

      The existing work on visual deprivation and neurochemical changes, as assessed with MRS, has been limited to permanent congenital blindness. In fact, most of the studies on permanent blindness included only congenitally blind or early blind humans (Coullon et al., 2015; Weaver et al., 2013), or, in separate studies, only late-blind individuals (Bernabeu et al., 2009). Accordingly, we started with the most “extreme” visual deprivation model, sight recovery after congenital blindness. If we had not observed any group difference compared to normally sighted controls, investigating other groups might have been trivial. Based on our results, subsequent studies in late blind individuals, and then individuals with developmental cataracts, can be planned with clear hypotheses.

      (1.3) MRS data quality differences. Data quality in the control voxel appears worse than in the visual cortex voxel. The frontal cortex MRS spectrum shows far broader linewidth than the visual cortex (Supplementary Figures). Compared to the visual voxel, the frontal cortex voxel has less defined Glx and GABA+ peaks; lower GABA+ and Glx concentrations, lower NAA SNR values; lower NAA concentrations. If the data quality is a lot worse in the FC, then small effects may not be detectable.

      Worse data quality in the frontal than the visual cortex has been repeatedly observed in the MRS literature, attributable to magnetic field distortions (Juchem & Graaf, 2017) resulting from the proximity of the region to the sinuses (recent example: (Rideaux et al., 2022)). Nevertheless, we chose the frontal control region rather than a parietal voxel, given the potential neurochemical changes in multisensory regions of the parietal cortex due to blindness. Such reorganization would be less likely in frontal areas associated with higher cognitive functions. Further, prior MRS studies of the visual cortex have used the frontal cortex as a control region as well (Pitchaimuthu et al., 2017; Rideaux et al., 2022). In the revised manuscript, we more explicitly inform the reader about this data quality difference between regions in the Methods (Pages 11-12, MRS Data Quality/Table 2) and Discussion (Page 25, Lines 644-647).

      Importantly, while in the present study data quality differed between the frontal and visual cortex voxel, it did not differ between groups (Supplementary Material S6).  

      Further, we checked that the frontal cortex datasets for Glx and GABA+ concentrations were of sufficient quality: the fit error was below 8.31% in both groups (Supplementary Material S3). For reference, Mikkelsen et al. reported a mean GABA+ fit error of 6.24 +/- 1.95% from a posterior cingulate cortex voxel across 8 GE scanners, using the Gannet pipeline. No absolute cutoffs have been proposed for fit errors. However, MRS studies in special populations (I/E ratio assessed in narcolepsy (Gao et al., 2024), GABA concentration assessed in Autism Spectrum Disorder (Maier et al., 2022)) have used frontal cortex data with a fit error of <10% to identify differences between cohorts (Gao et al., 2024; Pitchaimuthu et al., 2017). Based on the literature, MRS data from the frontal voxel of the present study would have been of sufficient quality to uncover group differences.

      In the revised manuscript, we added the recently published MRS quality assessment form to the supplementary materials (Supplementary Excel File S1). Additionally, we would like to point to our a priori prediction of group differences for the visual cortex, but not for the frontal cortex voxel. Finally, EEG data quality did not differ between frontal and occipital electrodes; therefore, lower sensitivity of frontal measures cannot easily explain the lack of group differences for frontal measures.

      (1.4) Because of the direction of the difference in E/I, the authors interpret their findings as representing signatures of sight improvement after surgery without further evidence, either within the study or from the literature. However, the literature suggests that plasticity and visual deprivation drive the E/I index up rather than down. Decreasing GABA+ is thought to facilitate experience-dependent remodelling. What evidence is there that cortical inhibition increases in response to a visual cortex that is over-sensitised due to congenital cataracts? Without further experimental or literature support this interpretation remains very speculative.

      Indeed, higher inhibition was not predicted, which we attempt to reconcile in our discussion section. We base our discussion mainly on the non-human animal literature, which has shown evidence of homeostatic changes after prolonged visual deprivation in the adult brain (Barnes et al., 2015). It is also interesting to note that after monocular deprivation in adult humans, resting GABA+ levels decreased in the visual cortex (Lunghi et al., 2015). Assuming that after delayed sight restoration, adult neuroplasticity mechanisms must be employed, these studies would predict a “balancing” of the increased excitatory drive following sight restoration by a commensurate increase in inhibition (Keck et al., 2017). Additionally, the EEG results of the present study allowed for speculation regarding the underlying neural mechanisms of an altered E/I ratio. The aperiodic EEG activity suggested higher spontaneous spiking (increased intercept) and increased inhibition (steeper aperiodic slope between 1-20 Hz) in CC vs SC individuals (Ossandón et al., 2023).

      In the revised manuscript, we have more clearly indicated that these speculations are based primarily on non-human animal work, due to the lack of human studies on the subject (Page 23, Lines 609-613).

      (1.5) Heterogeneity in the patient group. Congenital cataract (CC) patients experienced a variety of duration of visual impairment and were of different ages. They presented with co-morbidities (absorbed lens, strabismus, nystagmus). Strabismus has been associated with abnormalities in GABAergic inhibition in the visual cortex. The possible interactions with residual vision and confounds of co-morbidities are not experimentally controlled for in the correlations, and not discussed.

      The goal of the present study was to assess whether we would observe changes in E/I ratio after restoring vision at all. We would not have included patients without nystagmus in the CC group of the present study, since it would have been unlikely that they experienced congenital patterned visual deprivation. Amongst diagnosticians, nystagmus or strabismus might not be considered genuine “comorbidities” that emerge in people with congenital cataracts. Rather, these are consequences of congenital visual deprivation, which we employed as diagnostic criteria. Similarly, absorbed lenses are clear signs that cataracts were congenital. As in other models of experience-dependent brain development (e.g. the extant literature on congenital permanent blindness, including anophthalmic individuals (Coullon et al., 2015; Weaver et al., 2013)), some uncertainty remains regarding whether the (remaining, in our case) abnormalities of the eye, or the blindness they caused, are the factors driving neural changes. In case of people with reversed congenital cataracts, at least the retina is considered to be intact, as they would otherwise not receive cataract removal surgery.

      However, we consider it unlikely that strabismus caused the group differences, because the present study shows group differences in the Glx/GABA+ ratio at rest, regardless of eye opening or eye closure, for which strabismus would have caused distinct effects. By contrast, the link between GABA concentration and, for example, interocular suppression in strabismus, has so far been documented during visual stimulation (Mukerji et al., 2022; Sengpiel et al., 2006), and differed in direction depending on the amblyopic vs. non-amblyopic eye. Further, one MRS study did not find group differences in GABA concentration between the visual cortices of 16 amblyopic individuals and sighted controls (Mukerji et al., 2022), supporting that the differences in Glx/GABA+ concentration which we observed were driven by congenital deprivation, and not amblyopia-associated visual acuity or eye movement differences.

      In the revised manuscript, we discussed the inclusion criteria in more detail, and the aforementioned reasons why our data remain interpretable (Page 5, Lines 143-145, Lines 147-149).

      (1.6) Multiple exploratory correlations were performed to relate MRS measures to visual acuity (shown in Supplementary Materials), and only specific ones were shown in the main document. The authors describe the analysis as exploratory in the 'Methods' section. Furthermore, the correlation between visual acuity and E/I metric is weak, and not corrected for multiple comparisons. The results should be presented as preliminary, as no strong conclusions can be made from them. They can provide a hypothesis to test in a future study.

      In the revised manuscript, we have clearly indicated that the exploratory correlation analyses are reported to put forth hypotheses for future studies (Page 4, Lines 118-128; Page 5, Lines 132-134; Page 25, Lines 644-647).

      (1.7) P.16 Given the correlation of the aperiodic intercept with age ("Age negatively correlated with the aperiodic intercept across CC and SC individuals, that is, a flattening of the intercept was observed with age"), age needs to be controlled for in the correlation between neurochemistry and the aperiodic intercept. Glx has also been shown to negatively correlate with age.

      The correlation between chronological age and aperiodic intercept was observed across groups, but the correlation between Glx and the intercept of the aperiodic EEG activity was seen only in the CC group, even though the SC group was matched for age. Thus, such a correlation was very unlikely to be predominantly driven by an effect of chronological age.

      In the revised manuscript, we added the linear regressions with age as a covariate (Supplementary Material S16, referred to in the main Results, Page 21, Lines 534-537), demonstrating the significant relationship between aperiodic intercept and Glx concentration in the CC group. 
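      The logic of including age as a covariate can be sketched as follows. This is a minimal illustration, not the analysis reported in Supplementary Material S16: it uses made-up, noise-free values, obtains the age-adjusted slope by regressing both variables on age and then relating the residuals (which is equivalent to the Glx coefficient of a multiple regression with age as a second predictor), and the names `ols_line`, `residuals_after` and `age_adjusted_slope` are hypothetical.

```python
def ols_line(x, y):
    """Ordinary least-squares slope and intercept for y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def residuals_after(covariate, values):
    """Part of `values` that is not linearly explained by `covariate`."""
    slope, intercept = ols_line(covariate, values)
    return [v - (intercept + slope * c) for c, v in zip(covariate, values)]

def age_adjusted_slope(predictor, outcome, age):
    """Slope of outcome on predictor after removing the linear effect of age from both."""
    slope, _ = ols_line(residuals_after(age, predictor), residuals_after(age, outcome))
    return slope

# Made-up values for ten hypothetical participants (not data from the study).
age = [11.0, 14.0, 19.0, 23.0, 27.0, 31.0, 36.0, 40.0, 44.0, 20.0]
glx = [9.1, 10.4, 8.7, 11.2, 9.9, 10.8, 8.5, 11.5, 9.4, 10.1]
# Simulated aperiodic intercept that depends on both Glx (+0.5) and age (-0.02).
aperiodic_intercept = [1.0 + 0.5 * g - 0.02 * a for g, a in zip(glx, age)]

adjusted = age_adjusted_slope(glx, aperiodic_intercept, age)
```

      In this simulated case the adjusted slope recovers the Glx effect that was built into the data, independently of the age effect.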

      (1.8) Multiple exploratory correlations were performed to relate MRS to EEG measures (shown in Supplementary Materials), and only specific ones were shown in the main document. Given the multiple measures from the MRS, the correlations with the EEG measures were exploratory, as stated in the text, p.16, and in Figure 4. Yet the introduction said that there was a prior hypothesis "We further hypothesized that neurotransmitter changes would relate to changes in the slope and intercept of the EEG aperiodic activity in the same subjects." It would be great if the text could be revised for consistency and the analysis described as exploratory.

      In the revised manuscript, we improved the phrasing (Page 5, Lines 130-132) and consistently reported the correlations as exploratory in the Methods and Discussion. We consider the correlation analyses as exploratory due to our sample size and the absence of prior work. However, we did hypothesize that both MRS and EEG markers would concurrently be altered in CC vs SC individuals.

      (1.9) The analysis for the EEG needs to take more advantage of the available data. As far as I understand, only two electrodes were used, yet far more were available as seen in their previous study (Ossandon et al., 2023). The spatial specificity is not established. The authors could use the frontal cortex electrode (FP1, FP2) signals as a control for spatial specificity in the group effects, or even better, all available electrodes and correct for multiple comparisons. Furthermore, they could use the aperiodic intercept vs Glx in SC to evaluate the specificity of the correlation to CC.

      The aperiodic intercept and slope did not differ between CC and SC individuals for Fp1 and Fp2, suggesting the spatial specificity of the results. In the revised manuscript, we added this analysis to the Supplementary Material (Supplementary Material S14) and referred to it in our Results (Page 20, Lines 513-514).

      Further, Glx concentration in the visual cortex did not correlate with the aperiodic intercept in the SC group (Figure 4), suggesting that this relationship was indeed specific to the CC group.

      The data from all electrodes has been analyzed and published in other studies as well (Pant et al., 2023; Ossandón et al., 2023). 

      Reviewer #2 (Public Review):

      Summary:

      The manuscript reports non-invasive measures of activity and neurochemical profiles of the visual cortex in congenitally blind patients who recovered vision through the surgical removal of bilateral dense cataracts. The declared aim of the study is to find out how restoring visual function after several months or years of complete blindness impacts the balance between excitation and inhibition in the visual cortex.

      Strengths:

      The findings are undoubtedly useful for the community, as they contribute towards characterising the many ways this special population differs from normally sighted individuals. The combination of MRS and EEG measures is a promising strategy to estimate a fundamental physiological parameter - the balance between excitation and inhibition in the visual cortex, which animal studies show to be heavily dependent upon early visual experience. Thus, the reported results pave the way for further studies, which may use a similar approach to evaluate more patients and control groups.

      Weaknesses:

      (2.1) The main issue is the lack of an appropriate comparison group or condition to delineate the effect of sight recovery (as opposed to the effect of congenital blindness). Few previous studies suggested an increased excitation/Inhibition ratio in the visual cortex of congenitally blind patients; the present study reports a decreased E/I ratio instead. The authors claim that this implies a change of E/I ratio following sight recovery. However, supporting this claim would require showing a shift of E/I after vs. before the sight-recovery surgery, or at least it would require comparing patients who did and did not undergo the sight-recovery surgery (as common in the field).

      Longitudinal studies would indeed be the best way to test the hypothesis that the lower E/I ratio in the CC group observed by the present study is a consequence of sight restoration.

      We have now explicitly stated this in the Limitations section (Page 25, Lines 654-655).

      However, longitudinal neuroimaging studies are difficult to conduct, particularly in research carried out outside of major developed countries and dedicated neuroimaging research facilities. Crucially, however, had CC and SC individuals, as well as permanently congenitally blind vs SC individuals (Coullon et al., 2015; Weaver et al., 2013), not differed on any neurochemical markers, such a longitudinal study might have been trivial. Thus, in order to justify and better tailor longitudinal studies, cross-sectional studies are an initial step.

      (2.2) MR Spectroscopy shows a reduced GLX/GABA ratio in patients vs. sighted controls; however, this finding remains rather isolated, not corroborated by other observations. The difference between patients and controls only emerges for the GLX/GABA ratio, but there is no accompanying difference in either the GLX or the GABA concentrations. There is an attempt to relate the MRS data with acuity measurements and electrophysiological indices, but the explorative correlational analyses do not help to build a coherent picture. A bland correlation between GLX/GABA and visual impairment is reported, but this is specific to the patients' group (N=10) and would not hold across groups (the correlation is positive, predicting the lowest GLX/GABA ratio values for the sighted controls - the opposite of what is found). There is also a strong correlation between GLX concentrations and the EEG power at the lowest temporal frequencies. Although this relation is intriguing, it only holds for a very specific combination of parameters (of the many tested): only with eyes open, only in the patient group.

      We interpret these findings differently, that is, in the context of experiments from non-human animals and the larger MRS literature (Page 23, Lines 609-611).

      Homeostatic control of E/I balance assumes that the ratio of excitation (reflected here by Glx) to inhibition (reflected here by GABA+) is regulated. Like prior work (Gao et al., 2024, 2024; Narayan et al., 2022; Perica et al., 2022; Steel et al., 2020; Takado et al., 2022; Takei et al., 2016), we assumed that the Glx/GABA+ ratio, rather than the individual neurotransmitter levels alone, is indicative of E/I balance. One motivation for assessing the ratio rather than the absolute concentrations is that, under the E/I balance hypothesis, a change in excitation would cause a concomitant change in inhibition, and vice versa, as has been shown in non-human animal work (Fang et al., 2021; Haider et al., 2006; Tao & Poo, 2005) and modeling research (Vreeswijk & Sompolinsky, 1996; Wu et al., 2022). Importantly, our interpretation of a lower E/I ratio rests not only on the Glx/GABA+ ratio but also on the steeper EEG aperiodic slope (1-20 Hz).

      As stated in the Discussion section and Response 1.4, we did not expect to see a lower Glx/GABA+ ratio in CC individuals. We discuss the possible reasons for the direction of the correlation with visual acuity and aperiodic offset during passive visual stimulation, and offer interpretations and (testable) hypotheses.

      We interpret the direction of the Glx/GABA+ correlation with visual acuity to imply that the patients with the strongest (compensatory) balancing of the consequences of congenital blindness (hyperexcitation), in light of visual stimulation, are those who recover best. Note that the sighted control group was selected based on their “normal” vision. Thus, clinical visual acuity measures are not expected to vary sufficiently, nor to have the resolution, to show strong correlations with neurophysiological measures. By contrast, the CC group comprised patients whose visual outcomes varied widely, and was thus ideal for investigating such correlations.

      This holds for the correlation between Glx and the aperiodic intercept, as well. Previous work has suggested that the intercept of the aperiodic activity is associated with broadband spiking activity in neural circuits (Manning et al., 2009). Thus, an atypical increase of spiking activity during visual stimulation, as indirectly suggested by “old” non-human primate work on visual deprivation (Hyvärinen et al., 1981) might drive a correlation not observed in healthy populations.

      In the revised manuscript, we have more clearly indicated in the Discussion that these are possible post-hoc interpretations (Page 23, Lines 584-586; Page 24, Lines 609-620; Page 24, Lines 644-647; Page 25, Lines 650-657). We argue that given the lack of such studies in humans, it is all the more important that extant data be presented completely, even if the direction of the effects is not as expected.

      (2.3) For these reasons, the reported findings do not allow us to draw firm conclusions on the relation between EEG parameters and E/I ratio or on the impact of early (vs. late) visual experience on the excitation/inhibition ratio of the human visual cortex.

      Indeed, the correlations we have tested between the E/I ratio and EEG parameters were exploratory, and have been reported as such.

      We have now made this clear in all the relevant parts of the manuscript (Introduction, Page 5, Lines 132-135; Methods, Page 16, Line 415; Results, Page 21, Figure 4; Discussion, Page 22, Line 568, Page 25, Lines 644-645, Page 25, Lines 650-657).

      The goal of our study was not to compare the effects of early vs. late visual experience. The goal was to study whether early visual experience is necessary for a typical E/I ratio in visual neural circuits. We provided clear evidence in favor of this hypothesis. Thus, the present results suggest the necessity of investigating the effects of late visual deprivation. In fact, such research is missing in permanent blindness as well.

      Reviewer #3 (Public Review):

      This manuscript examines the impact of congenital visual deprivation on the excitatory/inhibitory (E/I) ratio in the visual cortex using Magnetic Resonance Spectroscopy (MRS) and electroencephalography (EEG) in individuals whose sight was restored. Ten individuals with reversed congenital cataracts were compared to age-matched, normally sighted controls, assessing the cortical E/I balance and its interrelationship to visual acuity. The study reveals that the Glx/GABA ratio in the visual cortex and the intercept and aperiodic signal are significantly altered in those with a history of early visual deprivation, suggesting persistent neurophysiological changes despite visual restoration.

      My expertise is in EEG (particularly in the decomposition of periodic and aperiodic activity) and statistical methods. I have several major concerns in terms of methodological and statistical approaches along with the (over)interpretation of the results. These major concerns are detailed below.

      (3.1) Variability in visual deprivation:

      - The document states a large variability in the duration of visual deprivation (probably also the age at restoration), with significant implications for the sensitivity period's impact on visual circuit development. The variability and its potential effects on the outcomes need thorough exploration and discussion.

      We work with a rare, unique patient population, which makes it difficult to systematically assess the effects of different visual histories while maintaining stringent inclusion criteria such as complete patterned visual deprivation at birth. Regardless, we considered the large variance in age at surgery and time since surgery as supportive of our interpretation: group differences were found despite the large variance in duration of visual deprivation. Moreover, the existing variance was used to explore possible associations between behavior and neural measures, as well as neurochemical and EEG measures.

      In the revised manuscript, we have detailed the advantages (Methods, Page 5, Lines 143-145, Lines 147-149; Discussion, Page 26, Lines 677-678) and disadvantages (Discussion, Page 25, Lines 650-657) of our CC sample, with respect to duration of congenital visual deprivation.

      (3.2) Sample size:

      - The small sample size is a major concern as it may not provide sufficient power to detect subtle effects and/or overestimate significant effects, which then tend not to generalize to new data. One of the biggest drivers of the replication crisis in neuroscience.

      We address the small sample size in our Discussion, and make clear that small sample sizes are due to the nature of investigations in special populations. In the revised manuscript, we added the sample sizes of previous studies using MRS in permanently blind individuals (Page 4, Lines 108-109). It is worth noting that our EEG results fully align with those of larger samples of congenital cataract reversal individuals (Page 25, Lines 666-676, Supplementary Material S18, S19) (Ossandón et al., 2023), giving us confidence in their validity and reproducibility. Moreover, our MRS results and their correlations with EEG parameters were spatially specific to occipital cortex measures.

      The main problem with the correlation analyses between MRS and EEG measures is that the sample size is simply too small to conduct such an analysis. Moreover, it is unclear from the methods section that this analysis was only conducted in the patient group (which the reviewer assumed from the plots), and not explained why this was done only in the patient group. I would highly recommend removing these correlation analyses.

      In the revised manuscript, we have more clearly marked the correlation analyses as exploratory (Introduction, Page 4, Lines 118-128 and Page 5, Lines 132-134; Methods Page 16, Line 415; Discussion Page 22, Line 568, Page 24, Lines 644-645, Page 25, Lines 650-657); note that we do not base most of our discussion on the results of these analyses.

      As indicated by Reviewer 1, reporting them allows more precise hypotheses to be derived for future studies. It has to be noted that we investigate an extremely rare population, tested outside of major developed economies and dedicated neuroimaging research facilities. In addition to being a rare patient group, these individuals come from poor communities. Therefore, we consider it justified to report these correlations as exploratory, providing direction for future research.

      (3.3) Statistical concerns:

      - The statistical analyses, particularly the correlations drawn from a small sample, may not provide reliable estimates (see https://www.sciencedirect.com/science/article/pii/S0092656613000858, which clearly describes this problem).

      It would undoubtedly be better to have a larger sample size. We nonetheless think it is of value to the research community to publish this dataset, since 10 multimodal data sets from a carefully diagnosed, rare population, representing a human model for the effects of early experience on brain development, are quite a lot. Sample sizes in prior neuroimaging studies in transient blindness have most often ranged from n = 1 to n = 10. They nevertheless provided valuable direction for future research, and integration of results across multiple studies provides scientific insights. 

      Identifying possible group differences was the goal of our study, with the correlations being an exploratory analysis, which we have clearly indicated in the methods, results and discussion.

      - Statistical analyses for the MRS: The authors should consider some additional permutation statistics, which are more suitable for small sample sizes. The current statistical model (2x2) design ANOVA is not ideal for such small sample sizes. Moreover, it is unclear why the condition (EO & EC) was chosen as a predictor and not the brain region (visual & frontal) or neurochemicals. Finally, the authors did not provide any information on the alpha level nor any information on correction for multiple comparisons (in the methods section). Finally, even if the groups are matched w.r.t. age, the time between surgery and measurement, the duration of visual deprivation, (and sex?), these should be included as covariates as it has been shown that these are highly related to the measurements of interest (especially for the EEG measurements) and the age range of the current study is large.

      In our ANOVA models, the neurochemicals were the outcome variables, and the conditions were chosen as predictors based on prior work suggesting that Glx/GABA+ might vary with eye closure (Kurcyus et al., 2018). The study was designed based on a hypothesis of group differences localized to the occipital cortex, due to visual deprivation. The frontal cortex voxel was chosen to indicate whether these differences were spatially specific. Therefore, we conducted separate ANOVAs based on this study design.

      We have now clarified the motivation for these conditions in the Introduction (Page 4, Lines 122-125) and the Methods (Page 9, Lines 219-224).

      In the revised manuscript, we added the rationale for parametric analyses for our outcomes (Shapiro-Wilk as well as Levene’s tests, Supplementary Material S9). Note that in the Supplementary Materials (S12, S14), we have reported the correlations between visual history metrics and MRS/EEG outcomes, thereby investigating whether the variance in visual history might have driven these results. Specifically, we found a (negative) correlation between visual cortex Glx/GABA+ concentration during eye closure and the visual acuity in the CC group (Figure 2c). None of the other exploratory correlations between MRS/EEG outcomes vs time since surgery, duration of blindness or visual acuity were significant in the CC group (Supplementary Material S12, S15).  

      The alpha level used for the ANOVA models specified in the Methods section was 0.05. The alpha level for the exploratory analyses reported in the main manuscript was 0.008, after Bonferroni correction for six comparisons, as also specified in the Methods. Note that the corrected p-values are reported multiplied by 6, so that they can be read against the alpha level of 0.05 that most readers assume (see response regarding large p-values).

      We used a control group matched for age, recruited and tested in the same institutes, using the same setup. We feel that we followed the gold standards for recruiting a healthy control group for a patient group.

      - EEG statistical analyses: The same critique as for the MRS statistical analyses applies to the EEG analysis. In addition: was the 2x3 ANOVA conducted for EO and EC independently? This seems to be inconsistent with the approach in the MRS analyses, in which the authors chose EO & EC as predictors in their 2x2 ANOVA.

      The 2x3 ANOVA was not conducted independently for the eyes open/eyes closed condition. The ANOVA conducted on the EEG metrics was 2x3 because it had two groups (CC, SC) and three conditions (eyes open (EO), eyes closed (EC) and visual stimulation (LU)) as predictors.

      - Figure 4: The authors report a p-value of >0.999 with a correlation coefficient of -0.42 with a sample size of 10 subjects. This can't be correct (it should be around: p = 0.22). All statistical analyses should be checked.

      As specified in the Methods and Figure legend, the reported p values in Figure 4 have been corrected using the Bonferroni correction, and therefore multiplied by the number of comparisons, leading to the seemingly large values.

      Additionally, to check all statistical analyses, we put the manuscript through an independent Statistics Check (Nuijten & Polanin, 2020) (https://michelenuijten.shinyapps.io/statcheck-web/) and have uploaded the consistency report with the revised Supplementary Material (Supplementary Report 1).
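
      The arithmetic behind the seemingly large value can be illustrated with a minimal sketch. It assumes a standard two-sided t-test for a Pearson correlation (df = n - 2) and a Bonferroni adjustment that multiplies p by the number of comparisons and caps the result at 1; the function names are hypothetical and this is not the analysis code used for the manuscript.

```python
import math


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def pearson_p_two_sided(r: float, n: int, steps: int = 2000) -> float:
    """Two-sided p-value for a Pearson r from n pairs (t-test, df = n - 2)."""
    df = n - 2
    t = abs(r) * math.sqrt(df / (1 - r * r))
    # Simpson's rule (steps must be even) for the area under the density on [0, |t|].
    h = t / steps
    area = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area *= h / 3
    return 1 - 2 * area


def bonferroni_adjust(p: float, n_tests: int) -> float:
    """Multiply p by the number of tests and cap the result at 1 (assumed convention)."""
    return min(1.0, p * n_tests)


# Values quoted by the reviewer: r = -0.42, n = 10; six comparisons as stated above.
p_raw = pearson_p_two_sided(-0.42, 10)
p_adj = bonferroni_adjust(p_raw, 6)
print(f"uncorrected p = {p_raw:.3f}, Bonferroni-adjusted p = {p_adj:.3f}")
```

      Under these assumptions the uncorrected value is close to the reviewer's estimate, and multiplying it by 6 exceeds 1, which is consistent with a reported value of >0.999.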

      - Figure 2c. Eyes closed condition: The highest score of the *Glx/GABA ratio seems to be ~3.6. In subplot 2a, there seem to be 3 subjects that show a Glx/GABA ratio score > 3.6. How can this be explained? There is also a discrepancy for the eyes-closed condition.

      The three subjects that show the Glx/GABA+ ratio > 3.6 in subplot 2a are in the SC group, whereas the correlations plotted in figure 2c are only for the CC group, where the highest score is indeed ~3.6.

      (3.4) Interpretation of aperiodic signal:

      - Several recent papers demonstrated that the aperiodic signal measured in EEG or ECoG is related to various important aspects such as age, skull thickness, electrode impedance, as well as cognition. Thus, currently, very little is known about the underlying effects which influence the aperiodic intercept and slope. The entire interpretation of the aperiodic slope as a proxy for E/I is based on a computational model and simulation (as described in the Gao et al. paper).

      Apart from the modeling work of Gao et al., multiple papers that we have also cited used ECoG, EEG and MEG and showed concomitant changes in aperiodic activity with pharmacological manipulation of the E/I ratio (Colombo et al., 2019; Molina et al., 2020; Muthukumaraswamy & Liley, 2018). Further, several prior studies have interpreted changes in the aperiodic slope as reflective of changes in the E/I ratio, including studies of developmental groups (Favaro et al., 2023; Hill et al., 2022; McSweeney et al., 2023; Schaworonkow & Voytek, 2021) as well as patient groups (Molina et al., 2020; Ostlund et al., 2021).

      In the revised manuscript, we have cited those studies not already included in the Introduction (Page 3, Lines 92-94).

      - Especially the aperiodic intercept is a very sensitive measure to many influences (e.g. skull thickness, electrode impedance...). As crucial results (correlation aperiodic intercept and MRS measures) are facing this problem, this needs to be reevaluated. It is safer to make statements on the aperiodic slope than intercept. In theory, some of the potentially confounding measures are available to the authors (e.g. skull thickness can be computed from T1w images; electrode impedances are usually acquired alongside the EEG data) and could be therefore controlled.

      All electrophysiological measures indeed depend on parameters such as skull thickness and electrode impedance. As in the extant literature using neurophysiological measures to compare brain function between patient and control groups, we used a control group matched in age/sex, recruited in the same region, tested with the same devices, and analyzed with the same analysis pipeline. For example, impedance was kept below 10 kOhm for all subjects.

      This is now mentioned in the Methods, Page 13, Line 344.

      There is no evidence available suggesting that congenital cataracts are associated with changes in skull thickness that would cause the observed pattern of group results. Moreover, we cannot think of how any of the exploratory correlations between neurophysiological measures and MRS measures could be accounted for by a difference e.g. in skull thickness.

      - The authors wrote: "Higher frequencies (such as 20-40 Hz) have been predominantly associated with local circuit activity and feedforward signaling (Bastos et al., 2018; Van Kerkoerle et al., 2014); the increased 20-40 Hz slope may therefore signal increased spontaneous spiking activity in local networks. We speculate that the steeper slope of the aperiodic activity for the lower frequency range (1-20 Hz) in CC individuals reflects the concomitant increase in inhibition." The authors confuse the interpretation of periodic and aperiodic signals. This section refers to the interpretation of the periodic signal (higher frequencies). This interpretation cannot simply be translated to the aperiodic signal (slope).

      Prior work has not always separated the aperiodic and periodic components, making it unclear what might have driven these effects in our data. The interpretation of the higher frequency range was intended to contrast with the interpretations of lower frequency range, in order to speculate as to why the two aperiodic fits might go in differing directions. Note that Ossandón et al. reported highly similar results (group differences for CC individuals and for permanently congenitally blind humans) for the aperiodic activity between 20-40 Hz and oscillatory activity in the gamma range.

      In the revised Discussion, we removed this section. We primarily interpret the increased offset and prior findings from fMRI-BOLD data (Raczy et al., 2023) as an increase in broadband neuronal firing.

      - The authors further wrote: We used the slope of the aperiodic (1/f) component of the EEG spectrum as an estimate of E/I ratio (Gao et al., 2017; Medel et al., 2020; Muthukumaraswamy & Liley, 2018). This is a highly speculative interpretation with very little empirical evidence. These papers were conducted with ECoG data (mostly in animals) and mostly under anesthesia. Thus, these studies only allow an indirect interpretation by what the 1/f slope in EEG measurements is actually influenced.

      Note that Muthukumaraswamy and Liley (2018) used different types of pharmacological manipulations and analyzed periodic and aperiodic MEG activity in humans, in addition to monkey ECoG. Further, Medel et al. (now published as Medel et al., 2023) compared EEG activity in addition to ECoG data after propofol administration. The interpretation of our results is in line with a number of recent studies in developing (Hill et al., 2022; Schaworonkow & Voytek, 2021) and special populations using EEG. As mentioned above, several prior studies have used the slope of the 1/f component/aperiodic activity as an indirect measure of the E/I ratio (Favaro et al., 2023; Hill et al., 2022; McSweeney et al., 2023; Molina et al., 2020; Ostlund et al., 2021; Schaworonkow & Voytek, 2021), including studies using scalp-recorded EEG from humans.

      In the introduction of the revised manuscript, we have made more explicit that this metric is indirect (Page 3, Line 91), (additionally see Discussion, Page 24, Lines 644-645, Page 25, Lines 650-657).

      While a full understanding of aperiodic activity has yet to be established, some convergent ideas have emerged. We think that our results contribute to this enterprise, since our study is, to the best of our knowledge, the first to assess both MRS-measured neurotransmitter levels and EEG aperiodic activity.

      (3.5) Problems with EEG preprocessing and analysis:

      - It seems that the authors did not identify bad channels nor address the line noise issue (even a problem if a low pass filter of below-the-line noise was applied).

      As pointed out in the Methods and Figure 1, we only analyzed data from two occipital channels, O1 and O2, neither of which was rejected for any participant. Channel rejection was performed for the larger dataset, published elsewhere (Ossandón et al., 2023; Pant et al., 2023). As control sites, we added the frontal channels Fp1 and Fp2 (see Supplementary Material S14).

      Neither Ossandón et al. (2023) nor Pant et al. (2023) considered frequency ranges above 40 Hz, in order to avoid any possible contamination with line noise. Here, we focused on activity between 1 and 20 Hz, which excludes line noise contamination (Methods, Page 14, Lines 365-367). The low pass filter (FIR, 1-45 Hz) guaranteed that any spill-over effects of line noise would be restricted to frequencies just below the upper cutoff frequency.

      Additionally, a prior version of the analysis used spectrum interpolation to remove line noise; the group differences remained stable (Ossandón et al., 2023). We have reported this analysis in the revised manuscript (Page 14, Lines 364-357).

      Further, both groups were measured in the same lab, making it highly unlikely that line noise (~50 Hz) accounts for the observed group effects in the 1-20 Hz frequency range. Finally, the exploratory MRS-EEG correlations would be hard to explain if the EEG parameters were contaminated with line noise.

      - What was the percentage of segments that needed to be rejected due to the 120μV criteria? This should be reported specifically for EO & EC and controls and patients.

      The mean percentage of 1-second segments rejected in each group for each resting-state condition, and the percentage of 6.25-second segments rejected in each group for the visual stimulation condition, have been added to the revised manuscript (Supplementary Material S10) and are referred to in the Methods (Page 14, Lines 372-373).

      - The authors downsampled the data to 60Hz to "to match the stimulation rate". What is the intention of this? Because the subsequent spectral analyses are conflated by this choice (see Nyquist theorem).

      These data were collected as part of a study designed to evoke alpha activity with visual white noise, which changed in luminance with equal power at all frequencies from 1-60 Hz, restricted by the refresh rate of the monitor on which stimuli were presented (Pant et al., 2023). This paradigm and method were developed by VanRullen and colleagues (Schwenk et al., 2020; VanRullen & MacDonald, 2012); the analysis requires the EEG data to have the same sampling rate as the presented stimulation. The downsampling function used here automatically applies an anti-aliasing filter (EEGLAB 2019).

      - "Subsequently, baseline removal was conducted by subtracting the mean activity across the length of an epoch from every data point." The actual baseline time segment should be specified.

      The time segment was the length of the epoch, that is, 1 second for the resting state conditions and 6.25 seconds for the visual stimulation conditions. This has now been explicitly stated in the revised manuscript (Page 14, Lines 379-380).
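
      As a minimal sketch of this baseline step, assuming a single-channel epoch stored as a plain list of voltages (the function name and sample values are hypothetical, not taken from the analysis pipeline):

```python
def remove_epoch_mean(epoch):
    """Subtract the mean across the whole epoch from every data point."""
    mean = sum(epoch) / len(epoch)
    return [sample - mean for sample in epoch]


# Made-up voltages in microvolts, standing in for one epoch of one channel.
epoch = [12.0, 15.0, 9.0, 14.0]
centered = remove_epoch_mean(epoch)
print(centered)
```

      Because the baseline is the epoch's own mean, the corrected epoch sums to zero regardless of whether it is 1 second or 6.25 seconds long.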

      - "We excluded the alpha range (8-14 Hz) for this fit to avoid biasing the results due to documented differences in alpha activity between CC and SC individuals (Bottari et al., 2016; Ossandón et al., 2023; Pant et al., 2023)." This does not really make sense, as the FOOOF algorithm first fits the 1/f slope, for which the alpha activity is not relevant.

      We did not use the FOOOF algorithm/toolbox in this manuscript. As stated in the Methods, we used a 1/f fit to the 1-20 Hz spectrum in the log-log space, and subtracted this fit from the original spectrum to obtain the corrected spectrum. Given the pronounced difference in alpha power between groups (Bottari et al., 2016; Ossandón et al., 2023; Pant et al., 2023), we were concerned it might drive differences in the exponent values. Our analysis pipeline had been adapted from previous publications of our group and other labs (Ossandón et al., 2023; Voytek et al., 2015; Waschke et al., 2017).

      We have conducted the analysis with and without the exclusion of the alpha range, as well as using the FOOOF toolbox both in the 1-20 Hz and 20-40 Hz ranges (Ossandón et al., 2023). The findings of a steeper slope in the 1-20 Hz range as well as lower alpha power in CC vs SC individuals remained stable. In Ossandón et al., the comparison between the piecewise fits and FOOOF fits led the authors to use the former, as it outperformed the FOOOF algorithm for their data.
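
      The fitting procedure described above can be sketched as follows. This is an illustration under stated assumptions rather than the pipeline used in the manuscript: the fit is an ordinary least-squares line through log10(power) against log10(frequency), the 8-14 Hz bins are dropped inclusively, the subtraction is done in log space, and the spectrum is synthetic. All function names are hypothetical.

```python
import math


def fit_aperiodic(freqs, power, fit_range=(1.0, 20.0), exclude=(8.0, 14.0)):
    """Least-squares line through log10(power) vs log10(freq).

    Bins outside fit_range, or inside the excluded band, are ignored.
    Returns (slope, intercept) of the line in log-log space.
    """
    xs, ys = [], []
    for f, p in zip(freqs, power):
        in_range = fit_range[0] <= f <= fit_range[1]
        in_excluded = exclude is not None and exclude[0] <= f <= exclude[1]
        if in_range and not in_excluded:
            xs.append(math.log10(f))
            ys.append(math.log10(p))
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    return slope, intercept


def subtract_fit(freqs, power, slope, intercept):
    """Log-power residual after removing the fitted line (assumed log-space subtraction)."""
    return [math.log10(p) - (intercept + slope * math.log10(f))
            for f, p in zip(freqs, power)]


# Synthetic spectrum: 1/f^1.5 background plus a bump at 10 Hz standing in for alpha.
freqs = [float(f) for f in range(1, 21)]
power = [100.0 * f ** -1.5 + 5.0 * math.exp(-((f - 10.0) ** 2) / 2.0) for f in freqs]

slope_excl, icpt_excl = fit_aperiodic(freqs, power)            # alpha band excluded
slope_all, icpt_all = fit_aperiodic(freqs, power, exclude=None)  # alpha band included
corrected = subtract_fit(freqs, power, slope_excl, icpt_excl)
print(f"slope without alpha bins: {slope_excl:.3f}, with alpha bins: {slope_all:.3f}")
```

      In this toy spectrum, leaving the alpha bins in pulls the fitted slope away from the true background exponent, which illustrates the concern that a group difference in alpha power could otherwise leak into the exponent estimates.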

      - The model fits of the 1/f fitting for EO, EC, and both participant groups should be reported.

      In Figure 3 of the manuscript, we depicted the mean spectra and 1/f fits for each group.

      In the revised manuscript, we added the fit quality metrics (average R² values > 0.91 for each group and condition) (Methods Page 15, Lines 395-396; Supplementary Material S11) and additionally show individual subjects’ fits (Supplementary Material S11).

      (3.6) Validity of GABA measurements and results:

      - According the a newer study by the authors of the Gannet toolbox (https://analyticalsciencejournals.onlinelibrary.wiley.com/doi/abs/10.1002/nbm.5076), the reliability and reproducibility of the gamma-aminobutyric acid (GABA) measurement can vary significantly depending on acquisition and modeling parameter. Thus, did the author address these challenges?

      We took care of data quality while acquiring MRS data by ensuring appropriate voxel placement and linewidth prior to scanning (Page 9, Lines 229-237). We now address this explicitly in the Methods in the “MRS Data Quality” section. Acquisition as well as modeling parameters were constant for both groups, so they cannot have driven group differences.

      The linked article compares the reproducibility of GABA measurement using Osprey (Oeltzschner et al., 2020), which was released in 2020 and uses linear combination modeling to fit the peak, as opposed to Gannet’s simple peak fitting (Hupfeld et al., 2024). The study finds better test-retest reliability for Osprey compared to Gannet’s method.

      As the present work was conceptualized in 2018, we used Gannet 3.0, which was the state-of-the-art edited-spectrum analysis toolbox at the time, and still is widely used.

      In the revised manuscript, we re-analyzed the data using linear combination modeling with Osprey (Oeltzschner et al., 2020), and reported that the main findings remained the same, i.e. the Glx/GABA+ concentration ratio was lower in the visual cortex of congenital cataract reversal individuals compared to normally sighted controls, regardless of whether participants were scanned with eyes open or with eyes closed. Further, NAA concentration did not differ between groups (Supplementary Material S3). Thus, we demonstrate that our findings were robust to analysis pipelines, and state this in the Methods (Page 9, Lines 242-246) and Results (Page 19, Lines 464-467).

      - Furthermore, the authors wrote: "We confirmed the within-subject stability of metabolite quantification by testing a subset of the sighted controls (n=6) 2-4 weeks apart." Looking at the supplementary Figure 5 (which would be better plotted as ICC or Bland-Altman plots), the within-subject stability compared to between-subject variability seems not to be great. Furthermore, I don't think such a small sample size qualifies for a rigorous assessment of stability.

      Indeed, we did not intend to provide a rigorous assessment of within-subject stability. Rather, we aimed to confirm that data quality/concentration ratios did not systematically differ between the same subjects tested longitudinally; driven, for example, by scanner heating or time of day. As with the phantom testing, we attempted to give readers an idea of the quality of the data, as they were collected from a primarily clinical rather than a research site.

      In the revised manuscript, we have removed the statement regarding stability and the associated section.

      - "Why might an enhanced inhibitory drive, as indicated by the lower Glx/GABA ratio" Is this interpretation really warranted, as the results of the group differences in the Glx/GABA ratio seem to be rather driven by a decreased Glx concentration in CC rather than an increased GABA (see Figure 2).

      We used the Glx/GABA+ ratio as a measure, rather than individual Glx or GABA+ concentration, which did not significantly differ between groups. As detailed in Response 2.2, we think this metric aligns better with an underlying E/I balance hypothesis and has been used in many previous studies (Gao et al., 2024; Liu et al., 2015; Narayan et al., 2022; Perica et al., 2022).

      Our interpretation of an enhanced inhibitory drive additionally comes from the combination of aperiodic EEG (1-20 Hz) and MRS measures, which, when considered together, are consistent with a decreased E/I ratio.

      In the revised manuscript, we have rewritten the Discussion and removed this section.

      - Glx concentration predicted the aperiodic intercept in CC individuals' visual cortices during ambient and flickering visual stimulation. Why specifically investigate the Glx concentration, when the paper is about E/I ratio?

      As stated in the methods, we exploratorily assessed the relationship between all MRS parameters (Glx, GABA+ and Glx/GABA+ ratio) with the aperiodic parameters (slope, offset), and corrected for multiple comparisons accordingly. We think this is a worthwhile analysis considering the rarity of the dataset/population (see 1.2, 1.6, 2.1 and Reviewer 1’s comments about future hypotheses). We only report the Glx – aperiodic intercept correlation in the main manuscript as it survived correction for multiple comparisons.

      (3.7) Interpretation of the correlation between MRS measurements and EEG aperiodic signal:

      - The authors wrote: "The intercept of the aperiodic activity was highly correlated with the Glx concentration during rest with eyes open and during flickering stimulation (also see Supplementary Material S11). Based on the assumption that the aperiodic intercept reflects broadband firing (Manning et al., 2009; Winawer et al., 2013), this suggests that the Glx concentration might be related to broadband firing in CC individuals during active and passive visual stimulation." These results should not be interpreted (or only with great caution) for several reasons (see also the problems with influences on the aperiodic intercept and the small sample size). This is a result of the exploratory analyses of correlating every EEG parameter with every MRS parameter. This requires well-powered replication before any interpretation can be provided. Furthermore and importantly: why should this be specifically only in CC patients, but not in the SC control group?

      We have indicated clearly in all parts of the manuscript that these correlations are presented as exploratory. Further, we interpret the Glx-aperiodic offset correlation, and none of the others, as it survived the Bonferroni correction for multiple comparisons. We offer a hypothesis in the Discussion as to why such a correlation might exist in the CC but not the SC group (see response 2.2), and do not speculate further.
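
      The Bonferroni correction referred to above can be sketched as follows. This is an illustration only: it assumes one test for each pairing of the three MRS parameters with the two aperiodic parameters (six tests), whereas the number of tests actually corrected for in the manuscript is not stated in this response. The p-values are placeholders, not study results.

```python
def bonferroni(p_values, alpha=0.05):
    """Return (threshold, flags): each p-value is compared with alpha / m."""
    m = len(p_values)
    if m == 0:
        raise ValueError("no p-values given")
    threshold = alpha / m
    return threshold, [p <= threshold for p in p_values]


# Hypothetical p-values for 3 MRS x 2 aperiodic parameters = 6 tests.
# These numbers are placeholders, not results from the study.
example_p = [0.004, 0.030, 0.210, 0.048, 0.650, 0.120]
threshold, survives = bonferroni(example_p)
```

      With six tests the per-test threshold is 0.05 / 6, so in this made-up example only the first p-value would survive, although three of them fall below the uncorrected 0.05.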

      (3.8) Language and presentation:

      - The manuscript requires language improvements and correction of numerous typos. Over-simplifications and unclear statements are present, which could mislead or confuse readers (see also interpretation of aperiodic signal).

      In the revised manuscript, we have checked that speculations are clearly marked, and typos are removed.

      - The authors state that "Together, the present results provide strong evidence for experience-dependent development of the E/I ratio in the human visual cortex, with consequences for behavior." The results of the study do not provide any strong evidence, because of the small sample size, the exploratory analysis approach, and the failure to account for possible confounding factors.

      We disagree with this statement and point to the convergent evidence from both MRS and neurophysiological measures. The latter correspond to results observed in a larger sample of CC individuals (Ossandón et al., 2023). In the revised manuscript, we have rephrased the statement as “to provide initial evidence” (Page 22, Line 676).

      - "Our results imply a change in neurotransmitter concentrations as a consequence of *restoring* vision following congenital blindness." This is a speculative statement to infer a causal relationship on cross-sectional data.

      As mentioned under 2.1, we conducted a cross-sectional study which might justify future longitudinal work. In order to advance science, new testable hypotheses were put forward at the end of a manuscript.

      In the revised manuscript, we rephrased the sentence and added “might imply” to better indicate the hypothetical character of this idea (Page 22, Lines 586-587).

      - In the limitation section, the authors wrote: "The sample size of the present study is relatively high for the rare population, but undoubtedly, overall, rather small." This sentence should be rewritten, as the study is plainly underpowered. The further justification reads: "We nevertheless think that our results are valid. Our findings are neurochemically (Glx and GABA+ concentration), and anatomically (visual cortex) specific. The MRS parameters varied with parameters of the aperiodic EEG activity and visual acuity. The group differences for the EEG assessments corresponded to those of a larger sample of CC individuals (n=38) (Ossandón et al., 2023), and effects of chronological age were as expected from the literature." These statements do not provide any validation or justification of small samples. Furthermore, the current data set is a subset of an earlier published paper by the same authors: "The EEG data sets reported here were part of data published earlier (Ossandón et al., 2023; Pant et al., 2023)." Thus, the statement "The group differences for the EEG assessments corresponded to those of a larger sample of CC individuals (n=38)" is a circular argument and should be avoided.

      Our intention was not to justify having a small sample, but to justify why we think the results might be valid as they align with/replicate existing literature.

      In the revised manuscript, we added a figure showing that the EEG results of the 10 subjects considered here correspond to those of the 28 other subjects of Ossandón et al (Supplementary Material S18). We adapted the text accordingly, clearly stating that the pattern of EEG results of the ten subjects reported here replicate those of the 28 additional subjects of Ossandón et al. (2023) (Page 25, Lines 671-672).

      References (Public Review)

      Barnes, S. J., Sammons, R. P., Jacobsen, R. I., Mackie, J., Keller, G. B., & Keck, T. (2015). Subnetwork-specific homeostatic plasticity in mouse visual cortex in vivo. Neuron, 86(5), 1290–1303. https://doi.org/10.1016/J.NEURON.2015.05.010

      Bernabeu, A., Alfaro, A., García, M., & Fernández, E. (2009). Proton magnetic resonance spectroscopy (1H-MRS) reveals the presence of elevated myo-inositol in the occipital cortex of blind subjects. NeuroImage, 47(4), 1172–1176. https://doi.org/10.1016/j.neuroimage.2009.04.080

      Bottari, D., Troje, N. F., Ley, P., Hense, M., Kekunnaya, R., & Röder, B. (2016). Sight restoration after congenital blindness does not reinstate alpha oscillatory activity in humans. Scientific Reports. https://doi.org/10.1038/srep24683

      Colombo, M. A., Napolitani, M., Boly, M., Gosseries, O., Casarotto, S., Rosanova, M., Brichant, J. F., Boveroux, P., Rex, S., Laureys, S., Massimini, M., Chieregato, A., & Sarasso, S. (2019). The spectral exponent of the resting EEG indexes the presence of consciousness during unresponsiveness induced by propofol, xenon, and ketamine. NeuroImage, 189(September 2018), 631–644. https://doi.org/10.1016/j.neuroimage.2019.01.024

      Consideration of Sample Size in Neuroscience Studies. (2020). Journal of Neuroscience, 40(21), 4076–4077. https://doi.org/10.1523/JNEUROSCI.0866-20.2020

      Coullon, G. S. L., Emir, U. E., Fine, I., Watkins, K. E., & Bridge, H. (2015). Neurochemical changes in the pericalcarine cortex in congenital blindness attributable to bilateral anophthalmia. Journal of Neurophysiology. https://doi.org/10.1152/jn.00567.2015

      Fang, Q., Li, Y. T., Peng, B., Li, Z., Zhang, L. I., & Tao, H. W. (2021). Balanced enhancements of synaptic excitation and inhibition underlie developmental maturation of receptive fields in the mouse visual cortex. Journal of Neuroscience, 41(49), 10065–10079. https://doi.org/10.1523/JNEUROSCI.0442-21.2021

      Favaro, J., Colombo, M. A., Mikulan, E., Sartori, S., Nosadini, M., Pelizza, M. F., Rosanova, M., Sarasso, S., Massimini, M., & Toldo, I. (2023). The maturation of aperiodic EEG activity across development reveals a progressive differentiation of wakefulness from sleep. NeuroImage, 277. https://doi.org/10.1016/J.NEUROIMAGE.2023.120264

      Gao, Y., Liu, Y., Zhao, S., Liu, Y., Zhang, C., Hui, S., Mikkelsen, M., Edden, R. A. E., Meng, X., Yu, B., & Xiao, L. (2024). MRS study on the correlation between frontal GABA+/Glx ratio and abnormal cognitive function in medication-naive patients with narcolepsy. Sleep Medicine, 119, 1–8. https://doi.org/10.1016/j.sleep.2024.04.004

      Haider, B., Duque, A., Hasenstaub, A. R., & McCormick, D. A. (2006). Neocortical network activity in vivo is generated through a dynamic balance of excitation and inhibition. Journal of Neuroscience. https://doi.org/10.1523/JNEUROSCI.5297-05.2006

      Hill, A. T., Clark, G. M., Bigelow, F. J., Lum, J. A. G., & Enticott, P. G. (2022). Periodic and aperiodic neural activity displays age-dependent changes across early-to-middle childhood. Developmental Cognitive Neuroscience, 54, 101076. https://doi.org/10.1016/J.DCN.2022.101076

      Hupfeld, K. E., Zöllner, H. J., Hui, S. C. N., Song, Y., Murali-Manohar, S., Yedavalli, V., Oeltzschner, G., Prisciandaro, J. J., & Edden, R. A. E. (2024). Impact of acquisition and modeling parameters on the test–retest reproducibility of edited GABA+. NMR in Biomedicine, 37(4), e5076. https://doi.org/10.1002/nbm.5076

      Hyvärinen, J., Carlson, S., & Hyvärinen, L. (1981). Early visual deprivation alters modality of neuronal responses in area 19 of monkey cortex. Neuroscience Letters, 26(3), 239–243. https://doi.org/10.1016/0304-3940(81)90139-7

      Juchem, C., & Graaf, R. A. de. (2017). B0 magnetic field homogeneity and shimming for in vivo magnetic resonance spectroscopy. Analytical Biochemistry, 529, 17–29. https://doi.org/10.1016/j.ab.2016.06.003

      Keck, T., Hübener, M., & Bonhoeffer, T. (2017). Interactions between synaptic homeostatic mechanisms: An attempt to reconcile BCM theory, synaptic scaling, and changing excitation/inhibition balance. Current Opinion in Neurobiology, 43, 87–93. https://doi.org/10.1016/J.CONB.2017.02.003

      Kurcyus, K., Annac, E., Hanning, N. M., Harris, A. D., Oeltzschner, G., Edden, R., & Riedl, V. (2018). Opposite Dynamics of GABA and Glutamate Levels in the Occipital Cortex during Visual Processing. Journal of Neuroscience, 38(46), 9967–9976. https://doi.org/10.1523/JNEUROSCI.1214-18.2018

      Liu, B., Wang, G., Gao, D., Gao, F., Zhao, B., Qiao, M., Yang, H., Yu, Y., Ren, F., Yang, P., Chen, W., & Rae, C. D. (2015). Alterations of GABA and glutamate-glutamine levels in premenstrual dysphoric disorder: A 3T proton magnetic resonance spectroscopy study. Psychiatry Research - Neuroimaging, 231(1), 64–70. https://doi.org/10.1016/J.PSCYCHRESNS.2014.10.020

      Lunghi, C., Berchicci, M., Morrone, M. C., & Russo, F. D. (2015). Short‐term monocular deprivation alters early components of visual evoked potentials. The Journal of Physiology, 593(19), 4361. https://doi.org/10.1113/JP270950

      Maier, S., Düppers, A. L., Runge, K., Dacko, M., Lange, T., Fangmeier, T., Riedel, A., Ebert, D., Endres, D., Domschke, K., Perlov, E., Nickel, K., & Tebartz van Elst, L. (2022). Increased prefrontal GABA concentrations in adults with autism spectrum disorders. Autism Research, 15(7), 1222–1236. https://doi.org/10.1002/aur.2740

      Manning, J. R., Jacobs, J., Fried, I., & Kahana, M. J. (2009). Broadband shifts in local field potential power spectra are correlated with single-neuron spiking in humans. The Journal of Neuroscience : The Official Journal of the Society for Neuroscience, 29(43), 13613–13620. https://doi.org/10.1523/JNEUROSCI.2041-09.2009

      McSweeney, M., Morales, S., Valadez, E. A., Buzzell, G. A., Yoder, L., Fifer, W. P., Pini, N., Shuffrey, L. C., Elliott, A. J., Isler, J. R., & Fox, N. A. (2023). Age-related trends in aperiodic EEG activity and alpha oscillations during early- to middle-childhood. NeuroImage, 269, 119925. https://doi.org/10.1016/j.neuroimage.2023.119925

      Medel, V., Irani, M., Crossley, N., Ossandón, T., & Boncompte, G. (2023). Complexity and 1/f slope jointly reflect brain states. Scientific Reports, 13(1), 21700. https://doi.org/10.1038/s41598-023-47316-0

      Molina, J. L., Voytek, B., Thomas, M. L., Joshi, Y. B., Bhakta, S. G., Talledo, J. A., Swerdlow, N. R., & Light, G. A. (2020). Memantine Effects on Electroencephalographic Measures of Putative Excitatory/Inhibitory Balance in Schizophrenia. Biological Psychiatry: Cognitive Neuroscience and Neuroimaging, 5(6), 562–568. https://doi.org/10.1016/j.bpsc.2020.02.004

      Mukerji, A., Byrne, K. N., Yang, E., Levi, D. M., & Silver, M. A. (2022). Visual cortical γ−aminobutyric acid and perceptual suppression in amblyopia. Frontiers in Human Neuroscience, 16. https://doi.org/10.3389/fnhum.2022.949395

      Muthukumaraswamy, S. D., & Liley, D. T. (2018). 1/F electrophysiological spectra in resting and drug-induced states can be explained by the dynamics of multiple oscillatory relaxation processes. NeuroImage, 179(November 2017), 582–595. https://doi.org/10.1016/j.neuroimage.2018.06.068

      Narayan, G. A., Hill, K. R., Wengler, K., He, X., Wang, J., Yang, J., Parsey, R. V., & DeLorenzo, C. (2022). Does the change in glutamate to GABA ratio correlate with change in depression severity? A randomized, double-blind clinical trial. Molecular Psychiatry, 27(9), 3833–3841. https://doi.org/10.1038/s41380-022-01730-4

      Nuijten, M. B., & Polanin, J. R. (2020). “statcheck”: Automatically detect statistical reporting inconsistencies to increase reproducibility of meta-analyses. Research Synthesis Methods, 11(5), 574–579. https://doi.org/10.1002/jrsm.1408

      Oeltzschner, G., Zöllner, H. J., Hui, S. C. N., Mikkelsen, M., Saleh, M. G., Tapper, S., & Edden, R. A. E. (2020). Osprey: Open-source processing, reconstruction & estimation of magnetic resonance spectroscopy data. Journal of Neuroscience Methods, 343, 108827. https://doi.org/10.1016/j.jneumeth.2020.108827

      Ossandón, J. P., Stange, L., Gudi-Mindermann, H., Rimmele, J. M., Sourav, S., Bottari, D., Kekunnaya, R., & Röder, B. (2023). The development of oscillatory and aperiodic resting state activity is linked to a sensitive period in humans. NeuroImage, 275, 120171. https://doi.org/10.1016/J.NEUROIMAGE.2023.120171

      Ostlund, B. D., Alperin, B. R., Drew, T., & Karalunas, S. L. (2021). Behavioral and cognitive correlates of the aperiodic (1/f-like) exponent of the EEG power spectrum in adolescents with and without ADHD. Developmental Cognitive Neuroscience, 48, 100931. https://doi.org/10.1016/j.dcn.2021.100931

      Pant, R., Ossandón, J., Stange, L., Shareef, I., Kekunnaya, R., & Röder, B. (2023). Stimulus-evoked and resting-state alpha oscillations show a linked dependence on patterned visual experience for development. NeuroImage: Clinical, 103375. https://doi.org/10.1016/J.NICL.2023.103375

      Perica, M. I., Calabro, F. J., Larsen, B., Foran, W., Yushmanov, V. E., Hetherington, H., Tervo-Clemmens, B., Moon, C.-H., & Luna, B. (2022). Development of frontal GABA and glutamate supports excitation/inhibition balance from adolescence into adulthood. Progress in Neurobiology, 219, 102370. https://doi.org/10.1016/j.pneurobio.2022.102370

      Pitchaimuthu, K., Wu, Q. Z., Carter, O., Nguyen, B. N., Ahn, S., Egan, G. F., & McKendrick, A. M. (2017). Occipital GABA levels in older adults and their relationship to visual perceptual suppression. Scientific Reports, 7(1). https://doi.org/10.1038/S41598-017-14577-5

      Rideaux, R., Ehrhardt, S. E., Wards, Y., Filmer, H. L., Jin, J., Deelchand, D. K., Marjańska, M., Mattingley, J. B., & Dux, P. E. (2022). On the relationship between GABA+ and glutamate across the brain. NeuroImage, 257, 119273. https://doi.org/10.1016/J.NEUROIMAGE.2022.119273

      Schaworonkow, N., & Voytek, B. (2021). Longitudinal changes in aperiodic and periodic activity in electrophysiological recordings in the first seven months of life. Developmental Cognitive Neuroscience, 47. https://doi.org/10.1016/j.dcn.2020.100895

      Schwenk, J. C. B., VanRullen, R., & Bremmer, F. (2020). Dynamics of Visual Perceptual Echoes Following Short-Term Visual Deprivation. Cerebral Cortex Communications, 1(1). https://doi.org/10.1093/TEXCOM/TGAA012

      Sengpiel, F., Jirmann, K.-U., Vorobyov, V., & Eysel, U. T. (2006). Strabismic Suppression Is Mediated by Inhibitory Interactions in the Primary Visual Cortex. Cerebral Cortex, 16(12), 1750–1758. https://doi.org/10.1093/cercor/bhj110

      Steel, A., Mikkelsen, M., Edden, R. A. E., & Robertson, C. E. (2020). Regional balance between glutamate+glutamine and GABA+ in the resting human brain. NeuroImage, 220. https://doi.org/10.1016/J.NEUROIMAGE.2020.117112

      Takado, Y., Takuwa, H., Sampei, K., Urushihata, T., Takahashi, M., Shimojo, M., Uchida, S., Nitta, N., Shibata, S., Nagashima, K., Ochi, Y., Ono, M., Maeda, J., Tomita, Y., Sahara, N., Near, J., Aoki, I., Shibata, K., & Higuchi, M. (2022). MRS-measured glutamate versus GABA reflects excitatory versus inhibitory neural activities in awake mice. Journal of Cerebral Blood Flow & Metabolism, 42(1), 197. https://doi.org/10.1177/0271678X211045449

      Takei, Y., Fujihara, K., Tagawa, M., Hironaga, N., Near, J., Kasagi, M., Takahashi, Y., Motegi, T., Suzuki, Y., Aoyama, Y., Sakurai, N., Yamaguchi, M., Tobimatsu, S., Ujita, K., Tsushima, Y., Narita, K., & Fukuda, M. (2016). The inhibition/excitation ratio related to task-induced oscillatory modulations during a working memory task: A multimodal-imaging study using MEG and MRS. NeuroImage, 128, 302–315. https://doi.org/10.1016/J.NEUROIMAGE.2015.12.057

      Tao, H. W., & Poo, M. M. (2005). Activity-dependent matching of excitatory and inhibitory inputs during refinement of visual receptive fields. Neuron, 45(6), 829–836. https://doi.org/10.1016/J.NEURON.2005.01.046

      Vanrullen, R., & MacDonald, J. S. P. (2012). Perceptual echoes at 10 Hz in the human brain. Current Biology. https://doi.org/10.1016/j.cub.2012.03.050

      Voytek, B., Kramer, M. A., Case, J., Lepage, K. Q., Tempesta, Z. R., Knight, R. T., & Gazzaley, A. (2015). Age-related changes in 1/f neural electrophysiological noise. Journal of Neuroscience, 35(38). https://doi.org/10.1523/JNEUROSCI.2332-14.2015

      Vreeswijk, C. V., & Sompolinsky, H. (1996). Chaos in neuronal networks with balanced excitatory and inhibitory activity. Science, 274(5293), 1724–1726. https://doi.org/10.1126/SCIENCE.274.5293.1724

      Waschke, L., Wöstmann, M., & Obleser, J. (2017). States and traits of neural irregularity in the age-varying human brain. Scientific Reports, 7(1), 1–12. https://doi.org/10.1038/s41598-017-17766-4

      Weaver, K. E., Richards, T. L., Saenz, M., Petropoulos, H., & Fine, I. (2013). Neurochemical changes within human early blind occipital cortex. Neuroscience. https://doi.org/10.1016/j.neuroscience.2013.08.004

      Wu, Y. K., Miehl, C., & Gjorgjieva, J. (2022). Regulation of circuit organization and function through inhibitory synaptic plasticity. Trends in Neurosciences, 45(12), 884–898. https://doi.org/10.1016/J.TINS.2022.10.006

      Recommendations for the Authors:

      Reviewer #1 (Recommendations for The Authors):

      Thank you for the interesting submission. I have inserted my comments to the authors here. Some of them will be more granular comments related to the concerns raised in the public review.

      (1) Introduction:

      Could you please justify the rationale for using eyes open and eyes closed in the MRS condition, and the use of the three different conditions in the EEG experiment? If these resulted in negative findings, then the implications should be discussed.

      Previous work with MRS in sighted individuals has suggested that eye opening in darkness results in a decrease of visual cortex GABA+ concentration, while visual stimulation results in an increase of Glx concentration, compared to a baseline concentration at eye closure (Kurcyus et al., 2018). Moreover, visual stimulation/eye opening is known to result in an alpha desynchronization (Adrian & Matthews, 1934).

      While previous work of our group has shown significantly reduced alpha oscillatory activity in congenital cataract reversal individuals, desynchronization following eye opening was indistinguishable when compared to normally sighted controls (Ossandón et al., 2023; Pant et al., 2023).

      Thus, we decided to include both conditions to test whether a similar pattern of results would emerge for GABA+/Glx concentration.

      We added our motivation to the Introduction of the revised manuscript (Page 4, Lines 122-125) along with the Methods (Page 9, Lines 219-223).

      It does not become clear from the introduction why a higher intercept is predicted in the EEG measure. The rationale for this hypothesis needs to be explained better.

      Given the prior findings suggesting an increased E/I ratio in CC individuals and the proposed link between neuronal firing (Manning et al., 2009) and the aperiodic intercept, we expected a higher intercept for the CC compared to the SC group.

      We have now added this explanation to the Introduction (Page 4, Lines 126-128).

      (2) Participants

      Were participants screened for common MRS exclusion criteria such as history of psychiatric conditions or antidepressant medication, which could alter neurochemistry? If not, then this needs to be pointed out.

      All participants were clinically screened at the LV Prasad Eye Institute, and additionally self-reported no neurological or psychiatric conditions or medications. Moreover, all subjects were screened against the exclusion criteria for scanning using the standard questionnaire of the radiology center.

      We have now made this clear in the Methods (Page 7, Lines 168-171).

      Table 1 needs to show the age of the participant, which can only be derived by adding the columns 'duration of deprivation' and 'time since surgery'. Table 1 also needs to include the controls.

      We have accordingly modified Table 1 in the revised manuscript and added age for the patients as well as the controls (Table 1, Pages 6-7).

      The control cohort is not specific enough to exclude reduced visual acuity, or co-morbidities, as the primary driver of the differences between groups. Ideally, a cohort with developmental cataracts is recruited. Normally sighted participants as a control cohort cannot distinguish between different types of sight loss, or stages of plasticity.

      The goal of this study was not to distinguish between different types of sight loss or stages of plasticity. We aimed to assess whether the most extreme forms of visual deprivation (i.e. congenital and total patterned vision loss) affected the E/I ratio. Low visual acuity and nystagmus are genuine diagnostic criteria (Methods, Page 5, Lines 142-145). Visual acuity cannot solely explain the current findings, since the MRS data were acquired either with eyes closed or with diffuse visual stimulation in a dimly lit room, without any visual task.

      With the awareness of the present results, we consider it worthwhile for the future to investigate additional groups such as developmental cataract-reversal individuals, to narrow down the contribution of the age of onset and degree of visual deprivation to the observed group differences.

      (3) Data collection and analysis

      - More detail is needed: how long were the sessions, how long was each part?

      We have added this information on Page 7, Lines 178-181 of the Methods. MRS scanning took between 45 and 60 minutes, EEG testing took 20 minutes excluding the time for capping, and visual acuity testing took 3-5 minutes.

      - It should be mentioned here that the EEG data is a reanalysis of a subset of legacy data, published previously in Ossandón et al., 2023; Pant et al., 2023.

      In the revised manuscript, we explicitly state at the beginning of the “Electrophysiology recordings” section of the Methods (Page 13, Lines 331-334) that the EEG datasets were a subset of previously published data.

      (4) MRS Spectroscopy

      - Please fill out the minimum reporting standards form (Lin et al., 2021), or report all the requested measures in the main document https://pubmed.ncbi.nlm.nih.gov/33559967/

      We have now filled out this form and added it as Supplementary Material (Supplementary Excel File 1). Additionally, all the requested information has been moved to the Methods section of the main document (MRS Data Quality, Pages 10-12).

      - Information on how the voxels were placed is missing. The visual cortex voxel is not angled parallel to the calcarine, as is a common way to capture processing in the early visual cortex. Describe in the paper what the criteria for successful placement were, and how was it ensured that non-brain tissue was avoided in a voxel of this size.

      Voxel placement was optimized in each subject to avoid the meninges, ventricles, skull and subcortical structures; this was ensured by examining the voxel region across slices in the acquired T1 volume for each subject. Saturation bands were placed to nullify the skull signal during MRS acquisition, at the anterior (frontal) and posterior (visual) edge of the voxel for every subject. Due to limitations of the clinical scanner, rotated/skewed voxels were not possible, and thus voxels were not always located precisely parallel to the calcarine.

      We have added this information to Page 9 (Lines 229-237) of the revised manuscript.

      - Figure 1 shows voxels that are very close to the edge of the brain (frontal cortex) or to the tentorium (visual cortex). Could the authors please calculate the percentage overlap between the visual cortex MRS voxel and the visual cortex, and compare them across groups to ensure that there is no between-group bias from voxel placement?

      We have now added the requested analysis to Supplementary Material S2 and referred to it in the main manuscript on Page 9, Lines 236-237.

      Briefly, the percentage overlap with areas V1-V6 in every individual subject’s visual cortex voxel was 60% or more; the mean overlap was 67% in the CC group and 70% in the SC group. The percentage overlap did not differ between groups (t-test: t(18) = -1.14, p = 0.269).
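
      As an illustration of the overlap measure, the sketch below computes the percentage of an MRS voxel that falls inside a region of interest. It assumes both masks are flattened binary arrays in the same image space and that the overlap is expressed relative to the MRS voxel volume; the exact definition used in Supplementary Material S2 is not given in this response, and the function name is hypothetical.

```python
def percent_overlap(voxel_mask, roi_mask):
    """Percentage of MRS-voxel elements that also lie inside the ROI."""
    if len(voxel_mask) != len(roi_mask):
        raise ValueError("masks must have the same length")
    # Number of image elements belonging to the MRS voxel.
    in_voxel = sum(1 for v in voxel_mask if v)
    if in_voxel == 0:
        raise ValueError("voxel mask is empty")
    # Number of elements belonging to both the MRS voxel and the ROI.
    in_both = sum(1 for v, r in zip(voxel_mask, roi_mask) if v and r)
    return 100.0 * in_both / in_voxel
```

      Per-subject values computed this way could then be compared between groups with a two-sample t-test, as reported above.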

      - Figure 1. I would recommend displaying data on a skull-stripped image to avoid identifying information from the participant's T1 profile.

      We have now replaced the images in Figure 1 with skull-stripped images. Note that images from SPM12 were used instead of GannetCoregister, as GannetCoregister only displays images with the skull.

      - Please show more rigor with the MRS quality measures. Several examples of inconsistency and omissions are below.

      • SNR was quantified and shows a difference in SNR between voxel positions, with lower SNR in the frontal cortex. No explanation or discussion of the difference was provided.

      • Looking at S1, the linewidth of NAA seems to be a lot broader in the frontal cortex than in the visual cortex. The figures suggest that acquisition quality was very different between voxel locations, making the comparison difficult.

      • Linewidth of NAA is a generally agreed measure of shim quality in megapress acquisitions (Craven et al., 2022).

      The data quality difference between the frontal and visual cortices has been observed in the literature (Juchem & Graaf, 2017; Rideaux et al., 2022). We nevertheless chose a frontal cortex voxel as control site instead of the often-chosen sensorimotor cortex. The main motivation was to avoid any cortical region linked to sensory processing since crossmodal compensation as a consequence of visual deprivation is a well-documented phenomenon.

      We now make this clearer in the Methods (Page 11, Lines 284 – 299) and in the Discussion/Limitations (Page 25, Lines 662 - 665).

      - To get a handle on the data quality, I would recommend that the authors display their MRS quality measures in a separate section 'MRS quality measure', including NAA linewidth, NAA SNR, GABA+ CRLB, Glx CRLB, and test for the main effects and interaction of voxel location (VC, FC) and group (SC, CC) and discuss any discrepancies.

      We have moved all the quality metric values for GABA+, Glx and NAA from the supplement to the Methods section (see Table 2), and added the requested section titled “MRS Data quality.”

      We have conducted the requested analyses and reported them in Supplementary Material S6: there was a strong effect of region confirming that data quality was better in the visual than frontal region. We have referred to this in the main manuscript on Page 11, Line 299.

      In the revised manuscript, we discuss the data quality in the frontal cortex, and how we ensured it was comparable to prior work. Moreover, there were no significant group effects, or group-by-region interactions, suggesting that group differences observed for the visual cortex voxel cannot be accounted for by differences in data quality. We have now included a section on data quality, both in the Methods (Page 11, Lines 284 – 299) and in the limitations section of the Discussion (Page 25, Lines 662 - 665).

      Please clarify the MRS acquisition, "Each MEGA-PRESS scan lasted for 8 minutes and was acquired with the following specifications: TR = 2000 ms, TE = 68 ms, Voxel size = 40 mm x 30 mm x 25 mm, 192 averages (each consists of two TRs)." 192 averages x 2 TRs x 2 s TR = 12.8 min, not 8 min; apologies if I have misunderstood these details.

      We have corrected this error in the revised manuscript and stated the parameters more clearly: there were a total of 256 averages, each acquired with 1 TR, resulting in an 8.5-minute scan (256 × 2 s / 60) (Page 8, Lines 212-213).
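
      The two duration calculations in this exchange can be checked with a few lines of arithmetic. The sketch below only restates the numbers given by the reviewer and in the response; the function name is illustrative.

```python
def scan_minutes(n_averages, trs_per_average, tr_seconds):
    """Scan duration in minutes: averages x TRs per average x TR length / 60."""
    return n_averages * trs_per_average * tr_seconds / 60.0


# Reviewer's reading of the original text: 192 averages, 2 TRs each, TR = 2 s.
reviewer_reading = scan_minutes(192, 2, 2.0)

# Corrected parameters from the response: 256 averages, 1 TR each, TR = 2 s.
corrected = scan_minutes(256, 1, 2.0)
```

      The first call gives 12.8 minutes and the second roughly 8.5 minutes, which matches the corrected scan time.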

      - What was presented to participants in the eyes open MRS? Was it just normal room illumination or was it completely dark? Please add details to your methods.

      The scans were conducted in regular room illumination, with no visual stimulation.

      We have now clarified this on Page 9 (Lines 223-224) of the Methods.

      (5) MRS analysis

      How was the tissue fraction correction performed? Please add or refer to the exact equation from Harris et al., 2015.

      We have clarified that the reported GABA+/Glx values are water-normalized alpha corrected values (Page 10, Line 249), and cited Harris et al., 2015 on Page 10 (Line 251) of the Methods.

      (6) Statistical approach

      How was the sample size determined? Please add your justification for the sample size

      We collected as many qualifying patients as we were able to recruit for this study within 2.5 years of data collection (commencing August 2019, ending February 2022), given the constraints of the patient population and the pandemic. We have now made this clear in the Discussion (Page 25, Lines 650-652).

      Please report the tests for normality.

      We have now reported the Shapiro-Wilk test results for normality as well as Levene’s test for homogeneity of variance between groups for every dependent variable in our dataset in Supplementary Material S9, and added references to it in the descriptions of the statistical analyses (Methods, Page 13, Lines 326-329 and Page 15, Lines 400-402).

      Calculate the Bayes Factor where possible.

      As our analyses are all frequentist, instead of re-analyzing the data within a Bayesian framework, we added partial eta squared values (η<sub>p</sub><sup>2</sup>) for all the reported ANOVAs so that readers can gauge the effect sizes (Results).
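
      For readers unfamiliar with the effect size mentioned here, the following is a minimal sketch of the standard definition of partial eta squared, computed either from sums of squares or from an F statistic and its degrees of freedom. The input values are hypothetical and are not taken from the manuscript.

```python
def partial_eta_squared(ss_effect: float, ss_error: float) -> float:
    """Partial eta squared from the effect and error sums of squares."""
    return ss_effect / (ss_effect + ss_error)


def partial_eta_squared_from_f(f_value: float, df_effect: int, df_error: int) -> float:
    """Equivalent form using the F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# Hypothetical ANOVA table entries (illustration only).
ss_effect, ss_error = 4.0, 12.0
df_effect, df_error = 1, 18

# F is the ratio of the effect mean square to the error mean square.
f_value = (ss_effect / df_effect) / (ss_error / df_error)

eta_from_ss = partial_eta_squared(ss_effect, ss_error)
eta_from_f = partial_eta_squared_from_f(f_value, df_effect, df_error)
print(eta_from_ss, eta_from_f)
```

      Both forms give the same value, which is convenient when only F and the degrees of freedom are reported.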

      I recommend partial correlations to control for the influence of age, duration, and time of surgery, rather than separate correlations.

      Given the combination of small sample size and the expected multicollinearity in our variables (duration of blindness, for example, would be expected to correlate with age, as well as visual acuity post-surgery), partial correlations could not be calculated on this data.

      We are aware of the limits of correlational analyses. Given the unique data set of a rare population we had exploratorily planned to relate behavioral, EEG and MRS parameters by calculating correlations. Since no similar data existed when we started (and to the best of our knowledge our data set is still unique), these correlation analyses were explorative, but the most transparent to run.

      We have now clearly outlined these limitations in our Introduction (Page 5, Lines 133-135), Methods (Page 15, Lines 408-410) and Discussion section (Page 24, Line 634, Page 25, Lines 652-65) to ensure that the results are interpreted with appropriate caution.
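
      As an illustration of the partial correlation the reviewer recommends, the sketch below uses the standard first-order formula. All correlation values are hypothetical. It is not an analysis of the study data; it only shows how the estimate depends on the correlations with the controlled variable.

```python
import math


def partial_correlation(r_xy: float, r_xz: float, r_yz: float) -> float:
    """First-order partial correlation between x and y, controlling for z."""
    numerator = r_xy - r_xz * r_yz
    denominator = math.sqrt((1.0 - r_xz ** 2) * (1.0 - r_yz ** 2))
    return numerator / denominator


# Hypothetical case 1: the covariate z is only moderately related to x and y.
moderate = partial_correlation(r_xy=0.6, r_xz=0.5, r_yz=0.4)

# Hypothetical case 2: z is nearly collinear with x (e.g. age and duration of blindness).
collinear = partial_correlation(r_xy=0.6, r_xz=0.95, r_yz=0.6)

print(f"moderate covariate: {moderate:.3f}")
print(f"near-collinear covariate: {collinear:.3f}")
```

      In the second case both the numerator and the denominator are small, so the result depends heavily on small changes in the input correlations, which are themselves noisy in small samples.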

      (7) Visual acuity

      Is the VA monocular average, from the dominant eye, or bilateral?

      We have now clarified that the VA reported here is bilateral (Methods, Page 7 Line 165 and Page 15, Line 405). Bilateral visual acuity in congenital cataract-reversal individuals typically corresponds to the visual acuity of the best eye.

      It is mentioned here that correlations with VA are exploratory, please be consistent as the introduction mentions that there was a hypothesis that you sought to test.

      We have now accordingly modified the Introduction (Page 5, Lines 133-135) and added the appropriate caveats in the Discussion with regard to interpretations (Page 25, Lines 652-665).

      (8) Correlation analyses between MRS and EEG

      It is mentioned here that correlations between EEG and MRS are exploratory, please consistently point out the exploratory nature, as these results are preliminary and should not be overinterpreted ("We did not have prior hypotheses as to the best of our knowledge no extant literature has tested the correlation between aperiodic EEG activity and MRS measures of GABA+,Glx and Glx/GABA+." ).

      In the revised manuscript, we explicitly state that the reported associations between EEG (aperiodic component) and MRS parameters allow more specific, directed hypotheses to be put forward for future studies (Introduction, Page 5, Lines 133-135; Methods, Page 15, Line 415; Discussion, Page 25, Lines 644-645 and Lines 652-665).

      (9) Results

      Figure 2 uses the same y-axis for the visual cortex and frontal cortex to facilitate a comparison between the two locations. Comparing Figure 2 a with b demonstrates poorer spectral peaks and reduced amplitudes. Lower spectral quality in the frontal cortex voxel could contribute to the absence of a group effect in the control voxel location. The major caveat that spectral quality differs between voxels needs to be pointed out and the limitations thereof discussed.

      We have now explicitly pointed out this issue in the Methods (MRS Data Quality, Supplementary Material S6) and Discussion in the Limitations section (Page 25, Lines 662-665). While data quality was lower for the frontal compared to the visual cortex voxels, as has been observed previously (Juchem & Graaf, 2017; Rideaux et al., 2022), this was not an issue for the EEG recordings. Thus, lower sensitivity of frontal measures cannot easily explain the lack of group differences for frontal measures. Crucially, data quality did not differ between groups.

      The results in 2c are the result of multiple correlations with metabolite values ("As in previous studies, we ran a number of exploratory correlation analyses between GABA+, Glx, and Glx/GABA+ concentrations, and visual acuity at the date of testing, duration of visual deprivation, and time since surgery respectively in the CC group"), it seems at least six for the visual acuity measure (VA vs Glx, VA vs GABA+, VA vs Glx/GABA+ x 2 conditions). While the trends are interesting, they should be interpreted with caution because of the exploratory nature, small sample size, the lack of multiple comparison correction, and the influence of two extreme data points. The authors should not overinterpret these results and should point out the need for replication.

      See response to (6) last section, which we copy here for convenience:

      We are aware of the limits of correlational analyses. Given the unique data set of a rare population we exploratorily related behavioral, EEG and MRS parameters by calculating correlations. Since no similar data existed when we started (and to the best of our knowledge our data set is still unique), these correlation analyses were explorative, but the most transparent to run.

      We have now clearly outlined these limitations in our Discussion section to ensure that the results are interpreted with appropriate caution (Discussion, Page 25, Lines 644-645 and Lines 652-665).

      (10) Discussion:

      Please explain the decrease in E/I balance from MRS in view of recent findings on an increase in E/I balance in CC using RSN-fMRI (Raczy et al., 2022) and EEG (Ossandon et al. 2023).

      We have edited our Abstract (Page 1-2, Lines 31-35) and Discussion (Page 23, Lines 584-590; Page 24, Lines 613-620). In brief, we think our results reflect a homeostatic regulation of E/I balance, that is, an increase in inhibition due to an increase in stimulus-driven excitation following sight restoration.

      Names limitations but does nothing to mitigate concerns about spatial specificity. The limitations need to be rewritten to include differences in SNR between the visual cortex and frontal lobe. Needs to include caveats of small samples, including effect inflation.

      We have now discussed the data quality differences between the visual and frontal cortex voxel in MRS data quality, which we find irrespective of group (MRS Data Quality, Supplementary Material S6). We also reiterate why this might not explain our results; data quality was comparable to prior studies which have found group differences in frontal cortex (Methods Page 11, Lines 284 – 299), and data quality did not differ between groups. Further, EEG data quality did not differ across frontal and occipital regions, but group differences in EEG datasets were localized to the occipital cortex.

      Reviewer #2 (Recommendations for The Authors):

      To address the main weakness, the authors could consider including data from a third group, of congenitally blind individuals. Including this would go a very long way towards making the findings interpretable and relating them to the rest of the literature.

      Unfortunately, recruitment of this group was not possible due to the pandemic. Indeed, we would consider a pre- vs. post-surgery approach the most suitable design in the future, which, however, will require several years to be completed. Such time- and resource-intensive longitudinal studies are justified by the present cross-sectional results.

      We have explicitly stated our contribution and need for future studies in the Limitations section of the Discussion (Page 25, Lines 650-657).

      Analysing the amplitude of alpha rhythms, as well as the other "aperiodic" components, would be useful to relate the profile of the tested patients with previous studies. Visual inspection of Figure 3 suggests that alpha power with eyes closed is not reduced in the patients' group compared to the controls. This would be inconsistent with previous studies (including research from the same group) and it could suggest that the small selected sample is not really representative of the sight-recovery population - certainly one of the most heterogeneous study populations. This further highlights the difficulty of drawing conclusions on the effects of visual experience merely based on this N=10 set of patients.

      Alpha power was indeed reduced in the present subsample of 10 CC individuals (Supplementary Material S19). The graphs of the CC and SC groups likely look so similar for the EC condition in Figure 3 because the spectra are shown with the aperiodic components not yet removed, and are scaled to accommodate very different alpha power values. As documented in Supplementary Material S18 and S19, alpha power and the aperiodic intercept/slope results of the resting state data in the present 10 CC individuals correspond to the results from a larger sample of CC individuals (n = 28) in Ossandón et al., 2023. We explicitly highlight this “replication” in the main manuscript (Page 25-26, Lines 671-676). Thus, the present subsample of CC individuals is representative of its population.

      To further characterise the MRS results, the authors may consider an alternative normalisation scheme. It is not clear whether the lack of significant GABA and GLX differences in the face of a significant group difference in the GLX/GABA ratio is due to the former measures being noisier since taking the ratio between two metabolites often helps reduce inter-individual variability and thereby helps revealing group differences. It remains an open question whether the GABA or GLX concentrations would show significant group differences after appropriate normalisation (e.g. NAA?).

      We repeated the analysis with Creatine-normalized values of GABA+ and Glx, and the main results i.e. reduced Glx/GABA+ concentration in the visual cortex of CC vs SC individuals, and no such difference in the frontal cortex, remained the same (Supplementary Material S5).

      Further, we re-analyzed the data using Osprey, an open-source toolbox that uses linear combination modeling, and found once more that our results did not change (Supplementary Material S3). We refer to these findings in the Methods (Page 10, Lines 272-275) and Results (Page 10, Lines 467-471) of the main manuscript.

      In fact, the Glx concentration in the visual cortex of CC vs SC individuals was significantly decreased when Cr-normalized values were used (which was not significant in the original analysis). However, we do not interpret this result as it was not replicated with the water-normalized values from Gannet or Osprey.

      I suggest revising the discussion to present a more balanced picture of the existent evidence of the relation between E/I and EEG indices. Although there is evidence that the 1/f slope changes across development, in a way that could be consistent with a higher slope reflecting more immature and excitable tissue, the link with cortical E/I is far from established, especially when referring to specific EEG indices (intercept vs. slope, measured in lower vs. higher frequency ranges).

      We have revised the Introduction (Page 4, Line 91, Lines 101-102) and Discussion (Page 22, Lines 568-569, Page 24, Lines 645-647 and Lines 654-657) in the manuscript accordingly; we allude to the fact that the links between cortical E/I and aperiodic EEG indices have not yet been unequivocally established in the literature.

      Minor:

      - The authors estimated NAA concentration with different software than the one used to estimate GLX and GABA; this examined the OFF spectra only; I suggest that the authors consider running their analysis with LCModel, which would allow a straightforward approach to estimate concentrations of all three metabolites from the same edited spectrum and automatically return normalised concentrations as well as water-related ones.

      We re-analyzed all of the MRS datasets using Osprey, which uses linear combination modelling and has shown quantification results similar to LCModel for NAA (Oeltzschner et al., 2020). The results of a lower Glx/GABA+ concentration in the visual cortex of CC vs SC individuals, and no difference in NAA concentration, were replicated using this pipeline.

      We have now added these analyses to the Supplementary Material S3 and referred to them in the Methods (Page 9, Lines 242-246) and Results (Page 18, Lines 464-467).

      - Of course the normalisation used to estimate GABA and GLX values is completely irrelevant when the two values are expressed as ratio GLX/GABA - this may be reflected in the text ("water normalised GLX/GABA concentration" should read "GLX/GABA concentration" instead).

      We have adapted the text on Page 16 (Line 431) and have ensured that throughout the manuscript the use of “water-normalized” is in reference to Glx or GABA+ concentration, and not the ratio.

      - Please specify which equation was used for tissue correction - is it alpha-correction?

      We have clarified that the reported GABA+/Glx values are water-normalized alpha corrected values (Page 10, Line 249), and cited Harris et al., 2015 on Page 10 (Line 251) of the Methods.

      - Since ANOVA was used, the assumption is that values are normally distributed. Please report evidence supporting this assumption.

      We have now reported the Shapiro-Wilk test results for normality as well as Levene’s test for homogeneity of variance between groups for every dependent variable in our dataset in Supplementary Material S9, and added references to it in the Methods (Page 13, Lines 326-329 and Page 15, Lines 400-402).

      Reviewer #3 (Recommendations for The Authors):

      In addition to addressing major comments listed in my Public Review, I have the following, more granular comments, which should also be addressed:

      (1) The paper's structure could be improved by presenting visual acuity data before diving into MRS and EEG results to better contextualize the findings.

      We now explicitly state in the Methods (Page 5, Line 155) that lower visual acuity is expected in a cohort of CC individuals with long lasting congenital visual deprivation.

      We have additionally included a plot of visual acuities of the two groups (Supplementary Material S1).

      (2) The paper should better explain the differences between CC for which sight is restored and congenitally blind patients. The authors write in the introduction that there are sensitive periods/epochs during the lifespan for the development of local inhibitory neural circuits. and "Human neuroimaging studies have similarly demonstrated that visual experience during the first weeks and months of life is crucial for the development of visual circuits. If human infants born with dense bilateral cataracts are treated later than a few weeks from birth, they suffer from a permanent reduction of not only visual acuity (Birch et al., 1998; Khanna et al., 2013) and stereovision (Birch et al., 1993; Tytla et al., 1993) but additionally from impairments in higher-level visual functions, such as face perception (Le Grand et al., 2001; Putzar et al., 2010; Röder et al., 2013)...".

      Thus it seems that the current participants (sight restored after a sensitive period) seem to be similarly affected by the development of the local inhibitory circuits as congenitally blind. To assess the effect of plasticity and sight restoration longitudinal data would be necessary.

      In the Introduction (Page 2, Lines 59-64; Page 3, Lines 111-114) we added that, in order to identify sensitive periods, e.g. for the elaboration of visual neural circuits, sight recovery individuals need to be investigated. The study of permanently blind individuals allows for investigating the role of experience (whether sight is necessary to induce the maturation of visual neural circuits), but not whether visual input needs to be available at early epochs in life (i.e. whether sight restoration following congenital blindness could nevertheless lead to the development of visual circuits).

      This is indeed the conclusion we make in the Discussion section. We have now highlighted the need for longitudinal assessments in the Discussion (Page 25, Lines 654-656).

      (3) What's the underlying idea of analyzing two separate aperiodic slopes (20-40Hz and 1-19Hz). This is very unusual to compute the slope between 20-40 Hz, where the SNR is rather low.

      "Ossandón et al. (2023), however, observed that in addition to the flatter slope of the aperiodic power spectrum in the high frequency range (20-40 Hz), the slope of the low frequency range (1-19 Hz) was steeper in both, congenital cataract-reversal individuals, as well as in permanently congenitally blind humans."

      The present manuscript computed the slope between 1-20 Hz. Ossandón et al. as well as Medel et al. (2023) found a “knee” of the 1/f distribution at 20 Hz and describe further the motivations for computing both slope ranges. For example, Ossandón et al. used a data driven approach and compared single vs. dual fits and found that the latter fitted the data better. Additionally, they found the best fit if a knee at 20 Hz was used. We would like to point out that no standard range exists for the fitting of the 1/f component across the literature and, in fact, very different ranges have been used (Gao et al., 2017; Medel et al., 2023; Muthukumaraswamy & Liley, 2018).
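
      To make the idea of range-specific aperiodic slopes concrete, the sketch below fits a straight line to a power spectrum in log-log space, separately for a low (1-19 Hz) and a high (20-40 Hz) frequency range. This is a simplified stand-in using ordinary least squares on a synthetic spectrum with an assumed knee at 20 Hz. It is not the fitting procedure used in the manuscript or in Ossandón et al. (2023), and all names and exponents are hypothetical.

```python
import math

KNEE_HZ = 20.0  # assumed knee frequency for the synthetic spectrum


def synthetic_power(freq: float) -> float:
    """Synthetic 1/f-like spectrum: exponent 1.5 up to the knee, 0.8 above it."""
    if freq <= KNEE_HZ:
        return 100.0 * freq ** -1.5
    power_at_knee = 100.0 * KNEE_HZ ** -1.5
    return power_at_knee * (freq / KNEE_HZ) ** -0.8


def fit_aperiodic(freqs, power, f_lo, f_hi):
    """Least-squares line through log10(power) vs log10(freq) within [f_lo, f_hi].

    Returns (slope, intercept) of the fitted line in log-log space.
    """
    points = [
        (math.log10(f), math.log10(p))
        for f, p in zip(freqs, power)
        if f_lo <= f <= f_hi
    ]
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


freqs = [float(f) for f in range(1, 41)]
power = [synthetic_power(f) for f in freqs]

low_slope, low_intercept = fit_aperiodic(freqs, power, 1.0, 19.0)
high_slope, high_intercept = fit_aperiodic(freqs, power, 20.0, 40.0)
print(f"1-19 Hz slope: {low_slope:.2f}, 20-40 Hz slope: {high_slope:.2f}")
```

      With real EEG data the oscillatory peaks (for example alpha) would need to be modeled or excluded before fitting, which this sketch does not attempt.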

      (4) "For this scan, participants were instructed to keep their eyes closed and stay as still as possible." Why should it be important to have the eyes closed during a T1w data acquisition? This statement at this location does not make sense.

      To avoid misunderstandings, we removed this statement in this context.

      (5) "Two SC subjects did not complete the frontal cortex scan for the EO condition and were excluded from the statistical comparisons of frontal cortex neurotransmitter concentrations." Why did the authors not conduct whole-brain MRS, which seems to be on the market for quite some time (e.g. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3590062/) ?

      Similar to previous work (Coullon et al., 2015; Weaver et al., 2013) our hypothesis was related to the visual cortex, and we chose the frontal cortex voxel as a control. This has now been clarified in the Introduction (Page 4, Lines 103-114), Methods (Page 9, Lines 225-227) and Discussion (Page 25, Lines 662-665).

      (6) In "....during visual stimulation with stimuli that changed in luminance (LU) (Pant et al., 2023)." the authors should provide a link on the visual stimulation, which is provided further below

      In the revised manuscript, we have moved up the description of the visual stimulation (Page 13, Line 336).

      (7) "During the EO condition, participants were asked to fixate on a blank screen." This is not really possible. Typically, resting state EO conditions include a fixation cross, as the participants would not be able to fixate on a blank screen and move their eyes, which would impact the recordings.

      We have now rephrased this as “look towards” with the goal of avoiding eye movements (Page 14, Line 347).

      (8) "Components corresponding to horizontal or vertical eye movements were identified via visual inspection and removed (Plöchl et al., 2012)." It is unclear what the Plöchl reference should serve for. Is the intention of the authors to state that manual (and subjective) visual inspection of the ICA components is adequate? I would recommend removing this reference.

      The intention was to provide the basis for classification during the visual inspection, as opposed to an automated method such as ICLabel.

      We have stated this clearly in the revised manuscript (Page 14, Lines 368-370).

      (9) "The datasets were divided into 6.25 s long epochs corresponding to each trial." This is a bit inaccurate, as the trial also included some motor response task. Thus, I assume the 6.25 s are related to the visual stimulation.

      We have modified the sentence accordingly (Page 15, Line 378).

      (10) Figure 2. a & b. Just an esthetic suggestion: I would recommend removing the lines between the EC and EO conditions, as they suggest some longitudinal changes. Unless it is important to highlight the changes between EC and EO within each subject.

      In fact, EC vs. EO was a within-subject factor with expected changes for the EEG and possible changes in the MRS parameters. To allow the reader to track changes due to EC vs. EO for individual subjects (rather than just comparing the change in the mean scores), we use lines.  

      (11) Figure 3A: I would plot the same y-axis range for both groups to make it more comparable.

      We have changed Figure 3A accordingly.

      (12) "flattening of the intercept": replace "flattening", as it is too closely related to slope.

      We have replaced “flattening” with “reduction” (Page 20, Line 517).

      (13) The plotting of only the significant correlation between MRS measures and EEG measures seems to be rather selective reporting. For this type of exploratory analysis, I would recommend plotting all of the scatter plots and moving the entire exploratory analysis to the supplementary (as this provides the smallest evidence of the results).

      We have made clear in the Methods (Page 16, Lines 415-426), Results and Discussion (Page 24, Lines 644-645), as well as in the Supplementary Material, that the reason for only reporting the significant correlation was that this correlation survived correction for multiple comparisons, while all other correlations did not. We additionally explicitly allude to the Supplementary Material where the plots for all correlations are shown (Results, Page 21, Lines 546-552).
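
      The notion of a correlation "surviving" correction for multiple comparisons can be illustrated with a minimal sketch. The response does not name the correction method used, so Bonferroni is assumed here purely for illustration, and the p-values are hypothetical.

```python
def bonferroni_survivors(p_values, alpha=0.05):
    """Flag which tests remain significant after Bonferroni correction.

    Each p-value is compared against alpha divided by the number of tests.
    """
    threshold = alpha / len(p_values)
    return [p <= threshold for p in p_values]


# Hypothetical uncorrected p-values from six exploratory correlations.
p_values = [0.004, 0.03, 0.20, 0.45, 0.60, 0.80]

survivors = bonferroni_survivors(p_values)
for p, keep in zip(p_values, survivors):
    print(f"p = {p:.3f} -> {'survives' if keep else 'does not survive'}")
```

      In this example the second test would be significant at the uncorrected 0.05 level but does not pass the corrected threshold of 0.05 / 6.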

      (14) "Here, we speculate that due to limited structural plasticity after a phase of congenital blindness, the neural circuits of CC individuals, which had adapted to blindness after birth, employ available, likely predominantly physiological plasticity mechanisms (Knudsen, 1998; Mower et al., 1985; Röder et al., 2021), in order to re-adapt to the newly available visual excitation following sight restoration."

      I don't understand the logic here. The CC individuals are congenitally blind, thus why should there be any physiological plasticity mechanism to adapt to blindness, if they were blind at birth?

      With “adapt to blindness” we mean adaptation of a brain to an atypical or unexpected condition when taking an evolutionary perspective (i.e. the lack of vision). We have made this clear in the revised manuscript (Introduction, Page 4, Lines 111-114; Discussion, Page 23, Lines 584-591).

      (15) "An overall reduction in Glx/GABA ratio would counteract the aforementioned adaptations to congenital blindness, e.g. a lower threshold for excitation, which might come with the risk of runaway excitation in the presence of restored visually-elicited excitation."

      This could be tested by actually investigating the visual excitation by visual stimulation studies.

      The visual stimulation condition in the EEG experiment of the present study found a higher aperiodic intercept in CC compared to SC individuals. Given the proposed link between the intercept and spontaneous neural firing (Manning et al., 2009), we interpreted the higher intercept in CC individuals as increased broadband neural firing during visual stimulation (Results Figure 3; Discussion Page 24, Lines 635-640). This idea is compatible with enhanced BOLD responses during an EO condition in CC individuals (Raczy et al., 2022). Future work should systematically manipulate visual stimulation to test this idea.

      (16) As the authors also collected T1w images, was the hypothesis of increased visual cortex thickness in CC investigated?

      This hypothesis was investigated in a separate publication which included this subset of participants (Hölig et al., 2023), and found increased visual cortical thickness in the CC group. We refer to this publication, and related work (Feng et al., 2021) in the present manuscript.

      (17) The entire discussion of age should be omitted, as the current data set is too small to assess age effects.

      We have removed this section and just allude to the fact that we replicated typical age trends to underline the validity of the present data (Page 26, Lines 675-676).

      (18) Table1: should include the age and the age at the time point of surgery.

      We added age to the revised Table 1. We clarified that in CC individuals, duration of blindness is the same as age at the time point of surgery (Page 6, Line 163).

      (19) Why are no group comparisons of visual acuity reported?

      Lower visual acuity in CC than SC individuals is a well-documented fact.

      We have now added the visual acuity plots for readers (Supplementary Material S1, referred to in the Methods, Page 5, Line 155) which highlight this common finding.

      References (Recommendations to the Authors)

      Adrian, E. D., & Matthews, B. H. C. (1934). The Berger rhythm: Potential changes from the occipital lobes in man. Brain. https://doi.org/10.1093/brain/57.4.355

      Coullon, G. S. L., Emir, U. E., Fine, I., Watkins, K. E., & Bridge, H. (2015). Neurochemical changes in the pericalcarine cortex in congenital blindness attributable to bilateral anophthalmia. Journal of Neurophysiology. https://doi.org/10.1152/jn.00567.2015

      Feng, Y., Collignon, O., Maurer, D., Yao, K., & Gao, X. (2021). Brief postnatal visual deprivation triggers long-lasting interactive structural and functional reorganization of the human cortex. Frontiers in Medicine, 8, 752021. https://doi.org/10.3389/FMED.2021.752021

      Gao, R., Peterson, E. J., & Voytek, B. (2017). Inferring synaptic excitation/inhibition balance from field potentials. NeuroImage, 158(March), 70–78. https://doi.org/10.1016/j.neuroimage.2017.06.078

      Hölig, C., Guerreiro, M. J. S., Lingareddy, S., Kekunnaya, R., & Röder, B. (2023). Sight restoration in congenitally blind humans does not restore visual brain structure. Cerebral Cortex, 33(5), 2152–2161. https://doi.org/10.1093/CERCOR/BHAC197

      Juchem, C., & Graaf, R. A. de. (2017). B0 magnetic field homogeneity and shimming for in vivo magnetic resonance spectroscopy. Analytical Biochemistry, 529, 17–29. https://doi.org/10.1016/j.ab.2016.06.003

      Kurcyus, K., Annac, E., Hanning, N. M., Harris, A. D., Oeltzschner, G., Edden, R., & Riedl, V. (2018). Opposite Dynamics of GABA and Glutamate Levels in the Occipital Cortex during Visual Processing. Journal of Neuroscience, 38(46), 9967–9976. https://doi.org/10.1523/JNEUROSCI.1214-18.2018

      Manning, J. R., Jacobs, J., Fried, I., & Kahana, M. J. (2009). Broadband shifts in local field potential power spectra are correlated with single-neuron spiking in humans. The Journal of Neuroscience: The Official Journal of the Society for Neuroscience, 29(43), 13613–13620. https://doi.org/10.1523/JNEUROSCI.2041-09.2009

      Medel, V., Irani, M., Crossley, N., Ossandón, T., & Boncompte, G. (2023). Complexity and 1/f slope jointly reflect brain states. Scientific Reports, 13(1), 21700. https://doi.org/10.1038/s41598-023-47316-0

      Muthukumaraswamy, S. D., & Liley, D. T. (2018). 1/F electrophysiological spectra in resting and drug-induced states can be explained by the dynamics of multiple oscillatory relaxation processes. NeuroImage, 179(November 2017), 582–595. https://doi.org/10.1016/j.neuroimage.2018.06.068

      Oeltzschner, G., Zöllner, H. J., Hui, S. C. N., Mikkelsen, M., Saleh, M. G., Tapper, S., & Edden, R. A. E. (2020). Osprey: Open-source processing, reconstruction & estimation of magnetic resonance spectroscopy data. Journal of Neuroscience Methods, 343, 108827. https://doi.org/10.1016/j.jneumeth.2020.108827

      Ossandón, J. P., Stange, L., Gudi-Mindermann, H., Rimmele, J. M., Sourav, S., Bottari, D., Kekunnaya, R., & Röder, B. (2023). The development of oscillatory and aperiodic resting state activity is linked to a sensitive period in humans. NeuroImage, 275, 120171. https://doi.org/10.1016/J.NEUROIMAGE.2023.120171

      Pant, R., Ossandón, J., Stange, L., Shareef, I., Kekunnaya, R., & Röder, B. (2023). Stimulus-evoked and resting-state alpha oscillations show a linked dependence on patterned visual experience for development. NeuroImage: Clinical, 103375. https://doi.org/10.1016/J.NICL.2023.103375

      Raczy, K., Hölig, C., Guerreiro, M. J. S., Lingareddy, S., Kekunnaya, R., & Röder, B. (2022). Typical resting-state activity of the brain requires visual input during an early sensitive period. Brain Communications, 4(4). https://doi.org/10.1093/BRAINCOMMS/FCAC146

      Rideaux, R., Ehrhardt, S. E., Wards, Y., Filmer, H. L., Jin, J., Deelchand, D. K., Marjańska, M., Mattingley, J. B., & Dux, P. E. (2022). On the relationship between GABA+ and glutamate across the brain. NeuroImage, 257, 119273. https://doi.org/10.1016/J.NEUROIMAGE.2022.119273

      Weaver, K. E., Richards, T. L., Saenz, M., Petropoulos, H., & Fine, I. (2013). Neurochemical changes within human early blind occipital cortex. Neuroscience. https://doi.org/10.1016/j.neuroscience.2013.08.004

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      This manuscript by Neininger-Castro and colleagues presents a novel automatic image analysis method for assessing sarcomeres, the basic units of myofibrils, and validates this tool in a couple of experimental approaches that interfere with sarcomere assembly in iPSC-cardiomyocytes (iPSC-CM).

      Automatic quantification of sarcomeres is definitely something that is useful to the field. I am surprised that there is no reference in the manuscript to SarcTrack, published by Toepfer and colleagues in 2019 (PMID 30700234), which has exactly the same purpose. The advantage of the image analysis software presented in the current manuscript appears to me to be that it can cover both mature sarcomeres and nascent sarcomeres in premyofibrils effectively.

      We whole-heartedly disagree that SarcTrack has the exact same purpose as sarcApp. sarcApp measures more than the frequency of actinin2 images, and can measure real-space quantifications of actinin, myomesin, and titin, which has not been done before in this way. However, SarcTrack is an interesting method that we hope many researchers find helpful in their research. SarcTrack is a particle tracker that outputs the dimensions of the objects found, but does not distinguish between Z-Lines and other actinin2-positive structures (Z-Bodies, adhesions). It also does not group these structures into higher order structures such as myofibrils and muscle stress fibers.

      When going through the manuscript there were a few issues that should be addressed in a revised version of the manuscript:

      1) I am a bit puzzled that they took 1.4 um length as a cutoff length for a mature A-band in their quantifications, since the consensus in the field for thick filament length seems to be 1.6 um?

      We use 1.4 µm as a cutoff length for the length of a Z-Line rather than the A-Band. We believe the reviewer is referring to the width of the A-Band perpendicular to the Z-lines, which is indeed 1.6 µm. However, we are referring to the length of the Z-Lines, which can span anywhere from 1.4 µm to up to 10 or more µm. Thank you for allowing us to make the clarification.

      2) When doing the knockdown for alpha and beta-myosin heavy chain, respectively, why did they not also do a Western blot for the "other" isoform as well (Figure 7)? We know that iPSC-CM express a mixture, so the relatively mild phenotype that they observe in single knockdown experiments may well be due to concomitant upregulation of the expression of the other isoform. In my point of view this should be checked.

      It is likely that in the single knockdown experiments the other isoform is upregulated, which is why we were careful in stating that neither muscle myosin alone is required for sarcomere formation. We do agree this would be an interesting experiment to check beyond the scope of this manuscript.

      3) There seems to be a disconnect between the images for myomesin knockdown shown in Figure 8H and the quantification shown in Figure 8I, which makes me wonder whether the image shown in H middle (MYOM1 (1) KD), where the beta-myosin doublets do not seem to be much affected is really representative?

      The image shown in the middle of H is representative of the mean length of beta-myosin doublets in MYOM1 (1) KD hiCMs. While the beta-myosin doublets are still present and organized, they are significantly shorter. In the zoomed out image, you can appreciate much shorter arrays of beta-myosin doublets that, while extending across the entire cell, are thinner than control cells.

      Reviewer #2 (Public Review):

      Neininger-Castro et al report on their original study entitled "Independent regulation of Z-lines and M-lines during sarcomere assembly in cardiac myocytes revealed by the automatic image analysis software sarcApp". In this study, the research team developed two software tools, yoU-Net and sarcApp, that provide new binarization and sarcomere quantification methods. The authors further utilized human induced pluripotent stem cell-derived cardiomyocytes (hiCMs) as their model to verify their software by staining multiple sarcomeric components with and without the treatment of Blebbistatin, a known myosin II activity inhibitor. With the treatment of different Blebbistatin concentrations, the morphology of sarcomeric proteins was disturbed. These disrupted sarcomeric structures were further quantified using sarcApp and the quantification data supported the phenotype. The authors further investigated the roles of muscle myosins in sarcomere assembly by knocking down MYH6, MYH7, or MYOM in hiCMs. The knockdown of these genes did not affect Z-line assembly yet the knockdown of MYOM affected M-line assembly. The authors demonstrated that different muscle myosins participate in sarcomere assembly in different manners.

      Reviewer #3 (Public Review):

      Neininger-Castro and colleagues developed software tools for the quantification of sarcomeres and sarcomere-precursor features in immunostained human induced pluripotent stem cell-derived cardiac myocytes (hiCMs). In the first part they used a deep-learning-based model called a U-Net to construct and train a network for binarization of immunostained cardiomyocyte images. They also wrote graphical user interface (GUI) software that will assist other labs in using this approach and made it publicly available. They did not compare their approach to existing ones, but an example from one image suggests their binarization tool outperforms Otsu thresholding binarization.

      In the second part they developed a software tool called sarcApp that classifies sarcomere structures in the binarized image as a Z-Line or Z-Body and assigns each to either a myofibril or to stress fibers. The tools can then automatically count and measure multiple features (33 per cell and 24 per myofibril) and report them on a per-cell, per-myofibril, and per-stress-fiber basis.

      To test the tools they used Blebbistatin to inhibit sarcomere assembly and showed that the sarcApp tool could capture changes in multiple features such as fewer myofibrils, fewer Z-Lines, decreased myofibril persistence, decreased Z-Line length and altered myofibril orientation in the Blebbistatin treated cells. With some changes the tool was also shown to quantify sarcomeres in titin and myomesin stained cardiomyocytes.

      Finally they used sarcApp to quantify the changes in sarcomere assembly after siRNA-mediated knockdown of MYH6, MYH7, or MYOM. The analysis indicates that neither MYH6 nor MYH7 knockdown perturbed the assembly of Z- or M-lines, and that knockdown of MYOM perturbed the A-band/M-Line but not the Z-Line assembly according to features captured by the sarcApp tool.

      Overall the authors developed and made publicly available an excellent software tool that will be very useful for labs that are interested in studying sarcomere assembly. Multiple features that are difficult to measure or count manually can be automatically measured by the software quickly and accurately.

      There are however some remaining questions about these tools:

      1) The binarization tool which is tailored to sarcomere image binarization appears promising but was not systematically compared with existing approaches.

      We compared it with the existing approach we used previously in the lab, which was Otsu’s method for binarization. We are not aware of several other binarization approaches to compare to, other than using other machine learning techniques that are less advanced than a U-Net, the current standard in image-to-image translation.
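      For readers unfamiliar with the baseline mentioned here, Otsu's method chooses a single global threshold that maximizes the between-class variance of the intensity histogram. The following is a minimal sketch in plain Python, assuming 8-bit intensities supplied as a flat list. The names `otsu_threshold` and `binarize` are illustrative only and are not taken from yoU-Net or sarcApp.

```python
def otsu_threshold(pixels, levels=256):
    """Return the threshold t that maximizes between-class variance.

    Pixels with value <= t are treated as background, the rest as foreground.
    """
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_bg = 0    # number of background pixels so far
    sum_bg = 0  # summed intensity of background pixels so far
    for t in range(levels):
        w_bg += hist[t]
        sum_bg += t * hist[t]
        if w_bg == 0:
            continue  # no background pixels yet
        w_fg = total - w_bg
        if w_fg == 0:
            break     # no foreground pixels left
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        between = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if between > best_var:
            best_var, best_t = between, t
    return best_t


def binarize(pixels, t):
    """Map each pixel to 1 (foreground) or 0 (background)."""
    return [1 if p > t else 0 for p in pixels]
```

      A global threshold of this kind applies one cutoff to the whole image, which is one reason a learned, context-aware binarization could plausibly separate dim Z-Bodies from background more reliably.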

      2) How robust is the tool? The tool was tested on images from one type of cardiomyocytes (hiCMs) taken from one lab using Nikon Spinning Disk confocal microscope equipped with Apo TIRF Oil 100X 1.49 NA objective or instant Structured Illumination Microscopy (iSIM), using deconvolution (Microvolution software) and in a specific magnification. It remains to be seen whether the tool would be equally effective with images taken with other microscopy systems, with other cardiomyocytes (chick or neonatal rat), with different magnifications, live imaging, etc.

      We tested the software with several magnifications, with live imaging, and with other tissues. We did not include the information in the manuscript because the data we tested the software with is for future manuscripts studying different aspects of sarcomere formation and maintenance. sarcApp reliably identifies Z-Lines and sarcomeres with deconvolved widefield fluorescence images of hiCMs and frozen human tissue, and we are currently using it to measure zebrafish data for another study. Further, it works for live imaging with an actinin2-GFP (or similar) label. For the titin quantification, we would recommend using only 60-100X magnification, as the titin structures (doublets and rings) are not resolvable at lower magnifications.

      3) The tool was developed for evaluation of sarcomere assembly. The authors show that for this application it can detect the perturbation by Blebbistatin, or knockdown of sarcomeric genes. It remains to be seen if this tool is also useful for assessment of sarcomere structure for other questions beside sarcomere assembly and in other sarcomere pathologies.

      While this is beyond the scope of this specific methods paper, we welcome other researchers to use our software for other questions in other pathologies. We are currently doing the same for other manuscripts from our lab.

      Reviewer #1 (Recommendations For The Authors):

      1) "alpha-actinin..., which border the sarcomeric contractile machinery (thin and thick filaments)": Z-lines do NOT border thick filaments in a relaxed sarcomere.

      We have removed “(thin and thick filaments)” from the text.

      2) myomesin targeting siRNAs (gene name MYOM): there are actually three genes encoding for myomesin family members, specify, which one was targeted (I am assuming MYOM1).

      Thank you for the clarification: we do target MYOM1.

      3) I am not surprised that they found not many mature Z-lines in the absence of both sarcomeric myosins; a similar codependence of assembly of mature Z-discs and the presence of functional thick filaments was previously shown by Geach and colleagues in 2015 (PMID 25845369)

      Thank you for sharing this manuscript: we have added a reference to it in our study.

      Reviewer #2 (Recommendations For The Authors):

      This work offers the possibility to gain more insights into the process of sarcomere assembly through the advancement in sarcomeric or myofibril structure analyses. However, some clarifications are needed from the authors, please see below for the comments.

      1) It is recommended that the authors include the time points for replating and harvesting hiCMs. After replating, the cardiomyocytes require at least three to four days for sarcomeric structures to reform. If the hiCMs were fixed before sarcomere assembly had completed, the staining of sarcomeric proteins including ACTN2 and titin could be compromised and it is difficult to tell if the phenotypes observed were consequences of drug treatments or knockdown of sarcomeric genes or simply because the replating hiCMs were fixed before their sarcomeric structures had fully regrown. It is also recommended that the authors replate hiCMs at a fixed time point to avoid discrepancies in the data.

      Cardiomyocytes do not require three to four days for sarcomeric structures to re-form, and indeed only require 24 hours, with the first sarcomeres typically appearing at ~6 hours. We and others have published several studies demonstrating this (Fenix et al., eLife 2018; Taneja, Neininger and Burnette, MBoC 2020; Chen et al., Nature Methods 2022). While sarcomeres continue to develop and turn over after this time, our lab is interested in the beginning steps of sarcomerogenesis rather than the turnover of mature structures.

      2) The sarcApp automatically identifies Z-lines and Z-bodies; however, is there an option for the users to set their own thresholds? Some users may select different criteria when quantifying sarcomeres. Moreover, the Z-lines and Z-bodies identified by the software are not always accurate. Can the users modify the list manually in an unbiased way? If this function is not available, the authors may consider adding this function to their software. sarcApp measures Z-line and Z-body length but does not measure Z-line and Z-body width, but sometimes it is also necessary to measure the width.

      Absolutely, users can modify the thresholds to identify Z-Lines and Z-Bodies. There is not a way for users to modify the list in an unbiased way per se, as editing the list of Z-Lines and Z-Bodies based on non-mathematical measurements is inherently biased, but the user is free to add in other Z-Lines and Z-Bodies as they wish. In this context, “manually” and “unbiased” are mutually exclusive.
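      To make the idea of a user-adjustable threshold concrete, here is a minimal sketch that labels actinin2-positive structures by length, using the 1.4 µm Z-Line cutoff mentioned earlier in this response as the default. It is an illustration under stated assumptions: the function name `classify_structures` and its parameter are hypothetical, and sarcApp's actual classification may rely on criteria beyond length.

```python
def classify_structures(lengths_um, z_line_min_um=1.4):
    """Label each structure as a Z-Line or a Z-Body by its length in µm.

    z_line_min_um is the user-adjustable cutoff (hypothetical parameter name);
    structures at or above it are called Z-Lines, shorter ones Z-Bodies.
    """
    return [
        "Z-Line" if length >= z_line_min_um else "Z-Body"
        for length in lengths_um
    ]


# The same measurements classified under the default and a stricter cutoff.
measured = [0.5, 1.4, 3.0]
default_labels = classify_structures(measured)
strict_labels = classify_structures(measured, z_line_min_um=2.0)
```

      Because the rule is a fixed numerical criterion applied to every structure, changing the cutoff alters the result reproducibly, which is the sense in which a threshold change differs from hand-editing the list.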

      3) It is recommended that the authors include the original images beside the sarcomeric structures identified by sarcApp (Figure 2A, 2C, 4C-F and more). It would be easier to compare the original Z-lines and Z-bodies with those identified by the software.

      We have added these in Author response image 1.

      Author response image 1.

      Uncropped images and merges from Figures 2, 4 and 6, respectively.

      4) The M-line length quantification data in Figure 3G, 5F, and 6H showed different colored-dots labeling n1 to n3, but the authors did not discuss the significance of these symbols.

      We are not sure what the reviewer means by this statement: there is no significance of the different colored dots other than to mark the biological replicate shown. These graphs were created using SuperPlots, which was not stated in the original methods. It has now been added to the Statistical Analysis section.
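      As background on the SuperPlots convention, each dot is an individual measurement colored by biological replicate, and the per-replicate means are the values typically carried into statistical testing. The sketch below shows that grouping step in plain Python. The function name `replicate_means` and the example numbers are illustrative assumptions, not data or code from the manuscript.

```python
from collections import defaultdict
from statistics import mean


def replicate_means(measurements):
    """Average per-cell values within each biological replicate.

    measurements: iterable of (replicate_id, value) pairs.
    Returns a dict mapping replicate_id to the mean of its values.
    """
    by_replicate = defaultdict(list)
    for replicate_id, value in measurements:
        by_replicate[replicate_id].append(value)
    return {rep: mean(vals) for rep, vals in sorted(by_replicate.items())}


# Made-up lengths (µm) for three replicates labeled n1 to n3.
example = [("n1", 2.0), ("n1", 3.0), ("n2", 4.0), ("n3", 1.0), ("n3", 2.0)]
means = replicate_means(example)
```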

      5) Can the authors elaborate more on the reasons why they treated Blebbistatin at concentrations of 50µM and 100µM. Previous studies showed that 25µM of Blebbistatin was sufficient to delay the transformation of cardiomyocytes (PMID 27072942). Can the authors also comment on why they selected 6 hours, 12 hours, and 24 hours post replating for drug treatment. Moreover, the drug treatment at different time points was only done on ACTN2 but not titin or myomesin.

      We selected 6, 12, and 24 hours for actinin2 to show the time course of sarcomere formation and to show that sarcomeres are developed by 24 hours, as also mentioned above. We are interested in future studies of the time course of titin and myomesin over time, and are working on it in the lab.

      We chose 50 and 100 µM Blebbistatin as these completely blocked sarcomere assembly whereas treatment with 25 µM did not. This manuscript is a methods paper that aims to validate sarcApp and show how it could be used. We did not intend for it to be a comprehensive study of how different concentrations of Blebbistatin affect sarcomere assembly.

      We are also unsure what the reviewer means by “transformation of cardiomyocytes”. The manuscript with the PMID of 27072942 does not address this issue. The paper is a “review and analyze readmission data for patients who received a continuous flow left ventricular assist device (LVAD)”. We assume the reviewer is referring to differentiation. The model system we developed and published in eLife in 2018 does not use differentiating iPSC cardiac myocytes. The hiCMs we use are terminally differentiated but still immature, as they are more transcriptionally similar to primary fetal myocytes. As such, they do not maintain their sarcomeres when they are removed from the 96-well plate and plated onto a glass coverslip for high-resolution microscopy. They assemble sarcomeres within 24 hours, with the sarcomeres forming close to the dorsal membrane and then rearranging over time (e.g., moving from the top of the cell to the bottom) (Fenix et al., eLife 2018). With that said, we do agree with the reviewer that a study of sarcomere assembly in the context of cardiac myocyte differentiation would be a fascinating direction for future studies, and we think sarcApp could facilitate such studies.

      6) The authors mentioned that the myofibrils of Z-line, titin, and M-line were randomly oriented after Blebbistatin treatments. The myofibrils were randomly oriented for titin and M-line. However, the orientation of Z-line after 50µM Blebbistatin treatment was not necessarily random, only the orientation after 100µM Blebbistatin treatment was randomized. The authors might consider changing bar graph to other types of charts if the orientation was really randomized after quantification.

      We find that the bar chart is the most informative to us, but users can consider other types of charts in their analyses.

      7) It is recommended that the authors include images staining ACTN2 at lower magnifications (Figure 1A, 1C). With current images, it is true that yoU-Net can separate Z-lines from Z-bodies yet it is difficult to tell if yoU-Net can still distinguish Z-lines from Z-bodies with larger images or it only applies to a small portion of the image.

      The yoU-Net can distinguish Z-Lines from Z-Bodies with images of any size, as image size (height vs. width in pixels) does not affect how binarization occurs. During binarization, the only pixel requirement is that the width and height are divisible by 8 (for downsampling purposes). Usually this is not the case with raw images, so the image borders are slightly cropped to make them usable. In terms of resolution, we recommend using 60X-100X objectives on confocal or superresolution data for the clearest results. We have, however, successfully binarized deconvolved widefield images at 100X as well.
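      The divisibility requirement described above can be illustrated with a short cropping routine. This is a minimal sketch in plain Python that assumes a centered crop and represents the image as a list of rows. The response only states that the borders are slightly cropped, so the centering and the function name `crop_to_multiple` are assumptions rather than yoU-Net's actual implementation.

```python
def crop_to_multiple(image, multiple=8):
    """Center-crop a 2D image so that height and width divide by `multiple`.

    image: list of equal-length rows. Returns a new list of rows.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    new_h = height - height % multiple
    new_w = width - width % multiple
    top = (height - new_h) // 2    # rows trimmed from the top edge
    left = (width - new_w) // 2    # columns trimmed from the left edge
    return [row[left:left + new_w] for row in image[top:top + new_h]]


# A 21 x 19 image whose pixels record their own (row, column) position.
raw = [[(r, c) for c in range(19)] for r in range(21)]
cropped = crop_to_multiple(raw)
```

      Three successive 2x downsampling steps halve each dimension three times, which is why a factor of 8 keeps every intermediate size an integer.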

      8) The authors mentioned that the knockdown of MYH7 did not affect Z-lines and M-lines; however, the structures of ACTN2, myomesin, and titin appeared more organized as compared to those in control.

      We agree that the sarcomeres and myofibrils look slightly more organized, and did mean to state that the knockdown did not negatively affect Z-Lines and M-Lines and have updated the manuscript to be more accurate.

      9) Please provide the merge images for Fig. 4D, 4E, 6B

      The merge images for Fig. 4D, 4E, and 6B are included with the original images requested above (point 3)

      10) In the text, they described "antibodies to the titin I-band localize to both MSFs and sarcomeres in hiCMs (Figure 4A). Titin forms ring-like structures around the Z-Bodies of MSFs that are closer to the apparent sarcomere transition point (Figure 4A)". However, based on the antibody information they provided, it is not explicitly stated whether the antibody recognizes the N- or C-terminus of TITIN. Please provide TTN N-terminus or TTN C-terminus co-stainings with ACTN2 antibody to understand which part of TTN together with ACTN2 forms a Z-Body.

      The TTN antibody is an N-terminal antibody localizing to the I-Band region of sarcomeres. We agree with the reviewer that a more thorough study of titin will be of interest and we are currently undertaking such a study. However, this is a methods paper presenting a tool. While some of the data we present does point to mechanistic hypotheses, it is beyond the scope of this study to fully characterize titin during sarcomere assembly.

      11) TITIN doublet was used to indicate a sarcomere in Fig. 4C-D. Moreover, they also used another combination (myomesin and F-ACTIN) to label a sarcomere in Fig. 6D. Can they compare the difference between these two methods or by using these two methods (TITIN doublet) and (myomesin and F-ACTIN), how is the average length of sarcomere? Will the sarcomere length be the same?

      We noted in the manuscript that, due to the organization of titin doublets (wrapping around the ends of Z-Lines), the average titin doublet will be approximately 0.3 µm longer than the Z-Line. We did not expect to see a difference in lengths of myomesin M-Lines and mature actinin2 Z-Lines and indeed do not see major differences in the average lengths (between 2.0 and 2.5 µm in 24 hour control cells).

      12) They used siRNA method to knockdown MYH6, MYH7 and MYOM and concluded that the knockdown of these genes did not affect the Z-line assembly. Even though they showed very nice knockdown efficiency of these proteins, they should (1) co-stain MYH6/TITIN/actinin2 and MYH6/ myomesin /actinin2 for Fig. 7C. (2) MYH7/TITIN/actinin2 and MYH7/ myomesin /actinin2 for Fig. 7I. (3) MYOM1/TITIN/actinin2 and MYOM2/TITIN/actinin2 for Fig. 8A. (4) MYH7/MYOM1 and MYH7/MYOM2 for Fig. 8H to make sure the cells they measured were truly knockdown-positive cells.

      The antibodies for alpha and beta myosin are not very efficient for immunofluorescence, and work best for western blots. We decided also to choose a random subset of the cells on the dish to be sure to eliminate any risk of cherry-picking. While imaging cells on the dish, we looked only at the DAPI nuclear channel and selected 50 cells minimum per dish with only this channel, then imaged the other channels.

      Minor comments:

      1) Well-organized sarcomere structure on DMSO treated cells in Fig.5A and Fig. 6A, but it was disarray in Fig. S3M. Why?

      Figure S3 shows hiCMs that have only been allowed to spread for 6 hours, which have not formed mature sarcomeres yet, hence the disarray.

      2) Fig 1A, Fig2B: please label the name of the antibody, not the actin filament

      We used phalloidin labelling here, which marks actin filaments. We have updated the figure legends to be more clear. Thank you!

      3) Fig. 7I: actinin2 instead of actinin

      Thank you for catching this! We have fixed it.

      Reviewer #3 (Recommendations For The Authors):

      Testing the app using images shot by other microscopy systems, magnifications, and cardiomyocytes from other species, as noted in the public review above, should make the app even more widely useful.

      A more formal head-to-head comparison with other approaches will be more convincing in showing the new tool is superior

      I also think that a more detailed protocol for using the app will help other investigators.

      The app counts and measures many features, but it is not always clear how and using what algorithm these are measured. Including these details in a protocol or even as comments in the code will be very helpful for others.

      The protocol found on the public GitHub for the app will help other investigators to download, use, and understand the application. We have received contact from researchers who have been able to use the application without assistance from us, which is a good sign that the application is user-friendly and that the online protocol is sufficient.

    1. Author response:

      The following is the authors’ response to the original reviews.

      The reviewers praised multiple aspects of our study. Reviewer 1 noted that “the work aligns well with current research trends and will greatly interest researchers in the field.” Reviewer 2 highlighted the unique capability of our imaging approach, which “allows for investigation of the heterogeneity of response across individual dopamine axons, unlike other common approaches such as fiber photometry.” Reviewer 3 commented that “the experiments are beautifully executed” and “are revealing novel information about how aversive and rewarding stimuli is encoded at the level of individual axons, in a way that has not been done before.”

      In addition to the positive feedback, the reviewers also provided useful criticisms and suggestions, some of which may not be fully addressed in a single study. For instance, questions regarding whether dopamine axons encode the valence or specific identity of the stimuli, or the most salient aspects of the environment, remain open. At the same time, as all the reviewers agreed, our report on the diversity of dopamine axonal responses using a novel imaging design introduces significant new insights to the neuroscience community. Following the reviewers’ recommendations, we have refrained from making interpretations that could be perceived as overinterpretation, such as concluding that “dopamine axons are involved in aversive processing.” This has necessitated extensive revisions, including modifying the title of our manuscript to make clear that the novelty of our work is revealing ‘functional diversity’ using our new imaging approach.

      Below, we respond to the reviewers’ comments point by point.

      eLife assessment

      This valuable study shows that distinct midbrain dopaminergic axons in the medial prefrontal cortex respond to aversive and rewarding stimuli and suggest that they are biased toward aversive processing. The use of innovative microprism based two-photon calcium imaging to study single axon heterogeneity is solid, although the experimental design could be optimized to distinguish aversive valence from stimulus salience and identity in this dopamine projection. This work will be of interest to neuroscientists working on neuromodulatory systems, cortical function and decision making.

      Reviewer #1

      Summary:

      In this manuscript, Abe and colleagues employ in vivo 2-photon calcium imaging of dopaminergic axons in the mPFC. The study reveals that these axons primarily respond to unconditioned aversive stimuli (US) and enhance their responses to initially-neutral stimuli after classical association learning. The manuscript is well-structured and presents results clearly. The utilization of a refined prism-based imaging technique, though not entirely novel, is well-implemented. The study's significance lies in its contribution to the existing literature by offering single-axon resolution functional insights, supplementing prior bulk measurements of calcium or dopamine release. Given the current focus on neuromodulator neuron heterogeneity, the work aligns well with current research trends and will greatly interest researchers in the field.

      However, I would like to highlight that the authors could further enhance their manuscript by addressing study limitations more comprehensively and by providing essential details to ensure the reproducibility of their research. In light of this, I have a number of comments and suggestions that, if incorporated, would significantly contribute to the manuscript's value to the field.

      Strengths:

      • Descriptive.

      • Utilization of a well-optimized prism-based imaging method.

      • Provides valuable single-axon resolution functional observations, filling a gap in existing literature.

      • Timely contribution to the study of neuromodulator neuron heterogeneity.

      We thank the reviewer for this positive assessment.

      Weaknesses:

      (1) It's important to fully discuss the fact that the measurements were carried out only on superficial layers (30-100um), while major dopamine projections target deep layers of the mPFC as discussed in the cited literature (Vander Weele et al., 2018) and as illustrated in FigS1B,C. This limitation should be explicitly acknowledged and discussed in the manuscript, especially given the potential functional heterogeneity among dopamine neurons in different layers. This potential across-layer heterogeneity could also be the cause of discrepancy among past recording studies with different measurement modalities. Also, mentioning technical limitations would be informative. For example: how deep the authors can perform 2p-imaging through the prism? was the "30-100um" maximum depth the authors could get?

      Thank you for pointing out this important issue about layer differences.

      It is possible that the mesocortical pathway has layer-specific channels, with some neurons targeting supragranular layers and others targeting infragranular ones. Alternatively, it is also plausible that the axons of the same neurons branch into both superficial and deep layers. This is a critical issue that has not been investigated in anatomical studies and will require single-cell labeling of dopamine neurons (Matsuda et al 2009 and Aransay et al 2015). We now discuss this issue in the Discussion.

      As for the imaging depth of 30–100 µm, we were unable to visualize deeper axons in a live view mode. Our imaging system has already been optimized to detect weak signals (e.g., we have employed an excitation wavelength of 980 nm, dispersion compensation, and a hybrid photodetector). It is possible that future studies using improved imaging approaches may be able to visualize deeper layers. Importantly, sparse axons in the supragranular layers are advantageous in detecting weak signals; dense labeling of axons would increase the background fluorescence relative to signals. We now reference this layer issue in the Results and Discussion sections.

      (2) In the introduction, it seems that the authors intended to refer to Poulin et al. 2018 regarding molecular/anatomical heterogeneity of dopamine neurons, but they inadvertently cited Poulin et al. 2016 (a general review on scRNAseq). Additionally, the statement that "dopamine neurons that project to the PFC show unique genetic profiles (line 85)" requires clarification, as Poulin et al. 2018 did not specifically establish this point. Instead, they found at least the Vglut2/Cck+ population projects into mPFC, and they did not reject the possibility of other subclasses projecting to mPFC. Rather, they observed denser innervation with DAT-cre, suggesting that non-Vglut2/Cck populations would also project to mPFC. Discuss the potential molecular heterogeneity among mPFC dopamine axons in light of the sampling limitation mentioned earlier.

      We thank the reviewer for pointing this out. Genetic profiles of PFC-projecting DA neurons are still being investigated, so describing them as “unique” was misleading. We have edited the Introduction accordingly, and now discuss this issue in detail in the Discussion.

      (3) I find the data presented in Figure 2 to be odd. Firstly, the latency of shock responses in the representative axons (right panels of G, H) is consistently very long - nearly 500ms. It raises a query whether this is a biological phenomenon or if it stems from a potential technical artifact, possibly arising from an issue in synchronization between the 2-photon imaging and stimulus presentation. My reservations are compounded by the notable absence of comprehensive information concerning the synchronization of the experimental system in the method section.

      The synchronization of the stimulus and data acquisition is accomplished at a sub-millisecond resolution. We use a custom-made MATLAB program that sends TTL commands to standard imaging software (ThorImage or ScanImage) and a stimulator for electrical shocks. All events are recorded as analogue inputs to a different DAQ to ensure synchronization. We have provided additional details regarding the configuration in the Methods section.

      We consider that the long latency of shock response is biological. For instance, a similar long latency was found after electrical shock in a photometry imaging study (Kim, …, Deisseroth, 2016).

      Secondly, there appear to be irregularities in Panel J. While the authors indicate that "Significant axons were classified as either reward-preferring (cyan) or aversive-preferring (magenta), based on whether the axons are above or below the unity line of the reward/aversive scatter plot (Line 566)," a cyan dot slightly but clearly deviates above the unity line (around coordinates (x, y) = (20, 21)). This needs clarification. Lastly, when categorizing axons for analysis of conditioning data in Fig3 (not Fig2), the authors stated "The color-coded classification (cyan/magenta) was based on k-means clustering, using the responses before classical conditioning (Figure 2J)". I do not understand why the authors used different classification methods for two almost identical datasets.

      We thank the reviewer for pointing out these insufficient descriptions. We classified the axons using k-means clustering, and the separation of the two clusters happened to roughly coincide with the unity line of the reward/aversive scatter plot in Fig 2J. In other words, we did not use the unity line to classify the data points (which is why the color separation of the histogram is not at 45 degrees). We have clarified this point in the Methods section.
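      The distinction drawn here, cluster membership rather than position relative to the unity line, can be illustrated with a small k-means example. This is a minimal sketch of Lloyd's algorithm in plain Python on made-up (reward response, aversive response) pairs. The function name `kmeans`, the initial centers, and the data are illustrative assumptions and do not reproduce the authors' analysis code or settings.

```python
from math import dist


def kmeans(points, centers, iterations=100):
    """Lloyd's algorithm starting from the supplied initial centers.

    points: list of coordinate tuples. Returns (labels, final_centers).
    """
    centers = [tuple(c) for c in centers]
    labels = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest center.
        labels = [
            min(range(len(centers)), key=lambda k: dist(p, centers[k]))
            for p in points
        ]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for k, old in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centers.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members))
                )
            else:
                new_centers.append(old)  # keep an empty cluster in place
        if new_centers == centers:
            break  # converged
        centers = new_centers
    return labels, centers


# Hypothetical (reward, aversive) response amplitudes for six axons.
responses = [(1, 10), (2, 12), (1.5, 11), (10, 1), (12, 2), (11, 1.5)]
labels, centers = kmeans(responses, centers=[(0, 10), (10, 0)])
```

      In such a scheme the boundary between groups is the set of points equidistant from the two cluster centers, which need not lie at 45 degrees, so a point slightly above the unity line can still belong to the reward-preferring cluster.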

      (4) In connection with Point 3, conducting separate statistical analyses for aversive and rewarding stimuli would offer a fairer approach. This could potentially reveal a subset of axons that display responses to both aversive and appetitive stimuli, aligning more accurately with the true underlying dynamics. Moreover, the characterization of Figure 2J as a bimodal distribution while disregarding the presence of axons responsive to both aversive and appetitive cues seems somewhat arbitrary and circular logic. A more inclusive consideration of this dual-responsive population could contribute to a more comprehensive interpretation.

      We also attempted k-means clustering with additional dimensions (e.g., temporal domains as shown in Fig. 3I, J), but no additional clusters were evident. We note that the lack of other clusters does not exclude the possibility of their existence, which may only become apparent with a substantial increase in the number of samples. In the current report, we present the clusters that were the easiest/simplest for us to identify.

      Additionally, we have revised our manuscript to reflect that many axons respond to both reward and aversive stimuli, and that aversive-preferring axons do not exclusively respond to the aversive stimulus.

      (5) The contrast in initialization to novel cues between aversive and appetitive axons mirrors findings in other areas, such as the tail-of-striatum (TS) and ventral striatum (VS) projecting dopamine neurons (Menegas et al., 2017, not 2018). You might consider citing this very relevant study and discussing potential collateral projections between mPFC and TS or VS.

      Thank you for pointing this out. We have now included Menegas et al., 2017, and also discuss the possibility of collaterals to these areas. In addition, we also referred to Azcorra et al., 2023 - this was published after our initial submission.

      (6) The use of correlation values (here >0.65) to group ROIs into axons is common but should be justified based on axon density in the FOV and imaging quality. It's important to present the distribution of correlation values and demonstrate the consistency of results with varying cut-off values. Also, provide insights into the reliability of aversive/appetitive classifications for individual ROIs with high correlations. Importantly, if you do the statistical testing and aversive/appetitive classifications for individual ROIs with above-threshold high correlation (to be grouped into the same axon), do they always fall into the same category? How many false positives/false negatives are observed?


      "Our results remained similar for different correlation threshold values (Line 556)" (data not shown) is obsolete.

      We have conducted additional analysis using correlation values 0.5 and 0.3 that resulted in a smaller number of axon terminals. In essence, the relationship between reward responses and aversive responses remained very similar to Fig. 2J, K.

      Author response image 1.
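
      How a correlation cut-off changes the number of putative axons can be sketched as follows. This is an illustrative assumption about the grouping step (ROIs linked whenever their pairwise Pearson correlation exceeds the threshold, then merged transitively); the actual grouping procedure, the traces, and the names `pearson` and `group_rois` are not taken from the manuscript.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


def group_rois(traces, threshold):
    """Merge ROIs whose correlation exceeds `threshold` (union-find).

    Returns one group id per ROI; ROIs sharing an id are treated as one axon.
    """
    parent = list(range(len(traces)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(traces)):
        for j in range(i + 1, len(traces)):
            if pearson(traces[i], traces[j]) > threshold:
                parent[find(i)] = find(j)
    return [find(i) for i in range(len(traces))]


# Hypothetical fluorescence traces for four ROIs, NOT real data.
traces = [
    [0, 1, 0, 0, 2, 0, 0, 1],  # ROI 0
    [0, 2, 0, 0, 4, 0, 0, 2],  # ROI 1: scaled copy of ROI 0
    [1, 1, 0, 0, 2, 0, 1, 0],  # ROI 2: moderately correlated with ROI 0
    [0, 0, 2, 1, 0, 1, 0, 0],  # ROI 3: anticorrelated with ROI 0
]

n_axons = {thr: len(set(group_rois(traces, thr))) for thr in (0.65, 0.5, 0.3)}
print(n_axons)
```

      Lowering the threshold can only merge more ROIs, so the number of groups stays the same or shrinks, which matches the direction of the change reported above for thresholds of 0.5 and 0.3.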

      Reviewer #2 (Public Review):

      Summary:

      This study aims to address existing differences in the literature regarding the extent of reward versus aversive dopamine signaling in the prefrontal cortex. To do so, the authors chose to present mice with both a reward and an aversive stimulus during different trials each day. The authors used high spatial resolution two-photon calcium imaging of individual dopaminergic axons in the medial PFC to characterize the response of these axons to determine the selectivity of responses in unique axons. They also paired the reward (water) and an aversive stimulus (tail shock) with auditory tones and recorded across 12 days of associative learning.

      The authors find that some axons respond to both reward and aversive unconditioned stimuli, but overall, there is a strong preference to respond to aversive stimuli, consistent with expectations from prior studies that used other recording methods. The authors find that both of their two auditory stimuli initially drive responses in axons, but that with training, axons develop more selective responses for the shock-associated tone, indicating that associative learning led to changes in these axons' responses. Finally, the authors use anticipatory behaviors during the conditioned stimuli and facial expressions to determine stimulus discrimination and relate dopamine axon signals to this behavioral evidence of discrimination. This study takes advantage of cutting-edge imaging approaches to resolve the extent to which dopamine axons in PFC respond to appetitive or aversive stimuli. They conclude that there is a strong bias to respond to the aversive tail shock in most axons and a weaker, sparser representation of water reward.

      Strengths:

      The strength of this study is the imaging approach that allows for investigation of the heterogeneity of response across individual dopamine axons, unlike other common approaches such as fiber photometry which provide a measure of the average population activity. The use of appetitive and aversive stimuli to probe responses across individual axons is another strength.

      We thank the reviewer for this positive assessment.

      Weaknesses:

      A weakness of this study is the design of the associative conditioning paradigm. The use of only a single reward and single aversive stimulus makes it difficult to know whether these results are specific to the valence of the stimuli versus the specific identity of the stimuli. Further, the reward presentations are more numerous than the aversive trials making it unclear how much novelty and habituation account for results. Moreover, the training seems somewhat limited by the low number of trials and did not result in strong associative conditioning. The lack of omission responses reported may reflect weak associative conditioning. Finally, the study provides a small advance in our understanding of dopamine signaling in the PFC and lacks evidence for if and what might be the consequence of these axonal responses on PFC dopamine concentrations and PFC neuron activity.

      We thank the reviewer for the suggestions.

      We agree that interpreting the response change during classical conditioning is not straightforward. Although the reward and aversive stimuli we employed are commonly used in the field, future studies with more sophisticated paradigms will be necessary to address whether dopamine axons encode the valence of the stimuli, the specific identity of the stimuli, or novelty and habituation. In our current manuscript, we refrain from concluding that distinct groups of neurons encode different valences. In fact, many axons respond to both stimuli, at different ratios. We have removed descriptions that may suggest exclusive coding of reward or aversive processing. Additionally, we have extensively discussed possible interpretations.

      In terms of the strength of the conditioning association, behavioral results indicated that the learning plateaued – anticipatory behaviors did not increase during the last two phases when the conditioned span was divided into six phases (Figure 3–figure supplement 1).

      Our goal in the current manuscript is to provide new insight into the functional diversity of dopamine axons in the mPFC. Investigating the impact of dopamine axons on local dopamine concentration and neural activity in the mPFC is important but falls beyond the scope of our current study. In particular, given the functional diversity of dopamine axons, interpreting bulk optogenetic or chemogenetic axonal manipulation experiments would not be straightforward. As suggested, measuring the dopamine concentration through two-photon imaging of dopamine sensors and monitoring the activity of dopamine recipient neurons (e.g., D1R- or D2R-expressing neurons) is a promising approach that we plan to undertake in the near future.

      Reviewer #3 (Public Review):

      Summary:

      The authors image dopamine axons in medial prefrontal cortex (mPFC) using microprism-mediated two-photon calcium imaging. They image these axons as mice learn that two auditory cues predict two distinct outcomes, tailshock or water delivery. They find that some axons show a preference for encoding of the shock and some show a preference for encoding of water. The authors report a greater number of dopamine axons in mPFC that respond to shock. Across time, the shock-preferring axons begin to respond preferentially to the cue predicting shock, while there is a less pronounced increase in the water-responsive axons that acquire a response to the water-predictive cue (these axons also increase non-significantly to the shock-predictive cue). These data lead the authors to argue that dopamine axons in mPFC preferentially encode aversive stimuli.

      Strengths:

      The experiments are beautifully executed and the authors have mastered an impressively complex technique. Specifically, they are able to image and track individual dopamine axons in mPFC across days of learning. This technique is used the way it should be: the authors isolate distinct dopamine axons in mPFC and characterize their encoding preferences and how these evolve across learning of cue-shock and cue-water contingencies. Thus, these experiments are revealing novel information about how aversive and rewarding stimuli are encoded at the level of individual axons, in a way that has not been done before. This is timely and important.

      We thank the reviewer for this positive assessment.

      Weaknesses:

      The overarching conclusion of the paper is that dopamine axons preferentially encode aversive stimuli. This is prevalent in the title, abstract, and throughout the manuscript. This is fundamentally confounded. As the authors point out themselves, the axonal response to stimuli is sensitive to outcome magnitude (Supp Fig 3). That is, if you increase the magnitude of water or shock that is delivered, you increase the change in fluorescence that is seen in the axons. Unsurprisingly, the change in fluorescence that is seen to shock is considerably higher than water reward.

      We agree that the interpretation of our results is not straightforward. Our current manuscript now focuses on our strength, which is reporting the functional diversity of dopamine axons. Therefore, we avoid using the word ‘encode’ when describing the response.

      We believe that our results could reconcile the apparent discrepancy as to why some previous studies reported only aversive responses while others reported reward responses. In particular, if the reward volume were very small, the reward response could go undetected.

      Further, when the mice are first given unexpected water delivery and have not yet experienced the aversive stimuli, over 40% of the axons respond [yet just a few lines below the authors write: "Previous studies have demonstrated that the overall dopamine release at the mPFC or the summed activity of mPFC dopamine axons exhibits a strong response to aversive stimuli (e.g., tail shock), but little to rewards", which seems inconsistent with their own data].

      We always recorded the reward and aversive response together, which might have confused the reviewer. Therefore, there is no inconsistency in our data. We have clarified our methods and reasoning accordingly.

      Given these aspects of the data, it could be the case that the dopamine axons in mPFC encode different types of information and delegate preferential processing to the most salient outcome across time.

      This is certainly an exciting interpretation, so we have included it in our discussion. Meanwhile, ‘the most salient outcome’ alone cannot fully capture the diverse response patterns of the dopaminergic axons, particularly reward-preferring axons. We discuss our findings in more detail in the revised manuscript.

      The use of two similar-sounding tones (9 kHz and 12 kHz) for the reward- and aversive-predicting cues is likely to enhance this, as it requires a fine-grained distinction between the two cues in order to learn effectively. There is considerable literature on mPFC function across species that would support such a view. Specifically, theories of mPFC function (in particular prelimbic cortex, which is where the axon images are mostly taken) generally center around resolution of conflict in what to respond to, learn about, and attend to. That is, mPFC is important for devoting the most resources (learning, behavior) to the most relevant outcomes in the environment. These data, then, provide a mechanism for this to occur in mPFC. That is, dopamine axons signal to the mPFC the most salient aspects of the environment, which should be preferentially learned about and responded towards. This is also consistent with the absence of a negative prediction error during omission: the dopamine axons show increases in responses during receipt of unexpected outcomes, but do not encode negative errors. This supports a role for this projection in helping to allocate resources to the most salient outcomes and their predictors, and not learning per se. Below are just a few references from the rich literature on mPFC function (some consider rodent mPFC analogous to DLPFC, some mPFC), which advocate for a role for this region in allocating attention and cognitive resources to the most relevant stimuli, and do not indicate preferential processing of aversive stimuli.

      Distinguishing between 9 kHz and 12 kHz sound tones may not be that difficult, considering anticipatory licking and running are differentially manifested. In addition, previous studies have shown that mice can distinguish between two sound tones when they are separated by 7% (de Hoz and Nelken 2014). Nonetheless, we agree with the attractive interpretation that “the mPFC devotes the most resources (learning, behavior) to the most relevant outcomes in the environment” and that dopamine is a mechanism for this. Therefore, we discuss this interpretation in the revised text.

      References:

      (1) Miller, E. K., & Cohen, J. D. (2001). An integrative theory of prefrontal cortex function. Annual review of neuroscience, 24(1), 167-202.

      (2) Bissonette, G. B., Powell, E. M., & Roesch, M. R. (2013). Neural structures underlying set-shifting: roles of medial prefrontal cortex and anterior cingulate cortex. Behavioural brain research, 250, 91-101.

      (3) Desimone, R., & Duncan, J. (1995). Neural mechanisms of selective visual attention. Annual review of neuroscience, 18(1), 193-222.

      (4) Sharpe, M. J., Stalnaker, T., Schuck, N. W., Killcross, S., Schoenbaum, G., & Niv, Y. (2019). An integrated model of action selection: distinct modes of cortical control of striatal decision making. Annual review of psychology, 70, 53-76.

      (5) Ridderinkhof, K. R., Ullsperger, M., Crone, E. A., & Nieuwenhuis, S. (2004). The role of the medial frontal cortex in cognitive control. science, 306(5695), 443-447.

      (6) Nee, D. E., Kastner, S., & Brown, J. W. (2011). Functional heterogeneity of conflict, error, task-switching, and unexpectedness effects within medial prefrontal cortex. Neuroimage, 54(1), 528-540.

      (7) Isoda, M., & Hikosaka, O. (2007). Switching from automatic to controlled action by monkey medial frontal cortex. Nature neuroscience, 10(2), 240-248.

      Reviewer #1 (Recommendations For The Authors):

      Specific Suggestions and Questions on the Methods Section:

      In general, the methods part is not well documented and sometimes confusing. Thus, as it stands, it hinders reproducible research. Specific suggestions/questions are listed in the following section.

      (1) Broussard et al. 2018 introduced axon-GCaMP6 instead of axon-jGCaMP8m. The authors should provide details about the source of this material. If it was custom-made, a description of the subcloning process would be appreciated. Additionally, consider depositing sequence information or preferably the plasmid itself. Furthermore, the introduction of the jGCaMP8 series by Zhang, Rozsa, et al. 2023 should be acknowledged and referenced in your manuscript.

      We thank the reviewer for pointing this out. We have now included details on how we prepared the axon-jGCaMP8m, which was based on plasmids available at Addgene. Additionally, we have deposited our construct to Addgene ( https://www.addgene.org/216533/ ). We have also cited Janelia’s report on jGCaMP8, Zhang et al.

      (2) The authors should elaborate on the approach taken for experimental synchronization. Specifically, how was the alignment achieved between 2-photon imaging, treadmill recordings, aversive/appetitive stimuli, and videography? It would be important to document the details of the software and hardware components employed for generating TTLs that trigger the pump, stimulator, cameras, etc.

      We have now included a more detailed explanation of the timing control. We use a custom-made MATLAB program that sends TTL square waves and analogue waves via a single National Instruments board (USB-6229) to control two-photon image acquisition, behavior camera image acquisition, water syringe movement, current flow from a stimulator, and sound presentation. We also continuously recorded at 30 kHz, via a separate National Instruments board (PCIe-6363), the frame timing of two-photon imaging, the frame timing of the behavior camera, copies of the command waves (sent to the syringe pump, the stimulator, and the speaker), and signals from the treadmill corresponding to running speed.
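
      Offline alignment from such a continuous recording can be sketched as below. This is a minimal illustration, not the authors' MATLAB code: only the 30 kHz sampling rate comes from the description above, while the 5 V logic level, the 2.5 V threshold, the pulse positions, and the function names are hypothetical.

```python
from bisect import bisect_right


def rising_edge_times(signal, sample_rate_hz=30_000, threshold=2.5):
    """Return the times (s) at which `signal` crosses `threshold` upwards."""
    return [
        i / sample_rate_hz
        for i in range(1, len(signal))
        if signal[i - 1] < threshold <= signal[i]
    ]


def frame_index_at(event_time, frame_times):
    """Index of the last frame that started at or before `event_time`.

    Returns -1 if the event precedes the first frame.
    """
    return bisect_right(frame_times, event_time) - 1


# Synthetic recording (hypothetical values): 4000 samples at 30 kHz.
n_samples = 4000
frame_ttl = [0.0] * n_samples
for start in (1000, 2000, 3000):          # three imaging-frame pulses
    for i in range(start, start + 100):
        frame_ttl[i] = 5.0

stim_cmd = [0.0] * n_samples
for i in range(2500, 2600):               # one stimulus command pulse
    stim_cmd[i] = 5.0

frame_times = rising_edge_times(frame_ttl)
stim_times = rising_edge_times(stim_cmd)
stim_frame = frame_index_at(stim_times[0], frame_times)
print(frame_times, stim_times, stim_frame)
```

      Because every signal is sampled on the same clock, each event can be expressed in imaging-frame coordinates without relying on the nominal frame rate of any individual device.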

      (3) The information regarding the cameras utilized in the study presents some confusion. In one instance, you mention, "To monitor licking behavior, the face of each mouse was filmed with a camera at 60 Hz (CM3-U3-13Y3M-CS, FLIR)" (Line 488). However, there's also a reference to filming facial expressions using an infrared web camera (Line 613). Could you clarify whether the FLIR camera (which is an industrial CMOS not a webcam) is referred to as a webcam? Alternatively, if it's a different camera being discussed, please provide product details, including pixel numbers and frame rate for clarity.

      We thank the reviewer for pointing this out. This was a mistake on our end. The camera used in the current project was a CM3-U3-13Y3M-CS, not a web camera. We have now corrected this.

      (4) Please provide more information about the methodology employed for lick detection. Specifically, did the authors solely rely on videography for this purpose? If so, why was an electrical (or capacitive) detector not used? It would provide greater accuracy in detecting licking.

      Lick detection was performed offline based on videography, using DeepLabCut. As licking occurs at a frequency of ~6.5 Hz (Xu, …, O’Connor Nature Neurosci, 2022), the movement can be detected at a frame rate of 60 Hz. Initially, we used both a lick sensor and videography. However, we favored videography because it could potentially provide non-binary information.
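
      The sampling argument can be checked with a short calculation. The two rates are those quoted above; the variable names are illustrative only.

```python
frame_rate_hz = 60.0  # camera frame rate stated in the response
lick_rate_hz = 6.5    # approximate lick frequency cited in the response

# Highest frequency that can be resolved without aliasing at this frame rate.
nyquist_hz = frame_rate_hz / 2

# Number of video frames available per lick cycle.
frames_per_lick_cycle = frame_rate_hz / lick_rate_hz

print(nyquist_hz, round(frames_per_lick_cycle, 1))
```

      The lick frequency is well below the Nyquist limit, and each lick cycle spans roughly nine frames, which leaves several samples for both the protrusion and the retraction of the tongue.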

      Other Minor Points:

      (5) Ensure consistency in the citation format; both Vander Weele et al. 2018 and Weele et al. 2019, share the same first author.

      Thank you for pointing this out. Endnote processes the first author’s name differently depending on the journal. We fixed the error manually. The first paper (2018) is an original research paper, and the second one (2019) is a review about how dopamine modulates aversive processing in the mPFC. We cited the second one in three instances where we mentioned review papers.

      (6) The distinction between "dashed vs dotted lines" in Figure 3K and 3M appears to be very confusing. Please consider providing a clearer visualization/labeling to mitigate this confusion.

      We have now changed the line styles.

      (7) Additionally plotting mean polar angles of aversive/appetitive axons as vectors in the Cartesian scatter plots (2J, 3I,J) would make interpretation easier.

      We have now made this change to Figures 2, 3, 4.

      (8) Data and codes should be shared in a public database. This is important for reproducible research and we believe that "available from the corresponding author upon reasonable request" is outdated language.

      We have uploaded the data to GitHub, https://github.com/pharmedku/2024-elife-da-axon.

      Reviewer #2 (Recommendations For The Authors):

      (1) Authors don't show which mouse each axon's data come from, making it hard to know if differences arise from inter-mouse differences vs differences in axons. The best way to address this point is to show similar plots as Figure 2J & K but broken down by mouse to show whether each mouse had evidence of these two clusters.

      We have now made this change to Figure 2-figure supplement 3.

      (2) Line 166: Should this sentence point to panels 2F, G, H rather than 2I which doesn't show a shock response?

      We thank the reviewer for pointing this out. We have fixed the incorrect labels.

      (3) Line 195: The population-level bias to aversive stimuli was shown previously using photometry, so it is not justified to say "for the first time" regarding this statement.

      We have adjusted this sentence so that the claim of "for the first time" is not associated with the population-level bias.

      (4) The paper lacks a discussion of the potential role that novelty plays in the amplitude of the responses given that tail shocks occur less often than rewards. Is the amplitude of the first reward of the day larger than subsequent rewards? Would tail shock responses decay if they occurred in sequential trials?

      Following the reviewer's suggestion, we conducted a comparison of individual axonal responses to both conditioned and unconditioned stimuli across the first trial and subsequent trials. Our findings reveal a notable trend: aversive-preferring axons exhibited attenuation in response to CSreward, yet enhancement in response to CSaversive. Conversely, the response of these axons to USreward was attenuated, with no significant change observed for USaversive. In contrast, reward-preferring axons displayed an invariable activity pattern from the initial trial, highlighting the functional diversity present within dopamine axons. This analysis has been integrated into Figure 3-figure supplement 4 and is elaborated upon in the Discussion section.

      (5) Fix typo in Figure 1 - supplement 1. Shift

      We have now corrected this. Thank you.

      (6) The methods section needs information about trial numbers. Please indicate how many trials were presented to each mouse per day.

      We have now added the information about trial numbers to the Methods section.

      Reviewer #3 (Recommendations For The Authors):

      In line with the public review, my recommendation is for the authors to remain as objective about their data as possible. There are many points in the manuscript where the authors seem to directly contradict their own data. For example, they first detail that dopamine axons respond to unexpected water rewards. Indeed, they find that there are 40% of dopamine axons that respond in this way. Then, a few paragraphs later they state: "Previous studies have demonstrated that the overall dopamine release at the mPFC or the summed activity of mPFC dopamine axons exhibits a strong response to aversive stimuli (e.g., tail shock), but little to rewards". As detailed above, I do not think these data support an idea that dopamine axons in mPFC preferentially encode aversive outcomes. If the authors wanted to examine a role for mPFC in preferential encoding of aversive stimuli, you would first have to equate the outcomes by magnitude and then compare how the axons acquire preferences across time. Alternatively, a prediction of a more general process that I detail above would predict that you could give mice two rewards that differ in magnitude (e.g., lots of food vs. small water) and you would see the same results that the authors have seen here (i.e., a preference for the food, which is the larger and more salient outcome). Without other tests of how dopamine axons in mPFC respond to situations like this, I don't think any conclusion around mPFC in favoring aversive stimuli can be made.

      As suggested, we have made the current manuscript as objective as possible, removing interpretive statements about what dopamine axons encode and emphasizing their functional diversity. In particular, we have removed the word ‘encode’ when describing the response of dopamine axons.

      Although it may have appeared unclear, there was no contradiction within our data regarding the response to reward and aversive stimuli. We have now improved the readability of the Results and Methods sections. Concerning the interpretation of what exactly the mPFC dopamine axons encode, we have rewritten the discussion to be as objective about our data as possible, as suggested. We also have edited our title and abstract accordingly. Meanwhile, we wish to emphasize that our reward and aversive stimuli are standard paradigms commonly used in the field. We believe, and all the reviewers agreed, that reporting the diversity of dopamine axonal responses with a novel imaging design constitutes new insight for the neuroscience community. Therefore, we have decided to leave the introduction of new behavioral tasks for future studies and instead expanded our discussion.

      As mentioned, I think the experiments are executed really well and the technological aspects of the authors' methods are impressive. However, there are also some aspects of the data presentation that would be improved. Some of the graphs took a considerable amount of effort to unpack. For example, Figure 4 is hard going. Is there a way to better illustrate the main points that this figure wants to convey? Some of this might be helped by a more complete description in the figure captions about what the data are showing. It would also be great to see how the response of dopamine axons changes across trial within a session to the shock and water-predictive cues. Supp Figure 1 should be in the main text with standard error and analyses across time. Clarifying these aspects of the data would make the paper more relevant and accessible to the field.

      We thank the reviewer for pointing out that the legend of Figure 4 was incomplete. We have fixed it, along with improving the presentation of the figure. We have also prepared a new figure (Figure 3–figure supplement 4) to compare CSaversive and CSreward signals for the first and remaining trials within daily sessions, revealing further functional diversity in dopamine axons. We have decided to keep Figure 1–figure supplement 2 as a figure supplement with an additional analysis, as another reviewer pointed out that the design is not completely new. Furthermore, as eLife readers can easily access figure supplements, we believe it is appropriate to keep it there.

      Minor points:

      (1) What is the control period for the omission test? Was omission conducted for the shock?

      The control period for reward omission is a 2-second period just before the CS onset. We did not include shock omission, because a sufficient number of trials (> 6 trials) for the rare omission condition could not be achieved within a single day.

      (2) The authors should mention how similar the tones were that predicted water and shock.

      According to de Hoz and Nelken (2014), a frequency difference of 4–7% is enough for mice to discriminate between tones. In addition, anticipatory licking and running confirmed that the mice could discriminate between the frequencies. We have now included this information in the Discussion.
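
      The margin between the two tones and the cited discrimination limit can be made explicit with a short calculation. The tone frequencies and the 7% figure are taken from the text above; treating the difference relative to the lower tone is an assumption made here for illustration.

```python
from math import log2

f_low_khz = 9.0    # one conditioned-stimulus tone
f_high_khz = 12.0  # the other conditioned-stimulus tone
discrimination_limit = 0.07  # upper end of the 4-7% range cited above

# Frequency difference relative to the lower tone.
relative_difference = (f_high_khz - f_low_khz) / f_low_khz

# The same separation expressed in octaves.
separation_octaves = log2(f_high_khz / f_low_khz)

print(round(relative_difference, 3), round(separation_octaves, 3))
print(relative_difference > discrimination_limit)
```

      Under this definition the tones differ by about one third of the lower frequency (roughly 0.4 octave), several times the cited limit.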

      (3) I realize the viral approach used in the current studies may not allow for an idea of where in VTA dopamine neurons are that project to mPFC- is there data in the literature that speak to this? Particularly important as we now know that there is considerable heterogeneity in dopamine neuronal responses, which is often captured by differences in medial/lateral position within VTA.

      Some studies have suggested that mesocortical dopamine neurons are located in the medial posterior VTA (e.g., Lammel et al., 2008). However, in mouse anterograde tracing, it is not possible to spatially confine the injection of conventional viruses/tracers. We now refer to Lammel et al., 2008 in the Introduction.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      HP1 plays a pivotal role in orchestrating chromatin packaging through the creation of biomolecular condensates. The existence of distinct homologs offers an intriguing avenue for investigating the interplay between genetic sequence and condensate formation. In this study, the authors conducted extensive coarse-grained simulations to delve into the phase separation behavior of HP1 paralogs. Additionally, the researchers delved into the captivating possibility of various HP1 paralogs co-localizing within assemblies composed of multiple components. Importantly, the study also delved into the critical role of DNA in finely tuning this complex process.

      Strengths:

      I applaud the authors for their methodical approach in conducting simulations aimed at dissecting the contributions of hinges, CTE, NTE, and folded regions. The comprehensive insights unveiled in Figure 3 compellingly substantiate the significance of these protein components in facilitating the process of phase separation.

      This systematic exploration has yielded several innovative revelations. Notably, the authors uncovered a nuanced interplay between the folded and disordered domains. Although disordered regions have traditionally been linked to driving phase separation through their capacity for forming multivalent interactions, the authors have demonstrated that the contribution of the CD cannot be overlooked, as it significantly impacts the saturation concentration.

      The outcomes of this study serve to elucidate the intricate mechanisms and regulatory aspects governing HP1 LLPS.

      Weaknesses:

      The authors do not provide an assessment of the quantitative precision of their model. To illustrate, HP1a is anticipated to undergo phase separation primarily under low salt concentrations. Does the model effectively capture this sensitivity to salt conditions? Regrettably, the specific salt conditions employed in the simulations are not explicitly stated. While I anticipate that numerous findings in the manuscript remain valid, it could be beneficial to acknowledge potential limitations tied to the simulations. For instance, might the absence of quantitative precision impact certain predictions, such as the CD's influence on phase separation?

      We thank the reviewer for their kind feedback and for highlighting the essential mechanistic insights obtained from our study. We have addressed the concerns raised by the reviewer below, and the specific amendments made in the manuscript are also delineated.

      We appreciate the reviewer's comment on our model. Our coarse-grained (CG) physics-based model integrates electrostatic and short-range interactions, parametrized based on the Urry hydrophobicity scale. This approach effectively bridges the timescale gap between simulation and experiment, offering a transferable framework to compute protein phase diagrams in temperature-concentration space that can be compared to experimental phase behavior (1). Additionally, the vdW contact probability per residue correlation between AA and CG simulations (Fig. S1 f-h) underscores our model’s capability to uncover the mechanistic insights into the phase separation of HP1 paralogs. Despite its simplicity and widespread adoption for studying sequence-dependent phase separation in biomolecular condensates, we recognize that our CG model does not yet fully replicate experimental observations or the nuanced effects of local secondary structures on phase-separation propensities. We are actively refining our methods and exploring new strategies to enhance the accuracy and efficiency of CG models for the study of biological phase separation.

      In assessing the influence of salt on the LLPS of HP1α, we note that Wang et al. (2) demonstrated that HP1α can undergo LLPS at a low salt concentration (50 mM KCl). Furthermore, Wohl et al. (3) showed that the CG HPS (Kapcha-Rossky) model can capture the salt-dependent LLPS behavior through the electrostatic screening in HP1a, a Drosophila homolog of human HP1α. In our CG model, the salt concentration is captured by the Debye-Hückel term with tunable screening lengths, which allows for the simulation of salt-dependent effects in the low-salt regime. We have added Figure S5 to illustrate the influence of salt on the LLPS propensity of HP1α. In the low-salt regime (50 mM), the Csat of HP1α was reduced by twofold compared to that at 100 mM. Increasing the salt concentration to 150 mM raised the Csat and started destabilizing the condensate. In the high-salt regime (200-500 mM), HP1α did not undergo phase separation, consistent with the experimental observations (2, 4–6).

      Author response image 1.

      Salt-dependent effects on the LLPS of HP1α homodimer. (a, b) Density profiles and snapshots of HP1α homodimer simulation with the box dimensions of 170x170x1190 Å³ at differing salt concentrations, 50, 100, 150, 200, 250, and 500 mM, respectively. The simulations were conducted at 320 K using the HPS-Urry model.

      However, the primary objectives of our study are to elucidate the molecular interactions and to delineate the domain contributions that dictate the distinct phase-separation behaviors of the HP1 paralogs. To this end, we standardized our simulation conditions to a physiological salt concentration of 100 mM for all paralog constructs, facilitating a direct comparison and enabling physiologically relevant predictions, including those for the CD domain. We have added the salt concentration used in the CG simulations in the Materials and Methods section, relevant figure captions, and the following sentence in the third paragraph of the Discussions section to improve clarity.

      “…Our CG simulations corroborate these experimental observations, indicating that a low salt concentration (50 mM) promotes the LLPS of HP1α. Raising the salt concentration weakens the electrostatic interactions and increases the Csat, eventually precluding HP1α’s phase separation at high salt regimes (200-500 mM) (Fig. S5).”

      Reviewer #2 (Public Review):

      In this paper, Phan et al. investigate the properties of human HP1 paralogs, their interactions and abilities to undergo liquid-liquid phase separation. For this, they use a coarse-grained computational approach (validated with additional all-atom simulations) which allows them to explore complex mixtures. Matching (wet-lab) experimental results, HP1 beta (HP1b) exhibits different properties from HP1 alpha and gamma (HP1a,g), in that it does not phase separate. Using domain switch experiments, the authors determine that the more negatively charged hinge in HP1b, compared to HP1a and HP1g, is mainly responsible for this effect. Exploring heterotypic complexes, mixtures between HP1 subtypes and DNA, the authors further show that HP1a can serve as a scaffold for HP1b to enter into condensed phases and that DNA can further stabilize phase separated compartments. Most interestingly, they show that a multicomponent mixture containing DNA, HP1a and HP1b generates spatial separation between the HP1 paralogs: due to increased negative charge of DNA within the condensates, HP1b is pushed out and accumulates at the phase boundary. This represents an example of how complex assemblies could form in the cell.

      Overall, this is purely computational work, which however builds on extensive experimental results (including from the authors). The methods showcase how coarse-grained models can be employed to generate and test hypotheses how proteins can condense. Applied to HP1 proteins, the results from this tour-de-force study are consistent and convincing, within the experimental constraints. Moreover, they generate further models to test experimentally, in particular in light of multicomponent mixtures.

      There are, of course, some limitations to these models.

      First, the CG models employed probably will not be able to pick up more complex structure-driven interactions (i.e. specific binding of a peptide in a protein cleft, including defined H-bonds, or induced structural elements). Some of those interactions (i.e. beyond charge-charge or hydrophobics) may also play a role in HP1, and might be ignored here. There is also the question of specificity, i.e. how can diverse phases coexist in cells, when the only parameters are charge and hydrophobicity? Does the arrangement of charges in the NTD, hinges and CTDs matter or are only the average properties important?

      We thank the reviewer for the thoughtful comments. We also appreciate the opportunity to incorporate the feedback on the reviewer’s concerns below.

      We agree that the interaction picture becomes more sophisticated, and many interaction modes may be involved in the phase coexistence in the cell environment. However, due to system sizes and required sampling, studying LLPS at an atomistic resolution remains challenging with the current state-of-the-art computer hardware. Our approach employs the CG model to reduce the computational cost but still capture the predominant interactions at the residue level. We have added the plots (Fig. S1 f-h) to show the correlation of the vdW contact probability per residue for each paralog between AA and CG simulation. The Pearson correlation coefficient is approximately 0.86, suggesting a strong positive linear correlation in the contact propensity between AA and CG simulations.

      Author response image 2.

      Our sequence analysis reveals a high fraction of charged residues in HP1 paralogs, with Arg, Lys, Glu, and Asp constituting 39-45% of the total amino acid count in the sequence. This property may explain why the electrostatic interactions are predominantly involved in the phase-separation behaviors of HP1 paralogs. Our findings on electrostatically driven phase separation and co-localization of HP1 paralogs are consistent with experimental observations by Larson et al. and Keenen et al. (5, 6). Significantly, we observe that the charge patterning in the disordered regions (NTE, hinge, and CTE) plays a critical role in the LLPS of HP1 paralogs, as articulated in the second paragraph of the Discussions section. Modifying this charge patterning, such as by phosphorylating serine residues in HP1α, excising the HP1α CTE, or substituting four acidic residues with basic ones in the HP1β hinge, can profoundly augment the LLPS of these proteins (4, 5, 7). Our in silico molecular details, complemented by in vitro observations, lay a solid foundation for future experiments. These future investigations may delve deeper into the specificity of interactions and the role of structural elements in modulating HP1 phase separation.
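
      To make the distinction between average charge content and charge patterning concrete, the following minimal sketch computes the fraction of charged residues (Arg, Lys, Glu, Asp) and a sliding-window net charge. The two ten-residue sequences are hypothetical toy inputs, not HP1 sequences, and histidine is treated as neutral for simplicity; both choices are assumptions of this example.

```python
# Formal charge per residue; histidine is treated as neutral (an assumption).
RESIDUE_CHARGE = {"R": 1, "K": 1, "D": -1, "E": -1}


def charged_fraction(sequence):
    """Fraction of residues that are Arg, Lys, Asp, or Glu."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    return sum(res in RESIDUE_CHARGE for res in seq) / len(seq)


def net_charge(sequence):
    """Sum of formal charges over the whole sequence."""
    return sum(RESIDUE_CHARGE.get(res, 0) for res in sequence.upper())


def windowed_net_charge(sequence, window=5):
    """Net charge in each sliding window, a simple view of charge patterning."""
    charges = [RESIDUE_CHARGE.get(res, 0) for res in sequence.upper()]
    if window < 1 or window > len(charges):
        raise ValueError("window must be between 1 and the sequence length")
    return [sum(charges[i:i + window]) for i in range(len(charges) - window + 1)]


if __name__ == "__main__":
    # Hypothetical toy sequences with identical composition, different patterning
    blocky = "KKKKKEEEEE"
    mixed = "KEKEKEKEKE"
    for name, seq in (("blocky", blocky), ("mixed", mixed)):
        print(name, charged_fraction(seq), net_charge(seq),
              windowed_net_charge(seq))
```

      Both toy sequences have the same charged fraction and zero net charge, yet their windowed profiles differ sharply, which is the sense in which patterning, and not only the average, can matter.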

      Second, the authors fix CSD-CSD dimers, whereas these interactions are expected to be quite dynamic. In the particular example of HP1 proteins, having dimerization equilibria may change the behavior of complex mixtures significantly, e.g. in view of the proposed accumulation of HP1b at a phase boundary. This point would warrant more discussion in the paper. Moreover, the biological plausibility of such a behavior would be interesting. Is there any experimental data supporting such assemblies?

      We appreciate the reviewer's insightful comment regarding the dynamic nature of CSD-CSD interactions in HP1 proteins. Our assumption of fixing CSD-CSD dimers is grounded on reported dissociation constant (Kd) values for HP1α and HP1β, which are within the nanomolar range, indicative of strong dimerization affinity (4, 8). While the precise Kd values for HP1γ are not available, a study has demonstrated that HP1γ dimerization is crucial for its interaction with chromatin, suggesting a similar strong dimerization tendency as its paralogs (9, 10). Furthermore, evidence from the literature underscores the dimeric functionality of HP1 paralogs facilitated by their ChromoShadow Domains (CSD), which are instrumental in forming stable genomic domains and engaging in crucial interactions within chromatin architecture (5, 6, 11).
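
      The link between a nanomolar Kd and the fixed-dimer approximation can be checked with a simple monomer-dimer equilibrium. The sketch below is a textbook calculation under stated assumptions, not part of the study: it takes Kd = [M]²/[D] with the total protein concentration counted in monomer units, and the concentrations in the demo loop are hypothetical placeholders, since the text only states that Kd lies in the nanomolar range.

```python
import math


def dimer_fraction(total_nM, kd_nM):
    """Fraction of protein chains found in dimers at equilibrium.

    Model: 2 M <-> D with Kd = [M]^2 / [D] and total = [M] + 2 [D],
    where total is expressed in monomer units (nM).
    """
    if total_nM <= 0 or kd_nM <= 0:
        raise ValueError("concentrations must be positive")
    # Solve 2 [M]^2 / Kd + [M] - total = 0 for the free monomer concentration
    monomer = kd_nM * (math.sqrt(1.0 + 8.0 * total_nM / kd_nM) - 1.0) / 4.0
    return 1.0 - monomer / total_nM


if __name__ == "__main__":
    kd = 100.0  # hypothetical Kd in nM, chosen only for illustration
    for total in (100.0, 1_000.0, 10_000.0, 100_000.0):
        print(f"total = {total:9.0f} nM -> dimer fraction = "
              f"{dimer_fraction(total, kd):.3f}")
```

      With these placeholder numbers, the dimer fraction approaches one once the total concentration exceeds Kd by a couple of orders of magnitude, which is the regime in which treating the CSD-CSD dimer as permanently bound is a reasonable first approximation.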

      However, we acknowledge that despite the strong dimerization affinity, the CSD-CSD interactions exhibit dynamics, which may influence the behavior of complex mixtures, particularly at phase boundaries. A study by Nielsen et al. (12) shows that mammalian HP1 paralogs can interact directly with one another to form heterodimers. Moreover, the CSD-CSD interface has been shown to act as a hub for transient interactions with diverse binding partner proteins (5, 13). These experimental observations reflect the dynamic nature of CSD-CSD interactions. However, due to the computational constraints and the focus of our study, a simplified static model was employed to gain initial insights into the phase separation behaviors of HP1 paralogs. We believe that the dynamic nature of CSD-CSD interactions and its implications for phase behavior in complex mixtures form an exciting avenue for future computational and experimental studies.

      In light of the reviewer’s comment, we have expanded our discussion in the 6th paragraph of the Discussions Section:

      “... It is important to emphasize that our model is predicated on the assumption that HP1 proteins establish stable chromoshadow domain (CSD-CSD) dimers, a hypothesis supported by their Kd values being in the nanomolar range (13, 53). While this simplification serves as a useful starting point, it may not fully capture the dynamic nature of HP1 dimerization. Further computational and experimental studies are needed to understand better the behavior of the complex mixtures of HP1 paralogs, particularly at phase boundaries.”

      References: 1) R. M. Regy, J. Thompson, Y. C. Kim, J. Mittal, Improved coarse‐grained model for studying sequence dependent phase separation of disordered proteins. Protein Sci., doi: 10.1002/pro.4094 (2021).

      2) L. Wang, Y. Gao, X. Zheng, C. Liu, S. Dong, R. Li, G. Zhang, Y. Wei, H. Qu, Y. Li, C. D. Allis, G. Li, H. Li, P. Li, Histone Modifications Regulate Chromatin Compartmentalization by Contributing to a Phase Separation Mechanism. Mol. Cell 76, 646-659.e6 (2019).

      3) S. Wohl, M. Jakubowski, W. Zheng, Salt-Dependent Conformational Changes of Intrinsically Disordered Proteins. J. Phys. Chem. Lett. 12, 6684–6691 (2021).

      4) C. Her, T. M. Phan, N. Jovic, U. Kapoor, B. E. Ackermann, A. Rizuan, Y. C. Kim, J. Mittal, G. T. Debelouchina, Molecular interactions underlying the phase separation of HP1α: role of phosphorylation, ligand and nucleic acid binding. Nucleic Acids Res., gkac1194 (2022).

      5) A. G. Larson, D. Elnatan, M. M. Keenen, M. J. Trnka, J. B. Johnston, A. L. Burlingame, D. A. Agard, S. Redding, G. J. Narlikar, Liquid droplet formation by HP1α suggests a role for phase separation in heterochromatin. Nature 547, 236–240 (2017).

      6) M. M. Keenen, D. Brown, L. D. Brennan, R. Renger, H. Khoo, C. R. Carlson, B. Huang, S. W. Grill, G. J. Narlikar, S. Redding, HP1 proteins compact DNA into mechanically and positionally stable phase separated domains. eLife 10, 1–38 (2021).

      7) W. Qin, A. Stengl, E. Ugur, S. Leidescher, J. Ryan, M. C. Cardoso, H. Leonhardt, HP1β carries an acidic linker domain and requires H3K9me3 for phase separation. Nucleus 12, 44–57 (2021).

      8) S. V. Brasher, The structure of mouse HP1 suggests a unique mode of single peptide recognition by the shadow chromo domain dimer. EMBO J. 19, 1587–1597 (2000).

      9) X. Li, S. Wang, Y. Xie, H. Jiang, J. Guo, Y. Wang, Z. Peng, M. Hu, M. Wang, J. Wang, Q. Li, Y. Wang, Z. Liu, Deacetylation induced nuclear condensation of HP1γ promotes multiple myeloma drug resistance. Nat. Commun. 14, 1290 (2023).

      10) Y. Mishima, C. D. Jayasinghe, K. Lu, J. Otani, M. Shirakawa, T. Kawakami, H. Kimura, H. Hojo, P. Carlton, S. Tajima, I. Suetake, Nucleosome compaction facilitates HP1γ binding to methylated H3K9. Nucleic Acids Res. 43, 10200–10212 (2015).

      11) D. O. Trembecka-Lucas, J. W. Dobrucki, A heterochromatin protein 1 (HP1) dimer and a proliferating cell nuclear antigen (PCNA) protein interact in vivo and are parts of a multiprotein complex involved in DNA replication and DNA repair. Cell Cycle 11, 2170–2175 (2012).

      12) A. L. Nielsen, M. Oulad-Abdelghani, J. A. Ortiz, E. Remboutsika, P. Chambon, R. Losson, Heterochromatin formation in mammalian cells: Interaction between histones and HP1 Proteins. Mol. Cell 7, 729–739 (2001).

      13) A. Thiru, D. Nietlispach, H. R. Mott, M. Okuwaki, D. Lyon, P. R. Nielsen, M. Hirshberg, A. Verreault, N. V. Murzina, E. D. Laue, Structural basis of HP1/PXVXL motif peptide interactions and HP1 localisation to heterochromatin. EMBO J. 23, 489–499 (2004).

      14) P. Yu Chew, J. A. Joseph, R. Collepardo-Guevara, A. Reinhardt, Thermodynamic origins of two-component multiphase condensates of proteins. Chem. Sci. 14, 1820–1836 (2023).

      Recommendations for the authors:

      In this important work, the authors apply a residue-resolution protein coarse-grained model to investigate the differences in molecule dimensions and phase behaviour of three HP1 paralogs, HP1 paralog mixtures, and HP1/DNA mixtures. The simulations are well designed to investigate the impact of HP1 sequence on its phase behaviour. The work reveals that electrostatic interactions are a key determinant of HP1 paralog phase behaviour; hence advancing our understanding of the molecular mechanisms driving the phase separation behaviour of HP1 paralogs. Notably, the authors uncovered a nuanced interplay between the folded and disordered domains of HP1. Although disordered regions have traditionally been linked to driving phase separation through their capacity for forming multivalent interactions, the authors demonstrate that the contribution of the CD cannot be overlooked, as it significantly impacts the saturation concentration.

      Essential revisions (based on the reviewers' assessment below):

      1) The manuscript describes the results of both single-molecule simulations and direct coexistence simulations. However, it is not very easy for the reader to determine which types of simulations were performed in each section. The details of the simulation input parameters are also missing. Such details are needed throughout, i.e. to allow readers to follow the work and its implications. For instance, the specific salt conditions employed in the simulations are not explicitly stated. Since HP1 charge is presented as a key regulator for the modulation of HP1 paralogs' radii of gyration and their phase behaviour, it is crucial for the authors to explicitly describe the salt concentration used for the different simulations and highlight how the relative differences observed are expected to change as the salt concentration decreases/increases.

      We have turned the first sentences in the paragraphs into subtitles to describe the results of single homodimers in dilute phase and multi-dimers in phase coexistence simulations.

      “Sequence variation affects the conformations of HP1 paralogs in the dilute phase.”

      “Sequence variation in HP1 paralogs leads to their distinct phase separation behaviors.”

      To improve the clarity, we have also added the following sentence to Fig. 2 caption.

      “… Figs. 2a-e show the results obtained under dilute conditions, while Figs. 2f-m illustrate the conditions of phase coexistence.”

      We have specified the salt concentration used in the CG simulations in the Materials and Methods section and the relevant figure captions to improve clarity. We also addressed the reviewer’s comment on salt concentration in the public review above.

      2) Since direct coexistence simulations suffer from important finite-size effects, especially for multi-component mixtures as those investigated here, describing how many proteins/DNA copies were used per system, the size of the simulation, and which checks were done to check for finite-size effects is important. Regarding this point, estimating C_sat from Direct Coexistence simulations is extremely challenging, given the sensitivity of the dilute phase concentration to the box dimensions. Hence, it would be valuable if the authors clarify that the differences on C_sat provided represent a qualitative comparison and are sensitive to the simulation conditions. Importantly, the observation of spatial segregation of components in multi-component condensates could be an artefact of the box dimensions, relative copies of the various components, and overall system density.

      We appreciate the reviewer’s concern regarding the finite-size effects in phase coexistence simulations and potential artifacts arising from box dimensions and system composition. In response to this, we have expanded the Materials and Methods section to elaborate on the specific checks to examine the finite-size effects. The new texts and additional SI figures are shown below.

      “Previous studies have demonstrated that slab geometry can help mitigate finite-size effects and facilitate efficient sampling of the phase diagram (41). To assess the potential impact of finite-size effects with our chosen box dimensions, we conducted a test using the HP1α homodimer, which serves as a representative system given the comparable sequence lengths of HP1 paralogs and their chimeras. By reducing the system size by 30% and constructing its phase diagram, we observed that both the original system size (50 dimers) and the reduced counterpart (35 dimers) produced similar phase diagrams, with critical temperatures of 353.3 K and 352.1 K, respectively, as shown in Figs. S4a,b.

      We further evaluated the influence of the xy cross-sectional area on the measurement of Csat. With the z-direction box length fixed at 1190 Å, we varied the xy cross-sectional areas (120x120, 150x150, and 200x200 Ų) while maintaining the protein density consistent with the control case (170x170 Ų). Given that HP1 dimers are multidomain proteins, a 120x120 Ų cross-section was the minimum size feasible to prevent particle overlap in HOOMD simulations due to the constraints of the small box size. Our findings indicate that the condensates remained stable across all tested cross-sectional areas and that there were no significant differences in Csat measurements within the margin of error, as depicted in Figs. S4c,d. These results confirm that our chosen box size is sufficiently large to minimize finite-size effects, thus ensuring the robustness of our results.”
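
      Keeping the protein density constant while changing the cross-section, with the z length fixed, means the copy number scales with the xy area. The short sketch below works through that arithmetic. It assumes that the 170x170 Ų control box holds the 50 dimers mentioned for the original system; the response does not list the copy numbers used for the other cross-sections, so the printed values are illustrative estimates and the function name is hypothetical.

```python
def copies_at_constant_density(reference_copies, reference_side_A, new_side_A):
    """Copy number that preserves density when only the square xy side changes.

    With the z length fixed, the volume scales with the xy area, so the copy
    number scales with (new_side / reference_side) squared.
    """
    if reference_copies <= 0 or reference_side_A <= 0 or new_side_A <= 0:
        raise ValueError("all inputs must be positive")
    return round(reference_copies * (new_side_A / reference_side_A) ** 2)


if __name__ == "__main__":
    reference_copies = 50    # assumed copy number in the control box
    reference_side = 170.0   # xy side of the control box, in angstrom
    for side in (120.0, 150.0, 170.0, 200.0):
        n = copies_at_constant_density(reference_copies, reference_side, side)
        print(f"{side:.0f} x {side:.0f} A^2 cross-section -> about {n} dimers")

    # The separate 30% size reduction quoted in the text: 50 -> 35 dimers
    print("reduced system:", round(reference_copies * 0.7), "dimers")
```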

      Author response image 3.

      Finite-size analysis. (a) Phase diagrams for the HP1α homodimer (50 dimers) and for a system reduced in size by 30% (35 dimers), with critical temperatures of 353.3 K and 352.1 K, respectively. (b) Density profiles of HP1α and its reduced size counterpart at various temperatures. (c, d) Density profiles and snapshots of HP1α homodimer simulation with box dimensions of 170x170x1190 Å³ and for systems with z-direction length fixed at 1190 Å and varying cross-sectional areas: 120x120, 150x150, and 200x200 Å². The black dashed line shows the simulated saturation concentration of wildtype HP1α homodimer in the box dimensions of 170x170x1190 Å³. The simulations were conducted at 320 K and 100 mM salt concentration. The error bars represent the standard deviation from triplicate simulation sets.

      In response to the observed spatial segregation in our multi-component condensates, we have carefully considered finite-size effects and are confident that the segregation reflects genuine phase behavior rather than an artifact of simulation parameters. This interpretation is supported by findings from Chew et al. (14), who observed similar multilayered condensates and conducted thorough validations to verify these phases. To clarify our approach, we have included additional details in the Materials and Methods section of our manuscript.

      “... By selecting a box size that minimizes finite-size effects, we can ensure that the spatial segregation observed in our multi-component condensates reflects genuine phase behavior. This finding aligns with Chew et al. (66), who also reported well-separated multilayered condensates and conducted thorough validations to confirm these phases.”

      3) The authors should provide a clearer assessment of the quantitative precision of their model. For instance, the authors use all-atom simulations to compare with CG interaction maps. The all-atom maps are sparser due to less sampling, but the authors state that the maps are 'in good agreement'. How do the authors judge this? The issue of model validation is very important: to illustrate, HP1a is anticipated to undergo phase separation primarily under low salt concentrations. Does the model effectively capture this sensitivity to salt conditions? While numerous findings in the manuscript likely remain valid, it could be beneficial to acknowledge potential limitations tied to the simulations. For instance, might the absence of quantitative precision impact certain predictions, such as the CD's influence on phase separation?

      The CG models employed do not consider the specific binding of a peptide in a protein cleft, including defined H-bonds, or induced structural elements. Thus, the authors should discuss whether specific interactions (i.e. beyond charge-charge or hydrophobics) may also play a role in the phase behaviour of HP1, and why it makes sense to ignore them in this study. If the only important parameters are charge and hydrophobicity, how can diverse phases coexist in cells? Does the arrangement of charges in the NTD, hinges and CTDs matter or are only the average properties important?

      This is similar to the point made by Reviewer 2 in the Public Review. We have addressed these questions in the public review and incorporated new plots (Fig. S1 f-h) in the SI.

      4) The authors fix CSD-CSD dimers, whereas these interactions are expected to be quite dynamic. In the particular example of HP1 proteins, having dimerization equilibria may change the behaviour of complex mixtures significantly, e.g. in view of the proposed accumulation of HP1b at a phase boundary. This point warrants more discussion in the paper.

      We have addressed the comment in the public review and extended the discussion in the Discussion section.

      Reviewer #2 (Recommendations For The Authors):

      The authors use all-atom simulations to validate their CG model. In Figure S1, they compare interaction maps. Of course, the AA maps are sparser due to less sampling, but the authors state that the maps are 'in good agreement'. How do the authors judge this (they do not look very similar to me, e.g. the NTD-hinge interactions are mostly lacking)?

      This is similar to Reviewer 1’s concern. We agree that the AA simulations are moderately limited over 5 μs due to the large size of the HP1 proteins (~400 residues in a dimer). However, the expansion trends of the average dimensions of the HP1 paralogs agree with the CG simulations (Fig. S1 a,b). Regarding the AA contact maps, we agree that they are relatively sparse, which makes it difficult to compare them to the CG maps. We have added new plots (Fig. S1 f-h) to show the correlation of the vdW contact probability per residue for each paralog in the AA and CG simulations. The Pearson correlation coefficients are approximately 0.86, suggesting a strong positive linear correlation in the contact propensity between AA and CG simulations.
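
      The per-residue comparison described above reduces to a Pearson correlation between two equal-length vectors of contact probabilities, one from the AA and one from the CG simulations. The sketch below shows that calculation in plain Python. The contact probabilities are randomly generated placeholders, not data from the study, and the names cg_contacts and aa_contacts are hypothetical.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Placeholder per-residue contact probabilities (NOT data from the study):
# the "AA" values are the "CG" values plus Gaussian noise, clipped to [0, 1].
rng = random.Random(0)
cg_contacts = [rng.random() for _ in range(200)]
aa_contacts = [min(1.0, max(0.0, p + rng.gauss(0.0, 0.15))) for p in cg_contacts]

if __name__ == "__main__":
    print(f"Pearson r = {pearson_r(cg_contacts, aa_contacts):.2f}")
```

      A single coefficient summarizes linear agreement across residues, but it does not by itself show whether specific inter-domain contacts (for example NTD-hinge) are reproduced, so it complements the contact maps and does not replace them.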

    1. Author Response

      The following is the authors’ response to the original reviews.

      REVIEWER #1:

      The authors present a carefully controlled set of experiments that demonstrate an additional complexity for GPCR signaling in that endosomal signaling may be different when b-arrestin is or isn't associated with a G protein-bound V2R vasopressin receptor. It uses state-of-the-art biosensor-based approaches and b-arrestin KO lines to assess this. It adds to a growing body of evidence that G proteins and b-arrestin can associate with GPCR complexes simultaneously. They also demonstrate the possibility that Gaq might also be activated by the V2R receptor. My sense is that one thing that may need to be considered is the possibility that such "megacomplexes" might actually involve receptor dimers or oligomers.

      1.1 Can the authors please review the data that describes the concept of "GPCR megacomplexes"? I feel this is missing from the introduction. The notion means different things to different people. As you will see from my other comments, you should especially focus on evidence at the level of the single receptor.

      We appreciate the reviewer’s comments and have now included a more complete description of the GPCR megacomplex, or ‘megaplex’, concept in the introduction (page 2, 1st paragraph).

      1.2 The authors use mini-G proteins to conclude that V2R receptors interact with Gaq (in addition to Gas). I would prefer if there were a more direct measure of this. Can the authors show that the receptor interacts with full length Gaq (and not the other G proteins in Figure)? Is there a signaling phenotype associated with Gaq coupling? Is it sensitive to Gaq inhibition?

      Excellent point and we are happy to expand further on this. The ability of the V2R to activate Gq/11 has already been demonstrated before (Zhu, X. et al. Mol Pharmacol 46(3):460-9 (1994); Lykke, K. et al. Physiol Rep. 3(8):e12519 (2015); Avet, C. et al. eLife 11: e74101 (2022); Heydenreich, F.M. et al. Mol Pharmacol 102(3):139-49 (2022)). Therefore, we did not attempt to document this activation using more traditional assays. On the other hand, demonstrating an interaction between the V2R and a Ga subunit in cells is challenging for several reasons. First, the full-length Ga subunit is already located at the plasma membrane at basal state, and thus generates high background signals in proximity assays. Second, upon receptor activation, the Ga subunit interaction with V2R is so transient that it is difficult, if not impossible, to catch this moment in a proximity assay. Although the miniG proteins are highly engineered, coupling specificity of the different subtypes (Gas, Gai/o, Gaq/11, and Ga12/13) to GPCRs is maintained. In addition, as they are homogenously expressed in the cytosol under basal states rather than at the membrane, they generate low background noise. Upon agonist stimulation, miniG proteins are recruited from the cytosol to the V2R at the plasma membrane, resulting in a robust signal in proximity assays. Thus, miniG proteins are unique in that they can actually detect GPCR–G protein interactions in cellular proximity assays, which is very challenging using full-length Ga subunits.

      That being said, we fully understand the reviewer’s concern and greatly value the effort in enhancing the robustness of our study. Therefore, we have now monitored downstream signaling events of Gaq/11 in the absence or presence of the selective Gaq/11 inhibitor YM-254890 as a secondary method of documenting Gaq/11 activity. Specifically, we used a newly developed biosensor to measure diacylglycerol (DAG) production, a downstream second messenger of Gaq/11 activation, at both the plasma membrane and endosomes. Using a second biosensor, we detected general protein kinase C (PKC) activation, which is another downstream signaling event of Gaq/11 activation. We found that AVP stimulation leads to DAG production at both the plasma membrane and endosomes (Fig. 1C-D) as well as PKC activation (Fig. 1E), all of which are sensitive to YM-254890 inhibition (Fig. 1C-E). Together, these results strongly suggest that the V2R interacts with and activates Gaq/11.

      1.3 I raise a similar concern with Gaq coupling in endosomes.

      For similar reasons that miniG proteins are excellent tools for demonstrating V2R interaction with G proteins at the plasma membrane, miniG proteins can also be used to detect V2R interaction with G proteins at endosomes by measuring proximity between miniG and an endosomal marker in response to agonist challenge. However, to ensure that the endosomal recruitment of miniGsq to the V2R demonstrated in our study corresponds to endosomal Gaq/11 activation, we monitored the production of DAG at the early endosomes in a similar way to which we detected DAG production at the plasma membrane. As shown in Fig. 1D, stimulation of V2R with AVP induces recruitment of the DAG-binding biosensor to the early endosomal marker Rab5. Pre-treatment of the cells with the selective Gaq/11 inhibitor YM-254890 abrogated this response, confirming that V2R activation leads to production of DAG at the early endosomes in a Gaq/11-dependent manner (Fig. 1D).

      1.4 Can the confocal data be shown for Gai and Ga12?

      Yes, we can certainly show these data as a negative control. We have now included confocal data using Halo-mGsi as a negative control (Fig. 2). As seen in this figure, mGsi colocalizes with neither Lck (plasma membrane) nor EEA1 (early endosomes) upon stimulation of cells with AVP, in line with a receptor that does not couple to Gai/o.

      We did not include data using Halo-mG12, as this G protein subtype, similar to Gi/o, does not couple functionally to V2R. Therefore, it is highly unlikely we would obtain different results from the experiments using Halo-mGsi.

      1.5 The authors want us to believe that there is simultaneous binding of G proteins and b-arrestin. This is never demonstrated and is at odds with the structural basis of G protein and b-arrestin binding. Have the authors considered that "simultaneous" occupancy might simply reflect binding at distinct GPCR monomers in the context of dimeric or oligomeric receptors? They could I suppose provide data at the level of a single receptor rather than using the bulk BRET approaches used.

      We appreciate the comment and opportunity to highlight some of our previous work, which address the megacomplexes at the level of a single receptor. First, we have characterized the megacomplex biochemically and structurally at a low resolution (Thomsen ARB et al. 2016, Cell 166(4):907-19). The results unequivocally demonstrate that a single GPCR interacts simultaneously with heterotrimeric G protein, at the receptor core, and with b-arrestin via the phosphorylated receptor carboxy-terminal. We also documented functionality of the megacomplex as the receptor can interact with and activate the G protein, which were shown by 3 different biochemical approaches (Thomsen ARB et al. 2016, Cell 166(4):907-19). In addition, we solved a high-resolution cryo-EM structure of a megacomplex further highlighting the architecture of this complex (Nguyen AH et al. 2019, Nat Struct Mol Biol 26:1123-31). As both biochemical and structural analyses were done in vitro in which the receptor was embedded in a detergent micelle, we also confirmed that the megacomplex structural architecture fits naturally within the context of a membrane in molecular dynamics simulation experiments (Nguyen AH et al. 2019, Nat Struct Mol Biol 26:1123-31).

      In cells, we and others have also shown that GPCRs such as the V2R can bind b-arrestins exclusively via the phosphorylated carboxy-terminal tail as it does in the megacomplex (Kumari P et al. 2016, Nat Commun 7:13416; Cahill III TJ et al. 2017, PNAS 114(10):2562-67; Kumari P et al. 2017, Mol Biol Cell 28(8):1003-10; Chen K et al. 2023, Nature (online doi: https://doi.org/10.1038/s41586-023-06420-x)). In addition, we and others have used BRET and confocal microscopy to show that the V2R and other GPCRs recruit G protein and b-arrestin simultaneously and that the three components colocalize in endosomes upon prolonged agonist exposure (Thomsen ARB et al. 2016, Cell 166(4):907-19; Chen K et al. 2023, Nature (online doi: https://doi.org/10.1038/s41586-023-06420-x)). As the reviewer correctly points out, in these cellular experiments (as well as in single molecule microscopy), the working resolution is not high enough to rule out that the receptors that co-recruit G protein and b-arrestin in endosomes could be dimeric instead of monomeric. Thus, we conducted a series of experiments with GPCR–b-arrestin fusions where the two proteins are covalently attached at the receptor carboxy-terminal tail. We showed that despite the GPCR–b-arrestin coupling being fully functional (with respect to b-arrestin promoting a high-affinity state of the receptor for agonist binding and constitutively internalizing the receptor), the receptor could still activate G proteins (Thomsen ARB et al. 2016, Cell 166(4):907-19; Nguyen AH et al. 2019, Nat Struct Mol Biol 26:1123-31), which demonstrates that the single-receptor megaplex can physically form in cells.

      We have now included an extra paragraph in the discussion to go over these megaplex-related considerations (5th paragraph in the discussion), and we thank the reviewer for raising this point.

      1.6 Please introduce abbreviations when you first use them; this was not done consistently.

      Thank you for noticing these errors, which we have now corrected.

      REVIEWER #2:

      This manuscript by Daly et al. probes the emerging paradigm of GPCR signaling from endosomes using the V2R as a model system with an emphasis on Gaq/11 and b-arrestins. The study employs cellular imaging, enzyme complementation assays and energy transfer-based sensors to probe the potential formation of GPCR-G-protein-b-arrestin megaplexes. While the study is certainly very interesting, it appears to be very preliminary at many levels, and clearly requires further development in order to make robust conclusions. The authors should consider expanding on this work further to make the points more convincingly and to make the work solid and impactful. The two corresponding authors are among the leaders in the field, having demonstrated the existence of megaplexes, and building on the work in a systematic fashion should certainly move the paradigm forward. As the work presented in the current manuscript is already pre-printed, the authors should take this opportunity to present a more complete and comprehensive story to the field.

      We are grateful for the time and effort the reviewer has put into reviewing our work. We are certainly excited to learn that the reviewer finds our work “very interesting”. Regarding the robustness, we have added extra control experiments to increase the completeness of the study. These experiments include:

      • Measurements of AVP-stimulated diacylglycerol (DAG) production, a signaling event downstream of Gaq/11 activation. These measurements were conducted both at the plasma membrane (Fig. 1C) and at early endosomes (Fig. 1D) using a newly developed DAG-binding biosensor, and demonstrate that the V2R activates Gaq/11 at both of these subcellular locations.

      • Monitoring AVP-promoted protein kinase C (PKC) activation, another downstream signaling effect of Gaq/11 activation (Fig. 1E). The result of this approach shows in another way that V2R activates Gaq/11.

      • Inhibition of signaling events downstream of Gaq/11 activation using the selective Gaq/11 inhibitor YM-254890. YM-254890 inhibits AVP-stimulated DAG production at the plasma membrane and endosomes as well as PKC activation (Fig. 1C-E), which strongly confirms that these signaling outputs are results of Gaq/11 activation.

      • We have also included the confocal data using Halo-mGsi as a negative control for confocal microscopy (Fig. 2). As seen in this figure, mGsi does not translocate to the plasma membrane or early endosomes upon stimulation with AVP, which validates that V2R activation does not couple to and activate Gai/o.

      Finally, we would like to kindly remind the reviewer that the production of the pre-print manuscript is part of the peer-review process in eLife.

      2.1 The use of miniG proteins in these experiments is a major concern as these are highly engineered and may not represent the true features of G proteins. While these have been used as a readout in other publications, their use in demonstrating megaplex formation is sub-optimal, and native, full-length G proteins should be used.

      We are a bit unsure as to what the reviewer means by using native full-length G proteins. If the reviewer is suggesting to co-immunoprecipitate V2R with native unlabeled G protein and b-arrestin, it should be considered that the G protein interaction with the receptor is extremely transient and unlikely to survive the pull-down procedure unless stabilized by a nanobody or crosslinking. Although the b-arrestin interaction with the receptor is more stable in nature, co-immunoprecipitation with the receptor requires crosslinking or stabilization with a Fab/nanobody. Therefore, we do not think this approach can be used as a more accurate way of detecting native megaplexes.

      If the reviewer is suggesting the use of full-length G proteins in our cell-based proximity assays instead of miniG proteins, we would like to highlight that this approach is somewhat prone to false-positive responses. The major reason behind this is that G proteins are located at regions in membranes close to the receptor whereas b-arrestins are distributed throughout the cytosol. Upon activation of the V2R, b-arrestins translocate to the receptor at the plasma membrane, which results in enhanced BRET between V2R-coupled G protein subtypes and b-arrestins (see the preliminary data in Author response image 1 below). This translocation also results in non-specific BRET signals between b-arrestins and G protein subtypes at the plasma membrane that do not couple to V2R but are located in close proximity to the receptor. As these non-specific BRET signals do not report on the formation of functional V2R megaplexes (see Author response image 1), we have purposely not used this approach.

      Author response image 1.

      To overcome this technical hurdle in the detection of functional megaplexes, we have replaced full-length G proteins with miniG proteins, as the latter are located in the cytosol in the resting state and only translocate to the membrane area if a receptor adopts an active conformation. This replacement is advantageous since activation of megaplex-forming receptors such as the V2R results in simultaneous translocation of miniG proteins and b-arrestins from the cytosol to the receptor at the plasma membrane, which produces a highly specific proximity signal (see the preliminary data in Author response image 2 below). When stimulating the V2R, we only observe increases in proximity between b-arrestin1 and miniG proteins that are activated by the V2R (miniGs and miniGsq) but not the miniG proteins that are not activated by this receptor (miniGsi and miniG12) (see Author response image 2). Therefore, the use of miniG proteins offers a more accurate experimental approach to detect functional megaplexes than the use of full-length G proteins.

      Author response image 2.

      2.2 The interpretation of complementation (NanoLuc) or proximity (BRET) as evidence of signaling is not appropriate, especially when an overexpression system and engineered constructs are being used.

      We thank the reviewer for raising this concern. We have previously demonstrated global Gas activation and Gas signaling in the form of cAMP stimulated by internalized V2R (Thomsen ARB et al. 2016, Cell 166(4):907-19). As mentioned previously, in the current updated manuscript we have now included experiments to document downstream signaling events in response to Gaq/11 activation. These experiments include measurement of DAG production at the plasma membrane (Fig. 1C) and early endosomes (Fig. 1D), as well as phosphorylation/activation of PKC (Fig. 1E). Pre-incubation with the selective Gaq/11 inhibitor YM-254890 abrogated all these downstream signals, confirming that the V2R stimulates Gaq/11 protein signaling at both the plasma membrane and endosomes (Fig. 1C-E).

      2.3 After the original work from the same corresponding authors on megaplex formation, the major challenge in the field is to demonstrate the existence and relevance of megaplex formation at endogenous levels of components, and the current study focuses solely on showing the proximity of Gaq and b-arrestins.

      We completely agree with the reviewer that it will be important to demonstrate the functionality of endogenous megaplexes, and we are currently working on this in other studies using different receptor systems. However, doing this is not trivial, and we will have to overcome major technical barriers that we feel are somewhat outside the scope of the current study. The goal of our V2R study is to demonstrate that V2R megaplexes form with Gaq/11, resulting in Gaq/11 activation at endosomes, and that endosomal G protein activation by the V2R can occur independently of b-arrestin, which in our humble opinion we accomplish.

      2.4 The study lacks a coherent approach, and the assays are often shifted back and forth between the two b-arrestin isoforms (1 and 2), for example, confocal vs. complementation etc.

      We understand the reviewer’s concern. However, as opposed to the β2-adrenergic receptor that binds β-arrestin2 with higher affinity than β-arrestin1, V2R has a strong affinity for both β-arrestin1 and β-arrestin2 (Oakley et al. 2000, JBC 275(22):17201-10). The V2R’s almost identical affinity for β-arrestin1 and β-arrestin2 is well illustrated in Fig. 3B. Thus, although different β-arrestin isoforms were used in some experiments, it is very unlikely that the overall results and conclusions from this study will change by adding extra experiments to ensure that both β-arrestin isoforms are used in every experiment.

      2.5 In every assay, only the G proteins and b-arrestins are monitored without a direct assessment of the presence of receptor, and absent that data, it is difficult to justify calling these entities megaplexes.

      Mini G proteins and b-arrestin come into close proximity upon agonist stimulation of the V2R. Using confocal microscopy, we observed this co-recruitment of miniGs/miniGsq and b-arrestin in response to prolonged V2R stimulation at endosomes specifically (Fig. 3D-F). In the absence of GPCR stimulation, both miniG and b-arrestin would be homogeneously distributed throughout the cytosol, and thus the only reason why both proteins have been recruited to endosomes in response to AVP challenge is that they are recruited to internalized and active V2R. This point was obviously not adequately described in the original manuscript, and thus we have now clarified this further in the updated manuscript at the 8th sentence of the last paragraph of the "The V2R recruits Gas/Gaq and barrs simultaneously" section.

      REVIEWER #3:

      The manuscript by Daly et al. examines endosomal signaling of the vasopressin type 2 receptor using engineered mini G proteins (mG proteins) and a number of novel techniques to address whether sustained G protein signaling in the endosomal compartment is enhanced by b-arrestin. Employing these interesting techniques, they have shown how V2R could activate Gas and Gaq in the endosomal compartments and how this modulation could occur in an arrestin-dependent and -independent manner. Although the phenomenon of endosomal signaling is complex to address, the authors have tried their best to examine it using a number of well-controlled sets of experiments. Though this is an interesting and well carried out study of endosomal signaling of G proteins, my concerns are:

      3.1 The study is done in overexpressed HEK 293 cells with these engineered constructs making me wonder if the kinetics would be the same in primary cells?

      The reviewer raises an interesting and valid point. It is possible that in the context of primary cells the kinetics would differ slightly, and it would definitely be interesting to address this in a subsequent study. However, despite being an interesting aspect of our study, the kinetics themselves are not our major take-home message, but rather the subcellular localization of the G protein activation and the role of β-arrestin in these events. We have now highlighted this aspect in our updated manuscript (1st paragraph of the discussion) and we thank the reviewer for addressing this.

      3.2 The use of the phrase "G protein activation independent of b-arrestins to a minor degree" would make me question its physiological relevance. The authors should discuss the relevance of their findings in physiological or pathological context.

      We are glad that the reviewer focuses on this point, and we would like to highlight that other GPCRs, including the glucagon-like peptide-1 receptor (GLP1R), internalize in a β-arrestin-independent manner (Claing A et al. 2000 PNAS 97(3):1119-24) while signaling through Gas from endosomes. In the case of the GLP1R, this endosomal Gas signaling promotes glucose-stimulated insulin secretion in pancreatic β-cells (Kuna RS et al. 2013 Am J Physiol Endocrinol Metab 305:E161-70). Consequently, β-arrestin-independent endosomal G protein signaling appears to have some physiological relevance. Similarly, in a very recent pre-print from the von Zastrow group (Blythe EE and von Zastrow M 2023 BioRxiv https://doi.org/10.1101/2022.09.07.506997), it was reported that endogenously expressed vasoactive intestinal peptide receptor 1 (VIPR1), which regulates gastro-intestinal functions, promotes robust G protein signaling from endosomes in a completely β-arrestin-independent fashion. This again suggests that endogenously expressed GPCRs can internalize and activate G proteins from endosomes independently of β-arrestin to produce physiological responses. We have now discussed these studies in the 6th paragraph of the discussion.

      3.3 The confocal colocalization studies shown in Figure 2 and their conclusion "suggesting a certain level of endosomal Gas/Gaq signaling despite the absence of barr2" seems rather inconclusive.

      As opposed to V2R, a receptor that retains β-arrestin in endosomes upon internalization, β-arrestin quickly dissociates from V2b2AR after internalization due to the low affinity of the carboxy-terminal tail of β2AR for β-arrestin. In the previous Fig. 2 (now Fig. 3), after 45 minutes of AVP stimulation, no β-arrestin is visible at endosomes in cells expressing V2b2AR, as β-arrestin has already dissociated from the receptor and translocated back to the cytosol. However, clear green clusters of mGs and mGsq are still visible at endosomes, indicating the presence of active receptor interacting with Gas or Gaq despite the fact that β-arrestin is back in the cytosol. We quantified the percentage of the green mGs or mGsq clusters that do not colocalize with β-arrestin and have added this information to the updated version of the manuscript (Fig. 3G). In V2R-expressing cells, almost all active receptors that interact with Gas or Gaq/11 also associate with β-arrestin (Fig. 3G). In contrast, in V2b2AR-expressing cells, approximately 75% of the active receptors do not interact with β-arrestin (Fig. 3G). This suggests that β-arrestin binding to V2R is not an absolute requirement for endosomal Gas and Gaq activation by V2R. This point was obviously not addressed adequately in the original manuscript, and thus we have now elaborated further on this in the updated version in the last paragraph of the "The V2R recruits Gas/Gaq and βarrs simultaneously" section.
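      As an illustration only, the quantification described above (the percentage of mGs or mGsq clusters that do not colocalize with β-arrestin) can be sketched as follows. The per-cluster boolean flags and the function name are hypothetical; how a cluster is scored as colocalized is not specified here and is assumed to have been decided upstream.

```python
def percent_without_arrestin(cluster_has_arrestin):
    """Percentage of miniG clusters that do NOT colocalize with beta-arrestin.

    cluster_has_arrestin: one boolean per miniG cluster (hypothetical input),
    True if the cluster was scored as overlapping a beta-arrestin cluster.
    """
    flags = list(cluster_has_arrestin)
    if not flags:
        raise ValueError("at least one cluster is required")
    not_colocalized = sum(1 for has_arrestin in flags if not has_arrestin)
    return 100.0 * not_colocalized / len(flags)


# Made-up example: 3 of 4 clusters lack beta-arrestin.
example = percent_without_arrestin([True, False, False, False])
```

      The input values in the example are invented for demonstration and are not data from the study.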

      3.4 Though a novel observation it is not clear to me how V2R would internalize after activation without arrestin. Is it some sort of generalized microcytosis occurring in these overexpressed cells? Should discuss.

      This is certainly a very interesting observation and something other research laboratories have also seen recently, in particular in the context of endosomal G protein signaling (Blythe EE and von Zastrow M 2023 BioRxiv https://doi.org/10.1101/2022.09.07.506997). The main and best characterized pathway for GPCR internalization is clathrin-dependent, where receptors are most commonly associated with β-arrestins. However, for some GPCRs, the β-arrestin association is not required for clathrin-mediated internalization. One example is the apelin receptor, which can internalize via clathrin-coated pits but in a β-arrestin-independent manner (Pope GR et al. 2016 Mol Cell Endocrinol 437:108-19). Alternatively, GPCRs can also internalize independently of any clathrin and β-arrestin associations via caveolae or fast endophilin-mediated endocytosis (FEME). We have now expanded our discussion of possible mechanisms for β-arrestin-independent receptor internalization in the updated manuscript in the 6th paragraph of the discussion, and we thank the reviewer for the suggestion.

      3.5 Is use of mini G protein a good representation? The authors should justify.

      Excellent point and something we have comprehensively discussed in our responses to reviewers 1 and 2 (points 1.2 and 2.1).

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      Bendzunas, Byrne et al. explore two highly topical areas of protein kinase regulation in this manuscript. Firstly, the idea that Cys modification could regulate kinase activity. The senior authors have published some standout papers exploring this idea of late, and the current work adds to the picture of how active site Cys might have been favoured in evolution to serve critical regulatory functions. Second, BRSK1/2 are understudied kinases listed as part of the "dark kinome" so any knowledge of their underlying regulation is of critical importance to advancing the field.

      Strengths:

      In this study, the authors pinpoint highly-conserved, but BRSK-specific, Cys residues as key players in kinase regulation. There is a delicate balance between equating what happens in vitro with recombinant proteins relative to what the functional consequence of Cys mutation might be in cells or organisms, but the authors are very clear with the caveats relating to these connections in their descriptions and discussion. Accordingly, by extension, they present a very sound biochemical case for how Cys modification might influence kinase activity in cellular environs.

      Weaknesses:

      I have very few critiques for this study, and my major points are barely major.

      Major points

      (1) My sense is that the influence of Cys mutation on dimerization is going to be one of the first queries readers consider as they read the work. It would be, in my opinion, useful to bring forward the dimer section in the manuscript.

      We agree that the influence of Cys on BRSK dimerization is a topic of significant interest. Our primary focus was to explore oxidative regulation of the understudied BRSK kinases as they contain a conserved T-loop Cys, and we have previously demonstrated that equivalent residues at this position in related kinases were critical drivers of oxidative modulation of catalytic activity. We have demonstrated here that BRSK1 & 2 are similarly regulated by redox and that this is due to oxidative modification of the T+2 Cys, in addition to Cys residues that are conserved amongst related ARKs as well as BRSK-specific Cys. Although we also provide evidence for limited redox-sensitive higher order BRSK species (dimers) in our in vitro analysis, these represent a small population of the total BRSK protein pool (this was validated by SEC-MALS analysis). As such, we do not have strong evidence to suggest that these limited dimers significantly contribute to the pronounced inhibition of BRSK1 & 2 in the presence of oxidizing agents, and instead believe that other biochemical mechanisms likely drive this response. This may result from oxidized Cys altering the conformation of the activation loop. Indeed, the formation of an intramolecular disulfide within the T-loop of BRSK1 & 2, which we detected by MS, is one such regulatory modification. It is noteworthy that intramolecular disulfide bonds within the T-loop of AKT and MELK have already been shown to induce an inactive state in the kinase, and we posit a similar mechanism for BRSKs.

      While we recognize the potential importance of dimerization in this context, our current data from in vitro and cell-based assays do not provide substantial evidence to assert dimerization as a primary regulatory mechanism. Hence, we maintained a more conservative stance in our manuscript, discussing dimerization in later sections where it naturally followed from the initial findings. That being said, we acknowledge the potential significance of dimerization in the regulation of the BRSK T-loop cysteine. We believe this aspect merits further investigation and could indeed be the focus of a follow-up study.

      (2) Relatedly, the effect of Cys mutation on the dimerization properties of preparations of recombinant protein is not very clear as it stands. Some SEC traces would be helpful; these could be included in the supplement.

      In order to determine whether our recombinant BRSK proteins (and T-loop mutants) existed as monomers or dimers, we performed SDS-PAGE under reducing and non-reducing conditions (Fig 7). This unambiguously revealed that a monomer was the prominent species, with little evidence of dimers under these experimental conditions (even in the presence of oxidizing agents). Although we cannot discount a regulatory role for BRSK dimers in other physiological contexts, we could not produce sufficient evidence to suggest that multimerization played a substantial role in modifying BRSK kinase activity in our assays. We note that our in vitro analysis was performed using truncated forms of the protein, and as such it is entirely possible that regions of the protein that flank the kinase domain may serve additional regulatory functions that may include higher order BRSK conformations. In this regard, although we have not included SEC traces of our recombinant proteins, we have included analytical SEC-MALS of the truncated proteins (Supplementary Figure 6) which we believe to be more informative. We have also now included additional SEC-MALS data for BRSK2 C176A and C183A (Supplementary Figure 6d and e), which supports our findings in Fig 7, demonstrating the presence of limited dimer species under non-reducing conditions.

      (3) Is there any knowledge of Cys mutants in disease for BRSK1/2?

      We have conducted an extensive search across several databases: COSMIC (Catalogue of Somatic Mutations in Cancer), ProKinO (Protein Kinase Ontology), and TCGA (The Cancer Genome Atlas). These databases are well-regarded for their comprehensive and detailed records of mutations related to cancer and protein kinases. Our analysis using the COSMIC and TCGA databases focused on identifying any reported instances of Cys mutations in BRSK1/2 that are implicated in cancer. Additionally, we utilized the ProKinO database to explore the broader landscape of protein kinase mutations, including any potential disease associations of Cys mutations in BRSK1/2. However, we found no evidence to indicate the presence of Cys mutations in BRSK1/2 that are associated with cancer or disease. This lack of association in the current literature and database records suggests that, as of our latest search, Cys mutations in BRSK1/2 have not been reported as significant contributors to pathogenesis.

      (4) In bar charts, I'd recommend plotting data points. Plus, it is crucial to report in the legend what error measure is shown, the number of replicates, and the statistical method used in any tests.

      We have added the data points to the bar charts and included statistical methods in figure legends.

      (5) In Figure 5b, the GAPDH loading control doesn't look quite right.

      The blot has been repeated and updated.

      (6) In Figure 7 there is no indication of what mode of detection was used for these gels.

      We have updated the figure legend to confirm that the detection method was western blot.

      (7) Recombinant proteins - more detail should be included on how they were prepared. Was there a reducing agent present during purification? Where did they elute off SEC... consistent with a monomer or higher order species?

      We have added ‘produced in the absence of reducing agents unless stated otherwise’ in the methods section to improve clarity. Although we have not added additional sentences to describe the elution profile of the BRSK proteins by SEC during purification, we believe that the inclusion of analytical SEC-MALS data is sufficient evidence that the proteins are largely monomeric under non-reducing conditions.

      Reviewer #2 (Public Review):

      Summary:

      In this study by Bendzunas et al., the authors show that the formation of intra-molecular disulfide bonds involving a pair of Cys residues near the catalytic HRD motif and a highly conserved T-Loop Cys with a BRSK-specific Cys at an unusual CPE motif at the end of the activation segment function as repressive regulatory mechanisms in BRSK1 and 2. They observed that mutation of the CPE-Cys only, contrary to the double mutation of the pair, increases catalytic activity in vitro and drives phosphorylation of the BRSK substrate Tau in cells. Molecular modeling and molecular dynamics simulations indicate that oxidation of the CPE-Cys destabilizes a conserved salt bridge network critical for allosteric activation. The occurrence of spatially proximal Cys amino acids in diverse Ser/Thr protein kinase families suggests that disulfide-mediated control of catalytic activity may be a prevalent mechanism for regulation within the broader AMPK family. Understanding the molecular mechanisms underlying kinase regulation by redox-active Cys residues is fundamental as it appears to be widespread in signaling proteins and provides new opportunities to develop specific covalent compounds for the targeted modulation of protein kinases.

      The authors demonstrate that intramolecular cysteine disulfide bonding between conserved cysteines can function as a repressing mechanism, as indicated by the effect of DTT and the consequent increase in activity of BRSK1 and 2 (WT). The cause-effect relationship of why mutation of the CPE-Cys only increases catalytic activity in vitro and drives phosphorylation of the BRSK substrate Tau in cells is not clear to me. The explanation given by the authors based on molecular modeling and molecular dynamics simulations is that oxidation of the CPE-Cys (that will favor disulfide bonding) destabilizes a conserved salt bridge network critical for allosteric activation. However, no functional evidence of the impact of the salt-bridge network is provided. If you mutate the two main Cys-pairs (aE-CHRD and A-loop T+2-CPE) you lose the effect of DTT, as the disulfide pairs cannot be formed, hence no repression mechanisms take place; however, when looking at individual residues I do not understand why mutating the CPE only results in the opposite effect unless it is independent of its connection with the T+2 residue on the A-loop.

      Strengths:

      This is an important and interesting study providing new knowledge in the protein kinase field with important therapeutic implications for the rational design and development of next-generation inhibitors.

      Weaknesses:

      There are several issues with the figures that this reviewer considers should be addressed.

      Reviewer #1 (Recommendations for The Authors):

      Major points

      Page 26 - the discussion could be more concise. There's an element of recapping the results, which should be avoided.

      Regarding the conciseness of the discussion section, we have thoroughly revised it to ensure a more succinct presentation, deliberately avoiding the recapitulation of results. The revised discussion now focuses on interpreting the findings and their implications, steering clear of redundancy with the results section.

      Figure 1b seems to be mislabeled/annotated. I recommend checking whether the figure legends match more broadly. Figure 1 appears to be incorrectly cited throughout the results.

      Thank you for pointing out the discrepancies in the labeling and citation of Figure 1b. We have carefully reviewed and corrected these issues to ensure that all figure labels, legends, and citations accurately reflect the corresponding data and illustrations. We appreciate your attention to detail and the opportunity to improve the clarity and accuracy of our presentation.

      Figure 6 - please include a color-coding key in the figure. Further support for these simulations could be provided by supplementary movies or plots of the interaction. Figure 4 colour palette should be adjusted for the spheres in the Richardson diagrams to have greater distinction.

      As suggested, we have amended the colour palette in Figure 4 to improve conformity throughout the figure.

      Minor points

      Figure 2 - it'd be helpful to know what the percentage coverage of peptides is.

      We have updated the figure legend to include peptide coverage for both proteins.

      Some typos - Supp 2 legend "Domians".

      Fixed

      Figure 6 legend - analyzed by needs a space;

      Fixed

      Fig 8 legend schematic misspelled.

      Fixed

      Broadly, if you Google T-loop you get a pot pourri of enzyme answers. Why not just use Activation loop?

      The choice of "T-loop" over "Activation loop" in our manuscript was made to maintain consistency with other literature in the field, and in particular our previous paper “Aurora A regulation by reversible cysteine oxidation reveals evolutionarily conserved redox control of Ser/Thr protein kinase activity” where we refer to the activation loop cysteine as T-loop + 2. We acknowledge the varied enzyme contexts in which "T-loop" is used and agree on the importance of clarity. To address this, we made an explicit note in the manuscript that the "T-loop" is also referred to as the "Activation loop", ensuring readers are aware of the interchangeable use of these terms. Additionally, this nomenclature facilitates a more straightforward designation of cysteine residues within the loop (T+2 Cysteine). We believe this approach balances adherence to established conventions with the need for clarity and precision in our descriptions.

      Methods - what is LR cloning. Requires some definition. Some manufacturer detail is missing in methods, and referring to prior work is not sufficient to empower readers to replicate.

      We agree, and have added the following to the methods section:

      “BRSK1 and 2 were sub-cloned into pDEST vectors (to encode the expression of N-terminal Flag or HA tagged proteins) using the Gateway LR Clonase II system (Invitrogen) according to the manufacturer’s instructions. pENTR BRSK1/2 clones were obtained in the form of Gateway-compatible donor vectors from Dr Ben Major (Washington University in St. Louis). The Gateway LR Clonase II enzyme mix mediates recombination between the attL sites on the Entry clone and the attR sites on the destination vector. All cloned BRSK1/2 genes were fully sequenced prior to use.”

      Page 7 - optimal settings should be reported. How were pTau signals quantified and normalised?

      We have added the following to the methods section:

      “A two-color Western blot detection method employing infrared fluorescence was used to measure the ratio of Tau phospho serine 262 to total Tau. Total GFP Tau was detected using a mouse anti GFP antibody and visualized at 680 nm using goat anti mouse IRDye 680, while phospho-Tau was detected using a Tau phospho serine 262 specific antibody and visualized at 800 nm using goat anti rabbit IRDye 800. Imaging was performed using a LI-COR Odyssey CLx with scan control settings set to 169 μm, medium quality, and 0.0 mm distance. Quantification was performed using LI-COR Image Studio on the raw image files. The phospho-Tau to total Tau ratio was determined by measuring the ratio of the fluorescence intensities measured at 800 nm (pTau) to those at 680 nm (total Tau).”
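      As a minimal sketch of the ratio described in the quoted methods text, the calculation could look like the following. The function names and example intensities are hypothetical, and the normalisation to a control lane is an assumption added for illustration; it is not stated in the methods above.

```python
from statistics import mean


def ptau_to_total_ratio(signal_800, signal_680):
    """Ratio of the 800 nm signal (pTau) to the 680 nm signal (total Tau)."""
    if signal_680 <= 0:
        raise ValueError("total Tau signal must be positive")
    return signal_800 / signal_680


def normalise_to_control(ratios, control_ratios):
    """Express each ratio relative to the mean control ratio.

    Assumption: normalising to a control condition is a common convention,
    not a step described in the quoted methods.
    """
    baseline = mean(control_ratios)
    return [ratio / baseline for ratio in ratios]


# Made-up intensities for demonstration only.
lane_ratio = ptau_to_total_ratio(50.0, 100.0)
relative = normalise_to_control([1.0, 2.0], [0.5, 0.5])
```

      Dividing by the total Tau signal within the same lane corrects for differences in Tau expression and loading between lanes before conditions are compared.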

      In the Figure 6g-j legend, the salt bridge is incorrectly annotated as E185-R248 rather than 258.

      Fixed

      Lines 393-395 provides a repeat statement on BRSKs phosphorylating Tau (from 388-389).

      We have removed the repetition and reworded the opening lines of the results section to improve the overall flow of the manuscript.

      Supp. Figure 1 is difficult to view - would it be possible to increase the size of the phylogenetic analysis?

      We thank the reviewer for this observation. We have rotated (90°) and expanded the figure so that it can be more clearly viewed.

      Supp. Figure 2 - BRSK1/2 incorrectly spelled.

      Fixed

      Please check the alignment of labels in Supp. Figure 3e.

      Fixed

      Reviewer #2 (Recommendations For The Authors):

      (1) In Figure 1, current panel b is not mentioned/described in the figure legend and as a consequence, the rest of the panels in the legends do not fit the content of the figure.

      Reviewer 1 also noted this error, and we have amended the manuscript accordingly.

      What is the rationale for using the HEK293T cells as the main experimental/cellular system? Are there cell lines that express both proteins endogenously so that the authors can recapitulate the results obtained from ectopic overexpression?

      The selection of HEK-293T cells was driven by their well-established utility in overexpression studies, which make them ideal for the investigation of protein interactions and redox regulation. This cell line's robust transfection efficiency and well-characterized biology provide a reliable platform for dissecting the molecular mechanisms underlying the redox regulation of proteins. Furthermore, the use of HEK-293T cells aligns with the broader scientific practice, serving as a common ground for comparability with existing literature in the field of BRSK1/2 signaling, protein regulation and interaction studies.

      The application of HEK-293T cells as a model system in our study serves as a foundational step towards eventually elucidating the functions of BRSK1/2 in neuronal cells, where these kinases are predominantly expressed and play critical roles. Given that BRSKs are classed as ‘understudied’ kinases, the choice of a HEK-293T co-overexpression system allowed us to analyze the direct effects of BRSK kinase activity (using phosphorylation of Tau as a readout) in a cellular context and in a more controlled manner. This approach not only aids in the establishment of a baseline understanding of the redox regulation of BRSK1/2, but also sets the stage for subsequent investigations in more physiologically relevant neuronal models.

      In current panel d, could the authors recapitulate the same experimental conditions as in current panel c?

      Figure 1 panel c shows that both BRSK1 and 2 are reversibly inhibited by oxidizing agents such as H2O2, whilst panels d and e show the concentration dependent activation and inhibition of the BRSKs with increasing concentrations of DTT and H2O2 respectively. The experimental conditions were identical, other than changing amounts of reducing and oxidizing agents, and used the same peptide coupled assays. Data for all experiments were originally collected in ‘real time’ as depicted in Fig 1c (increase in substrate phosphorylation over time). However, to aid interpretation of the data, we elected to present the latter two panels as dose response curves by calculating the change in the rate of enzyme activity (shown as pmol phosphate incorporated into the peptide substrate per min) for each condition. To aid the reader, we now include an additional supplementary figure (new supplementary figure 2) depicting BRSK1 and 2 dependent phosphorylation of the peptide substrate in the presence of different concentrations of DTT and H2O2 in a real time (kinetic) assay. The new data shown is a subset of the unprocessed data that was used to calculate the rates of BRSK activity in Fig 1d & e.
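      The conversion from a real-time trace to a single rate value can be sketched as below. This is a minimal illustration, assuming the rate is taken as the least-squares slope of product formed against time over the linear part of the trace; the function name, DTT concentrations and data points are hypothetical and are not taken from the study.

```python
def initial_rate(times_min, product_pmol):
    """Least-squares slope of product (pmol) against time (min)."""
    n = len(times_min)
    if n < 2 or n != len(product_pmol):
        raise ValueError("need two or more paired time/product values")
    mean_t = sum(times_min) / n
    mean_p = sum(product_pmol) / n
    sxx = sum((t - mean_t) ** 2 for t in times_min)
    sxy = sum((t - mean_t) * (p - mean_p)
              for t, p in zip(times_min, product_pmol))
    return sxy / sxx  # pmol phosphate incorporated per min


# Hypothetical kinetic traces: DTT concentration (mM) -> pmol product per time point
times = [0.0, 1.0, 2.0, 3.0, 4.0]
traces = {
    0.0: [0.0, 0.1, 0.2, 0.3, 0.4],
    1.0: [0.0, 2.0, 4.0, 6.0, 8.0],
}

# One rate per condition; these values are what a dose-response curve would plot.
rates = {dtt: initial_rate(times, trace) for dtt, trace in traces.items()}
```

      Each kinetic trace collapses to one point on the dose-response curve, which is why the real-time data and the rate plots carry the same information in different forms.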

      Why did the authors use full-length constructs in these experiments and did not in e.g. Figure 2 where they used KD constructs instead?

      In the initial experiments, illustrated in Figure 1, we employed full-length protein constructs to establish a proof of concept, demonstrating the overall behavior and interactions of the proteins in their full-length form. This confirmed that BRSK1 & 2, which both contain a conserved T+2 Cys residue that is frequently prognostic for redox sensitivity in related kinases, displayed a near-obligate requirement for reducing agents to promote kinase activity.

      Subsequently, in Figure 2, our focus shifted towards delineating the specific regions within the proteins that are critical for redox regulation. By using constructs that encompass only the kinase domain, we aimed to demonstrate that the redox-sensitive regulation of these proteins is predominantly mediated by specific cysteine residues located within the kinase domain itself. This strategic use of the kinase domain of the protein allowed for a more targeted investigation. Furthermore, in our hands these truncated forms of the protein were more stable at higher concentrations, enabling more detailed characterization of the proteins by DSF and SEC-MALS. We predict that the flanking disordered regions of the full-length protein (as predicted by AlphaFold) contribute to this effect.

      (2) In Figure 2, Did the authors try to do LC/MS-MS in the same experimental conditions as in Figure 1 (e.g. buffer minus/plus DTT, H2O2, H2O2 + DTT)?

      We would like to clarify that the mass spectrometry experiments were conducted exclusively on proteins purified under native (non-reducing) conditions. We did not extend the LC/MS-MS analyses to include proteins treated with various buffer conditions such as minus/plus DTT, H2O2, or H2O2 + DTT as used in the experiments depicted in Figure 1. Given that we could readily detect disulfides in the absence of oxidizing agents, we did not see the benefit of additional treatment conditions as peroxide treatment of protein samples can frequently complicate interpretation of MS data. However, it should be noted that prior to MS analysis, tryptic peptides were subjected to a 50:50 split, with one half alkylated in the presence of DTT (as described in the methods section) to eliminate disulfides and other transiently oxidized Cys forms. Comparative analysis between reduced and non-reduced tryptic peptides improved our confidence when assigning disulfide bonds (which were eliminated in identical peptides in the presence of DTT).

      On panel b, why did the authors show alphafold predictions and not empiric structural information (e.g. X-ray, EM,..)?

      The AlphaFold models were primarily utilized to map the general locations of redox-sensitive cysteine pairs within the proteins of interest. Although we have access to the crystal structure of mouse BRSK2, it does not fully capture the active conformation seen in the AlphaFold model of the human version. The use of AlphaFold models for human proteins in this study aids in consistently tracking residue numbering across the manuscript, offering a useful framework for understanding the spatial arrangement of these critical cysteine pairs in their potentially active-like states. This approach facilitates our analysis and discussion by providing a reference for the structural context of these residues in the human proteins.

      What was the rationale for using the KD construct and not the FL as in Figure 1?

      The rationale to use the kinase domain was primarily based on the significantly lower confidence in the structural predictions for regions outside the kinase domain (KD). Our experimental focus was to investigate the role of conserved cysteine residues within the kinase domain, which are critical for the protein's function and regulation. This targeted approach allowed us to concentrate our analyses on the most functionally relevant and structurally defined portion of the protein, thereby enhancing the precision and relevance of our findings. As is frequently the case, truncated forms of the protein, consisting only of the kinase domain, are much more stable than their full length counterparts and are therefore more amenable to in vitro biochemical analysis. In our hands this was true for both BRSK1 and 2, and as such much of the data collected here was generated using kinase-domain (KD) constructs. Simulations using the KD structures are therefore much more representative of our original experimental setup.

      The BRSK1 KD construct appears to be rather inactive and not responsive to DTT treatment. Could the authors comment on the differences observed with the FL construct of Figure 1?

      It is important to note that BRSK1, in general, exhibits lower intrinsic activity compared to BRSK2. This reduced activity could be attributed to a range of factors, including the need for activation by upstream kinases such as LKB1, as well as potential post-translational modifications (PTMs) that may be absent in the bacterially expressed KD construct. The full-length forms of the protein were purified from Sf21 cells, and as such may have additional modifications that are lacking in the bacterially derived KD counterparts. We also cannot discount additional regulatory roles of the regions that flank the KD, and these may contribute in part to the modest discrepancy observed between constructs. Despite these differences, it is crucial to emphasize that both the KD and FL constructs of BRSK1 are regulated by DTT, indicating a conserved redox-dependent activation for both of the related BRSK proteins.

      (3) In Figure 4, panel A, wouldn't the authors expect that mutating one residue of the pairs, e.g. C198A in BRSK1, would have the same effect as mutating C191 from the T+2 site? Did they try mutating individual sites of the aE/CHRD pair? The same applies to BRSK2.

      We appreciate the insightful comment. It's important to clarify that the redox regulation of these proteins is influenced not solely by the formation of disulfide bonds but also by the oxidation state of individual cysteine residues, particularly the T+2 Cys. This nuanced mechanism of regulation allows for a diverse range of functional outcomes based on the specific cysteine involved and its state of oxidation. This aspect forms a key finding of our paper, highlighting the complexity of redox regulation beyond mere disulfide bond formation. For example, AURA kinase activity is regulated by oxidation of a single T+2 Cys (Cys290, equivalent to Cys191 and Cys176 of BRSK1 and 2 respectively), but this regulation can be supplemented through artificial incorporation of a secondary Cys at the DFG+2 position (Byrne et al., 2020). This targeted genetic modification of AURA mirrors equivalent regulatory disulfide-forming Cys pairs that naturally occur in kinases such as AKT and MELK, and which provide an extra layer of regulatory fine tuning (and a possible protective role to prevent deleterious over-oxidation) to the T+2 Cys. We surmise that the CPE Cys is also an accessory regulatory element to the T+2 Cys in BRSK1 and 2, which is the dominant driver of BRSK redox sensitivity (as judged by the fact that CPE Cys mutants are still potently regulated by redox [Fig 4]), by locking it in an inactive disulfide configuration.

      In our preliminary analysis of BRSK1, we observed that mutations of individual sites within the aE/CHRD pair were similarly detrimental to kinase activity as a tandem mutation (see reviewer figure 1). As discussed in the manuscript, we think that these Cys may serve important structural regulatory functions and opted to focus on co-mutations of the aE/CHRD pair for the remainder of our investigation.

      Author response image 1.

      In vitro kinase assays showing rates of in vitro peptide phosphorylation by WT and Cys-to-Ala (aE/CHRD residues) variants of BRSK1 after activation by LKB1.

      In panels C and D, the same experimental conditions should have been measured as in A and B.

      Panels A and B were designed to demonstrate the enzymatic activity and the response to DTT treatment to establish the baseline redox regulation of the kinase and a panel of Cys-to-Ala mutant variants. In contrast, panels C and D were specifically focused on rescue experiments with mutants that showed a significant effect under the conditions tested in A and B. These panels were intended to further explore the role of redox regulation in modulating the activity of these mutants, particularly those that retained some level of activity or exhibited a notable response to redox changes.

      The rationale for this experimental design was to prioritize the investigation of mutants, such as those at the T+2 and CPE cysteine sites, which provided the most insight into the redox-dependent modulation of kinase activity. Other mutants, which resulted in inactivation, were deprioritized in this context as they offered limited additional information regarding the redox regulation mechanism. This focused approach allowed us to delve deeper into understanding how specific cysteine residues contribute to the redox-sensitive control of kinase function, aligning with the overall objective of elucidating the nuanced roles of redox regulation in kinase activity.

      (4) In Figure 5: Why did the authors use reduced glutathione instead of DTT? The authors should have recapitulated the same experimental conditions as in Figure 4 and not focused only on the T+2 or the CPE single mutants, but also used the double and the aE/CHRD mutants as internal controls and validation of the enzymatic assays using the modified peptide.

      Regarding the use of reduced glutathione (GSH) instead of DTT in Figure 5, we chose GSH for its well characterized biological relevance as an antioxidant in cellular responses to oxidative stress. Furthermore, while DTT has been widely used in experimental setups, it is also potentially cytotoxic at high concentrations.

      Addressing the point on experimental consistency with Figure 4, we appreciate the suggestion and indeed had already conducted such experiments (Previously Supp Fig 3, now changed to current Supp Fig 4). These experiments include analyses of BRSK mutant activity in a HEK-293T model. However, we chose not to focus on inactivating mutants (such as the aE/CHRD mutants which had depleted expression levels possibly as a consequence of compromised structural integrity) or pursue the generation of double mutant CMV plasmids, as these were deemed unlikely to add significant insights into the core narrative of our study. Our focus remained on the mutants that yielded the most informative results regarding the redox regulation mechanisms in the in vitro setting, ensuring a clear and impactful presentation of our findings.

      A time course evaluation of the reducing or oxidizing reagents should have been performed. Would we expect in WT samples, in the presence of GSH, and also in the case of the CPE mutant, an increase in the levels of Tau phosphorylation as a readout of BRSK1/2 activity?

      We acknowledge the importance of such analyses in understanding the dynamic nature of redox regulation on kinase activity and have included a time course (Supp Fig 2 e-g). These results confirm a depletion of Tau phosphorylation over time in response to peroxide generated by the enzyme glucose oxidase.

      (5) In Figure 6, did the authors look at the functional impact of the residues with which the T+2 and the CPE motifs interact, e.g. T174 and the E185-R258 tether?

      Our primary focus was on the salt bridges, as this is a key regulatory structural feature that is conserved across many kinases. Regarding the additional interactions mentioned, we have thoroughly evaluated their roles and dynamics through molecular dynamics (MD) simulations but did not find any results of significant relevance to warrant inclusion.

      (6) In Figure 7: Did the authors look at the oligomerization state of the BRSK1/2 multimers under non-reducing conditions? Were they also observed in the case of the FL constructs? What was the stoichiometry?

      Our current work indicates that the kinase domain of BRSK1-2 primarily exists in a monomeric state, with some evidence of dimerization or multimer formation under specific conditions. Our SEC-MALS (Supp Fig 6) and SDS-PAGE (Figure 7) analyses clearly demonstrate that monomers are overwhelmingly the dominant species under non-reducing conditions (>90%). We also conclude that these limited oligomeric species can be removed by inclusion of reducing agents such as DTT (Figure 7), which may suggest a role for a Cys residue(s). Notably, removal of the T+2 Cys was insufficient to prevent multimerization.

      We were unable to obtain reliable SEC-MALS data for the full-length forms of the protein, likely due to the presence of disordered regions that flank the kinase domain which results in a highly heterodispersed and unstable preparation (at the concentrations required for SEC-MALS). Although we are therefore unable to comment on the stoichiometry of FL BRSK dimers, we can detect BRSK1 and 2 hetero- and homo-complexes in HEK-293T cells by IP, which supports the existence of limited BRSK1 & 2 dimers (Supp Fig 6a). However, we were unable to detect intermolecular disulfide bonds by MS, although this does not necessarily preclude their existence. The physiological role of BRSK multimerization (if any) and establishing specifically which Cys residues drive this phenomenon is of significant interest to our future investigations.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This manuscript represents a fundamental contribution demonstrating that fentanyl-induced respiratory depression can be reversed with a peripherally-restricted mu opioid receptor antagonist. The paper reports compelling and rigorous physiological, pharmacokinetic, and behavioral evidence supporting this major claim, and furthers mechanistic understanding of how peripheral opioid receptors contribute to respiratory depression. These findings reshape our understanding of opioid-related effects on respiration and have significant therapeutic implications given that medications currently used to reverse opioid overdose (such as naloxone) produce severe aversive and withdrawal effects via actions within the central nervous system.

      We thank the reviewers for their insightful comments and critiques, which we have incorporated into the manuscript. We believe these revisions have significantly improved the manuscript. Additionally, following discussions among the authors, we have revised the color scheme across all figures. For example, the color of the symbols in Figure 1B-D now matches the bars in Figure 1E-J, rather than the symbols. We feel that this change improves the clarity and visual consistency of the figures, making it easier to interpret the data across figures.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This paper shows that the synthetic opioid fentanyl induces respiratory depression in rodents. This effect is reversed by the opioid receptor antagonist naloxone, as expected. Unexpectedly, the peripherally restricted opioid receptor antagonist naloxone methiodide also blocks fentanyl-induced respiratory depression.

      Strengths:

      The paper reports compelling physiology data supporting the induction of respiratory distress in fentanyl-treated animals. Evidence suggesting that naloxone methiodide reverses this respiratory depression is compelling. This is further supported by pharmacokinetic data suggesting that naloxone methiodide does not penetrate into the brain, nor is it metabolized into brain-penetrant naloxone.

      Weaknesses:

      A weakness of the study is the fact that the functional significance of opioid-induced changes in neural activity in the nTS (as measured by cFos and GCaMP/photometry) is not established. Does the nTS regulate fentanyl-induced respiratory depression, and are changes in nTS activity induced by naloxone and naloxone methiodide relevant to their ability to reverse respiratory depression?

      Reviewer #2 (Public review):

      Summary:

      In this article, Ruyle and colleagues assessed the contribution of central and peripheral mu opioid receptors in mediating fentanyl-induced respiratory depression using both naloxone and naloxone methiodide, which does not cross the blood-brain barrier. Both compounds prevented and reversed fentanyl-induced respiratory depression to a comparable degree. The advantage of peripheral treatments is that they circumvent the withdrawal-like effects of naloxone. Moreover, neurons located in the nucleus of the solitary tract are no longer activated by fentanyl when naloxone methiodide is administered, suggesting that these responses are mediated by peripheral mu opioid receptors. The results delineate a role for peripheral mu opioid receptors in fentanyl-derived respiratory depression and identify a potentially advantageous approach to treating overdoses without inflicting withdrawal on the patients.

      Strengths:

      The strengths of the article include the intravenous delivery of all compounds, which increases the translational value of the article. The authors address both the prevention and reversal of fentanyl-derived respiratory depression. The experimental design and data interpretation are rigorous and appropriate controls were used in the study. Multiple doses were screened in the study and the approaches were multipronged. The authors demonstrated the activation of NTS cells using multiple techniques and the study links peripheral activation of mu opioid receptors to central activation of NTS cells. Both males and females were used in the experiments. The authors demonstrate the peripheral restriction of naloxone methiodide.

      Weaknesses:

      Naloxone is already broadly used to prevent overdoses from opioids so in some respects, the effects reported here are somewhat incremental.

      The reviewer is correct that naloxone is the standard antidote for reversing opioid-induced respiratory depression. However, its limitations, including the risk of precipitated withdrawal, are well-documented in both preclinical and clinical studies. The likelihood of withdrawal increases when multiple doses of naloxone are administered. Since naloxone-induced withdrawal is centrally mediated, this study aimed to evaluate a peripherally restricted MOR antagonist for its ability to prevent or reverse fentanyl-induced respiratory depression. A key finding is that NLXM reversed OIRD without inducing aversive behavior. This suggests that peripheral antagonists like NLXM may be integrated into intervention strategies that save lives while preventing the adverse behavioral and physiological effects that are observed after treatment with naloxone.

      Reviewer #3 (Public review):

      Summary:

      This manuscript outlines a series of very exciting and game-changing experiments examining the role of peripheral MORs in OIRD. The authors outline experiments that demonstrate a peripherally restricted MOR antagonist (NLX Methiodide) can rescue fentanyl-induced respiratory depression and this effect coincides with a lack of conditioned place aversion. This approach would be a massive boon to the OUD community, as there are a multitude of clinical reports showing that naloxone rescue post fentanyl over-intoxication is more aversive than the potential loss-of-life to the individuals involved. This important study reframes our understanding of successful overdose rescue with potential for reduced aversive withdrawal effects.

      Strengths:

      Strengths include the plethora of approaches arriving at the same general conclusion, the inclusion of both sexes and the result that a peripheral approach for OIRD rescue may side-step severe negative withdrawal symptoms of traditional NLX rescue.

      Weaknesses:

      The major weakness of this version relates to how the data analysis assessed sex-specific contributors to the results.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Some points for the authors to consider are:

      (1) In the Abstract, it is unclear why "high potency and lipophilicity" contribute to opioid-induced respiratory depression.

      The higher potency of fentanyl compared to other opioids significantly increases the risk of overdose and subsequent respiratory depression. Its high lipophilicity facilitates rapid absorption and central nervous system penetration, which contributes to the rapid onset of cardiorespiratory depression. The narrow therapeutic window of fentanyl further emphasizes the critical need for timely intervention when an overdose has occurred, and for effective antagonists to reverse respiratory depression and save lives. We have revised the abstract to clarify these points.

      (2) Are the doses of fentanyl used in the study (2, 20, or 50 µg/kg IV) relevant to those achieved by fentanyl-exposed human drug users?

      In these studies, we intravenously administered three doses of fentanyl. The human equivalent doses (HED) of 20 µg/kg and 50 µg/kg fentanyl are ~3 µg/kg and ~8 µg/kg, respectively. These doses have previously been shown to induce respiratory depression in humans (Dahan et al., 2005).
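      For readers who wish to reproduce the conversion, a minimal sketch follows. It assumes the HED values were obtained by standard body-surface-area scaling with the commonly used Km factors for rat (6) and human (37); the response does not state which method was used, so both the method and the function name are assumptions made for illustration.

```python
# Body-surface-area conversion factors (Km, kg/m^2) from standard dose-scaling tables.
# Assumption: the animals are rats and this scaling method underlies the quoted HEDs.
KM = {"rat": 6.0, "human": 37.0}


def human_equivalent_dose(animal_dose_ug_per_kg: float, species: str = "rat") -> float:
    """Scale an animal dose (ug/kg) to a human equivalent dose (ug/kg)."""
    return animal_dose_ug_per_kg * KM[species] / KM["human"]


# The two doses for which HEDs are quoted in the response
hed_20 = human_equivalent_dose(20.0)
hed_50 = human_equivalent_dose(50.0)
```

      Under these assumptions the scaled values come out close to the ~3 µg/kg and ~8 µg/kg quoted in the response.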

      (3) In Figure 1, it appeared that only a small fraction of tyrosine hydroxylase-positive (TH+) neurons expressed cFos in response to fentanyl, and the degree of cFos expression was largely similar across all fentanyl doses tested. Thus, it is unclear whether TH+ neurons play a role in fentanyl-induced respiratory depression, and the value of these data is unclear (see point #6 below also).

      As shown in the mean data, the lowest dose of fentanyl, which was below the threshold for inducing OIRD, activated approximately 50% of tyrosine hydroxylase-positive (TH+) nTS neurons. In contrast, the highest dose of fentanyl resulted in a statistically significant increase, with ~75% of TH+ cells co-expressing Fos-IR.

      We included the assessment of catecholaminergic nTS cells for several reasons. The regions of the nTS evaluated in this study contain high expression of MOR and are the termination points of sensory afferent fibers transmitting cardiorespiratory information to the nTS (Aicher et al., 2000; Furdui et al., 2024). Catecholaminergic cells receive direct excitatory inputs from visceral afferents (Appleyard et al., 2007) and exhibit intensity-dependent increases in Fos-IR in rats exposed to hypoxic air (Kline et al., 2010; King et al., 2012). These neurons are essential for generating appropriate cardiorespiratory responses to hypoxic challenges (Bathina et al., 2013; King et al., 2015). As the reviewer notes, rats exposed to fentanyl exhibit a high degree of Fos-IR in the nTS, including catecholaminergic neurons. Despite the robust fentanyl-induced activation (increased Fos-IR) of nTS neurons, there appears to be a failure to initiate appropriate chemoreflex-mediated cardiorespiratory responses. Our photometry data further indicate that fentanyl-induced changes in neuronal activity are mediated, in part, by peripheral MOR. Collectively, these findings suggest that fentanyl impacts nTS activity through alterations in peripheral afferent signaling to the nTS, which may contribute to the severity and duration of OIRD.

      (4) It would help with the flow of the paper if the pharmacokinetic data shown in Figure 6 were presented earlier (as part of Figure 2).

      We have moved the biodistribution data earlier in the manuscript, now presenting it as Figure 2. The numbering of all subsequent figures has been adjusted accordingly.

      (5) In Figure 5, there appears to be a large number of GCaMP-expressing neurons located outside the nTS. To what degree can the changes in calcium signaling, attributed to alterations in neural activity in the nTS, be explained by altered activity of neurons located outside the nTS?

      The reviewer is correct that our viral spread extends beyond the boundaries of the nTS, raising the possibility that the responses observed in Figure 5 may be influenced by neural activity of cells outside the nTS. While some viral spread beyond the target region is unavoidable, calcium transients were measured at the tip of the fiber, which was positioned directly within the nTS.

      To address this concern further, we performed Fos immunohistochemistry in a subset of animals that received bilateral GCaMP virus injections into the nTS. Following fentanyl administration (50 µg/kg IV), brains were collected two hours later. As shown in the accompanying image, we observed Fos-IR co-expression with GCaMP exclusively within the nTS boundaries. No Fos-IR was detected outside the nTS, including in GCaMP cells. Taken together, these findings support our conclusion that the data depicted in our photometry figure (now Figure 6) accurately represent fentanyl-induced activity changes in nTS neurons.

      Author response image 1.

      Arrowheads: Fos-negative GCaMP cell; Arrows: Co-labeled Fos/GCaMP cell; Asterisk: Fos+ GCaMP-negative cell

      (6) Currently, the cFos and photometry data are descriptive in nature. Are opioid-induced changes in nTS neural activity relevant to respiratory depression? If so, one might expect DREADD-mediated stimulation of the nTS neural activity (or stimulating nTS activity by some other means) would reverse fentanyl-induced respiratory depression similar to naloxone and methyl-naloxone.

      The reviewer raises an interesting point regarding the relevance of the nTS in the context of OIRD. The nTS is a major site of integration of sensory afferent information and involved in the initiation of reflex responses that facilitate a return to homeostasis. As described above, we characterized the collective response of nTS neurons to intravenous fentanyl using both Fos immunohistochemistry and fiber photometry. Our data indicate that fentanyl-induced changes in nTS activity are strongly mediated by peripheral MOR. While the suggestion to use global chemogenetic activation of nTS neurons to reverse fentanyl-induced respiratory depression is intriguing, results from these experiments may be difficult to interpret due to the extensive heterogeneity of the nTS. However, we are currently conducting similar experiments using a more selective approach that will allow us to isolate and evaluate specific nTS phenotypes to better understand their contributions to OIRD.

      (7) Are peripherally restricted mu opioid receptor (MOR) agonists available? If so, it would strengthen the paper if such compounds could be used to show that stimulation of peripheral MORs is sufficient to induce respiratory distress independent of actions on centrally located MORs.

      Peripherally acting Mu Opioid Receptor Antagonists (PAMORAs) are indeed available and currently being evaluated in our laboratory.

      Reviewer #2 (Recommendations for the authors):

      Consider having the figures/data numbered in the order that they appear in the manuscript. Right now, Figure 6 is mentioned between Figures 1 and 2 (minor).

      Thank you for this suggestion. We have reordered the figures so that the biodistribution figure appears before the MOR antagonist pretreatment and reversal figures.

      Reviewer #3 (Recommendations for the authors):

      This manuscript outlines a series of very exciting and game-changing experiments examining the role of peripheral MORs in OIRD. The authors outline experiments that demonstrate a peripherally restricted MOR antagonist (NLX Methiodide) can rescue fentanyl-induced respiratory depression and this effect coincides with a lack of conditioned place aversion. This approach would be a massive boon to the OUD community, as there are a multitude of clinical reports showing that naloxone rescue post fentanyl over-intoxication is more aversive than the potential loss-of-life to the individuals involved. This important study reframes our understanding of successful overdose rescue with potential for reduced aversive withdrawal effects.

      While this is an exciting and important study, there are a few minor to moderate critiques for the authors to consider. These are below.

      (1) Title: "devoid of aversive effects" - While CPA is a good, cumulative indicator of potential aversive effects, it is not an exhaustive one. Since no other withdrawal measures were included, this is an overstatement.

      The reviewer is correct in noting that our analysis of aversive effects is not exhaustive. Since we only assessed changes in aversive behavior between NLX and NLXM, we believe it is more accurate to modify the title accordingly. We have changed the title from “devoid of aversive effects” to “devoid of aversive behavior” to better reflect the scope of the experiments conducted.

      (2) Page 3, top line: MOR (mu opioid receptor) is highly expressed...

      An article should likely be included prior to MOR or make plural and adjust the sentence.

      Thank you for this suggestion. We have reworked this section in the manuscript.

      (3) Figure 6D: this figure is very important for the interpretation of every single figure. It should either be moved to figure 1 or 2 or combined with figure 1 or 2.

      Thank you for this suggestion. The biodistribution figure has been moved to Figure 2.

      (4) Page 5, line 164, Figure 21-D: remove the 1.

      Done.

      (5) Sex differences (or lack thereof):

      Throughout the manuscript, the authors report a lack of sex differences. However, while the data is not powered for the distinction of sex differences, there appears to be a bi-modal distribution of the individual data points that likely correspond to sex across most experiments. For example, in Figure 2E there are both color and clear dots, which this reviewer assumes indicates sex (however, this wasn't easily apparent if it was commented on at all in the paper). If you look at the saline oxygen saturation (nadir) levels (2e), there is wide variability with the red-filled circles, but not the clear ones. This may indicate a bimodal distribution (and may be related to the baseline HR sex differences highlighted). This is also the case in Figure 2L but is perhaps more obvious in the CPA score data (Figure 4d), where it seems the nlx negative CPA effects were likely driven primarily by one sex. While this reviewer does not expect a full powering of experiments for sex differences (and also is very appreciative of the inclusion of both sexes), full raw data with sex indicated included in the supplemental data would greatly aid the field in general and allow for those with a specific interest in this area to build upon this data. Additionally, further discussion regarding the potential role of sex differences in the translational value of these findings is also warranted.

      For all bar graphs, open symbols represent females and filled symbols represent males. This information can be found in the first paragraph of the Materials and Methods section. We have also added this information to each figure for increased visibility. We appreciate the acknowledgement of our inclusion of both sexes. For all experiments, we attempted to balance by sex. Unfortunately, we occasionally had to exclude animals for technical reasons (with clogged catheters being the most common reason for exclusion). This sometimes led to an imbalance in sex in some groups, as the reviewer has noted. In the graph of oxygen saturation nadir values in Fig 2E (now Fig 3E in the revised manuscript), all animals received intravenous fentanyl at a dose of 20 μg/kg. The reviewer is correct that there is greater variability in the males (filled symbols) compared to the females (open symbols) in this graph. However, this variability in the distribution was not observed in Fig 1E or Fig 4E, in which male and female rats received an identical dose of 20 μg/kg. Taking this into account, our overall interpretation of the data is that there is a relatively minor sex difference in the responses observed after intravenous fentanyl, and the variability in Fig 3E is primarily due to a lower n compared to Fig 1E.

      All raw data will be uploaded to a data repository.

      (6) Page 7, line 209: Figure 5D should be Figure 6D.

      We have incorporated this change.

      (7) Page 8, line 267: Cure should be Curve.

      We have incorporated this change.

      (8) Discussion: Page10, line322 states that "no detectable NLX ... was found in brain tissue". This is incorrect based on Figure 6.

      The sentence the reviewer highlighted refers to detection of NLX or NLXM in brain tissue from animals that received intravenous NLXM. As demonstrated in the biodistribution figure (now Figure 2 in the manuscript), our data demonstrate that an intravenous injection of NLXM did not result in NLX formation in the brain. We have reworked the sentence for clarity.

      (9) jGCaMP injections: Figure 5B/c shows the distribution of the gcamp across animals. The optic fiber is placed directly over the NTS. However, how are we certain there isn't a nearby nucleus/structure outside the NTS that is contributing to the photometry data presented in D-G?

      See our above comment.  

      (10) Fiber Photometry and Sex: These studies unfortunately may have had only 1 of a sex included in the fiber photometry data. While the inclusion is overall good, the single value for a sex suggests that there are differences, given the clustering of the data. While the anesthesia may be driving this potential sex effect, it is not clear based on the data presented. For reference: https://link.springer.com/article/10.1007/s12975-012-0229-y

      The reviewer is correct that there was an imbalance of sex in this dataset. While we made every attempt to balance for sex across all experiments, we unfortunately had to exclude some animals for technical reasons (clogged catheter, missed injection site, etc). This produced an imbalance in our photometry studies and did not allow us to thoroughly evaluate sex differences in fentanyl-induced changes in neural activity or in the responses to anesthesia. We have expanded on this limitation in the discussion.

      (11) Figure 5 - the bars are not the color indicated by the legend.

      We have corrected this in the figure. Thank you.

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1:

      While very positive towards our manuscript, this reviewer also points out three suggestions for improvement.

      Overall, there are not many weaknesses. The main one I noticed is with the lipidomic analysis shown in Figs 3C, 7C, S1 and S3. While these data are an essential part of the analysis and provide strong evidence for the conclusions of the study, it is unfortunate that the methods used did not enable the distinction between two 18:1 isomers. These two isomers of 18:1 are important in C. elegans biology, because one is a substrate for FAT-2 (18:1n-9, oleic acid) and the other is not (18:1n-7, cis-vaccenic acid). Although rarer in mammals, cis-vaccenic acid is the most abundant fatty acid in C. elegans and is likely the most important structural MUFA. The measurement of these two isomers is not essential for the conclusions of the study, but the manuscript should include a comment about the abundance of oleic vs vaccenic acid in C. elegans (authors can find this information, even in the fat-2 mutant, in other publications of C. elegans fatty acid composition). Otherwise, readers who are not familiar with C. elegans might assume the 18:1 that is reported is likely to be mainly oleic acid, as is common in mammals.

      Excellent point. As suggested by the reviewer, we now include a clarification of this in the text: "Consistent with previous publications [10], the levels of 18:1 fatty acids were greatly increased in the fat-2(wa17) mutant. It is important to note that the majority of these 18:1 fatty acids is likely 18:1n7 (vaccenic acid) and not 18:1n9 (OA) [10,23], which is the substrate of FAT-2; the lipid analysis methods used here are not able to distinguish between the two 18:1 species."

      The title could be less specific; it might be confusing to readers to include the allele name in the title.

      We thank the reviewer for the suggestion, and we have now modified the title:

      "Forward Genetics In C. elegans Reveals Genetic Adaptations To Polyunsaturated Fatty Acid Deficiency"

      There are two errors in the pathway depicted in Figure 1A. The 16:0-16:1 desaturation can be performed by FAT-5, FAT-6, and FAT-7. The 18:0-18:1 desaturation can only be performed by FAT-6 and FAT-7.

      We thank the reviewer for pointing out this mistake. The pathway in Fig. 1A has been corrected.

      Reviewer #2:

      This reviewer was also very positive towards our manuscript but also pointed out several suggestions for additional experiments or changes to the manuscript.

      Major recommendations

      (1) To conclude that membrane rigidification is not the major cause of defects associated with fat-2 mutations, the authors need to show that fluidity is rescued by their treatments (oleic acid or NP-40). I honestly doubt that it is the case, as oleic acid is already abundant in fat-2 mutants. It is possible that the treatments, which are effective in rescuing fluidity in paqr-2 mutants, do not have the same effects in fat-2 mutants.

      The reviewer raises an important point. In an effort to address this, we have now performed a FRAP study on fat-2(wa17) mutants with/without NP40 as a fluidizing agent (with wild-type and paqr-2 mutants as controls). The new data, now included as Fig. 2J, shows that NP40 did improve the fluidity of the intestinal cell membrane in the fat-2(wa17) mutant, though not to the same degree as in the paqr-2 mutant. This is now cited in the text as follows:

      "However, cultivating the fat-2(wa17) mutant in the presence of the non-ionic detergent NP40, which improves the growth of the paqr-2(tm3410) mutant [17], did not suppress the poor growth phenotype of the fat-2(wa17) mutant even though it did improve membrane fluidity as measured using FRAP (Fig. 2I-J). Similarly, supplementing the fat-2(wa17) mutant with the MUFA oleic acid (OA, 18:1), which also suppresses paqr-2(tm3410) phenotypes [17], did not suppress the poor growth phenotype of the fat-2(wa17) mutant (Fig 2K)."

      (2) It is not validated experimentally that the mutations converge into FTN-2 repression. This can be verified by analyzing mRNA or protein expression of FTN-2 in the egl-9 and hif-1 mutants obtained in the screening.

      Our manuscript does lean on several publications that previously established the HIF-1 pathway in C. elegans. Additionally, we have now added a qPCR experiment showing that the newly isolated hif-1(et69) allele indeed suppresses the expression of ftn-2. This was an especially valuable experiment since the hif-1(et69) allele is proposed to act as a gain-of-function allele that would constitutively suppress ftn-2 expression. This new result is included as Fig. 6C and mentioned in the text:

      "Inhibition of egl-9 promotes HIF-1 activity [41], which we here verified for the egl-9(et60) allele using western blots (Fig 6A). Additionally, we found by qPCR that ftn-2 mRNA levels are as expected reduced by the proposed gain-of-function hif-1(et69) allele (Fig 6C). We conclude that the egl-9 and hif-1 suppressor mutations likely converge on inhibiting ftn-2 and thus act similarly to the ftn-2 loss-of-function alleles."

      (3) In the hif-1(et69) and ftn-2(et68) mutants, the rescues in lipid composition seem to be minor, with eicosapentaenoic acid (EPA) levels remaining low. The ftn-2 mutant data is especially concerning, as it suggests that egl-9 mutants rescue lipid composition via distinct mechanisms not including ftn-2 suppression. I suggest that the authors test the minimal doses of linoleic acid or EPA required to rescue fat-2 mutants and perform lipidomics to test which is the degree of EPA restoration that is needed. If a low level of restoration is sufficient, the hif-1 and ftn-2 mutants might indeed rescue phenotypes via a restoration of EPA levels. Otherwise, other mechanisms have to be considered.

      In line with the above issue, the low level of EPA restoration in hif-1 and ftn-2 mutants raises the possibility that the mutations rescue fat-2 mutants downstream of lipid changes. The reduction in HIF-1 levels in fat-2 mutants also suggests that lipid changes affect HIF-1 expression. Thus, the "impossibility to genetically compensate PUFA deficiency" might be wrong. The above experiment would address this point too.

      The reviewer is entirely correct to consider alternative explanations. In the lipidomics in Fig 3, we see that fat-2(wa17) worms on NGM have only ~1.5-2 mol% EPA in phosphatidylcholines. When treated with 2 mM LA, the levels of EPA rise to ~10 mol%, still below the ~25% observed in N2, but perhaps this is sufficient to restore fat-2(wa17) health. Similarly, the hif-1(et69) and ftn-2(et68) mutant alleles elevate EPA levels to 5-7% in fat-2(wa17). Thus, we have a correlation where a significant increase in EPA, obtained either through LA supplementation or through suppressor mutations (e.g. egl-9(et60), hif-1(et69) or ftn-2(et68)), is associated with improved growth and health of the fat-2(wa17) mutant. However, correlation is of course not proof. The suggested experiment to titrate EPA to its lowest fat-2(wa17) rescuing levels and then perform lipidomics analysis was not possible in a reasonable time frame during this revision. However, preliminary experiments showed that even 25 μM LA (most of which will be converted to EPA by the worms) is enough to rescue the fat-2(wa17) or null mutant (Author response image 1), suggesting that even tiny amounts (much below the >250 μM used in our article) bring great benefits.

      Author response image 1.

      Nevertheless, we now acknowledge in the discussion that alternative explanations exist:

      "Other mechanisms are also possible. For example, mutations in the HIF-1 pathway could somehow reduce EPA turnover rates in the fat-2(wa17) mutant and allow its levels to rise above an essential threshold. This hypothesis is consistent with the observation that the suppressors can rescue both the fat-2(wa17) mutant and fat-2 RNAi-treated worms but not the fat-2 null mutant. It is even possible, though deemed unlikely, that the fat-2(wa17) suppressors act by compensating for the PUFA shortage via some undefined separate process downstream of the lipid changes and that they only indirectly result in elevated EPA levels."

      Additionally, another possible mechanism of action of the fat-2(wa17) suppressors could have been that they all cause upregulation of the FAT-2 protein. We have now explored this possibility using Western blots and found that this is an unlikely mechanism. This is presented in Fig. 6D-E and S3C-D, mentioned in the text as follows:

      "We also used Western blots to evaluate the abundance of the FAT-2 protein expressed from endogenous wild-type or mutant loci but to which a HA tag was fused using CRISPR/Cas9. We found that the FAT-2::HA levels are severely reduced when the locus contains the S101F substitution present in the wa17 allele, but restored close to wild-type levels by the fat-2(et65) suppressor mutation (Fig 6D-E, S3C-D Fig). The levels of FAT-2 in the HIF-1 pathway suppressors varied between experiments, with the suppressors sometimes restoring FAT-2 levels and sometimes not even when the worms were growing well (Fig 6D-E, S3C-D Fig). The fat-2(wa17) suppressors, except for the intragenic fat-2 alleles, likely do not act by increasing FAT-2 protein levels."

      (4) It should be tested how Fe2+ levels are changed in the mutants, and how effective the ferric ammonium citrate treatment is. The authors might use a ftn-1::GFP reporter for this purpose.

      We did obtain a strain carrying the ftn-1::GFP reporter but could not generate conclusive data with it. In particular, we saw no increase in fluorescence in fat-2(wa17) worms carrying suppressor mutations. However, we also found that even FAC treatment that rescues the fat-2(wa17) mutant did not result in a measurable increase in GFP levels, suggesting that the reporter is not sensitive enough.

      Minor comments

      (1) I think that putting Figure 6A in Figure 5 would be helpful for the readers, so that they understand that the mutations converge in the same pathway.

      This is now done.

      (2) Page 3: While it is clear that paqr-2 regulates lipid composition, I believe that it remains unclear if it "promote the production and incorporation of PUFAs into phospholipids to restore membrane homeostasis".

      A reference was missing to support that statement. Ruiz et al. (2023) is now cited for this (ref. 7).

      (3) C. elegans is extremely rich in EPA (see for example DOI: 10.3390/jcm5020019), but the lipidomics data in this study rather suggest that oleic acid is predominant. I recommend to check why this discrepancy occurs.

      OA (18:1n9) makes up only ~2%, but vaccenic acid (18:1n7) is ~21% in WT worms; EPA is slightly less at ~19% (Watts et al. 2002). These values match our lipidomics results, although we cannot distinguish between 18:1n9 and 18:1n7. See also our answer to Reviewer #1, comment 1.

      (4) Abstract: The authors write that mammals do not synthesize PUFAs, which is almost correct, but they still produce the PUFA mead acid. Thus, the statement is not completely right.

      Didn't know that! From literature, it is our understanding that mammals synthesize mead acid during FA deficiency but not in normal conditions, so they are not regularly producing mead acid. We have now updated the introduction:

      "An exception to this exists during severe essential fatty acid deficiency when mammals can synthesize mead acid (20:3n9), though this is not a common occurrence [11]"

      (5) Page 10: Eicosanoids are C20 lipid mediators, thus those produced from docosahexaenoic acid are not eicosanoids. Correct the statement.

      We thank the reviewer for pointing this out. We now write:

      "EPA and DHA, being long chain PUFAs, should have similar fluidizing effects on membrane properties (though in vitro experiments challenge this view [78]), and both can serve as precursors of eicosanoids or docosanoids, particularly inflammatory ones [79]."

      (6) Page 7: "hif-1(et69) is similarly able to suppress fat-2(wa17) when ftn-2 is knocked out" I am not sure that the data agrees with this statement, and it is unclear what we can conclude from such observation.

      Fig. 2D shows that ftn-2(et68) suppresses fat-2(wa17) even in the presence of a hif-1(ok2654) null allele, showing that no HIF-1 function is required once ftn-2 is mutated. Conversely, Fig S2E shows that combining both the hif-1(et69) and the ftn-2(ok404) null allele also suppresses fat-2(wa17) (the worms do not fully reach N2 length, but they are significantly longer and were fertile adults); this is merely the expected outcome if the pathway converges on loss of ftn-2 function, though other interpretations could be possible from this experiment alone.

      (7) S3 Fig: in panel B, is the last column ftn-2;egl-9 mutant? I would imagine that it is ftn-2;fat-2.

      We thank the reviewer for pointing this out. This has been corrected.

      (8) Fig 6B, how many times has this experiment been done?

      With these exact conditions (6h and 20h hypoxia) and order of strains the blot was done once, but the blot overall was done 5 times. We now added another replicate in Fig. S3A.

      Note also that a few minor modifications have been made throughout the text, which can be seen in the Word file with tracked changes.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Response to the Joint Public Review:

      We are indebted to eLife’s reviewing process for helping us improve our manuscript and for highlighting that our study provides new molecular insights into SFT pathogenesis.  

      Response to Reviewers:

      (1) The authors state that "NAB2-STAT6 localization is exclusively driven by EGR1 binding" yet WT1 motifs are also consistently enriched. Can you please touch upon the potential involvement of WT1 (or lack thereof, and why)?

      Our data suggest that EGR1 is the primary driver of NAB2-STAT6 localization. In fact, EGR1 is the most significantly enriched motif (Fig. 4) at NAB2-STAT6 binding sites and we detect an interaction between the fusion protein and EGR1 (Fig. 5). Conversely, we did not identify an interaction between NAB2-STAT6 and WT1. However, WT1 also belongs to the C2H2 zinc finger subclass and recognizes a motif bearing striking similarities to the EGR1/2 consensus. EGR1 has been previously described to bind WT1 motifs and to function as an activator of WT1 targets (as opposed to WT1 repressive abilities). See https://www.jbc.org/article/S0021-9258(20)74720-4/fulltext and https://www.sciencedirect.com/science/article/pii/S0378111901005935.

      (2) In the description of Figure 5C the authors observe nuclear staining of both NAB2 and STAT6 following NAB2-STAT6 fusion induction. They interpret this as the fusion stimulates nuclear translocation of endogenous NAB2. This statement can only be rigorously made if the authors can unequivocally demonstrate that their antibody exclusively detects endogenous NAB2 and not the NAB2 portion of the fusion. As presented, a more likely interpretation is that the NAB2 staining detects NAB2-STAT6 fusion protein. Since there is some cytoplasmic NAB2 signal still present, the findings in Figure 5c do not support nor disprove nuclear translocation of endogenous NAB2. It may be prudent to remove this section. Figure 5B is currently the best direct evidence of nuclear translocation.

      We agree with the reviewer that Fig. 5C does not rigorously show that NAB2-STAT6 fusion proteins drag endogenous NAB2 into the nucleus. The immunostaining reveals that wt NAB2 localization is overwhelmingly cytoplasmic at steady-state conditions (and prior to expression of the fusion protein). Instead, Figure 5B shows that endogenous NAB2 translocates to the nucleus upon NAB2-STAT6 expression. Additionally, Figure 5A (along with Suppl. Fig. 5 E-F) demonstrates that endogenous NAB2 co-precipitates with NAB2-STAT6 fusions in nuclear extracts of U2OS and HEK293T cells. We have rephrased the paragraph accordingly.

      (3) Figure 5D: for the interpretation of the presented data to hold up, namely, NAB1 nuclear translocation upon NAB2-STAT6 expression, it is important to demonstrate that NAB1 antibodies do not cross-react with NAB2 given the similarity between NAB1 and NAB2. Without such control, another likely interpretation of the results in Figure 5D is that NAB1 antibody detects the NAB2 portion of the overexpressed fusion protein. This needs to be acknowledged in the text.

      We had similar concerns and therefore confirmed by immunoblot that the NAB1 antibody does not cross-react with NAB2 (see figure below). We overexpressed FLAG-NAB2, HA-NAB1 and GFP constructs in HEK293T cells, then performed immunoprecipitation with either HA or FLAG from whole cell extracts, followed by western blot using anti-NAB2 and anti-NAB1 polyclonal antibodies. We did not observe cross-reactivity of these antibodies. We have acknowledged the antibody validation in the revised text.

      Author response image 1.

      (4) Also, to support the notion that NAB2-STAT6 fusion promotes nuclear translocation of the entire complex, an imaging approach detecting EGR1 similar to Figure 5C-D would be helpful. EGR1 staining also avoids the potential pitfall of NAB1/2 antibodies detecting NAB2-STAT6 overexpressed fusion instead of endogenous proteins.

      We agree with the reviewer that this would be a helpful approach. Unfortunately, none of the commercially available EGR1 antibodies that we tested were suitable for immunocytochemistry, as they either failed to show a proper signal or were marred by high nonspecific background signal.

      (5) The authors found increased mRNA expression of certain cytokines and secreted neuropeptides in SFTs. While this may be consistent with a secretory phenotype, additional evidence such as detection of elevated levels of these proteins in tumor lysates or in culture media is necessary to formally make this claim. Please rephrase.

      We have rephrased our claims as suggested. The revised text is now as follows: “We also identified a distinct secretory gene signature associated with SFTs. In fact, IGF2 is the most upregulated gene, via activation of an intronic enhancer by EGR1. IGF2 was pinpointed as the cause of hypoglycemia occurring in a very small subset of SFTs (Doege–Potter syndrome)(52). Our data suggest that IGF2 (and IGF1) upregulation is a common feature of all SFTs. In addition to insulin-like growth factors, SFTs may secrete a host of peptides with diverse functions in neuronal processes, chemotaxis, and growth stimulation. The previously unrecognized neuronal features and the putative secretory phenotype of SFTs set them apart from mesenchymal malignancies and relate them to neuroendocrine malignancies such as pheochromocytoma, oligodendroglioma and neuroblastoma.”

      (6) GSEA with 500 randomly selected genes from target datasets needs a more detailed description to clarify the method.

      To improve clarity, we added the following description: “Gene set enrichment analysis (GSEA) was done with 500 randomly selected genes from the given set of genes across the C2 collection of the human molecular signatures database or custom signatures using the GSEA function in the clusterProfiler package in R (v4.6.2).”
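      The random selection step described above can be illustrated with a minimal sketch. It covers only the subsampling of 500 genes and is not the enrichment analysis itself, which the quoted text attributes to clusterProfiler in R. The function name `subsample_genes`, the fixed seed, and the toy gene identifiers are assumptions made for illustration and do not come from the manuscript.

```python
import random
from typing import List, Sequence


def subsample_genes(genes: Sequence[str], n: int = 500, seed: int = 0) -> List[str]:
    """Draw up to n distinct genes at random from a target gene set.

    Hypothetical helper: the seed and the sorting are illustrative choices
    that make the draw reproducible, not details taken from the manuscript.
    """
    unique = sorted(set(genes))   # de-duplicate and fix the order
    if len(unique) <= n:          # small sets are returned whole
        return unique
    rng = random.Random(seed)     # local generator, leaves global state alone
    return sorted(rng.sample(unique, n))


if __name__ == "__main__":
    # Toy target set of 2,000 made-up gene identifiers.
    targets = [f"gene_{i}" for i in range(2000)]
    picked = subsample_genes(targets)
    print(len(picked), picked[:3])
```

      Fixing the seed is one way to make such a random draw repeatable across runs; whether the original analysis did so is not stated in the response.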

      (7) In the IP-MS description, please double check the NaCl concentration in the second extraction step - 0.5mM seems low. Also, in the IP part, a buffer recipe appears to have been incorrectly pasted.

      We thank the reviewer for identifying this typo. Indeed, we used 0.5 M NaCl instead of 0.5 mM. We have corrected the co-IP buffer recipe accordingly.
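      As a rough sanity check on why the unit matters, the two concentrations differ by a factor of 1,000 in the mass of salt per litre. The short sketch below works this out using the standard molar mass of NaCl (58.44 g/mol). The helper name `nacl_grams_per_litre` is hypothetical, and the calculation is illustrative only; it is not part of the authors' protocol.

```python
NACL_MOLAR_MASS_G_PER_MOL = 58.44  # Na (22.99) + Cl (35.45)


def nacl_grams_per_litre(molar_concentration: float) -> float:
    """Convert an NaCl concentration in mol/L to grams per litre."""
    return molar_concentration * NACL_MOLAR_MASS_G_PER_MOL


if __name__ == "__main__":
    high_salt = nacl_grams_per_litre(0.5)     # 0.5 M, the corrected value
    typo_value = nacl_grams_per_litre(0.5e-3)  # 0.5 mM, the original typo
    print(f"0.5 M  NaCl = {high_salt:.2f} g/L")
    print(f"0.5 mM NaCl = {typo_value:.5f} g/L")
```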

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment:

      This important study represents a comprehensive computational analysis of Plasmodium falciparum gene expression, with a focus on var gene expression, in parasites isolated from patients; it assesses changes that occur as the parasites adapt to short-term in vitro culture conditions. The work provides technical advances to update a previously developed computational pipeline. Although the findings of the shifts in the expression of particular var genes have theoretical or practical implications beyond a single subfield, the results are incomplete and the main claims are only partially supported.

      The authors would like to thank the reviewers and editors for their insightful and constructive assessment. We particularly appreciate the statement that our work provides a technical advance of our computational pipeline given that this was one of our main aims. To address the editorial criticisms, we have rephrased and restructured the manuscript to ensure clarity of results and to support our main claims. For the same reason, we removed the var transcript differential expression analysis, as this led to confusion.

      Public Reviews:

      Reviewer #1:

      The authors took advantage of a large dataset of transcriptomic information obtained from parasites recovered from 35 patients. In addition, parasites from 13 of these patients were reared for 1 generation in vitro, 10 for 2 generations, and 1 for a third generation. This provided the authors with a remarkable resource for monitoring how parasites initially adapt to the environmental change of being grown in culture. They focused initially on var gene expression due to the importance of this gene family for parasite virulence, then subsequently assessed changes in the entire transcriptome. Their goal was to develop a more accurate and informative computational pipeline for assessing var gene expression and secondly, to document the adaptation process at the whole transcriptome level.

      Overall, the authors were largely successful in their aims. They provide convincing evidence that their new computational pipeline is better able to assemble var transcripts and assess the structure of the encoded PfEMP1s. They can also assess var gene switching as a tool for examining antigenic variation. They also documented potentially important changes in the overall transcriptome that will be important for researchers who employ ex vivo samples for assessing things like drug sensitivity profiles or metabolic states. These are likely to be important tools and insights for researchers working on field samples.

      One concern is that the abstract highlights "Unpredictable var gene switching..." and states that "Our results cast doubt on the validity of the common practice of using short-term cultured parasites...". This seems somewhat overly pessimistic with regard to var gene expression profiling and does not reflect the data described in the paper. In contrast, the main text of the paper repeatedly refers to "modest changes in var gene expression repertoire upon culture" or "relatively small changes in var expression from ex vivo to culture", and many additional similar assessments. On balance, it seems that transition to culture conditions causes relatively minor changes in var gene expression, at least in the initial generations. The authors do highlight that a few individuals in their analysis showed more pronounced and unpredictable changes, which certainly warrants caution for future studies but should not obscure the interesting observation that var gene expression remained relatively stable during transition to culture.

      Thank you for this comment. We were happy to modify the wording in the abstract to have consistency with the results presented by highlighting that modest but unpredictable var gene switching was observed while substantial changes were found in the core transcriptome. Moreover, any differences observed in core transcriptome between ex vivo samples from naïve and pre-exposed patients are diminished after one cycle of cultivation making inferences about parasite biology in vivo impossible.

      Therefore, in our opinion, the statement in the last sentence is well supported by the data presented.

      Line 43–47: “Modest but unpredictable var gene switching and convergence towards var2csa were observed in culture, along with differential expression of 19% of the core transcriptome between paired ex vivo and generation 1 samples. Our results cast doubt on the validity of the common practice of using short-term cultured parasites to make inferences about in vivo phenotype and behaviour.” Nevertheless, we would like to note that this study was in a unique position to assess changes at the individual patient level as we had successive parasite generations. This comparison is not done in most cross-sectional studies and therefore these small, unpredictable changes in the var transcriptome are missed.

      Reviewer #2:

      In this study, the authors describe a pipeline to sequence expressed var genes from RNA sequencing that improves on a previous one that they had developed. Importantly, they use this approach to determine how var gene expression changes with short-term culture. Their finding of shifts in the expression of particular var genes is compelling and casts some doubt on the comparability of gene expression in short-term culture versus var expression at the time of participant sampling. The authors appear to overstate the novelty of their pipeline, which should be better situated within the context of existing pipelines described in the literature.

      Other studies have relied on short-term culture to understand var gene expression in clinical malaria studies. This study indicates the need for caution in over-interpreting findings from these studies.

      The novel method of var gene assembly described by the authors needs to be appropriately situated within the context of previous studies. They neglect to mention several recent studies that present transcript-level novel assembly of var genes from clinical samples. It is important for them to situate their work within this context and compare and contrast it accordingly. A table comparing all existing methods in terms of pros and cons would be helpful to evaluate their method.

      We are grateful for this suggestion and agree that a table comparing the pros and cons of all existing methods would be helpful for the general reader and also highlight the key advantages of our new approach. A table comparing previous methods for var gene and transcript characterisation has been added to the manuscript and is referenced in the introduction (line 107).

      Author response table 1.

      Comparison of previous var assembly approaches based on DNA- and RNA-sequencing.

      Reviewer #3:

      This work focuses on the important problem of how to access the highly polymorphic var gene family using short-read sequence data. The approach that was most successful, and utilized for all subsequent analyses, employed a different assembler from their prior pipeline, and impressively, more than doubles the N50 metric.

      The authors then endeavor to utilize these improved assemblies to assess differential RNA expression of ex vivo and short-term cultured samples, and conclude that their results "cast doubt on the validity" of using short-term cultured parasites to infer in vivo characteristics. Readers should be aware that the various approaches to assess differential expression lack statistical clarity and appear to be contradictory. Unfortunately, there is no attempt to describe the rationale for the different approaches and how they might inform one another.

      It is unclear whether adjusting for life-cycle stage as reported is appropriate for the var-only expression models. The methods do not appear to describe what type of correction variable (continuous/categorical) was used in each model, and there is no discussion of the impact on var vs. core transcriptome results.

      We agree with the reviewer that the different methods and results of the var transcriptome analysis can be difficult to reconcile. To address this, we have included a summary table with a brief description of the rationale and results of each approach in our analysis pipeline.

      Author response table 2.

      Summary of the different levels of analysis performed to assess the effect of short-term parasite culturing on var and core gene expression, their rationale, method, results, and interpretation.

      Additionally, the var transcript differential expression analysis was removed from the manuscript, because this study was in a unique position to perform a more focused analysis of var transcriptional changes across paired samples, meaning the per-patient approach was more suitable. This allowed for changes in the var transcriptome to be identified that would have gone unnoticed in the traditional differential expression analysis.

      We thank the reviewer for this important comment about adjusting for life cycle stage. Var gene expression is highly stage-dependent, so any quantitative comparison between samples does need adjustment for developmental stage. All life cycle stage adjustments were done using the mixture model proportions to be consistent with the original paper, as described in the results and methods sections:

      • Line 219–221: “Due to the potential confounding effect of differences in stage distribution on gene expression, we adjusted for developmental stage determined by the mixture model in all subsequent analyses.”

      • Line 722–725: “Var gene expression is highly stage dependent, so any quantitative comparison between samples needs adjustment for developmental stage. The life cycle stage proportions determined from the mixture model approach were used for adjustment.”

      The rank-expression analysis did not have adjustment for life cycle stage as the values were determined as a percentage contribution to the total var transcriptome. The var group level and the global var gene expression analyses were adjusted for life cycle stages, by including them as an independent variable, as described in the results and methods sections.
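
      The rank-expression values described above are percentage contributions to the total var transcriptome, which is why they need no separate stage adjustment. A minimal sketch of that calculation follows; the function name, transcript labels and counts are hypothetical and are not taken from the manuscript.

```python
def var_expression_profile(counts):
    """Rank var transcripts by their percentage share of total var expression.

    `counts` maps a transcript name to its (hypothetical) normalised read count.
    """
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("total var expression must be positive")
    shares = {name: 100.0 * value / total for name, value in counts.items()}
    # Highest share first, so position in the list is the expression rank.
    return sorted(shares.items(), key=lambda item: item[1], reverse=True)


# Illustrative values only.
example_counts = {"var_B": 300.0, "var_A": 600.0, "var_C": 100.0}
profile = var_expression_profile(example_counts)
```

      Because each value is a share of the sample's own var total, the profile of one sample can be compared with the profile of its paired sample without rescaling.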

      Var group expression:

      • Line 321–326: “Due to these results, the expression of group A var genes vs. group B and C var genes was investigated using a paired analysis on all the DBLα (DBLα1 vs DBLα0 and DBLα2) and NTS (NTSA vs NTSB) sequences assembled from ex vivo samples and across multiple generations in culture. A linear model was created with group A expression as the response variable, the generation and life cycle stage as independent variables and the patient information included as a random effect. The same was performed using group B and C expression levels.”

      • Line 784–787: “DESeq2 normalisation was performed, with patient identity and life cycle stage proportions included as covariates and differences in the amounts of var transcripts of group A compared with groups B and C assessed (Love et al., 2014). A similar approach was repeated for NTS domains.”

      Global var gene expression:

      • Line 342–347: “A linear model was created (using only paired samples from ex vivo and generation 1) (Supplementary file 1) with proportion of total gene expression dedicated to var gene expression as the response variable, the generation and life cycle stage as independent variables and the patient information included as a random effect. This model showed no significant differences between generations, suggesting that differences observed in the raw data may be a consequence of small changes in developmental stage distribution in culture.”

      • Line 804–806: “Significant differences in total var gene expression were tested by constructing a linear model with the proportion of gene expression dedicated to var gene expression as the response variable, the generation and life cycle stage as an independent variables and the patient identity included as a random effect.”
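
      The model quoted in these two passages could be written out as follows. This is a sketch only: the symbols are notation introduced here, and treating the patient effect as a random intercept with normally distributed errors is an assumption, since the quoted text does not specify the exact form.

```latex
y_{ig} = \beta_0 + \beta_1\,\mathrm{generation}_{g}
       + \boldsymbol{\beta}_2^{\top}\mathbf{s}_{ig} + u_i + \varepsilon_{ig},
\qquad u_i \sim \mathcal{N}(0, \sigma_u^2),
\quad \varepsilon_{ig} \sim \mathcal{N}(0, \sigma^2)
```

      Here y_ig is the proportion of total gene expression dedicated to var gene expression for patient i at generation g, s_ig holds the life cycle stage proportions from the mixture model, and u_i is the patient-level random effect. Under this reading, a non-significant estimate of β_1 corresponds to the reported finding of no significant difference between generations after adjusting for stage.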

      The analysis of the conserved var gene expression was adjusted for life cycle stage:

      • Line 766–768: “For each conserved gene, Salmon normalised read counts (adjusted for life cycle stage) were summed and expression compared across the generations using a pairwise Wilcoxon rank test.”

      And life cycle stage estimates were included as covariates in the design matrix for the domain differential expression analysis:

      • Line 771–773: “DESeq2 was used to test for differential domain expression, with five expected read counts in at least three patient isolates required, with life cycle stage and patient identity used as covariates.”
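
      The count requirement in this passage can be illustrated with a small filter. This is a sketch and not DESeq2 itself; it assumes that "five expected read counts" means a count of at least five, and the domain names and counts are hypothetical.

```python
def passes_count_filter(counts_per_isolate, min_count=5, min_isolates=3):
    """True if at least `min_isolates` isolates have a count of `min_count` or more."""
    supported = sum(1 for count in counts_per_isolate if count >= min_count)
    return supported >= min_isolates


def filter_domains(count_matrix, min_count=5, min_isolates=3):
    """Keep only the domains (rows) that pass the count filter."""
    return {
        domain: counts
        for domain, counts in count_matrix.items()
        if passes_count_filter(counts, min_count, min_isolates)
    }


# Illustrative expected read counts, one column per patient isolate.
example_matrix = {
    "domain_kept": [12.0, 7.5, 5.0, 0.0],     # three isolates reach the threshold
    "domain_dropped": [40.0, 4.9, 0.0, 6.0],  # only two isolates reach the threshold
}
kept = filter_domains(example_matrix)
```

      Filtering of this kind removes rows that are zero or near zero in most isolates before testing, which is one common way to limit the sparsity problem raised by the reviewer.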

      Reviewer #1:

      1. In the legend to Figure 1, the authors cite "Deitsch and Hviid, 2004" for the classification of different var gene types. This is not the best reference for this work. Better citations would be Kraemer and Smith, Mol Micro, 2003 and Lavstsen et al, Malaria J, 2003.

      We agree and have updated the legend in Figure 1 with these references, consistent with the references cited in the introduction.

      2. In Figures 2 and 3, each of the boxes in the flow charts is largely filled with empty space while the text is nearly too small to read. Adjusting the size of the text would improve legibility.

      We have increased the size of the text in these figures.

      3. My understanding of the computational method for assessing global var gene expression indicates an initial step of identifying reads containing the amino acid sequence LARSFADIG. It is worth noting that VAR2CSA does not contain this motif. Will the pipeline therefore miss expression of this gene, and if so, how does this affect the assessment of global var gene expression? This seems relevant given that the authors detect increased expression of var2csa during adaptation to culture.

      To address this question, we have added an explanation in the methods section to better explain our analysis. Var2csa was not captured in the global var gene expression analysis, but was analyzed separately because of its unique properties (conservation, proposed role in regulating var gene switching, slightly divergent timing of expression, translational repression).

      • Line 802/3: “Var2csa does not contain the LARSFADIG motif, hence this quantitative analysis of global var gene expression excluded var2csa (which was analysed separately).”

      4. In Figures 4 and 7, panels a and b display virtually identical PCA plots, with the exception that panel a displays more generations. Why are both panels included? There doesn't appear to be any additional information provided by panel b.

      We agree and have removed Figure 7b for the core transcriptome PCA as it did not provide any new information. The var transcript differential analysis (displayed in Figure 4) has been removed from the manuscript.

      5. On lines 560-567, the authors state "However, the impact of short-term culture was the most apparent at the var transcript level and became less clear at higher levels." What are the high levels being referred to here?

      We have replaced this sentence to make it clearer what the different levels are (global var gene expression, var domain and var type).

      • Line 526/7: “However, the impact of short-term culture was the most apparent at the var transcript level and became less clear at the var domain, var type and global var gene expression level.”

      Reviewer #2:

      The authors make no mention or assessment of previously published var gene assembly methods from clinical samples that focus on genomic or transcriptomic approaches. These include:

      https://pubmed.ncbi.nlm.nih.gov/28351419/

      https://pubmed.ncbi.nlm.nih.gov/34846163/

      These methods should be compared to the method for var gene assembly outlined by the co-authors, especially as the authors say that their method "overcomes previous limitations and outperforms current methods" (128-129). The second reference above appears to be a method to measure var expression in clinical samples and so should be particularly compared to the approach outlined by the authors.

      Thank you for pointing this out. We have included the second reference in the introduction of our revised manuscript, where we refer to var assembly and quantification from RNA-sequencing data. We abstained from including the first paper in this paragraph (Dara et al., 2017) as it describes a var gene assembly pipeline and not a var transcript assembly pipeline.

      • Line 101–105: “While approaches for var assembly and quantification based on RNA-sequencing have recently been proposed (Wichers et al., 2021; Stucke et al., 2021; Andrade et al., 2020; Tonkin-Hill et al., 2018, Duffy et al., 2016), these still produce inadequate assembly of the biologically important N-terminal domain region, have a relatively high number of misassemblies and do not provide an adequate solution for handling the conserved var variants (Table S1).”

      Additionally, we have updated the manuscript with a table (Table S1) comparing these two methods plus other previously used var transcript/gene assembly approaches (see comment to the public reviews).

      But to address this particular comment in more detail, the first paper (Dara et al., 2017) is a var gene assembly pipeline and not a var transcript assembly pipeline. It is based on assembling var exon 1 from unfinished whole genome assemblies of clinical samples and requires a prior step for filtering out human DNA. The authors used two different assemblers, Celera for short reads (which is no longer maintained) and Sprai for long reads (>2000bp), but found that Celera performed worse than Sprai, and subsequently used Sprai assemblies. Therefore, this method does not appear to be suitable for assembling short reads from RNA-seq.

      The second paper (Stucke et al., 2021) focusses more on enriching for parasite RNA, which precedes assembly. The capture method they describe would complement downstream analysis of var transcript assembly with our pipeline. Their assembly pipeline is similar to ours, as they also performed de novo assembly on all P. falciparum mapping and non-human mapping reads and used the same assembler (but with different parameters). They clustered sequences using the same approach but at 90% sequence identity, as opposed to the 99% sequence identity used in our approach. Stucke et al. also use 500nt as a length cut-off, whereas our approach applies more stringent filtering. They annotated their de novo assembled transcripts with the known amino acid sequences used in their design of the capture array; our approach does not assume prior information on the var transcripts. Finally, their approach was validated only for its ability to recover the most highly expressed var transcript in 6 uncomplicated malaria samples, and they did not assess misassemblies in their approach.

      For the methods (619–621), were erythrocytes isolated by Ficoll gradient centrifugation at the time of collection or later?

      We have updated the methods section to clarify this.

      • Line 586–588: “Blood was drawn and either immediately processed (#1, #2, #3, #4, #11, #12, #14, #17, #21, #23, #28, #29, #30, #31, #32) or stored overnight at 4°C until processing (#5, #6, #7, #9, #10, #13, #15, #16, #18, #19, #20, #22, #24, #25, #26, #27, #33).”

      Was the current pipeline and assembly method assessed for var chimeras? This should be described.

      Yes, this was quantified in the Pf 3D7 dataset and also assessed in the German traveler dataset. For the 3D7 dataset it is described in the results section and Figure S1.

      • Line 168–174: “However, we found high accuracies (> 0.95) across all approaches, meaning the sequences we assembled were correct (Figure 2 – Figure supplement 1b). The whole transcript approach also performed the best when assembling the lower expressed var genes (Figure 2 – Figure supplement 1e) and produced the fewest var chimeras compared to the original approach on P. falciparum 3D7. Fourteen misassemblies were observed with the whole transcript approach compared to 19 with the original approach (Table S2). This reduction in misassemblies was particularly apparent in the ring-stage samples.”

      • Figure S1:

      Author response image 1.

      Performance of novel computational pipelines for var assembly on Plasmodium falciparum 3D7: The three approaches (whole transcript: blue, domain approach: orange, original approach: green) were applied to a public RNA-seq dataset (ENA: PRJEB31535) of the intra-erythrocytic life cycle stages of 3 biological replicates of cultured P. falciparum 3D7, sampled at 8-hour intervals up until 40 hours post infection (hpi) and then at 4-hour intervals up until 48 hpi (Wichers et al., 2019). Boxplots show the data from the 3 biological replicates for each time point in the intra-erythrocytic life cycle: a) alignment scores for the dominantly expressed var gene (PF3D7_0712600), b) accuracy scores for the dominantly expressed var gene (PF3D7_0712600), c) number of contigs to assemble the dominant var gene (PF3D7_0712600), d) alignment scores for a middle ranking expressed var gene (PF3D7_0937800), e) alignment scores for the lowest expressed var gene (PF3D7_0200100). The first best blast hit (significance threshold = 1e-10) was chosen for each contig. The alignment score was used to evaluate each method. The alignment score represents √(accuracy × recovery). The accuracy is the proportion of bases that are correct in the assembled transcript and the recovery reflects what proportion of the true transcript was assembled. Assembly completeness of the dominant var gene (PF3D7_0712600, length = 6648nt) for the three approaches was assessed for each biological replicate: f) biological replicate 1, g) biological replicate 2, h) biological replicate 3. Dotted lines represent the start and end of the contigs required to assemble the var gene. Red bars represent assembled sequences relative to the whole dominant var gene sequence, where we know the true sequence (termed “reference transcript”).
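
      The alignment score in this caption combines accuracy and recovery. The sketch below reads the caption's formula as the square root of accuracy multiplied by recovery, which is an assumption about the intended grouping; the function name and the base counts in the example are hypothetical, while the reference length of 6648 nt is the one given in the caption.

```python
import math


def alignment_score(correct_bases, assembled_length, recovered_bases, reference_length):
    """Geometric mean of accuracy and recovery for one assembled transcript.

    accuracy: proportion of bases in the assembled transcript that are correct.
    recovery: proportion of the true (reference) transcript that was assembled.
    """
    if assembled_length <= 0 or reference_length <= 0:
        raise ValueError("lengths must be positive")
    accuracy = correct_bases / assembled_length
    recovery = recovered_bases / reference_length
    return math.sqrt(accuracy * recovery)


# Hypothetical contig: 98% of its bases are correct and it covers half of a
# 6648 nt reference transcript.
score = alignment_score(correct_bases=980, assembled_length=1000,
                        recovered_bases=3324, reference_length=6648)
```

      With this form, a contig that is highly accurate but covers only part of the reference is still penalised, so a high score requires both properties at once.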

      For the ex vivo samples, this has been discussed in the results section, and we have now also added this information to Table 1.

      • Line 182/3: “Remarkably, with the new whole transcript method, we observed a significant decrease (2 vs 336) in clearly misassembled transcripts with, for example, an N-terminal domain at an internal position.”

      • Table 1:

      Author response table 3.

      Statistics for the different approaches used to assemble the var transcripts. Var assembly approaches were applied to malaria patient ex vivo samples (n=32) from (Wichers et al., 2021) and statistics determined. Given are the total number of assembled var transcripts longer than 500 nt containing at least one significantly annotated var domain, the maximum length of the longest assembled var transcript in nucleotides and the N50 value, respectively. The N50 is defined as the sequence length of the shortest var contig, with all var contigs greater than or equal to this length together accounting for 50% of the total length of concatenated var transcript assemblies. Misassemblies represents the number of misassemblies for each approach. **Number of misassemblies were not determined for the domain approach due to its poor performance in other metrics.
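
      The N50 definition given in this legend can be made concrete with a short calculation. This is a minimal sketch with hypothetical contig lengths, not the code used to produce the table.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    Contigs are taken from longest to shortest; the N50 is the length of the
    contig at which the running total first reaches half of the summed length.
    """
    if not lengths:
        raise ValueError("at least one contig length is required")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical contig lengths in nucleotides (total 290, half 145).
example_n50 = n50([80, 70, 50, 40, 30, 20])
```

      In the example, the two longest contigs (80 and 70) together reach 150, which passes half of the total, so the N50 is 70. A higher N50 therefore indicates that more of the assembled sequence sits in long contigs.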

      Line 432: "the core gene transcriptome underwent a greater change relative to the var transcriptome upon transition to culture." Can this be shown statistically? It's unclear whether the difference in the sizes of the respective pools of the core genome and the var genes may account for this observation.

      We found 19% of the core transcriptome to be differentially expressed. The per patient var transcript analysis revealed individually highly variable but generally rather subtle changes in the var transcriptome. The different methods for assessing this make it difficult to statistically compare these two different results.

      The feasibility of this approach for field samples should be discussed in the Discussion.

      In the original manuscript we reflected on this already several times in the discussion (e.g., line 465/6; line 471–475; line 555–568). We now have added another two sentences at the end of the paragraph starting in line 449 to address this point. It reads now:

      • Line 442–451: “Our new approach used the most geographically diverse reference of var gene sequences to date, which improved the identification of reads derived from var transcripts. This is crucial when analysing patient samples with low parasitaemia where var transcripts are hard to assemble due to their low abundancy (Guillochon et al., 2022). Our approach has wide utility due to stable performance on both laboratory-adapted and clinical samples. Concordance in the different var expression profiling approaches (RNA-sequencing and DBLα-tag) on ex vivo samples increased using the new approach by 13%, when compared to the original approach (96% in the whole transcript approach compared to 83% in Wichers et al., 2021). This suggests the new approach provides a more accurate method for characterising var genes, especially in samples collected directly from patients. Ultimately, this will allow a deeper understanding of relationships between var gene expression and clinical manifestations of malaria.”

      MINOR

      The plural form of PfEMP1 (PfEMP1s) is inconsistently used throughout the text.

      Corrected.

      404-405: statistical test for significance?

      Thank you for this suggestion. We have performed two comparisons between the original analysis from Wichers et al., 2021 and our new whole transcript approach to test concordance of the RNA-seq approaches with the DBLα-tag approach using paired Wilcoxon tests. These comparisons suggest that our new approach has significantly increased concordance with DBLα-tag data. It might also be better at capturing all expressed DBLα domains than the original analysis (and the DBLα-tag approach), although this second difference was not statistically significant. We now describe this in the results section.

      • Line 352–361: “Overall, we found a high agreement between the detected DBLα-tag sequences and the de novo assembled var transcripts. A median of 96% (IQR: 93–100%) of all unique DBLα-tag sequences detected with >10 reads were found in the RNA-sequencing approach. This is a significant improvement on the original approach (p= 0.0077, paired Wilcoxon test), in which a median of 83% (IQR: 79–96%) was found (Wichers et al., 2021). To allow for a fair comparison of the >10 reads threshold used in the DBLα-tag approach, the upper 75th percentile of the RNA-sequencing assembled DBLα domains were analysed. A median of 77.4% (IQR: 61–88%) of the upper 75th percentile of the assembled DBLα domains were found in the DBLα-tag approach. This is a lower median percentage than the median of 81.3% (IQR: 73–98%) found in the original analysis (p= 0.28, paired Wilcoxon test) and suggests the new assembly approach is better at capturing all expressed DBLα domains.”
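
      For readers unfamiliar with the paired Wilcoxon test mentioned here, the sketch below shows how a two-sided exact signed-rank p-value can be computed for a small number of paired samples. It illustrates the type of test only and is not the authors' implementation; the function names are introduced here and the example percentages are made up.

```python
from itertools import product


def average_ranks(values):
    """Rank values from 1 upwards, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def paired_wilcoxon_exact(x, y):
    """Two-sided exact Wilcoxon signed-rank test for paired samples.

    Zero differences are dropped. Returns (W+, p-value), where W+ is the sum
    of the ranks of the positive differences.
    """
    if len(x) != len(y):
        raise ValueError("x and y must be paired")
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")
    if n > 20:
        raise ValueError("exact enumeration is only practical for small n")
    ranks = average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    expected = sum(ranks) / 2
    observed_deviation = abs(w_plus - expected)
    # Under the null hypothesis every sign pattern is equally likely, so count
    # the patterns whose statistic is at least as far from its expectation.
    extreme = 0
    for signs in product((0, 1), repeat=n):
        w = sum(r for r, s in zip(ranks, signs) if s)
        if abs(w - expected) >= observed_deviation - 1e-12:
            extreme += 1
    return w_plus, extreme / 2 ** n


# Made-up concordance percentages for five paired samples (new vs original).
w_stat, p_value = paired_wilcoxon_exact([96, 94, 100, 93, 97], [83, 79, 96, 80, 82])
```

      In this toy example every sample improves, so only the two most extreme sign patterns are as far from expectation as the observed one, which gives the smallest two-sided p-value attainable with five pairs.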

      Figure 4: The letters for the figure panels need to be added.

      The figure has been removed from the manuscript.

      Reviewer #3:

      It is difficult from Table S2 to determine how many unique var transcripts would have enough coverage to be potentially assembled from each sample. It seems unlikely that 455 distinct vars (~14 per sample) would be expressed at a detectable level for assembly. Why not DNA-sequence these samples to get the full repertoire for comparison to RNA? Why would so many distinct transcripts be yielded from fairly synchronous samples?

      We know from controlled human malaria infections of malaria-naive volunteers that most var genes present in the genomic repertoire of the parasite strain are expressed at the onset of the human blood phase (heterogenous var gene expression) (Wang et al., 2009; Bachmann et al., 2016; Wichers-Misterek et al., 2023). This pattern shifts to a more restricted, homogeneous var expression pattern in semi-immune individuals (expression of few variants) depending on the degree of immunity (Bachmann et al., 2019).

      Author response image 2.

      In this cohort, 15 first-time infections are included, which should also possess a more heterogenous var gene expression in comparison to the pre-exposed individuals, and indeed such a trend is already seen in the number of different DBLα-tag clusters found in both patient groups (see figure panel from Wichers et al., 2021; blue: first-time infections; grey: pre-exposed). Moreover, Warimwe et al., 2013 have shown that asymptomatic infections have a more homogeneous var expression in comparison to symptomatic infections. Therefore, we expect that parasites from symptomatic infections have a heterogenous var expression pattern with multiple var gene variants expressed, which we could assemble due to our high read depth and our improved var assembly pipeline, even for low expressed variants.

      Moreover, the distinct transcripts found in the RNA-seq approach were confirmed with the DBLα-tag data. In our opinion, previous approaches may have underestimated the complexity of the var transcriptome in less immune individuals.

      Mapping reads to these 455 putative transcripts and using this count matrix for differential expression analysis seems very unlikely to produce reliable results. As acknowledged on line 327, many reads will be mis-mapped, and perhaps most challenging is that most vars will not be represented in most samples. In other words, even if mapping were somehow perfect, one would expect a sparse matrix that would not be suitable for statistical comparisons between groups. This is likely why the per-patient transcript analysis doesn't appear to be consistent. I would recommend the authors remove the DE sections utilizing this approach, or add convincing evidence that the count matrix is useable.

      We agree that this is a general issue of var differential expression analysis. Therefore, we have removed the var differential expression analysis from this manuscript as the per patient approach was more appropriate for the paired samples. We validated different mapping strategies (new Figure S6) and included a paragraph discussing the problem in the results section:

      • Line 237–255: “In the original approach of Wichers et al., 2021, the non-core reads of each sample used for var assembly were mapped against a pooled reference of assembled var transcripts from all samples, as a preliminary step towards differential var transcript expression analysis. This approach returned a small number of var transcripts which were expressed across multiple patient samples (Figure 3 – Figure supplement 2a). As genome sequencing was not available, it was not possible to know whether there was truly overlap in var genomic repertoires of the different patient samples, but substantial overlap was not expected. Stricter mapping approaches (for example, excluding transcripts shorter than 1500nt) changed the resulting var expression profiles and produced more realistic scenarios where similar var expression profiles were generated across paired samples, whilst there was decreasing overlap across different patient samples (Figure 3 – Figure supplement 2b,c). Given this limitation, we used the paired samples to analyse var gene expression at an individual subject level, where we confirmed the MSP1 genotypes and alleles were still present after short-term in vitro cultivation. The per patient approach showed consistent expression of var transcripts within samples from each patient but no overlap of var expression profiles across different patients (Figure 3 – Figure supplement 2d). Taken together, the per patient approach was better suited for assessing var transcriptional changes in longitudinal samples. It has been hypothesised that more conserved var genes in field isolates increase parasite fitness during chronic infections, necessitating the need to correctly identify them (Dimonte et al., 2020, Otto et al., 2019). Accordingly, further work is needed to optimise the pooled sample approach to identify truly conserved var transcripts across different parasite isolates in cross-sectional studies.”

      • Figure S6:

      Author response image 3.

      Var expression profiles across different mapping approaches. Different mapping approaches were used to quantify the var expression profiles of each sample (ex vivo (n=13), generation 1 (n=13), generation 2 (n=10) and generation 3 (n=1)). The pooled sample approach in which all significantly assembled var transcripts (at least 1500nt and containing 3 significantly annotated var domains) across samples were combined into a reference and redundancy was removed using cd-hit (at sequence identity = 99%) (a–c). The non-core reads of each sample were mapped to this pooled reference using a) Salmon, b) bowtie2 filtering for uniquely mapping paired reads with MAPQ and c) bowtie2 filtering for uniquely mapping paired reads with a MAPQ > 20. d) The per patient approach was applied. For each patient, the paired ex vivo and in vitro samples were analysed. The assembled var transcripts (at least 1500nt and containing 3 significantly annotated var domains) across all the generations for a patient were combined into a reference, redundancy was removed using cd-hit (at sequence identity = 99%), and expression was quantified using Salmon. Pie charts show the var expression profile with the relative size of each slice representing the relative percentage of total var gene expression of each var transcript. Different colours represent different assembled var transcripts with the same colour code used across a–d.

      For future cross-sectional studies a per patient analysis that attempts to group per patient assemblies on some unifying structure (e.g., domain, homology blocks, domain cassettes etc) should be performed.

      Line 304. I don't understand the rationale for comparing naïve vs. prior-exposed individuals at ex-vivo and gen 1 timepoints to provide insights into how reliable cultured parasites are as a surrogate for var expression in vivo. Further, the next section (per patient) appears to confirm the significant limitation of the 'all sample analysis' approach. The conclusion on line 319 is not supported by the results reported in figures S9a and S9b, nor is the bold conclusion in the abstract about "casting doubt" on experiments utilizing culture adapted

      We have removed this comparison from the manuscript due to the inconsistencies with the var per patient approach. However, the conclusion in the abstract has been rephrased to reflect the fact that we observed 19% of the core transcriptome to be differentially expressed within one cycle of cultivation.

      Line 372/391 (and for the other LMM descriptions). I believe you mean to say response variable, rather than explanatory variable. Explanatory variables are on the right hand side of the equation.

      Thank you for spotting this inaccuracy. We have changed it to “response variable” (line 324, line 343, line 805).

      Line 467. Similar to line 304, why would comparisons of naïve vs. prior-exposed be informative about surrogates for in vivo studies? Without a gold-standard for what should be differentially expressed between naïve and prior-exposed in vivo, it doesn't seem prudent to interpret a drop in the number of DE genes for this comparison in generation 1 as evidence that biological signal for this comparison is lost. What if the generation 1 result is actually more reflective of the true difference in vivo, but the ex vivo samples are just noisy? How do we know? Why not just compare ex vivo vs generation 1/2 directly (as done in the first DE analysis), and then you can comment on the large number of changes as samples are less and less proximal to in vivo?

      In the original paper (Wichers et al., 2021), there were differences between the core transcriptome of naïve vs previously exposed patients. However, these differences appeared to diminish in vitro, suggesting the in vivo core transcriptome is not fully maintained in vitro.

      We have added a sentence explaining the reasoning behind this analysis in the results section:

      • Lines 414–423: “In the original analysis of ex vivo samples, hundreds of core genes were identified as significantly differentially expressed between pre-exposed and naïve malaria patients. We investigated whether these differences persisted after in vitro cultivation. We performed differential expression analysis comparing parasite isolates from naïve (n=6) vs pre-exposed (n=7) patients, first between their ex vivo samples, and then between the corresponding generation 1 samples. Interestingly, when using the ex vivo samples, we observed 206 core genes significantly upregulated in naïve patients compared to pre-exposed patients (Figure 7 – Figure supplement 3a). Conversely, we observed no differentially expressed genes in the naïve vs pre-exposed analysis of the paired generation 1 samples (Figure 7 – Figure supplement 3b). Taken together with the preceding findings, this suggests one cycle of cultivation shifts the core transcriptomes of parasites to be more alike each other, diminishing inferences about parasite biology in vivo.”

      Overall, I found the many DE approaches very frustrating to interpret coherently. If not dropped in revision, the reader would benefit from a substantial effort to clarify the rationale for each approach, and how each result fits together with the other approaches and builds to a concise conclusion.

      We agree that the manuscript contains many different complex layers of analysis and that it is therefore important to explain the rationale for each approach. We have therefore now included summary Table 3 (see comment to public review). Additionally, we have removed the var transcript differential expression analysis due to its limitations, which we hope has streamlined our manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The current manuscript provides strong evidence that the molecular function of SLC35G1, an orphan human SLC transporter, is citrate export at the basolateral membrane of intestinal epithelial cells. Multiple lines of evidence, including radioactive transport experiments, immunohistochemical staining, gene expression analysis, and siRNA knockdown are combined to deduce a model of the physiological role of this transporter.

      Strengths:

      The experimental approaches are comprehensive, and together establish a strong model for the role of SLC35G1 in citrate uptake. The observation that chloride inhibits uptake suggests an interesting mechanism that exploits the difference in chloride concentration across the basolateral membrane.

      Weaknesses:

      Some aspects of the results would benefit from a more thorough discussion of the conclusions and/or model.

      For example, the authors find that SLC35G1 prefers the dianionic (singly protonated) form of citrate, and rationalize this finding by comparison with the substrate selectivity of the citrate importer NaDC1. However, this comparison has weaknesses when considering the physiological pH for SLC35G1 and NaDC1. NaDC1 binds citrate at a pH of ~5.4 (the pKa of citrate is 5.4, so there is a lot of dianionic citrate present under physiological circumstances). SLC35G1 binds citrate under pH conditions of ~7.5, where a very small amount of dianionic citrate is present. The data clearly show a pH dependence of transport, and the authors rule out proton coupling, but the discrepancy between the pH dependence and the physiological expectations should be addressed/commented on.

      Thank you for your insightful comment. Citrate exists mostly in its trianionic form under near neutral pH conditions in biological fluids, as you pointed out. Its dianionic form represents only a small portion (about 1/100) of total citrate due to the pKa. However, significant SLC35G1-specific uptake was observed under near neutral pH conditions (Figure 1G). Therefore, although SLC35G1-mediated citrate transport is less efficient under physiologically relevant near neutral pH conditions, it could still play a role particularly in the intestinal absorption process, in which the concentration gradient of dianionic citrate could be maintained by continuous supply by NaDC1-mediated apical uptake.
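
      The "about 1/100" figure above can be checked with the Henderson-Hasselbalch relation. The following is only a minimal sketch: it assumes the pKa of 5.4 quoted in the review comment, treats pH 7.4 and pH 5.5 as illustrative near-neutral and acidic conditions, and considers only the dianion/trianion equilibrium. The function name is hypothetical.

```python
def dianion_fraction(pH: float, pKa: float) -> float:
    """Fraction of (dianion + trianion) citrate present as the dianion.

    Henderson-Hasselbalch: [trianion] / [dianion] = 10 ** (pH - pKa).
    Assumption: only the final deprotonation step is considered.
    """
    ratio_tri_to_di = 10 ** (pH - pKa)
    return 1.0 / (1.0 + ratio_tri_to_di)


# Assumed inputs: pKa = 5.4 as quoted in the review; pH values are illustrative.
near_neutral = dianion_fraction(pH=7.4, pKa=5.4)
acidic = dianion_fraction(pH=5.5, pKa=5.4)

print(f"dianion fraction at pH 7.4: {near_neutral:.4f}")
print(f"dianion fraction at pH 5.5: {acidic:.4f}")
```

      Under these assumptions the dianion is roughly 1% of total citrate at pH 7.4 and a little under half at pH 5.5, which is in line with the lower but still measurable uptake at near neutral pH.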

      The rationale for the series of compounds tested in Figure 1F, which includes metabolites with carboxylate groups, a selection of drugs including anion channel inhibitors and statins, and bile acids, is not described. Moreover, the lessons drawn from this experiment are vague and should be expanded upon. It is not clear what, if anything, the compounds that reduce citrate uptake have in common.

      Thank you for highlighting the need for clarity regarding the compounds tested in Figure 1F. The tested compounds were TCA cycle intermediates (fumarate, α-ketoglutarate, malate, pyruvate, and succinate) as substrate candidate carboxylates analogous to citrate, diverse anionic compounds (BSP, DIDS, probenecid, pravastatin, and taurocholate) as those that might be substrates or inhibitors, and diverse cationic compounds (cimetidine, quinidine, and verapamil) as those that are least likely to interact with SLC35G1. Among them, certain anionic compounds significantly reduced SLC35G1-specific citrate uptake, suggesting that they may interact with SLC35G1. However, we could not identify any structural features commonly shared by these compounds, except that they have anionic moieties. We acknowledge that it requires further elaboration to clarify such structural features. We have revised the relevant section on p. 3 (line 25 - 32) to include these.

      The transporter is described as a facilitative transporter, but this is not established definitively. For example, another possibility could involve coupling citrate transport to another substrate, possibly even chloride ion.

      Thank you for your insightful comment regarding the nature of SLC35G1's transport mechanism. While we have described SLC35G1 as a facilitative transporter based on our current data, we acknowledge that this has not been definitively proven, as you pointed out, and we cannot exclude the possibility that its sensitivity to extracellular Cl⁻ might imply its operation as a citrate/Cl⁻ exchanger. To examine the possibility, we would need to manipulate the chloride ion gradient across the plasma membrane. Particularly, generating an outward Cl⁻ gradient to see if it could enhance citrate uptake could be a potential strategy. However, current techniques do not allow us to effectively generate the Cl⁻ gradient, thus preventing us from conclusively verifying this possibility. We recognize the importance of further investigating this aspect in future studies. Your suggestion highlights an important area for additional research to fully understand the transport mechanism of SLC35G1. We have additionally commented on this issue on p. 4 (line 1 – 3).

      Reviewer #2 (Public Review):

      Summary:

      The primary goal of this study was to identify the transport pathway that is responsible for the release of dietary citrate from enterocytes into blood across the basolateral membrane.

      Strengths:

      The transport pathway responsible for the entry of dietary citrate into enterocytes was already known, but the transporter responsible for the second step remained unidentified. The studies presented in this manuscript identify SLC35G1 as the most likely transporter that mediates the release of absorbed citrate from intestinal cells into the serosal side. This fills an important gap in our current knowledge of the transcellular absorption of dietary citrate. The exclusive localization of the transporter in the basolateral membrane of human intestinal cells and the human intestinal cell line Caco-2 and the inhibition of the transporter function by chloride support this conclusion.

      Weaknesses:

      (i) The substrate specificity experiments have been done with relatively low concentrations of potential competing substrates, considering the relatively low affinity of the transporter for citrate. Given that NaDC1 brings in not only citrate as a divalent anion but also other divalent anions such as succinate, it is possible that SLC35G1 is responsible for the release of not only citrate but also other dicarboxylates. But the substrate specificity studies show that the dicarboxylates tested did not compete with citrate, meaning that SLC35G1 is selective for citrate (2-), but this conclusion might be flawed because of the low concentration of the competing substrates used in the experiment.

      Thank you for your valuable comment on our substrate specificity experiments. As you pointed out, we cannot rule out the possibility that dicarboxylates might be recognized by SLC35G1 with low affinity as the tested concentration was relatively low. However, at the concentration of 200 μM, competing substrates with an affinity comparable to that of citrate could inhibit SLC35G1-specific citrate uptake by about 30%. Therefore, it is likely that the compounds that did not exhibit significant effect have no affinity or at least lower affinity than citrate to SLC35G1. Further studies should explore a broader range of concentrations for potential substrates including those with lower affinity. It would help clarify the substrate recognition characteristics of SLC35G1 and if it indeed has a unique preference for citrate over dicarboxylates. We have additionally mentioned that on p. 3, line 32 – 35.
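
      The "about 30%" estimate above follows from simple competitive inhibition kinetics. The sketch below is illustrative only: it assumes Michaelis-Menten uptake with purely competitive inhibition, a citrate Km of 0.5 mM (the value cited by Reviewer #2), a tracer citrate concentration of 1 μM, and a competitor whose Ki equals the citrate Km. The function name is hypothetical.

```python
def relative_uptake(s_um: float, km_um: float, i_um: float, ki_um: float) -> float:
    """Uptake with a competitive inhibitor, relative to uptake without it.

    v0 = Vmax * S / (Km + S)
    vi = Vmax * S / (Km * (1 + I / Ki) + S)
    Vmax cancels in the ratio vi / v0.
    """
    v0 = s_um / (km_um + s_um)
    vi = s_um / (km_um * (1.0 + i_um / ki_um) + s_um)
    return vi / v0


# Assumed inputs (all in micromolar); Ki is set equal to the citrate Km.
remaining = relative_uptake(s_um=1.0, km_um=500.0, i_um=200.0, ki_um=500.0)
inhibition_percent = 100.0 * (1.0 - remaining)

print(f"expected inhibition: {inhibition_percent:.1f}%")
```

      With these inputs the predicted reduction is just under 30%, so a competitor with citrate-like affinity should have been detectable at 200 μM, whereas one with a much higher Ki would not.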

      (ii) The authors have used MDCK cells for assessment of the transcellular transfer of citrate via SLC35G1, but it is not clear whether this cell line expresses NaDC1 in the apical membrane as the enterocytes do. Even though the authors expressed SLC35G1 ectopically in MDCK cells and showed that the transporter localizes to the basolateral membrane, the question as to how citrate actually enters the apical membrane for SLC35G1 in the other membrane to work remains unanswered.

      Thank you for highlighting this important aspect of our study. The mechanism of apical citrate entry in MDCKII cells is unknown, although NaDC1 or a similar transporter may be involved. However, this set of experiments has successfully demonstrated the basolateral localization of SLC35G1 and its operation in citrate efflux. Attempts to clarify the apical entry mechanism may need to be included in future studies for a more detailed characterization of the model system using MDCKII cells. This would help in fully understanding the transcellular transport system for citrate. Investigations using Caco-2 cells or MDCKII cells doubly transfected with NaDC1 and SLC35G1 would also need to be included in future studies to gain more definitive insights into the transcellular transport mechanism for citrate in the intestine, delineating the suggested cooperative role of NaDC1 and SLC35G1. We would be grateful for your understanding of how we have handled this issue.

      (iii) There is one other transporter that has already been identified for the efflux of citrate in some cell types in the literature (SLC62A1, PLoS Genetics; 10.1371/journal.pgen.1008884), but no mention of this transporter has been made in the current manuscript.

      Thank you for bringing up the relevance of SLC62A1, which has recently been identified as a citrate efflux transporter in some cell types (PLoS Genet, 16, e1008884, 2020). We have now included comments on this transporter in Introduction (p. 2).

      Reviewer #3 (Public Review):

      Summary:

      Mimura et al describe the discovery of the orphan transporter SLC35G1 as a citrate transporter in the small intestine. Using a combination of cellular transport assays, they show that SLC35G1 can mediate citrate transport in small intestinal cell lines. Furthermore, they investigate its expression and localization in both human tissue and cell lines. Limited evidence exists to date on both SLC35G1 and citrate uptake in the small intestine, therefore this study is an important contribution to both fields. However, the main claims by the authors are only partially supported by experimental evidence.

      Strengths:

      The authors convincingly show that SLC35G1 mediates uptake of citrate which is dependent on pH and chloride concentration. Putting their initial findings in a physiological context, they present human tissue expression data of SLC35G1. Their Transwell assay indicates that SLC35G1 is a citrate exporter at the basolateral membrane.

      Weaknesses:

      Further confirmation and clarification are required to claim that the SLC indeed exports citrate at the basolateral membrane as concluded by the authors. Most experiments measure citrate uptake, but the authors state that SLC35G1 is an exporter, mostly based on the lack of uptake at physiological conditions faced at the basolateral side. The Transwell assay in Figure 1L is the only evidence that it indeed is an exporter. However, in this experiment, the applied chloride concentration was not according to the proposed model (120 mM at the basolateral side). The Transwell assay, or a similar assay measuring export instead of import, should be carried out in knockdown cells to prove that the export indeed occurs through SLC35G1 and not through an indirect effect. Related to the mentioned chloride sensitivity, it is unclear how the proposed model works if the SLC faces high chloride conditions under physiological conditions though it is inhibited by chloride.

      Thank you for highlighting these important points. We used the Cl⁻-rich medium in the transcellular transport studies, as stated in the relevant section of Materials and Methods (p. 6, line 2 – 5). The Cl⁻ concentration (144 mM) was comparable to the physiological concentration in extracellular body fluids. To clarify that experimental condition, we have additionally noted it in the text (p. 4, line 9) and the legends of Figs. 1K and 1L. The results indicate that basolaterally localized SLC35G1 can mediate citrate export effectively under the Cl⁻-rich extracellular condition. How Cl⁻ regulates the transport is unclear, and it is difficult to clarify the mechanism further at this time. We recognize the importance of investigating this aspect in future studies, including the possibility that SLC35G1 might be a citrate/Cl⁻ exchanger, as pointed out by Reviewer #1 (3rd comment).

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      The figures are very tiny and difficult to see. The inset in Figure 1C is much too small to be readable. I suggest enlarging the panels.

      Thank you for your feedback. As advised, we have enlarged the panels to improve visibility.

      Line 74: "certain anionic compounds significantly inhibited SLC35G1-specific citrate uptake, indicating they are also recognized by SLC35G1." This sentence should be reworded since the mechanism is not clear. The word "reduced" would be a better option than "inhibited." Are there other interpretations besides SLC35G1 binding to explain the observations?

      Thank you for your suggestion. We have reworded the sentence to improve clarity (p. 3, line 30). It may be possible to speculate that they interact with SLC35G1, but the mechanisms are not clear yet.

      The manuscript is vague about how the transporter was discovered. If a screen of orphan transporters was performed to identify a citrate transporter, this should be described.

      Thank you for pointing out the need for more details regarding the discovery of the transporter. We have added some detailed description at the beginning of Results and Discussion (p. 3).

      Reviewer #2 (Recommendations For The Authors):

      Recommendations for the authors:

      (1) For transcellular transport of citrate and the role of SLC35G1, it would be better to use Caco-2 cells cultured on Transwells because these cells express NaDC1 in the apical membrane and the authors have shown that SLC35G1 is expressed in the basolateral membrane in this cell line. The mechanism for the entry of citrate into MDCK cells used in the present manuscript is not known. If the authors prefer to use MDCK cells because of their superior polarization, they can use a double transfection (NaDC1 and SLC35G1) to differentially express the two transporters in the apical versus basolateral membranes and then use the cells for transcellular transport of citrate.

      Please refer to our reply to your second review comment.

      (2) The substrate specificity experiments should use concentrations higher than 0.2 mM for competing dicarboxylates because the Km for citrate is only 0.5 mM. It is likely that NaDC1 brings in citrate and other dicarboxylates into enterocytes and then SLC35G1 mediates the efflux of these metabolic intermediates into blood.

      Please refer to our reply to your first review comment.

      (3) One major aspect of the transport function of this newly discovered citrate efflux transporter that has not been explored is the role of membrane potential in the transport function. The transporter is not coupled to Na or K or even H; so then the transport of citrate via this transporter must be electrogenic. Of course, this would be perfect for the transporter to function in the efflux of citrate because of the inside-negative membrane potential, but the authors need to show that the transporter is electrogenic. This can be examined through Caco-2 cells and/or MDCK cells expressing SLC35G1 and examining the impact of changes in membrane potential (valinomycin and K) on the transport of citrate.

      Thank you for your suggestion. As shown in Figure 1D, the use of K-gluconate in place of Na-gluconate, which induces plasma membrane depolarization, had no impact on the specific uptake of citrate, suggesting that SLC35G1-mediated citrate transport is independent of membrane potential. We have additionally mentioned this on p. 3 (line 21 – 24).

      (4) The localization studies mention Na/K ATPase component as a basolateral membrane marker, but the text describes it as BCRP. This needs to be corrected.

      Thank you for pointing out the mistake. We have corrected that. The marker was ATP1A1.

      Reviewer #3 (Recommendations For The Authors):

      Major points:

      (1) Most experiments measure citrate uptake, but the authors state that SLC35G1 is an exporter, mostly based on the lack of uptake at physiological conditions faced at the basolateral side. The Transwell assay in Figure 1L is the only evidence that it indeed is an exporter. However, in this experiment, the applied chloride concentration was not according to the proposed model (120mM at basolateral side). Why was this chloride concentration not mimicked accordingly in the Transwell assay?

      (2) The Transwell assay, or a similar assay measuring export instead of import, should be carried out in knockdown cells to prove that the export indeed occurs through SLC35G1 and not through an indirect effect.

      (3) Related to the mentioned chloride sensitivity, it is unclear how the proposed model works if the SLC faces high chloride conditions under physiological conditions though it is inhibited by chloride.

      Please refer to our reply to your review comments.

      Related to the localization of SLC35G1:

      (4) The polyclonal antibody against SLC35G1 should be validated to prove the specificity. This should be relatively straightforward given the authors have SLC35G1 knockdown cells.

      Thank you for your suggestion. To validate the specificity of the polyclonal antibody against SLC35G1, we prepared HEK293 cells transiently expressing SLC35G1 and SLC35G1 tagged with a FLAG epitope at the C-terminus (SLC35G1-FLAG). In the immunostained images, whereas only SLC35G1-FLAG was stained with the anti-FLAG antibody, both SLC35G1 and SLC35G1-FLAG were stained with the anti-SLC35G1 antibody, indicating that the anti-SLC35G1 antibody can recognize SLC35G1. In addition, the localization patterns of SLC35G1-FLAG observed with both antibodies were consistent, indicating furthermore that the anti-SLC35G1 antibody can recognize SLC35G1 specifically. Based on all these, the specificity of the anti-SLC35G1 antibody was validated.

      Author response image 1.

      (5) To strengthen the data on the localization of SLC35G1, the cell lines should be co-stained with a plasma membrane marker as well, not just in tissue with ATP1A1. In polarized cells co-staining with apical and basolateral markers should be applied.

      SLC35G1 was indicated to be localized to the basolateral membrane geometrically in both polarized MDCKII and Caco-2 cells. This finding aligns with its basolateral localization indicated by its colocalization with ATP1A1 in the human small intestinal section. We consider these results sufficient to support the basolateral localization of SLC35G1.

      General points:

      (6) In the abstract the authors mention that they focus on highly expressed orphan transporters in the small intestine as candidates. However, no other candidates are mentioned or discussed in the study. Consequently, this should be rephrased.

      Thank you for the advice. Also taking into consideration the third recommendation point by Reviewer #1, we have added some detailed description at the beginning of Results and Discussion (p. 3).

      (7) As far as mentioned there is exactly one (other) publication on SLC35G1 (10.1073/pnas.1117231108). The authors should discuss this only publication with functional data on SLC35G1 in more detail. How do the authors integrate their findings with the existing knowledge? For example, why did the authors not investigate the impact of Ca2+ on SLC35G1 transport?

      Thank you for your suggestion. SLC35G1 was indicated to be mainly localized to the endoplasmic reticulum (ER) in the earlier study, in which SLC35G1 was tagged with GFP. One possibility is that SLC35G1 was misdirected to the ER because of the GFP tag used in that study. We have additionally mentioned this possibility in the relevant section (p. 3, line 9 – 11). We have also revised a relevant sentence on p. 3 (line 5).

      With regard to the other point, that GFP-tagged SLC35G1 was indicated to interact with STIM1, we additionally examined the effect of STIM1 on SLC35G1-mediated citrate uptake. As shown in the accompanying figure, coexpression of HA-tagged STIM1 did not affect the elevated citrate uptake induced by FLAG-tagged SLC35G1, indicating that STIM1 has no impact on the citrate transport function of SLC35G1 at the plasma membrane.

      Author response image 2.

      (A) Effect of the coexpression of HA-tagged STIM1 on [14C]citrate (1 μM) uptake by FLAG-tagged SLC35G1 transiently expressed in HEK293 cells. The uptake was evaluated for 10 min at pH 5.5 and 37°C. Data represent the mean ± SD of three biological replicates. Statistical differences were assessed using ANOVA followed by Dunnett’s test. *, p < 0.05 compared with the control (gray bar). (B) Western blot analysis was conducted by probing for the HA and FLAG tags, using the whole-cell lysate samples (10 µg protein aliquots) prepared from cells expressing HA-STIM1 and/or FLAG-SLC35G1. The blots of β-actin are shown for reference.

      (8) Generally, the introduction could provide more background.

      In response to your suggestion and also to the third review comment from Reviewer #2, we have now additionally included comments on SLC62A1, which has recently been reported as a citrate efflux transporter in some cell types, in Introduction.

      Minor points:

      (9) There is a typo in Figure 1D: manniotol instead of mannitol.

      Thank you for pointing that out. We have corrected the typo in Figure 1D.

      (10) Figure 1J: The resolution is low and the localization to the basolateral membrane is not conclusive based on this image. It seems rather localized at the whole membrane and intracellularly too.

      Thank you for your feedback. We have enhanced the resolution of the image and also enlarged it to improve clarity and make the basolateral membrane localization more discernible.

      (11) Figure 1K: Clarification is needed if the experiment was performed in the Transwell plate. Based on the results from the pH titration experiment, it is expected that there is no uptake at pH7.4. Therefore, this experiment does not seem to provide additional evidence or support the conclusions drawn related to cellular polarization.

      Please refer to our reply to your review comments.

    1. Author response:

      The following is the authors’ response to the current reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      This article by Navratna et al. reports the first structure of human HGSNAT in an acetyl-CoA-bound state. Through careful structural analysis, the authors propose potential reasons why certain human mutations lead to lysosomal storage disorders and outline a catalytic mechanism. The structural data are of good quality, and the manuscript is clearly written. This study represents an important step toward understanding the mechanism of HGSNAT and is valuable to the field. I have the following suggestions:

      (1) The authors should characterize whether the purified protein is active. Otherwise, how does one know if the detergent used maintains the protein in a biologically relevant state? The authors should at least attempt to do so. If these prove to be challenging, at the very least, the authors should try a cell-based assay to demonstrate that the GFP tag does not interfere with the function.

      We have addressed these concerns in the revised version and mentioned these efforts in our previous response letter. We’re briefly mentioning them here again. We attempted measuring HGSNAT catalyzed reaction by monitoring the decrease in acetyl-CoA in the presence of D-glucosamine (acetyl group acceptor) using a coupled enzyme acetyl-CoA assay kit from SIGMA (MAK039) that converts acetyl-CoA to a fluorescent product measurable at Ex/Em of 535/587 nm. We noticed a decrease in the level of acetyl-CoA (gray) upon the addition of HGSNAT (red) (Rebuttal figure 1).

      Author response image 1.

      Acetyl-CoA levels in the absence and presence of HGSNAT purified in digitonin. The decrease in the level of 10 µM acetyl-CoA was measured in the presence of 10 µM D-glucosamine and 30 nM HGSNAT at pH 7.5.

      While we were optimizing the assay, Xu et al. (2024, Nat Struct Mol Biol) published a structural and biochemical characterization of HGSNAT, showing that detergent-purified HGSNAT is active. In addition, we have shown by cryo-EM that the GFP-tagged HGSNAT that we purified in detergent was already bound to the endogenous substrate ACO, an observation also made by Xu et al. Finally, we performed LC-MS on GFP-tagged HGSNAT purified in detergent to detect bound ACO, which could be removed by dialysis. These results have been included in Figure S9. The endogenous binding of ACO to HGSNAT in detergent suggests that neither the tag nor the detergent is detrimental to the function.

      (2) In Figure 5, the authors present a detailed schematic of the catalytic cycle, which I find to be too speculative. There is no evidence to suggest that this enzyme undergoes isomerization, similar to a transporter, between open-to-lumen and open-to-cytosol states. Could it not simply involve some movements of side chains to complete the acetyl transfer?

      We have already changed this figure in our latest submission. Perhaps the changes made were not obvious while reviewing. We agreed with this reviewer that the enzyme could likely achieve catalysis by simple side chain movements without undergoing extensive isomerization steps, as depicted in Figure 5. In the absence of data supporting large movements during the acetyl transfer reaction, old Figure 5 appeared speculative. Hence, we have edited Figure 5 in the revised version of the manuscript based on the observations we made in this study, and different states shown in the figure do not show any conformational changes and only depict acetyl transfer.

      Reviewer #2 (Public Review):

      Summary:

      This work describes the structure of Heparan-alpha-glucosaminide N-acetyltransferase (HGSNAT), a lysosomal membrane protein that catalyzes the acetylation reaction of the terminal alpha-D-glucosamine group required for degradation of heparan sulfate (HS). HS degradation takes place during the degradation of the extracellular matrix, a process required for restructuring tissue architecture, regulation of cellular function and differentiation. During this process, HS is degraded into monosaccharides and free sulfate in lysosomes.

      HGSNAT catalyzes the transfer of the acetyl group from acetyl-CoA to the terminal non-reducing amino group of alpha-D-glucosamine. The molecular mechanism by which this process occurs has not been described so far. One of the main reasons to study the mechanism of HGSNAT is that multiple mutations spanning the entire sequence of the protein, such as nonsense mutations, splice-site variants, and missense mutations, lead to dysfunction that causes abnormal accumulation of HS within the lysosomes. This accumulation is a cause of mucopolysaccharidosis IIIC (MPS IIIC), an autosomal recessive neurodegenerative lysosomal storage disorder, for which there are no approved drugs or treatment strategies.

      This paper provides a 3.26 Å structure of HGSNAT, determined by single-particle cryo-EM. The structure reveals that HGSNAT is a dimer in detergent micelles and shows a density assigned to acetyl-CoA. The authors speculate about the molecular mechanism of the acetylation reaction, map the mutations known to cause MPS IIIC on the structure and speculate about the nature of the HGSNAT dysfunction caused by such mutations.

      Strengths:

      The paper describes a structure of HGSNAT, a member of the transmembrane acyl transferase (TmAT) superfamily. The high-resolution structure of HGSNAT bound to acetyl-CoA is important for our understanding of the HGSNAT mechanism. The density map is of high quality, except for the luminal domain. The location of the acetyl-CoA allows speculation about the mechanistic role of multiple residues surrounding this molecule. The authors thoroughly describe the architecture of HGSNAT and map the mutations leading to MPS IIIC.

      Reviewer #3 (Public Review):

      Summary:

      Navratna et al. have solved the first structure of a transmembrane N-acetyltransferase (TNAT), resolving the architecture of human heparan-alpha-glucosaminide N-acetyltransferase (HGSNAT) in the acetyl-CoA bound state using single particle cryo-electron microscopy (cryoEM). They show that the protein is a dimer, and define the architecture of the alpha- and beta-GSNAT fragments, as well as convincingly characterizing the binding site of acetyl-CoA.

      Strengths:

      This is the first structure of any member of the transmembrane acyl transferase superfamily, and as such it provides important insights into the architecture and acetyl-CoA binding site of this class of enzymes.

      The structural data is of a high quality, with an isotropic cryoEM density map at 3.3Å facilitating building of a high-confidence atomic model. Importantly, the density for the acetyl-CoA ligand is particularly well-defined, as are the contacting residues within the transmembrane domain.

      The structure of HGSNAT presented here will undoubtedly lay the groundwork for future structural and functional characterization of the reaction cycle of this class of enzymes.

      Weaknesses:

      While the structural data for the state presented in this work is very convincing, and clearly defines the binding site of acetyl-CoA, to get a complete picture of the enzymatic mechanism of this family, additional structures of other states will be required.

      A weakness of the study is the lack of functional validation. The enzymatic activity of the enzyme characterized was not measured, and the enzyme lacks native proteolytic processing, so it is a little unclear whether the structure represents an active enzyme.

      Recommendations for the authors:

      Reviewer #3 (Recommendations For The Authors):

      In the response to reviewers, the authors mention revised coordinates, but the revised coordinates provided to this reviewer do not reflect the stated changes (I assume a technical error somewhere)

      Perhaps the old coordinates in the deposition system were resubmitted with the revised draft. Nevertheless, we made the changes to the structure suggested by this reviewer in the previous round and have released the new coordinates (PDB ID: 8TU9).

      Is there any evidence for the interprotomer disulfide except for the map? e.g. if it is a disulfide-linked dimer, one should see a shift in mobility on non-reducing vs reducing SDS-PAGE. Without this, the evidence from the map is not conclusive - while the symmetry-related cysteines are nearby to one another, based on the map I could argue that they could just as well be modeled with the cys sidechains reduced and pointing away from one another.

      In addition to building the model based on cryo-EM maps, we have performed FSEC-based thermal melt analysis of the Ala mutant of C334, the residue involved in the disulfide at the dimer interface. C334A is still expressed as a dimer, suggesting that C334 is not the only residue stabilizing the dimer. Upon heating the detergent-solubilized protein, we noticed that the FSEC peak for C334A corresponds to monomeric HGSNAT (Figure 4-Figure supplement 1 in main manuscript). We hypothesize that in the absence of the C334 disulfide, the extensive hydrophobic side-chain interaction network displayed in Figure 2C is responsible for maintaining the integrity of the dimer. Heating disturbs these non-disulfide interactions, thereby rendering the protein monomeric. We have also performed PAGE analysis as suggested by this reviewer and noticed that reducing conditions result in a monomeric protein band (Rebuttal figure 2). While we were revising this manuscript, two other groups published structures of HGSNAT (Xu et al., 2024, Nat. Struct Mol Biol, and Zhao et al., 2024, Nat. Comm). These groups have also identified this disulfide at the dimer interface in their HGSNAT structures. Zhao et al. showed that this disulfide is not crucial for dimerization and also suggested that it can break depending on the conformation of HGSNAT. Our FSEC results agree with this observation.

      Author response image 2.

      Comparison of purified HGSNAT on native and reducing SDS-PAGE. The arrows on both the gels indicate N-GFP-HGSNAT. The two bands on the SDS PAGE are, perhaps, two differentially glycosylated forms of HGSNAT.


      The following is the authors’ response to the original reviews.

      (1) The authors should characterize whether the purified protein is active. Otherwise, how does one know if the detergent used maintains the protein in a biologically relevant state? The authors should at least attempt to do so. If these prove to be challenging, at the very least, the authors should try a cell-based assay to demonstrate that the GFP tag does not interfere with the function. The authors would need to establish an in vitro assay using purified protein and assess the level of Acetyl-CoA in the reaction (there are commercial kits and a long list of literature showing how to measure this). They could also follow the HS acetylation reaction by e.g. HPLC-MS or NMR (among other methods).

      The cryo-EM sample was prepared without the exogenous addition of ligand, as noted in the manuscript. However, we see that acetyl-CoA was intrinsically bound to the protein, indicating the ability of GFP-tagged HGSNAT protein to bind the ligand. Upon dialysis, we see release of acetyl-CoA from the protein, which we have confirmed by LC-MS analysis (Fig S9). We purified the protein at a pH optimal for acetyl-CoA binding, as suggested by Bame, K. J. and Rome, L. H. (1985) and Meikle, P. J. et al. (1995). Because we see acetyl-CoA in a structure obtained using a GFP fusion, we argue that GFP does not interfere with protein stability and ability to bind to the co-substrate. As demonstrated by existing literature, the HGSNAT-catalyzed reaction is compartmentalized spatially and conditionally. The binding of acetyl-CoA happens towards the cytosol and is optimal at pH 7.0-8.0, while the transfer of the acetyl group to heparan sulfate occurs towards the luminal side and is optimal at pH 5.0-6.0. We attempted to measure the HGSNAT-catalyzed reaction by monitoring the decrease in acetyl-CoA in the presence of D-glucosamine (acetyl group acceptor) using a coupled enzyme acetyl-CoA assay kit from SIGMA (MAK039) that converts acetyl-CoA to a fluorescent product measurable at Ex/Em of 535/587 nm. We noticed a decrease in the level of acetyl-CoA in the presence of HGSNAT-ACO complex (blue) and apo HGSNAT (red); the difference compared to the ACO standard (gray) was not significant. While we were optimizing the assay, Xu et al. (2024, Nat Struct Mol Biol) published structural and biochemical characterization of HGSNAT, showing that detergent-purified HGSNAT is active.

      Author response image 3.

      Acetyl-CoA levels in absence and presence of HGSNAT purified in digitonin. Decrease in the levels of 10 mM acetyl-CoA was measured in presence of 10 mM D-glucosamine and 30 nM HGSNAT at pH 7.5.

      (2) In Figure 5, the authors present a detailed schematic of the catalytic cycle, which I find to be too speculative. There is no evidence to suggest that this enzyme undergoes isomerization, similar to a transporter, between open-to-lumen and open-to-cytosol states. Could it not simply involve some movements of side chains to complete the acetyl transfer? The speculative nature of this assumption needs to be clearly acknowledged throughout the manuscript and discussed in more detail. The authors could use HDX-MS or introduce cysteine residues in the hypothetical inward- and outward-facing cavities and test accessibility by incubating the purified protein with maleimides or other agents reacting with free cysteine.

      We thank the reviewers for this insightful critique. Yes, the enzyme could likely achieve catalysis by simple side chain movements without undergoing extensive isomerization steps, as depicted in Figure 5. We also agree with the reviewer that HDX-MS could be the best way to monitor the substrate-induced conformational dynamics within HGSNAT experimentally. In the absence of data supporting large movements during the acetyl transfer reaction, figure 5 is speculative. We have now edited Figure 5 in the revised version of the manuscript based on the observations we made in this study.

      (3) The acetyl-CoA-bound state is described as the open-to-lumen state. Indeed, from Figure 1C, the lumen opening appears much larger than the cytosol opening. Is there any small tunnel that connects the substrate site to the cytosol? In other words, is this state accessible to both the lumen and the cytosol, albeit with a larger opening toward the lumen? This question arises because, in Figure S5, the tunnel calculated by MOLE seems to also connect to the cytosol.

      Yes, it is likely that the ACOS is accessible via lumen and cytosol to varying degrees, as evidenced by MOLE prediction. However, binding of the bulky nucleoside head group of acetyl-CoA at ACOS blocks the cytosolic entrance in the conformation discussed in this manuscript. MOLE prediction was performed on a structure devoid of acetyl-CoA, and it is possible that the protein does not necessarily undergo isomerization between open-to-lumen and open-to-cytosol conformations during acetyl transfer. Likely, ACOS is always accessible from both the lumen and cytosol, but depending on the substrates or products bound, the accessibility could be limited to either the lysosomal lumen or cytosol. We have rewritten all the statements mentioning an open-to-lumen conformation to reflect this argument.

      (4) The authors state, "Interestingly, in most of the detergent conditions we tested, HGSNAT was predominantly dimeric (Fig S1C-H)," and also mention, "In all the detergents we tested, HGSNAT eluted as a dimer, a testament to the extensive side-chain interaction network." The dimerization is said to be mediated by a disulfide bond. I would be surprised if the detergents the authors tested could break a disulfide bond. Therefore, can this observation truly serve as a testament to an "extensive" side-chain interaction network?

      We agree with the reviewer that detergents are unlikely to break a disulfide bond. To address this comment, we generated a C334A mutant of HGSNAT and extracted it from cells in 1% digitonin. It is still expressed as a dimer (Fig S8E). However, upon heating the detergent solubilized protein, we noticed that the FSEC peak for C334A shows a monomeric HGSNAT (Fig S8I and S8K). We hypothesize that in the absence of C334 disulfide, the extensive hydrophobic side-chain interaction network displayed in Figure 2C is responsible for maintaining the integrity of the dimer. Heating disturbs these non-disulfide interactions, thereby rendering the protein monomer.

      (5) Apart from the cryo-EM structure, the article does not provide any other experimental evidence to support or explain a molecular mechanism. Due to the complete absence of functional assays, mutagenesis analysis, or other structures such as a ternary complex or an acetylated enzyme intermediate, the mechanistic model depicted in Figure 5 should be taken with caution. This uncertainty needs to be clearly described in the manuscript text. Performing additional mutagenesis experiments to test key hypotheses, or further discussing relevant data from the literature, would strengthen the manuscript.

      We agree with the reviewer on the lack of supporting evidence for the mechanistic models proposed in Fig 5. They were made based on previously reported biochemical characterization of HGSNAT by Rome & Crain (1981), Rome et al. (1983), Meikle et al. (1995), and Fan et al. (2011). However, we agree with the reviewer that this schematic is not experimentally proven and is speculative at best. We have edited Figure 5 in the revised version of the manuscript. In addition, we have also performed mutagenesis analysis to study the stability of mutants (Fig S8) and performed LC-MS analysis to identify endogenously bound acetyl-CoA (Fig S9) to strengthen parts of the manuscript. We have discussed our findings in the results and modified the discussion according to these suggestions.

      (6) It is discussed that H269 is an essential residue that participates in the acetylation reaction, possibly becoming acetylated during the process. However, there is no solid experimental evidence, e.g. mutagenesis analysis or structural analysis, in this or previous articles, that demonstrates this to be the case. Providing more information, ideally involving additional experimental work, would strengthen this aspect of the mechanism that is proposed. This would require establishing an in vitro assay, as described in 1).

      H269, as a crucial catalytic residue, was suggested by monitoring the effect of chemical modifications of amino acids on acetylation of HGSNAT membranes by Bame, K. J. and Rome, L. H. (1986). We generated N258I and H269A mutants of HGSNAT and analyzed their stability. We noticed a greater destabilization in N258I compared to H269A (Fig S8). We believe this is because of the loss of ability to bind acetyl-CoA, as the TMs around a catalytic core of the protein in our cryo-EM structure were stabilized by interactions with acetyl-CoA. Recently, Xu et al. (2024, Nat Struct Mol Biol) suggested that they do not observe acetylated histidine in their structure. However, our structure and that reported by Xu et al. (2024) are obtained at cytosolic pH. Perhaps, acetylation of H269 occurs at acidic lysosomal pH. Extensive structural and catalytic investigation of HGSNAT at low pH is required to rule out H269 acetylation as a step in the HGSNAT catalyzed reaction.

      (7) In the discussion part, the authors mention previous studies in which it was postulated that the catalytic reaction can be described by a random order mechanistic model or a Ping Pong Bi Bi model. However, the authors leave open the question of which of these mechanisms best describes the acetylation reaction. The structure presented here does not provide evidence that could support one mechanism or the other. The authors could explore if an in vitro experimental measurement of protein activity would provide any information in this regard.

      We agree with the reviewer that a more detailed kinetic analysis is necessary to define the bisubstrate reaction mechanism of HGSNAT. All the existing structural data on two isoforms of HGSNAT were obtained at basic pH. As a result, the existing structures do not unambiguously demonstrate the bisubstrate mechanism of HGSNAT. We believe low pH structural characterization and a detailed kinetic and structural characterization of HGSNAT in membrane mimetics like nanodiscs could provide more insights into the mechanism. However, these studies are a future undertaking and are not a part of this manuscript.

      (8) Although the authors map the mutations leading to MPS IIIC on the structure and use FoldX software to predict the impact of these mutations on folding and fold stability, there is no experimental evidence to support FoldX's predictions. It would be ideal if an additional test for these predictions were included in the manuscript. The authors could follow the unfolding of purified mutants by SEC, FSEC, or changes in intrinsic fluorescence to assess protein stability.

      As suggested here, we prepared HGSNAT MPSIIIC variants and tested their expression and stability (please see Fig S8). These results have been included in the revised version of the manuscript.

      (9) Some sidechains that have quite strong sidechain density are missing atoms. I would be particularly careful with omitting sidechains that pack in the hydrophobic core, as this can tend to artificially reduce the clash score. Check F81, L62, P91 and V87, for example.

      We have revisited the modeling of these regions and deposited new coordinates.

      (10) W316 seems to have the wrong rotamer.

      This has been corrected in the new coordinate file that has been released.

      (11) N134 and N433 seem to have extra density. Are these known glycosylation sites?

      As per Hrebicek M. et al., 2006 and Feldhammer M. et al., 2009, there are five predicted glycosylation sites: N66, N114, N134, N433, and N602. However, we see evidence for NAG density at N114, N134, and N433. These have now been modeled in the structure.

      (12) At the C-terminal residue (Ile-635), the very C-terminal carboxylate is modeled pointing to a hydrophobic environment. It seems more likely to me that the Ile sidechain is packing here, with the C-terminal carboxylate facing the solvent.

      Thank you for pointing this out. We have edited the orientation of the Ile sidechain accordingly.

      Presentation and wording of results/methods:

      - Figure S3 legend "At places with missing density, the side chains were trimmed to C- alpha" - this is incorrect, I think the authors mean C-beta.

      We have corrected this error in the revised version of the manuscript.

      - Figure S3 legend - the authors refer to a gray mesh, where a transparent surface is displayed.

      Thanks for pointing this error out. We have corrected this in the revised version.

      - Some colloquial/vague wording in the main text (a lot of sentences starting with "Interestingly, ...". Making the wording more specific would help the reader I think.

      We have edited out ‘interestingly’ from the document and have re-written parts of the manuscript, per reviewers’ suggestion, for brevity.

      - Figure S2 legend, "throughout the processing workflow the resolution of luminal domain was used as a guidepost" - it is not entirely clear to me what this means in this context, perhaps revise the wording?

      We have rephrased this line in the revised draft of the manuscript.

      - Figure S2 and methods, Local refinements of LD and TMD are mentioned, but not indicated on the processing workflow.

      We have included a new Fig S2 & edited the legend, including these changes, per the reviewers’ suggestions.

    1. Author response:

      The following is the authors’ response to the original reviews

      We again thank the reviewers for their comments and recommendations. In response to the reviewers’ suggestions, we have performed several additional experiments, added additional discussion, and updated our conclusions to reflect the additional work. Specifically, we have performed additional analyses in female WT and Marco-deficient animals, demonstrating that the Marco-associated phenotypes observed in male mice (reduced adrenal weight, increased lung Ace mRNA and protein expression, unchanged expression of adrenal corticosteroid biosynthetic enzymes) are not present in female mice. We also report new data on the physiological consequences of increased aldosterone levels observed in male mice, namely plasma sodium and potassium titres, and blood pressure alterations in WT vs Marco-deficient male mice. In an attempt to address the reviewers’ comments relating to our proposed mechanism on the regulation of lung Ace expression, we additionally performed a co-culture experiment using an alveolar macrophage cell line and an endothelial cell line. In light of the additional evidence presented herein, we have updated our conclusions from this study and changed the title of our work to acknowledge that the mechanism underlying the reported phenotype remains incompletely understood. Specific responses to reviewers can be seen below.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The investigators sought to determine whether Marco regulates the levels of aldosterone by limiting uptake of its parent molecule cholesterol in the adrenal gland. Instead, they identify an unexpected role for Marco on alveolar macrophages in lowering the levels of angiotensin-converting enzyme in the lung. This suggests an unexpected role of alveolar macrophages and lung ACE in the production of aldosterone.

      Strengths:

      The investigators suggest an unexpected role for ACE in the lung in the regulation of systemic aldosterone levels.

      The investigators suggest important sex-related differences in the regulation of aldosterone by alveolar macrophages and ACE in the lung.

      Studies to exclude a role for Marco in the adrenal gland are strong, suggesting an extra-adrenal source for the excess Marco observed in male Marco knockout mice.

      Weaknesses:

      While the investigators have identified important sex differences in the regulation of extrapulmonary ACE in the regulation of aldosterone levels, the mechanisms underlying these differences are not explored.

      The physiologic impact of the increased aldosterone levels observed in Marco -/- male mice on blood pressure or response to injury is not clear.

      The intracellular signaling mechanism linking lung macrophage levels with the expression of ACE in the lung is not supported by direct evidence.

      Reviewer #2 (Public Review):

      Summary:

      Tissue-resident macrophages are more and more thought to exert key homeostatic functions and contribute to physiological responses. In the report of O'Brien and Colleagues, the idea that the macrophage-expressed scavenger receptor MARCO could regulate adrenal corticosteroid output at steady-state was explored. The authors found that male MARCO-deficient mice exhibited higher plasma aldosterone levels and higher lung ACE expression as compared to wild-type mice, while the availability of cholesterol and the machinery required to produce aldosterone in the adrenal gland were not affected by MARCO deficiency. The authors take these data to conclude that MARCO in alveolar macrophages can negatively regulate ACE expression and aldosterone production at steady-state and that MARCO-deficient mice suffer from secondary hyperaldosteronism.

      Strengths:

      If properly demonstrated and validated, the fact that tissue-resident macrophages can exert physiological functions and influence endocrine systems would be highly significant and could be amenable to novel therapies.

      Weaknesses:

      The data provided by the authors currently do not support the major claim of the authors that alveolar macrophages, via MARCO, are involved in the regulation of a hormonal output in vivo at steady-state. At this point, there are two interesting but descriptive observations in male, but not female, MARCO-deficient animals, and overall, the study lacks key controls and validation experiments, as detailed below.

      Major weaknesses:

      (1) According to the reviewer's own experience, the comparison between C57BL/6J wild-type mice and knock-out mice for which precise information about the genetic background and the history of breedings and crossings is lacking, can lead to misinterpretations of the results obtained. Hence, MARCO-deficient mice should be compared with true littermate controls.

      (2) The use of mice globally deficient for MARCO combined with the fact that alveolar macrophages produce high levels of MARCO is not sufficient to prove that the phenotype observed is linked to alveolar macrophage-expressed MARCO (see below for suggestions of experiments).

      (3) If the hypothesis of the authors is correct, then additional read-outs could be performed to reinforce their claims: levels of Angiotensin I would be lower in MARCO-deficient mice, levels of Angiotensin II would be higher in MARCO-deficient mice, Arterial blood pressure would be higher in MARCO-deficient mice, natremia would be higher in MARCO-deficient mice, while kaliemia would be lower in MARCO-deficient mice. In addition, co-culture experiments between MARCO-sufficient or deficient alveolar macrophages and lung endothelial cells, combined with the assessment of ACE expression, would allow the authors to evaluate whether the AM-expressed MARCO can directly regulate ACE expression.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) Corticosterone levels in male Marco -/- mice are not significantly different, but there is (by eye) substantially more variability in the knockout compared to the wild type. A power analysis should be performed to determine the number of mice needed to detect a similar % difference in corticosterone to the difference observed in aldosterone between male Marco knockout and wild-type mice. If necessary the experiments should be repeated with an adequately powered cohort.

      Using a power calculator (www.gigacalculator.com), we determined that our sample size of 13 was one less than sufficient to detect a % difference in corticosterone similar to that detected in aldosterone. We regret that we are unable to perform additional measurements, as the reviewer suggested, in the available timeframe.
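
      For readers who wish to reproduce this kind of estimate, a minimal sketch of a two-group sample-size calculation follows. It assumes a two-sided, two-sample comparison under the standard normal approximation, which is not necessarily the method implemented by the calculator cited above. The function name `n_per_group` and the effect sizes passed to it are illustrative placeholders, not values from this study.

```python
import math
from statistics import NormalDist


def n_per_group(effect_size: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate animals needed per group for a two-sided, two-sample comparison.

    effect_size is Cohen's d (difference in group means divided by the pooled SD).
    Uses the normal approximation, which slightly underestimates the exact
    t-test requirement when groups are small.
    """
    if effect_size <= 0:
        raise ValueError("effect_size must be positive")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the two-sided test
    z_beta = NormalDist().inv_cdf(power)           # quantile matching the desired power
    return math.ceil(2 * ((z_alpha + z_beta) / effect_size) ** 2)


# Hypothetical effect sizes, for illustration only.
for d in (0.5, 1.0, 2.0):
    print(f"d = {d}: {n_per_group(d)} animals per group")
```

      Because the required group size scales with the inverse square of the effect size, the more variable knockout measurements noted by the reviewer translate directly into a larger cohort.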

      (2) All of the data throughout the MS (particularly data in the lung) should be presented in male and female mice. For example, the induction of ACE in the lungs of Marco-/- female mice should be absent. Similar concerns relate to the dexamethasone suppression studies. Also would be useful if the single cell data could be examined by sex--should be possible even post hoc using Xist etc.

      Given the limitations outlined in our previous response to reviewers it was not possible to repeat every experiment from the original manuscript. We were able to measure the expression of lung Ace mRNA, ACE protein, adrenal weights, adrenal expression of steroid biosynthetic enzymes, presence of myeloid cells, and levels of serum electrolytes in female animals. These are presented in figures 1G, 3B, 4A, 4E, 4F, 4I, and 4J. We have elected to not present single cell seq data according to sex as it did not indicate substantial differences between males and females in Marco or Ace expression and so does not substantively change our approach.

      (3) IF is notoriously unreliable in the lung, which has high levels of autofluorescence. This is the only method used to show ACE levels are increased in the absence of Marco. Orthogonal methods (e.g. immunoblots of flow-sorted cells, or ideally CITE-seq that includes both male and female mice) should be used.

      We used negative controls to guide our settings during acquisition of immunofluorescent images. Additionally, we also used qPCR to show an increase in Ace mRNA expression in the lung in addition to the protein level. This data was presented in the original manuscript and is further bolstered by our additional presentation of expression data for Ace mRNA and protein in female animals in this revised manuscript.

      (4) Given the central importance of ACE staining to the conclusions, validation of the antibody should be included in the supplement.

      We don’t have ACE-deficient mice so cannot do KO validation of the antibody. We did perform secondary stain controls which confirmed the signal observed is primary antibody-derived. Moreover, we specifically chose an anti-ACE antibody (Invitrogen catalogue # MA5-32741) that has undergone advanced verification with the manufacturer. We additionally tested the antibody in the brain and liver and observed no significant levels of staining.

      Author response image 1.

      (5) The link between alveolar macrophage Marco and ACE is poorly explored.

      We carried out a co-culture experiment with alveolar macrophages and endothelial cells and measured the resulting ACE/Ace expression. This is presented in figure 5D and the discussion.

      (6) Mechanisms explaining the substantial sex difference in the primary outcome are not explored.

      This is outside the scope of this project, though we would consider exploring such experiments in future studies.

      (7) Are there physiologic consequences either in homeostasis or under stress to the increased aldosterone (or lung ACE levels) observed in Marco-/- male mice?

      We measured blood electrolytes and blood pressure in Marco-deficient and Marco-sufficient mice. The results from these experiments are presented in 4G-4M.

      Reviewer #2 (Recommendations For The Authors):

      Below is a suggestion of important control or validation experiments to be performed in order to support the authors' claims.

      (1) It is imperative to validate that the phenotype observed in MARCO-deficient mice is indeed caused by the deficiency in MARCO. To this end, littermate mice issued from the crossing between heterozygous MARCO +/- mice should be compared to each other. C57BL/6J mice can first be crossed with MARCO-deficient mice in F0, and F1 heterozygous MARCO +/- mice should be crossed together to produce F2 MARCO +/+, MARCO +/- and MARCO -/- littermate mice that can be used for experiments.

      We thank the reviewer for their comments. We recognise the concern of the reviewer but due to limited experimenter availability we are unable to undertake such a breeding programme to address this particular concern.
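
      For reference, the expected genotype proportions in the littermate breeding scheme proposed by the reviewer can be sketched as below. This is a simple single-locus Mendelian model that assumes equal transmission and viability of both alleles; the function name `offspring_genotypes` and the `+`/`-` allele notation are illustrative choices.

```python
from collections import Counter
from itertools import product


def offspring_genotypes(parent_a: str, parent_b: str) -> dict:
    """Expected genotype fractions at one locus.

    Each parent is a two-character string of alleles, e.g. "+-" for a
    heterozygote. Assumes Mendelian segregation and equal viability.
    """
    # Enumerate every allele pairing (a Punnett square) and normalise allele order.
    counts = Counter("".join(sorted(pair)) for pair in product(parent_a, parent_b))
    total = sum(counts.values())
    return {genotype: n / total for genotype, n in counts.items()}


# F0: wild-type x knockout gives only heterozygous F1 animals.
print(offspring_genotypes("++", "--"))
# F1 intercross: F2 littermates are expected in a 1:2:1 ratio.
print(offspring_genotypes("+-", "+-"))
```

      Under these assumptions only about a quarter of F2 pups are expected to be knockouts and a quarter wild-type, so several litters would be needed to assemble sex-matched littermate groups.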

      (2) The use of mice in which AM, but not other cells, lack MARCO expression would demonstrate that the effect is indeed linked to AM. To this end, AM-deficient Csf2rb-deficient mice could be adoptively transferred with MARCO-deficient AM. In addition, the phenotype of MARCO-deficient mice should be restored by the adoptive transfer of wild-type, MARCO-expressing AM. Alternatively, bone marrow chimeras in which only the hematopoietic compartment is deficient in MARCO would be another option, albeit less specific for AM.

      We recognise the concern of the reviewer. We carried out a co-culture experiment with alveolar macrophages and endothelial cells and measured the resulting ACE/Ace expression. This is presented in figure 5D and the implications are explored in the discussion.

      (3) If the hypothesis of the authors is correct, then additional read-outs could be performed to reinforce their claims: levels of Angiotensin I would be lower in MARCO-deficient mice, levels of Angiotensin II would be higher in MARCO-deficient mice, Arterial blood pressure would be higher in MARCO-deficient mice, natremia would be higher in MARCO-deficient mice, while kaliemia would be lower in MARCO-deficient mice. Similar read-outs could also be performed in the models proposed in point 2).

      We measured blood electrolytes and blood pressure in Marco-deficient and Marco-sufficient mice. The results from these experiments are presented in 4G-4M.

      (4) Co-culture experiments between MARCO-sufficient or deficient alveolar macrophages and lung endothelial cells, combined with the assessment of ACE expression, would allow the authors to evaluate whether the AM-expressed MARCO can directly regulate ACE expression.

      To address this concern we carried out a co-culture experiment as described above.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1:

      I will summarize my comments and suggestions below.

      (1) Abstract:

      "Non-catalytic (pseudo)kinase signaling mechanisms have been described in metazoans, but information is scarce for plants." To the best of my understanding EFR is an active protein kinase in vitro and in vivo and cannot be considered a pseudokinase. Consider rephrasing.

      We rephrased to: “Non-catalytic signaling mechanisms of protein kinase domains have been described in metazoans, but information is scarce for plants.”

      (2) Page 4: It should be noted, that while membrane associated Rap-RiD systems have been used in planta to activate receptor kinase intracellular domains by promoting interaction with a co-receptor kinase domain, this system does not resemble the actual activation mechanism in the plasma membrane. This would be worth discussing when introducing the system. For example, the first substrates of the RK signaling complex may also be membrane associated and not freely diffuse in solution, which may be important for enzyme-substrate interaction.

      We inserted on page 4: “The RiD system was previously applied in planta, maintaining membrane-association by N-terminal myristoylation (Kim et al., 2021). For the in vitro experiments, the myristoylation sites were excluded to facilitate the production of recombinant protein.”

      (3) Page 4 and Fig 1: The catalytic Asp in BRI1 is D1027 and not D1009 (https://pubmed.ncbi.nlm.nih.gov/21289069/). Please check and prepare the correct mutant protein if needed.

      We clarified this in the text by stating that we mutated the HRD-aspartate to asparagine in all our catalytic-dead mutants: “Kinase-dead variants with the catalytic residue (HRD-aspartate) replaced by asparagine (EFRD849N and BRI1D1009N), had distinct effects […]”. D1027 in BRI1 is the DFG-Asp, which was not mutated in our study.

      (4) Page 4 and Fig 1: Is BIK1 a known component of the BR signaling pathway and a direct BRI1 substrate? Or in other words how specific is the trans-phosphorylation assay? In my opinion, a more suitable substrate for BRI1/BAK1 would be BSK1 or BSK3 (for example https://pubmed.ncbi.nlm.nih.gov/30615605/).

      Kinase-dead BIK1 is a reported substrate of BRI1. We clarified this in the results section by inserting: “BIK1 was chosen as it is reported substrate of both, EFR/BAK1 and BRI1/BAK1 complexes (Lin et al., 2013).”

      (5) Fig. 1B Why is BIK1 D202N partially phosphorylated in the absence of Rap? I would suggest to add control lanes showing BRI1, EFR, FLS2, BAK1 and BIK1 in isolation. Given that a nice in vitro activation system with purified components is available, why not compare the different enzyme kinetics rather than band intensities at only 1 enzyme : substrate ratio?

      BIK1 D202N is partially phosphorylated because active BAK1 is present and is capable of trans-phosphorylating BIK1 D202N, as reported in a previous study (DOI: 10.1038/s41586-018-0471-x).

      (6) Page 4 and Fig 1: Is the kinase dead variant of EFR indeed kinase dead? I could still see a decent autorad signal for this mutant when expressed in E. coli (Fig 1 A in Bender et al., 2021; https://pubmed.ncbi.nlm.nih.gov/34531323/)? If this mutant is not completely inactive, could this change the interpretation of the experiments performed with the mutant protein in vitro and in planta in the current manuscript? In my opinion, it could be possible that a partially active EFR mutant can be further activated by BAK1, and in turn can phosphorylate BIK1 D202N. The differences in autorad signal for BRI1D1009?N and EFRD849N is very small, and the entire mechanism hinges on this difference.

      We would like to emphasize that the mechanism hinges on the difference between non-dimerized and dimerized kinase domains in the in vitro kinase assay. BRI1 D1009N fails to enhance BIK1 D202N trans-phosphorylation compared to the non-dimerized sample, while EFR D849N is still capable of enhancing BIK1 transphosphorylation upon dimerization as indicated by quantification of autorads (Figure 1B/C). We have also addressed this point in a section on the limitations of our study.

      (7) Fig 1B. "Our findings therefore support the hypothesis that EFR increases BIK1 phosphorylation by allosterically activating the BAK1 kinase domain." To the best of my understanding presence of wild-type EFR in the EFR-BAK1 signaling complex leads to much better phosphorylation of BIK1D202N when compared to the EFRD849N mutant. How does that support the allosteric mechanism? By assuming that the D849N mutant is in an inactive conformation and fully catalytically inactive (see above)? Again, I think the data could also be interpreted in such a way that the small difference in autorad signal for BIK1 between BRI1 inactive (but see above) and EFR inactive is due to EFR not being completely kinase dead (see above), rather than EFR being an allosteric regulator. To clarify this point I would suggest to (1) perform quantitative auto- and trans-(generic substrate) phosphorylation assays with wt and D849N EFR to derive enzyme kinetic parameters, to (2) include the EFRD849 mutant in the HDX analysis and (3) to generate transgenic lines for EFRD489N/F761H/Y836F // EFRD489N/F761H/SSAA and compare them to the existing lines in Fig. 3.

      Mutations of proteins, especially those that require conformational plasticity for their function, can have pleiotropic effects, as a mutation may alter conformational plasticity and, consequently, both the catalytic and non-catalytic functions that depend on it. In such cases, it is difficult to fully untangle catalytic and non-catalytic functions. Coming back to EFR D849N, the D849N mutation may also impact the non-catalytic function by altering the conformational plasticity, which could explain the difference observed between EFR and EFR D849N. As you rightly suggested, HDX would be a way to address this but would still not clarify whether catalytic activity contributes to activation. We instead attempted to produce analog-sensitive EFR variants for in vivo characterization of EFR-targeted catalytic inhibition. Unfortunately, we failed to produce an analog-sensitive variant for which we could show ATP-analog binding. To address your concern, we inserted a section on limitations of the study.

      (8) Fig. 2B,C, supplement 3 C,D. Has it been assessed if the different EFR versions were expressed to similar protein levels and still localized to the PM?

      Localization of the mutant receptors has not been explicitly evaluated by confocal microscopy. However, the selected variant EFRF761H accumulates in stable Arabidopsis lines (Figure 3 – Supplement 1C), and BAK1 co-immunoprecipitated with all EFR variants upon elf18 treatment (Figure 3 B), indicating plasma membrane localization.

      (9) How the active-like conformation of EFR is in turn activating BAK1 is poorly characterized, but appears to be the main step in the activation of the receptor complex. Extending the HDX analyses to resting and Rap-activated receptor complexes could be a first step to address this question. I tried to come up with an experimental plan to test if indeed the kinase activity of BAK1 and not of EFR is essential for signal propagation, but this is a complex issue. You would need to be able to mimic an activated form of EFR (which you can), to make sure it is inactive (possibly, see above) and likewise to engineer a catalytically inactive form of BAK1 in an active-like state (difficult). As such a decisive experiment is difficult to implement, I would suggest discussing different possible interpretations of the existing data and alternative scenarios in the discussion section of the manuscript.

      We addressed your concern about whether BAK1 kinase activity is essential for signal propagation by pairing EFRF761H with BAK1D416N (Figure 4 Supplement 2 C), a combination that fails to induce signaling. In this case, EFRF761H is in its activated conformation but cannot activate downstream signaling. We also attempted to address your concern with an in vitro kinase assay, pairing EFR with BAK1D416N and using a range of concentrations of the substrate BIK1D202N. We observed that the catalytic activity of BAK1, but not of EFR, was essential for BIK1 phosphorylation. However, this experiment does not address whether activated EFR can efficiently propagate signaling in the absence of BAK1 catalytic activity. In the limitations of the study section, we now discuss the catalytic importance of EFR for signaling activation.

      Author response image 1.

      BIK1 trans-phosphorylation depends on BAK1 catalytic activity. Increasing concentrations of BIK1 D202N were used as substrate for Rap-induced dimers of EFR-BAK1, EFR D849N-BAK1, and EFR-BAK1 D416N respectively. BIK1 trans-phosphorylation depended on the catalytic activity of BAK1. Proteins were purified from E. coli λPP cells. Three experiments yielded similar results of which a representative is shown here.

      Reviewer #2:

      All of my suggestions are minor.

      Figure 1B, I think it would be more useful to readers to explain the amino acid in the D-N change, rather than just call it D-to-N? Also, please label the bands on the stained gel; the shift on FKBP-BRI1 and FKBP-EFR are noticeable on the Coomassie stain.

      We implemented your suggestions.

      Figure 1-Supplement 1. There is still a signal in pS612 BAK1 (it states 'also failed to induce BAK1 S612 phosphorylation' in the text, which is not quite correct). Also, could mention the gel shift seen in BAK1, which appears absent in Y836F.

      We corrected the text which now states: “To test whether the requirement for Y836 phosphorylation is similar, we immunoprecipitated EFR-GFP and EFRY836F-GFP from mock- or elf18-treated seedlings and probed co-immunoprecipitated BAK1 for S612 phosphorylation. EFRY836F also obstructed the induction of BAK1 S612 phosphorylation (Figure 1 – Supplement 1), indicating that EFRY836F and EFRSSAA impair receptor complex activation.” The gel shift of BAK1 you pointed out was not observed in replications and thus we prefer not to comment on it.

      Figure 2 and 3 are full of a, b, c,d's, which I don't understand. Sorry

      We used uppercase letters to indicate subpanels and lowercase letters to indicate the results of the statistical testing. In the figure caption, we have clarified that the lowercase letters refer to statistical comparisons.

      Figure 2 A. If each point on the x-axis is one amino acid, I think it would again be useful to name the amino acids that the gold or purple or blue colored lines extend through.

      Each point represents a peptide; peptides are sorted by the position of their starting amino acid, from N-terminus to C-terminus. We have now added HDX plots for individual peptides that correspond to the highlighted region in subpanel A.

      Figure Supplement 1 is very small for what it is trying to show, even on the printed page. If this residue were to be phosphorylated, what would happen to the H-bond?

      We suppose that VIa-Tyr phosphorylation would break the H-bond and cause displacement of the αC-β4 loop. Recent studies, published after our submission, highlight the importance of this loop for substrate coordination and ATP binding. Thus, phosphorylating VIa-Tyr and displacing this loop may render the kinase rather unproductive. We have expanded the discussion to include this point.

      Figure 2B: Tyr 836 is not present in any of the alignments in Figure 2A. This should be rectified, because the text talks about the similarity to Tyr 156 in PKA.

      We have adjusted the alignments such that they now contain the VIa-Tyr residues of EFR and PKA.

      Figure 4D. Is there any particular reason that these blots are so hard to compare for FKBP and BAK1?

      We assume this refers to Figure 4 – Supplement 2 D. FKBP-EFR and FRB-BAK1 are both approximately the size of RuBisCO, the most abundant protein in plant protein samples, which overlaps with the FKBP- and FRB-tagged kinases on the blot. These proteins are therefore difficult to detect.

      Reviewer #3:

      (1) The paper reporting the allosteric activation mechanism of EGFR should be cited.

      Will be included.

      (2) The authors showed that "Rap addition increased BIK1 D202N phosphorylation when the BRI1 or EFR kinase domains were dimerized with BAK1, but no such effect was observed with FLS2". Please explain why FLS2 failed to enhance BIK1 transphosphorylation by Rap treatment?

      Even though BIK1 is a reported downstream signaling component of FLS2/BAK1, it might not be the most relevant one; related RLCKs, such as PBL1, might be better substrates for dimerized FLS2/BAK1. We have not tested this, however. Alternatively, the purified FLS2 kinase domain might be labile and unfold quickly even though it was kept on ice until the start of the assay, or the N-terminal FKBP-tag may disrupt its function. As the reason for our observation is not clear, we have removed the FLS2 in vitro dimerization experiments from the manuscript.

      (3) Based solely on the data presented in Figure 1, it can be concluded that EFR's kinase activity is not required to facilitate BIK1 transphosphorylation. Therefore, the title of Figure 1, "EFR Allosterically Activates BAK1," may be inappropriate.

      We have changed the figure title to: “EFR facilitates BIK1 trans-phosphorylation by BAK1 non-catalytically.”

      (4) In Figure 1- Supplement 1, I could not find any bands in anti-GFP and anti-BAK1 pS612 of input. Please redo it.

      Indeed, we could not detect protein in the input samples of this experiment. BAK1 S612 phosphorylation is an activation mark and not necessarily expected to be abundant enough for detection in input samples. EFR-GFP, however, is usually detected in input samples and is reported in Macho et al. 2014 from which manuscript these lines come. Why EFR-GFP is not detected in this set of experiments is unclear but, in our opinion, does not detract from the conclusions drawn since similar amounts of EFR-GFP are pulled-down across all samples.

      (5) For Figure 2A, please mark the structure represented by each color directly in the figure.

      We have made the suggested change.

      (6) Please modify "EFRF761/Y836F and EFRF761H/SSAA restore BIK1 trans-phosphorylation" to "EFRF761H/Y836F and EFRF761H/SSAA restore BIK1 trans-phosphorylation".

      Thank you for spotting this. We changed it.

      (7) The HDX-MS analysis demonstrated that the EFR (Y836F) mutation inhibits the formation of the active-like conformation. Conversely, the EFR (F761H) mutation serves as a potent intragenic suppressor, significantly stabilizing the active-like conformation. Confirming through HDX-MS conformational testing that the EFR (Y836F F761H) double mutation does not hinder the formation of the active-like EFR kinase conformation would greatly strengthen the conclusions of the article.

      Response: We agree that this would be beneficial. We attempted it but failed to produce enough protein for HDX-MS analysis. We now state this in an additional section of the paper (“Limitations of the study”).

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Duan et al. analyzed brain imaging data in the UK Biobank and found patterns of brain structure change with aging. They identified two patterns and found links that can be differentiated by the categorization.

      Strengths:

      This discovery harbors a substantial impact on aging and brain structure and function.

      Weaknesses:

      (1) Therefore, the study requires more validation efforts. Most importantly, the data underlying the stratification of the two groups are not obvious and lack further details. Can they also be stratified by different methods, e.g., PCA?

      Response: Thanks for the comment. In this study, principal component analysis (PCA) was applied to the individualized deviations of anatomical regions of interest (ROIs) for dimensionality reduction; the first 15 principal components, which explained approximately 70% of the total variance, were used to identify longitudinal brain aging patterns. The two patterns can be stratified by both linear and non-linear dimensionality reduction methods: PCA and locally linear embedding (LLE; ref. 1). The grey matter volume (GMV) of 40 ROIs at baseline was linearly adjusted for sex, assessment center, handedness, ethnicity, intracranial volume (ICV), and a second-degree polynomial in age to be consistent with the whole-brain GMV trajectory model. There was a clear boundary between the two patterns in the projected coordinate space, indicating distinct structural differences in brain aging between them (Author response image 1).

      Author response image 1.

      Stratification of the identified brain aging patterns using linear and non-linear dimensionality reduction methods. (a) The principal component space of PC1 and PC2, and (b) two-dimensional projected locally linear embedding space derived from brain volumetric measures. Points have been colored and shaped according to grouping labels of the brain aging patterns.
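
As a rough illustration of the dimensionality-reduction step described in the response above, the sketch below shows how one might pick the number of leading principal components needed to reach a target share of explained variance (about 70% in the response). The eigenvalue spectrum and the helper name `components_for_variance` are hypothetical and chosen only for illustration; they are not the study's values or code.

```python
def components_for_variance(eigenvalues, target=0.70):
    """Return the smallest number of leading components whose cumulative
    explained-variance ratio reaches `target`."""
    total = sum(eigenvalues)
    ordered = sorted(eigenvalues, reverse=True)  # largest variance first
    cumulative = 0.0
    for k, value in enumerate(ordered, start=1):
        cumulative += value / total
        if cumulative >= target:
            return k
    # Fallback for rounding effects when target is 1.0
    return len(ordered)


# Hypothetical eigenvalue spectrum for 40 ROI deviations (NOT the study's values).
hypothetical_eigenvalues = [1.0 / (i + 1) for i in range(40)]
n_components = components_for_variance(hypothetical_eigenvalues, target=0.70)
print(n_components)
```

With real data, the eigenvalues would come from the covariance matrix of the adjusted ROI deviations, and the retained components would then be passed to the pattern-identification step.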

      (2) Are there any external data that can be used for validation?

      Response: Thanks for the comment. We were given access to the Alzheimer’s Disease Neuroimaging Initiative (ADNI) study, which aimed at determining the relationships between clinical, cognitive, imaging, genetic, and biochemical biomarkers across the entire spectrum of Alzheimer’s disease. ADNI recruits participants aged between 55 and 90 years at 57 sites in the United States and Canada, who undergo a series of initial tests that are repeated at intervals over subsequent years. 

      Unfortunately, there are no appropriate and sufficient data, especially clinical, cognitive, and genetic data, to support unbiased validation of the heterogeneity in structural brain aging patterns. Only 890 (31.83%) of the 2796 subjects included in ADNI were cognitively normal, of whom 656 were included in the analyses after quality control of structural MRI and exclusion of subjects with missing covariates, with a mean age at the screening visit of 70.8 years (SD = 6.48 years); 60.21% of these subjects were female. Thus, there are significant differences in population composition between ADNI and UK Biobank, with ADNI enrolling older subjects because of its focus on defining the progression of Alzheimer’s disease.

      Moreover, among the 656 subjects with structural imaging data, the data needed to validate the clinical, cognitive, and genetic manifestations of the brain aging patterns were missing to varying degrees. For example, blood biochemistry tests and telomere length data were missing at baseline for approximately 58% and 82% of subjects, respectively, and genotype data were not assayed for more than 70 percent of the subjects. As for cognitive function tests, only the results of the Mini-Mental State Examination were complete, while other tests such as the Trail Making Test and Digit Span Backward were available for less than 10 percent of subjects.

      (3) Other previous discoveries or claims supporting the results of the study should be explored to support the conclusion.

      Response: Thanks for the suggestion. As we mentioned in the manuscript (lines 274-277), participants with brain aging pattern 2 (lower baseline total GMV and more rapid GMV decrease) were characterized by accelerated biological aging and cognitive decline. Previous research on brainAGE (the difference between chronological age and the age predicted by a machine learning model from brain imaging data; refs. 2, 3) showed that people with older brainAGE, a biomarker of accelerated brain aging, have accelerated biological aging and early signs of cognitive decline, which is consistent with our findings in this study (lines 302-306).

      Further, genome-wide association studies identified significant genetic loci contributing to accelerated brain aging, some of which can be found in previous GWAS on image-derived phenotypes (ref. 4), such as regional and tissue volume, cortical area and white matter tract measurements, and on specific brain aging modes identified using a data-driven decomposition approach (ref. 5) (lines 207-213).

      In addition, we demonstrated the “last in, first out” mirroring patterns between structural brain aging and brain development, and found that the mirroring patterns are predominantly localized to the lateral / medial temporal cortex and the cingulate cortex, as noted in the manuscript (lines 231-234). Large differences in the patterns of change between late adolescent development and aging in the medial temporal cortex were previously found in studies of brain development and aging patterns (ref. 6) (lines 315-317).

      (4) Sex was merely used as a covariate. Were there sex differences during brain aging? What was the sex ratio difference in groups 1 and 2?

      Thanks for the comment. Sex differences during brain aging can be observed by investigating sex-stratified whole-brain GMV trajectories. We fitted the growth curve and estimated the rate of change of total grey matter volume (TGMV) separately for males and females using generalized additive mixed effect models (GAMM), which included 40,921 observations from 17,055 males and 19,958 females (Author response image 2). Among healthy participants aged 44-82 years in UK Biobank, males had higher total GMV and a faster rate of GMV decrease over time, while females had lower total GMV and a slower rate of GMV decrease. A similar conclusion can be found in normative brain-volume trajectories across the human lifespan (ref. 7). Supplementary Table 5 shows baseline and demographic characteristics for all participants and for participants stratified by brain aging patterns. There were slightly more females than males among the total participants and in brain aging pattern 1 (53.4%) and pattern 2 (54.4%), and a χ^2 test showed no significant difference in the sex ratio between the two patterns (P = 0.06).

      Author response image 2.

      Total gray matter volume (TGMV) (a) and the estimated rate of change (b) for females (red) and males (blue). Rates of volumetric change for total gray matter and each ROI were estimated using GAMM, which incorporates both cross-sectional between-subject variation and longitudinal within-subject variation from 22,067 observations for 19,958 females, and 18,854 observations for 17,055 males. Covariates include assessment center, handedness, ethnicity, and ICV. Shaded areas around the fit lines denote 95% CIs.
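
The sex-ratio comparison between the two patterns reported above relies on a χ^2 test. A minimal sketch of a Pearson χ^2 test on a 2×2 table is given below, under the assumption that no continuity correction is applied (the response does not say which variant was used). The counts are hypothetical, since the group sizes are not stated here, and the function name `chi_square_2x2` is illustrative only.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square test (no continuity correction) for a 2x2 table.

    `table` is [[a, b], [c, d]]; returns (statistic, p_value) for 1 degree
    of freedom.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    # With 1 degree of freedom, the chi-square upper tail reduces to erfc.
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Hypothetical counts of (females, males) per pattern; NOT the study's numbers.
hypothetical_counts = [[534, 466], [544, 456]]
stat, p = chi_square_2x2(hypothetical_counts)
print(f"chi2 = {stat:.3f}, P = {p:.3f}")
```

Because the P value depends strongly on sample size, the same percentage difference can be non-significant in a small sample and significant in a large one; the hypothetical counts here are not meant to reproduce the reported P = 0.06.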

      (5) Although statistically significant, Figure 3 shows minimal differences. LTL and phenoAge are displayed in adjusted values but what are the actual values that differ between patterns 1 and 2?

      Response: Thanks for the comment. We have modified the visualization of Figure 3 in the revised manuscript by adjusting the axes for the leucocyte telomere length (LTL) and PhenoAge variables and removing the whiskers from the boxplots. Associations between biological aging biomarkers and brain aging patterns are listed in Supplementary Table 6. Compared to brain aging pattern 1, participants in pattern 2, who had a more rapid GMV decrease, had shorter leucocyte telomere length (P = 0.009, Cohen’s d = -0.028) and higher PhenoAge (P = 0.019, Cohen’s d = 0.027) without covariate adjustment. Specifically, participants in brain aging pattern 1 had an average Z-standardized LTL of 0.083 (SD 0.98) and an average PhenoAge of 41.35 years (SD 8.17 years), and those in pattern 2 had an average Z-standardized LTL of 0.055 (SD 0.97) and an average PhenoAge of 41.58 years (SD 8.32 years).
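
The effect sizes quoted above can be sanity-checked from the reported group means and SDs. The sketch below computes Cohen's d with an equally weighted pooled SD; the equal weighting is an assumption made here because the group sizes are not given in the response, and the function name `cohens_d` is illustrative.

```python
import math


def cohens_d(mean_ref, sd_ref, mean_cmp, sd_cmp):
    """Cohen's d for (comparison - reference), pooling the two SDs with equal
    weights. Equal weighting is an assumption: group sizes are not given."""
    pooled_sd = math.sqrt((sd_ref ** 2 + sd_cmp ** 2) / 2.0)
    return (mean_cmp - mean_ref) / pooled_sd


# Means and SDs as reported in the response
# (pattern 1 = reference group, pattern 2 = comparison group).
d_ltl = cohens_d(0.083, 0.98, 0.055, 0.97)        # Z-standardized LTL
d_phenoage = cohens_d(41.35, 8.17, 41.58, 8.32)   # PhenoAge in years
print(f"LTL d = {d_ltl:.3f}, PhenoAge d = {d_phenoage:.3f}")
```

Under these assumptions the values come out at roughly -0.029 and 0.028, close to the reported -0.028 and 0.027; small discrepancies are expected from rounding of the published means and from unequal group sizes.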

      (6) It is not intuitive to link gene expression results shown in Figure 8 and brain structure and functional differences between patterns 1 and 2. Any overlap of genes identified from analyses shown in Figure 6 (GWAS) and 8 (gene expression)?

      Response: Thanks for the comment. We apologize for the confusion. As we mentioned in the Results section “Gene expression profiles were associated with delayed brain development and accelerated brain aging”, seventeen of the 45 genes mapped to GWAS-significant SNPs were found in the Allen Human Brain Atlas (AHBA) dataset. Gene expression of LGR4 (r_spearman = 0.56, P_permutation = 2.5 × 10^-4) was significantly associated with delayed brain development, and expression of ESR1 (r_spearman = 0.53, P_permutation = 1.5 × 10^-4) and FAM3C (r_spearman = -0.37, P_permutation = 0.004) was significantly associated with accelerated brain aging. BDNF-AS was positively associated with both delayed brain development and accelerated brain aging after the spatial permutation test. Full associations between the gene expression profiles of mapped genes and the estimated APC during brain development and aging are presented in Supplementary Tables 12 and 13, respectively.

      Furthermore, we screened the genes based on their contributions and effect directions to the first PLS components in brain development and brain aging. Genes mapped to GWAS-significant SNPs were among those screened for inclusion in the functional enrichment analysis (Author response table 1), with LGR4 (PLSw1(LGR4) = 3.70, P.FDR = 0.002) associated with delayed development, and ESR1 (PLSw1(ESR1) = 3.91, P.FDR = 6.12 × 10^-4) and FAM3C (PLSw1(FAM3C) = -3.68, P.FDR = 0.001) associated with accelerated aging.

      Author response table 1.

      Contributions and effect directions of the first PLS components in brain development and brain aging of genes that mapped to GWAS significant SNP. The bold P values reflect significance (P < 0.005, inclusion in the functional enrichment analysis) after FDR correction.

      Reviewer #2 (Public Review):

      Summary:

      The authors aimed to understand the heterogeneity of brain aging by analyzing brain imaging data. Based on the concept of structural brain aging, they divided participants into two groups based on the volume and rate of decrease of gray matter volume (GMV). The group with rapid brain aging showed accelerated biological aging and cognitive decline and was found to be vulnerable to certain neuropsychiatric disorders. Furthermore, the authors claimed the existence of a "last in, first out" mirroring pattern between brain aging and brain development, which they argued is more pronounced in the group with rapid brain aging. Lastly, the authors identified genetic differences between the two groups and speculated that the cause of rapid brain aging may lie in genetic differences.

      Strengths:

      The authors supported their claims by analyzing a large amount of data using various statistical techniques. There seems to be no doubt about the quality and quantity of the data. Additionally, they demonstrated their strength in integrating diverse data through various analysis techniques to conclude.

      Weaknesses:

      There appears to be a lack of connection between the analysis results and their claims. Readers lacking sufficient background knowledge of the brain may find it difficult to understand the paper. It would be beneficial to modify the figures and writing to make the authors' claims clearer to readers. Furthermore, the paper gives an overall impression of being less polished in terms of abbreviations, figure numbering, etc. These aspects should be revised to make the paper easier for readers to understand.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Gray matter volume (GMV) is defined later in the manuscript and may confuse readers.

      Response: Thanks for the comment. We have now defined GMV upon its first appearance in the manuscript.

      Reviewer #2 (Recommendations For The Authors):

      (1) In conducting GWAS, the authors used total GMV at the age of 60 as a phenotype (line 195). It would be beneficial to provide additional explanation as to why only the data from individuals aged 60 were utilized, especially considering the ample availability of GMV data.

      Response: Thanks for the comment and we apologize for the confusion. As we mentioned in the Methods section “Genome Wide Association Study to identify SNPs associated with brain aging patterns”, we performed genome-wide association studies (GWAS) on individual deviations of total GMV relative to the population average at 60 years using PLINK 2.0. Therefore, data from all individuals were used in the GWAS, rather than only data from those aged 60 years. To accomplish this, the deviation of total GMV from the population average at age 60 years was calculated for each participant using a mixed effect regression model, as described in the Methods section “Identification of longitudinal brain aging patterns”.

      (2) Whole-brain gene expression data was linked to GMV (Line 237). Gray matter is known to account for about 40% of the total brain. Thus, interpreting whole-brain data in connection with GMV might introduce significant errors. Could this potential source of error be addressed?

      Response: Thanks for the comment. In our study, the Allen Human Brain Atlas (AHBA) dataset was processed using the abagen toolbox version 0.1.3 (https://doi.org/10.5281/zenodo.5129257) with the Desikan-Killiany atlas (ref. 8), resulting in a matrix (83 regions × 15,633 gene expression levels) of transcriptional level values covering cortical and subcortical structures in both hemispheres, and the brainstem. Only data from 34 cerebral cortex regions, not the whole brain, were included in the analysis of the association between the regional change rate of gray matter volume and gene expression profiles using partial least squares (PLS) regression. We have clarified in the revised manuscript that we utilized AHBA microarray expression data from regions of interest (ROIs) in the cortex.

      (3) The paper lacks biological interpretation of the important genetic factors (SNPs and genes) for brain aging discovered in this study, as well as the results of gene ontology analysis. Many readers would be curious about the biological significance of these genetic differences and what kind of outcomes they may produce.

      Response: Thanks for the suggestion. As we mentioned in our manuscript, six independent single nucleotide polymorphisms (SNPs) were identified at the genome-wide significance level (P < 5 × 10^-8) (Fig. 6). Among them, two SNPs (rs10835187 and rs779233904) were also found to be associated with multiple brain imaging phenotypes in previous studies, such as regional and tissue volume, cortical area and white matter tract measurements. Compared to the GWAS using global gray matter volume as the phenotype, our GWAS revealed an additional signal on chromosome 7 (rs7776725), which maps to an intron of FAM3C, a gene encoding a secreted protein involved in pancreatic cancer and Alzheimer's disease. This signal was further validated to be associated with a specific brain aging mode by another study using a data-driven decomposition approach. In addition, another significant locus (rs10835187, P = 1.11 × 10^-13) is an intergenic variant between the genes LGR4-AS1 and LIN7C, and was reported to be associated with bone density, brain volume and total cortical area measurements. LIN7C encodes the Lin-7C protein, which is involved in the localization and stabilization of ion channels in polarized cells, such as neurons and epithelial cells. A previous study revealed an association of both allelic and haplotypic variations in the LIN7C gene with ADHD. In addition, ESR1 was found to be involved in I-kappaB kinase/NF-kappaB signaling in the functional enrichment analysis associated with accelerated brain aging (Figure 8 and Supplementary Figure 5); activation of this pathway leads to a variety of human pathologies such as neurodegenerative, inflammatory, autoimmune and cancerous diseases (ref. 9).

      In summary, the analyses based on the GO biological process and KEGG pathway databases indicate that synaptic transmission is an important process in the common mechanisms of brain development and aging, and that cellular processes (autophagy), as well as the progression of neurodegenerative diseases, are important in the mechanisms of brain aging.

      (4) As mentioned in the public review, it would be helpful if figures were revised to more clearly represent the claims.

      (4.1) For Figure 1, it would be beneficial to explain how the authors analyzed the differences between the mentioned cross-section and longitudinal trajectory, which they identified as a strength of the study.

      Response: We have added the strengths of adopting longitudinal data for modeling brain aging trajectories compared to only using cross-sectional data in Figure 1 caption in the revised manuscript:

      “Fig. 1 Overview of the study workflow. a, Population cohorts (UK Biobank and IMAGEN) and data sources (brain imaging, biological aging biomarkers, cognitive functions, genomic data) involved in this study. b, Brain aging patterns were identified using longitudinal trajectories of the whole brain GMV, which enabled the capturing of long-term and individualized variations compared to using only cross-sectional data, and associations between brain aging patterns and other measurements (biological aging, cognitive functions and PRS of major neuropsychiatric disorders) were investigated. c, Mirroring patterns between brain aging and brain development were investigated using z-transformed brain volumetric change map and gene expression analysis.”

      (4.2) In Figure 3, it's challenging to distinguish differences between patterns 1 and 2 in LTL and PhenoAge. (e.g. It's unclear whether Pattern 1 is higher or lower). Clarifying this visually would be useful.

      Response: We have modified the visualization of Figure 3 in the revised manuscript by adjusting the axes for the leucocyte telomere length (LTL) and PhenoAge variables and removing the whiskers from the boxplots.

      Author response image 3.

      Distributions of biological aging biomarkers (leucocyte telomere length (LTL) and PhenoAge) among participants with brain aging patterns 1 and 2.

      (4.3) Figure 7 explains the mirroring pattern, but it's hard to discern significant differences from the figures alone (especially in Figures 7b and 7c). Using an alternative method (graph, etc.) to clearly represent this would be appreciated.

      Response: We have included an arrow pointing to the brain regions with significant differences in each subfigure.

      Author response image 4.

      The “last in, first out” mirroring patterns between brain development and brain aging.

      (5) Abbreviations should be explained when they are first introduced in the paper. For example, GMV continues to be used without explanation, and in line 203, it is written out as 'gray matter volume'. ADHD and ASD first appear at line 172, but the explanation is found in lines 177-178. Additionally, there are terms without explanations in the manuscript. For instance, BMI is not explained in the main manuscript but is defined in the Supplementary Information (Table S6).

      Response: We have corrected the inappropriate formatting regarding misplaced and missing abbreviations in the revised manuscript and Supplementary Information.

      (6) Figure numbers should follow the order of appearance in the paper. The first Supplementary Fig. in the manuscript is Supplementary Figure 3. It should be Supplementary Figure 1.

      Response: We have relabeled the figures with the order of appearance in the paper in the revised manuscript and Supplementary Information.

      References:

      (1) Roweis, S. T. & Saul, L. K. Nonlinear dimensionality reduction by locally linear embedding. Science 290, 2323–2326 (2000).

      (2) Christman, S. et al. Accelerated brain aging predicts impaired cognitive performance and greater disability in geriatric but not midlife adult depression. Translational Psychiatry 10, 317 (2020).

      (3) Elliott, M. L. et al. Brain-age in midlife is associated with accelerated biological aging and cognitive decline in a longitudinal birth cohort. Molecular Psychiatry 26, 3829–3838 (2021).

      (4) Smith, S. M. et al. An expanded set of genome-wide association studies of brain imaging phenotypes in UK Biobank. Nature Neuroscience 24, 737–745 (2021).

      (5) Smith, S. M. et al. Brain aging comprises many modes of structural and functional change with distinct genetic and biophysical associations. eLife 9, e52677 (2020).

      (6) Tamnes, C. K. et al. Brain development and aging: overlapping and unique patterns of change. Neuroimage 68, 63–74 (2013).

      (7) Bethlehem, R. A. et al. Brain charts for the human lifespan. Nature 604, 525–533 (2022).

      (8) Desikan, R. S. et al. An automated labeling system for subdividing the human cerebral cortex on MRI scans into gyral based regions of interest. Neuroimage 31, 968–980 (2006).

      (9) Singh, S. & Singh, T. G. Role of nuclear factor kappa B (NF-κB) signalling in neurodegenerative diseases: an mechanistic approach. Current Neuropharmacology 18, 918–935 (2020).

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews: 

      Reviewer #1 (Public review): 

      Summary: 

      This is a comprehensive study that clearly and deeply investigates the function of GATA6 in human early cardiac development. 

      Strengths: 

      This study combines hESC engineering, differentiation, detailed gene expression, genome occupancy, and pathway modulation to elucidate the role of GATA6 in early cardiac differentiation. The work is carefully executed and the results support the conclusions. The use of publicly available data is well integrated throughout the manuscript. The RIME experiments are excellent. 

      Weaknesses: 

      Much has been known about GATA6 in mesendoderm development, and this is acknowledged by the authors. 

      We appreciate the comments and have tried to highlight both the early role of GATA6 in cardiac progenitor biology and its haploinsufficiency, which is relevant to human congenital heart disease. We believe this adds value to other recently published work, including Sharma et al. (eLife 2020).

      Reviewer #2 (Public review): 

      Summary: 

      This manuscript by Bisson et al. describes the role of GATA6 in regulating cardiac progenitor cell (CPC) specification and cardiomyocyte (CM) generation using human embryonic stem cells (hESCs). The authors found that GATA6 loss-of-function hESCs exhibit early defects at the mesendoderm and lateral mesoderm patterning stages. Using RNA-seq and CUT&RUN assays, genes of the Wnt and BMP programs were found to be affected by the loss of GATA6 expression. Modulating Wnt and BMP during early cardiac differentiation can partially rescue CPC and CM defects in GATA6 hetero- and homozygous mutant hESCs.

      Strengths: 

      The studies performed were rigorous and the rationale for the experimental design was logical. The results obtained were clear and supported the conclusions that the authors made regarding the role of GATA6 on Wnt and BMP pathway gene expression. 

      Weaknesses: 

      Given the wealth of studies that have been performed in this research area previously, the amount of new information provided in this study is relatively modest. Nevertheless, the results are quite clear and should make a strong contribution to the field.

      Likewise for Reviewer 2, we appreciate the comments and have tried to highlight both the early role of GATA6 in cardiac progenitor biology and its haploinsufficiency, which is relevant to human congenital heart disease.

      Reviewer #3 (Public review): 

      In this study, Bisson et al. analyzed the role of the GATA6 transcription factor in patterning the early mesoderm and generating cardiomyocytes, using human embryonic stem cell differentiation assays and patient-derived hiPSCs with heart defects associated with mutations in the GATA6 gene. They identified a novel role for GATA6 in regulating genes involved in the WNT and BMP pathways, findings not previously noted in earlier analyses of GATA6 mutant hiPSCs during early cardiac mesoderm specification (Sharma et al., 2020). Modulation of the WNT and BMP pathways may partially rescue early cardiac mesoderm defects in GATA6 mutant hESCs. These results provide significant insights into how GATA6 loss-of-function and heterozygous mutations contribute to heart defects.

      I have the following comments: 

      (1) Throughout the manuscript, Bisson et al. alternate between different protocols to generate cardiomyocytes, which creates some confusion (e.g., Figure 1 vs. Supplemental Figure 2A). The authors should provide a clear justification for using alternative protocols.

      We agree and clarified this issue in the revision (p. 6). The reviewer is correct that there are two widely used protocols for directed differentiation of PSCs to cardiac fate. One is a cytokine-based protocol (Fig. 1A) and the other uses small molecules to manipulate the WNT pathway (CHIR protocol, Supplemental Fig. 2B). In our study, we used the CHIR protocol only for experiments in Supplemental Figure 2B-E. Since our data implicated BMP and WNT as mediators of the GATA6-dependent program, we did this mainly to confirm that the phenotype we observed with the cytokine-based protocol was not biased by the differentiation protocol. However, we found the CHIR protocol to be relatively inefficient overall for cardiac differentiation using the parental H1 hESCs and the various isogenic lines. In vitro cardiac differentiation protocols for hPSCs are known to vary between lines and sometimes require extensive optimization of media components and concentrations, cell seeding densities, and batch variations for crucial reagents. The cytokine-based protocol we optimized worked most efficiently with our hPSC lines to generate cardiomyocytes; we therefore committed to using it for the bulk of experiments in this study.

      (2) The authors should characterise the mesodermal identity and cardiomyocyte subtypes generated with the activin/BMP-induction protocol thoroughly and clarify whether defects in the expression of BMP and WNT-related genes affect the formation of specific cardiomyocyte subtypes in a chamber-specific manner. This analysis is important, as Sharma et al. suggested a role for GATA6 in orchestrating outflow tract formation, and Bisson et al. similarly identified decreased expression of NRP1, a gene involved in outflow tract septation, in their GATA6 mutant cells.

      We agree that it is important for the mesodermal identities to be thoroughly characterized; see, for example, Fig. 2 (K+P+, Brachyury, EOMES) and Fig. 3G&H (lateral mesoderm and cardiac mesoderm RNA-seq and GSEA comparing datasets from Koh et al.). The capacity of the cytokine-based protocol to generate both FHF- and SHF-derived subtypes has been rigorously evaluated by Keller and colleagues, which we now cite (Yang et al. 2022). Since the null cells do not generate CMs, chamber-specific subtypes cannot be evaluated; whether the GATA6 heterozygous mutants are biased is an interesting question. Indeed, the top GO term identified by CUT&RUN analysis for GATA6 at day 2 of differentiation is outflow tract morphogenesis, which is consistent with the interpretation by Sharma et al., but implicates this program at a much earlier developmental stage, long before cardiomyocyte differentiation. We think this is one of the most important findings of our study and appreciate the chance to highlight this in the revision (p. 9, 17). When we evaluated chamber-specificity for differentiated cardiomyocytes, we did not find significant differences, as indicated for the reviewer in the panel below (day 20 of differentiation). Since our study focuses on early stages of progenitor specification rather than cardiomyocyte differentiation, we agree that a more rigorous analysis would be of value, and indicated this as a limitation of our current study (p. 18).

      Author response image 1.

      (3) The authors developed an iPSC line derived from a congenital heart disease (CHD) patient with an atrial septal defect and observed that these cells generate cTnnT+ cells less efficiently. However, it remains unclear whether atrial cardiomyocytes (or those localised specifically at the septum) are being generated using the activin/BMP-induction protocol and the patient-derived iPSC line.

      As indicated above, our study is focused on cardiac progenitor specification, and we found similar differences with the patient-derived iPSC-CMs compared to using hESC heterozygous targeted mutants. While we did not note any major differences in expression of cardiomyocyte markers, whether the mutants show any biases toward sub-types of cardiomyocytes is an interesting question to be pursued in subsequent work.

      (4) The authors should also justify the necessity of using the patient-derived line to further analyse GATA6 function. 

      This is a good point, and as suggested we provided the justification (p. 5-6). This is the first patient-derived iPSC line published with a heterozygous GATA6 mutation along with an isogenic mutation-corrected control generated for cardiac directed differentiation. Patients with congenital heart disease (CHD) associated with GATA6 mutations are typically heterozygous (also true for many other CHD variants; presumably homozygous null embryos would not survive). It is important to query if phenotypes found using targeted mutations in hESCs (or iPSCs) model the human disease, since the patient cells (or the hESCs) likely have additional genetic variants that might interact with the GATA6 mutation. The fact that both types of heterozygous cells (patient-derived iPSCs and targeted hESCs) generate similar defects in CM differentiation provides evidence supporting the use of these human cellular models to study the genetic and cellular basis for congenital heart disease. This is particularly important, since other models, such as heterozygous mice, do not show such phenotypes.

      (5) Figure 3 suggests an enrichment of paraxial mesoderm genes in the context of GATA6 loss-of-function, which is intriguing given the well-established role of GATA6 in specifying cardiac versus pharyngeal mesoderm lineages in model organisms. Could the authors expand their analysis beyond GO term enrichment to explore which alternative fates GATA6 mutant cells may acquire? Additionally, how does the potential enrichment of paraxial mesoderm, rather than pharyngeal mesoderm, relate to the initial mesodermal induction from their differentiation protocol? Could the authors also rule out the possibility of increased neuronal cell fates? 

      We need to interpret our in vitro differentiation data cautiously in relation to what has been shown in vivo, since we are unlikely to be reproducing all the complex signaling taking place in the embryo. Yet we do see modest increases in gene expression levels including signatures of paraxial mesoderm and ECM/mesenchymal at days 2 or 3 of differentiation in the GATA6 mutant cells. Therefore, we now include a heatmap showing enriched paraxial mesoderm gene expression in the mutant cells, new Fig. 3I (see page 10).

      A caveat of this result is that the cells are being differentiated toward cardiac fate, so a bias for alternative fates might be suppressed. We modified the protocol to favor paraxial fate by adding CHIR at day 2 (rather than XAV) and performing qPCR assays at day 3. We found this successfully induced paraxial mesoderm gene expression, but to an equal extent in wildtype, heterozygous, and null cells, so we do not feel it warrants highlighting further.

      Recommendations for the authors:  

      Reviewing Editor (Recommendations for the authors): 

      Incorporation of marker analysis for various stages of iPSC to CM differentiation (mesoderm, cardiac progenitor, CM subtypes) would increase the significance and support for the findings presented. Further data on the link (direct or indirect) between GATA6 and Wnt/BMP signalling would also add to the significance of this study. A number of textual changes/clarifications are also suggested to improve the manuscript. 

      We appreciate the feedback and provide responses for issues raised for markers, direct or indirect interactions, and textual changes/clarifications in the following sections. As indicated above, we did not find obvious alterations in cardiac subtypes, but since our study is focused on early progenitor specification, this is an interesting question that we think should be more rigorously evaluated in subsequent work.  

      Reviewer #1 (Recommendations for the authors)

      Minor details: 

      (1) On p6 "Principal component analysis (PCA) showed that the cells derived from each genotype were well separated from each other (Supplemental Figure 2C)". All genotypes should be in one PCA plot to better evaluate the three genotypes. 

      We prepared the new plot as suggested, presented as new Supplemental Fig. 2C. 

      (2) p10: "Chia et al.22 and found a significantly decreased enrichment in GATA6-/- cells relative to WT at day 2" decreased enrichment of what? Direct target genes? 

      Thank you for catching this. Yes, the text was changed to indicate a “decreased enrichment in GATA6-/- cells relative to WT at day 2 for putative direct GATA6 target genes.” 

      Reviewer #2 (Recommendations for the authors): 

      Overall, this is an interesting study that addresses the early developmental roles of GATA6 on cardiac differentiation. While the identification of Wnt and BMP pathway genes to be involved in GATA6 regulation is not entirely unexpected, the authors do bring forth some useful knowledge that helps to further elucidate the mechanism of pre-cardiac mesoderm regulation. Some suggestions for improvement are included below - 

      Major points: 

      (1) Since the loss of Gata6 in this study is global (either heterozygous or homozygous), it is likely that the very early requirement of Gata6 (e.g. at the mesodermal stage of differentiation) is responsible for the cardiac transcriptional phenotype observed, rather than a specific role of Gata6 in the cardiac lineage, which would need to be addressed using conditional knockout of Gata6 in an hPSC model. The authors should be more explicit when discussing the results as disruption of mesodermal differentiation leading to loss of downstream cardiac lineage cells. For example, I would change the title "GATA6 loss-of-function impairs CM differentiation" to "GATA6 loss-of-function impairs mesodermal (or mesodermal lineage) differentiation" and show the changes in cardiac progenitor cell genes (Isl1, Tbx1, Hand1, and BAF60c/Smarcd3) in addition to cardiomyocyte genes but no change in mesodermal (e.g. Brachyury, T, Eomes, Mesp1/2, etc) genes.

      We agree with the reviewer’s interpretation. The title for the section was changed as suggested. In Fig. 1, we show changes in cardiac progenitor cell genes (Isl1, Hand1, and BAF60c/Smarcd3), while in Fig. 2 we see no changes in mesodermal genes (e.g. Brachyury, Eomes, Mesp1/2). We note that the defect may be specific to cardiac (or anterior lateral) mesoderm, as the ability to express paraxial mesoderm markers was not impaired.

      (2) The use of NKX2.5, TBX5, TBX20, and GATA4 as markers for CPC is not ideal. These markers are also expressed in differentiated cardiomyocytes. ISL1 or TBX1 for second heart field progenitors and HAND1 or BAF60c/Smarcd3 for first heart field progenitors would be ideal.

      As suggested, we included an additional day 6 qPCR panel (new Fig. 1E) to evaluate the heart field progenitor markers.

      (3) Much of the findings described in this study have been known in the field including the requirement of Wnt and BMP to induce mesodermal and subsequently cardiomyocyte differentiation. The key new information here is that Gata6 knockout disrupts Wnt and BMP signaling. It would help to further validate experimentally some of the Wnt and BMP genes as either direct or indirect targets of Gata6 using reporter assays. 

      While reporter assays are feasible and do provide relevant outputs, we feel that the use of any one or even several response elements in a reporter assay adds relatively little value compared to comprehensive analysis of bona fide network components. To address the reviewer's concern, we have included profiling heat maps for WNT and BMP pathway components to more rigorously and specifically evaluate the disruption in the signaling networks caused by loss of GATA6. Proving direct targets of endogenous genes is challenging, but we mapped many binding peaks for GATA6 to putative enhancers of WNT/BMP pathway genes (based on histone marks). We provide a list of these genes (new Fig. 4F) and distinguish these from WNT/BMP pathway genes that were not bound by GATA6 yet are down-regulated in the GATA6 mutant cells and are likely to be indirect targets (p. 12).

      Minor points: 

      (1) Figures 1 and 2 - in the figure legend the labels w2, w4, m2, m5, m11, and m14 should be explained as the name of the clones of targeted hESC.  

      The legends were edited to provide this information.  

      (2) Supplemental Figure 3A - the resolution of the FACS plot is suboptimal. 

      We apologize and have corrected the plot resolution in the revised manuscript.  

      (3) Supplemental Table 1 - it's intriguing that amongst all the SWI/SNF factors, the one that is known to be cardiac-specific (SMARCD3) did not come up in the GATA6-RIME-enriched proteins. Is this a reflection of the early stage in which GATA6 plays a role in development (e.g. mesendoderm development but not precardiac mesoderm development when SMARCD3 is expressed)? 

      We agree and have noted this feature in the revised manuscript (p. 17). We note that SMARCD3 is expressed in the RNA-seq data as early as day 2. Although speculative, it may be that GATA6 primarily interacts with SWI/SNF complexes prior to the role for SMARCD3 in cardiac specification.

      Reviewer #3 (Recommendations for the authors): 

      (1) Figures 3G and 3H, as well as others, have resolution issues. The gene names are unreadable, and higher-resolution images should be provided.

      We apologize for the resolution issues and these have been fixed in the revised version. 

      (2) In their early manipulation of the WNT and BMP pathways (Figure 6A), it is unclear whether the activin/BMP protocol shown in Figure 1A was used. If this is the case, the authors should compare their results to a wild-type + DOX EV condition for consistency. 

      We clarified in the revision (Fig. 6A) that all the experiments in Fig. 6 use the cytokine protocol. In the revised figure, we included the wild-type + DOX EV condition as suggested. 

      (3) In Figures 6C and 6D, the authors should include an analysis of a wild-type isogenic line under their new CHIR/LB condition for comparison. 

      As suggested, we included the WT isogenic line in the comparison. For Fig. 6C these are shown on a separate graph because the Y-axis values are very different. Note that the CHIR/LB treatments that improve mutant cell differentiation impact the WT cells in the opposite manner.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      This is an interesting manuscript that extends prior work from this group identifying that a chemovar of Cannabis induces apoptosis of T-ALL cells by preventing NOTCH1 cleavage. Here the authors isolate specific components of the chemovar responsible for this effect to CBD and CBDV. They identify the mechanism of action of these agents as occurring via the integrated stress response. Overall the work is well performed but there are two lingering questions that would be helpful to address as follows:

      • Exactly how CBD and CBDV result in the upregulation of the TRPV1/integrated stress response is unclear. What is the most proximal target of these agents that results in these changes?

      The interaction of CBD and CBDV with TRPV1 has been thoroughly investigated by previous studies in the field. A few prominent examples are:

      (1) De Petrocellis, Luciano, Alessia Ligresti, Aniello Schiano Moriello, Marco Allarà, Tiziana Bisogno, Stefania Petrosino, Colin G. Stott, and Vincenzo Di Marzo. "Effects of cannabinoids and cannabinoid‐enriched Cannabis extracts on TRP channels and endocannabinoid metabolic enzymes." British journal of pharmacology 163, no. 7 (2011): 1479-1494.

      (2) Muller, Chanté, Paula Morales, and Patricia H. Reggio. "Cannabinoid ligands targeting TRP channels." Frontiers in molecular neuroscience 11 (2019): 487.

      (3) Iannotti, Fabio Arturo, Charlotte L. Hill, Antonio Leo, Ahlam Alhusaini, Camille Soubrane, Enrico Mazzarella, Emilio Russo, Benjamin J. Whalley, Vincenzo Di Marzo, and Gary J. Stephens. "Nonpsychotropic plant cannabinoids, cannabidivarin (CBDV) and cannabidiol (CBD), activate and desensitize transient receptor potential vanilloid 1 (TRPV1) channels in vitro: potential for the treatment of neuronal hyperexcitability." ACS chemical neuroscience 5, no. 11 (2014): 1131-1141.

      (4) Costa, Barbara, Gabriella Giagnoni, Chiara Franke, Anna Elisa Trovato, and Mariapia Colleoni. "Vanilloid TRPV1 receptor mediates the antihyperalgesic effect of the nonpsychoactive cannabinoid, cannabidiol, in a rat model of acute inflammation." British journal of pharmacology 143, no. 2 (2004): 247-250.

      (5) de Almeida, Douglas L., and Lakshmi A. Devi. "Diversity of molecular targets and signaling pathways for CBD." Pharmacology research & perspectives 8, no. 6 (2020): e00682.

      (6) Anand, Uma, Ben Jones, Yuri Korchev, Stephen R. Bloom, Barbara Pacchetti, Praveen Anand, and Mikael Hans Sodergren. "CBD effects on TRPV1 signaling pathways in cultured DRG neurons." Journal of Pain Research (2020): 2269-2278.

      Similarly, other studies have demonstrated the link between TRPV1 and the integrated stress response pathway (see below). These studies suggested that increased reactive oxygen species (ROS) production, the cyclooxygenase-2 (COX-2) enzyme, and other stressors lead to modulation of intracellular calcium levels by TRPV1.

      (1) Ho, Karen W., Nicholas J. Ward, and David J. Calkins. "TRPV1: a stress response protein in the central nervous system." American journal of neurodegenerative disease 1, no. 1 (2012): 1.

      (2) de la Harpe, Amy, Natasha Beukes, and Carminita L. Frost. "CBD activation of TRPV1 induces oxidative signaling and subsequent ER stress in breast cancer cell lines." Biotechnology and Applied Biochemistry 69, no. 2 (2022): 420-430.

      (3) Soliman, Eman, and Rukiyah Van Dross. "Anandamide‐induced endoplasmic reticulum stress and apoptosis are mediated by oxidative stress in nonmelanoma skin cancer: Receptor‐independent endocannabinoid signaling." Molecular Carcinogenesis 55, no. 11 (2016): 1807-1821.

      • Related to the above, all experiments to confirm the mechanism of action of CBD/CBDV rely on chemical agents, whose precise targets are not fully clear in some cases. Thus, some use of genetic means (such as by knockout of TRPV1, ATF4) to confirm the dependency of these pathways on drug response and NOTCH cleavage would be very helpful.

      Knockdown experiments and pharmacological inhibition are two different approaches to studying the function of a specific gene or protein. Each method has its advantages and limitations. We initially attempted to knock down CHAC1, but managed to silence only ~50% of its expression (incomplete knockdown). Following treatment of MOLT-4 cells with the whole extract, we observed only a partial downregulation in the mRNA expression of the Notch intracellular domain (NICD) (left panel), which could account for the incomplete rescue from the extract-induced death (right panel). We therefore turned to confirming the signaling pathway by inhibiting different targets with chemical agents.

      Author response image 1.

      Partial knockdown of CHAC1 hinders extract-induced cell death. (A) MOLT-4 cells were treated for 48 hrs with either an empty vector or shRNA for Chac1 (369 and 739 denote shRNAs targeting two different regions of Chac1). The gene expression of CHAC1 was then assessed via qRT-PCR (N=3). (B) MOLT-4 cells were treated as in A, then vehicle control or whole extract (3 µg/mL) was added for an additional 24 hrs, and the viability of the cells was assessed with XTT.

      Reviewer #2 (Public Review):

      Summary:

      The Meiri group previously showed that Notch1-activated human T-ALL cell lines are sensitive to a cannabis extract in vitro and in vivo (Ref. 32). In that article, the authors showed that Extract #12 reduced NICD expression and viability, which was partially rescued by restoring NICD expression. Here, the authors have identified three compounds of Extract #12 (CBD, 331-18A, and CBDV) that are responsible for the majority of anti-leukemic activity and NICD reduction. Using a pharmacological approach, the authors determined that Extract #12 exerted its anti-leukemic and NICD-reducing effects through the CB2 and TRPV1 receptors. To determine the mechanism, the authors performed RNA-seq and observed that Extract #12 induces ER calcium depletion and stress-associated signals -- ATF4, CHOP, and CHAC1. Since CHAC1 was previously shown to be a Notch inhibitor in neural cells, the authors assume that the cannabis compounds repress Notch S1 cleavage through CHAC1 induction. The induction of stress-associated signals, Notch repression, and anti-leukemic effects were reversed by the integrated stress response (ISR) inhibitor ISRIB. Interestingly, combining the 3 cannabinoids gave synergistic anti-leukemic effects in vitro and had growth-inhibitory effects in vivo.

      Strengths:

      (1) The authors show novel mechanistic insights that cannabinoids induce ER calcium release and that the subsequent integrated stress response represses activated NOTCH1 expression and kills T-ALL cells.

      (2) This report adds to the evidence that phytocannabinoids can show a so-called "entourage effect" in which minor cannabinoids enhance the effect of the major cannabinoid CBD.

      (3) This report dissects the main cannabinoids in the previously described Extract #12 that contribute to T-ALL killing.

      (4) The manuscript is clear and generally well-written.

      (5) The data are generally high quality and with adequate statistical analyses.

      (6) The data generally support the authors' conclusions. The exception is the experiments related to Notch.

      (7) The authors' discovery of the role of the integrated stress response might explain previous observations that SERCA inhibitors block Notch S1 cleavage and activation in T-ALL (Roti Cancer Cell 2013). The previous explanation by Roti et al was that calcium depletion causes Notch misfolding, which leads to impaired trafficking and cleavage. Perhaps this explanation is not entirely sufficient.

      Weaknesses:

      (1) Given the authors' previous Cancer Communications paper on the anti-leukemic effects and mechanism of Extract #12, the significance of the current manuscript is reduced.

      Our original manuscript consisted of extensive multidisciplinary research, and we were asked to divide the work into one paper that focuses on the cannabis plant and another that addresses the identification of the specific molecules and their underlying mechanism.

      We understand that publishing the initial observations with the whole extract dampened the overall novelty presented here, but the previous publication lacked the detailed mechanistic work presented here, which explains how the cannabis extract exerted its antitumoral effects.

      In addition, the finding that three phytocannabinoids are required, together with the synergy analysis, provides essential support for the ‘entourage effect’. This is a phenomenon in which the presence of minor proportions of cannabinoids and other plant components significantly modulates the effects of the main active components of cannabis and thereby produces more potent or more selective effects than the use of one major cannabinoid alone. It has been well demonstrated for endocannabinoids but has been demonstrated in only very few studies for phytocannabinoids.

      (2) It would be important to connect the authors' findings and a wealth of literature on the role of ER calcium/stress on Notch cleavage, folding, trafficking, and activation.

      We mentioned three previous papers (ref. 34-36) that guided us in our investigation. Following this reviewer’s comment, we added to the discussion a few lines connecting our findings to previous works on ER stress and Notch activation with the appropriate references.

      (3) There is an overreliance on the data on a single cell line -- MOLT4. MOLT4 is a good initial choice as it is Notch-mutated, Notch-dependent, and representative of the most common T-ALL subtype -- TAL1. However, there is no confirmatory data in other TAL1-positive T-ALLs or interrogation of other T-ALL subtypes.

      As mentioned by the reviewer, this study followed a previous publication in which 7 different cell lines were assessed (MOLT-4, CCRF-CEM, Jurkat, Loucy, HPB-ALL, DND-41, and T-ALL1). MOLT-4 cells were used to investigate the mechanism, and both MOLT-4 and CCRF-CEM cells were used to investigate the effect of the cannabinoid combination or the whole extract in vivo.

      (4) Fig. 6H. The effects of the cannabinoid combination might be statistically significant but seem biologically weak.

      Survival rates are presented in Fig. 6H for the combination of the cannabinoids and in Supplementary Fig. S6C for the whole extract. While this mouse model provides valuable insights, the biological significance and the translation of findings to human patients require cautious interpretation.

      (5) Fig. 3. Based on these data, the authors conclude that the cannabinoid combination induces CHAC1, which represses Notch S1 cleavage in T-ALL cells. The concern is that Notch signaling is highly context-dependent. CHAC1 might inhibit Notch in neural cells (Refs. 34-35), but it might not do this in a different context like T-ALL. It would be important to show evidence that CHAC1 represses S1 cleavage in the T-ALL context. More importantly, Fig. 3H clearly shows the cannabinoid combination inducing ATF4 and CHOP protein expression, but the effects on CHAC1 protein do not seem to be satisfactory as a mechanism for Notch inhibition. Perhaps something else is blocking Notch expression?

      We understand the reviewer’s concern. Previous work has shown upregulation of CHAC1 in the context of Notch signaling in leukemia, and more recently in T-ALL specifically:

      (1) Meng, X., Matlawska-Wasowska, K., Girodon, F., Mazel, T., Willman, C.L., Atlas, S., Chen, I.M., Harvey, R.C., Hunger, S.P., Ness, S.A. and Winter, S.S., 2011. GSI-I (Z-LLNle-CHO) inhibits γ-secretase and the proteosome to trigger cell death in precursor-B acute lymphoblastic leukemia. Leukemia, 25(7), pp. 1135-1146.

      (2) Chang, Yoon Soo, Joell J. Gills, Shigeru Kawabata, Masahiro Onozawa, Giusy Della Gatta, Adolfo A. Ferrando, Peter D. Aplan, and Phillip A. Dennis. "Inhibition of the NOTCH and mTOR pathways by nelfinavir as a novel treatment for T cell acute lymphoblastic leukemia." International Journal of Oncology 63, no. 5 (2023): 1-12.

      As for the second part of the reviewer’s comment, we tested both the mRNA transcript and protein expression of CHAC1. The increase is clearly shown at 60 min for the mRNA (Fig. 3D and Fig. 4F) and for the protein (Supplementary Fig. S4G-I).

      To show direct involvement of CHAC1, we used knockdown. Although the knockdown was incomplete, yielding a reduction of only ~50%, it clearly shows the involvement of CHAC1 in the mechanism leading to the reduction in viability of these cancer cells.

      Author response image 2.

      Partial knockdown of CHAC1 hinders extract-induced cell death. (A) MOLT-4 cells were treated for 48 hrs with either an empty vector or shRNA for Chac1 (369 and 739 denote shRNAs targeting two different regions of Chac1). The gene expression of CHAC1 was then assessed via qRT-PCR (N=3). (B) MOLT-4 cells were treated as in A, then vehicle control or whole extract (3 µg/mL) was added for an additional 24 hrs, and the viability of the cells was assessed with XTT.

      (6) Fig. 4B-C/S5D-E. These Western blots of NICD expression are consistent with the cannabinoid combination blocking Furin-mediated NOTCH1 cleavage, which is reversed by ISR inhibition. However, there are many mechanisms that regulate NICD expression. To support their conclusion that the effects are specifically Furin-mediated, the authors should probe full-length (uncleaved) NOTCH1 in their Western blots.

      We probed for full-length Notch1 in our previously published paper (Cancer Communications, Supplementary Fig. S1G-I). Because we show here that the three cannabinoids together mimic the effect of the whole extract, we did not repeat the experiments with full-length Notch1.

      (7) Fig. S4A-B. While these pharmacologic data are suggestive that Extract #12 reduces NICD expression through the CB2 receptor and TRPV1 channel, the doses used are very high (50 µM). To exclude off-target effects, these data should be paired with genetic data to support the authors' conclusions.

      We performed a dose-response experiment before choosing the doses used for the inhibitors of CB2 and TRPV1 (see below). The dose of 50 µM was selected as it did not affect the viability of the cells.

      Author response image 3.

      Dose-response of CB2 and TRPV1 inhibitors in MOLT-4 cells. MOLT-4 cells were treated with increasing concentrations (µM) of (A) CB2 inhibitor AM630 or (B) TRPV1 inhibitor AMG9810; and 24 hrs later the viability of the cells was assessed with XTT.

      Reviewer #2 (Recommendations For The Authors):

      (1) In Fig. 6H, it is unclear why the authors are using CCRF-CEM cells, which are known to be resistant to Notch inhibitors, rather than popular cell lines that are Notch-dependent (e.g. CUTLL1, DND-41, HPB-ALL). Since cannabinoids seem to kill at least in part through Notch inhibition, the effects would be predicted to be greater in Notch-dependent T-ALL cell lines than Notch-independent cell lines like CCRF-CEM. To show stronger in vivo preclinical efficacy, another suggestion is to combine cannabinoids with tolerable dosing of gamma-secretase inhibitors as published by the Michelle Kelliher group.

      We have shown in our previous publication that both MOLT-4 and CCRF-CEM cells are dependent on Notch for their propagation, while other T-ALL cell lines such as Loucy and Jurkat are not. Therefore, we treat CCRF-CEM as Notch-dependent. We discuss the possibility of using the cannabinoid combination with other treatments, specifically chemotherapy, to enhance effectiveness.

      (2) To increase significance, this reviewer suggests strengthening the mechanism. However, this reviewer understands the challenge of identifying the correct mechanism. Thus, an alternative would be to increase clinical relevance. Some specific suggestions are described below.

      (a) With regard to increasing mechanistic insights, the authors should be aware of some papers that might be helpful. Roti et al (Cancer Cell 2013) showed that SERCA inhibitors like thapsigargin reduce ER calcium levels and block Notch signaling by inhibiting NOTCH1 trafficking and inhibiting Furin-mediated (S1) cleavage of Notch1. Multiple EGF repeats and all three Lin12/Notch repeats in the extracellular domains of Notch receptors require calcium for proper folding (Aster Biochemistry 1999; Gordon Nat. Struct. Mol. Biol. 2007; Hambleton Structure 2004; Rand Protein Sci 1997). Thus, Roti et al concluded that ER calcium depletion blocks NOTCH1 S1 cleavage. This effect seems to be conserved in Drosophila as Periz and Fortini (EMBO J, 1999) showed impaired Notch cleavage in Ca2+/ATPase-mutated Drosophila cells. Besser et al should consider these papers when exploring the mechanism by which the ER calcium release by the cannabinoid combination blocks activated NOTCH1 expression. Similarities and differences should be discussed.

      As mentioned above and stated also by the reviewer, many papers have shown the cleavage and/or activation of Notch following ER stress.

      (b) With regard to increasing clinical relevance, the authors should consider testing the effects of the cannabinoid combination on primary samples, PDX models, and/or genetically engineered mouse models. Pan-Notch inhibitors like gamma-secretase inhibitors (GSIs) have been disappointing in clinical trials because of excessive on-target toxicity, in particular in the intestine. The authors should consider exploring whether the cannabinoids might be superior to GSIs with regard to intestinal toxicity and why that might be (e.g. receptor expression).

      We thank the reviewer and agree that clinical relevance is of utmost importance. As obtaining primary tumor cells from patients is challenging, we assessed the whole cannabis extract in a PDX model. This extract is already being used by patients. We added this result as Supplementary Fig. S7, and address it in the main text of the Results and in the Materials and Methods section.

      (3) Since the authors have performed gene expression profiling, another test to confirm that Extract #12 acts through the Notch pathway is to perform enrichment analysis for known Notch target genes in T-ALL (e.g. Wang PNAS 2013).

      We performed the analysis and this is how we pinpointed the involvement of ATF4, CHOP and CHAC1 of the integrated stress response pathway.
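
      As an illustration of the kind of test the reviewer refers to, the sketch below computes a hypergeometric over-representation p-value for a set of known target genes among the genes that changed. This is one common, simple form of enrichment analysis and is not necessarily the method used in the study; the function name and all numbers are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(n_background, n_targets, n_changed, n_overlap):
    """P(overlap >= n_overlap) when n_changed genes are drawn at random
    from n_background genes, of which n_targets are known pathway targets."""
    upper = min(n_targets, n_changed)
    total = comb(n_background, n_changed)
    tail = sum(
        comb(n_targets, k) * comb(n_background - n_targets, n_changed - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Hypothetical numbers for illustration only (not from the study):
# 20 measured genes, 5 known targets, 5 changed genes, 3 of them targets.
p_value = hypergeom_enrichment_p(
    n_background=20, n_targets=5, n_changed=5, n_overlap=3
)
```

      A small p-value would indicate that the known targets are over-represented among the changed genes relative to random sampling from the background.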

      Minor concern:

      Supplemental Table S4. According to the text (page 10, line 160) and table title, these data are RNA-seq. However, according to the GSE154287 annotation, these data are Affymetrix arrays. There are no gene names in the GSE table. Are the IDs probesets rather than genes?

      Indeed, the gene analysis data are Affymetrix arrays and the title was corrected.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1:

      Summary:

      The manuscript by Bohra et al. describes the indirect effects of ligand-dependent gene activation on neighboring non-target genes. The authors utilized single-molecule RNA-FISH (targeting both mature and intronic regions), 4C-seq, and enhancer deletions to demonstrate that the non-enhancer-targeted gene TFF3, located in the same TAD as the target gene TFF1, alters its expression when TFF1 expression declines at the end of the estrogen signaling peak. Since the enhancer does not loop with TFF3, the authors conclude that mechanisms other than estrogen receptor or enhancer-driven induction are responsible for TFF3 expression. Moreover, ERα intensity correlations show that both high and low levels of ERα are unfavorable for TFF1 expression. The ERα level correlations are further supported by overexpression of GFP-ERα. The authors conclude that the transcriptional machinery used by TFF1 for its acute activation can negatively impact TFF3 at the peak of signaling, but once the condensate dissolves, TFF3 benefits from it for its low expression.

      Strengths:

      The findings are indeed intriguing. The authors have maintained appropriate experimental controls, and their conclusions are well-supported by the data.

      Weaknesses:

      There are some major and minor concerns that relate to the approach, data presentation, and discussion, but I think they can be fixed with more effort.

      We thank the reviewer for their positive comments on the paper. We have addressed all their specific recommendations below.  

      The deletion of enhancer reveals the absolute reliance of TFF1 on its enhancers for its expression. Authors should elaborate more on this as this is an important finding.

      We thank the reviewer for the comment. We have now added a more detailed discussion on the requirement of enhancer for TFF1 expression in the revised manuscript (line 368-385).  

      In Fig. 1, TFF3 expression is shown to be induced upon E2 signaling through qRT-PCR, while smFISH does not display a similar pattern. The authors attribute this discrepancy to the overall low expression of TFF3. In my opinion, this argument could be further supported by relevant literature, if available. Additionally, does GRO-seq data reveal any changes in TFF3 expression following estrogen stimulation? The GRO-seq track shown in Fig.1 should be adjusted to TFF3 expression to appreciate its expression changes.

      We have now included a browser shot of the TFF3 region showing the GRO-seq signal over the E2 time course (Fig. S1C). We observed increased transcription towards the 3’ end of the TFF3 gene body at 3h, which corroborates the smFISH data. The relative changes in TFF3 expression measured by qRT-PCR and by smFISH for intronic transcripts are somewhat different. We speculate that the bias of measurements that depend on PCR amplification could be larger for genes expressed at low levels, and that smFISH using intronic probes may be a more sensitive assay for detecting such changes.

      Since the mutually exclusive relationship between TFF1 and TFF3 is based on snap shots in fixed cells, can authors comment on whether the same cell that expresses TFF1 at 1h, expresses TFF3 at 3h? Perhaps, the calculations taking total number of cells that express these genes at 1 and 3h would be useful.

      As pointed out by the reviewer, since these are fixed cells, we cannot comment on the fate of the same cell at two time points. To address this limitation, future work could employ cells with endogenous tags for TFF1 and TFF3 and utilize live-cell imaging techniques. In a fixed-cell assay, as the reviewer suggests, it can be investigated whether the fraction of cells showing high TFF3 expression at 3h is similar to the fraction showing high TFF1 expression at 1h. To quantify the fractions as suggested by the reviewer, we plotted the fraction of cells showing high TFF1 and TFF3 expression at 1h and 3h. We identify truly high-expressing cells by taking the mean plus one standard deviation (for single-cell level data) at E2-1hr as the threshold for TFF1 (80 and above transcript counts) and the mean plus one standard deviation (for single-cell level data) at E2-3hr as the threshold for TFF3 (36 and above transcript counts). The fraction with high TFF1 expression at 1h (12.06 ± 2.1) is indeed comparable to that with high TFF3 expression at 3h (12.50 ± 2.0) (Fig. 2C and Author response image 1). We should note that if the transcript counts were normally distributed, a predetermined fraction would be expected to be above these thresholds, and comparable fractions could arise just from the underlying statistics. But in our experiments, this is unlikely to be the case given the many outliers that affect both the mean and the standard deviation, and the lack of normality and high dispersion in the single-cell distributions. Of course, despite the fractions being comparable, we cannot be certain whether it is the same set of cells that goes from high expression of TFF1 to high expression of TFF3, but that is certainly a possibility. We thank the reviewer for pointing out this comparison.

      Author response image 1.

      The graph represents the percent of cells that show high expression of TFF1 and TFF3 at 1h and 3h post E2 signaling. The threshold was calculated by pooling absolute RNA counts from 650 analyzed cells (as in Fig. 2C). The mean and standard deviation over the single-cell data were calculated, and the mean plus one standard deviation was used as the threshold for identifying high-expressing cells. For TFF1, which is maximally expressed at 1h, the threshold used was 80. For TFF3, which is maximally expressed at 3h, the threshold used was 36. The fractions of cells expressing at or above 80 and 36 for TFF1 and TFF3, respectively, were calculated from three different repeats. The mean of means and the standard deviation from the three experiments are plotted here.
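
      The thresholding described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' analysis code: the function names and toy counts are hypothetical, and the sample standard deviation is assumed because the response does not say which estimator was used.

```python
from statistics import NormalDist, mean, stdev


def high_expression_threshold(pooled_counts):
    """Mean plus one sample standard deviation of pooled single-cell counts."""
    return mean(pooled_counts) + stdev(pooled_counts)


def percent_at_or_above(counts, threshold):
    """Percent of cells whose transcript count is at or above the threshold."""
    return 100.0 * sum(c >= threshold for c in counts) / len(counts)


def summarize_repeats(repeats, threshold):
    """Mean and standard deviation of the per-repeat percentages."""
    per_repeat = [percent_at_or_above(r, threshold) for r in repeats]
    return mean(per_repeat), stdev(per_repeat)


# Hypothetical toy counts for three repeats (not data from the study).
repeats = [[1, 2, 3, 4], [1, 1, 3, 3], [3, 3, 3, 0]]
pooled = [c for r in repeats for c in r]
threshold = high_expression_threshold(pooled)
mean_percent, sd_percent = summarize_repeats(repeats, threshold)

# Reference point: percent of a normal distribution above mean + 1 SD.
normal_expectation = 100.0 * (1.0 - NormalDist().cdf(1.0))  # about 15.87
```

      For normally distributed counts, about 15.9% of cells would lie above the mean plus one standard deviation regardless of the gene, which is the purely statistical baseline that the response cautions about.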

      Authors conclude that TFF3 is not directly regulated by enhancer or estrogen receptor. Does ERa bind on TFF3 promoter? 

      The ERa ChIP-seq performed at 1h and 3h of signaling suggests that TFF3 promoter is not bound by ERa as shown in supplementary Fig. 1B and S1B. However, one peak upstream to TFF1 promoter is visible and that is lost at 3h. 

      Minor comments:

      Reviewer’s comment -The figures would benefit from resizing of panels. There is very little space between the panels.

      We have now resized the figures in the revised manuscript.

      The discussion section could include an extrapolation on the relationship between ERα concentration and transcriptional regulation. Given that ERα levels have been shown to play a critical role in breast cancer, exploring how varying concentrations of ERα affect gene expression, including the differential regulation of target and non-target genes, would provide valuable insights into the broader implications of this study.

      This is a very important point that was missing from the manuscript. We have included this in the discussion in the revised manuscript (line 426-430).

      Reviewer #2:

      Summary:

      In this manuscript by Bohra et al., the authors use the well-established estrogen response in MCF7 cells to interrogate the role of genome architecture, enhancers, and estrogen receptor concentration in transcriptional regulation. They propose there is competition between the genes TFF1 and TFF3 which is mediated by transcriptional condensates. This reviewer does not find these claims persuasive as presented. Moreover, the results are not placed in the context of current knowledge.

      Strengths:

      High level of ERalpha expression seems to diminish the transcriptional response. Thus, the results in Fig. 4 have potential insight into ER-mediated transcription. Yet this observation is not pursued in great depth, for example with mutagenesis of ERalpha. However, this phenomenon - which falls under the general description of non-monotonic dose response - is treated at great depth in the literature (i.e. PMID: 22419778). For example, the result the authors describe in Fig. 4 has been reported and in fact mathematically modeled in PMID 23134774. One possible avenue for improving this paper would be to dig into this result at the single-cell level using deletion mutants of ERalpha or by perturbing co-activators.

      We thank the reviewer for pointing us to the relevant literature on our observation, which will enhance the manuscript. We have discussed these findings in relation to ours in the discussion section (line 400-413). We thank the reviewer for the insight on non-monotonic behavior.

      Weaknesses:

      There are concerns with the sm-RNA FISH experiments. It is highly unusual to see so much intronic signal away from the site of transcription (Fig. 2) (PMID: 27932455, 30554876), which suggests to me the authors are carrying out incorrect thresholding or have a substantial amount of labelling background. The Cote paper cited in the manuscript is likewise inconsistent with their findings and is cited in a misleading manner: they see splicing within a very small region away from the site of transcription. 

      We thank the reviewer for this comment, and apologize if they feel we misrepresented the argument from Cote et al. This has now been rectified in the manuscript. However, we do not agree that the intronic signals away from the site of transcription are an artefact. First, the images presented here are just representative 2D projections of 3D Z-stacks, whereas the full 3D stack is used for spot counting with a widely used algorithm that reports spot counts that are constant over a wide range of thresholds (Raj et al., 2008). The veracity of the automated counts was initially verified by comparison to manual counts. Even for the 2D representations, the extragenic intronic signals show up at similar thresholds to the transcription sites.

      The signal is not a non-specific signal arising from background labeling, for the following reasons:
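
      The idea of a spot count that stays constant over a range of thresholds can be illustrated with a simplified sketch. It assumes candidate spot intensities have already been extracted from the filtered 3D stack; it is not the published algorithm of Raj et al. (2008), and the function names and intensity values are hypothetical.

```python
def counts_over_thresholds(peak_intensities, thresholds):
    """Number of candidate spots brighter than each threshold."""
    return [sum(p > t for p in peak_intensities) for t in thresholds]


def longest_plateau(counts, tolerance=0):
    """Inclusive (start, end) indices of the longest run of thresholds whose
    counts stay within `tolerance` of the first count in the run."""
    best = (0, 0)
    start = 0
    for i in range(1, len(counts)):
        if abs(counts[i] - counts[start]) > tolerance:
            start = i  # the run is broken, so restart it at this threshold
        if i - start > best[1] - best[0]:
            best = (start, i)
    return best


# Hypothetical intensities: five dim background peaks and four bright spots.
peaks = [3, 4, 5, 6, 7, 40, 42, 45, 50]
thresholds = [2, 4, 6, 10, 20, 30, 45, 60]

counts = counts_over_thresholds(peaks, thresholds)
plateau = longest_plateau(counts)
spot_count = counts[plateau[0]]
```

      In this toy example the count is stable across the middle thresholds, so any threshold chosen inside the plateau gives the same number of spots.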

      • To further support the time-course smFISH data and its interpretation without depending on the dispersed intronic signal, we have analyzed the number of alleles firing/site of transcription at a given time in a cell under the three conditions. We counted the sites of transcription in a given cell and calculated the percentage of cells showing 1,2,3,4 or >4 sites. We see that the percent of cells showing a single site of transcription for TFF1 is very high in uninduced cells and this decreases at 1h. At 1h, the cells showing 2, 3 and 4 sites of transcription increase which again goes down at 3h (Author response image 2A). This agrees with the interpretation made from mean intronic counts away from the site of transcription. Similarly, for TFF3, the number of cells showing 2,3 and 4 sites of transcription increase slightly at 3hr compared to uninduced and 1hr (Author response image 2B).  We can also see that several cells have no alleles firing at a given time as has been quantified in the graphs on right showing total fraction of cells with zero versus non-zero alleles firing (Author response image 2A-B). A non-specific signal would be present in all cells.

      • There is literature on post-transcriptional splicing of RNA beyond our work, which suggests that intronic signal can be found at relatively large distances from the site of transcription. Waks et al. showed that some fraction of unspliced RNA could be observed up to 6-10 microns away from the site of transcription, suggesting that there can be a delay between transcription and (alternative) splicing (Waks et al., 2011). Pan-nuclear dispersed intronic signals can arise because more than one allele can fire at a time in different nuclear locations. The spread of intronic transcripts in our images is also limited in cells in which only 1 allele is firing at E2-1 hour (Author response image 2C) or in uninduced cells (Author response image 2D). Furthermore, Cote et al. discuss that “Of note, we see that increased transcription level correlates with intron dispersal, suggesting that the percentage of splicing occurring away from the transcription site is regulated by transcription level for at least some introns. This may explain why we observe posttranscriptional splicing of all genes we measured, as all were highly expressed.” This is in line with our interpretation that intron signal dispersal can occur in case of posttranscriptional splicing (Coté et al., 2023). Additionally, other studies have suggested that transcripts in cells do not necessarily undergo co-transcriptional splicing, which leads us to conclude that intronic signal can be found farther away from the site of transcription. Coulon et al. showed that splicing can occur after transcript release from the site and suggested that no strict checkpoint exists to ensure intron removal before release, which results in splicing and release being kinetically uncoupled from each other (Coulon et al., 2014). Similarly, using live-cell imaging, it was shown that splicing is not always coupled with transcription, and that this could depend on the nature and structural features of the transcript (such as blockage of the polypyrimidine tract, which results in delayed recognition) (Vargas et al., 2011). Drexler et al. showed that, as opposed to the shorter Drosophila transcripts, in mammalian cells splicing of the terminal intron can occur post-transcriptionally (Drexler et al., 2020). Using RNA polymerase II ChIP-Seq time course data from ERα activation in MCF-7 cells, Honkela et al. showed that a large number of genes can show significant delays between the completion of transcription and mRNA production (Honkela et al., 2015). This was attributed to the rapid completion of transcription on shorter genes, which can lead to splicing-associated delays (Honkela et al., 2015). More recently, comparisons of nascent and mature RNA levels suggested a time lapse between transcription and splicing for the genes that are early responders during signaling (Zambrano et al., 2020). The presence of significant numbers of TFF1 nascent RNA in the nucleus in our data is consistent with the above observations.

      • Uniform intensities across many transcripts suggest that these are true signals arising from RNA molecules, which would not be the case for non-specific background signal (Author response image 2E).

      • Splicing occurs in the nucleus and intron containing pre-transcripts should be nuclear localized. Thus, intronic signals should remain localized to the nucleus unlike the mature mRNA which translocate to the cytoplasm after processing and thus exonic signals can be found both in the nucleus and the cytoplasm. In keeping with this, we observe no signal in the cytoplasm for the intronic probes and it remains localized within the nucleus as expected and can be seen in Author response image 2F, while exonic signals are observed in both compartments. This suggests to us that the signal is coming from true pre-transcripts. There is no reason for non-specific background labelling to remain restricted to the nucleus.

      • We observe that the mean intronic label counts for both genes, TFF1 and TFF3, increase upon E2 induction compared to the uninduced condition (Fig. 2B). Similarly, the mean intronic counts for both genes are drastically reduced in the TFF1-enhancer deleted cells (Fig. 3C, D). This change in the number of intronic signals specifically on induction and enhancer deletion suggests that the signal is not an artefact and arises from true nascent transcripts that are sensitive to stimulus or enhancer deletion.

      • We expect colocalization of intronic signal with exonic signals in the nucleus, while there can be exonic signals that do not colocalize with intronic, representing more mature mRNA. Indeed, we observe a clear colocalization between the intronic and exonic signals in the nucleus, while exonic signals can occur independent of intronic both in the nucleus and the cytoplasm. This clearly demonstrates that the intronic signals in our experiments are specific and not simply background labelling (Author response image 2G).

      These studies and the arguments above lead us to conclude that the presence of intronic transcripts in the nucleus, away from the site of transcription, is not an artefact. We hope the reviewer will agree with us. These analyses have now been included in the manuscript as Supplementary Figure 6 and have been added at lines 106-111, 201-204, 215-217 and 231-235. We thank the reviewer for raising this important point.
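
      The colocalization argument in the list above (every intronic spot should have an exonic partner, but not every exonic spot an intronic one) can be expressed as a simple nearest-neighbour check. The sketch below is illustrative only: the coordinates, the distance cutoff, and the function name are hypothetical and are not taken from the authors' pipeline.

```python
from math import dist


def colocalized_fraction(query_spots, reference_spots, max_distance):
    """Fraction of query spots with a reference spot within max_distance."""
    if not query_spots:
        return 0.0
    hits = sum(
        any(dist(q, r) <= max_distance for r in reference_spots)
        for q in query_spots
    )
    return hits / len(query_spots)


# Hypothetical (x, y, z) spot centres in microns, for illustration only.
intronic = [(1.0, 1.0, 1.0), (4.0, 4.0, 2.0)]
exonic = [(1.1, 1.0, 1.0), (4.0, 4.1, 2.0), (8.0, 8.0, 3.0), (9.0, 2.0, 1.0)]

intron_with_exon = colocalized_fraction(intronic, exonic, max_distance=0.3)
exon_with_intron = colocalized_fraction(exonic, intronic, max_distance=0.3)
```

      In this toy example all intronic spots have an exonic partner, while only half of the exonic spots have an intronic one, which is the asymmetry expected for specific labelling of pre-mRNA and mature mRNA.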

      Author response image 2.

      Dynamic induction and RNA localization of TFF1 and TFF3 transcription across cell populations using smRNA FISH. A. Bar graph depicting the percentage of cells with 1, 2, 3, 4, or greater than 4 sites of transcription for TFF1 (left) is shown. The graph shows the mean of means from different repeats of the experiment, and error bars denote SEM (n>200, N=3). Only the cells with at least one allele firing were counted and cells with no alleles were not included in this. The graph on the right shows the number of cells with zero or non-zero number of alleles firing. B. Bar graph depicting the percentage of cells with 1, 2, 3, 4, or greater than 4 sites of transcription for TFF3 (left) is shown. The graph shows the mean of means from different repeats of the experiment, and error bars denote SEM (n>200, N=3). Only the cells with at least one allele firing were counted and cells with no alleles were not included in this. The graph in the middle shows the number of cells with 2, 3, 4, or greater than 4 sites of transcription for TFF3. The graph on the right shows the number of cells with zero or non-zero number of alleles firing. C. Images from single molecule RNA FISH experiment showing transcripts for InTFF1 in cells induced for 1 hour with E2. The image shows that when a single allele of TFF1 is firing, the transcripts show a more spatially restricted localisation. The scale bar is 5 microns. D. Images from single molecule RNA FISH experiment showing transcripts for InTFF1 in uninduced cells. The image shows that when a single allele of TFF1 is firing and transcription is low, the transcripts show a more spatially restricted localisation. The scale bar is 5 microns. E. Line profile through several transcripts in the nucleus shows uniform and similar intensities, indicating that these are true signals. F. 60X Representative images from a single molecule RNA FISH experiment showing transcripts for InTFF1 and ExTFF1 (top) and InTFF3 and ExTFF3 (bottom). The image shows that there is no intronic signal in the cytoplasm, while exonic signals can be found both in the nucleus and the cytoplasm. The scale bar is 5 microns. G. 60X Representative images from single molecule RNA FISH experiment showing transcripts for InTFF1 and ExTFF1. The image shows that all intronic signals are colocalized with exonic signals, but all exonic signals are expectedly not colocalized with intronic signals, representing more mature mRNA. The scale bar is 5 microns.
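
      The per-cell tally behind panels A and B (percent of cells with 1, 2, 3, 4, or more than 4 sites of transcription among cells with at least one site, plus the fraction of cells with none) can be sketched as below. The function name and the toy input are hypothetical; this is an illustration rather than the authors' code.

```python
from collections import Counter

BINS = ["1", "2", "3", "4", ">4"]


def site_distribution(sites_per_cell):
    """Return (percent of active cells in each bin, fraction of silent cells).

    Active cells have at least one site of transcription; silent cells have
    none and are excluded from the percentages, as in the caption above.
    """
    if not sites_per_cell:
        raise ValueError("at least one cell is required")
    active = [n for n in sites_per_cell if n > 0]
    tally = Counter(str(n) if n <= 4 else ">4" for n in active)
    percents = {
        b: (100.0 * tally[b] / len(active) if active else 0.0) for b in BINS
    }
    silent_fraction = (len(sites_per_cell) - len(active)) / len(sites_per_cell)
    return percents, silent_fraction


# Hypothetical sites-of-transcription counts for ten cells.
percents, silent_fraction = site_distribution([0, 0, 1, 1, 1, 2, 2, 3, 4, 6])
```

      Applying the same function to each repeat and averaging the resulting percentages would give the mean of means described in the caption.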

      One substantial way to improve the manuscript is to take a careful look at previous single cell analysis of the estrogen response, which in some cases has been done on the exact same genes (PMID: 29476006, 35081348, 30554876, 31930333). In some of these cases, the authors reach different conclusions than those presented in the present manuscript. Likewise, there have been more than a few studies that have characterized these enhancers (the first one I know of is: PMID 18728018). Also, Oh et al. 2021 (cited in the manuscript) did show an interaction between TFF1e and TFF3, which seems to contradict the conclusion from Fig. 3. In summary, the results of this paper are not in dialogue with the field, which is a major shortcoming. 

      We thank the reviewer for pointing out these important studies. The studies from Prof. Larson’s group are particularly insightful (Rodriguez et al., 2019). We have now included this in the discussion (line 106-111 and line 420-424), where we describe the differences and similarities between our findings and those of Larson’s group and Mancini’s group (Patange et al., 2022; Stossi et al., 2020).

      The 4C-Seq data from Oh et al. 2021 are consistent with our observations in Fig. 3: they also observed little to no interaction between TFF1e and TFF3p in WT cells, and TFF1e became engaged with TFF3p only upon TFF1p deletion. In agreement with this, we also observe little to no interaction between TFF1e and TFF3p in WT cells (Fig. 3A). This is also consistent with our model of competition for resources between these two genes. Oh et al. show an interaction between TFF1e and TFF3 when the TFF1 promoter is deleted, showing that when the primary promoter is not available the enhancer is retargeted to the next available gene (Oh et al., 2021). They do not show that TFF1e and TFF3 interact in WT cells or at any time point of E2 signalling.

      In the opinion of this reviewer, there are few - if any - experiments to interrogate the existence of LLPS for diffraction-limited spots such as those associated with transcription. This difficulty is a general problem with the field and not specific to the present manuscript. For example, transient binding will also appear as a dynamic 'spot' in the nucleus, independently of any higher-order interactions. As for Fig. 5, I don't think treating cells with 1,6 hexanediol is any longer considered a credible experiment. For example, there are profound effects on chromatin independent of changes in LLPS (PMID: 33536240).  

      We are cognizant of and appreciate the limitations pointed out by the reviewer. We and others have previously shown, using an ImmunoFISH assay, that ERα forms condensates on the TFF1 chromatin region (Saravanan et al., 2020). The data below show the relative mean ERα intensity on TFF1 FISH spots and random regions, clearly showing the appearance of a condensate at the TFF1 site. Further, deletion of TFF1e reduces the size of this condensate. Thus, we expect that these ERα condensates are characterized by higher-order interactions and become disrupted on treatment with 1,6-hexanediol. These condensates are sub-micron in size, as mentioned by the reviewer, but most TF condensates are of similar size. We agree with the reviewer that 1,6-hexanediol treatment is a brute-force experiment that causes several irreversible changes to the chromatin, although we have tried to use it at a low concentration for a short period of time, and it has been used in several papers (Chen et al., 2023; Gamliel et al., 2022). The opposite pattern of TFF1 vs. TFF3 expression upon 1,6-hexanediol treatment suggests that there is specificity. Further, to perturb condensates, mutants of ERα can be used (N-terminus IDR truncations); however, the transcriptional response of these mutants is also altered due to perturbed recruitment of coactivators that recognize the N-terminus of ER, which makes it difficult to distinguish ERα functions from condensate formation.

      References:

      Chen, L., Zhang, Z., Han, Q., Maity, B. K., Rodrigues, L., Zboril, E., Adhikari, R., Ko, S.-H., Li, X., Yoshida, S. R., Xue, P., Smith, E., Xu, K., Wang, Q., Huang, T. H.-M., Chong, S., & Liu, Z. (2023). Hormone-induced enhancer assembly requires an optimal level of hormone receptor multivalent interactions. Molecular Cell, 83(19), 3438-3456.e12. https://doi.org/10.1016/j.molcel.2023.08.027

      Coté, A., O’Farrell, A., Dardani, I., Dunagin, M., Coté, C., Wan, Y., Bayatpour, S., Drexler, H. L., Alexander, K. A., Chen, F., Wassie, A. T., Patel, R., Pham, K., Boyden, E. S., Berger, S., Phillips-Cremins, J., Churchman, L. S., & Raj, A. (2023). Post-transcriptional splicing can occur in a slow-moving zone around the gene. eLife, 12. https://doi.org/10.7554/eLife.91357.2

      Coulon, A., Ferguson, M. L., de Turris, V., Palangat, M., Chow, C. C., & Larson, D. R. (2014). Kinetic competition during the transcription cycle results in stochastic RNA processing. eLife, 3, e03939. https://doi.org/10.7554/eLife.03939

      Drexler, H. L., Choquet, K., & Churchman, L. S. (2020). Splicing Kinetics and Coordination Revealed by Direct Nascent RNA Sequencing through Nanopores. Molecular Cell, 77(5), 985-998.e8. https://doi.org/10.1016/j.molcel.2019.11.017

      Gamliel, A., Meluzzi, D., Oh, S., Jiang, N., Destici, E., Rosenfeld, M. G., & Nair, S. J. (2022). Long-distance association of topological boundaries through nuclear condensates. Proceedings of the National Academy of Sciences of the United States of America, 119(32), e2206216119. https://doi.org/10.1073/pnas.2206216119

      Honkela, A., Peltonen, J., Topa, H., Charapitsa, I., Matarese, F., Grote, K., Stunnenberg, H. G., Reid, G., Lawrence, N. D., & Rattray, M. (2015). Genome-wide modeling of transcription kinetics reveals patterns of RNA production delays. Proceedings of the National Academy of Sciences of the United States of America, 112(42), 13115. https://doi.org/10.1073/pnas.1420404112

      Oh, S., Shao, J., Mitra, J., Xiong, F., D’Antonio, M., Wang, R., Garcia-Bassets, I., Ma, Q., Zhu, X., Lee, J.-H., Nair, S. J., Yang, F., Ohgi, K., Frazer, K. A., Zhang, Z. D., Li, W., & Rosenfeld, M. G. (2021). Enhancer release and retargeting activates disease-susceptibility genes. Nature, 595(7869), Article 7869. https://doi.org/10.1038/s41586-021-03577-1

      Patange, S., Ball, D. A., Wan, Y., Karpova, T. S., Girvan, M., Levens, D., & Larson, D. R. (2022). MYC amplifies gene expression through global changes in transcription factor dynamics. Cell Reports, 38(4). https://doi.org/10.1016/j.celrep.2021.110292

      Raj, A., van den Bogaard, P., Rifkin, S. A., van Oudenaarden, A., & Tyagi, S. (2008). Imaging individual mRNA molecules using multiple singly labeled probes. Nature Methods, 5(10), Article 10. https://doi.org/10.1038/nmeth.1253

      Rodriguez, J., Ren, G., Day, C. R., Zhao, K., Chow, C. C., & Larson, D. R. (2019). Intrinsic Dynamics of a Human Gene Reveal the Basis of Expression Heterogeneity. Cell, 176(1–2), 213-226.e18. https://doi.org/10.1016/j.cell.2018.11.026

      Saravanan, B., Soota, D., Islam, Z., Majumdar, S., Mann, R., Meel, S., Farooq, U., Walavalkar, K., Gayen, S., Singh, A. K., Hannenhalli, S., & Notani, D. (2020). Ligand dependent gene regulation by transient ERα clustered enhancers. PLOS Genetics, 16(1), e1008516. https://doi.org/10.1371/journal.pgen.1008516

      Stossi, F., Dandekar, R. D., Mancini, M. G., Gu, G., Fuqua, S. A. W., Nardone, A., De Angelis, C., Fu, X., Schiff, R., Bedford, M. T., Xu, W., Johansson, H. E., Stephan, C. C., & Mancini, M. A. (2020). Estrogen-induced transcription at individual alleles is independent of receptor level and active conformation but can be modulated by coactivators activity. Nucleic Acids Research, 48(4), 1800. https://doi.org/10.1093/nar/gkz1172

      Vargas, D. Y., Shah, K., Batish, M., Levandoski, M., Sinha, S., Marras, S. A. E., Schedl, P., & Tyagi, S. (2011). Single-Molecule Imaging of Transcriptionally Coupled and Uncoupled Splicing. Cell, 147(5), 1054–1065. https://doi.org/10.1016/j.cell.2011.10.024

      Waks, Z., Klein, A. M., & Silver, P. A. (2011). Cell-to-cell variability of alternative RNA splicing. Molecular Systems Biology, 7(1), 506. https://doi.org/10.1038/msb.2011.32

      Zambrano, S., Loffreda, A., Carelli, E., Stefanelli, G., Colombo, F., Bertrand, E., Tacchetti, C., Agresti, A., Bianchi, M. E., Molina, N., & Mazza, D. (2020). First Responders Shape a Prompt and Sharp NF-κB-Mediated Transcriptional Response to TNF-α. iScience, 23(9), 101529. https://doi.org/10.1016/j.isci.2020.101529

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      The authors in this paper investigate the nature of the activity in the rodent EPN during a simple freely moving cue-reward association task. Given that primate literature suggests movement coding whereas other primate and rodent studies suggest mainly reward outcome coding in the EPNs, it is important to try to tease apart the two views. Through careful analysis of behavior kinematics, position, and neural activity in the EPNs, the authors reveal an interesting and complex relationship between the EPN and mouse behavior.

      Strengths:

      (1) The authors use a novel freely moving task to study EPN activity, which displays rich movement trajectories and kinematics. Given that previous studies have mostly looked at reward coding during head-fixed behavior, this study adds a valuable dataset to the literature.

      (2) The neural analysis is rich and thorough. Both single neuron level and population level (i.e. PCA) analysis are employed to reveal what EPN encodes.

      Thank you very much for this appreciation.

      Weaknesses:

      (1) One major weakness in this paper is the way the authors define the EPN neurons. Without a clear method of delineating EPN vs other surrounding regions, it is not convincing enough to call these neurons EPNs solely from looking at the electrode cannula track from Figure 2B. Indeed, EPN is a very small nucleus and previous studies like Stephenson-Jones et al (2016) have used opto-tagging of Vglut2 neurons to precisely label EPN single neurons. Wallace et al (2017) have also shown the existence of SOM and PV-positive neurons in the EPN. By not using transgenic lines and cell-type specific approaches to label these EPN neurons, the authors miss the opportunity to claim that the neurons recorded in this study do indeed come from EPN. The authors should at least consider showing an analysis of neurons slightly above or below EPN and show that these neurons display different waveforms or firing patterns.

      We thank the reviewer for this comment and for the opportunity to explain our approach and to expand on the inclusion criteria for the units studied.

      As part of another study, we recorded in the EPN with optrodes and performed photoidentification in PV-Cre animals. We found opto-identified units both in animals with correct placement (within the EPN) and in those with off-target placement (within the thalamus or medial to the EPN). Thus, despite the use of Cre animals, we relied on histology to ensure correct EPN recording. We believe that optotagging based purely on neural markers such as PV, SOM, VGLUT, or VGAT would not provide a better anatomical delineation of the EPN, since adjacent structures are rich in those same markers. The thalamic reticular nucleus, which lies just dorsal to the EPN, has been shown to express both SOM and PV (Martinez-Garcia et al., 2020).

      Likewise, the lateral hypothalamus (just medial to the EPN) also expresses vGluT2 and SOM. Stephenson-Jones et al. (2016), Extended Data Figure 1, panel g, shows vGluT2 and somatostatin labeling, with substantial expression in neurons dorsal, ventral, and medial to the EPN. Thus, we believe that viral strategies relying on single neuronal markers still depend on careful histological analysis of recording sites.

      A combination of neural markers or more complex viral strategies might be more suitable to delineate the EPN. For example, for anatomical tracing, Stephenson-Jones et al. (2016) used a rabies-virus-based approach in which a retrogradely transported virus was targeted through projection sites using two injections. Two-step viral approaches were also used by Wallace et al. (2017). We attempted a two-step viral approach, injecting an anterogradely transported Cre-expressing virus (AAV1.hSyn.Cre.WPRE.hGH) into the striatum and a second, Cre-dependent ChR2 virus into the EPN. However, our preliminary experiments showed that this double viral approach markedly decreased the performance of animals during the task (we attempted re-training 2-3 weeks after viral infections, and animals failed to turn to the side contralateral to the injections). We believe that this approach might have had a toxic effect (Zingg et al., 2017).

      To this point, a recent paper (Lazaridis et al., 2019) repeated an optogenetic experiment performed in the Stephenson-Jones et al. study, using a set of different viral approaches and concluded that increasing the activity of GPi-LHb is not aversive, as it had been previously reported. Thus, future studies attempting to increase anatomical specificity are a must, but they will require using viral approaches amenable to the behavioral paradigm.

      We looked for properties of waveform, firing rate, or firing pattern that would distinguish units recorded above or below the EPN; however, we did not find a marker that produced a clear demarcation. The figure below shows the units included in this study together with the excluded ones, illustrating that the two groups clearly overlap.

      Author response image 1.

      Finally, we completely agree with the reviewer in that there is still room for improvement. We have further expanded the Methods section to explain better our efforts to include units recorded within the EPN. Further, we have added a paragraph within the Discussion section to point out this limitation (lines 871-876).

      Methods (lines 116-131):

      “Recordings. Movable microwire bundles (16 microwires, 32 micrometers in diameter, held inside a cannula, Innovative Neurophysiology, Durham, NC) were stereotaxically implanted just above the entopeduncular nucleus (-0.8 AP, 1.7 ML, 3.9 DV). Post-surgical care included antibiotic, analgesic and anti-inflammatory pharmacological treatment. After 5 days of recovery, animals were retrained for 1-2 weeks. Unitary activity was recorded for 2-6 days at each dorsoventral electrode position and the session with the best electrophysiological (signal to noise ratio (>2), stability across time) and behavioral [performance, number of trials (>220)] quality was selected. Microwire electrodes were advanced in 50 micrometer dorsoventral steps for 500 micrometers in total. After experiment completion, animals were perfused with a 4% paraformaldehyde solution. Brains were extracted, dehydrated with a 30% sucrose solution and sectioned in a cryostat into 30-micron-thick slices. Slices were mounted and photographed using a light microscope. Microwire tracks of the 16-microwire bundle were analyzed (Fig. 2A-B) and only animals with tracks traversing the EPN were selected (6 out of 10). Finally, we located the final position of microwire tips and inferred the dorsoventral recording position of each of the recording sessions. Only units recorded within the EPN were included.”

      Discussion (lines 871-876):

      “A weakness of the current study is the lack of characterization of neuronal subtypes. An area of opportunity for future research could be to perform photo-identification of neuronal subtypes within the EPN which could contribute to the overall description of the information representation. Further, detailed anatomical viral vector strategies could aid to improve anatomical localization of recordings, reduce reliance on histological examination, and solve some current controversies (Lazaridis et al., 2019).” 

      (2) The authors fail to replicate the main finding about EPN neurons which is that they encode outcome in a negative manner. Both Stephenson-Jones et al (2016) and Hong and Hikosaka (2008) show a reward response during the outcome period where firing goes down during reward and up during neutral or aversive outcome. However, Figure 2 G top panel shows that the mean population is higher during correct trials and lower during incorrect trials. This could be interesting given that the authors might try recording from another part of EPN that has not been studied before. However, without convincing evidence that the neurons recorded are from EPN in the first place (point 1), it is hard to interpret these results and reconcile them with previous studies.

      We sincerely thank the reviewer for pointing out that we need to better explain how EPN units encode outcome. We now provide an additional panel in Figure 4, corresponding text in the Results section (lines 544-562), and a new paragraph in the Discussion related to this comment.

      We believe that we do indeed recapitulate the findings of both Stephenson-Jones et al. (2016) and Hong and Hikosaka (2008). Both studies focus on a specific subpopulation of GPi/EPN neurons that project to the lateral habenula (LHb). Stephenson-Jones et al. (2016) posit that GPi-LHb neurons (which they opto-tag as vGluT2) exhibit a decreased firing rate during rewarding outcomes. Hong and Hikosaka (2008) antidromically identified LHb-projecting neurons within the GPi and found reward-positive and reward-negative neurons, which were modulated by a rewarding outcome through an increase or a decrease in firing rate, respectively (red and green dots on the x-axis of Figure 5A in their paper).

      As the reviewer pointed out, the mean zScore may be misleading. Therefore, in our study we also decomposed population activity along a reward axis using dPCA. When marginalizing for reward in Figure 3F, we find that the weights of individual units on this axis are centered around zero, with both positive and negative values (Figure 3F, right panel). Thus, units can code a rewarding outcome as either an increase or a decrease in activity. We show example units with such modulation in Figure 3-1g and h.

      We had already segregated our analysis of spatio-temporal and kinematic coding according to the reward coding of units in Figure 4L-M. Following this comment, and to further clarify this segregation, we added panels with the mean zScore of units during outcome evaluation to Figure 4L.

      We amended the main text to better explain these findings (lines 544-562).

      “Previous reports suggest that EPN units that project to the lateral habenula encode reward as a decrease in firing rate. Thus, we wished to ask whether reward encoding units can code kinematic and spatio-temporal variables as well.

      To this end, we first segregated units upon their reward coding properties: reward positive (which increased activity with reward) and reward negative units (which decreased activity with reward). We performed auROC on the 250 ms after head entry comparing rewarded trials and incorrect trials (p<0.001, permutation test). Mean activity of reward insensitive, positive and negative units is shown in Fig. 4L. Next, we performed a dimensionality reduction on the coefficients of the model that best explained both contexts (kinematic + spatio-temporal model on pooled data) using UMAP (McInnes et al., 2018). We observe a continuum rather than discrete clusters (Fig. 4L). Note that individual units are color coded according to their responsivity to reward. We did not find a clear clustering either.”
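
      As an illustration of the classification step quoted above, the following is a minimal sketch of an auROC comparison with a label-shuffling permutation test. It is not the analysis code used in the study: the function names, the two-sided test statistic, the number of permutations, and the example spike counts are all assumptions made for the example. Only the 250 ms window, the rewarded-versus-incorrect comparison, and the p<0.001 threshold come from the text.

```python
import random


def auroc(pos, neg):
    """Area under the ROC curve: P(pos > neg) + 0.5 * P(pos == neg)."""
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def permutation_p_value(pos, neg, n_perm=2000, seed=0):
    """Two-sided p-value obtained by shuffling trial labels (assumed scheme)."""
    rng = random.Random(seed)
    observed = abs(auroc(pos, neg) - 0.5)
    pooled = list(pos) + list(neg)
    n_pos = len(pos)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        shuffled = abs(auroc(pooled[:n_pos], pooled[n_pos:]) - 0.5)
        if shuffled >= observed:
            extreme += 1
    # Add-one correction so the estimate is never exactly zero.
    return (extreme + 1) / (n_perm + 1)


def classify_unit(rewarded, incorrect, alpha=0.001, n_perm=2000, seed=0):
    """Label a unit from its per-trial activity in the post-entry window.

    `rewarded` and `incorrect` hold one value per trial (for example, the
    spike count in the 250 ms after head entry). Returns (label, auROC, p).
    """
    score = auroc(rewarded, incorrect)
    p = permutation_p_value(rewarded, incorrect, n_perm, seed)
    if p >= alpha:
        return "insensitive", score, p
    return ("positive" if score > 0.5 else "negative"), score, p


# Hypothetical spike counts for one unit (not data from the study).
rewarded_counts = [20 + i for i in range(15)]
incorrect_counts = [5 + i % 5 for i in range(15)]
label, score, p = classify_unit(rewarded_counts, incorrect_counts)
```

      With 2000 permutations the smallest attainable p-value is 1/2001, so a threshold of 0.001 can only be met when essentially no shuffle is as extreme as the observed value; a stricter threshold would need more permutations.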

      Paragraph added in the discussion (lines 749-755):

      “In this study, we found that rewarding outcomes can be represented by EPN units through either an increase or a decrease in firing rate (Fig. 3F, 3-1g-h, 4L). While Stephenson-Jones et al., 2016 found that lateral habenula (LHb)-projecting neurons within the EPN of mice primarily encoded rewarding outcomes by a decrease in firing rate, Hong and Hikosaka, 2008 observed that in primates, LHb-projecting units could encode reward through either a decrease or an increase in firing rate. Thus, our results align more closely with the latter study, which also employed an operant conditioning task.”

      (3) The authors say that: 'reward and kinematic coding are not mutually exclusive, challenging the notion of distinct pathways and movement processing'. However, it is not clear whether the data presented in this work supports this statement. First, the authors have not attempted to record from the entire EPN. Thus it is possible that the coding might be more segregated in other parts of EPN. Second, EPNs have previously been shown to display positive firing for negative outcomes and vice versa, something which the authors do not find here. It is possible that those neurons might not encode kinematic and movement variables. Thus, the authors should point out in the main text the possibility that the EPN activity recorded might be missing some parts of the whole EPN.

      We thank the reviewer for the opportunity to expand on this topic. It is certainly possible that regions of the EPN that we did not record from exhibit greater segregation of reward and kinematics. However, we considered it worthwhile to point out that, in the dataset collected in this study, reward-sensitive units encode kinematics in a similar fashion to reward-insensitive ones (Fig. 4L,M). Moreover, we asked specifically whether reward-negative units (which decrease their firing rate with rewarding outcomes, as previously reported) encode kinematic and spatio-temporal variables with a different strength than reward-insensitive ones, and we found no significant differences (Fig. 4M).

      We did indeed find units that displayed decreased firing rate upon rewarding outcomes, as has been previously reported. We have addressed this fact more thoroughly in point (2). 

      Finally, we agree with the reviewer that the dataset collected in this study is by no means exhaustive of the entire EPN and have thus included a sentence pointing this out in the Discussion section (lines 805-806):

      “Given that we did not record from the entire EPN, it is still possible that another region of the nucleus might exhibit more segregation.”

      (4) The authors use an IR beam system to record licks and make a strong claim about the nature of lick encoding in the EPN. However, the authors should note that IR beam system is not the most accurate way of detecting licks given that any object blocking the path (paw or jaw-dropping) will be detected as lick events. Capacitance based, closed-loop detection, or video capturing is better suited to detect individual licks. Given that the authors are interested in kinematics of licking, this is important. The authors should either point this out in the main text or verify in the system if the IR beam is correctly detecting licks using a combination of those methods.

      We thank the reviewer for the opportunity to clarify how lick events were acquired. We have experience with electrical lickometers; however, we believe they were not well suited to this application. Closed-loop lickometers generally require a metallic grid on which animals stand so that the loop can be closed, whereas we wanted a transparent floor. We have found capacitance-based lickometers to be useful in head-fixed conditions, but they are very dependent on the animal's position and on the proximity of other body parts such as limbs. Given the freely moving nature of the task, this was difficult to control. Finally, both electrical alternatives are more prone to noise and may introduce electrical artifacts that could contaminate the spiking signal. We therefore opted for an IR beam combined with a slit that would fit only the tongue and that forced enough protrusion for individual licks to be monitored. The slit could not fit other body parts such as the paw or jaw. We have now included a video (Supp. Video 2) showing a close-up of this behavior, which better conveys that the jaw and paw do not fit inside the slit. The following text has been added to the corresponding Methods section (lines 97-98):

      “The lickometer slit was just wide enough to fit the tongue and deep enough to evoke a clear tongue protrusion.”

      Reviewer #1 (Recommendations For The Authors):

      (1) The authors should verify using opto-tagging of either Vglut2, SOM, or PV neurons whether they can see the same firing pattern. If not, the authors should address this weakness in the paper.

      We thank the reviewer for this important point; we have provided a more detailed reply above.

      (2) The way dPCA or PCA is applied to the data is not stated at all in the main text. Are all units from different mice combined? Or applied separately for each mouse? How does that affect the interpretation of the data? At least a brief text should be included in the main text to guide the readers.

      We thank the reviewer for pointing out this important omission. We have included an explanation in the Methods section and in the Main text.

      Methods (lines 182-184):

      “For all population level analyses individual units recorded from all sessions and all animals were pooled to construct pseudo-simultaneous population response of combined data mostly recorded separately.”

      Main text (lines 397-399):

      “For population level analyses throughout the study, we pooled recorded units from all animals to construct a pseudo-simultaneous population.”

      Discussion (lines 729-730):

      “…(from pooled units from all animals to construct a pseudo-simultaneous population, which assumes homogeneity across subjects)”
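
      To make the pooling described above concrete, here is a minimal sketch of how a pseudo-simultaneous population matrix could be assembled from units recorded in separate sessions and animals. The data layout, the function names, and the choice to concatenate trial-averaged responses across conditions are assumptions for illustration, not the procedure documented in the manuscript.

```python
def trial_average(trials):
    """Average equally binned trials (a list of lists) bin by bin."""
    n_trials = len(trials)
    return [sum(bin_values) / n_trials for bin_values in zip(*trials)]


def build_pseudo_population(recordings, conditions):
    """Stack units from all animals and sessions into one matrix.

    `recordings` maps (animal, session, unit) to a dict of
    condition -> list of trials, each trial a list of binned rates.
    Each output row is one unit: its trial-averaged response for every
    condition, concatenated in the order given by `conditions`.
    Units missing any condition are skipped (an assumed policy).
    """
    labels, matrix = [], []
    for key in sorted(recordings):
        by_condition = recordings[key]
        if any(c not in by_condition or not by_condition[c] for c in conditions):
            continue
        row = []
        for c in conditions:
            row.extend(trial_average(by_condition[c]))
        labels.append(key)
        matrix.append(row)
    return labels, matrix


# Hypothetical example: three units from two animals, two time bins each.
example = {
    ("m1", "s1", "u1"): {"left": [[1, 3], [3, 5]], "right": [[0, 2], [2, 4]]},
    ("m2", "s1", "u1"): {"left": [[10, 10]], "right": [[20, 30]]},
    ("m2", "s2", "u2"): {"left": [[1, 1]]},  # no "right" trials: skipped
}
unit_labels, population = build_pseudo_population(example, ["left", "right"])
```

      Because rows come from different sessions, trial-to-trial covariation between units is lost in such a matrix; only condition-averaged structure is preserved, which is the homogeneity assumption noted in the Discussion text quoted above.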

      (3) The authors argue that they do not find 'value coding' in this study. However, the authors never manipulate reward size or probability, but only the uncertainty or difficulty of the task. This might be better termed 'difficulty', and it is difficult to say whether this correlates with value in this task. For instance, mice might be very confident about the choice, even for an intermediate frequency sweep, if the mouse had waited long enough to hear the full sweep. In that case, the difficulty would not correlate with value, given that the mouse will think the value of the port it is going to is high. Thus, authors should avoid using the term value.

      We agree with the reviewer. We have modified the text to specify that difficulty was the variable being studied and added the following sentence in the Discussion (lines 747-748):

      “It is still possible that by modifying reward contingencies such as droplet size value coding could be evidenced.”

      (4) How have the authors obtained Figure 7D bottom panel? It is unclear at all what this correlation represents. Are the authors looking at a correlation between instantaneous firing rate and lick rate during a lick bout?

      We thank the reviewer for pointing out that omission. It is indeed the correlation coefficient between the instantaneous firing rate and the instantaneous lick rate within a lick bout. We have added labels to Figure 7D and pointed this out in the main text (lines 680-681):

      “Fig.7D, lower panel shows the correlation coefficient between the instantaneous firing rate and the instantaneous lick rate within a lick bout for all units.”
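
      One way such a per-unit correlation could be computed is sketched below. The text does not specify how the instantaneous rates were estimated, so this example assumes that each inter-lick interval is one sample, with lick rate taken as the reciprocal of the interval and firing rate as the spike count in that interval divided by its duration. The function names and event times are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def bout_rates(lick_times, spike_times):
    """Per-interval lick and firing rates (Hz) within one lick bout.

    Assumes times are in seconds and `lick_times` is sorted.
    """
    lick_rate, firing_rate = [], []
    for start, stop in zip(lick_times, lick_times[1:]):
        duration = stop - start
        n_spikes = sum(1 for t in spike_times if start <= t < stop)
        lick_rate.append(1.0 / duration)
        firing_rate.append(n_spikes / duration)
    return lick_rate, firing_rate


def lick_firing_correlation(lick_times, spike_times):
    """Correlation between instantaneous lick rate and firing rate."""
    lick_rate, firing_rate = bout_rates(lick_times, spike_times)
    return pearson(lick_rate, firing_rate)


# Hypothetical bout: four licks, two spikes in every inter-lick interval.
licks = [0.0, 0.1, 0.25, 0.45]
spikes = [0.02, 0.06, 0.12, 0.2, 0.3, 0.4]
r = lick_firing_correlation(licks, spikes)
```

      In this toy bout the spike count per interval is constant, so firing rate scales exactly with lick rate and the coefficient is 1 up to floating-point error. A bout with fewer than three licks, or with constant intervals, has no defined correlation and would need to be excluded before calling the function.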

      Reviewer #2 (Public Review):

      This paper examined how the activity of neurons in the entopeduncular nucleus (EPN) of mice relates to kinematics, value, and reward. The authors recorded neural activity during an auditory-cued two-alternative choice task, allowing them to examine how neuronal firing relates to specific movements like licking or paw movements, as well as how contextual factors like task stage or proximity to a goal influence the coding of kinematic and spatiotemporal features. The data shows that the firing of individual neurons is linked to kinematic features such as lick or step cycles. However, the majority of neurons exhibited activity related to both movement types, suggesting that EPN neuronal activity does not merely reflect muscle-level representations. This contradicts what would be expected from traditional action selection or action specification models of the basal ganglia.

      The authors also show that spatiotemporal variables account for more variability compared to kinematic features alone. Using demixed Principal Component Analysis, they reveal that at the population level, the three principal components explaining the most variance were related to specific temporal or spatial features of the task, such as ramping activity as mice approached reward ports, rather than trial outcome or specific actions. Notably, this activity was present in neurons whose firing was also modulated by kinematic features, demonstrating that individual EPN neurons integrate multiple features. A weakness is that what the spatiotemporal activity reflects is not well specified. The authors suggest some may relate to action value due to greater modulation when approaching a reward port, but acknowledge action value is not well parametrized or separated from variables like reward expectation.

      We thank the reviewer for the comment. We indeed believe that further exploring these spatiotemporal signals is important and will be the subject of future studies.

      A key goal was to determine whether activity related to expected value and reward delivery arose from a distinct population of EPN neurons or was also present in neurons modulated by kinematic and spatiotemporal features. In contrast to previous studies (Hong & Hikosaka 2008 and Stephenson-Jones et al., 2016), the current data reveals that individual neurons can exhibit modulation by both reward and kinematic parameters. Two potential differences may explain this discrepancy: First, the previous studies used head-fixed recordings, where it may have been easier to isolate movement versus reward-related responses. Second, those studies observed prominent phasic responses to the delivery or omission of expected rewards - responses largely absent in the current paper. This absence suggests a possibility that neurons exhibiting such phasic "reward" responses were not sampled, which is plausible since in both primates and rodents, these neurons tend to be located in restricted topographic regions. Alternatively, in the head-fixed recordings, kinematic/spatial coding may have gone undetected due to the forced immobility.

      Thank you for raising this point. Nevertheless, there is some phasic activity associated with reward responses, which can be seen in the new panel in Figure 4L.

      Overall, this paper offers needed insight into how the basal ganglia output encodes behavior. The EPN recordings from freely moving mice clearly demonstrate that individual neurons integrate reward, kinematic, and spatiotemporal features, challenging traditional models. However, the specific relationship between spatiotemporal activity and factors like action value remains unclear.

      We sincerely thank the reviewer for their valuable comments.

      Reviewer #2 (Recommendations For The Authors):

      One small suggestion is to make sure that all the panels in the figures are well annotated. I struggled in places to know what certain alignments or groupings meant because they were not labelled. An example would be what do the lines correspond to in the lower panels of Figure 2D and E. I could figure it out from other panels but it would have helped if each panel had better labelling.

      Thank you for pointing this out. We have improved the labelling across the figures and corrected the specific example you mentioned.

      The paper is very nice though. Congratulations!

      Thank you very much.

      Editor's note:

      Should you choose to revise your manuscript, please include full statistical reporting including exact p-values wherever possible alongside the summary statistics (test statistic and df) and 95% confidence intervals. These should be reported for all key questions and not only when the p-value is less than 0.05 in the main manuscript.

      We thank the editor for the comment. A statistics table has been added.

      References:

      Lazaridis, I., Tzortzi, O., Weglage, M., Märtin, A., Xuan, Y., Parent, M., Johansson, Y., Fuzik, J., Fürth, D., Fenno, L. E., Ramakrishnan, C., Silberberg, G., Deisseroth, K., Carlén, M., & Meletis, K. (2019). A hypothalamus-habenula circuit controls aversion. Molecular Psychiatry, 24(9), 1351–1368. https://doi.org/10.1038/s41380-019-0369-5

      Martinez-Garcia, R. I., Voelcker, B., Zaltsman, J. B., Patrick, S. L., Stevens, T. R., Connors, B. W., & Cruikshank, S. J. (2020). Two dynamically distinct circuits drive inhibition in the sensory thalamus. Nature, 583(7818), 813–818. https://doi.org/10.1038/s41586-020-2512-5

      McInnes, L., Healy, J., Saul, N., & Großberger, L. (2018). UMAP: Uniform Manifold Approximation and Projection. Journal of Open Source Software, 3(29), 861. https://doi.org/10.21105/joss.00861

      Zingg, B., Chou, X.-L., Zhang, Z.-G., Mesik, L., Liang, F., Tao, H. W., & Zhang, L. I. (2017). AAV-Mediated Anterograde Transsynaptic Tagging: Mapping Corticocollicular Input-Defined Neural Pathways for Defense Behaviors. Neuron, 93(1), 33–47. https://doi.org/10.1016/j.neuron.2016.11.045

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This paper reports valuable results regarding the potential role and time course of the prefrontal cortex in conscious perception. Although the sample size is small, the results are clear and convincing, and strengths include the use of several complementary analysis methods. The behavioral test includes subject report so the results do not allow for distinguishing between theories of consciousness; nevertheless, results do advance our understanding of the contribution of prefrontal cortex to conscious perception.

      We greatly appreciate the encouraging assessment from the editor and reviewers. In particular, we thank the three reviewers for their professional and constructive comments, which helped us improve the manuscript substantially.

      Public Reviews:

      Reviewer #1 (Public Review):

      This is a clear and rigorous study of intracranial EEG signals in the prefrontal cortex during a visual awareness task. The results are convincing and worthwhile, and strengths include the use of several complementary analysis methods and clear results. The only methodological weakness is the relatively small sample size of only 6 participants compared to other studies in the field. Interpretation weaknesses that can easily be addressed are claims that their task removes the confound of report (it does not), and claims of primacy in showing early prefrontal cortical involvement in visual perception using intracranial EEG (several studies already have shown this). Also the shorter reaction times for perceived vs not perceived stimuli (confident vs not confident responses) has been described many times previously and is not a new result.

      We greatly appreciate the reviewer's encouraging opinion. We address the reviewer's specific questions and comments point by point below.

      ‘The only methodological weakness is the relatively small sample size of only 6 participants compared to other studies in the field.’

      We agree that the sample size in the present study is relatively small. To compensate for this shortcoming, we rigorously verified each result at both the individual and population levels, following the data analysis approach used in non-human primate studies.

      Interpretation weaknesses that can easily be addressed are claims that their task removes the confound of report (it does not),

      Thank you very much for your comment. We agree that our task does not remove the confound of report entirely. However, we believe that our task minimizes motor confounds by dissociating the emergence of awareness from the motor response in time and by balancing the direction of the motor response between aware and unaware conditions. We have modified the text in the revised manuscript according to the reviewer's comment as follows: “This task removes the confound of motor-related activity”.

      ..and claims of primacy in showing early prefrontal cortical involvement in visual perception using intracranial EEG (several studies already have shown this).

      We agree that several iEEG studies, using ERP and HFA measures, have shown early involvement of the prefrontal cortex in visual perception. However, these studies did not investigate the differential activity between conscious and unconscious conditions; thus, the prefrontal activity they report might be correlated with unconscious rather than conscious processing. In the present study, we compared neural activity in the PFC between conscious and unconscious trials and found a correlation between PFC activity and conscious perception. Although one iEEG study (Gaillard et al., 2009) reported awareness-specific PFC activation, that awareness-related activity started 300 ms after the onset of the visual stimuli, ~100 ms later than the early awareness-related activity in our study. In addition, the limited number of electrodes in that study (2 patients with 19 recording sites, mostly in mesiofrontal and peri-insular regions) restricted its exploration of awareness-related activity in the PFC. In the present study, the number of recording sites (245) was much larger and covered multiple areas of the PFC. Our results further show earlier awareness-related activity (~200 ms after visual stimulus onset) in ERP, HFA, and PLV, which sheds new light on the role of the PFC in conscious perception.

      We have added this discussion to the MS (lines 522-536).

      Also the shorter reaction times for perceived vs not perceived stimuli (confident vs not confident responses) has been described many times previously and is not a new result.

      Thank you very much for your comment. We agree that reaction time is strongly modulated by confidence level, as has been described previously (Broggin, Savazzi, & Marzi, 2012; Marzi, Mancini, Metitieri, & Savazzi, 2006). However, in previous studies, confidence levels were usually induced by presenting stimuli with different physical properties, such as spatial frequency, eccentricity, and contrast. It is well known that more salient stimuli are processed faster and speed up visuomotor transformation, eventually shortening reaction time (Corbetta & Shulman, 2002; Posner & Petersen, 1990). The dependence of visual processing on stimulus salience is therefore confounded with the effect of visual awareness on reaction time, which makes it hard to attribute the shorter reaction time in the more salient condition purely to visual awareness. In contrast, in the present study we created a condition (near the perceptual threshold) in which the salience (contrast) of the visual stimulus is very similar in the aware and unaware conditions, in order to eliminate the influence of stimulus salience on reaction time. We think that the difference in reaction time in our study is mainly due to modulation by awareness state, which has not been reported previously.

      We have added this discussion to the MS (lines 497-507).

      Reviewer #1 (Recommendations For The Authors):

      Specific comments follow:

      Abstract: "we designed a visual awareness task that can minimize report-related confounding" and in the Introduction lines 112-115: "Such a paradigm can effectively dissociate awareness-related activity from report-related activity in terms of time... and report behavior"; Discussion lines 481-483 "even after eliminating the influence of the confounding variables related to subjective reports such as motion preparation" and other similar statements in the manuscript should be removed. The task involves report using eye movements with every single stimulus. The fact that there is report for both perceived and not perceived stimuli, that the direction of report is not determined until the time of report, and that there is delay between stimulus and report, does not remove the report-related post-perceptual processing that will inevitably occur in a task where overt report is required for every single trial. For example, brain activity related to planning to report perception will only occur after perceived trials, regardless of the direction of eye movement later decided upon. This preparation to respond is different for perceived and not perceived stimuli, but is not part of the perception itself. In this way the current task is not at all unique and does not substantially differ from many other report-based tasks used previously.

      The objective of the present study is to assess whether PFC is involved in the emergence of visual awareness. To do so, it is crucial to determine the subjective awareness state as accurately as possible. Considering the disadvantage of no-report paradigms in determining the subjective awareness state (Tsuchiya et al., TiCS, 2015; Mashour et al., Neuron, 2020), we employed a balanced report paradigm. It has been argued (Merten & Nieder, PNAS, 2011) that, in balanced report paradigms, subjects cannot prepare any motor response during the delay period, because only the appearance of a rule cue (the color change of the fixation point at the end of the delay period) informs them of the appropriate motor action. In this case, the post-perceptual processing during the delay period might reflect non-motor cognitive activity. Alternatively, as the reviewer mentions, the post-perceptual processing might relate to planning to report perception, which differs between perceived and not perceived stimuli. Therefore, the interpretation of post-perceptual processing remains controversial to date. In response to the reviewer’s comment, we have modified the description of our task as follows: “we designed a visual awareness task that can minimize report-related motor confounding”. We have also changed “report-related” to “motor-related” in the text of the manuscript.

      Figures 3, 4 changes in posterior middle frontal gyri suggest early frontal eye field involvement in perception. This should be interpreted in the context of many previous studies showing FEF involvement in signal detection. The authors claim that "earlier visual awareness related activities in the prefrontal cortex were not found in previous iEEG studies, especially in the HG band" on lines 501-502 of the Discussion. This statement is not true and should be removed. The following statement in the Discussion on lines 563-564 should be removed for the same reasons: "our study detected 'ignition' in the human PFC for the first time." Authors should review and cite the following studies as precedent among others:

      Blanke O, Morand S, Thut G, Michel CM, Spinelli L, Landis T, Seeck M (1999) Visual activity in the human frontal eye field. Neuroreport 10 (5):925-930. doi:10.1097/00001756-199904060-00006

      Foxe JJ, Simpson GV (2002) Flow of activation from V1 to frontal cortex in humans. A framework for defining "early" visual processing. Exp Brain Res 142 (1):139-150. doi:10.1007/s00221-001-0906-7

      Gaillard R, Dehaene S, Adam C, Clemenceau S, Hasboun D, Baulac M, Cohen L, Naccache L (2009) Converging intracranial markers of conscious access. Plos Biology 7 (3):e61

      Gregoriou GG, Gotts SJ, Zhou H, Desimone R (2009) High-frequency, long-range coupling between prefrontal and visual cortex during attention. Science 324:1207-1210

      Herman WX, Smith RE, Kronemer SI, Watsky RE, Chen WC, Gober LM, Touloumes GJ, Khosla M, Raja A, Horien CL, Morse EC, Botta KL, Hirsch LJ, Alkawadri R, Gerrard JL, Spencer DD, Blumenfeld H (2019) A Switch and Wave of Neuronal Activity in the Cerebral Cortex During the First Second of Conscious Perception. Cereb Cortex 29 (2):461-474.

      Khalaf A, Kronemer SI, Christison-Lagay K, Kwon H, Li J, Wu K, Blumenfeld H (2022) Early neural activity changes associated with stimulus detection during visual conscious perception. Cereb Cortex. doi:10.1093/cercor/bhac140

      Kwon H, Kronemer SI, Christison-Lagay KL, Khalaf A, Li J, Ding JZ, Freedman NC, Blumenfeld H (2021) Early cortical signals in visual stimulus detection. Neuroimage 244:118608.

      We agree that several iEEG studies, including ERP and HFA analyses, have shown early involvement of the prefrontal cortex in visual perception. However, these studies did not investigate the differential activity between conscious and unconscious conditions; thus, the activity they observed in prefrontal cortex might be correlated with unconscious processing rather than conscious processing. In the present study, we compared the neural activity in PFC between conscious and unconscious trials and found a correlation between PFC activity and conscious perception. Although one iEEG study reported awareness-specific PFC activation, the awareness-related activity started 300 ms after the onset of visual stimuli, which was ~100 ms later than the early awareness-related activity in our study. Also, because of the limited number of electrodes in that study (2 patients with 19 recording sites, mostly in mesiofrontal and peri-insular regions), its ability to explore awareness-related activity in PFC was restricted. In the present study, the number of recording sites (245) was much larger than in the previous study, and the sites covered multiple areas in PFC. Our results further show earlier awareness-related activity (~200 ms after visual stimulus onset), including ERP, HFA and PLV, which sheds new light on the role of PFC in conscious perception.

      We have added this discussion in the MS (lines 522-533).

      Minor weakness that should be mentioned in the Discussion: The intervals for the FP (fixation period) and Delay period were both fixed at 600 ms instead of randomly jittered, so that subjects likely had anticipatory activity predictably occurring with each grating and cue stimulus.

      Thank you very much for your comment. We agree that subjects might have had anticipatory activity during the experiment. In fact, we designed the task in this way to balance the effects of attention and anticipation between the aware and unaware conditions. We have added this discussion in the MS (lines 467-469).

      The faster reaction times for perceived/confident responses vs not perceived/unconfident responses has been reported many times previously in the literature and should be acknowledged rather than being claimed as a novel finding. Authors should modify p. 163 lines 160-162, first sentence of the Discussion lines 445-446 "reaction time.. shorter" claiming this was a novel finding; same for lines 464-467. Please see the following among others:

      Broggin E, Savazzi S, Marzi CA (2012) Similar effects of visual perception and imagery on simple reaction time. Q J Exp Psychol (Hove) 65 (1):151-164. doi:10.1080/17470218.2011.594896

      Chelazzi L, Marzi CA, Panozzo G, Pasqualini N, Tassinari G, Tomazzoli L (1988) Hemiretinal differences in speed of light detection in esotropic amblyopes. Vision Res 28 (1):95-104

      Marzi CA, Mancini F, Metitieri T, Savazzi S (2006) Retinal eccentricity effects on reaction time to imagined stimuli. Neuropsychologia 44 (8):1489-1495. doi:10.1016/j.neuropsychologia.2005.11.012

      Posner MI (1994) Attention: the mechanisms of consciousness. Proceedings of the National Academy of Sciences of the United States of America 91 (16):7398-7403

      Sternberg S (1969) Memory-scanning: mental processes revealed by reaction-time experiments. Am Sci 57 (4):421-457

      Thanks. We have cited only some of these papers in the revised manuscript because the number of citations is restricted.

      Methods lines 658-659: "results under LU and HA conditions were classified as the control group and were only used to verify and check the results during calculation." However the authors show these results in the figures and they are interesting. HA stimuli show earlier responses than NA stimuli. This is a valuable result which should be discussed and interpreted in light of the other findings.

      We thank the reviewer very much for this comment. We have added a corresponding discussion in the revised MS (lines 535-536).

      General comment on figures: Many of the figure elements are tiny and the text labels and details can't be seen at all, especially single trial color plots, and the brain insets showing recording sites.

      We have modified the figures accordingly.

      Other minor comments: Typo: Figure 2 legend, line 169 "The contrast level resulted in an awareness percentage greater than 25%..." is missing a word and should say instead something like "The contrast level that resulted in an awareness percentage greater than 25%..."

      Thanks. We have corrected the typo accordingly.

      Figure 2 Table description in text line 190 says "proportions of recording sites" but the Table only shows number of recording sites and number of subjects, not "proportions." This should be corrected in the text.

      Thanks. We have corrected the error.

      Figure 3, and other figures, should always label the left and right hemispheres to avoid ambiguity.

      Thanks. We have made the correction accordingly. In the caption of Figure 2D (line 189), we modified the sentence to read ‘In all brain images, right side of the image represents the right side of the brain’.

      Methods line 666. The saccadic latency calculations paragraph should have a separate heading before it, to separate it from the Behavioral data analysis section.

      Thanks. It has been corrected in line 725.

      Reviewer #2 (Public Review):

      The authors attempt to address a long-standing controversy in the study of the neural correlates of visual awareness, namely whether neurons in prefrontal cortex are necessarily involved in conscious perception. Several leading theories of consciousness propose a necessary role for (at least some sub-regions of) PFC in basic perceptual awareness (e.g., global neuronal workspace theory, higher order theories), while several other leading theories posit that much of the previously reported PFC contributions to perceptual awareness may have been confounded by task-based cognition that co-varied between the aware and unaware reports (e.g., recurrent processing theory, integrated information theory). By employing intracranial EEG in human patients and a threshold detection task on low-contrast visual stimuli, the authors assessed the timing and location of neural populations in PFC that are differentially activated by stimuli that are consciously perceived vs. not perceived. Overall, the reported results support the view that certain regions of PFC do contribute to visual awareness, but at time-points earlier than traditionally predicted by GNWT and HOTs.

      Reply: We very much appreciate the reviewer’s encouraging opinion.

      Major strengths of this paper include the straightforward visual threshold detection task including the careful calibration of the stimuli and the separate set of healthy control subjects used for validation of the behavioral and eye tracking results, the high quality of the neural data in six epilepsy patients, the clear patterns of differential high gamma activity and temporal generalization of decoding for seen versus unseen stimuli, and the authors' interpretation of these results within the larger research literature on this topic. This study appears to have been carefully conducted, the data were analyzed appropriately, and the overall conclusions seem warranted given the main patterns of results.

      Reply: We very much appreciate the reviewer’s encouraging opinion.

      Weaknesses include the saccadic reaction time results and the potential flaws in the design of the reporting task. This is not a "no report" paradigm, rather, it's a paradigm aimed at balancing the post-perceptual cognitive and motor requirements between the seen and unseen trials. On each trial, subjects/patients either perceived the stimulus or not, and had to briefly maintain this "yes/no" judgment until a fixation cross changed color, and the color change indicated how to respond (saccade to the left or right). Differences in saccadic RTs (measured from the time of the fixation color change to moving the eyes to the left or right response square) were evident between the seen and unseen trials (faster for seen). If the authors' design achieved what they claim on page 3, "the report behaviors were matched between the two awareness states ", then shouldn't we expect no differences in saccadic RTs between the aware and unaware conditions? The fact that there were such differences may indicate differences in post-perceptual cognition during the time between the stimulus and the response cue. Alternatively, the RT difference could reflect task-strategies used by subjects/patients to remember the response mapping rules between the perception and the color cue (e.g., if the YES+GREEN=RIGHT and YES+RED=LEFT rules were held in memory, while the NO mappings were inferred secondarily rather than being actively held in memory). This saccadic RT result should be better explained in the context of the goals of this particular reporting-task.

      The objective of the present study is to assess whether PFC is involved in the emergence of visual awareness. To do so, it is crucial to determine the subjective awareness state as accurately as possible. Considering the disadvantage of no-report paradigms in determining the subjective awareness state (Tsuchiya et al., TiCS, 2015; Mashour et al., Neuron, 2020), we employed a balanced report paradigm. It has been argued (Merten & Nieder, PNAS, 2011) that, in balanced report paradigms, subjects cannot prepare any motor response during the delay period, because they are informed of the appropriate motor action only after the appearance of a rule cue (the color change of the fixation point at the end of the delay period). In this case, the post-perceptual processing during the delay period might reflect non-motor cognitive activity, such as working memory (Mashour et al., Neuron, 2020). Alternatively, as the reviewer mentions, the post-perceptual processing might relate to planning to report perception, which differs between perceived and not perceived stimuli (Aru et al., Neurosci Biobehav Rev, 2012). Therefore, the interpretation of post-perceptual processing remains controversial to date. Considering the reviewer’s comment together with other opinions, we have modified the description of our task as follows: “we designed a visual awareness task that can minimize report-related motor confounding”. We have also changed “report-related” to “motor-related” in the rest of the manuscript.

      Regarding the question of whether the saccadic RT in our balanced response paradigm should be expected to be similar between the aware and unaware conditions, we think that the RT should be similar if the delay period is long enough for the “no” decision to be completed. In fact, in a previous study (Merten & Nieder, PNAS, 2011), the neuronal encoding of the “no” decision did not appear until 2 s after the stimulus cue onset. In our task, however, the delay period lasted only 600 ms, which was long enough to form the “yes” decision but not long enough to form the “no” decision. This might be the reason that our data show a shorter RT in the aware condition than in the unaware condition.

      We fully agree with the reviewer’s alternative interpretation of the RT difference between the aware and unaware conditions in our study, i.e., that it reflects task-strategies used by subjects/patients to remember the response mapping rules between the perception and the color cue (e.g., if the YES+GREEN=RIGHT and YES+RED=LEFT rules were held in memory, while the NO mappings were inferred secondarily rather than being actively held in memory). We have added discussion of these questions in the revised manuscript (lines 492-496).

      Nevertheless, the current results do help advance our understanding of the contribution of PFC to visual awareness. These results, when situated within the larger context of the rapidly developing literature on this topic (using "no report" paradigms), e.g., the recent studies by Vishne et al. (2023) Cell Reports and the Cogitate consortium (2023) bioRxiv, provide converging evidence that some sub-regions of PFC contribute to visual awareness, but at latencies earlier than originally predicted by proponents of, especially, global neuronal workspace theory.

      We very much appreciate the reviewer’s encouraging opinion.

      Reviewer #2 (Recommendations For The Authors):

      Abstract: "the spatiotemporal overlap between the awareness-related activity and the interregional connectivity in PFC suggested that conscious access and phenomenal awareness may be closely coupled." I strongly suggest revising this sentence. The current results cannot be used to make such a broad claim about p-consciousness vs. a-consciousness. This study used a balanced trial-by-trial report paradigm, which can only measure conscious access.

      We thank the reviewer for this comment. We have removed this sentence from the revised manuscript.

      Task design: A very similar task was used previously by Schröder et al. (2021) J Neurosci. See specifically, their Figure 1, and Figure 4B-C. Using almost the exact same "matching task", the authors of this previous study show that they get a P3b for both the perceived and not-perceived conditions, confirming that post-perceptual cognition/report confounds were not eliminated, but instead were present in (and balanced between) both the perceived/not-perceived trials due to the delayed matching aspect of the design. This previous paper should be cited and the P3b result should be considered when assessing whether cognition/report confounds were addressed in the current study.

      Thank you very much for reminding us of the study by Schröder et al. We are sorry for not citing this closely related study in our previous manuscript. Schröder et al. found that, while the P3b differed significantly between perceived and not-perceived trials in the direct report task, it was present in both perceived and not-perceived trials and did not differ significantly between them in the matched task. Based on these findings, Schröder et al. argued that the P3b represents task-specific post-perceptual cognition/report rather than the emergence of awareness per se. Considering the similarity between the task of Schröder et al. and ours, we agree that our task cannot totally eliminate the confound between post-perceptual cognition/report-related activity and awareness-related activity. Nevertheless, our task is able to minimize the confound between motor-related activity and the emergence of awareness by separating them in time and balancing the direction of the response movements. Therefore, we changed the term “report-related” to “motor-related” in the text of the revised manuscript.

      On page 2, lines 71-75, the authors' review of the Frassle et al. (2014) experiment should be revised for accuracy. In this study, all PFC activity did not disappear as the authors claim. Also, the main contrast in the Frassle et al. study was rivalry vs. replay. However, in both of these conditions, visual awareness was changing with the main difference being whether there was sensory conflict between the two eyes or not. Such a contrast would presumably subtract out the common activity patterns related to visual awareness changes, while isolating rivalry (and the resulting neural competition) vs. non-rivalry (and the lack of such competition) which is not broadly relevant for the goal of measuring neural correlates of visual awareness which are present in both sides of the contrast (rivalry and replay).

      Thank you very much for your suggestion. We agree and have revised the text in the MS (lines 71-76):

      ‘For instance, a functional magnetic resonance imaging (fMRI) study employing human binocular rivalry paradigms found that when subjects need to manually report the changing of their awareness between conflict visual stimuli, the frontal, parietal, and occipital lobes all exhibited awareness-related activity. However, when report was not required, awareness-related activation was largely diminished in the frontal lobe but remained in the occipital and parietal lobes’

      On page 2, lines 76-78, the authors write, "no-report paradigm may overestimate unconscious processing because it cannot directly measure the awareness state". This should be reworded for clarity, as report paradigms also do not "directly measure the awareness state". All measures of awareness are indirect, either via subjects verbal or manual reports, or via behaviors or other physiological measures like OKN, pupillometry, etc. It's also not clear as written why no-report paradigms might overestimate unconscious processing.

      Thank you very much for your suggestion. We agree and have modified the description in lines 76-80:

      ‘Nevertheless, the no-report paradigm may overestimate the neural correlates of awareness by including unconscious processing, because it infers the awareness state through other relevant physiological indicators, such as optokinetic nystagmus and pupil size(Tsuchiya, Wilke, Frassle, & Lamme, 2015). In the absence of subjective reports, it remains controversial regarding whether the presented stimuli are truly seen or not.’

      However, the no-report paradigm may overestimate the neural correlates of awareness, because it infers the awareness state through other relevant physiological indicators, such as optokinetic nystagmus and pupil size(Tsuchiya et al., 2015) , in the absence of subjective reports and it remains controversial that whether the stimuli presented in such paradigm are truly seen as opposed to being merely potentially visible but unattended.

      On page 5, line 155, there is a typo. This should be Figure 2C, not 2B.

      Thanks. We have modified the description.

      On page 5, lines 160-162, the authors state, "The results showed that the saccadic reaction time in the aware trials was systematically shorter than that in the unaware trials. Such results demonstrate that visual awareness significantly affects the speed of information processing in the brain." I don't understand this. If subjects can never make a saccade until the fixation cross changes color, both for Y and N decisions, why would a difference in saccadic reaction times indicate anything about visual awareness affecting the speed of information processing in the brain? Doesn't this just show that the Red/Green x Left/Right response contingencies were easier to remember and execute for the Yes-I-did-see-it decisions compared to the No-I-didn't-see-it decisions?

      We agree and have added discussion of these questions in the revised manuscript (lines 492-496).

      ‘An alternative interpretation for RT difference between aware and unaware condition in our study is that the difference in task-strategies used by subjects/patients to remember the response mapping rules between the perception and the color cue (e.g., if the YES+GREEN=RIGHT and YES+RED=LEFT rules were held in memory, while the NO mappings were inferred secondarily rather than being actively held in memory).’

      In Figure 3B (and several other figures) due to the chosen view and particular brain visualization used, many readers will not know whether the front of brain is up and back of brain down or vice versa (there are no obvious landmarks like the cerebellum, temporal sulcus, etc.). I suggest specifying this in the caption or better yet on the figure itself.

      Thanks. We have added these descriptions in the caption of Figure 2D.

      Line 189 ‘In all brain images, right and up sides of each image represent the right and up sides of the brain’.

      In Figure 3B, the color scale may confuse some readers. When I first inspected this figure, I immediately thought the red meant positive voltage or activation, while the blue meant negative voltage or deactivation. Only later, I realized that any color here is meaningful. Not sure if an adjustment of the color scale might help, or perhaps not normalizing (and not taking absolute values of the voltage diffs, but maintaining the +/- diffs)?

      Thanks for the reviewer’s comment. We are sorry for not clearly describing why we normalized the activity as an absolute value and chose a color scale from 0 to 20. The major reason is that the biological characteristics of LFP polarity are not yet clearly understood (Einevoll et al., Nat Rev Neurosci, 2013). To simplify this complex issue, we consider the change in LFP magnitude during the delay period of our task to represent awareness-related activity, regardless of whether its actual value is positive or negative. Therefore, we first calculated the absolute value of the activity difference between aware and unaware trials at each individual recording site, then used Shepard's method (see Methods for detailed information) to calculate the activity at each vertex, and projected it onto the surface of the brain template as shown in Fig. 3B.
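
      The two steps described above (absolute aware-minus-unaware difference per recording site, then Shepard interpolation onto a surface vertex) can be sketched as follows. This is only an illustrative sketch under stated assumptions, not the analysis code used in the study: the function name `shepard_interpolate`, the distance exponent `power=2.0`, the site coordinates and the response values are all hypothetical.

```python
import math

def shepard_interpolate(vertex, sites, values, power=2.0):
    """Inverse-distance-weighted (Shepard) estimate at one surface vertex.

    vertex: (x, y, z) coordinate of the vertex.
    sites:  list of (x, y, z) coordinates of recording sites.
    values: one value per site (here, |aware - unaware|).
    power:  distance exponent; 2.0 is an assumed, commonly used choice.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for site, value in zip(sites, values):
        distance = math.dist(vertex, site)
        if distance == 0.0:
            # The vertex coincides with a recording site: return its value.
            return value
        weight = 1.0 / distance ** power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total

# Hypothetical mean responses at three recording sites (arbitrary units).
aware = [12.0, -8.0, 3.0]
unaware = [2.0, -1.0, 3.5]

# Step 1: absolute difference per site, so polarity is ignored.
abs_diff = [abs(a - u) for a, u in zip(aware, unaware)]

# Step 2: interpolate the absolute differences onto one vertex.
sites = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0), (0.0, 10.0, 0.0)]
vertex_value = shepard_interpolate((5.0, 0.0, 0.0), sites, abs_diff)
```

      Because the absolute value is taken before interpolation, a site with a negative-going difference contributes to the map in the same way as a site with a positive-going difference of equal size.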

      We have added the description in the MS (lines 794-800).

      We tried adjusting the color scale to range from -20 to 20, as the reviewer suggested. However, in the resulting topographic heatmap, brain regions with different strengths of awareness-related activity were less distinguishable. Thus, we would like to keep our original way of analyzing and presenting these results.

      Figure 3B: Why choose seemingly arbitrary time points in this figure? What's the significance of 247 and 314 and 381ms (why not show 200, 250, 300, etc.)? Also, are these single time-points or averages within a broader time window around this time-point, e.g., 225-275ms for the 250ms plot?

      We thank the reviewer for this helpful comment. We are sorry for not clearly describing why we chose the 8 time points to demonstrate the spatiotemporal characteristics of awareness-related activity in Fig. 3B. To identify the awareness-related activity, we analyzed the activity difference between aware and unaware trials during the delay period (180-650 ms after visual stimulus onset). The whole dynamic process is presented in the SI as a video (video S1). Here, we simply sampled the activity at 8 time points (180 ms, 247 ms, 314 ms, etc.) that equally divided the 430 ms delay period.
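
      The choice of time points can be reproduced with a short sketch. It assumes that the 8 samples are equally spaced across the 180-650 ms analysis window with both endpoints included and that each value is rounded to the nearest millisecond; the function name `sample_time_points` is hypothetical, and the exact rounding used for the figure is not stated in the text.

```python
def sample_time_points(start_ms, end_ms, n_points):
    """Return n_points equally spaced times (in ms), endpoints included."""
    step = (end_ms - start_ms) / (n_points - 1)
    return [round(start_ms + i * step) for i in range(n_points)]

# Analysis window taken from the text: 180-650 ms after stimulus onset.
time_points = sample_time_points(180, 650, 8)
```

      Under these assumptions the first samples fall at 180, 247, 314 and 381 ms, matching the values the reviewer asked about.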

      We have added the description in the MS (lines 213-215).

      Figure 3D: It's not clear how this figure panel is related to the data shown in Fig3A. In Fig3A, the positive amplitude diffs all end at around 400ms, but in Fig3D, these diffs extend out to 600+ms. I suggest adding clarity about the conversion being used here.

      Thanks for the reviewer’s comment. We are sorry for not clearly describing how we analyzed the population activity (Fig. 3D) in the previous version of the manuscript. Because the biological characteristics of LFP polarity are not yet clearly understood, to simplify this complex issue we consider the change in LFP magnitude during the delay period of our task to be awareness-related activity, regardless of whether its actual value is positive or negative. Therefore, when analyzing the awareness-related population activity, we first calculate the absolute value of the activity difference between aware and unaware trials at each individual recording site, then pool the data from the 43 recording sites and calculate the mean and standard error of the mean (SEM) (Fig. 3D). As can be seen in Fig. 3A, the activity difference between aware (red) and unaware (blue) trials lasts until or beyond the end of the delay period. Thus, the awareness-related population activity in Fig. 3D extends out to 600 ms.
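
      The pooling step described above can be sketched as follows: take the absolute aware-minus-unaware difference at each site and time point, then compute the mean and SEM across sites. This is a minimal sketch, not the study's code; the function name `population_activity` and the toy traces (3 sites, 2 time points) are hypothetical, and the SEM is assumed to use the sample standard deviation.

```python
import math
import statistics

def population_activity(aware, unaware):
    """Mean and SEM across sites of |aware - unaware| at each time point.

    aware, unaware: lists of per-site traces; each trace is a list of
    values, one per time point, and all traces have the same length.
    """
    n_sites = len(aware)
    n_times = len(aware[0])
    means, sems = [], []
    for t in range(n_times):
        # Absolute difference per site, so opposite polarities do not cancel.
        diffs = [abs(aware[s][t] - unaware[s][t]) for s in range(n_sites)]
        means.append(statistics.mean(diffs))
        # SEM = sample standard deviation / sqrt(number of sites).
        sems.append(statistics.stdev(diffs) / math.sqrt(n_sites))
    return means, sems

# Hypothetical traces; the second site has a negative-going response.
aware = [[1.0, 4.0], [-2.0, -6.0], [0.5, 3.0]]
unaware = [[0.0, 1.0], [0.0, -1.0], [0.5, 1.0]]
means, sems = population_activity(aware, unaware)
```

      In this toy example the second site would partly cancel the first if signed differences were averaged; with absolute differences it adds to the population mean instead, which is why a trace computed this way can stay above zero for longer than any single-site signed difference suggests.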

      We have added the description in the MS (lines 769-777).

      Figure 6D could be improved by making the time labels much bigger, perhaps putting them on the time axis on the bottom rather than in tiny text above each brain.

      Thanks for the reviewer’s comment. We have modified the figure accordingly.

      Page 18, line 480: "our results show that the prefrontal cortex still displays visual awareness-related activities even after eliminating the influence of the confounding variables related to subjective reports such as motion preparation" This is too strong of a statement. It's not at all clear whether confounding variables related to subjective reports (especially the cognition needed to hold in mind the Y/N decision about seeing the stimulus prior to the response cue) were eliminated with the design used here. In other places of the manuscript, the authors use "minimized" which is more accurate.

      Thanks for the reviewer’s comment. We have modified the statement accordingly.

      Page 19, section starting on line 508: The authors should consider citing the study by Vishne et al. (2023), which was just accepted for publication recently, but has been posted on bioRxiv for almost a year now: https://www.biorxiv.org/content/10.1101/2022.08.02.502469v1 . And on page 20, line 563, the authors claim that to the best of their knowledge, they were the first to detect "ignition" in PFC in human subjects. Consider revising this statement, now that you know about the Vishne et al. paper.

      We agree.

      Thank you for reminding us of these papers. We have cited this study and added discussion in the revised manuscript (lines 522-533). We agree that several iEEG studies have shown early involvement of PFC in visual perception (Vishne et al. 2023; Khalaf et al. 2023; Kwon et al. 2021). However, the authors of these studies did not compare neural activity between conscious and unconscious conditions, leaving open the possibility that the ERP and HFA were correlated with unconscious information processing rather than awareness-specific processing. In the present study, we compared the neural activity in PFC between conscious and unconscious trials and found that PFC activity correlated specifically with conscious perception. As we mentioned in the previous version of the manuscript, one iEEG study (Gaillard et al. 2009) reported awareness-specific activity in PFC. However, the awareness-related activity in that study started more than 300 ms after the onset of visual stimuli, which was about 100 ms later than the early awareness-related activity in our study. Nevertheless, in response to the reviewer’s comment, we modified our argument as follows in lines 621-623:

      ‘However, as discussed above, in contrast with previous studies, our study detected earlier awareness-specific ‘ignition’ in the human PFC, while minimizing the motor-related confounding.’

      Experimental task section of Methods: Were any strategies for learning the response cue matching task suggested to patients/subjects, and/or did any patients/subjects report which strategy they ended up using? For example, if I were a subject in this experiment, I would remember and mentally rehearse the rules: "YES+GREEN = RIGHT" and "YES+RED = LEFT". For trials in which I didn't see anything, I wouldn't need to hold 2 more rules in mind, as they can be inferred from the inverse of the YES rules (and it's much harder to hold 4 things in mind than 2). This extra inference needed to get to the NO+GREEN = LEFT and NO+RED = RIGHT rules would likely cause me to respond slightly slower to the NO trials compared to the YES trials, leading to saccadic RT effects in the same direction the authors found. More information about the task training and strategies used by patients/subjects would be helpful.

      We agree and discussed this in lines 492-496.

      Reviewer #3 (Public Review):

      The authors report a study in which they use intracranial recordings to dissociate subjectively aware and subjectively unaware stimuli, focusing mainly on prefrontal cortex. Although this paper reports some interesting findings (the videos are very nice and informative!) the interpretation of the data is unfortunately problematic for several reasons. I will detail my main comments below. If the authors address these comments well, I believe the paper may provide an interesting contribution to further specifying the neural mechanisms important for conscious access (in line with Gaillard et al., Plos Biology 2009).

      Reply: We very much appreciate the reviewer’s encouraging opinion.

      The main problem with the interpretation of the data is that the authors have NOT used a so called "no-report paradigm". The idea of no report paradigms is that subjects passively view a certain stimulus without the instruction to "do something with it", e.g., detect the stimulus, immediately or later in time. Because of the confusion of this term, specifically being related to the "act of reporting", some have argued we should use the term no-cognition paradigm instead (Block, TiCS, 2019, see also Pitts et al., Phil Trans B 2018). The crucial aspect is that, in these types of paradigms, the critical stimulus should be task-irrelevant and thus not be associated with any task (immediately or later). Because in this experiment subjects were instructed to detect the gratings when cued 600 ms later in time, the stimuli are task relevant, they have to be reported about later and therefore trigger all kinds of (known and potentially unknown) cognitive processes at the moment the stimuli are detected in real-time (so stimulus-locked). You could argue that the setup of this delayed response task excludes some very specific report related processes (e.g., the preparation of an eye-movement), which is good, however this is usually not considered the main issue. For example when comparing masked versus unmasked stimuli (Gaillard et al., 2009 Plos Biology), these conditions usually also both contain responses but these response related processes are "averaged out" in the specific contrasts (unmasked > masked). In this paper, RT differences between conditions (that are present in this dataset) are taken care of by using this delayed response in this paper, which is a nice feature for that and is not the case for the above example set-up.

      Given the task instructions, and this being merely a delayed-response task, it is to be expected that prefrontal cortex shows stronger activity for subjectively aware versus subjectively unaware stimuli. Unfortunately, given the nature of this task, the novelty of the findings is severely reduced. The authors cannot claim that prefrontal cortex is associated with "visual awareness", or what people have called phenomenal consciousness (this is the goal of using no-cognition paradigms). The only conclusion that can be drawn is that prefrontal cortex activity is associated with accessing sensory input: and hence conscious access. This less novel observation has been shown many times before and there is also little disagreement about this issue between different theories of consciousness (e.g., global workspace theory and local recurrency theories both agree on this).

      We fully agree that no-report/no-cognition paradigms involve less cognition in post-perceptual processing than report paradigms. We designed the balanced response task to minimize the motor-related component of post-perceptual processing, even though this task does not eliminate all cognition from post-perceptual processing. Regarding the reviewer’s comment that our task cannot assess the involvement of the PFC in the emergence of awareness, we hold a different opinion. As mentioned in the manuscript, the early awareness-related activity (~200 ms) in the PFC, which resembles the VAN activity in EEG studies, indicates an association of the PFC with the emergence of visual awareness (phenomenal consciousness).

      The best solution at this point seems to rewrite the paper entirely in light of this. My advice would be to state in the introduction that the authors investigate conscious access using iEEG and then not refer too much to no-cognition paradigm or maybe highlight some different strategies about using task-irrelevant stimuli (see Canales-Johnson et al., Plos Biology 2023; Hesse et al., eLife 2020; Hatamimajoumerd et al Curr Bio 2022; Alilovic et al., Plos Biology 2023; Pitts et al., Frontiers 2014; Dwarakanth et al., Neuron 2023 and more). Obviously, the authors should then also not claim that their results solve debates about theories regarding visual awareness (in the "no-cognition" sense, or phenomenal consciousness), for example in relation to the debate about the "front or the back of the brain", because the data do not inform that discussion. Basically, the authors can just discuss their results in detail (related to timing, frequency, synchronization etc) and relate the different signatures that they have observed to conscious access.

      The objective of the present study is to assess whether the PFC is involved in the emergence of visual awareness (i.e., phenomenal consciousness). Interestingly, we found early awareness-related activity (~200 ms after visual stimulus onset) in the PFC, including ERP, high-gamma activity and phase synchronization, which indicates an association of the PFC with the emergence of visual awareness. Therefore, we would like to keep the basic framing of the manuscript and revise it according to the reviewers’ comments.

      On the other hand, we fully agree with the reviewer’s argument that the report paradigm is more suitable for studying access consciousness. Indeed, we have found that the awareness-related activity in the PFC can be separated into two subgroups: early activity with shorter latency (~200 ms after stimulus onset) and late activity with longer latency (> 350 ms after stimulus onset). In addition, the early activity declined to baseline within ~200 ms during the delay period, whereas the late activity lasted throughout the delay period and extended into the next stage of the task (color change of the fixation point). Moreover, the early activity occurs primarily in the PFC contralateral to the visual stimulus, whereas the late activity occurs in both contralateral and ipsilateral PFC. While the early awareness-related activity resembles the VAN activity in EEG studies (associated with p-consciousness), the late awareness-related activity resembles the P3b activity (associated with a-consciousness). We are going to report these results in a separate paper soon.

      I think the authors have to discuss the Gaillard et al PLOS Biology 2009 paper in much more detail. Gaillard et al also report a study related to conscious access contrasting unmasked and masked stimuli using iEEG. In this paper they also report ERP, time frequency and phase synchronization results (and even Granger causality). Because of the similarities in approach, I think it would be important to directly compare the results presented in that paper with results presented here and highlight the commonalities and discrepancies in the Discussion.

      We thank the reviewer for this comment. We have performed additional analyses and added a detailed discussion accordingly. We have also extended the discussion to other relevant studies in the revised manuscript.

      In lines 528-549,

      ‘Although one iEEG study reported awareness-specific PFC activation, the awareness-related activity started 300 ms after the onset of visual stimuli, which was ~100 ms later than the early activity in our study. Also, due to the limited number of electrodes in PFC (2 patients with 19 recording sites mostly in mesiofrontal and peri-insular regions), their experiments were restricted while exploring the awareness-related activity in PFC. In the present study, the number of recording sites (245) were much more than previous study and covered more areas in PFC. Our results further show earlier awareness-related activity (~ 200 ms after visual stimuli onset), including ERP, HFA and PLV. These awareness-related activity in PFC occurred even earlier (~150 ms after stimulus onset) for the salient stimulus trials (Fig. 3A\D and Fig. 4A\D, HA condition).

      However, the proportions are much smaller than that reported by Gaillard et al, which peaked at ~60%. We think that one possibility for the difference may be due to the more sampled PFC subregions in present study and the uneven distribution of awareness-related activity in PFC. Meanwhile, we noticed that the peri-insula regions and middle frontal gyrus (MFG), which were similar with the regions reported by Gaillard et al, seemed to show more fraction of awareness-related sites than other subregions during the delay period (0-650 ms after stimulus onset). To test such possibility and make comparison with the study of Gaillard et al. we calculated the proportion of awareness-related site in peri-insula and MFG regions. We found although the proportion of awareness-related site was larger in peri-insula and MFG than in other subregions, it was much lower than the report of Gaillard et al. One alternative possibility for the difference between these two studies might be due to the more complex task in Gaillard et al. Nevertheless, we think these new results would contribute to our understanding of the neural mechanism underlying conscious perception, especially for the role of PFC.’

      In lines 601-603:

      ‘The only human iEEG study reported that the phase synchronization of the beta band in the aware condition also occurred relatively late (> 300 ms) and mainly confined to posterior zones but not PFC.’

      As for the Granger causality analysis between the PFC and the occipital lobe: because this study focused mainly on the PFC and there were few recording sites in the occipital lobe, we would like to perform this analysis in later studies after collecting more data.

      In the Gaillard paper they report a figure plotting the percentage of significant frontal electrodes across time (figure 4A) in which it can be seen that significant electrodes emerge after approximately 250 ms in PFC as well. It would be great if the authors could make a similar figure to compare results. In the current paper there are much more frontal electrode contacts than in the Gaillard paper, so that is interesting in itself.

      We thank the reviewer for this constructive comment. We performed an analysis similar to that of Gaillard et al. and plotted the results in the figure below. As can be seen, awareness-related sites started to emerge about 200 ms after visual stimulus onset according to both ERP and HG activity. The proportion of awareness-related sites peaked at ~14% (8% for HG) at 300-400 ms. However, these proportions are much smaller than that reported by Gaillard et al., which peaked at ~60%. One possible reason for the difference is that more PFC subregions were sampled in the present study and that awareness-related activity is unevenly distributed across the PFC. Meanwhile, we noticed that the peri-insula regions and middle frontal gyrus (MFG), which are similar to the regions reported by Gaillard et al., seemed to show a larger fraction of awareness-related sites than other subregions during the delay period (0-650 ms after stimulus onset). To test this possibility and allow comparison with the study of Gaillard et al., we calculated the proportion of awareness-related sites in the peri-insula and MFG regions. We found that although the proportion of awareness-related sites was larger in the peri-insula and MFG than in other subregions, it was much lower than that reported by Gaillard et al. An alternative explanation for the difference between the two studies might be the more complex task in Gaillard et al.

      We have added this figure and discussion to the revised manuscript as a new result (Figure 4E & S2 and lines 537-549).

      Author response image 1.

      Percentage of awareness-related sites in ERP and HG analysis. n, number of recording sites in PFC.

      Author response image 2.

      Percentage of awareness-related sites in ERP and HG analysis at parsopercularis and middle frontal gyrus (MFG). n, number of recording sites.
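
      The proportion-of-sites analysis described above can be sketched in a few lines. This is a minimal illustration, not the authors' analysis code: the function name `percent_significant_sites` and the toy significance mask are hypothetical, and it assumes each recording site has already been tested, per time bin, for an aware-versus-unaware difference.

```python
def percent_significant_sites(significant):
    """Return the percentage of sites flagged as significant in each time bin.

    significant[site][bin] is True when that site shows an
    aware-vs-unaware difference in that time bin (assumed precomputed).
    """
    n_sites = len(significant)
    n_bins = len(significant[0])
    return [
        100.0 * sum(1 for site in significant if site[b]) / n_sites
        for b in range(n_bins)
    ]


# Hypothetical toy mask: 4 sites x 3 time bins
toy_mask = [
    [False, True, False],
    [False, True, True],
    [False, False, False],
    [False, False, False],
]
percentages = percent_significant_sites(toy_mask)
```

      Plotting such percentages against the time-bin centres would give a curve comparable in form to the one reported by Gaillard et al.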

      In my opinion, some of the most interesting results are not highlighted: the findings that subjectively unaware stimuli show increased activations in the prefrontal cortex as compared to stimulus absent trials (e.g., Figure 4D). Previous work has shown PFC activations to masked stimuli (e.g., van Gaal et al., J Neuroscience 2008, 2010; Lau and Passigngham J Neurosci 2007) as well as PFC activations to subjectively unaware stimuli (e.g., King, Pescetelli, and Dehaene, Neuron 2016) and this is a very nice illustration of that with methods having more detailed spatial precision. Although potentially interesting, I wonder about the objective detection performance of the stimuli in this task. So please report objective detection performance for the patients and the healthy subjects, using signal detection theoretic d'. This gives the reader an idea of how good subjects were in detecting the presence/absence of the gratings. Likely, this reveals far above chance detection performance and in that case I would interpret these findings as "PFC activation to stimuli indicated as subjectively unaware" and not unconscious stimuli. See Stein et al., Plos Biology 2021 for a direct comparison of subjectively and objectively unaware stimuli.

      We greatly appreciate the reviewer’s helpful and valuable comments. We did notice that PFC activity in the subjectively unaware condition (stimulus contrast near the perceptual threshold) is significantly higher than in the stimulus-absent condition. These results, obtained with sEEG recordings at much higher spatial resolution than brain imaging and scalp EEG, support the findings of previous studies (citations). Because the neural correlates of unaware processing are a hot and interesting topic, after careful consideration we would like to report these results in a separate paper rather than add them to the current manuscript, in order to avoid distraction.

      Following the reviewer’s comment about objective detection performance in our task, we analyzed the signal detection theoretic d’. The values of d’ in patients and healthy subjects are similar (1.81±0.27 in patients and 2.12±0.37 in healthy subjects). These results indicate that the objective detection performance of subjects in our task is well above chance level. Since our task measures only subjective awareness, we agree with the reviewer that our results should be interpreted as “PFC activation to stimuli indicated as subjectively unaware” rather than as activation to objectively unaware stimuli. We will emphasize this point in our next paper.

      We have added the d’ values to the manuscript (lines 149-150).
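
      For readers unfamiliar with the measure, d’ is the difference between the z-transformed hit rate and false-alarm rate. The sketch below is illustrative only: the trial counts are hypothetical, and the log-linear correction (adding 0.5 to each cell) is an assumption made here to keep the rates away from 0 and 1; the response does not state which correction, if any, was used.

```python
from statistics import NormalDist


def d_prime(hits, misses, false_alarms, correct_rejections):
    """Sensitivity index d' = z(hit rate) - z(false-alarm rate)."""
    # Log-linear correction (assumption): avoids infinite z-scores
    # when a rate would otherwise be exactly 0 or 1.
    hit_rate = (hits + 0.5) / (hits + misses + 1.0)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1.0)
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(fa_rate)


# Hypothetical counts: grating-present trials (hits/misses) and
# grating-absent trials (false alarms/correct rejections)
example = d_prime(hits=80, misses=20, false_alarms=20, correct_rejections=80)
```

      A value of 0 corresponds to chance-level detection, so values around 2, as reported above, indicate detection well above chance.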

      In Figure 7 of the paper the authors want to make the case that the contrast does not differ between subjectively aware stimuli and subjectively unaware stimuli. However so far they've done the majority of their analyses across subjects, and for this analysis the authors only performed within-subject tests, which is not a fair comparison imo. Because several P values are very close to significance I anticipate that a test across subjects will clearly show that the contrast level of the subjectively aware stimuli is higher than of the subjectively unaware stimuli, at the group level. A solution to this would be to sub-select trials from one condition (NA) to match the contrast of the other condition (NU), and thereby create two conditions that are matched in contrast levels of the stimuli included. Then do all the analyses on the matched conditions.

      We thank the reviewer for this helpful comment. Regarding the comment “However so far they've done the majority of their analyses across subjects, and for this analysis the authors only performed within-subject tests, which is not a fair comparison imo”: if we understand correctly, the reviewer considers it unfair that the analysis of neural activity in the PFC was done across subjects while the stimulus contrast analysis between NA and NU was done within individuals. In fact, this is not the case. In the neural activity analysis, the significant awareness-related sites were first identified in each individual subject (Fig. 3A and Fig. 4A, and Methods), the same as in the analysis of stimulus contrast (see Methods). Only in the neural population activity analysis was the activity of awareness-related sites pooled together for further analysis.

      To provide further evidence that the awareness-related activity in the PFC is not strongly correlated with stimulus contrast, we compared the activity difference across two stimulus contrast differences: between the high-contrast aware (HA) and NA conditions (large difference, ~14%), and between the NA and NU conditions (slight difference, ~0.2%). The working hypothesis is that, if PFC activity were closely correlated with stimulus contrast, the activity difference between the HA and NA conditions should be much larger than that between the NA and NU conditions. To test this hypothesis, we analyzed data from the two patients for whom the previous analysis showed a significant or near-significant difference in stimulus contrast between the NA and NU conditions (Author response image 3, below, patients #2 and #1). The results (Author response image 3) show that the averaged activity difference (0-650 ms after visual stimulus onset) between HA and NA was similar to that between NA and NU trials, even though the stimulus contrast difference was much larger between the HA and NA conditions than between the NA and NU conditions. These results indicate that the awareness-related activity in the PFC cannot be explained solely by the contrast difference between the NA and NU conditions. Based on these results, we think it is not necessary to perform the analysis suggested by the reviewer: “A solution to this would be to sub-select trials from one condition (NA) to match the contrast of the other condition (NU), and thereby create two conditions that are matched in contrast levels of the stimuli included. Then do all the analyses on the matched conditions”. Another obstacle to this analysis is the limited number of trials in our dataset.

      Author response image 3.

      Relationship between stimulus contrast and PFC activity. The X axis represents the stimulus contrast difference between two paired conditions: aware versus unaware in the near-perceptual-threshold conditions (NA – NU, red dots), and aware in the high-contrast condition versus aware in the near-perceptual-threshold condition (HA – NA, blue dots). The Y axis represents the activity difference between the paired stimulus conditions. The activity difference is similar for the two pairs of conditions despite the marked difference in contrast between them. This indicates that the greater activity in NA trials than in NU trials (Fig. xx-xx) cannot be explained by the slight difference in stimulus contrast between NA and NU trials.

      Related, Figure 7B is confusing and the results are puzzling. Why is there such a strong below chance decoding on the diagonal? (also even before stimulus onset) Please clarify the goal and approach of this analysis and also discuss/explain better what they mean.

      We have withdrawn Figure 7B because of the confusing decoding results on the diagonal.

      I was somewhat surprised by several statements in the paper and it felt that the authors may not be aware of several intricacies in the field of consciousness. For example, a statement like the following "Consciousness, as a high-level cognitive function of the brain, should have some similar effects as other cognitive functions on behavior (for example, saccadic reaction time). With this question in mind, we carefully searched the literature about the relationship between consciousness and behavior; surprisingly, we failed to find any relevant literature." This is rather problematic for at least two reasons. First, not everyone would agree that consciousness is a high-level cognitive function and second there are many papers arguing for a certain relationship between consciousness and behavior (Dehaene and Naccache, 2001 Cognition; van Gaal et al., 2012, Frontiers in Neuroscience; Block 1995, BBS; Lamme, Frontiers in Psychology, 2020; Seth, 2008 and many more). Further, the explanation for the reaction time differences in this specific case is likely related to the fact that subjects' confidence in that decision is much higher in the aware trials than in the unaware trials, hence the speeded response for the first. This is a phenomenon that is often observed if one explores the "confidence literature". Although the authors have not measured confidence I would not make too much out of this RT difference.

      We agree and have modified the text accordingly in lines 492-507.

      ‘An alternative interpretation for RT difference between aware and unaware condition in our study, i.e., reflecting task-strategies used by subjects/patients to remember the response mapping rules between the perception and the color cue (e.g., if the YES+GREEN=RIGHT and YES+RED=LEFT rules were held in memory, while the NO mappings were inferred secondarily rather than being actively held in memory).

      Another possibility is that the reaction time is strongly modulated by the confident level, which has been described in previous studies(Broggin et al., 2012; Marzi et al., 2006). However, in previous studies, the confident levels were usually induced by presenting stimulus with different physical property, such as spatial frequency, eccentricity and contrast. However, the dependence of visual process on the salience of visual stimulus confounds with the effect of visual awareness on the reaction time of responsive movements, which is hard to attribute the shorter reaction time in more salient condition purely to visual awareness. In contrast, we create a condition (near aware threshold) in the present study, in which the saliency (contrast) of visual stimulus is very similar in both aware and unaware conditions in order to eliminate the influence of stimulus saliency in reaction time. We think that the difference in reaction time in our study is mainly due to the modulation of awareness state, which was not reported previously.’

      I would be interested in a lateralized analysis, in which the authors compare the PFC responses and connectivity profiles using PLV as a factor of stimulus location (thus comparing electrodes contralateral to the presented stimulus and electrodes ipsilateral to the presented stimulus). If possible this may give interesting insights in the mechanism of global ignition (global broadcasting), supposing that for contralateral electrodes information does not have to cross from one hemisphere to another, whereas for ipsilateral electrodes that is the case (which may take time). Gaillard et al refer to this issue as well in their paper, and this issue is sometimes discussed regarding to Global workspace theory. This would add novelty to the findings of the paper in my opinion.

      We greatly appreciate the reviewer’s helpful and valuable suggestions. We have performed the analysis accordingly. We find that awareness-related ERP activation occurs first only in the contralateral PFC, with a latency of about 200 ms, and then in both contralateral and ipsilateral PFC about 100 ms later. In addition, the magnitude of awareness-related activity is stronger in the contralateral than in the ipsilateral PFC during the early phase (200-400 ms), after which the activity becomes similar between the two. Moreover, the awareness-related HG activity appears only in the contralateral PFC. These results show the spatiotemporal characteristics of visual-awareness-related activity across the two hemispheres. We are going to report these results in a separate paper soon.

      Reviewer #3 (Recommendations For The Authors):

      Some of the font sizes in the figures are too small.

      We have modified accordingly.

      To me, the abbreviations are confusing, (NA/NU etc). I would try to come up with easier ones or just not use abbreviations.

      We have modified the text accordingly and tried to avoid using abbreviations.

      The data/scripts availability statement states "available upon reasonable request". I would suggest that the authors make the data openly available when possible, and I believe eLife requires that as well.

      We thank the reviewer for this suggestion. Because several ongoing studies are based on this dataset, we would like to make our data openly available after these studies are completed, provided there is no restriction from national policy.

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Recommendations For The Authors):

      Comment 1: The authors need to do more to cite the prior work of others. CCL2 allelic expression imbalance tied to the rs13900 alleles was first reported by Johnson et al. (Pharmacogenet Genomics. 2008 Sep; 18(9): 781-791) and should be cited in the Introduction on line 128 next to the Pham 2012 reference. Also, in the Results section, line 142, please provide references for the statement "We and others have previously reported a perfect linkage disequilibrium between rs1024611 in the CCL2 cis-regulatory region and rs13900 in its 3′ UTR" since the linkage disequilibrium for these 2 SNPs is not reported in the ENSEMBL server for the 1000 genomes dataset.

      We thank the reviewer for pointing out the omission regarding the citation of prior work. We acknowledge that Johnson et al. (2008) reported the association between rs13900 and CCL2 allelic expression imbalance based on Snapshot methodology while examining cis-acting variants of 42 candidate genes. To acknowledge these prior studies, we have cited the work of Johnson et al. (Johnson et al., 2008) along with Pham et al. (Pham et al., 2012), which linked rs13900 to CCL2 allelic expression imbalance. The text in the Introduction (Lines 128-130) has been updated to reflect these changes.

      “We and others have demonstrated AEI in CCL2 using rs13900 as a marker with the T allele showing a higher expression level relative to C allele (Johnson et al., 2008; Pham et al., 2012).”

      We have cited previous studies that reported strong linkage disequilibrium between rs1024611 and rs13900 within the CCL2 gene, with D’=1 and R<sup>2</sup>=0.96 (Hubal et al., 2010; Intemann et al., 2011; Kasztelewicz et al., 2017; Pham et al., 2012), on Line 144. To address the concern regarding unreported linkage disequilibrium between rs1024611 and rs13900, we reviewed the pairwise linkage disequilibrium data by population in the ENSEMBL server for the 1000 Genomes dataset and confirmed that linkage disequilibrium (LD) between rs1024611 and rs13900 has been observed, with D’=1 and R<sup>2</sup>=0.92 to 1.0 in specific populations. We have included a table (Author response table 1) showing pairwise LD between rs13900 and rs1024611 as reported in the ENSEMBL server for the 1000 Genomes dataset, along with a URL reference to the ENSEMBL server data.

      Author response table 1.

      Pairwise linkage disequilibrium data between rs13900 and rs1024611 by population reported in the ENSEMBL server for the 1000 genome dataset

      F. Variant, Focus Variant; R<sup>2</sup>, correlation between the pair loci; D’, difference between the observed and expected frequency of a given haplotype.

      URL: https://www.ensembl.org/Homo_sapiens/Variation/HighLD?db=core;r=17:34252269-34253269;v=rs1024611;vdb=variation;vf=959559590;second_variant_name=rs13900
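
      To make the two LD statistics concrete, the sketch below computes them from haplotype and allele frequencies using the standard definitions: D is the difference between the observed and expected haplotype frequency, D’ rescales |D| by its maximum attainable value, and R<sup>2</sup> is the squared correlation between the two loci. The frequencies in the example are hypothetical and are not taken from the ENSEMBL table; they are chosen only to show how D’=1 can coexist with R<sup>2</sup> slightly below 1 when the two alleles differ a little in frequency.

```python
def ld_stats(p_a, p_b, p_ab):
    """Return (D', r^2) for two biallelic loci.

    p_a, p_b : frequencies of allele A at locus 1 and allele B at locus 2
    p_ab     : frequency of the A-B haplotype
    """
    d = p_ab - p_a * p_b  # observed minus expected haplotype frequency
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    r_squared = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d_prime, r_squared


# Hypothetical frequencies: allele A never occurs without allele B
d_prime_value, r2_value = ld_stats(p_a=0.30, p_b=0.31, p_ab=0.30)
```

      With these made-up inputs D’ is 1 because one of the four possible haplotypes is absent, while R<sup>2</sup> falls just short of 1 because the allele frequencies are not identical.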

      Comment 2: Certain details of the experimental protocols need to be further elaborated or clarified to contextualize the significance of the findings. For example, in the results line 184 the authors state "Using nascent RNA allows accurate determination of mRNA decay by eliminating the effects of preexisting mRNA." How does measuring nascent RNA enable the accurate determination of mRNA decay? Doesn't it measure allele-specific mRNA synthesis? Please elaborate, as this is a key result of the study. Can the authors provide a reference supporting this statement?

      mRNA decay can be measured precisely by eliminating the effect of any preexisting mRNA. Metabolic labeling with 4-thiouridine allows exclusive capture of newly synthesized RNA, so RNA decay can be quantified without interference from preexisting RNA. We agree that nascent RNA measurement primarily reflects synthesis rate rather than degradation. However, in conjunction with actinomycin-D-based inhibition studies, it can be exploited to determine the decay of the newly synthesized RNA accurately (Russo et al., 2017). Our aim was therefore to use the nascent RNA to study decay kinetics. The imbalance in CCL2 allele expression does occur at the transcriptional level, as seen in the group not treated with actinomycin-D (Figure 2C), although an impact of post-transcriptional mechanisms that alter transcript stability cannot be ruled out. We therefore employed a novel approach that could assess both synthesis and degradation by combining actinomycin-D inhibition and nascent RNA capture in the same experimental setup. In the presence of actinomycin-D, we detected a much greater allelic difference in the expression levels of the rs13900 T and C alleles four hours post-treatment, suggesting a role for post-transcriptional mechanisms in CCL2 AEI.

      We have expanded the Methods section in the revised draft to include experimental details on the capture of nascent RNA and subsequent downstream analysis (Lines 553-563).

      “Newly synthesized RNA was isolated using the Click-It Nascent RNA Capture Kit (Invitrogen, Cat No: C10365) following the manufacturer’s protocol. Peripheral blood mononuclear cells (PBMCs) or monocyte-derived macrophages (MDMs) obtained from heterozygous individuals were stimulated with lipopolysaccharide (LPS) for 3 hours in presence of 0.2 mM 5-ethynyl uridine (EU) (Jao and Salic, 2008; Paulsen et al., 2013). After the pulse, the culture medium was replaced with fresh growth medium devoid of EU. To assess RNA stability, actinomycin-D (5 µg/mL) was added, and samples were collected at 0, 1, 2, and 4 h post-treatment. The EU RNA was subjected to a click reaction that adds a biotin handle which was then captured by streptavidin beads. The captured RNA was used for cDNA synthesis (Superscript Vilo kit, Cat No: 11754250), PCR amplification, and allelic quantification.”
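
      As an illustration of how a decay rate could be estimated from a time course sampled at 0, 1, 2, and 4 h, the sketch below fits a first-order decay model by least squares on the log-transformed abundances. This is an assumption-laden sketch and not the analysis used in the manuscript: first-order kinetics is assumed, the function name `decay_half_life` is hypothetical, and the abundance values are invented.

```python
import math


def decay_half_life(times_h, abundances):
    """Fit ln(abundance) = ln(A0) - k * t; return (k per hour, half-life in hours)."""
    logs = [math.log(a) for a in abundances]
    n = len(times_h)
    mean_t = sum(times_h) / n
    mean_y = sum(logs) / n
    # Ordinary least-squares slope of log-abundance against time
    slope = sum((t - mean_t) * (y - mean_y) for t, y in zip(times_h, logs)) / sum(
        (t - mean_t) ** 2 for t in times_h
    )
    k = -slope  # decay constant (positive when abundance falls over time)
    return k, math.log(2) / k


# Invented relative abundances for one allele after actinomycin-D addition
times = [0, 1, 2, 4]
allele_abundance = [1.0, 0.5, 0.25, 0.0625]
k, half_life = decay_half_life(times, allele_abundance)
```

      Fitting each allele separately in this way would let the two half-lives be compared directly, which is one way to express an allelic difference in transcript stability.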

      Comment 3: Also, they next state that the assay was carried out using cells treated with actinomycin D (line 186). Doesn't actinomycin D block transcription? The original study by Jia et al 2008 in PNAS reported that low concentration of ActD (100 nM) blocked RNA pol I and higher concentration (2 uM) blocked RNA pol II. This or the study on which the InVitrogen kit is based should be cited. The concentration of actinomycin D used to treat the cells should be given. They report that the T allele transcript was more abundant than the C allele transcript in nascent RNA. Why doesn't that argue for a transcriptional mechanism rather than an RNA-stability mechanism? This result should be discussed in the Discussion.

      In our study, we used a concentration of 5 µg/mL (3.98 µM), which, as noted by the reviewer, can effectively inhibit RNA polymerase II (Pol II) activity. We have updated our manuscript to include these details and cited the original work (Jao and Salic, 2008; Paulsen et al., 2013), which thoroughly investigated the effect of various concentrations of ActD on RNA polymerase I and II (Line no 557). A discussion of the RNA stability mechanism is provided in the Results section (Lines 196-198).
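
      The mass-to-molar conversion quoted above can be checked with a short calculation. The molecular weight used here (about 1255.4 g/mol for actinomycin D) is an assumption supplied for illustration and is not stated in the response.

```python
ACTINOMYCIN_D_MW = 1255.42  # g/mol; assumed molecular weight of actinomycin D


def ug_per_ml_to_micromolar(conc_ug_per_ml, mw_g_per_mol):
    """Convert a mass concentration in µg/mL to a molar concentration in µM."""
    # µg/mL is numerically equal to mg/L; dividing by g/mol gives mmol/L,
    # and multiplying by 1000 converts mmol/L to µmol/L (µM).
    return conc_ug_per_ml / mw_g_per_mol * 1000.0


actd_micromolar = ug_per_ml_to_micromolar(5.0, ACTINOMYCIN_D_MW)
```

      Under that assumed molecular weight, 5 µg/mL works out to roughly 3.98 µM, above the 2 µM level the reviewer cites for Pol II inhibition.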

      Comment 4: In their bioinformatics analysis of the allele-specific CCL2 mRNAs, they reported that the analysis obtained a score of 1e (line 214). What does that mean? Is it significant?

      We acknowledge that the notation “a score of 1e” was unclear and thank the reviewer for pointing it out. We have clarified its significance in the revised manuscript. The following text has been included in the Results section (Line 223):

      “The score of 1e was obtained using RBP-Var, a bioinformatics tool that scores variants involved in posttranscriptional interaction and regulation (Mao et al., 2016). Here, the annotation system rates the functional confidence of variants from category 1 to 6. Category 1 is the most significant category and includes variants that are known to be expression quantitative trait loci (eQTLs) and are likely to affect RBP binding sites, RNA secondary structure, and expression, whereas category 6 is assigned to variants with minimal possibility of affecting RBP binding. Additionally, subcategories provide further annotation ranging from the most informational variants (a) to the least informational variants (e). The reported score of 1e denotes that the variant has a motif for RBP binding. Although the scoring system is hierarchical from 1a to 1e, with decreasing confidence in the variant’s function, all the variants in category 1 are considered potentially functional to some degree.”

      Comment 5: In Figure 3A, why is the rare SNP rs181021073 shown? This SNP does not come up anywhere else in the paper. For clarity, it should be removed from Figure 3A.

      We thank the reviewer for pointing out the error in Figure 3A and apologize for the oversight. We agree that the SNP rs1810210732 is not mentioned anywhere in the manuscript and its inclusion in Figure 3A may have caused confusion. We have removed this SNP from the revised figure.

      Comment 6: For the RNA EMSA results presented in Fig. 4C with recombinant ELAVL1 (HuR), there is clearly a loss of unbound T allele probe with increasing concentrations of the recombinant protein (without a concomitant increase in shifted complex). This suggests that the T allele probe is degraded or loses its fluorescent tag in the presence of recombinant HuR, whereas the C allele probe does not. The quantitation of the shifted complex presented in Fig. 4D as a percentage of bound and unbound probe is therefore artificially elevated for the T allele compared to the C allele. In fact, there seems to be little difference between the shifted complexes with the T and C allele probes. The authors should explain this difference in free probe levels.

      We appreciate the constructive critique of the reviewer regarding the RNA EMSA results in Fig. 4C. To address this, we repeated the experiments to analyze the differential binding of probes bearing the rs13900T or rs13900C allele with increasing concentrations of recombinant HuR. No degradation or loss of the fluorescent tag was noted for the T allele probe in the presence of recombinant HuR in three independent experiments (Author response image 1). This indicates that the probes with the C or T allele show comparable stability and are not affected by increasing concentrations of recombinant HuR. The apparent reduction in the unbound T allele probe in Figure 4C may be due to saturation at higher HuR concentrations rather than degradation.

      Author response image 1.

      Differential binding and stability of oligoribonucleotide probes containing rs13900C or T alleles with recombinant HuR. (A) REMSA with labeled oligoribonucleotides containing either rs13900C or rs13900T and recombinant HuR at the indicated concentrations. (B&C) Representative quantitative densitometric analysis of HuR binding to the oligoribonucleotides bearing rs13900 T or C. The signal in the bound fractions was normalized to the free probe. The figure represents data from three independent experiments (mean ± SEM).

      Comment 7: In the Methods section, concentrations and source of reagents should be given. For example, what was the bacterial origin of LPS and concentration? What concentration of actinomycin D? What was the source? Was it provided with the nascent RNA kit? In describing the riboprobes used for REMSA, please underline the allele in the sequences (lines 549 and 550).

      Thank you for your detailed feedback and suggestions regarding the Materials and Methods section. We regret the oversight in providing detailed information on reagent concentrations and sources in the Methods section. We have now rectified this omission and provided the necessary details. A summary of the materials and reagents used is presented as a supplementary table (Supplementary Table 4) to enable others to replicate our experiments accurately. Regarding the description of riboprobes for the RNA Electrophoretic Mobility Shift Assay, we have underlined and bolded the allele in the sequences as suggested (Lines 603-604).

      Comment 8: For polysome profiling on line 603, please provide a protocol for the differentiation of primary macrophages from monocytes (please cite an original protocol, not a prior paper that does not give a detailed protocol).

      We agree with the reviewer’s comment and have included the following text on primary macrophage differentiation from monocytes in the Methods section, citing the original protocol (Line 668).

      “Human monocytes were isolated from fresh blood as described earlier (Gavrilin et al., 2009) with slight modification. Briefly, peripheral blood mononuclear cells were isolated by density gradient centrifugation using Histopaque, followed by immunomagnetic negative selection using EasySep Human Monocyte isolation kit. A high purity level for CD14+ cells was consistently achieved (≥90%) through this procedure, as confirmed by flow cytometry. The purified monocytes were immediately used for macrophage differentiation by treating them with 50 ng/mL M-CSF (PeproTech) for 72 h and flow cytometric measurement of surface markers CD64+, CD206+, CD44 was used to confirm the differentiation”. This data is now shown in the new Supplementary Figure S6.

      Comment 9: In the legend of Figure 2, please replace "5 ug of actinomycin D" with the actual concentration used.

      We appreciate your attention to detail and thank you for pointing out the error in the legend of Figure 2. We regret the oversight and have made the suggested change (Line 739).

      Comment 10: In the Discussion, the authors cite the study of CCL2 mRNA stabilization by HuR in mice by Sasaki et al (lines 407-9). Is regulation of CCL2 mRNA by HuR in the mouse relevant to human studies?

      How conserved is the 3'UTR of mouse and human CCL2? Is the rs13900 variant located in a conserved region? How many putative HuR sites are found in the 3'UTR of human and mouse CCL2 3'UTR? Does HuR dimerize (see Pabis et al 2019, NAR)? This information could be added to the Discussion.

      Thank you for your valuable comment. We appreciate your suggestion to include information on the dimerization of HuR in our discussion. While reporting the overall structure and domain arrangement of HuR, Pabis et al. (2019) identified dimerization involving Trp261 in RRM3 as a key requirement for the functional activity of HuR in vitro. This finding provides additional context for understanding HuR’s role in regulating CCL2 expression. We have added the following lines to the discussion (Lines 421-428), acknowledging HuR’s ability to dimerize and citing the relevant references.

      “HuR consists of three RNA recognition motifs (RRMs) that are highly conserved and canonical in nature (Ripin et al., 2019). In absence of RNA the three RRMs are flexibly linked but upon RNA binding they transition to a more compact arrangement. Mutational analysis revealed that HuR function is inseparably linked to RRM3 dimerization and RNA binding. Dimerization enables recognition of tandem AREs by dimeric HuR (Pabis et al., 2019) and explains how this protein family can regulate numerous targets found in pre-mRNAs, mature mRNAs, miRNAs and long noncoding RNAs.”

      We aligned the CCL2 3’UTR from five different mammalian species and found that the region flanking rs13900/ HuR binding site is relatively conserved (Author response image 2). Based on PAR-CLIP datasets there are four HuR binding regions in human CCL2 3’ UTR (Lebedeva et al., 2011). However, the region overlapping rs13900 seems to be predominantly involved in the CCL2 regulation (Fan et al., 2011). This information has been included in the discussion.

      Author response image 2.

      Cross-species alignment of the CCL2 3’UTR region flanking rs13900 using homologous regions from 5 different mammals. (Hu, Human; CH, Chimp; MO, Mouse; RA, Rat; DO, Dog. rs13900 is shown within brackets; Y, pyrimidine.)

      Reviewer #2 (Recommendations For The Authors):

      Comment 1: The supplemental figures need appropriate figure legends.

      We regret the oversight and thank the reviewer for bringing it to our attention. We have now included the figure legend for the supplemental figures in the revised manuscript.

      Comment 2: The data on LPS-induced CCL2 expression in PBMCs should be represented as a scatter plot with statistical significance to enhance clarity and interpretability.

      We thank the reviewer for this constructive suggestion. In the revised Figure 2A the induction of CCL2 expression by LPS in PBMCs obtained from 6 volunteers is represented as a scatter plot. We have also included individual data points in the updated figure and statistical significance to improve clarity and interpretability.

      Comment 3: The stability of CCL2 mRNA in control cells needs comparison with treated cells for context. The stability of a housekeeping gene (such as GAPDH or ACTB) should always be included as a control in actinomycin D experiments. Clarify the differential stability of rs13900C vs. rs13900T alleles.

      We used 18S rRNA to normalize data for the mRNA stability studies because it is abundant and, in well-controlled studies, remains relatively unaltered following Act D treatment compared with other housekeeping genes (Barta et al., 2023). We also compared Ct values between the Act D-treated and Act D-untreated samples in this study and found them to be comparable (Author response image 3).

      Author response image 3.

      Ct values of 18S rRNA in Act D-treated and control samples in Fig. 2.

      Comment 4: In the main text and the methods, the authors state that nascent RNA was obtained in the presence of actinomycin D and EU. However, actinomycin D blocks the transcription of nascent RNAs, therefore the findings in Figure 2C do not reflect nascent RNA

      Please see our response to Reviewer 1 Comment 2. We would like to emphasize that, to assess the differential role of rs13900 in nascent RNA decay, we integrated nascent RNA labeling with transcriptional inhibition. Briefly, PBMCs from a heterozygous individual were either left unstimulated or stimulated with LPS and pulsed with 5-ethynyl uridine (0.2 mM) for 3 h, after which the medium was replaced with EU-free growth medium. RNA was obtained at 0, 1, 2, and 4 h following actinomycin-D treatment (5 µg/mL) to assess the stability of nascent RNA.
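
      A time course of this kind (0, 1, 2, and 4 h after transcriptional arrest) is commonly summarized as a decay rate and half-life. The sketch below assumes first-order decay and fits ln(fraction remaining) against time by ordinary least squares. The input values are made-up placeholders rather than data from the study, and the function name is illustrative.

```python
import math


def decay_half_life(times_h, remaining_fraction):
    """Return (decay constant k per hour, half-life in hours).

    Assumes first-order decay, fraction(t) = A * exp(-k * t), and fits
    ln(fraction) versus time by ordinary least squares with an intercept.
    """
    logs = [math.log(f) for f in remaining_fraction]
    n = len(times_h)
    mean_t = sum(times_h) / n
    mean_y = sum(logs) / n
    sxx = sum((t - mean_t) ** 2 for t in times_h)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times_h, logs))
    k = -sxy / sxx  # the slope of the fitted line is -k
    return k, math.log(2) / k


if __name__ == "__main__":
    # Hypothetical fractions of labeled transcript remaining at each time point.
    times = [0, 1, 2, 4]
    fractions = [1.0, 0.5, 0.25, 0.0625]
    k, t_half = decay_half_life(times, fractions)
    print(f"k = {k:.3f} per h, half-life = {t_half:.2f} h")
```

      Applying the same fit separately to the allele-specific signals would give one half-life per allele, which is one way to express a difference in stability between rs13900C and rs13900T transcripts.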

      Comment 5: Figure 4A is not clearly described or labeled. What are lanes 2 and 6?

      Figure 4 has now been updated to clearly describe all the lanes. Lanes 2 and 6 represent the mobility shift seen following incubation of whole-cell extracts with oligoribonucleotide probes bearing rs13900C and rs13900T, respectively.

      Comment 6: Figure 4C and Figure 4D: the charts in Figure 4D do not seem to reflect the changes in Figure 4C. How was the mean variant calculated? How do the authors explain the different quantities in unbound/free RNA in rs13900C compared to rs13900T?

      We appreciate the constructive critique of the reviewer regarding the RNA EMSA results in Fig. 4C. To address this, we repeated the experiments to analyze the differential binding of rs13900T/C probes with increasing concentrations of recombinant HuR. No degradation or loss of the fluorescent tag was noted for the T allele probe in the presence of HuR (Author response image 1). This indicates that the C and T allele probes exhibit comparable stability and are not affected by increasing the concentration of recombinant HuR. The apparent reduction in the unbound T allele probe in Figure 4C may be due to saturation at higher HuR concentrations rather than degradation. Please also note that under limiting HuR concentration (50 µM) the T-bearing oligoribonucleotide binds more of the purified HuR (compare lanes 2 & 6 in Author response image 1).

      Comment 7: Figure 5A does not look like an IP. The authors should show the heavy and light chains and clarify why there is co-precipitation of beta-actin with IgG and HuR. Also, they should include input samples. Figure 5B: given that in a traditional RIP the mRNA is not cross-linked and fragmented, any region of CCL2 mRNA would be amplified, not just the 3'UTR. In other words, Figure 5B can be valuable to show the enrichment of CCL2 mRNA in general, but not the enrichment of a specific region.

      We understand the reviewer’s concern on Figure 5A and 5B. Due to sample limitations we are unable to confirm these results using heavy and light chains antibodies. However, it is important to note that co-precipitation of β-actin with IgG and HuR can be due to its non-specific binding with protein G. In a recent study non-specific precipitation by protein G or A was reported for proteins such as p53, p65 and β-actin (Zeng et al., 2022). We are including a figure provided by MBL Life Sciences as the quality check document for their RIP Assay Kit (RN 1001) that was used in our study. It is evident from Author response image 4 that even pre-clearing the lysate may not remove the ubiquitously expressed proteins such as β-actin or GAPDH and they will persist as contaminants in pull-down samples. Hence the presence of β-actin in the IgG and HuR IP fractions may be due to non-specific interactions with the agarose beads.

      Author response image 4.

      MBL RIP-Assay Kit’s Quality Check. Quality check of immunoprecipitated endogenous PTBP1 expressed in Jurkat cells. Lane 1: Jurkat (WB positive cells), Lane 2: Jurkat + normal Rabbit IgG, Lane 3: Jurkat+ anti-PTBP1.

      We agree with the reviewer’s comments that traditional RIP without cross-linking and fragmentation allows amplification of any region of CCL2 mRNA. However, the upregulation of CCL2 gene expression in α-HuR immunoprecipitated samples indirectly reflects the enrichment of CCL2 mRNA associated with HuR. Moreover, 3’-UTR targeting primers were used for amplification to examine HuR binding at this region. We believe this approach ensures that the above enrichment specifically reflects HuR association with the 3’-UTR rather than other parts of the transcript.

      Comment 8: Construct Validation in Luciferase Assays (Figure 6): The authors need to confirm equal transfection amounts of constructs and show changes in luciferase mRNA levels. It would be better to use a dual luciferase construct for internal normalization.

      We would like to thank the reviewer for the comments related to the luciferase reporter assay. As mentioned in the Methods, equal amounts (0.5 µg) of each construct were used for transfection in our study (Line 658). We chose to normalize the reporter activity using total protein concentration instead of using a dual-reporter system to avoid crosstalk with co-transfected control plasmids. This is now included in the Materials and Methods section (Lines 662-664). The optimized design of the LightSwitch Assay system used in our study allows a single-assay design when a highly efficient transfection system is used (as recommended by the manufacturer). We verified the presence of the correct insert in the CCL2 LightSwitch 3’UTR reporter constructs (Author response image 5). We also sequenced the vector backbone of both constructs to rule out any inadvertently introduced mutations.

      Author response image 5.

      Schematic of the LightSwitch 3’UTR vector. (A) Vector information. The vector contains a multiple cloning site (MCS) downstream of the Renilla luciferase gene (RenSP). The human CCL2 3’UTR is cloned into the MCS downstream of the reporter gene and becomes part of a hybrid transcript in which the luciferase coding sequence is fused to the UTR sequence of CCL2. Constructs containing the rs13900C or rs13900T allele were generated using site-specific mutagenesis on the CCL2 LightSwitch 3’UTR reporter. The constructs were validated by Sanger sequencing. (B&C) Sequence chromatograms of the constructs containing the CCL2 3’UTR insert showing rs13900C and rs13900T, respectively. The result confirms the fidelity of the constructs used in the reporter assay.

      Comment 9: Polysome Data Presentation: The authors should present the distribution of luciferase mRNA (rs13900T and rs13900C) in all fractions separately and include data on the translation of a control like ACTB or GAPDH.

      Since our assessment of CCL2 allele-specific enrichment in the polysome fractions from MDMs of heterozygous donors did not yield a consistent pattern for differential loading (Supplementary Table 3), we used a 3’UTR reporter-based assay that estimated the impact of the rs13900 T and C alleles on overall translational output (translatability). The translatability was calculated as luciferase activity normalized by luciferase mRNA levels after adjusting for protein and 18S rRNA using a previously reported method (Zhang et al., 2017). As the measurement of relative allele enrichment in polysome fractions was not included in our in vitro reporter assays, it is not possible to present the distribution of luciferase mRNA in various fractions separately. Author response image 6 shows the proportion of CCL2 mRNA in the cytosolic, monosome, and polysome fractions obtained from MDM lysates from heterozygous donors, along with 18S rRNA quantification.
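
      The translatability calculation described above can be illustrated with a minimal sketch. It assumes that luciferase activity is divided by total protein, that luciferase mRNA is expressed relative to 18S rRNA as 2^-(Ct_luc - Ct_18S), and that translatability is the ratio of the two. The exact procedure of Zhang et al. (2017) is not reproduced here, and all names and numbers are hypothetical.

```python
def relative_mrna(ct_target: float, ct_reference: float) -> float:
    """Target mRNA level relative to a reference RNA, computed as 2^-(dCt)."""
    return 2.0 ** -(ct_target - ct_reference)


def translatability(luc_activity: float, total_protein_ug: float,
                    ct_luc: float, ct_18s: float) -> float:
    """Protein-normalized luciferase activity per unit of 18S-normalized mRNA."""
    activity_per_protein = luc_activity / total_protein_ug
    return activity_per_protein / relative_mrna(ct_luc, ct_18s)


if __name__ == "__main__":
    # Hypothetical readings for the two reporter constructs.
    t_allele = translatability(12000.0, 10.0, ct_luc=22.0, ct_18s=12.0)
    c_allele = translatability(8000.0, 10.0, ct_luc=22.0, ct_18s=12.0)
    print(t_allele / c_allele)  # ratio of T-allele to C-allele translatability
```

      Dividing by the mRNA level separates a change in protein output per transcript from a change in transcript abundance, which matters here because the two alleles are also proposed to differ in mRNA stability.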

      Author response image 6.

      Determination of rs13900C/T allelic enrichment in polysome fractions and its effect on polysome loading. Polysome profiles were obtained by sucrose gradient centrifugation of macrophages before and after stimulation with LPS (1 µg/mL) for 3 h. (A&B) The CCL2 mRNA shifts from monosome-associated fractions to heavier polysomes following LPS stimulation, indicating increased translation efficiency. (C&D) In contrast, the distribution of 18S shows no significant shift due to LPS treatment. The percentage of mRNA loading on polysomes was calculated using the ΔCT method (mean ± SEM, n=4). (E&F) CCL2 AEI measurement in polysomes of macrophages from heterozygous donors (n=2). Genomic DNA and cDNA were subjected to Sanger sequencing, and the peak heights of both alleles were used to determine the relative abundance of each allele.
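
      One common way to turn Sanger peak heights into an allelic expression imbalance (AEI) value is to divide the cDNA allelic ratio by the genomic DNA allelic ratio from the same heterozygous donor, so that allele-specific differences in sequencing signal cancel out. The sketch below assumes that normalization; the authors' exact computation is not given in the response, and the peak heights shown are invented for illustration.

```python
import math


def allelic_ratio(peak_t: float, peak_c: float) -> float:
    """Ratio of the T-allele peak height to the C-allele peak height."""
    return peak_t / peak_c


def aei(cdna_t: float, cdna_c: float, gdna_t: float, gdna_c: float) -> float:
    """cDNA T/C ratio normalized to the genomic DNA T/C ratio.

    In a heterozygote the genomic ratio is expected to be 1:1, so any
    deviation measured in gDNA is treated as technical bias and divided out.
    """
    return allelic_ratio(cdna_t, cdna_c) / allelic_ratio(gdna_t, gdna_c)


if __name__ == "__main__":
    # Hypothetical chromatogram peak heights (arbitrary units).
    value = aei(cdna_t=1500.0, cdna_c=1000.0, gdna_t=1000.0, gdna_c=1000.0)
    print(value, math.log2(value))
```

      An AEI above 1 (log2 above 0) under this convention would indicate relative enrichment of the T-allele transcript in the fraction being examined.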

      Comment 10: Please explain in detail how primary monocytes were transfected with siRNAs for more than 72 hours. Typically, primary monocytes are very hard to transfect, have a very limited lifespan in culture (around 48 hours), and show a high level of cell death upon transfection. If monocytes were differentiated from macrophages, explain in detail how it was done and provide supporting citations from the literature.

      We agree with the challenges associated with transfecting primary monocytes, including their limited lifespan in culture and susceptibility to cell death following transfection, and apologize for not elaborating on lentiviral transduction of primary macrophages in the Methods section. To overcome these limitations, we utilized monocytes undergoing differentiation into macrophages rather than fully differentiated macrophages for our experiments. Cells were transfected by slightly modifying the method described by Plaisance-Bonstaff et al. (2019). Briefly, monocytes were purified by negative selection from PBMCs obtained from donors homozygous for rs13900C or rs13900T. After purification, cells were resuspended in 24-well plates at a seeding density of 0.5 × 10⁶ cells per well and further cultured in medium supplemented with 50 ng/mL M-CSF (Fig. S6 and Fig. S7). After 24 h, ready-to-use GFP-tagged pCMV6-HuR or CMV-null lentiviral particles (Amsbio, Cambridge, MA) were transduced into 0.5 × 10⁶ cells in the presence of polybrene (60 µg/mL) at an MOI of 1. The cells were stimulated with LPS for 3 h and processed for HuR and CCL2 expression 72 h after transduction. This data is now shown in the new Supplementary Figure S7.

      Comment 11: The authors should prove the binding of HuR to the 3'UTR of CCL2 not only in vitro but also in cells. For this aim, a CLIP including RNA fragmentation followed by RT-PCR or sequencing would be more informative than a RIP. It would be helpful also to demonstrate the different binding to the 3'UTR variants (rs13900C vs. rs13900T).

      We thank the reviewer for the valuable suggestion on validating the binding of HuR to the 3’UTR in cells. It is important to highlight that several independent datasets, including CLIP, have already demonstrated that HuR binds to the 3’UTR of CCL2, including the region spanning the rs13900 locus. We have summarized the relevant studies in tabular form (Supplementary Table 2). We are unable to confirm these results in new experiments due to sample limitations. The existing data and the experimental evidence provided in this manuscript strongly suggest that HuR binds within the 3’UTR. Also, a previously published study (Fan et al., 2011) showed that only the first 125 bp of the CCL2 3’UTR, which flanks rs13900, showed strong binding to HuR, whereas the CCL2 coding region and other regions of the 3’UTR did not. This further suggests that HuR binding to CCL2 is localized to the part of the 3’UTR that flanks rs13900. Please note that the primers used for amplification of the RIP material were 3’UTR-specific.

      Comment 12: To quantify nascent RNA, Figure 2C should be replaced by new experiments. To label nascent RNA, authors can perform a run on/run-off experiments only with EU, without actinomycin D. As aforementioned, ActD blocks the transcription of new RNA, therefore is not useful for studying nascent RNA.

      We thank the reviewer for the suggestion and would like to emphasize that, while measuring the rs13900C/T allelic ratio in nascent RNA, the experimental setup included evaluating the AEI both in the presence and in the absence of the transcriptional inhibitor actinomycin D. The data presented in Figure 2C show that the AEI in the presence of actinomycin D is amplified in comparison to the treatment without actinomycin D. This provides definitive evidence for our hypothesis that rs13900T confers greater stability to the CCL2 message. We apologize for the oversight of not mentioning the treatment without Act D in the Methods. Necessary changes have been made to the revised manuscript (Lines 553-563).

      Comment 13: The authors should also investigate the role of TIA1 as a potential RBP and explore the possibility that TIA1 may interact more with the C allele to suppress translation.

      Based on the existing studies, we highlighted the importance of RNA-binding proteins such as TIA1 and U2AF56 that may interact with the CCL2 transcript (Lines 408-409). However, exploring TIA1 binding and its functional consequences is beyond the scope of the current study. We thank the reviewer for this comment, and this aspect will be pursued in future studies.

      Comment 14: It would be informative if the authors included study limitations and potential clinical implications of these findings, particularly regarding therapeutic approaches targeting CCL2.

      We would like to inform the reviewer that the submitted manuscript included the limitations of our study. They were discussed at appropriate places and were not included as a separate section. For instance, Line 398 emphasizes the need for in-depth studies of the association between rs13900 and the canonical CCL2 transcript. The need for additional studies regarding SNP-induced structural changes in RNA and their implications for RBP accessibility was highlighted at Lines 417-419. We also noted the inconclusive results of differential polysome loading and the need for further research on the impact of rs13900 on CCL2 translatability in primary cells (Lines 457-459). Finally, at Lines 484-485 we noted our further studies exploring the differential binding of HuR to the other regions of the CCL2 3’UTR.

      Multiple studies have indicated functional interference with HuR as a novel therapeutic strategy, particularly in the context of cancer, inflammation, neurodegeneration, and autoimmune disorders. These approaches include inhibitors such as MS-444, KH-3, and CMLD-2 that disrupt the interaction between HuR and ARE elements or mRNAs of target genes involved in disease pathology (Chaudhary et al., 2023; Fattahi et al., 2022; Lang et al., 2017; Liu et al., 2020; Wang et al., 2019; Wei et al., 2024), offering a potential new avenue for disease treatment. Findings from our studies provide unique insights into the regulation of CCL2 expression by both rs13900 and HuR. We strongly believe that the SNP rs13900 and HuR represent new druggable targets for M/M-mediated disorders such as inflammatory diseases, cancer, and cardiovascular diseases. The potential clinical implications have been discussed in the revised manuscript (Lines 487-494).

      References

      Barta, N., Ordog, N., Pantazi, V., Berzsenyi, I., Borsos, B.N., Majoros, H., Pahi, Z.G., Ujfaludi, Z., Pankotai, T., 2023. Identifying Suitable Reference Gene Candidates for Quantification of DNA Damage-Induced Cellular Responses in Human U2OS Cell Culture System. Biomolecules 13.

      Chaudhary, S., Appadurai, M.I., Maurya, S.K., Nallasamy, P., Marimuthu, S., Shah, A., Atri, P., Ramakanth, C.V., Lele, S.M., Seshacharyulu, P., Ponnusamy, M.P., Nasser, M.W., Ganti, A.K., Batra, S.K., Lakshmanan, I., 2023. MUC16 promotes triple-negative breast cancer lung metastasis by modulating RNA-binding protein ELAVL1/HUR. Breast Cancer Res 25, 25.

      Fan, J., Ishmael, F.T., Fang, X., Myers, A., Cheadle, C., Huang, S.K., Atasoy, U., Gorospe, M., Stellato, C., 2011. Chemokine transcripts as targets of the RNA-binding protein HuR in human airway epithelium. J Immunol 186, 2482-2494.

      Fattahi, F., Ellis, J.S., Sylvester, M., Bahleda, K., Hietanen, S., Correa, L., Lugogo, N.L., Atasoy, U., 2022. HuR-Targeted Inhibition Impairs Th2 Proinflammatory Responses in Asthmatic CD4(+) T Cells. J Immunol 208, 38-48.

      Hubal, M.J., Devaney, J.M., Hoffman, E.P., Zambraski, E.J., Gordish-Dressman, H., Kearns, A.K., Larkin, J.S., Adham, K., Patel, R.R., Clarkson, P.M., 2010. CCL2 and CCR2 polymorphisms are associated with markers of exercise-induced skeletal muscle damage. J Appl Physiol (1985) 108, 1651-1658.

      Intemann, C.D., Thye, T., Forster, B., Owusu-Dabo, E., Gyapong, J., Horstmann, R.D., Meyer, C.G., 2011. MCP1 haplotypes associated with protection from pulmonary tuberculosis. BMC Genet 12, 34.

      Jao, C.Y., Salic, A., 2008. Exploring RNA transcription and turnover in vivo by using click chemistry. Proc Natl Acad Sci U S A 105, 15779-15784.

      Johnson, A.D., Zhang, Y., Papp, A.C., Pinsonneault, J.K., Lim, J.E., Saffen, D., Dai, Z., Wang, D., Sadee, W., 2008. Polymorphisms affecting gene transcription and mRNA processing in pharmacogenetic candidate genes: detection through allelic expression imbalance in human target tissues. Pharmacogenet Genomics 18, 781-791.

      Kasztelewicz, B., Czech-Kowalska, J., Lipka, B., Milewska-Bobula, B., Borszewska-Kornacka, M.K., Romanska, J., Dzierzanowska-Fangrat, K., 2017. Cytokine gene polymorphism associations with congenital cytomegalovirus infection and sensorineural hearing loss. Eur J Clin Microbiol Infect Dis 36, 1811-1818.

      Lang, M., Berry, D., Passecker, K., Mesteri, I., Bhuju, S., Ebner, F., Sedlyarov, V., Evstatiev, R., Dammann, K., Loy, A., Kuzyk, O., Kovarik, P., Khare, V., Beibel, M., Roma, G., Meisner-Kober, N., Gasche, C., 2017. HuR Small-Molecule Inhibitor Elicits Differential Effects in Adenomatosis Polyposis and Colorectal Carcinogenesis. Cancer Res 77, 2424-2438.

      Lebedeva, S., Jens, M., Theil, K., Schwanhausser, B., Selbach, M., Landthaler, M., Rajewsky, N., 2011. Transcriptome-wide analysis of regulatory interactions of the RNA-binding protein HuR. Mol Cell 43, 340-352.

      Liu, S., Huang, Z., Tang, A., Wu, X., Aube, J., Xu, L., Xing, C., Huang, Y., 2020. Inhibition of RNA-binding protein HuR reduces glomerulosclerosis in experimental nephritis. Clin Sci (Lond) 134, 1433-1448.

      Mao, F., Xiao, L., Li, X., Liang, J., Teng, H., Cai, W., Sun, Z.S., 2016. RBP-Var: a database of functional variants involved in regulation mediated by RNA-binding proteins. Nucleic Acids Res 44, D154-163.

      Pabis, M., Popowicz, G.M., Stehle, R., Fernandez-Ramos, D., Asami, S., Warner, L., Garcia-Maurino, S.M., Schlundt, A., Martinez-Chantar, M.L., Diaz-Moreno, I., Sattler, M., 2019. HuR biological function involves RRM3-mediated dimerization and RNA binding by all three RRMs. Nucleic Acids Res 47, 1011-1029.

      Paulsen, M.T., Veloso, A., Prasad, J., Bedi, K., Ljungman, E.A., Tsan, Y.C., Chang, C.W., Tarrier, B., Washburn, J.G., Lyons, R., Robinson, D.R., Kumar-Sinha, C., Wilson, T.E., Ljungman, M., 2013. Coordinated regulation of synthesis and stability of RNA during the acute TNF-induced proinflammatory response. Proc Natl Acad Sci U S A 110, 2240-2245.

      Pham, M.H., Bonello, G.B., Castiblanco, J., Le, T., Sigala, J., He, W., Mummidi, S., 2012. The rs1024611 regulatory region polymorphism is associated with CCL2 allelic expression imbalance. PLoS One 7, e49498.

      Plaisance-Bonstaff, K., Faia, C., Wyczechowska, D., Jeansonne, D., Vittori, C., Peruzzi, F., 2019. Isolation, Transfection, and Culture of Primary Human Monocytes. J Vis Exp.

      Ripin, N., Boudet, J., Duszczyk, M.M., Hinniger, A., Faller, M., Krepl, M., Gadi, A., Schneider, R.J., Sponer, J., Meisner-Kober, N.C., Allain, F.H., 2019. Molecular basis for AU-rich element recognition and dimerization by the HuR C-terminal RRM. Proc Natl Acad Sci U S A 116, 2935-2944.

      Russo, J., Heck, A.M., Wilusz, J., Wilusz, C.J., 2017. Metabolic labeling and recovery of nascent RNA to accurately quantify mRNA stability. Methods 120, 39-48.

      Wang, J., Hjelmeland, A.B., Nabors, L.B., King, P.H., 2019. Anti-cancer effects of the HuR inhibitor, MS-444, in malignant glioma cells. Cancer Biol Ther 20, 979-988.

      Wei, L., Kim, S.H., Armaly, A.M., Aube, J., Xu, L., Wu, X., 2024. RNA-binding protein HuR inhibition induces multiple programmed cell death in breast and prostate cancer. Cell Commun Signal 22, 580.

      Zeng, X., Zeng, W.H., Zhou, J., Liu, X.M., Huang, G., Zhu, H., Xiao, S., Zeng, Y., Cao, D., 2022. Removal of nonspecific binding proteins is required in co-immunoprecipitation with nuclear proteins. Biotechniques 73, 289-296.

      Zhang, X., Chen, X., Liu, Q., Zhang, S., Hu, W., 2017. Translation repression via modulation of the cytoplasmic poly(A)-binding protein in the inflammatory response. Elife 6.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      (1) The primary weakness of the paper concerns its conclusion of having generated "homogenous mature microglia", partly based on the RNAseq analysis. However, the comparison of gene profiles was carried out only between "hiPSC-derived mature microglia" and the proliferating myeloid progenitors. While the transcriptome profiles revealed a trend of enrichment of microglia-like gene expression in "hiPSC-derived mature microglia" compared to proliferating myeloid progenitors, this is not sufficient to claim they are "mature microglia". It is important that one carries out a comparative analysis of the RNAseq data with those of primary human microglia, which may be done by leveraging the public database. To convincingly claim these cells are mature microglia, questions need to be addressed including how similar the molecular signatures of these cells are compared with the fully differentiated primary microglia cell or if they remain progenitor-like or take on mosaic properties, and how they distinguish from macrophages.

      We greatly appreciate the insightful comments and suggestions from the reviewers, which were instrumental in enhancing our data analysis and organization. In response to the feedback, we have updated the terminology from “mature microglia” to simply “microglia” while clarifying in our text that these are fully differentiated microglia under single-type cell culture conditions.

      Guided by the reviewer's advice, we incorporated RNA-seq data from human brain microglia studies conducted by Dr. Poon and Dr. Blurton-Jones' Lab (Abud et al., Neuron, 2017) and Dr. Huitinga's Lab (van der Poel et al., Nat Commun, 2019). We then conducted a comparative analysis of the gene expression profiles between our fully differentiated hiPSC-derived microglia and those from fetal/adult brain microglia (see Fig.2. Suppl. B, C and D; Suppl. table 1 and table 2). The correlation analysis revealed that our hiPSC-derived microglia closely resemble fetal and adult brain microglia, distinguishing them significantly from monocytes and inflammatory monocytes.

      (2) While the authors attempted to demonstrate the functional property of "hiPSC-derived mature microglia" in culture, they used LPS challenge, which is an inappropriate assay. This is because human microglia respond poorly to LPS alone but need to be activated by a combination of LPS with other factors, such as IFNγ. Their data that "hiPSC-derived mature microglia" showed robust responses to LPS indeed implicates that these cells do not behave like mature human microglia.

      We appreciate the feedback received. In response, we cultured hiPSC-derived microglia cells and subjected them to treatments with IFNγ, LPS, and a combination of both IFNγ+LPS, as illustrated in Figure 3 suppl. Our findings revealed that the IFNγ+LPS combination notably enhanced the expression of IL1a, IL1b, TNFa, CCL8, and CXCL10, whereas IL6 and CCL2 levels remained unchanged. Treatment with IFNγ alone significantly elevated the expression of TNFa, CCL8, CXCL10, and CCL2. These outcomes align with the findings reported by Rustenhoven et al. (Sci Rep, 2016), suggesting that the functionality of our hiPSC-derived microglia cells closely mirrors that of primary human adult microglia cells.

      (3) The resolution of Figs. 4 - 6 is so low that even some of the text and labels are hardly readable. Based on the morphology shown in Fig. 4 and the statement in line 147, these hiPSC-derived "cells altered their morphology to a rounded shape within an hour of incubation and rapidly internalized the fluorescent-labeled particles". This is a peculiar response. Usually, microglia do not respond to fluorescent-labeled zymosan by turning into a rounded shape within an hour when they internalize them. Such a behavior usually implicates weak phagocytotic capacity.

      Thank you for your insightful comments. During submission, the main text's PDF version was converted online, resulting in low-quality output. We have since updated this with a high-resolution version. The observed alterations in cell morphology following zymosan phagocytosis may be attributed to the high zymosan concentration used (2 mg/ml). We conducted an assessment to understand the impact of zymosan concentration on the morphology of hiPSC-derived microglial cells, as shown in Figure 4 suppl B. Our findings indicate that microglia cells adopt an amoeboid, rounded shape at zymosan concentrations exceeding 20 µg/ml. To clarify this point, we have amended the text to read: "The cells altered their morphology and rapidly internalized the fluorescent-labeled particles."

      (4) Data presented in Fig. 5 are not very convincing to support that transplanted cells were immunopositive for "human CD11b (Fig.5C), as well as microglia signature markers P2ry12 and TMEM119 (Fig.5D)" (line 167). The resolution and magnification of Fig. 5D is too low to tell the colocalization of tdT and human microglial marker immunolabeling. In the flat-mount images (C, I), hCD11b immunolabeling is not visible in the GCL or barely visible in the IPL. This should be discussed.

      We are grateful for the reviewer's comments. As previously mentioned, the low quality of the images was due to the online conversion of the PDF version. We have now submitted both high-quality PDF and Word versions for the reviewer's assessment. In these high-quality versions, the colocalization of tdT with human P2ry12 and TMEM119 is distinctly visible. Additionally, we have updated the hTMEM119 staining images in Figure 5D. The results from hCD11b staining align with those observed in mouse CD11b staining, notably showing more effective staining in the outer plexiform layer (OPL) microglia cells. The reason for this—whether it pertains to a staining issue, a variance in CD11b expression among microglia cells in the OPL and ganglion layer (GL), or differences in the samples due to varying conditions—is not yet clear and warrants further investigation.

      (5) Microglia respond to injury by becoming active and lose their expression of the resting state microglial marker, such as P2ry12, which is used in Fig. 6 for detection of migrated microglia. To confirm that these cells indeed respond to injury like native microglia, one should check for activated microglial markers and induction of pro-inflammatory cytokines in the sodium iodate-injury model.

      The reviewer's insights are spot-on. We utilized preserved retinas to extract mRNA, which was then reverse-transcribed to cDNA for conducting qRT-PCR using human-specific primers, as detailed in the updated Table 5. The findings revealed that following retinal pigment epithelium (RPE) injury for 3 days, the transplanted hiPSC-derived microglial cells exhibited an increase in the production of inflammatory cytokines and upregulated genes related to phagocytosis, migration, and adhesion. Conversely, there was a decrease in the expression of microglia-specific signature genes and neurotrophic factors, as demonstrated in Figure 7 suppl.

      Reviewer #1 (Recommendations For The Authors):

      Line 52: "Microglia cell repopulation research suggests that: 1) if no injury or infection occurs, retinal microglia cells can sustain their homeostasis indefinitely" - this statement is too strong or delivers a confusing message; it needs clarification or to be backed up by evidence. Recent single cell RNA sequencing analyses suggest that even under a normal condition, residential microglia do not present as a single homeostatic cell cluster; rather, a subpopulation of activated inflammatory microglia is constantly detectable in the normal retina. This is likely because normal retinal neurons can be stressed for various reasons, such as the temporal accumulation of misfolded proteins, exposure to strong light, or ageing.

      We appreciate the comments. We changed the sentence to read, "Microglia cell repopulation research suggests that: 1) retinal resident microglia cells can sustain their population with the local dividing and migration if any perturbations do not exceed the threshold of the recovery speed by local neighbor microglia cells."

      Line 83: "we applied an appropriate protocol for culturing human iPSC-derived microglia cells" - it would be more appropriate if the word "appropriate" can be replaced by either "unique" or a phrase like "we adopted a (previously published) protocol...".

      Thanks! We changed it to “We modified a previously published protocol to culture human iPSC-derived microglia cells.”

      Fig. 1F,G: A method of flow cytometry will provide more comprehensive cell quantification for percentages of positively labeled cells than cell counts under high magnification confocal images.

      Thanks for the comments! We agree with the reviewer. Given the experimental resources available, the quantification of confocal images provided a reasonable assessment. We will perform flow cytometry analysis in future experiments.

      Reviewer #2 (Public review):

      Weaknesses:

      Gene expression analysis of mature microglia cells should be better interpreted and it would be beneficial to compare the iPSC-derived microglia gene set to a human microglial cell line (for example, HMC3) instead of myeloid progenitor cells.

      The way that the manuscript has been written, unfortunately, is not optimal. I recommend that the entire manuscript be edited and proofread in English. The text contains spelling and grammar mistakes, and the manuscript is inconsistent in several parts. The manuscript should also be revised for a scientific paper format.

      We appreciate the reviewer's comments and have taken them into consideration along with similar inquiries from Reviewer 1. Following the suggestions, we conducted a comparison of gene expression profiles between our hiPSC-derived microglia and those from fetal/adult brain microglia, as depicted in the updated Fig.2. Suppl. B, C and D; as well as in the Suppl. table 1 and table 2. The correlation analysis demonstrated that the hiPSC-derived microglia cells closely resemble fetal and adult brain microglia, significantly differing from monocytes and inflammatory monocytes. Additionally, we have revised the manuscript to adhere more closely to the conventional scientific format.

      Reviewer #2 (Recommendations For The Authors):

      Specific suggestions for improvement:

      - Regarding the characterization of human iPSC-derived microglia, P2RY12 is a general hematopoietic cell marker. One cannot judge the maturity of microglia only by P2RY12 expression (for example, line 261). The expression of more specific markers such as TMEM119 and PROS1 should be studied and discussed.

      We are thankful for the reviewer's valuable feedback. In response:

      We have removed the term "mature" and clarified that the hiPSC-derived microglia we studied are fully differentiated within single-type cell culture conditions.

      We performed a comparative analysis of the gene expression profiles between our hiPSC-derived microglia and microglia from human brains, as illustrated in the updated Fig.2. Suppl. B, C and D. The results affirm that hiPSC-derived microglia closely resemble human fetal and adult microglia.

      We noted that the expression of TMEM119 in hiPSC-derived microglia under in vitro single-type cell culture conditions is notably low, as shown in Author response image 1A. This suggests that the stimulatory factors in our single-type cell culture might not sufficiently induce TMEM119 expression in microglia. The necessity for a retinal environment or interaction with neuronal and/or other glial cells for TMEM119 expression mirrors the behavior of infiltrating peripheral monocytes in pathological conditions, which initially lack TMEM119 but later differentiate into microglial-like macrophages that express TMEM119, as reported by Ma et al. in Sci Rep (2017).

      Additionally, our findings suggest that PROS1 is not uniquely characteristic of microglia but is expressed across a variety of cell types. Within our specific culture conditions, we noted a higher expression of PROS1 in microglial progenitor cells, as shown in Author response image 1B and C.

      Author response image 1.

      - In Figure 2, Part E, the names of the genes or pathways in the figure are not clear, and are these genes the set that are the most differentially expressed between iPSCs-derived microglia and MPC? The analysis needs more explanation.

      We regret any confusion caused by our previous explanation. To clarify, we compiled a list of microglia-enriched genes from the research conducted by the Barres Lab (Bennett et al., Proc Natl Acad Sci U S A, 2016) and from our own RNA sequencing data of mouse retinal microglia, identifying a total of 130 genes predominantly expressed in microglia (Suppl. Table 3). We then applied this gene list to our hiPSC-derived microglia RNA sequencing data, which identified 71 microglia-specific genes. These 71 genes were subjected to Ingenuity Pathway Analysis (IPA) to visualize the signaling pathways involved. The details of these microglia genes can be found in the updated Suppl. Table 3.
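
      As an illustration only, applying a reference marker list to an expression table can be sketched as below. The detection threshold (`min_fpkm`), the function name, the short gene list, and all expression values are hypothetical; the response does not state the criterion that reduced the 130 reference genes to 71.

```python
def expressed_markers(reference_genes, expression, min_fpkm=1.0):
    """Return the reference genes detected at or above min_fpkm, sorted by name.

    reference_genes: iterable of gene symbols (the marker list).
    expression: dict mapping gene symbol -> expression value (e.g. FPKM).
    min_fpkm: hypothetical detection threshold, not taken from the manuscript.
    """
    return sorted(
        gene for gene in set(reference_genes)
        if expression.get(gene, 0.0) >= min_fpkm
    )


# Toy marker list and made-up expression values, for illustration only.
reference = ["P2RY12", "TMEM119", "CX3CR1", "AIF1"]
expression = {"P2RY12": 35.2, "TMEM119": 0.4, "CX3CR1": 12.0, "GAPDH": 900.0}

print(expressed_markers(reference, expression))  # ['CX3CR1', 'P2RY12']
```

      Genes missing from the expression table are treated as undetected, so only markers that are both on the reference list and above the threshold are kept.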

      - Lines 124 to 128 mention that high expression of Stat3, IL1b, and IL6 and their central role in pathway analysis emphasize the efficiency of the maturation protocol. Regarding the fact that Stat3, IL1b, and IL6 are contributors to proinflammatory pathways, it is not convincing that the high expression of these genes in iPSC-derived microglia demonstrates the efficiency of the maturation protocol, given that microglia are not stimulated.

      Thanks for the comments! We added sentences describing the comparison between hiPSC-derived microglia and human brain microglia. We have also replaced “mature” with “functional.” The sentence reads, “Thus, our method of obtaining differentiated microglia is a reliable method to generate a large number of homogenous functional microglia cells.”

      - Statistical analysis is missing for some graphs, for example, figures 1-3 and 5.

      We appreciate the comments. We have added the statistical results in the revised version.

      - The legend for Figure 3 needs to be rewritten. The graphs or applied assays should be explained in the legend, not the interpretation of the data.

      The legend was rewritten.

      - There is no Figure 3 in the supplement figures file.

      We added Figure 3. Suppl.

      - hTMEM119 staining in Figure 5, Part D, is mostly background. Please provide another image.

      The images were unclear after online conversion because of the low number of pixels. We replaced them with new hTMEM119 staining images in Figure 5D.

      - In line 176, figure 5I has been forgotten to be mentioned.

      Thank you very much! We added 5I.

      - Lines 241 to 244 state that more than 50% of the AMD-associated genes are highly expressed in retinal microglia according to Fig. discussion suppl A & B. It is not clear that the gene set that was used for analysis is from a healthy retinal microglia or AMD-related ones. Please explain precisely.

      Thank you for your feedback. The gene list we referenced originates from a Genome-Wide Association Study (GWAS) that compared patients with Age-related Macular Degeneration (AMD) to healthy cohorts. We did not directly utilize this list in our experiments but referred to it to underscore the importance of microglia cells in the context of AMD.

      Some of the English proofreading and manuscript format comments:

      Line 805: Iba1 is written in lowercase. Is it human IBA1? It is not consistent with the way it is written in the text (in line 117, for example).

      Thank you for pointing out the error. We changed all instances to “Iba1”. The Iba1 antibody used throughout is from Wako (#019–19741), which labels both mouse and human microglial cells.

      Line 814: microglia-enriched gene expression instead of microglia-enrich gene expression

      Thank you! We changed it.

      Line 345: Starting a sentence with lower case letter.

      Thank you! We changed it.

      Line 342: Myeloid lineage instead of myeloid cell linage.

      Thank you! We changed it.

      Line 815: What does FPKM stand for? The abbreviations should be explained.

      FPKM stands for Fragments Per Kilobase of transcript per Million mapped reads. We added the definition to the text.
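
      For readers unfamiliar with the unit, the standard FPKM normalization can be sketched as follows. This is a minimal illustration of the general definition, not the authors' pipeline; the function name and the example numbers are hypothetical.

```python
def fpkm(fragments: int, transcript_length_bp: int, total_mapped_fragments: int) -> float:
    """Fragments Per Kilobase of transcript per Million mapped fragments.

    fragments: fragments assigned to the transcript.
    transcript_length_bp: transcript length in base pairs.
    total_mapped_fragments: all mapped fragments in the library.
    """
    if transcript_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("transcript length and library size must be positive")
    kilobases = transcript_length_bp / 1_000          # normalize for transcript length
    millions = total_mapped_fragments / 1_000_000     # normalize for sequencing depth
    return fragments / (kilobases * millions)


# Hypothetical numbers: 500 fragments on a 2,000 bp transcript
# in a library of 20 million mapped fragments.
print(fpkm(500, 2_000, 20_000_000))  # 12.5
```

      Dividing by both transcript length and library size makes values comparable across genes of different lengths and across samples sequenced to different depths.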

      Line 309: The manuscript has occasionally referred to PLX-5622 without a minus. Please follow a uniform format.

      We changed all “PLX5622” to “PLX-5622”.

      Lines 327-331: should be rewritten.

      The mentioned paragraph was rewritten.

      Lines 335-340: should be rewritten.

      The mentioned sentence was rewritten.

      Line 135: qRT-PCR instead of QPCR, as it is also mentioned in the methods and material. The correction also applies to all the QPCRs in the text.

      We changed “QPCR” to “qRT-PCR” throughout.

      Figure 3: Graph B should be right side of graph A

      Images description: It is better to have the images description in the left side of the image, for example, figure 5 part B, GL, IPL and OPL

      Thanks for the suggestion. We changed the image organization as per the reviewer’s advice.

      Lines 258 to 260 in the discussion have also been repeated with the same words in the introduction.

      The mentioned paragraph was rewritten.

      Lines 327-331 should be rewritten.

      The mentioned paragraph was rewritten.

      Lines 335-340 should be rewritten.

      The mentioned paragraph was rewritten.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Comments

      Reviewer 1

      (1) Despite the well-established role of Netrin-1 and UNC5C axon guidance during embryonic commissural axons, it remains unclear which cell type(s) express Netrin-1 or UNC5C in the dopaminergic axons and their targets. For instance, the data in Figure 1F-G and Figure 2 are quite confusing. Does Netrin-1 or UNC5C express in all cell types or only dopamine-positive neurons in these two mouse models? It will also be important to provide quantitative assessments of UNC5C expression in dopaminergic axons at different ages.

      Netrin-1 is a secreted protein, and in this manuscript we did not examine which cell types express Netrin-1. This question is not the focus of the study and we consider it irrelevant to the main issue we are addressing, which is where Netrin-1+ cells are present in the forebrain regions we examined. As per the reviewer’s request, we include below images showing Netrin-1 protein and Netrin-1 mRNA expression in the forebrain. In Author response image 1 below, we show a high magnification immunofluorescent image of a coronal forebrain section showing Netrin-1 protein expression.

      Author response image 1.

      This confocal microscope image shows immunofluorescent staining for Netrin-1 (green) localized around cell nuclei (stained by DAPI in blue). This image was taken from a coronal section of the lateral septum of an adult male mouse. Scale bar = 20 µm.

      In Author response images 2 and 3 below we show low and high magnification images from an RNAscope experiment confirming that cells in the forebrain regions examined express Netrin-1 mRNA.

      Author response image 2.

      This confocal microscope image of a coronal brain section of the medial prefrontal cortex of an adult male mouse shows Netrin-1 mRNA expression (green) and cell nuclei (DAPI, blue). Brain regions are as follows: Cg1: Anterior cingulate cortex 1, DP: dorsopeduncular cortex, fmi: forceps minor of the corpus callosum, IL: Infralimbic Cortex, PrL: Prelimbic Cortex

      Author response image 3.

      A higher resolution image from the same sample as in Author response image 2 shows Netrin-1 mRNA (green) and cell nuclei (DAPI; blue). DP = dorsopeduncular cortex

      Regarding UNC5c, this receptor homologue is expressed by dopamine neurons in the rodent ventral tegmental area (Daubaras et al., 2014; Manitt et al., 2010; Phillips et al., 2022). This does not preclude UNC5c expression in other cell types. UNC5c receptors are ubiquitously expressed in the brain throughout development, performing many different developmental functions (Kim and Ackerman, 2011; Murcia-Belmonte et al., 2019; Srivatsa et al., 2014). In this study we are interested in UNC5c expression by dopamine neurons, and particularly by their axons projecting to the nucleus accumbens. We therefore used immunofluorescent staining in the nucleus accumbens, showing UNC5 expression in TH+ axons. This work adds to the study by Manitt et al., 2010, which examined UNC5 expression in the VTA. Manitt et al. used Western blotting to demonstrate that UNC5 expression in VTA dopamine neurons increases during adolescence, as can be seen in the following figure:

      References:

      Daubaras M, Bo GD, Flores C. 2014. Target-dependent expression of the netrin-1 receptor, UNC5C, in projection neurons of the ventral tegmental area. Neuroscience 260:36–46. doi:10.1016/j.neuroscience.2013.12.007

      Kim D, Ackerman SL. 2011. The UNC5C Netrin Receptor Regulates Dorsal Guidance of Mouse Hindbrain Axons. J Neurosci 31:2167–2179. doi:10.1523/jneurosci.5254-10.2011

      Manitt C, Labelle-Dumais C, Eng C, Grant A, Mimee A, Stroh T, Flores C. 2010. Peri-Pubertal Emergence of UNC-5 Homologue Expression by Dopamine Neurons in Rodents. PLoS ONE 5:e11463-14. doi:10.1371/journal.pone.0011463

      Murcia-Belmonte V, Coca Y, Vegar C, Negueruela S, Romero C de J, Valiño AJ, Sala S, DaSilva R, Kania A, Borrell V, Martinez LM, Erskine L, Herrera E. 2019. A Retino-retinal Projection Guided by Unc5c Emerged in Species with Retinal Waves. Current Biology 29:1149-1160.e4. doi:10.1016/j.cub.2019.02.052

      Phillips RA, Tuscher JJ, Black SL, Andraka E, Fitzgerald ND, Ianov L, Day JJ. 2022. An atlas of transcriptionally defined cell populations in the rat ventral tegmental area. Cell Reports 39:110616. doi:10.1016/j.celrep.2022.110616

      Srivatsa S, Parthasarathy S, Britanova O, Bormuth I, Donahoo A-L, Ackerman SL, Richards LJ, Tarabykin V. 2014. Unc5C and DCC act downstream of Ctip2 and Satb2 and contribute to corpus callosum formation. Nat Commun 5:3708. doi:10.1038/ncomms4708

      (2) Figure 1 used shRNA to knockdown Netrin-1 in the Septum and these mice were subjected to behavioral testing. These results, again, are not supported by any valid data that the knockdown approach actually worked in dopaminergic axons. It is also unclear whether knocking down Netrin-1 in the septum will re-route dopaminergic axons or lead to cell death in the dopaminergic neurons in the substantia nigra pars compacta?

      First, we want to clarify and emphasize that our knockdown approach was not designed to knock down Netrin-1 in dopamine neurons or their axons. Our goal was to knock down Netrin-1 expression in cells expressing this guidance cue gene in the dorsal peduncular cortex.

      We have previously established the efficacy of the shRNA Netrin-1 knockdown virus used in this experiment for reducing the expression of Netrin-1 (Cuesta et al., 2020). The shRNA reduces Netrin-1 levels in vitro and in vivo.

      We agree that our experiments do not address the fate of the dopamine axons that are misrouted away from the medial prefrontal cortex. This research is ongoing, and we have now added a note regarding this to our manuscript.

      Our current hypothesis, based on experiments being conducted as part of another line of research in the lab, is that these axons are rerouted to a different brain region which they then ectopically innervate. In these experiments we are finding that male mice exposed to tetrahydrocannabinol in adolescence show reduced dopamine innervation in the medial prefrontal cortex in adulthood but increased dopamine input in the orbitofrontal cortex. In addition, these mice show increased action impulsivity in the Go/No-Go task in adulthood (Capolicchio et al., Society for Neuroscience 2023 Abstracts).

      References:

      Capolicchio T., Hernandez, G., Dube, E., Estrada, K., Giroux, M., Flores, C. (2023) Divergent outcomes of delta 9 - tetrahydrocannabinol in adolescence on dopamine and cognitive development in male and female mice. Society for Neuroscience, Washington, DC, United States [abstract].

      Cuesta S, Nouel D, Reynolds LM, Morgunova A, Torres-Berrío A, White A, Hernandez G, Cooper HM, Flores C. 2020. Dopamine Axon Targeting in the Nucleus Accumbens in Adolescence Requires Netrin-1. Frontiers Cell Dev Biology 8:487. doi:10.3389/fcell.2020.00487

      (3) Another issue with Figure1J. It is unclear whether the viruses were injected into a WT mouse model or into a Cre-mouse model driven by a promoter specifically expresses in dorsal peduncular cortex? The authors should provide evidence that Netrin-1 mRNA and proteins are indeed significantly reduced. The authors should address the anatomic results of the area of virus diffusion to confirm the virus specifically infected the cells in dorsal peduncular cortex.

      All the virus knockdown experiments were conducted in wild-type mice; we added this information to Figure 1K.

      The efficacy of the shRNA in knocking down Netrin-1 was demonstrated by Cuesta et al. (2020) both in vitro and in vivo, as we show in our response to the reviewer’s previous comment above.

      We also now provide anatomical images demonstrating the localization of the injection and area of virus diffusion in the mouse forebrain. In Author response image 4 below the area of virus diffusion is visible as green fluorescent signal.

      Author response image 4.

      Fluorescent microscopy image of a mouse forebrain demonstrating the localization of the injection of a virus to knock down Netrin-1. The location of the virus is in green, while cell nuclei are in blue (DAPI). Abbreviations: DP: dorsopeduncular cortex, IL: infralimbic cortex.

      References:

      Cuesta S, Nouel D, Reynolds LM, Morgunova A, Torres-Berrío A, White A, Hernandez G, Cooper HM, Flores C. 2020. Dopamine Axon Targeting in the Nucleus Accumbens in Adolescence Requires Netrin-1. Frontiers Cell Dev Biology 8:487. doi:10.3389/fcell.2020.00487

      (4) The authors need to provide information regarding the efficiency and duration of knocking down. For instance, in Figure 1K, the mice were tested after 53 days post injection, can the virus activity in the brain last for such a long time?

      In our study we are interested in the role of Netrin-1 expression in the guidance of dopamine axons from the nucleus accumbens to the medial prefrontal cortex. The critical window for these axons leaving the nucleus accumbens and growing to the cortex is early adolescence (Reynolds et al., 2018b). This is why we injected the virus at the onset of adolescence, at postnatal day 21. As dopamine axons grow from the nucleus accumbens to the prefrontal cortex, they pass through the dorsal peduncular cortex. We disrupted Netrin-1 expression at this point along their route to determine whether it is the Netrin-1 present along their route that guides these axons to the prefrontal cortex. We hypothesized that the shRNA Netrin-1 virus would disrupt the growth of the dopamine axons, reducing the number of axons that reach the prefrontal cortex and therefore the number of axons that innervate this region in adulthood.

      We conducted our behavioural tests during adulthood, after the critical window during which dopamine axon growth occurs, so as to observe the enduring behavioural consequences of this misrouting. This experimental approach is designed for the shRNA Netrin-1 virus to be expressed in cells in the dorsopeduncular cortex when the dopamine axons are growing, during adolescence.

      References:

      Capolicchio T., Hernandez, G., Dube, E., Estrada, K., Giroux, M., Flores, C. (2023) Divergent outcomes of delta 9 - tetrahydrocannabinol in adolescence on dopamine and cognitive development in male and female mice. Society for Neuroscience, Washington, DC, United States [abstract].

      Reynolds LM, Yetnikoff L, Pokinko M, Wodzinski M, Epelbaum JG, Lambert LC, Cossette M-P, Arvanitogiannis A, Flores C. 2018b. Early Adolescence is a Critical Period for the Maturation of Inhibitory Behavior. Cerebral cortex 29:3676–3686. doi:10.1093/cercor/bhy247

      (5) In Figure 1N-Q, silencing Netrin-1 results in less DA axons targeting to infralimbic cortex, but why the Netrin-1 knocking down mice revealed the improved behavior?

      This is indeed an intriguing finding, and we have now added a mention of it to our manuscript. We have demonstrated that misrouting dopamine axons away from the medial prefrontal cortex during adolescence alters behaviour, but why this improves performance on measures of action impulsivity is currently unknown to us. One potential answer is that the dopamine axons are misrouted to a different brain region that is also involved in controlling impulsive behaviour, perhaps the dorsal striatum (Kim and Im, 2019) or the orbital prefrontal cortex (Jonker et al., 2015).

      We would also like to note that we are finding that other manipulations that appear to reroute dopamine axons to unintended targets can lead to reduced action impulsivity as measured using the Go/No-Go task. As we mentioned above, current experiments in the lab, which are part of a different line of research, are showing that male mice exposed to tetrahydrocannabinol in adolescence show reduced dopamine innervation in the medial prefrontal cortex in adulthood, but increased dopamine input in the orbitofrontal cortex. In addition, these mice show increased action impulsivity in the Go/No-Go task in adulthood (Capolicchio et al., Society for Neuroscience 2023 Abstracts).

      References

      Capolicchio T., Hernandez, G., Dube, E., Estrada, K., Giroux, M., Flores, C. (2023) Divergent outcomes of delta 9 - tetrahydrocannabinol in adolescence on dopamine and cognitive development in male and female mice. Society for Neuroscience, Washington, DC, United States [abstract].

      Jonker FA, Jonker C, Scheltens P, Scherder EJA. 2015. The role of the orbitofrontal cortex in cognition and behavior. Rev Neurosci 26:1–11. doi:10.1515/revneuro-2014-0043

      Kim B, Im H. 2019. The role of the dorsal striatum in choice impulsivity. Ann N York Acad Sci 1451:92–111. doi:10.1111/nyas.13961

      (6) What is the effect of knocking down UNC5C on dopamine axons guidance to the cortex?

      We have found that mice that are heterozygous for a nonsense Unc5c mutation, and as a result have reduced levels of UNC5c protein, show reduced amphetamine-induced locomotion and stereotypy (Auger et al., 2013). In the same manuscript we show that this effect only emerges during adolescence, in concert with the growth of dopamine axons to the prefrontal cortex. This is indirect but strong evidence that UNC5c receptors are necessary for correct adolescent dopamine axon development.

      References

      Auger ML, Schmidt ERE, Manitt C, Dal-Bo G, Pasterkamp RJ, Flores C. 2013. unc5c haploinsufficient phenotype: striking similarities with the dcc haploinsufficiency model. European Journal of Neuroscience 38:2853–2863. doi:10.1111/ejn.12270

      (7) In Figures 2-4, the authors only showed the amount of DA axons and UNC5C in NAcc. However, it remains unclear whether these experiments also impact the projections of dopaminergic axons to other brain regions, critical for the behavioral phenotypes. What about other brain regions such as prefrontal cortex? Do the projection of DA axons and UNC5c level in cortex have similar pattern to those in NAcc?

      UNC5c receptors are expressed throughout development and are involved in many developmental processes (Kim and Ackerman, 2011; Murcia-Belmonte et al., 2019; Srivatsa et al., 2014). We cannot say whether the pattern we observe here is unique to the nucleus accumbens, but it is certainly not universal throughout the brain.

      The brain region we focus on in our manuscript, in addition to the nucleus accumbens, is the medial prefrontal cortex. Close and thorough examination of the prefrontal cortices of adult mice revealed practically no UNC5c expression by dopamine axons. However, we did observe very rare cases of dopamine axons expressing UNC5c. It is not clear whether these rare cases are present before or during adolescence.

      Below is a representative set of images of this observation, which is now also included as Supplementary Figure 4:

      Author response image 5.

      Expression of UNC5c protein in the medial prefrontal cortex of an adult male mouse. Low (A) and high (B) magnification images demonstrate that there is little UNC5c expression in dopamine axons in the medial prefrontal cortex. Here we identify dopamine axons by immunofluorescent staining for tyrosine hydroxylase (TH, see our response to comment #9 regarding the specificity of the TH antibody for dopamine axons in the prefrontal cortex). This figure is also included as Supplementary Figure 4 in the manuscript. Abbreviations: fmi: forceps minor of the corpus callosum, mPFC: medial prefrontal cortex.

      References:

      Kim D, Ackerman SL. 2011. The UNC5C Netrin Receptor Regulates Dorsal Guidance of Mouse Hindbrain Axons. J Neurosci 31:2167–2179. doi:10.1523/jneurosci.5254-10.2011

      Murcia-Belmonte V, Coca Y, Vegar C, Negueruela S, Romero C de J, Valiño AJ, Sala S, DaSilva R, Kania A, Borrell V, Martinez LM, Erskine L, Herrera E. 2019. A Retino-retinal Projection Guided by Unc5c Emerged in Species with Retinal Waves. Current Biology 29:1149-1160.e4. doi:10.1016/j.cub.2019.02.052

      Srivatsa S, Parthasarathy S, Britanova O, Bormuth I, Donahoo A-L, Ackerman SL, Richards LJ, Tarabykin V. 2014. Unc5C and DCC act downstream of Ctip2 and Satb2 and contribute to corpus callosum formation. Nat Commun 5:3708. doi:10.1038/ncomms4708

      (8) Can overexpression of UNC5c or Netrin-1 in male winter hamsters mimic the observations in summer hamsters? Or overexpression of UNC5c in female summer hamsters to mimic the winter hamster? This would be helpful to confirm the causal role of UNC5C in guiding DA axons during adolescence.

      This is an excellent question. We are very interested in both increasing and decreasing UNC5c expression in hamster dopamine axons to see if we can directly manipulate summer hamsters into winter hamsters and vice versa. We are currently exploring virus-based approaches to design these experiments and are excited for results in this area.

      (9) The entire study relied on using tyrosine hydroxylase (TH) as a marker for dopaminergic axons. However, the expression of TH (either by IHC or IF) can be influenced by other environmental factors, that could alter the expression of TH at the cellular level.

      This is an excellent point that we now carefully address in our methods by adding the following:

      In this study we pay great attention to the morphology and localization of the fibres from which we quantify varicosities to avoid counting any fibres stained with TH antibodies that are not dopamine fibres. The fibres that we examine and that are labelled by the TH antibody show features indistinguishable from the classic features of cortical dopamine axons in rodents (Berger et al., 1974; 1983; Van Eden et al., 1987; Manitt et al., 2011), namely they are thin fibres with irregularly-spaced varicosities, are densely packed in the nucleus accumbens, sparsely present only in the deep layers of the prefrontal cortex, and are not regularly oriented in relation to the pial surface. This is in contrast to rodent norepinephrine fibres, which are smooth or beaded in appearance, relatively thick with regularly spaced varicosities, increase in density towards the shallow cortical layers, and are in large part oriented either parallel or perpendicular to the pial surface (Berger et al., 1974; Levitt and Moore, 1979; Berger et al., 1983; Miner et al., 2003). Furthermore, previous studies in rodents have noted that only norepinephrine cell bodies are detectable using immunofluorescence for TH, not norepinephrine processes (Pickel et al., 1975; Verney et al., 1982; Miner et al., 2003), and we did not observe any norepinephrine-like fibres.

      Furthermore, we are not aware of any other processes in the forebrain that are known to be immunopositive for TH under any environmental conditions.

      To reduce confusion, we have replaced the abbreviation for dopamine – DA – with TH in the relevant panels in Figures 1, 2, 3, and 4 to clarify exactly what is represented in these images. As can be seen in these images, fluorescent green labelling is present only in axons, which is to be expected of dopamine labelling in these forebrain regions.

      References:

      Berger B, Tassin JP, Blanc G, Moyne MA, Thierry AM (1974) Histochemical confirmation for dopaminergic innervation of the rat cerebral cortex after destruction of the noradrenergic ascending pathways. Brain Res 81:332–337.

      Berger B, Verney C, Gay M, Vigny A (1983) Immunocytochemical Characterization of the Dopaminergic and Noradrenergic Innervation of the Rat Neocortex During Early Ontogeny. In: Proceedings of the 9th Meeting of the International Neurobiology Society, pp 263–267 Progress in Brain Research. Elsevier.

      Levitt P, Moore RY (1979) Development of the noradrenergic innervation of neocortex. Brain Res 162:243–259.

      Manitt C, Mimee A, Eng C, Pokinko M, Stroh T, Cooper HM, Kolb B, Flores C (2011) The Netrin Receptor DCC Is Required in the Pubertal Organization of Mesocortical Dopamine Circuitry. J Neurosci 31:8381–8394.

      Miner LH, Schroeter S, Blakely RD, Sesack SR (2003) Ultrastructural localization of the norepinephrine transporter in superficial and deep layers of the rat prelimbic prefrontal cortex and its spatial relationship to probable dopamine terminals. J Comp Neurol 466:478–494.

      Pickel VM, Joh TH, Field PM, Becker CG, Reis DJ (1975) Cellular localization of tyrosine hydroxylase by immunohistochemistry. J Histochem Cytochem 23:1–12.

      Van Eden CG, Hoorneman EM, Buijs RM, Matthijssen MA, Geffard M, Uylings HBM (1987) Immunocytochemical localization of dopamine in the prefrontal cortex of the rat at the light and electron microscopical level. Neurosci 22:849–862.

      Verney C, Berger B, Adrien J, Vigny A, Gay M (1982) Development of the dopaminergic innervation of the rat cerebral cortex. A light microscopic immunocytochemical study using anti-tyrosine hydroxylase antibodies. Dev Brain Res 5:41–52.

      (10) Are Netrin-1/UNC5C the only signal guiding dopamine axon during adolescence? Are there other neuronal circuits involved in this process?

      Our intention for this study was to examine the role of Netrin-1 and its receptor UNC5C specifically, but we do not suggest that they are the only molecules to play a role. The process of guiding growing dopamine axons during adolescence is likely complex and we expect other guidance mechanisms to also be involved. From our previous work we know that the Netrin-1 receptor DCC is critical in this process (Hoops and Flores, 2017; Reynolds et al., 2023). Several other molecules have been identified in Netrin-1/DCC signaling processes that control corpus callosum development and there is every possibility that the same or similar molecules may be important in guiding dopamine axons (Schlienger et al., 2023).

      References:

      Hoops D, Flores C. 2017. Making Dopamine Connections in Adolescence. Trends in Neurosciences 1–11. doi:10.1016/j.tins.2017.09.004

      Reynolds LM, Hernandez G, MacGowan D, Popescu C, Nouel D, Cuesta S, Burke S, Savell KE, Zhao J, Restrepo-Lozano JM, Giroux M, Israel S, Orsini T, He S, Wodzinski M, Avramescu RG, Pokinko M, Epelbaum JG, Niu Z, Pantoja-Urbán AH, Trudeau L-É, Kolb B, Day JJ, Flores C. 2023. Amphetamine disrupts dopamine axon growth in adolescence by a sex-specific mechanism in mice. Nat Commun 14:4035. doi:10.1038/s41467-023-39665-1

      Schlienger S, Yam PT, Balekoglu N, Ducuing H, Michaud J-F, Makihara S, Kramer DK, Chen B, Fasano A, Berardelli A, Hamdan FF, Rouleau GA, Srour M, Charron F. 2023. Genetics of mirror movements identifies a multifunctional complex required for Netrin-1 guidance and lateralization of motor control. Sci Adv 9:eadd5501. doi:10.1126/sciadv.add5501

      (11) Finally, despite the authors' claim that the dopaminergic axon projection is sensitive to the duration of daylight in the hamster, they never provided definitive evidence to support this hypothesis.

      By “definitive evidence” we think that the reviewer is requesting a single statistical model including measures from both the summer and winter groups. Such a model would provide a probability estimate of whether dopamine axon growth is sensitive to daylight duration. Therefore, we ran these models, one for male hamsters and one for female hamsters.

      In both sexes we find a significant effect of daylength on dopamine innervation, interacting with age. Male age by daylength interaction: F = 6.383, p = 0.00242. Female age by daylength interaction: F = 21.872, p = 1.97 × 10⁻⁹. The full statistical analysis is available as a supplement to this letter (Response_Letter_Stats_Details.docx).
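
      For readers unfamiliar with how an age by daylength interaction is tested, the following is a minimal sketch of the interaction F statistic in a balanced two-way ANOVA. It is not the model reported in the supplement: the function name, the group labels and the numbers are hypothetical, the design is assumed to be balanced (equal animals per cell), and the p-value step is omitted.

```python
from statistics import mean

def interaction_f(cells):
    """Interaction F statistic for a balanced two-way ANOVA.

    `cells` maps (age_level, daylength_level) -> list of measurements.
    Returns (F, df_interaction, df_error). Hypothetical helper, not the
    authors' analysis code.
    """
    ages = sorted({a for a, _ in cells})
    daylengths = sorted({d for _, d in cells})
    n = len(next(iter(cells.values())))
    if any(len(v) != n for v in cells.values()):
        raise ValueError("this sketch assumes a balanced design")

    grand = mean([x for v in cells.values() for x in v])
    age_mean = {a: mean([x for d in daylengths for x in cells[(a, d)]]) for a in ages}
    day_mean = {d: mean([x for a in ages for x in cells[(a, d)]]) for d in daylengths}
    cell_mean = {k: mean(v) for k, v in cells.items()}

    # Interaction sum of squares: what is left of each cell mean after
    # removing the two main effects.
    ss_int = n * sum(
        (cell_mean[(a, d)] - age_mean[a] - day_mean[d] + grand) ** 2
        for a in ages for d in daylengths
    )
    # Error sum of squares: scatter of animals around their own cell mean.
    ss_err = sum((x - cell_mean[k]) ** 2 for k, v in cells.items() for x in v)

    df_int = (len(ages) - 1) * (len(daylengths) - 1)
    df_err = len(ages) * len(daylengths) * (n - 1)
    return (ss_int / df_int) / (ss_err / df_err), df_int, df_err

# Made-up innervation scores: only summer adults differ from the rest,
# so the effect of age depends on daylength.
example = {
    ("adolescent", "summer"): [1, 2, 3],
    ("adolescent", "winter"): [1, 2, 3],
    ("adult", "summer"): [5, 6, 7],
    ("adult", "winter"): [1, 2, 3],
}
f_value, df_int, df_err = interaction_f(example)
print(f_value, df_int, df_err)
```

      A large interaction F means the age trajectory differs between daylengths, which is the sense in which innervation is "sensitive to daylength" here.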

      Reviewer 3

      (1) Fig 1 A and B don't appear to be the same section level.

      The reviewer is correct that Fig 1B is anterior to Fig 1A. We have changed Figure 1A to match the section level of Figure 1B.

      (2) Fig 1C. It is not clear that these axons are crossing from the shell of the NAC.

      We have added a dashed line to Figure 1C to highlight the boundary of the nucleus accumbens, which hopefully emphasizes that there are fibres crossing the boundary. We also include here an enlarged image of this panel:

      Author response image 6.

      An enlarged image of Figure 1C in the manuscript. The nucleus accumbens (left of the dotted line) is densely packed with TH+ axons (in green). Some of these TH+ axons can be observed extending from the nucleus accumbens medially towards a region containing dorsally oriented TH+ fibres (white arrows).

      (3) Fig 1. Measuring width of the bundle is an odd way to measure DA axon numbers. First the width could be changing during adult for various reasons including change in brain size. Second, I wouldn't consider these axons in a traditional bundle. Third, could DA axon counts be provided, rather than these proxy measures.

      With regards to potential changes in brain size, we agree that this could have potentially explained the increased width of the dopamine axon pathway. That is why it was important for us to use stereology to measure the density of dopamine axons within the pathway. If the width increased but no new axons grew along the pathway, we would have seen a decrease in axon density from adolescence to adulthood. Instead, our results show that the density of axons remained constant.
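
      The logic of this argument can be illustrated with a small worked example. The numbers below are invented purely for illustration and the function is a hypothetical helper; it treats density as axons per unit of pathway width, which is a simplification of the stereological measure.

```python
def axon_density(n_axons: float, width_um: float) -> float:
    """Axons per micrometre of pathway width (simplified, hypothetical measure)."""
    return n_axons / width_um

# Invented adolescent baseline: 1000 axons across a 100 µm wide pathway.
adolescent = axon_density(1000, 100)

# Scenario A: the pathway widens by 50% (e.g. brain growth) but no axons are added.
widened_no_growth = axon_density(1000, 150)

# Scenario B: the pathway widens by 50% and axon number grows in proportion.
widened_with_growth = axon_density(1500, 150)

print(adolescent, widened_no_growth, widened_with_growth)
```

      Only Scenario B leaves density unchanged, which is the pattern described in our results.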

      We agree with the reviewer that the dopamine axons do not form a traditional “bundle”. Therefore, throughout the manuscript we now avoid using the term bundle.

      Although we cannot count every single axon, an accurate estimate of this number can be obtained using stereology, an unbiased method for efficiently quantifying large, irregularly distributed objects. We used stereology to count TH+ axons in an unbiased subset of the total area occupied by these axons. Unbiased stereology is the gold-standard technique for estimating populations of anatomical objects, such as axons, that are so numerous that it would be impractical or impossible to measure every single one. Here and elsewhere we generally provide results as densities and areas of occupancy (Reynolds et al., 2022). To avoid confusion, we now clarify that we are measuring the width of the area that dopamine axons occupy (rather than the dopamine axon “bundle”).

      References:

      Reynolds LM, Pantoja-Urbán AH, MacGowan D, Manitt C, Nouel D, Flores C. 2022. Dopaminergic System Function and Dysfunction: Experimental Approaches. Neuromethods 31–63. doi:10.1007/978-1-0716-2799-0_2

      (4) TH in the cortex could also be of noradrenergic origin. This needs to be ruled out to score DA axons

      This is the same comment as Reviewer 1 #9. Please see our response below, which we have also added to our methods:

      In this study we pay great attention to the morphology and localization of the fibres from which we quantify varicosities to avoid counting any fibres stained with TH antibodies that are not dopamine fibres. The fibres that we examine and that are labelled by the TH antibody show features indistinguishable from the classic features of cortical dopamine axons in rodents (Berger et al., 1974; 1983; Van Eden et al., 1987; Manitt et al., 2011), namely they are thin fibres with irregularly-spaced varicosities, are densely packed in the nucleus accumbens, sparsely present only in the deep layers of the prefrontal cortex, and are not regularly oriented in relation to the pial surface. This is in contrast to rodent norepinephrine fibres, which are smooth or beaded in appearance, relatively thick with regularly spaced varicosities, increase in density towards the shallow cortical layers, and are in large part oriented either parallel or perpendicular to the pial surface (Berger et al., 1974; Levitt and Moore, 1979; Berger et al., 1983; Miner et al., 2003). Furthermore, previous studies in rodents have noted that only norepinephrine cell bodies are detectable using immunofluorescence for TH, not norepinephrine processes (Pickel et al., 1975; Verney et al., 1982; Miner et al., 2003), and we did not observe any norepinephrine-like fibres.

      References:

      Berger B, Tassin JP, Blanc G, Moyne MA, Thierry AM (1974) Histochemical confirmation for dopaminergic innervation of the rat cerebral cortex after destruction of the noradrenergic ascending pathways. Brain Res 81:332–337.

      Berger B, Verney C, Gay M, Vigny A (1983) Immunocytochemical Characterization of the Dopaminergic and Noradrenergic Innervation of the Rat Neocortex During Early Ontogeny. In: Proceedings of the 9th Meeting of the International Neurobiology Society, pp 263–267 Progress in Brain Research. Elsevier.

      Levitt P, Moore RY (1979) Development of the noradrenergic innervation of neocortex. Brain Res 162:243–259.

      Manitt C, Mimee A, Eng C, Pokinko M, Stroh T, Cooper HM, Kolb B, Flores C (2011) The Netrin Receptor DCC Is Required in the Pubertal Organization of Mesocortical Dopamine Circuitry. J Neurosci 31:8381–8394.

      Miner LH, Schroeter S, Blakely RD, Sesack SR (2003) Ultrastructural localization of the norepinephrine transporter in superficial and deep layers of the rat prelimbic prefrontal cortex and its spatial relationship to probable dopamine terminals. J Comp Neurol 466:478–494.

      Pickel VM, Joh TH, Field PM, Becker CG, Reis DJ (1975) Cellular localization of tyrosine hydroxylase by immunohistochemistry. J Histochem Cytochem 23:1–12.

      Van Eden CG, Hoorneman EM, Buijs RM, Matthijssen MA, Geffard M, Uylings HBM (1987) Immunocytochemical localization of dopamine in the prefrontal cortex of the rat at the light and electron microscopical level. Neurosci 22:849–862.

      Verney C, Berger B, Adrien J, Vigny A, Gay M (1982) Development of the dopaminergic innervation of the rat cerebral cortex. A light microscopic immunocytochemical study using anti-tyrosine hydroxylase antibodies. Dev Brain Res 5:41–52.

      (5) Netrin staining should be provided with NeuN + DAPI; it's not clear these are all cell bodies. An in situ of Netrin would help as well.

      A similar comment was raised by Reviewer 1 in point #1. Please see below the immunofluorescent and RNAscope images showing expression of Netrin-1 protein and mRNA in the forebrain.

      Author response image 7.

      This confocal microscope image shows immunofluorescent staining for Netrin-1 (green) localized around cell nuclei (stained by DAPI in blue). This image was taken from a coronal section of the lateral septum of an adult male mouse. Scale bar = 20µm

      Author response image 8.

      This confocal microscope image of a coronal brain section of the medial prefrontal cortex of an adult male mouse shows Netrin-1 mRNA expression (green) and cell nuclei (DAPI, blue). RNAscope was used to generate this image. Brain regions are as follows: Cg1: Anterior cingulate cortex 1, DP: dorsopeduncular cortex, IL: Infralimbic Cortex, PrL: Prelimbic Cortex, fmi: forceps minor of the corpus callosum

      Author response image 9.

      A higher resolution image from the same sample as in Figure 2 shows Netrin-1 mRNA (green) and cell nuclei (DAPI; blue). DP = dorsopeduncular cortex

      (6) The Netrin knockdown needs validation. How strong was the knockdown etc?

      This comment was also raised by Reviewer 1 #1.

      We have previously established the efficacy of the shRNA Netrin-1 knockdown virus used in this experiment for reducing the expression of Netrin-1 (Cuesta et al., 2020). The shRNA reduces Netrin-1 levels in vitro and in vivo.

      References:

      Cuesta S, Nouel D, Reynolds LM, Morgunova A, Torres-Berrío A, White A, Hernandez G, Cooper HM, Flores C. 2020. Dopamine Axon Targeting in the Nucleus Accumbens in Adolescence Requires Netrin-1. Frontiers Cell Dev Biology 8:487. doi:10.3389/fcell.2020.00487

      (7) If the conclusion that knocking down Netrin in cortex decreases DA innervation of the IL, how can that be reconciled with Netrin-Unc repulsion.

      This is an intriguing question and one that we are in the planning stages of addressing with new experiments.

      Although we do not have a mechanistic answer for how a repulsive receptor helps guide these axons, we would like to note that previous indirect evidence from a study by our group also suggests that reducing UNC5c signaling in dopamine axons in adolescence increases dopamine innervation to the prefrontal cortex (Auger et al., 2013).

      References:

      Auger ML, Schmidt ERE, Manitt C, Dal-Bo G, Pasterkamp RJ, Flores C. 2013. unc5c haploinsufficient phenotype: striking similarities with the dcc haploinsufficiency model. European Journal of Neuroscience 38:2853–2863. doi:10.1111/ejn.12270

      (8) The behavioral phenotype in Fig 1 is interesting, but it's not clear if it's related to DA axons/signaling. In general, no evidence in this paper is provided for the role of DA in the adolescent behaviors described.

      We agree with the reviewer that the behaviours we describe in adult mice are complex and are likely to involve several neurotransmitter systems. However, there is ample evidence for the role of dopamine signaling in cognitive control behaviours (Bari and Robbins, 2013; Eagle et al., 2008; Ott et al., 2023) and our published work has shown that alterations in the growth of dopamine axons to the prefrontal cortex lead to changes in impulse control as measured via the Go/No-Go task in adulthood (Reynolds et al., 2023, 2018a; Vassilev et al., 2021).

      The other adolescent behaviour we examined was risk-taking-like behaviour in male and female hamsters (Figures 4 and 5), as a means of characterizing the maturation of this behaviour over time. We decided not to use the Go/No-Go task because, as far as we know, it has never been employed in Siberian hamsters and would be difficult to implement. Instead, we chose the light/dark box paradigm, which requires no training and is ideal for charting behavioural changes over short time periods. Indeed, risk-taking-like behaviour in rodents and in humans changes from adolescence to adulthood, paralleling changes in prefrontal cortex development, including the gradual input of dopamine axons to this region.

      References:

      Bari A, Robbins TW. 2013. Inhibition and impulsivity: Behavioral and neural basis of response control. Progress in neurobiology 108:44–79. doi:10.1016/j.pneurobio.2013.06.005

      Eagle DM, Bari A, Robbins TW. 2008. The neuropsychopharmacology of action inhibition: cross-species translation of the stop-signal and go/no-go tasks. Psychopharmacology 199:439–456. doi:10.1007/s00213-008-1127-6

      Ott T, Stein AM, Nieder A. 2023. Dopamine receptor activation regulates reward expectancy signals during cognitive control in primate prefrontal neurons. Nat Commun 14:7537. doi:10.1038/s41467-023-43271-6

      Reynolds LM, Hernandez G, MacGowan D, Popescu C, Nouel D, Cuesta S, Burke S, Savell KE, Zhao J, Restrepo-Lozano JM, Giroux M, Israel S, Orsini T, He S, Wodzinski M, Avramescu RG, Pokinko M, Epelbaum JG, Niu Z, Pantoja-Urbán AH, Trudeau L-É, Kolb B, Day JJ, Flores C. 2023. Amphetamine disrupts dopamine axon growth in adolescence by a sex-specific mechanism in mice. Nat Commun 14:4035. doi:10.1038/s41467-023-39665-1

      Reynolds LM, Pokinko M, Torres-Berrío A, Cuesta S, Lambert LC, Pellitero EDC, Wodzinski M, Manitt C, Krimpenfort P, Kolb B, Flores C. 2018a. DCC Receptors Drive Prefrontal Cortex Maturation by Determining Dopamine Axon Targeting in Adolescence. Biological psychiatry 83:181–192. doi:10.1016/j.biopsych.2017.06.009

      Vassilev P, Pantoja-Urban AH, Giroux M, Nouel D, Hernandez G, Orsini T, Flores C. 2021. Unique effects of social defeat stress in adolescent male mice on the Netrin-1/DCC pathway, prefrontal cortex dopamine and cognition (Social stress in adolescent vs. adult male mice). Eneuro ENEURO.0045-21.2021. doi:10.1523/eneuro.0045-21.2021

      (9) Fig2 - boxes should be drawn on the NAc diagram to indicate sampled regions. Some quantification of Unc5c would be useful. Also, some validation of the Unc5c antibody would be nice.

      The images presented were taken medial to the anterior commissure and we have edited Figure 2 to show this. However, we did not notice any intra-accumbens variation, including between the core and the shell. Therefore, the images are representative of what was observed throughout the entire nucleus accumbens.

      To quantify UNC5c in the accumbens we conducted a Western blot experiment in male mice at different ages. A one-way ANOVA analyzing band intensity (relative to the 15-day-old average band intensity) as the response variable and age as the predictor variable showed a significant effect of age (F=5.615, p=0.01). Posthoc analysis revealed that 15-day-old mice have less UNC5c in the nucleus accumbens compared to 21- and 35-day-old mice.

      Author response image 10.

      The graph depicts the results of a Western blot experiment of UNC5c protein levels in the nucleus accumbens of male mice at postnatal days 15, 21 or 35 and reveals a significant increase in protein levels at the onset of adolescence.

      Our methods for this Western blot were as follows: Samples were prepared as previously described (Torres-Berrío et al., 2017). Briefly, mice were sacrificed by live decapitation and brains were flash frozen in heptane on dry ice for 10 seconds. Frozen brains were mounted in a cryomicrotome and two 500 µm sections were collected for the nucleus accumbens, corresponding to plates 14 and 18 of the Paxinos mouse brain atlas. Two tissue core samples were collected per section, one for each side of the brain, using a 15-gauge tissue corer (Fine surgical tools Cat no. NC9128328) and ejected into a microtube on dry ice. The tissue samples were homogenized in 100 µl of standard radioimmunoprecipitation assay buffer using a handheld electric tissue homogenizer. The samples were clarified by centrifugation at 15,000 g for 30 minutes at 4 °C. Protein concentration was quantified using a bicinchoninic acid assay kit (Pierce BCA protein assay kit, Cat no. PI23225) and samples were denatured with standard Laemmli buffer for 5 minutes at 70 °C. 10 µg of protein per sample was loaded and run by SDS-PAGE in a Mini-PROTEAN system (Bio-Rad) on an 8% acrylamide gel, stacking for 30 minutes at 60 V and resolving for 1.5 hours at 130 V. The proteins were transferred to a nitrocellulose membrane for 1 hour at 100 V in standard transfer buffer on ice. The membranes were blocked using 5% bovine serum albumin dissolved in tris-buffered saline with Tween 20 and probed with primary (UNC5c, Abcam Cat. no ab302924) and HRP-conjugated secondary antibodies for 1 hour. α-tubulin was probed and used as a loading control. The probed membranes were resolved using SuperSignal West Pico PLUS chemiluminescent substrate (ThermoFisher Cat no. 34579) in a ChemiDoc MP Imaging system (Bio-Rad). Band intensity was quantified using the ChemiDoc software and all ages were normalized to the P15 age group average.
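
      The normalization and the one-way ANOVA described above can be sketched as follows. This is an illustration only: the band intensities are invented, the helper names are hypothetical, and the p-value and post hoc comparisons are left out, so the output does not reproduce the statistics reported in this letter.

```python
from statistics import mean

def normalize_to_reference(groups, reference):
    """Divide every band intensity by the mean of the reference age group."""
    ref_mean = mean(groups[reference])
    return {age: [x / ref_mean for x in values] for age, values in groups.items()}

def one_way_f(groups):
    """One-way ANOVA F statistic; returns (F, df_between, df_within)."""
    all_values = [x for values in groups.values() for x in values]
    grand = mean(all_values)
    # Between-group variation: group means around the grand mean.
    ss_between = sum(len(v) * (mean(v) - grand) ** 2 for v in groups.values())
    # Within-group variation: individual samples around their group mean.
    ss_within = sum((x - mean(v)) ** 2 for v in groups.values() for x in v)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within), df_between, df_within

# Invented raw band intensities (arbitrary units) for three ages.
raw = {"P15": [2.0, 4.0, 6.0], "P21": [6.0, 8.0, 10.0], "P35": [6.0, 8.0, 10.0]}

normalized = normalize_to_reference(raw, "P15")   # P15 mean becomes 1.0
f_stat, df_between, df_within = one_way_f(normalized)
print(normalized["P15"], f_stat, df_between, df_within)
```

      Because every value is divided by the same constant, normalizing to the P15 average changes the scale of the graph but not the F statistic.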

      Validation of the UNC5c antibody was performed in the lab of Dr. Liu, from whom it was kindly provided. Briefly, in the validation study the authors showed that the anti-UNC5C antibody can detect endogenous UNC5C expression and the level of UNC5C is dramatically reduced after UNC5C knockdown. The antibody can also detect the tagged-UNC5C protein in several cell lines, which was confirmed by a tag antibody (Purohit et al., 2012; Shao et al., 2017).

      References:

      Purohit AA, Li W, Qu C, Dwyer T, Shao Q, Guan K-L, Liu G. 2012. Down Syndrome Cell Adhesion Molecule (DSCAM) Associates with Uncoordinated-5C (UNC5C) in Netrin-1-mediated Growth Cone Collapse. The Journal of biological chemistry 287:27126–27138. doi:10.1074/jbc.m112.340174

      Shao Q, Yang T, Huang H, Alarmanazi F, Liu G. 2017. Uncoupling of UNC5C with Polymerized TUBB3 in Microtubules Mediates Netrin-1 Repulsion. J Neurosci 37:5620–5633. doi:10.1523/jneurosci.2617-16.2017

      (10) "In adolescence, dopamine neurons begin to express the repulsive Netrin-1 receptor UNC5C, and reduction in UNC5C expression appears to cause growth of mesolimbic dopamine axons to the prefrontal cortex".....This is confusing. Figure 2 shows a developmental increase in UNc5c not a decrease. So when is the "reduction in Unc5c expression" occurring?

      We apologize for the mistake in this sentence. We have corrected the relevant passage in our manuscript as follows:

      In adolescence, dopamine neurons begin to express the repulsive Netrin-1 receptor UNC5C, particularly when mesolimbic and mesocortical dopamine projections segregate in the nucleus accumbens (Manitt et al., 2010; Reynolds et al., 2018a). In contrast, dopamine axons in the prefrontal cortex do not express UNC5c except in very rare cases (Supplementary Figure 4). In adult male mice with Unc5c haploinsufficiency, there appears to be ectopic growth of mesolimbic dopamine axons to the prefrontal cortex (Auger et al., 2013). This miswiring is associated with alterations in prefrontal cortex-dependent behaviours (Auger et al., 2013).

      References:

      Auger ML, Schmidt ERE, Manitt C, Dal-Bo G, Pasterkamp RJ, Flores C. 2013. unc5c haploinsufficient phenotype: striking similarities with the dcc haploinsufficiency model. European Journal of Neuroscience 38:2853–2863. doi:10.1111/ejn.12270

      Manitt C, Labelle-Dumais C, Eng C, Grant A, Mimee A, Stroh T, Flores C. 2010. Peri-Pubertal Emergence of UNC-5 Homologue Expression by Dopamine Neurons in Rodents. PLoS ONE 5:e11463-14. doi:10.1371/journal.pone.0011463

      Reynolds LM, Pokinko M, Torres-Berrío A, Cuesta S, Lambert LC, Pellitero EDC, Wodzinski M, Manitt C, Krimpenfort P, Kolb B, Flores C. 2018a. DCC Receptors Drive Prefrontal Cortex Maturation by Determining Dopamine Axon Targeting in Adolescence. Biological psychiatry 83:181–192. doi:10.1016/j.biopsych.2017.06.009

      (11) In Fig 3, a statistical comparison should be made between summer male and winter male, to justify the conclusions that the winter males have delayed DA innervation.

      This analysis was also suggested by Reviewer 1, #11. Here is our response:

      We analyzed the summer and winter data together in ANOVAs separately for males and females. In both sexes we find a significant effect of daylength on dopamine innervation, interacting with age. Male age by daylength interaction: F = 6.383, p = 0.00242. Female age by daylength interaction: F = 21.872, p = 1.97 × 10⁻⁹. The full statistical analysis is available as a supplement to this letter (Response_Letter_Stats_Details.docx).

      (12) Should axon length also be measured here (Fig 3)? It is not clear why the authors have switched to varicosity density. Also, a box should be drawn in the NAC cartoon to indicate the region that was sampled.

      It is untenable to quantify axon length in the prefrontal cortex as we cannot distinguish independent axons. Rather, they are “tangled”; they twist and turn in a multitude of directions as they make contact with various dendrites. Furthermore, they branch extensively. It would therefore be impossible to accurately quantify the number of axons. Using unbiased stereology to quantify varicosities is a valid, well-characterized and straightforward alternative (Reynolds et al., 2022).

      References:

      Reynolds LM, Pantoja-Urbán AH, MacGowan D, Manitt C, Nouel D, Flores C. 2022. Dopaminergic System Function and Dysfunction: Experimental Approaches. Neuromethods 31–63. doi:10.1007/978-1-0716-2799-0_2

      (13) In Fig 3, Unc5c should be quantified to bolster the interesting finding that Unc5c expression dynamics are different between summer and winter hamsters. Unc5c mRNA experiments would also be important to see if similar changes are observed at the transcript level.

      We agree that it would be very interesting to see how UNC5c mRNA and protein levels change over time in summer and winter hamsters, both in males, as the reviewer suggests here, and in females. We are working on conducting these experiments in hamsters as part of a broader expansion of our research in this area. These experiments will require a lengthy amount of time and at this point we feel that they are beyond the scope of this manuscript.

      (14) Fig 4. The peak in exploratory behavior in winter females is counterintuitive and needs to be better discussed. In general, the light dark behavior seems quite variable.

      This is indeed a very interesting finding, which we have expanded upon in our manuscript as follows:

      When raised under a winter-mimicking daylength, hamsters of either sex show a protracted peak in risk taking. In males, it is delayed beyond 80 days old, but the delay is substantially less in females. This is a counterintuitive finding considering that dopamine development in winter females appears to be accelerated. Our interpretation of this finding is that the timing of the risk-taking peak in females may reflect a balance between different adolescent developmental processes. The fact that dopamine axon growth is accelerated does not imply that all adolescent maturational processes are accelerated. Some may be delayed, for example those that induce axon pruning in the cortex. The timing of the risk-taking peak in winter female hamsters may therefore reflect the amalgamation of developmental processes that are advanced with those that are delayed – producing a behavioural effect that is timed somewhere in the middle. Disentangling the effects of different developmental processes on behaviour will require further experiments in hamsters, including the direct manipulation of dopamine activity in the nucleus accumbens and prefrontal cortex.

      Full Reference List

      Auger ML, Schmidt ERE, Manitt C, Dal-Bo G, Pasterkamp RJ, Flores C. 2013. unc5c haploinsufficient phenotype: striking similarities with the dcc haploinsufficiency model. European Journal of Neuroscience 38:2853–2863. doi:10.1111/ejn.12270

      Bari A, Robbins TW. 2013. Inhibition and impulsivity: Behavioral and neural basis of response control. Progress in neurobiology 108:44–79. doi:10.1016/j.pneurobio.2013.06.005

      Cuesta S, Nouel D, Reynolds LM, Morgunova A, Torres-Berrío A, White A, Hernandez G, Cooper HM, Flores C. 2020. Dopamine Axon Targeting in the Nucleus Accumbens in Adolescence Requires Netrin-1. Frontiers Cell Dev Biology 8:487. doi:10.3389/fcell.2020.00487

      Daubaras M, Bo GD, Flores C. 2014. Target-dependent expression of the netrin-1 receptor, UNC5C, in projection neurons of the ventral tegmental area. Neuroscience 260:36–46. doi:10.1016/j.neuroscience.2013.12.007

      Eagle DM, Bari A, Robbins TW. 2008. The neuropsychopharmacology of action inhibition: cross-species translation of the stop-signal and go/no-go tasks. Psychopharmacology 199:439–456. doi:10.1007/s00213-008-1127-6

      Hoops D, Flores C. 2017. Making Dopamine Connections in Adolescence. Trends in Neurosciences 1–11. doi:10.1016/j.tins.2017.09.004

      Jonker FA, Jonker C, Scheltens P, Scherder EJA. 2015. The role of the orbitofrontal cortex in cognition and behavior. Rev Neurosci 26:1–11. doi:10.1515/revneuro-2014-0043

      Kim B, Im H. 2019. The role of the dorsal striatum in choice impulsivity. Ann N York Acad Sci 1451:92–111. doi:10.1111/nyas.13961

      Kim D, Ackerman SL. 2011. The UNC5C Netrin Receptor Regulates Dorsal Guidance of Mouse Hindbrain Axons. J Neurosci 31:2167–2179. doi:10.1523/jneurosci.5254-10.2011

      Manitt C, Labelle-Dumais C, Eng C, Grant A, Mimee A, Stroh T, Flores C. 2010. Peri-Pubertal Emergence of UNC-5 Homologue Expression by Dopamine Neurons in Rodents. PLoS ONE 5:e11463-14. doi:10.1371/journal.pone.0011463

      Murcia-Belmonte V, Coca Y, Vegar C, Negueruela S, Romero C de J, Valiño AJ, Sala S, DaSilva R, Kania A, Borrell V, Martinez LM, Erskine L, Herrera E. 2019. A Retino-retinal Projection Guided by Unc5c Emerged in Species with Retinal Waves. Current Biology 29:1149-1160.e4. doi:10.1016/j.cub.2019.02.052

      Ott T, Stein AM, Nieder A. 2023. Dopamine receptor activation regulates reward expectancy signals during cognitive control in primate prefrontal neurons. Nat Commun 14:7537. doi:10.1038/s41467-023-43271-6

      Phillips RA, Tuscher JJ, Black SL, Andraka E, Fitzgerald ND, Ianov L, Day JJ. 2022. An atlas of transcriptionally defined cell populations in the rat ventral tegmental area. Cell Reports 39:110616. doi:10.1016/j.celrep.2022.110616

      Purohit AA, Li W, Qu C, Dwyer T, Shao Q, Guan K-L, Liu G. 2012. Down Syndrome Cell Adhesion Molecule (DSCAM) Associates with Uncoordinated-5C (UNC5C) in Netrin-1-mediated Growth Cone Collapse. The Journal of biological chemistry 287:27126–27138. doi:10.1074/jbc.m112.340174

      Reynolds LM, Hernandez G, MacGowan D, Popescu C, Nouel D, Cuesta S, Burke S, Savell KE, Zhao J, Restrepo-Lozano JM, Giroux M, Israel S, Orsini T, He S, Wodzinski M, Avramescu RG, Pokinko M, Epelbaum JG, Niu Z, Pantoja-Urbán AH, Trudeau L-É, Kolb B, Day JJ, Flores C. 2023. Amphetamine disrupts dopamine axon growth in adolescence by a sex-specific mechanism in mice. Nat Commun 14:4035. doi:10.1038/s41467-023-39665-1

      Reynolds LM, Pantoja-Urbán AH, MacGowan D, Manitt C, Nouel D, Flores C. 2022. Dopaminergic System Function and Dysfunction: Experimental Approaches. Neuromethods 31–63. doi:10.1007/978-1-0716-2799-0_2

      Reynolds LM, Pokinko M, Torres-Berrío A, Cuesta S, Lambert LC, Pellitero EDC, Wodzinski M, Manitt C, Krimpenfort P, Kolb B, Flores C. 2018a. DCC Receptors Drive Prefrontal Cortex Maturation by Determining Dopamine Axon Targeting in Adolescence. Biological psychiatry 83:181–192. doi:10.1016/j.biopsych.2017.06.009

      Reynolds LM, Yetnikoff L, Pokinko M, Wodzinski M, Epelbaum JG, Lambert LC, Cossette M-P, Arvanitogiannis A, Flores C. 2018b. Early Adolescence is a Critical Period for the Maturation of Inhibitory Behavior. Cerebral cortex 29:3676–3686. doi:10.1093/cercor/bhy247

      Schlienger S, Yam PT, Balekoglu N, Ducuing H, Michaud J-F, Makihara S, Kramer DK, Chen B, Fasano A, Berardelli A, Hamdan FF, Rouleau GA, Srour M, Charron F. 2023. Genetics of mirror movements identifies a multifunctional complex required for Netrin-1 guidance and lateralization of motor control. Sci Adv 9:eadd5501. doi:10.1126/sciadv.add5501

      Shao Q, Yang T, Huang H, Alarmanazi F, Liu G. 2017. Uncoupling of UNC5C with Polymerized TUBB3 in Microtubules Mediates Netrin-1 Repulsion. J Neurosci 37:5620–5633. doi:10.1523/jneurosci.2617-16.2017

      Srivatsa S, Parthasarathy S, Britanova O, Bormuth I, Donahoo A-L, Ackerman SL, Richards LJ, Tarabykin V. 2014. Unc5C and DCC act downstream of Ctip2 and Satb2 and contribute to corpus callosum formation. Nat Commun 5:3708. doi:10.1038/ncomms4708

      Torres-Berrío A, Lopez JP, Bagot RC, Nouel D, Dal-Bo G, Cuesta S, Zhu L, Manitt C, Eng C, Cooper HM, Storch K-F, Turecki G, Nestler EJ, Flores C. 2017. DCC Confers Susceptibility to Depression-like Behaviors in Humans and Mice and Is Regulated by miR-218. Biological psychiatry 81:306–315. doi:10.1016/j.biopsych.2016.08.017

      Vassilev P, Pantoja-Urban AH, Giroux M, Nouel D, Hernandez G, Orsini T, Flores C. 2021. Unique effects of social defeat stress in adolescent male mice on the Netrin-1/DCC pathway, prefrontal cortex dopamine and cognition (Social stress in adolescent vs. adult male mice). Eneuro ENEURO.0045-21.2021. doi:10.1523/eneuro.0045-21.2021

      Private Comments

      Reviewer #1

      (12) The language should be improved. Some expression is confusing (line178-179). Also some spelling errors (eg. Figure 1M).

      We have removed the word “Already” to make the sentence in lines 178-179 clearer, however we cannot find a spelling error in Figure 1M or its caption. We have further edited the manuscript for clarity and flow.

      Reviewer #2

      (1) The authors claim to have revealed how the 'timing of adolescence is programmed in the brain'. While their findings certainly shed light on molecular, circuit and behavioral processes that are unique to adolescence, their claim may be an overstatement. I suggest they refine this statement to discuss more specifically the processes they observed in the brain and animal behavior, rather than adolescence itself.

      We agree with the reviewer and have revised the manuscript to specify that we are referring to the timing of specific developmental processes that occur in the adolescent brain, not adolescence overall.

      (2) Along the same lines, the authors should also include a more substantive discussion of how they selected their ages for investigation (for both mice and hamsters). For mice, their definition of adolescence (P21) is earlier than some (e.g. Spear L.P., Neurosci. and Beh. Reviews, 2000).

      There are certainly differences of opinion between researchers as to the precise definition of adolescence and the period it encompasses. Spear, 2000, provides one excellent discussion of the challenges related to identifying adolescence across species. This work gives specific ages only for rats, not mice (as we use here), and characterizes post-natal days 28-42 as being the conservative age range of “peak” adolescence (page 419, paragraph 1). Immediately thereafter the review states that the full adolescent period is longer than this, and it could encompass post-natal days 20-55 (page 419, paragraph 2).

      We have added the following statement to our methods:

      There is no universally accepted way to define the precise onset of adolescence. Therefore, there is no clear-cut boundary to define adolescent onset in rodents (Spear, 2000). Puberty can be more sharply defined, and puberty and adolescence overlap in time, but the terms are not interchangeable. Puberty is the onset of sexual maturation, while adolescence is a more diffuse period marked by the gradual transition from a juvenile state to independence. We, and others, suggest that adolescence in rodents spans from weaning (postnatal day 21) until adulthood, which we take to start on postnatal day 60 (Reynolds and Flores, 2021). We refer to “early adolescence” as the first two weeks postweaning (postnatal days 21-34). These ranges encompass discrete DA developmental periods (Kalsbeek et al., 1988; Manitt et al., 2011; Reynolds et al., 2018a), vulnerability to drug effects on DA circuitry (Hammerslag and Gulley, 2014; Reynolds et al., 2018a), and distinct behavioral characteristics (Adriani and Laviola, 2004; Makinodan et al., 2012; Schneider, 2013; Wheeler et al., 2013).

      References:

      Adriani W, Laviola G. 2004. Windows of vulnerability to psychopathology and therapeutic strategy in the adolescent rodent model. Behav Pharmacol 15:341–352. doi:10.1097/00008877-200409000-00005

      Hammerslag LR, Gulley JM. 2014. Age and sex differences in reward behavior in adolescent and adult rats. Dev Psychobiol 56:611–621. doi:10.1002/dev.21127

      Hoops D, Flores C. 2017. Making Dopamine Connections in Adolescence. Trends in Neurosciences 1–11. doi:10.1016/j.tins.2017.09.004

      Kalsbeek A, Voorn P, Buijs RM, Pool CW, Uylings HBM. 1988. Development of the Dopaminergic Innervation in the Prefrontal Cortex of the Rat. The Journal of Comparative Neurology 269:58–72. doi:10.1002/cne.902690105

      Makinodan M, Rosen KM, Ito S, Corfas G. 2012. A critical period for social experience-dependent oligodendrocyte maturation and myelination. Science 337:1357–1360. doi:10.1126/science.1220845

      Manitt C, Mimee A, Eng C, Pokinko M, Stroh T, Cooper HM, Kolb B, Flores C. 2011. The Netrin Receptor DCC Is Required in the Pubertal Organization of Mesocortical Dopamine Circuitry. J Neurosci 31:8381–8394. doi:10.1523/jneurosci.0606-11.2011

      Reynolds LM, Flores C. 2021. Mesocorticolimbic Dopamine Pathways Across Adolescence: Diversity in Development. Front Neural Circuit 15:735625. doi:10.3389/fncir.2021.735625

      Reynolds LM, Yetnikoff L, Pokinko M, Wodzinski M, Epelbaum JG, Lambert LC, Cossette MP, Arvanitogiannis A, Flores C. 2018. Early Adolescence is a Critical Period for the Maturation of Inhibitory Behavior. Cerebral cortex 29:3676–3686. doi:10.1093/cercor/bhy247

      Schneider M. 2013. Adolescence as a vulnerable period to alter rodent behavior. Cell and tissue research 354:99–106. doi:10.1007/s00441-013-1581-2

      Spear LP. 2000. Neurobehavioral Changes in Adolescence. Current directions in psychological science 9:111–114. doi:10.1111/1467-8721.00072

      Wheeler AL, Lerch JP, Chakravarty MM, Friedel M, Sled JG, Fletcher PJ, Josselyn SA, Frankland PW. 2013. Adolescent Cocaine Exposure Causes Enduring Macroscale Changes in Mouse Brain Structure. J Neurosci 33:1797–1803. doi:10.1523/jneurosci.3830-12.2013

      (3) Figure 1 - the conclusions hinge on the Netrin-1 staining, as shown in panel G, but the cells are difficult to see. It would be helpful to provide clearer, more zoomed images so readers can better assess the staining. Since Netrin-1 expression reduces dramatically after P4 and they had to use antigen retrieval to see signal, it would be helpful to show some images from additional brain regions and ages to see if expression levels follow predicted patterns. For instance, based on the allen brain atlas, it seems that around P21, there should be high levels of Netrin-1 in the cerebellum, but low levels in the cortex. These would be nice controls to demonstrate the specificity and sensitivity of the antibody in older tissue.

      We do not study the cerebellum and have never stained this region. Doing so now would require generating additional tissue, and we are not sure it would add enough to the information provided to be worthwhile. Note that we have previously stained the forebrain for Netrin-1, providing broad staining of many brain regions (Manitt et al., 2011).

      References:

      Manitt C, Mimee A, Eng C, Pokinko M, Stroh T, Cooper HM, Kolb B, Flores C. 2011. The Netrin Receptor DCC Is Required in the Pubertal Organization of Mesocortical Dopamine Circuitry. J Neurosci 31:8381–8394. doi:10.1523/jneurosci.0606-11.2011

      (4) Figure 3 - Because mice tend to avoid brightly-lit spaces, the light/dark box is more commonly used as a measure of anxiety-like behavior than purely exploratory behavior (including in the paper they cited). It is important to address this possibility in their discussion of their findings. To bolster their conclusions about the coincidence of circuit and behavioral changes in adolescent hamsters, it would be useful to add an additional measure of exploratory behaviors (e.g. hole board).

      Regarding the light/dark box test, this is an excellent point. We prefer the term “risk taking” to “anxiety-like” and now use the former term in our manuscript. Furthermore, our interest in the behaviour is purely to chart the development of adolescent behaviour across our treatment groups, not to study a particular emotional state. Regardless of the specific emotion or emotions governing the light/dark box behaviour, it is an ideal test for charting adolescent shifts in behaviour as it is well-characterized in this respect, as we discuss in our manuscript.

      (5) Supplementary Figures 4, 5 - The authors defined puberty onset using uterine and testes weights in hamsters. While the weights appear to be different for summer and winter hamsters, there was no statistical comparison. Please add statistical analyses to bolster claims about puberty start times. Also, as many studies use vaginal opening to define puberty onset, it would be helpful to discuss how these measurements typically align and cite relevant literature that described use of uterine weights. Also, Supplementary Figures 4 and 5 were mis-cited as Supp. Fig. 2 in the text (e.g. line 317 and others).

      These are great suggestions. We have added statistical analyses to Supplementary Figures 5 and 6 and provided Vaginal Opening data as Supplementary Figure 7. The statistical analyses confirm that all three characters are delayed in winter hamsters compared to summer hamsters.

      We have also added the following references to the manuscript:

      Darrow JM, Davis FC, Elliott JA, Stetson MH, Turek FW, Menaker M. 1980. Influence of Photoperiod on Reproductive Development in the Golden Hamster. Biol Reprod 22:443–450. doi:10.1095/biolreprod22.3.443

      Ebling FJP. 1994. Photoperiodic Differences during Development in the Dwarf Hamsters Phodopus sungorus and Phodopus campbelli. Gen Comp Endocrinol 95:475–482. doi:10.1006/gcen.1994.1147

      Timonin ME, Place NJ, Wanderi E, Wynne-Edwards KE. 2006. Phodopus campbelli detect reduced photoperiod during development but, unlike Phodopus sungorus, retain functional reproductive physiology. Reproduction 132:661–670. doi:10.1530/rep.1.00019

      (6) The font in many figure panels is small and hard to read (e.g. 1A,D,E,H,I,L...). Please increase the size for legibility.

      We have increased the font size of our figure text throughout the manuscript.

      Reviewer #3

      (15) Fig 1 C,D. Clarify the units of the y axis

      We have now fixed this.

      Full Reference List

      Adriani W, Laviola G. 2004. Windows of vulnerability to psychopathology and therapeutic strategy in the adolescent rodent model. Behav Pharmacol 15:341–352. doi:10.1097/00008877-200409000-00005

      Hammerslag LR, Gulley JM. 2014. Age and sex differences in reward behavior in adolescent and adult rats. Dev Psychobiol 56:611–621. doi:10.1002/dev.21127

      Hoops D, Flores C. 2017. Making Dopamine Connections in Adolescence. Trends in Neurosciences 1–11. doi:10.1016/j.tins.2017.09.004

      Kalsbeek A, Voorn P, Buijs RM, Pool CW, Uylings HBM. 1988. Development of the Dopaminergic Innervation in the Prefrontal Cortex of the Rat. The Journal of Comparative Neurology 269:58–72. doi:10.1002/cne.902690105

      Makinodan M, Rosen KM, Ito S, Corfas G. 2012. A critical period for social experience-dependent oligodendrocyte maturation and myelination. Science 337:1357–1360. doi:10.1126/science.1220845

      Manitt C, Mimee A, Eng C, Pokinko M, Stroh T, Cooper HM, Kolb B, Flores C. 2011. The Netrin Receptor DCC Is Required in the Pubertal Organization of Mesocortical Dopamine Circuitry. J Neurosci 31:8381–8394. doi:10.1523/jneurosci.0606-11.2011

      Reynolds LM, Flores C. 2021. Mesocorticolimbic Dopamine Pathways Across Adolescence: Diversity in Development. Front Neural Circuit 15:735625. doi:10.3389/fncir.2021.735625

      Reynolds LM, Yetnikoff L, Pokinko M, Wodzinski M, Epelbaum JG, Lambert LC, Cossette M-P, Arvanitogiannis A, Flores C. 2018. Early Adolescence is a Critical Period for the Maturation of Inhibitory Behavior. Cerebral cortex 29:3676–3686. doi:10.1093/cercor/bhy247

      Schneider M. 2013. Adolescence as a vulnerable period to alter rodent behavior. Cell and tissue research 354:99–106. doi:10.1007/s00441-013-1581-2

      Spear LP. 2000. Neurobehavioral Changes in Adolescence. Current directions in psychological science 9:111–114. doi:10.1111/1467-8721.00072

      Wheeler AL, Lerch JP, Chakravarty MM, Friedel M, Sled JG, Fletcher PJ, Josselyn SA, Frankland PW. 2013. Adolescent Cocaine Exposure Causes Enduring Macroscale Changes in Mouse Brain Structure. J Neurosci 33:1797–1803. doi:10.1523/jneurosci.3830-12.2013

    1. Author Response

      The following is the authors’ response to the original reviews.

      To the reviewers.

      We appreciate the detailed and thorough review of our manuscript. Below are our comments and responses. Many of the requested data are present in the Supplementary figures of the manuscript. There seem to be two main concerns: the first regarding the evidence of TLR2 expression in HFSCs, and the second regarding CEP/TLR2. As detailed below, we utilized 3 different methods to document TLR2 expression: TLR2-reporter mouse, staining for TLR2 and qPCR of isolated cells for TLR2. The source of CEP (the data are in Supplementary Fig. 5A, B and in the references below) and its nature (it is not a protein, but a metabolic product of oxidation of the polyunsaturated fatty acid DHA by MPO amongst other ROS sources) are also explained below.

      1) “The expression analysis of TLR2 is questionable. Many of the conclusions about the level of target genes are based on quantifying fluorescence intensity in microscopy images (e.g., TLR2 level in young or aged mice, BMP7 levels in mice with/without TLR2 KO). This could be strengthened by using qPCR to measure gene expression levels in FACS-sorted HFSCs, which would provide more accurate quantification. Additionally, the authors should test if the TLR2 antibody used is valid.”

      In most instances we used the TLR2 reporter mouse, which has an advantage over immunostaining. Fig.2 (A-H) shows expression of the TLR2 reporter, not staining with TLR2 antibodies. For selected experiments we used immunostaining with an anti-TLR2 antibody (Santa Cruz Biotechnology, sc-21759), which was validated in our previous publication (McCoy MG et al. Endothelial TLR2 promotes proangiogenic immune cell recruitment and tumor angiogenesis. Sci Signal. 2021 Jan 19; 14(666): eabc5371. doi: 10.1126/scisignal.abc5371). In Fig.S2E of that manuscript we validated these antibodies using a TLR2 knockout. In the current paper, we further validate the anti-TLR2 antibodies by showing their co-localization with the TLR2-GFP reporter (Fig. S1A).

      We then confirmed reporter and immunostaining data by qPCR showing Tlr2 expression in FACS-purified mouse HFSCs in anagen, telogen, and catagen (Fig.2J), in mouse epidermal cells and FACS-purified HFSCs (Fig.2K), and FACS-purified HFSCs isolated from Control and TLR2HFSC-KO mice (Fig.4E).

      The mechanistic link between TLR2 and BMP signaling was identified using RNAseq on FACS-purified HFSCs (Supplementary Fig.4), then verified using qPCR (Fig.4E shows Bmp7, Bmp2, Bmpr1a), and only then was immunohistochemical staining for BMP7 and phosphoSMAD1/5/9 used (Fig.4A-D, F-H). Note that a large body of the requested evidence is presented in the Supplementary data. Other mechanistic links shown using qPCR include Nfkb2, Il1b, Il6, and Bmp7 in FACS-purified mouse HFSCs treated with BSA control or CEP (Fig.6Q,6R).

      “As the reviewers note, it is not clear whether the TLR2+ signal is located at the basal side of bulge stem cells, basement membrane underlying bulge stem cells, or dermal sheath cells encapsulating bulge structure. Co-staining with basement membrane markers such as collagen and laminin or HFSC basal side membrane markers such as Itga6, Itgb1, and Itgb4 will clarify this. In addition, showing the expression pattern of TLR2 in full skin including epidermis and dermis would be helpful. As TLR2 is highly expressed in immune cells or blood endothelial cells, if the antibody staining is valid, strong positive signals should present in the cells. Moreover, testing the TLR2 antibody in Tlr2 knock-out mouse tissues would be an appropriate control experiment.”

      Once again, in most instances we used the TLR2 reporter mouse rather than staining for TLR2 (Fig.2 legend). The anti-TLR2 antibodies have been verified in the TLR2 KO as described above. Fig.2K shows a comparison of Tlr2 mRNA expression in mouse epidermal cells and FACS-purified HFSCs by qPCR.

      TLR2 signal is detected in several cell types within the hair follicle as well as in dermal cells surrounding the hair follicles, such as lymphocytes, resident tissue macrophages, fibroblasts, and fibroblast precursors (https://www.proteinatlas.org/ENSG00000137462-TLR2/single+cell+type). In Author response image 1 below, white arrows point to the TLR2-positive cells around the hair follicle. In our paper, we focus on HFSC TLR2 and use the respective inducible, tissue-specific TLR2 KO. The contribution of TLR2 on other cell types can be assessed by comparing the phenotypes of the global TLR2 KO, TLR2 KO-WT bone marrow chimeras, and the HFSC-specific TLR2 KO. The results are presented in both main and supplementary figures (Fig.5D-I and SFig.5I-K show the global TLR2 KO; Fig.6H-I and SFig.5G-h show bone marrow chimeras; and Figs.3,4, 5 (J-M), Fig.5 (J-N) show the main focus, the HFSC-TLR2 KO). Overall, the phenotype (delay of hair regeneration after wounding) seems to be strongest in the global TLR2 KO, whereas the bone marrow chimera and HFSC-specific KO phenotypes are comparable. Thus, TLR2 on bone marrow-derived cells complements the main role of TLR2 on HFSCs.

      Author response image 1.

      Staining for TLR2 (white), DAPI (blue) and Keratin 17 (purple) is shown.

      “The increase in expression of TLR2 during the hair follicle stem cell activation should be documented by FACS and/or qPCR. This is important because as noted by one of the reviewers.”

      While the original observation was made using both a TLR2 reporter mouse and immunostaining, the data were confirmed by qPCR showing Tlr2 mRNA expression in FACS-purified mouse HFSCs in anagen, telogen, and catagen (Fig.2J).

      “In Fig 1D, the authors mentioned that they re-analyzed published RNA-seq data (Greco et al., 2009) to show the increase of Tlr2 and Tlr6 expression in late telogen compared to early telogen. However, there is no RNA-seq data in that paper, but only microarray data of bulge vs HG comparison and dermal papillae cells (DP) in early, mid, late Telo. If the authors used DP data to show the increase of Tlr2 transcripts in late Telo, the analysis is completely wrong and has to be corrected. The problem is compounded by the fact that in other published HFSC RNA-seq datasets (Yang et al., Cell, 2017, Adam et al., Nature Cell Biology, 2020), the expression levels of Tlr2 and Tlr6 are very low (below 5 TPM). In Fig 1G, the authors also re-analyzed Morinaga et al., 2021 data to show the reduction of Tlr2 expression in HFSCs in high-fat diet mice. However, in the raw data of Morinaga et al., 2021 (GSE169173), Tlr2 expression FPKM values are below 1 in both normal diet and high-fat diet samples, which are too low to perform comparative analysis and are not statistically meaningful. Like Tlr2, the expressions of Tlr1 and Tlr6, which form heterodimer with TLR2, are almost 0. Thus, the authors should revisit the dataset and revise their analysis and conclusion.”

      “To document the existence of Tlr2 and Tlr6 expression in HFSCs, the authors should perform RNA-seq-based gene expression analysis by themselves. Otherwise, the authors' TLR2 expression analyses in Fig 1 are not convincing. These are serious issues that the authors will want to rectify so that eLife readers will not discount their findings and importance.”

      That is correct: we analyzed published microarray data, not RNAseq data (Greco et al., 2009), using the GEO2R tool, which allowed us to compare mRNA expression levels between early, middle, and late telogen in bulge CD34-positive cells. We changed “RNA-seq” (the term was used incorrectly) to “RNA microarray” in the main text.

      In our manuscript, TLR2 expression is documented not only in Fig.1, but also in Fig.2 and S.Fig.1. We utilized 3 different methods to document TLR2 expression: TLR2-reporter mouse, staining for TLR2 and qPCR of isolated cells for TLR2. Fig.2K shows comparison of Tlr2 mRNA expression in mouse epidermal cells to FACS-purified HFSCs by qPCR to document increased TLR2 expression on HFSCs. Likewise, Fig.2J shows qPCR for TLR2 on HFSC during various phases of hair growth.

      “In Fig 2, to support the expression of Tlr2 in HFSCs, the authors utilized TLR2-GFP mice and showed the strong GFP expression in HFSCs, hair bulb, and ORS. However, as the expression data in Fig 1 are questionable, the GFP reporter data should be carefully analyzed with proper control experiments. For example, although TLRs are highly expressed in immune cells and endothelial cells, which are abundantly present in skin, Fig 2 data did not show the GFP expression in these cells. Instead, the GFP signals looked very specific to epithelial compartments, which is odd. Again, to convince readers, the authors should provide more comprehensive analyses of expression patterns of TLR2-GFP mice in skin. Also, if the TLR2-GFP signals faithfully reflect the actual expression of Tlr2 mRNA, the GFP signals should increase in late telogen compared to early telogen. The authors should check whether TLR2-GFP expression follows this pattern.”

      The specificity of the TLR reporter was characterized in Price et al., 2018. A Map of Toll-like Receptor Expression in the Intestinal Epithelium Reveals Distinct Spatial, Cell Type-Specific, and Temporal Patterns. Immunity, 49. Thus, the TLR2 reporter mouse is well characterized (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6152941/) and represents one of the best available tools to show TLR2 expression.

      Expression of TLR2 on endothelial cells and validation of the anti-TLR2 antibodies were reported in McCoy et al., Science Signaling, as mentioned above. Also, as discussed above, we show a strong correlation between TLR2-GFP reporter expression and TLR2 expression using co-immunostaining with GFP and TLR2 antibodies, with appropriate isotype-matched non-immune antibodies as negative controls.

      There is no doubt that TLR2 is expressed on immune, endothelial and epithelial cells. According to the Human Protein Atlas, TLR2 expression is identified in skin fibroblasts, keratinocytes, melanocytes, etc., so our findings are well supported by the literature (https://www.proteinatlas.org/ENSG00000137462-TLR2/single+cell+type). Indeed, we detected TLR2 in cells surrounding the hair follicle (see the pictures above). TLR2 signal was detected in nearly all niches of hair follicles including the CD34-positive cells.

      In Fig.S1 we demonstrated an increased level of TLR2 in the late (competent) telogen compared to the early (refractory) telogen using immunostaining for TLR2-GFP. The results mirrored published RNA-array data in Fig.1D. Again, reporter and immunostaining results have been validated by qPCR for TLR2.

      The levels of TLR2 might be heavily influenced by the environment, i.e. pathogen availability. In this regard, note that mice for this study were kept in normal, not pathogen-free, conditions.

      “Overall, the existence of Tlr2 expression in HFSCs is still questionable. Without resolving these, genetic deletion of Tlr2 in HFSCs cannot be rationalized.”

      In our manuscript, TLR2 expression is documented not only in Fig.1, but also in Fig.2 and S.Fig.1. We utilized 3 different methods to document TLR2 expression: TLR2-reporter mouse, staining for TLR2 and qPCR of isolated cells for TLR2. Besides these data, we show the functional responses to canonical TLR2 ligand, PAM3CSK4, and previously characterized endogenous ligand, CEP, using proliferation, western blotting and many other approaches. In numerous immunostainings we show co-localization of TLR2 and CD34 (Fig.2) using IMARIS surface rendering and colocalization tools. Our conclusions are further supported by published results as discussed above.

      2) “The central conclusion of this study is that the activation of TLR2 can suppress BMP signaling; however, the molecular link between TLR2 and BMP signaling is still missing. Given the importance of this finding, it would be intriguing to further investigate how TLR2 activation suppresses BMP signaling. A better characterization of the molecular-level interaction between TLR2 and BMP signaling can further enhance the impact of this study.

      -The published dataset should be re-analyzed, as some images and their quantification do not appear to be matched. Representative images should be used.”

      “In Fig 4, the authors propose that the activation of TLR2 pathway inhibits the BMP signaling pathway, which makes HFSCs quiescent. In TLR2-HFSC-KO, the authors showed that BMP7 is increased and pSMAD1/5/9 is sustained. The increase in BMP7 expression and SMAD activation should be demonstrated by additional assays. Are SMAD target genes activated in the cKO mice?”

      This mechanistic link between TLR2 and BMP was originally identified by RNAseq, confirmed by qPCR, and then confirmed by immunostaining for both BMP7 and BMP pathway activation based on phosphoSMAD1/5/9 levels. The connection to the BMP pathway was also shown by western blotting (S.Fig.4B,C). The rescue experiments were performed using Noggin injections. According to our data, numerous SMAD target genes are upregulated in TLR2-HFSC-KO, such as Kank2, Ptk2b, Scarf2, Camk1, Dpysl2, as well as BMP2 and BMP7, and these changes were confirmed by qPCR analysis in Fig.4E. Additional evidence is shown in Fig.6, which demonstrates that the endogenous TLR2 ligand CEP (carboxyethylpyrrole) acts by a similar, BMP-dependent pathway. Supplemental Fig.4 also adds more details to this link. SFig.4B,C shows that TLR2 activation by the canonical ligand PAM3CSK4 inhibits pSMAD levels induced by BMP (western blot is shown). At the same time, as anticipated, PAM3CSK4 upregulated NFkB; however, little or no effect of BMP stimulation on NFkB is observed. To summarize: TLR2 affects both BMP7 production and BMP-induced downstream signaling, as judged by phosphoSMADs. The latter connection appears to go in one direction: TLR2 signaling affects BMP-induced pSMADs, but BMP signaling does not seem to substantially change TLR2-dependent NFkB. We plan to delve into the intersection of these important pathways in the future.

      “Functionally, downregulation of BMP signaling by injecting Noggin, a BMP antagonist, in TLR2HFSC-KO mice induces HFSC proliferation. These functional data are solid. However, it is still curious how TLR2 signaling interact with BMP pathway molecularly. Is it transcriptional regulation or translational regulation? Perhaps, RNA-seq analysis of TLR2HFSC-KO could give some hints to answer this question. Furthermore, checking out other signaling pathways such as WNT/LEF1 and pCREB, which are important for hair cycle activation and NFkB, a downstream effector of TLR signaling would be helpful to interrogate mechanistic insights.”

      As discussed above, TLR2 affects both BMP7 production and BMP-induced downstream signaling, as judged by phosphoSMADs. The latter connection appears to go in one direction: TLR2 signaling affects BMP-induced pSMADs, but BMP signaling does not seem to substantially change TLR2-dependent NFkB.

      Indeed, in addition to BMP signaling, Wnt signaling and β-catenin stabilization within HFSCs are known to trigger their activation (Deschene et al., 2014). However, this axis remained unchanged upon TLR2HFSC-KO (as shown in Supplementary Fig. 4J). Several published reports describe crosstalk between TLR and BMP signaling, for example one (doi: 10.1089/scd.2013.0345. Epub 2013 Nov 7) showing that activation of TLR4 inhibits BMP-induced pSMAD1/5/8 and that this connection requires NFkB. We probed NFkB activation; please see the responses above.

      However, we were not able to detect a substantial effect of NFkB inhibition on BMP signaling in hair follicles (not shown).

      3) “The function of CEP, a proposed endogenous ligand of TLR2, is still not clear. The authors imply that the decreased CEP level in aged mice could lead to deficient TLR2 signaling, which could further cause aging-associated hair regeneration defects. But this has not been demonstrated. What are the BMPs and pSmad1/5 levels in aged skin? Another important experiment to confirm the importance of this link during aging would be to inject CEP into the aged skin and examine whether this could restore hair regeneration in aged mice. Does CEP activate hair cycling during the endogenous pathway? What might be the source of CEP? Does CEP treatment activate BMP7 signaling? The authors should clarify these issues. The authors suggested that CEP is an endogenous ligand of TLR2, and administration of CEP induces hair cycle entry in a TLR2-dependent manner. How potent is CEP in terms of HFSC activation? In Fig 6Q, CEP increases the expression of Nfkb2, Il1b, and Il6, but the fold changes are marginal. Also, if CEP is a critical ligand, the loss of CEP by a genetic deletion or a pharmacological inhibition should result in the delay of hair cycle entry. Furthermore, the source of CEP expression is curious. Is it expressed by HFSCs or dermal fibroblast or immune cells? Finally, comparing the effect of CEP to the effect of other bacterial origin Tlr2 ligands such as heat killed bacteria, purified microbial cell-wall components, and synthetic agonists (Pam3CSK4) would be helpful. It is curious if HFSC directly senses the bacterial materials and triggers hair follicle regeneration or are indirectly directed by immune cells and endothelial cells, which could be primary sensor.”

      CEP is not a protein; it is an oxidative stress-generated metabolite of the polyunsaturated fatty acid DHA (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5360178/), so it is impossible to generate a knockout of this molecule. As demonstrated in previous publications (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2990914/, https://pubmed.ncbi.nlm.nih.gov/34871763/), CEP serves as a critical endogenous ligand supporting TLR2 signaling in the absence of pathogens. While other endogenous TLR2 ligands, such as HMGBs or HSPs, exist (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4373479/), CEP binds to TLR2 directly, and its generation is aided by MPO (myeloperoxidase), among other peroxidases and sources of reactive oxygen/nitrogen species. MPO (produced by immune cells, among others) serves as an innate immune response against pathogens, but it also generates CEP adducts (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6034644/) in both protein and lipid form. The knockout of MPO diminishes CEP generation in skin (PMC6034644), thereby demonstrating the causative relationship between CEP and MPO.

      Author response image 2.

      Additional immunostaining of mouse skin for Keratin 17 (purple), CEP (green) and MPO (red). Similar staining is in S.Fig.5A and quantification is in S.Fig.5B.

      Also, the above-mentioned manuscripts show that CEP effects are milder than, but overall comparable with, those of the canonical TLR2 agonist PAM3CSK4. As we mention in the present manuscript, normal young mice’s tissues are devoid of CEP (which is generated in response to inflammation), with the exception of hair follicles. This is likely attributable to the secretion of MPO by hair follicles (PMID: 36402231), especially in conditions of inflammation (PMID: 32893875). Supplementary Fig. 5A,B shows that MPO is present at a high level in the sebaceous gland (as part of an anti-microbial mechanism). Again, MPO is a secreted enzyme, and it is likely to be a source of continuous DHA oxidation into CEP in hair follicles. We also document that both TLR2 and CEP levels in hair follicles (but not in other tissues, an important point for CEP) are reduced in aging. Likewise, Supplementary Fig. 5A,B shows that MPO secretion in hair follicles is reduced by more than 60% in aging mice. Thus, it is likely that reduced MPO levels in aging hair follicles produce less CEP. Together with reduced TLR2 levels, the lack of CEP might contribute to hair loss in aging.

      We show that, similar to TLR2, CEP in hair follicles operates via a BMP-7-dependent pathway (see Fig. 6). We also provide results using the canonical bacterial ligand for TLR2, PAM3CSK4, whose effect on HFSC proliferation is similar to that of CEP and is likewise TLR2-dependent. TLR2 blocking approaches were used (Supp. Fig.4B, C, D, E, Supp. Fig.5D-5F). It remains to be seen whether CEP is required for normal hair cycling and whether its administration might improve hair loss in aging subjects.

      “The impacts of CEP/TLR2 on proliferation of keratinocytes is still weak. How much of this effect is a result of NFkB activation, and how much is simply due to inhibiting BMP signaling?”

      The impact of TLR2 on proliferation was demonstrated using a variety of mouse models, from global TLR2 KO to bone marrow chimeras to HFSC-specific TLR2 KO, again using multiple approaches. The same applies to the effects of CEP and of the canonical TLR2 ligand PAM3CSK4, which were demonstrated both in vivo and in culture to be TLR2-dependent (Fig. 6M-O and Supplementary Fig. 4E-D). As for the NFkB connection, see our responses above. It seems that the connection between TLR2 and the BMP pathway occurs independently of NFkB activation.

      4) “The links between TLR2 pathway and aging and obesity are only correlative. Although the authors suggest that the reduction of TLR2 expression in aging and obesity may diminish hair growth (Fig 1), there is no direct functional evidence that supports this possibility. If the authors wish to make this claim, they should test the roles of TLR2 and CEP in aging and obesity conditions.”

      We show that both TLR2 and CEP are reduced in aging and that this pathway contributes to hair cycling and regeneration upon wounding. We do not wish to claim more.

      5) More minor points:

      “Fig.4: The Noggin treatment in TLR2 KO mice is an important experiment. However, it is unclear why Noggin only enhances proliferation (Ki67 level) in HG but not in the bulge. This discrepancy should be addressed.”

      As we showed in Fig. 3B-3F, TLR2HFSC-KO mice have a prolonged first telogen. Noggin treatment at the first postnatal telogen promotes the telogen-to-anagen transition in TLR2HFSC-KO mice, characterized by the activation of HG cells prior to the bulge cells. According to the literature, bulge cells remain silent during late telogen, whereas HG cells become Ki67-positive, and their proliferation contributes to the telogen-to-anagen transition.

      (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2668200/

      https://www.sciencedirect.com/science/article/pii/S0022202X15404518?via%3Dihub

      https://journals.biologists.com/jcs/article/114/19/3419/34892/Hair-follicle-predetermination).

      “Fig.5: Does TLR2 cKO slow down wound healing, in addition to affecting pigmentation and the number of hair follicles?”

      In our previous publication, we demonstrated that deletion of TLR2 in HFSCs does not affect the wound healing process. Instead, endothelial TLR2 promotes wound vascularization and healing.

      (see Xiong et al., Timely Wound Healing Is Dependent on Endothelial but Not on Hair Follicle Stem Cell Toll-Like Receptor 2 Signaling. Journal of Investigative Dermatology, Volume 142, Issue 11, November 2022, Pages 3082-3092.e1).

      “There is no panel B in Fig.4. There is no image in Fig 4D. Please correct this properly.”

      We corrected Fig. 4.

      “Discussion: The constant production of CEP in homeostatic skin and in the absence of inflammation should be further discussed. Additionally, the possible causes of reducing CEP levels during aging should also be further discussed.”

      We explained the sources of CEP generation, such as MPO as one of the key enzymes, above. The data on MPO levels in hair follicles of young and old mice are presented in Supplementary Fig.5A,B. Since we have previously shown that MPO produces CEP from DHA (PMC6034644), the reduction in MPO in aging is likely to contribute to reduced CEP levels.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      α-synuclein (syn) is a critical protein involved in many aspects of human health and disease. Previous studies have demonstrated that post-translational modifications (PTMs) play an important role in regulating the structural dynamics of syn. However, how post-translational modifications regulate syn function remains unclear. In this manuscript, Wang et al. reported an exciting discovery that N-acetylation of syn enhances the clustering of synaptic vesicles (SVs) through its interaction with lysophosphatidylcholine (LPC). Using an array of biochemical reconstitution, single vesicle imaging, and structural approaches, the authors uncovered that N-acetylation caused distinct oligomerization of syn in the presence of LPC, which is directly related to the level of SV clustering. This work provides novel insights into the regulation of synaptic transmission by syn and might also shed light on new ways to control neurological disorders caused by syn mutations.

      We thank the reviewer for appreciating the importance of our work and his/her positive comments.

      Reviewer #1 (Recommendations For The Authors):

      (1) The authors employed DLS to quantify the percentage of SV clustering in Fig. 1c and d. As DLS usually measures particle size distribution, I am not sure how the data was plotted in Fig. 1c and d. It would be great to show a representative raw dataset here.

      We thank the reviewer for the comment. To address this, we now provide four representative DLS datasets of different α-Syn variants mediating SV clustering (Author response image 1). Besides presenting the particle distribution based on light scattering intensity, DLS can convert the intensity to a particle size distribution based on particle number counts. In our analysis, particle diameters around 50 nm are considered to represent single SV species, whereas diameters larger than 120 nm indicate SV clusters. Specifically, as shown in Author response image 1, adding Ac-α-syn to a homogeneous SV sample altered the distribution from one single SV particle species (Author response image 1d) to three distinct species (Author response image 1a); this resulted in 68.5% of the particles being single SVs and 31.5% being SV clusters.

      Author response image 1.

      Representative raw dataset of α-Syn-mediated synaptic vesicle (SV) clustering monitored by dynamic light scattering (DLS). The grey-colored rows represent small particles (< 5 nm) that contributed zero to the particle number count.
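
      The classification rule described in the response (single SVs around 50 nm, clusters above 120 nm, number-weighted distribution) can be illustrated with a minimal sketch. The function name, the table layout, and the peak values below are hypothetical and are not the authors' analysis code or raw data; only the 120 nm threshold and the 68.5% / 31.5% split come from the response. Treating every particle at or below the threshold as non-clustered is an assumption.

```python
def cluster_percentage(peaks, cluster_min_nm=120.0):
    """Return the percentage of particles (by number) counted as SV clusters.

    peaks: list of (diameter_nm, number_percent) rows from a number-weighted
    DLS size distribution. Rows with zero number contribution (for example
    very small particles) simply add nothing to either total.
    """
    total = sum(pct for _, pct in peaks)
    if total <= 0:
        raise ValueError("distribution has no particle counts")
    # Assumption: anything larger than the threshold is a cluster,
    # everything else is treated as non-clustered (single SVs).
    clustered = sum(pct for diameter, pct in peaks if diameter > cluster_min_nm)
    return 100.0 * clustered / total


# Hypothetical three-species distribution plus one zero-count small species.
example_peaks = [(3.0, 0.0), (52.0, 68.5), (150.0, 20.0), (400.0, 11.5)]
print(cluster_percentage(example_peaks))
```

      With these made-up rows, the two species above 120 nm together account for the clustered fraction, and the remainder is counted as single SVs.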

      (2) Syn-lipid interactions are known to be altered by mutations involved in neurodegenerative diseases. I am wondering how those mutations will affect SV clustering mediated by the interaction of LPC with N-acetylated syn.

      We thank the reviewer for the insightful comment. Our data indicate that N-acetylation enhances the binding of the N-terminal region of α-syn to LPC, thereby facilitating SV clustering. This enhancement arises because N-acetylation effectively neutralizes the positive charge of α-syn’s N-terminal region, promoting its insertion into LPC-rich membranes through hydrophobic interactions. Therefore, we envision that any mutation that weakens the membrane-binding capability of N-terminally unmodified α-Syn may also decrease SV clustering mediated by the interaction between Ac-α-syn and LPC.

      In separate work (doi: 10.1093/nsr/nwae182, Fig. S8), we compared the binding affinity of LPC for wild-type N-terminally unmodified α-syn and six Parkinson’s disease (PD) familial mutants (A30P, E46K, H50Q, G51D, A53E, and A53T). Among these, only the A30P mutation showed a significant decrease in binding to LPC. Furthermore, using the same single-vesicle assay setup, in another paper (doi: 10.1073/pnas.2310174120, Fig. 4C), we demonstrated that A30P-mutated α-Syn lost its ability to facilitate SV clustering. Therefore, among the six PD mutations, the A30P mutation may significantly impact the SV clustering mediated by the interaction between Ac-α-syn and LPC.

      (3) The crosslinking data in Fig. 4 was obtained using LPC or PS liposomes. I am wondering if these results truly mimic physiological conditions. Could the authors use SVs for these experiments?

      We thank the reviewer for the suggestion. To elucidate the mechanistic differences between N-terminally unmodified α-syn and N-acetylated α-syn, we utilized pure LPC and PS liposomes for clarity. Using natural-source SVs, which contain many synaptic proteins, could complicate or obscure the interaction patterns of Ac-α-syn due to potential crosstalk with other SV proteins. Additionally, the complex lipid environment of SV membranes would not help us decipher the specific molecular mechanism by which Ac-α-Syn facilitates SV clustering through LPC.

      Reviewer #2 (Public Review):

      Summary:

      In this manuscript, the authors provide evidence that posttranslational modification of synuclein by N-acetylation increases clustering of synaptic vesicles in vitro. When using liposomes the authors found that while clustering is enhanced by the presence of either lysophosphatidylcholine (LPC) or phosphatidylcholine in the membrane, N-acetylation enhanced clustering only in the presence of LPC. Enhancement of binding was also observed when LPC micelles were used, which was corroborated by increased intra/intermolecular cross-linking of N-acetylated synuclein in the presence of LPC.

      Strengths:

      It is known for many years that synuclein binds to synaptic vesicles but the physiological role of this interaction is still debated. The strength of this manuscript is clearly in the structural characterization of the interaction of synuclein and lipids (involving NMR-spectroscopy) showing that the N-terminal 100 residues of synuclein are involved in LPC-interaction, and the demonstration that N-acetylation enhances the interaction between synuclein and LPC.

      We thank the reviewer for their positive assessment of our work.

      Weaknesses:

      Lysophosphatides form detergent-like micelles that destabilize membranes, with their steady-state concentrations in native membranes being low, questioning the significance of the findings. Oddly, no difference in binding between the N-acetylated and unmodified form was observed when the acidic phospholipid phosphatidylserine was included. It remains unclear to which extent binding to LPC is physiologically relevant, particularly in the light of recent reports from other laboratories showing that synuclein may interact with liquid-liquid phases of synapsin I that were reported to cause vesicle clustering.

      We appreciate the reviewer’s insightful comments. Indeed, in another paper (doi: 10.1093/nsr/nwae182), employing a conventional α-Syn pull-down assay and an LC-MS lipidomics method, we found that α-Syn has a preference for binding to lysophospholipids across in vivo and in vitro systems. Additionally, by comparing the lipid compositions of mouse brains, SVs and SV lipid-raft membranes, we found LPC levels to be twice as high in SVs compared to brain homogenates, and twice as high in lipid-raft membranes compared to non-lipid-raft membranes. Altogether, these findings emphasize the physiological relevance of understanding the mechanism by which Ac-α-syn mediates SV clustering through LPC.

      Liquid-liquid phase separation has been implicated in the assembly and maintenance of SV clusters, and we believe that the SV cluster liquid phase is interconnected by highly abundant proteins with multivalent low-affinity interactions. Besides the previously discovered protein-protein interactions between α-Syn and synapsin (doi: 10.1016/j.jmb.2021.166961) or VAMP2 (doi: 10.1038/s41556-024-01456-1) that contribute to SV condensates, protein-lipid interactions between α-Syn and acidic phospholipids or LPC may also play a role. Furthermore, post-translational modifications, such as N-acetylation of α-Syn, may also contribute to SV condensates.

      Reviewer #2 (Recommendations For The Authors):

      In Fig. 2, the authors indicate that for the binding assay both vesicle populations, the immobilized "acceptor" and the superfused "donor" population were labeled with different fluorescent dyes whereas in the text it is stated that the immobilized acceptor liposomes were unlabeled. Please clarify. Moreover, a control is missing showing that binding indeed depends on the immobilised liposome fraction and does not occur in their absence. This control is important because due to the long incubation times non-specific adsorption may occur which may be enhanced by adding destabilizing LPC or charged PS to the membrane.

      We thank the reviewer for pointing out this inconsistency. To avoid signal leakage from a high concentration of DiD vesicles upon green laser irradiation, we immobilized unlabeled vesicles. We have revised Figure 2a as well as the figure caption.

      Regarding the control mentioned by the reviewer, we agree that non-specific binding could occur with the long incubation. In fact, the layer of highly dense liposomes (100 μM) immobilized on the imaging surface also serves to reduce non-specific interactions. In the absence of this layer of immobilized liposomes, we did see a high level of non-specific binding that significantly impacted our experiments. Therefore, the clustering experiments had to be performed in the presence of immobilized liposomes.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The manuscript co-authored by Pál Barzó et al is very clear and very well written, demonstrating the electrophysiological and morphological properties of human cortical layer 2/3 pyramidal cells across a wide age range, from age 1 month to 85 years using whole-cell patch clamp. To my knowledge, this is the first study that looks at the cross-age differences in biophysical and morphological properties of human cortical pyramidal cells. The community will also appreciate the significant effort involved in recording data from 485 cells, given the challenges associated with collecting data from human tissue. Understanding the electrophysiological properties of individual cells, which are essential for brain function, is crucial for comprehending human cortical circuits. I think this research enhances our knowledge of how biophysical properties change over time in the human cortex. I also think that by building models of human single cells at different ages using these data, we can develop more accurate representations of brain function. This, in turn, provides valuable insights into human cortical circuits and function and helps in predicting changes in biophysical properties in both health and disease.

      Strengths:

      The strength of this work lies in demonstrating how the electrophysiological and morphological features of human cortical layer 2/3 pyramidal cells change with age, offering crucial insights into brain function throughout life.

      Weaknesses:

      One potential weakness of the paper is that the methodology could be clearer, especially in how different cells were used for various electrophysiological measurements and the conditions under which the recordings were made. Clarifying these points would improve the study's rigor and make the results easier to interpret.

      Reviewer #2 (Public review):

      Summary:

      In this study, Barzo and colleagues aim to establish an appraisal for the development of basal electrophysiology of human layer 2/3 pyramidal cells across life and compare their morphological features at the same ages.

      Strengths:

      The authors have generated recordings from an impressive array of patient samples, allowing them to directly compare the same electrophysiological features as a function of age and other biological features. These data are extremely robust and well organised.

      Weaknesses:

      The use of spine density and shape characteristics is performed from an extremely limited sample (2 individuals). How reflective these data are of the population is not possible to interpret. Furthermore, these data assume that spines fall into discrete types - which is an increasingly controversial assumption.

      Many data are shown according to somewhat arbitrary age ranges. It would have been more informative to plot by absolute age, and then perform more rigorous statistics to test age-dependent effects.

      Overall, the authors achieve their aims by assessing the physiological and morphological properties of human L2/3 pyramidal neurons across life. Their findings have extremely important ramifications for our understanding of human life and implications for how different neuronal properties may influence neurological conditions.

      Reviewer #3 (Public review):

      Summary:

      To understand the specificity of age-dependent changes in the human neocortex, this paper investigated the electrophysiological and morphological characteristics of pyramidal cells in a wide age range from infants to the elderly.

      The results show that some electrophysiological characteristics change with age, particularly in early childhood. In contrast, the larger morphological structures, such as the spatial extent and branching frequency of dendrites, remained largely stable from infancy to old age. On the other hand, the shape of dendritic spines is considered immature in infancy, i.e., the proportion of mushroom-shaped spines increases with age.

      Strengths:

      Whole-cell recordings and intracellular staining of pyramidal cells in defined areas of the human neocortex allowed the authors to compare quantitative parameters of electrophysiological and morphological properties between finely divided age groups.

      They succeeded in finding symmetrical changes specific to both infants and the elderly, and asymmetrical changes specific to either infants or the elderly. The similarity of pyramidal cell characteristics between areas is unexpected.

      Weaknesses:

      Human L2/3 pyramidal cells are thought to be heterogeneous, as L2/3 has expanded to a high degree during the evolution from rodents to humans. However, the diversity (subtyping) is not revealed in this paper.

      Recommendations for the authors: 

      Reviewer #1 (Recommendations for the authors):

      The manuscript co-authored by Pál Barzó et al is very clear and very well written, demonstrating the electrophysiological and morphological properties of the human cortical layer 2/3 pyramidal cells across a wide age range, from age 1 month to 85 years using whole-cell patch clamp. To my knowledge, this is the first study that looks at the cross-age differences in morphological and electrophysiological properties of human cortical pyramidal cells. The community will also appreciate the significant effort involved in recording data from 485 cells, given the challenges associated with collecting data from human tissue. Understanding the electrophysiological properties of individual cells, which are essential for brain function, is crucial for comprehending human cortical circuits. I think this research enhances our knowledge of how biophysical properties change over time in the human cortex. I also think that by building models of human single cells at different ages using these data, we can develop more accurate representations of brain function. This, in turn, provides valuable insights into human cortical circuits and function and helps in predicting changes in biophysical properties in both health and disease.

      We are grateful for the positive evaluation of our work. We also thank the reviewers for their comments and believe that our manuscript has improved significantly with their help. In addition to the improvements suggested by the reviewers, further cell reconstructions were performed to make the anatomical data more robust (n = 1, 2, 3, 3, 4, 3, 2 additional reconstructions in the age groups infant, early childhood, late childhood, adolescence, young adulthood, middle adulthood and late adulthood, respectively; Σn = 18). Four additional cells were added to the spine analysis, and the statistics associated with each extended dataset were updated.

      I have some comments, particularly regarding the methodology and data presentation, to improve the clarity of the paper

      (1) I assume the tissue is from the resected area adjacent to the tumor. Could you please clarify this in the Methods section?

      Thank you for this comment. It has been clarified in the Methods section with the following sentence: “We used human cortical tissue adjacent to the pathological lesion that had to be surgically removed from patients (n = 63 female, n = 45 male) as part of the treatment for tumors, hydrocephalus, apoplexy, cysts, and arteriovenous malformation.”

      (2) Regarding the presentation of data in the Methods section, could you please clarify whether the authors used different cells for measuring the various electrophysiological properties? The number of recorded cells for calculating subthreshold properties (e.g., late adulthood: n = 113) differs from the number the cells used for calculating suprathreshold properties (e.g., late adulthood: n = 83). If this is the case, it may make it difficult to compare the electrophysiological properties. Could you please clarify this?

      The different sample sizes are indeed due to the fact that different quality criteria were defined for the analysis of fast and slow signals. For the analysis of fast signals (e.g. AP half-width, AP upstroke velocity, AP amplitude), higher quality requirements were established; therefore, cells with high series resistance (> 30 MΩ) were excluded. We have updated and clarified the recording conditions in the text, figures, and methodology section accordingly.

      (3) Additionally, they mentioned that their recordings were done at zero holding current and at more than -50 pA. Could you clarify whether the data from these two sets of experiments were combined? If so, please provide an explanation in the methods section.

      Essentially, we wanted to determine the parameters of membrane potential changes at rest. However, for technical reasons related to the biological amplifier, a certain continuous holding current may have been present during the measurement in some experiments (3.5% of all experiments). The holding currents were in the range of -50 pA to +60 pA. Within this range, which we had previously checked on mouse neurons, we found no linear correlation between the electrophysiological properties and the holding current. This is reported in the Methods section.

      (4) This section needs revision. It is unclear why different series resistances (Rs) or different cells were used to compute various electrophysiological properties. "To calculate passive membrane properties (resting membrane potential, input resistance, time constant, and sag) either cells with series resistance (Rs): 22.85 ± 9.04 MΩ (ranging between -4.55 MΩ and 56.76 MΩ) and 0 pA holding current (n = 154), or cells with holding current > -50 pA (-7.46 ± 28.56 pA, min: -49.89 pA, max: 59.68 pA) and Rs < 30 MΩ (18.96 ± 6.48 MΩ) (n = 23) were used. For the analysis of high frequency action potential features (AP half-width, AP up-stroke velocity, AP amplitude and rheobase) cells with Rs < 30 MΩ (n = 331 cells with Rs 19.2 ± 6.6 MΩ) and holding current > -50 pA (n = 308 with 0 pA holding current and Rs: 19.22 ± 6.59 MΩ, n = 23 with holding current: -7.46 ± 28.56 pA and Rs: 18.96 ± 6.48 MΩ) were used."

      To make the section clearer, we simplified the cell groups used to analyse the different electrophysiological properties and revised the Methods section as follows: “For the analysis of the electrophysiological recordings n = 457 recordings with a series resistance (Rs) of 24.93 ± 11.18 MΩ (max: 63.77 MΩ) were used. For the analysis of fast parameters related to the action potential (AP half-width, AP upstroke velocity, AP amplitude and rheobase), higher quality requirements were set and cells with Rs > 30 MΩ were excluded. This reduced the data set to n = 331 cells with Rs 19.42 ± 6.2 MΩ.”
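
      The two-tier inclusion rule quoted above (all recordings for passive parameters, only recordings with Rs not exceeding 30 MΩ for fast action potential parameters) can be sketched as follows. This is an illustrative sketch, not the authors' pipeline: the record layout, field names, and example values are hypothetical, and reporting the spread as a sample standard deviation is an assumption.

```python
from statistics import mean, stdev

RS_CUTOFF_MOHM = 30.0  # cutoff stated in the revised Methods text


def select_for_fast_features(cells, cutoff=RS_CUTOFF_MOHM):
    """Keep recordings whose series resistance does not exceed the cutoff."""
    return [cell for cell in cells if cell["rs_mohm"] <= cutoff]


def summarize_rs(cells):
    """Return (mean, sample SD) of series resistance in MOhm."""
    values = [cell["rs_mohm"] for cell in cells]
    return mean(values), stdev(values)


# Hypothetical recordings; identifiers and values are made up.
recordings = [
    {"id": "cell_a", "rs_mohm": 12.0},
    {"id": "cell_b", "rs_mohm": 30.0},
    {"id": "cell_c", "rs_mohm": 45.5},
    {"id": "cell_d", "rs_mohm": 22.0},
]

fast_subset = select_for_fast_features(recordings)
print([cell["id"] for cell in fast_subset])
print(summarize_rs(fast_subset))
```

      Applying the cutoff only to the fast parameters keeps the larger sample for slow, passive measurements, which is why the two analyses report different n values.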

      (5) The authors recorded the sag ratio using a -100 pA injected current. Is there a technical reason why they did not inject more than -100 pA?

      There is no particular technical reason; like others, we have used this current amplitude for voltage-response recordings over the years.

      (6) In the abstract, the authors mentioned that data were recorded from ages 1 month to 85 years. However, in the results, they stated that data were recorded from ages 0 to 85 years. Could you please clarify this discrepancy?

      We corrected this discrepancy.

      (7) Additionally, the results mention that data were collected from 485 human cortical layer 2/3 (L2/3) pyramidal cells, but subthreshold membrane features such as resting membrane potential, input resistance, time constant (tau), and sag ratio were calculated in 475 cortical pyramidal cells from 99 patients. Could you please clarify these discrepancies? In the discussion "We recorded from n = 457 human cortical excitatory pyramidal cells from the supragranular layer from birth to 85 years"

      Thank you for pointing this out, we have corrected the error. Although our full data set contained 485 pyramidal cells, 28 recordings were excluded from the electrophysiological analysis and were used for morphological evaluation only, therefore 457 recordings were used for passive parameter measurements.

      (8) Regarding the distance from the pia to the border layer L1/L2, did the authors notice any differences across ages?

      To investigate whether the thickness of cortical layer 1 changes throughout life, we measured the L1 thickness and found no significant differences between age groups (P = 0.09, Kruskal-Wallis test) (Author response image 1).

      Author response image 1.

      Thickness of cortical layer 1 at different life stages. (A) Boxplot shows the thickness of layer 1. (B) Scatter plot shows the distribution of L1 thickness measured on the reconstructed cells. Age is shown in years on a logarithmic scale, dots are color-coded according to the corresponding age groups.
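
      For readers unfamiliar with the test named in the response, the Kruskal-Wallis H statistic can be computed from pooled ranks as in the sketch below. This is a generic textbook implementation with average ranks and the standard tie correction, not the authors' code, and the function names are chosen here for illustration. The p-value is not computed; it is usually obtained by comparing H with a chi-square distribution with k - 1 degrees of freedom for k groups.

```python
def average_ranks(values):
    """Return 1-based ranks (ties get the mean rank) and the tie-group sizes."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    tie_sizes = []
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        tie_sizes.append(j - i + 1)
        i = j + 1
    return ranks, tie_sizes


def kruskal_wallis_h(groups):
    """Tie-corrected Kruskal-Wallis H for a list of numeric samples.

    Raises ZeroDivisionError if every pooled value is identical.
    """
    pooled = [value for group in groups for value in group]
    n_total = len(pooled)
    ranks, tie_sizes = average_ranks(pooled)
    weighted = 0.0
    start = 0
    for group in groups:
        rank_sum = sum(ranks[start:start + len(group)])
        weighted += rank_sum ** 2 / len(group)
        start += len(group)
    h = 12.0 / (n_total * (n_total + 1)) * weighted - 3.0 * (n_total + 1)
    correction = 1.0 - sum(t ** 3 - t for t in tie_sizes) / (n_total ** 3 - n_total)
    return h / correction


# Arbitrary illustrative numbers, not measured thickness values.
print(kruskal_wallis_h([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))
```

      Because the test works on ranks, it does not assume normally distributed thickness values, which suits comparisons across age groups of unequal size.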

      (9) I am not sure why they referred to the data as layer 2/3 when most of the data, based on Figure 1E, were recorded from a distance of 0-200 µm from the L1/L2 border. Could it be that there is no significant depth-dependent variation in electrophysiological properties, as reported by Berg (2021), Kalmbach (2018), and Chameh (2021)?

      Although the vast majority of our data comes from a distance of less than 200 μm from the L1/L2 border, we cannot neglect the fact that our dataset also contains a small number of cells deeper than this, which are layer 3 cells. Apart from some differences shown in Supplementary Figures 7-9, we found no general difference between cells located at a distance of less than 200 μm and more than 200 μm from the L1 border.

      (10) In Figure 1, there is variability in resting membrane potential (RMP), tau, and input resistance (IR) within the infant age group. However, this trend is not observed in the sag ratio. Could you please discuss this finding?

      The large variance in the data is due to dramatic changes in these three parameters during the first year of life. Supplementary Figure 3 compares the parameter distributions of patients aged 0-6 months and 6-12 months. The sag amplitude in these cells is generally low, so changes of comparable magnitude could not have occurred in this parameter.

      (11) Did the authors use a K-Nearest Neighbors (KNN) test to assess the accuracy of the infant cluster in Figure 3F?

      Based on eight electrophysiological features of the cells (resting Vm, input resistance, tau, sag ratio, rheobase, AP half-width, AP up-stroke, and AP amplitude), the infant pyramidal cells on a UMAP form a distinct group (Author response image 2A) represented by cluster 4 on Author response image 2B. When calculating the sum of the Euclidean distances of cells within the cluster from the centroid, the isolated infant group (cluster 4) shows the smallest distance value from the centroid (cluster 1: 40.2, cluster 2: 36.21, cluster 3: 39.96, cluster 4: 5.72, cluster 5: 39.2, cluster 6: 55.74, cluster 7: 54.27), demonstrating that infant cells create a discrete cluster distinct from other age groups (Author response image 2B).

      Author response image 2.

      (A) Uniform Manifold Approximation and Projection (UMAP) of 8 selected electrophysiological properties (resting Vm, input resistance, tau, sag ratio, rheobase, AP half-width, AP up-stroke, and AP amplitude) with data points for 331 cortical L2/3 pyramidal cells, colored with the corresponding age groups. (B) UMAP colored by k-means clustering with 7 clusters, red crosses represent the centroids of the clusters.
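
      The compactness measure described in the response, the sum of Euclidean distances of the cells in each cluster from that cluster's centroid, can be sketched in a few lines. The function names and the toy coordinates are hypothetical; the sketch assumes that cluster labels (for example from k-means) and per-cell coordinates are already available, and it does not reproduce the authors' UMAP or clustering steps.

```python
import math
from collections import defaultdict


def centroid(points):
    """Coordinate-wise mean of a non-empty list of equal-length points."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))


def within_cluster_distance_sums(points, labels):
    """Map each cluster label to the summed distance of its points to its centroid."""
    groups = defaultdict(list)
    for point, label in zip(points, labels):
        groups[label].append(point)
    sums = {}
    for label, members in groups.items():
        center = centroid(members)
        sums[label] = sum(math.dist(p, center) for p in members)
    return sums


# Toy two-dimensional embedding with two clusters (made-up coordinates).
toy_points = [(0.0, 0.0), (2.0, 0.0), (10.0, 10.0), (10.0, 14.0), (10.0, 12.0)]
toy_labels = [1, 1, 2, 2, 2]
print(within_cluster_distance_sums(toy_points, toy_labels))
```

      A summed distance grows with the number of cells in a cluster, so dividing by cluster size (a mean distance) is one option when clusters of very different sizes are compared; the response reports sums.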

      (12) Missing citation: 'Previous research has shown that the biophysical properties of human pyramidal cells show depth-related correlations throughout L2/3 (Berg et al., 2021).' Please include citations for Kalmbach (2018) and Chameh (2021).

      We thank the reviewer for the additional references; these studies are now cited.

      (13) Have they noticed any differences in morphological properties among the different cortical lobes (parietal, temporal, frontal, and occipital)? It would be beneficial to present this data, especially since they have a sufficient sample size from each cortical lobe.

      The majority of our data set on the morphological properties of pyramidal cells comes from the parietal (n = 17 cells) and temporal lobe (n = 15). We found no significant differences in the morphological properties of cells from these two brain regions and no differences between age groups in the same cortical lobes.

      (14) Have the authors found differences in spine characteristics among different cortical areas, as reported previously (10.1023/a:1024134312173)?

      We found morphological differences in dendritic spines between the different brain regions; however, our data are too limited to draw definitive conclusions.

      Reviewer #2 (Recommendations for the authors):

      Major

      (1) I believe that these data presented in all main text figures would be more intuitive to be plotted on a log(age) scale, such as shown in supplementary Figure 13. The bounds of the ages used for different groups, as summarised in Figure 1 feel somewhat arbitrary.

      Recent neuroscientific studies on postnatal ageing mainly use the age-group comparison format (Kang 2011, Bethlehem 2022), which has been defined based on milestones in the cognitive, motor, social-emotional, and language/communications domains of observable behaviour (Zubler et al. 2022, for detailed definitions see Kang 2011). Since many parameters do not vary linearly but take a U-shape (or inverted U-shape), statistical quantification of these is not straightforward, so we would prefer to retain the age-group format for the main graphs. However, at the reviewer's suggestion, electrophysiological and morphological parameters are presented on a log(age) scale as supplementary figures (Supplementary Figures 2, 4 and 6), and further statistical analysis was carried out without grouping the data (see response 5).

      (2) The authors present a lot of data values in the text, which is also shown in the figures. This makes reading of the manuscript somewhat difficult in places. For brevity, it may be best to present this data as supplementary tables.

      Thank you for this suggestion. We have inserted these data as tables.

      (3) I am unclear why the authors excluded cells that fired doublets or triplets in Figure 4? Were these included in the passive and AP-specific analysis - but excluded from F-I plots? Please clarify the rationale and the relative abundance of these physiological types based on age - one might predict that more initial-burst firing types are associated with older neurons?

      Thank you for drawing attention to this anomaly. We have updated the figures and text by adding the cells with initial burst firing. These cells are also included in the analysis of passive and action potential properties. In our overall dataset, 6.78% of cells show burst firing. By age group, the proportions are: infant: 0%, early childhood: 3.57% (1 cell), late childhood: 0%, adolescence: 11.11% (6 cells), young adulthood: 10.11% (9 cells), middle adulthood: 10.71% (6 cells), late adulthood: 7.96% (9 cells) of all cells in the respective age group.

      (4) The statistical analyses performed in Figure 6 are not justified. From the authors' description of these data, they derive spine density measurements from 1 infant and 1 aged adult, then perform pseudoreplicated analysis in these individuals. These data would require greater replication from infant and aged groups - with the possible inclusion of a younger adult group also. It would be ideal to have n=3/age group to allow robust statistical analysis.

      Thank you for this point. Accordingly, we have expanded our data set to include n = 3 infant pyramidal cells (83 days old, from one patient) and n = 3 pyramidal cells from three late adulthood patients (64.3 ± 2.08 years old).

      (5) Given the high number of individuals and replicates throughout this manuscript, a more circumspect approach to statistics would be appreciated, e.g. a generalised linear mixed effects model - with age as a fixed effect and sex, patient, etc as random effects. This may reveal the greatest statistical power of these important and rich data.

      Of the generalized models, we used the Generalized Additive Mixed Model (GAMM) to describe the relationship between age and the various passive and active electrophysiological features. We defined age, with a cubic spline smoothing term, as the fixed effect and gender, brain area, surgical procedure, and hemisphere as random effects. With the GAMM incorporating the four random effects, we found that the age dependence of the examined parameters (resting membrane potential, input resistance, tau, sag ratio, rheobase current, AP half-width, AP up-stroke velocity, AP amplitude, first AP latency, adaptation) was significant, except for F-I slope. We also observed correlations with gender, brain area, hemisphere, and surgical procedure in various intrinsic properties. Author response table 1 below compares the statistical values from the GAMM with those from the statistical tests used in the manuscript.

      Author response table 1.

      Statistical significance of patient attributes. *In the pairwise comparison, the age of cells in the two groups was significantly different: female (subthreshold: 37.36 ± 26.25 years old, suprathreshold: 38.3 ± 25.6 y.o.) vs. male (subthreshold: 24.86 ± 23.7 y.o., suprathreshold: 25.7 ± 23.93 y.o.); subthreshold: P = 1.96*10<sup>-6</sup>, suprathreshold: P = 3.25*10<sup>-5</sup>, Mann-Whitney test. **In the pairwise comparison, the age of cells in the two groups was significantly different: surgical procedure: tumor removal (subthreshold: 33.72 ± 24.33 y.o., suprathreshold: 36.43 ± 27.07 y.o.) vs. VP shunt (subthreshold: 27.38 ± 29.69 y.o., suprathreshold: 27.07 ± 29.37 y.o.); subthreshold: P = 3.68*10<sup>-3</sup>, suprathreshold: P = 1.64*10<sup>-3</sup>, Mann-Whitney test.

      (6) Regarding the morphological diversity of dendritic spines. There is some debate in the field as to whether the distinction of specific dendritic spine types - as conveyed in this manuscript - are true subtypes or reflect a continuum of diverse morphology (see Tønneson et al., 2014 Nature Neuroscience). It is appreciated that the approach taken by the authors is the dogma within the field - however, dogma should continue to be challenged. Given that the authors have used DAB labelling combined with light microscopy, the possibility of accurately measuring spine morphology required for determining this continuum is extremely limited (e.g. Li et al., (2023) ACS Chemical Neuroscience). I would suggest that alongside the inclusion of further replicates for their spine analysis, the authors tone down their discussion of spine subtypes given the absence of any synaptic data presented in this current study to support the maturation (or otherwise) of dendritic spine synapses.

      Many thanks to the reviewer for this comment. We agree with the drawbacks of our method for testing spine categorization. To increase the reliability of our results, we increased the number of pyramidal cells in the infant and late adult groups. We also revised the figure and, as suggested by Reviewer #3, added photos of spines to each category in addition to the schematic drawings to give an impression of the phenotype. In the discussion, we only address the differences between the two readily separable mushroom and filopodial forms and highlight results that confirm findings already known in the literature. Although the concerns are valid, we refer to the sentence from the above Li et al. (2023) reference: “...the most sophisticated equipment may not always be necessary for answering some research questions”. We believe that it is worth sharing our data and the somewhat subjective grouping, which we hope to report in more detail in the future.

      Minor

      (1) The order of the supplemental materials is out of order with their introduction in the text. These should be revised to reflect the order mentioned in the text.

      Thank you for your comment, we have corrected the order of the supplementary figures.

      (2) In Supplementary Figure 13, it would be informative to include some form of linear regression to confirm whether an age-dependent effect on neuronal morphology exists.

      We have added linear regression to the figure.

      (3) Figure 3D = should this be AP - not Ap?

      Thank you for drawing attention to this; we have corrected the typo in the figure.

      (4) For UMAP analysis in Figure 3, please provide a table of the features that were used for the 32 & 8-parameter UMAPs respectively.

      We have added a table to the Materials and methods section of all the electrophysiological features included in the UMAP.

      (5) For morphology, please include pia and L1/2 border for reconstructions shown for clarity.

      We indicated both the pia mater and the L1/2 border on the figure showing all the reconstructions (Supplementary Figure 10).

      Reviewer #3 (Recommendations for the authors):

      Major:

      (1) Data were obtained from different cortical areas of human patients of different ages. The electrophysiological characteristics were largely independent of other attributes such as disease, gender, and cortical areas (Supplementary Figure 2). To support the conclusion that age is one of the key attributes responsible for change, a similar morphological analysis would be necessary for gender.

      We updated the text and the supplementary section with Supplementary Figures 18-21 to determine whether age-related differences in biophysical characteristics are affected by the patient's gender.

      (2) 'mushroom-shaped, thin, filopodial, branched, and stubby spines'

      Show photographs of individual typical spine types to make the classification easier to understand.

      To make the classification more understandable, we have updated the corresponding figure (Figure 6) with representative photos of the dendritic spine types.

      (3) Some electrophysiological parameters of the infant group showed higher deviations compared to other age groups. A UMAP (Supplementary Figure 2) shows that some infant neurons form a small cluster, while other infant neurons are scattered with neurons of other ages. Are there any differences between infant neurons in the small cluster and other infant neurons with respect to attributes other than age?

      For most of the electrophysiological parameters, the infant age group showed age-dependent variability, as illustrated in Supplementary Figures 2, 3, 4 and 6. The small group of infant cells is not clustered by gender, brain region, or medical condition, as shown in Supplementary Figure 5.

      (4) A recent paper (Benavides-Piccione et al. 2024, doi:10.1093/cercor/bhae180) reported that some morphological parameters of human layer 3 neurons differ between occipital and temporal regions. Area-dependent morphological differences have been also reported in non-human primates. Discussion of potential contradictions may therefore be requested.

      Most of the cells we reconstructed originated from the parietal and temporal regions (parietal: n = 20, temporal: n = 23, frontal: n = 15, occipital: n = 5). We found no differences in morphological features between these two regions, and we also found no significant differences when we compared the cells from the same brain regions by age group.

      (5) L2/3 cells of rodents are morphologically differentiated according to cortical depth. If individual L2/3 cells of humans are less differentiated than those of rodents, this point should be discussed.

      Depth-related morphological heterogeneity has been reported previously (Berg 2021). However, our dataset on the morphological characteristics of pyramidal cells is from the upper L2/3 region, with somata located at a distance of 117.85 ± 65.3 μm (range: 11.05 to 243.3 μm) from the L1/L2 border. Therefore, we cannot conclude from our data whether human L2/3 cells are less differentiated than those of rodents.

      Minor:

      (1) Cell body morphology may affect electrophysiological properties. However, morphological quantification of cell bodies has not been reported. It may be added.

      In our DAB-labeled samples, we could not perfectly measure the total volume of the cell body in the reconstructions, therefore our measurements regarding the soma morphology are not shown in the manuscript. When comparing the cell body area of the middle sections of the soma of the reconstructed cells between the age groups, we found no significant differences (P = 0.082, Kruskal–Wallis test).

      (2) 'The adaptation of the AP frequency response'

      Describe how this parameter was obtained.

      The adaptation of the AP frequency response (referred to as adaptation) was calculated as the average adaptation of the interspike interval between consecutive APs.
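      One common way to compute such a measure is sketched below. This is an assumption for illustration only: the response does not give the exact formula, and the sketch uses a normalized difference between consecutive interspike intervals (ISIs), averaged over the spike train. The function names are hypothetical.

```python
def interspike_intervals(spike_times):
    """Differences between consecutive spike times (same unit as the input)."""
    return [t1 - t0 for t0, t1 in zip(spike_times, spike_times[1:])]

def adaptation_index(spike_times):
    """Mean normalized change between consecutive ISIs.

    Assumed definition (not confirmed by the source):
    mean over i of (ISI[i+1] - ISI[i]) / (ISI[i+1] + ISI[i]).
    Positive values mean the intervals lengthen (firing slows down).
    Returns NaN when fewer than two ISIs are available.
    """
    isis = interspike_intervals(spike_times)
    if len(isis) < 2:
        return float("nan")
    changes = [(b - a) / (b + a) for a, b in zip(isis, isis[1:])]
    return sum(changes) / len(changes)

# Hypothetical spike times in seconds: regular firing vs. slowing firing.
regular = [0.0, 1.0, 2.0, 3.0]
slowing = [0.0, 1.0, 3.0]
print(adaptation_index(regular))
print(adaptation_index(slowing))
```

      Under this assumed definition, a perfectly regular train gives zero, and a train whose intervals lengthen gives a positive value.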

      (3) 'we excluded cells showing initial duplet or triplet action potential bursts'

      Why were the burst cells excluded from the analysis?

      We have modified the figures and text to include cells with initial burst firing.

      (4) Electrophysiological characteristics to be analyzed:

      Spike thresholds and afterhyperpolarizations

      We found age-related differences in the amplitude of the afterhyperpolarization (P = 2.56*10<sup>-30</sup>, Kruskal-Wallis test) and in the threshold of the action potential (P = 5.24*10<sup>-12</sup>, Kruskal-Wallis test) (Author response image 3).

      Author response image 3.

      Age-dependence of afterhyperpolarization and AP threshold. (A-B) Boxplots show the differences in afterhyperpolarization (AHP) amplitude (A) and AP threshold (B) between age groups. Asterisks indicate statistical significance (* P < 0.05, ** P < 0.01, *** P < 0.001, Kruskal-Wallis test with post-hoc Dunn test). (C-D) Scatter plots show AHP amplitude (C) and AP threshold (D) across the lifespan. Age is shown on a logarithmic scale, dots are colored according to the corresponding age group.

      (5) 'We identified and labeled each spine on n = 2 fully 3D-reconstructed cells'

      To which cortical area do these cells belong?

      At what depths are they distributed?

      Is it possible to report the number of spines, in addition to the density per unit length?

      We increased the number of cells in which we analyzed dendritic spine density. The data shown in Figure 6 are from pyramidal cells from an infant patient (n = 3 from a single patient) and late adulthood patients (n = 3 from 3 patients) (Supplementary Figure 13). The infant cells are from the same patient; the sample is from the right parietal lobe, and the patient is 83 days old. The older cells are from three different patients (#1: 65 years old, right temporal lobe; #2: 66 years old, right parietal lobe; #3: 62 years old, right frontal lobe). Infant cells are located 144.43 ± 45.26 µm (#1: 109.3, #2: 128.49, #3: 195.5 µm) and late adult cells 161.22 ± 66.22 µm (#1: 183.5, #2: 213.42, #3: 86.73 µm) from the L1/2 border. We provide the number of spines in an additional supplementary table (Supplementary Table 2).

    1. Author response:

      The following is the authors’ response to the original reviews.

      Thank you for your time and consideration on our submission. We also thank the reviewers for their consideration and helpful comments.  We have revised the introduction, results, and discussion sections of the revised manuscript in accordance with the reviewers’ suggestions, which have enhanced the clarity of our work. Specifically, we have clarified that the aim of the study is to report newly discovered sperm behaviours inside the uterus via high resolution deep tissue live imaging, and to stimulate further studies and discussion in the field of postcopulatory sexual selection in mice based on our observations. To the best of our knowledge, many of the specific sperm behaviours described in our manuscript are being reported for the first time, proven through direct observation inside the living reproductive tract.

      We have also restructured our manuscript and moved our hypothetical interpretations based on our experimental observations to the discussion section. We hope that these revisions have clarified our claims and that our revised manuscript effectively communicates the importance of our findings and their value in prompting new questions and insights that encourage further studies. We believe that our work clearly demonstrates the importance of sperm/reproductive tract interaction, which cannot be adequately studied in artificial environments, and may become an important guideline for designing future experiments and studies.

      Public Reviews: 

      Reviewer #1 (Public Review): 

      Summary: 

      The authors want to determine the role of the sperm hook of the house mouse sperm in movement through the uterus. The authors are trying to distinguish between two hypotheses put forward by others on the role of the sperm hook: (1) the sperm cooperation hypothesis (the sperm hook helps to form sperm trains) vs (2) the migration hypothesis (that the sperm hook is needed for sperm movement through the uterus). They use transgenic lines with fluorescent labels to sperm proteins, and they cross these males to C57BL/6 females in pathogen-free conditions. They use 2-photon microscopy on ex vivo uteri within 3 hours of mating and the appearance of a copulation plug. There are a total of 10 post-mating uteri that were imaged with 3 different males. They provide 10 supplementary movies that form the basis for some of the quantitative analysis in the main body figures. Their data suggest that the role of the sperm hook is to facilitate movement along the uterine wall. 

      We thank the reviewer for summarizing our work and the critical review of our paper. As summarized, the sperm hook has been primarily associated with the sperm cooperation (sperm hook) hypothesis and the migration hypothesis. However, we would like to emphasize that the aim of our work is not to cross check between the two hypotheses. Our aim was not to disprove either hypothesis, but rather to develop an experimental platform that enables detailed observation of sperm migration dynamics within the live reproductive tract. 

      Through live imaging, we observed both the formation of sperm trains and the interaction between sperm and the female reproductive tract epithelium. However, in our observations, we could not find any advantage in terms of faster movement for the rarely observed sperm trains. While these events were infrequent in our experiments, we are not asserting that the sperm train hypothesis is invalid but rather reporting our observations as they are. 

      The main findings of our work lie in the newly observed dynamic behaviours of mouse sperm interacting with the female reproductive tract epithelium. Specifically, tapping and associated guided movement along the uterus wall, anchoring and related resistance to internal fluid flow and migration through the utero-tubal junction, and self-organized behaviour while clinging onto the colliculus tubarius. We have extensively revised the manuscript structure to clarify our findings.

      Strengths: 

      Ex vivo live imaging of fluorescently labeled sperm with 2-photon microscopy is a powerful tool for studying the behavior of sperm. 

      Weaknesses: 

      The paper is descriptive and the data are correlations. 

      The data are not properly described in the figure legends. 

      When statistical analyses are performed, the authors do not comment on the trend that sperm from the three males behave differently from each other. This weakens confidence in the results. For example, in Figure 1 the sperm from male 3613 (blue squares) look different from male 838 (red circles), but all of these data are considered together. The authors should comment on why sperm across males are considered together when the individual data points appear to be different across males. 

      Thank you for your comments and suggestions. We have revisited all figure legends and made the necessary amendments (shown in the red-lined manuscript). Please note that, for a better flow of the paper, the previous Figure 1 has been changed to Figure 2 in the revised manuscript.

      Regarding the analysis using different males, we would like to explain the statistics used. We used generalized linear mixed models to test the effect of the Angle and Distance to the wall on the migration kinetic parameters. The advantage of the generalized linear mixed models is that they consider individual variations in the data as an error term, thereby controlling such individual variations. 

      There are two main factors contributing to individual variations. One is, as you pointed out, the difference in sperm from different males. We used genetically similar mice, so genetic variation should be minimal. Nonetheless, there must be individual differences that cause variation, including age, stress level, and body condition. As these factors cannot be controlled, we used the mixed model approach, where individual variations are grouped within the individual. This approach enabled us to test the effect of each explanatory variable (Angle and Distance) within an individual. 

      The second factor that could cause variations is the female oestrous status. To avoid artifacts that could influence sperm behaviour, we did not use any invasive methods, such as hormone injections, to control or induce female oestrus. We controlled for this possible effect by including the mating date as a random effect. Since each female was used only once, the mating date reflects the variation caused by each female.

      To provide further verification that the variation between individual males does not affect our results, we conducted the analysis per individual male and per mating date (i.e., per female). As clearly shown, sperm data points from individual males or females also show consistent, clear correlations with the distance from the uterus wall. As pointed out, the mean sperm speed could differ between individuals, but that is not the topic we are interested in here; our interest is the effect of the distance between sperm and the uterine wall. Additionally, the variation between males is not always larger than the effect of the day (female), which together suggests that integrating male variation is not essential. We have added this information to Supplementary Figure (Fig. S3) of the revised supplementary materials.

      Moving forward, we can also consider the same analysis for the effects of the distance from the wall on sperm SWR and LIN (linearity of forward progression), where no statistical significance was found. As seen in the following figures, the regression lines drawn for each male and mating date show no statistically significant effect of the distance to the wall on SWR and LIN.

      In summary, the statistical approach we used here has successfully reflected variations in sperm kinetics from different males as well as the variance from different females. We hope that our explanations and additional analysis answer your concerns. 

      Movies S8-S10 are single data points and no statistical analyses are performed. Therefore, it is unclear how penetrant the sperm movements are. 

      With respect to Movie S8, Figure 4A and B (Figure 5A and B in the current revised manuscript) depict the trajectories of accumulated spermatozoa (sperm trains) in the female uterus, as shown in Movie S8. We have added this information to the revised figure legend (L 293) for clarity. We could not observe sperm trains that moved faster than single spermatozoa during over 100 hours of observation and collection of over 10 TB of images. The three sperm trains presented in Fig. 5B were the sperm trains that moved in the head-forward direction. Most other identifiable trains, or clusters, did not move or could not move forward as their heads were entangled randomly. Although we of course agree that a statistical test for Movie S8 (also Fig. 5B) would be great, we could not perform meaningful statistical tests due to the small number of sperm trains we found. Instead, we provided all data in the box plots in Fig. 5C so that readers can evaluate and understand our points. We believe that this is a more neutral way of presenting our data than providing statistical significance.

      Regarding Movies S9 and S10, we are not entirely sure whether we understood your comments clearly. It would be very helpful if you could point more specifically to the manuscript with line numbers, as we would like to address your concerns and suggestions, and we believe that your input will improve our manuscript. We did not describe the penetration of sperm in these movies. Movies S9 and S10 show newly found sperm behaviours inside the UTJ and isthmus. We observed that sperm beating is influenced by the width of the luminal space as well as by internal flow, as seen in Movies S9 and S10. As our animal model only expresses red fluorescence in the midpiece, accurate beating frequency measurement cannot be performed. However, we can clearly observe that beating is not continuous and almost comes to a halt in response to reproductive tract variations. We revised our description of the findings on beating speed changes in the revised manuscript (LL 305-335).  

      Movies S1B - did the authors also track the movement of sperm located in the middle of the uterus (not close to the wall)? Without this measurement, they can't be certain that sperm close to the uterus wall travels faster. 

      We revised the new Movie S1B to include videos that were used for the sperm migration kinetics analysis in Figure 2 (previously Figure 1). As you can see in the movies, the graph, and the statistical analysis, there is a clear trend showing that spermatozoa migrate more slowly with increasing distance from the uterus wall. Regarding your comment with respect to the middle of the uterus (not close to the wall), we have added another movie (Movie S1C) that was acquired at different depths from the wall (going towards the centre of the uterus). As clearly seen in Movie S1C, when imaging deeper into the uterus, there is an increasing number of inactive or slow-moving spermatozoa. Since the diameter of the uterus is easily over 2 mm, we currently do not have optical access to the exact centre of the uterus, but for all depths that are observable, spermatozoa near the wall were clearly faster.

      Movie S5A - is of lower magnification (200 µm scale bar), while the others have 50 and 20 µm scale bars. Individual sperm movement can be observed at the 20 µm scale (Movie S5C). If the authors want to prove that there is no up-sucking movement of sperm by the uterine contractions, they need to provide a high magnification image. 

      The main focus of Movie S5A is the intramural UTJ, where spermatozoa are located in rows within the narrow luminal space (see Author response image 1). If there were up-suck-like passive sperm carriage, there would have to be sperm movement from the uterus to the intramural UTJ, as in Author response image 1, left. However, no such sperm movement could be seen in our observations, as shown in Movie S5A. Importantly, as you can see in Movie S5A, indicated by an arrow from 5 sec to 6 sec, some spermatozoa are moving downward (see also Author response image 1, right). This is the opposite direction of movement with respect to possible up-suck-like sperm carriage. 

      Genetic evidence also supports the view that up-suck-like passive sperm carriage does not account for sperm migration from the uterus to the UTJ. If environmental up-suck-like passive transfer played an important role, it would be unlikely that genetically modified spermatozoa fail to pass the entrance of the intramural UTJ (Nakanishi et al., 2004, Biol. Reprod.; Li et al., 2013, J. Mol. Cell Biol.; Larasati et al., 2020, Biol. Reprod.; Qu et al., 2021, Protein Cell). 

      Author response image 1.

      The left image represents what is expected when up-suck like passive sperm carriage occurs. The right image represents what is actually experimentally observed in the intramural UTJ (see Movie S5A). The direction of the arrowheads indicates the direction of sperm movement.

      Movie S8 - if the authors want to make the case that clustered sperm do not move faster than unclustered sperm, then they need to show Movie S8 at higher magnification. They also need to quantify these data. 

      We understand your concern. As shown in Figure 5B, we included all sperm kinetics data of each sperm train and of the unlinked spermatozoa around the trains as individual dots. The only analysis we did not conduct was a statistical test on these data, as it could be erroneous due to the large difference in sample size (3 trains vs 181 unlinked spermatozoa). As the medians of the four sperm kinetic parameters are similar except for SWR, we concluded that sperm trains are not necessarily faster than unlinked single spermatozoa. Since there is no known advantage to spermatozoa (including sperm trains) with intermediate moving speeds in sperm competition (for example, in IVF the fertilization success rate is high when faster and active spermatozoa with normal shape are selected; Vaughan & Sakkas, 2019, Biol. Reprod.), it is questionable whether there can be an advantage to the formation of sperm trains whose speed is not faster than that of unlinked spermatozoa in our data.

      However, we do not agree with your comment regarding the need for higher magnification. Measurement of the sperm migration speeds (kinetic parameters) does not require measurement of exact tail movements in this study. Only sperm heads were tracked to measure their trajectory, and such tracking was better done at low magnification. For example, measuring the speed of a car does not require a magnification high enough to visualize the rotation of the wheels. Additionally, including the effect of observation magnification on the sperm kinetic parameters in all 4 GLMM models for Figure 2 (Table S3) does not change the result, which shows that magnification is not a factor that influences our analysis. 
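      To illustrate why head tracking alone suffices for kinetic parameters, the sketch below derives speeds from a sequence of head positions. The definitions used (curvilinear velocity VCL, straight-line velocity VSL, and LIN = VSL / VCL) are the conventional ones from computer-assisted sperm analysis and are assumptions here; the authors' exact definitions, including that of SWR, are not given in this response. The function name and the example track are hypothetical.

```python
import math

def track_kinetics(positions, dt):
    """Kinetic parameters from a tracked head trajectory.

    positions: list of (x, y) head coordinates, one per frame (e.g. in µm).
    dt: time between consecutive frames (e.g. in seconds).
    Returns VCL (path length / duration), VSL (net displacement / duration)
    and LIN (VSL / VCL), using conventional definitions as an assumption.
    """
    if len(positions) < 2 or dt <= 0:
        raise ValueError("need at least two positions and a positive dt")
    path_length = sum(math.dist(a, b) for a, b in zip(positions, positions[1:]))
    net_displacement = math.dist(positions[0], positions[-1])
    duration = dt * (len(positions) - 1)
    vcl = path_length / duration
    vsl = net_displacement / duration
    lin = vsl / vcl if vcl > 0 else float("nan")
    return {"VCL": vcl, "VSL": vsl, "LIN": lin}

# Hypothetical track: a zigzag path has LIN below 1, a straight path has LIN of 1.
zigzag = [(0.0, 0.0), (1.0, 1.0), (2.0, 0.0)]
straight = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
print(track_kinetics(zigzag, dt=1.0))
print(track_kinetics(straight, dt=1.0))
```

      None of these quantities depends on resolving the flagellum, only on the head position in each frame, which is consistent with the argument that low magnification is adequate for this measurement.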

      Movie S9C - what is the evidence that these sperm are dead or damaged? 

      Thank you for your valid comment. We tracked sperm movements for at least 10 minutes, and such entangled spermatozoa in the UTJ never became active again. As you can see in the new Movie S9B, entangled spermatozoa were also acrosome-reacted (the green acrosome signal in the head is gone), while active spermatozoa within the same video respond to peristaltic movement. However, as you pointed out, we did not measure their viability with appropriate dyes. Although we also considered extracting these spermatozoa and performing viability tests, we could not come up with a way to specifically extract the exact spermatozoa that were imaged. Considering your comments, we changed the term damaged or dead to inactive in the revised manuscript (LL 313-316, Legend Figure 6D. LL 380-384).

      Movie S10 - both slow- and fast-moving sperm are seen throughout the course of the movie, which does not support the authors' conclusion that sperm tails beat faster over time. 

      There must have been a misunderstanding. We did not indicate that sperm beating got faster over time anywhere in the main manuscript, including the figure legend and related movie captions. As correctly pointed out, the sperm beating speed changes over time (not getting faster over time) and shows a correlation with internal fluid flow and width of luminal space (LL 320-332). Please let us know if you meant something else. 

      Reviewer #2 (Public Review): 

      Summary: 

      The specific objective of this study was to determine the role of the large apical hook on the head of mouse sperm (Mus musculus) in sperm migration through the female reproductive tract. The authors used a custom-built two-photon microscope system to obtain digital videos of sperm moving within the female reproductive tract. They used sperm from genetically modified male mice that produce fluorescence in the sperm head and flagellar midpiece to enable visualization of sperm moving within the tract. Based on various observations, the authors concluded that the hook serves to facilitate sperm migration by hooking sperm onto the lining of the female reproductive tract, rather than by hooking sperm together to form a sperm train that would move them more quickly through the tract. The images and videos are excellent and inspirational to researchers in the field of mammalian sperm migration, but interpretations of the behaviors are highly speculative and not supported by controlled experimentation. 

      Thank you for your critical review and valuable comments on our manuscript. As pointed out, some of our findings and suggestions were largely observation based. However, to the best of our knowledge, many of our observations are novel, particularly in the context of live imaging inside the female uterus and reproductive tract. We believe these observations open the door to many questions and follow-up studies, which is what drives science forward.

      That being said, we entirely agree that many follow-up experiments need to be designed and performed, especially to validate the exact molecular mechanisms of the observed dynamics. Unfortunately, we currently lack the molecular experimental toolsets needed to perform further tests. We have removed much of the hypothetical discussion from the results section and moved it to the discussion section. We hope that our revision more clearly defines the observed experimental data and our interpretations.

      Strengths: 

      The microscope system developed by the authors could be of interest to others investigating sperm migration. 

      The new behaviors shown in the images and videos could be of interest to others in the field, in terms of stimulating the development of new hypotheses to investigate. 

      Weaknesses: 

      The authors stated several hypotheses about the functions of the sperm behaviors they saw, but the hypotheses were not clearly stated or tested experimentally. 

      The hypothesis statements were weakened by the use of hedge words, such as "may". 

      We appreciate your helpful comments and have revised our hypotheses and suggestions accordingly. We have removed instances of “may” or revised it to be more direct. We have also moved most of our interpretations and hypotheses from the results to the discussion section. 

      It is important to note that experimental approaches to test what we suggested from our findings in the current ex-vivo observation platform are not trivial and require extensive investigation of several unknown factors of the female reproductive tract. For instance, obtaining detailed information on the chemical characteristics and fluid dynamics in the female reproductive tract is essential to build a microfluidic channel that accurately resembles the uterus and oviduct, replicating what we found in an extracted living entire organ. This poses a significant challenge and requires collaborative expertise from many labs, which we hope to build in the near future. 

      Furthermore, our biggest concern is that, even if we were to construct an appropriate microfluidic channel to test sperm migration, the sperm behaviours that we observed under natural conditions would very likely not be replicated in artificial environments. This raises questions about whether in-silico or in-vitro findings can truly resemble what we reported here using ex-vivo observation inside a living organ.

      To share our experience related to this difficulty, at the initial stage of our study, we attempted sperm injection combined with fluorescent beads to visualize the fluid flow, as well as dyeing the female reproductive tract and spermatozoa after mating. However, none of these approaches produced meaningful results. Another potential approach to test our claims is to use genetic engineering to indirectly confirm the influence of sperm hook morphology on sperm behaviour. However, such an approach cannot demonstrate mechanically how the sperm hook interacts with the female reproductive tract.

      It is unfortunate that the sperm behaviours that we found and reported here are considered as highly speculative. The main findings of our work lie in the newly observed dynamic behaviours of mouse sperm interacting with the female reproductive tract epithelium. Specifically, these behaviours include tapping and associated guided movement along the uterus wall, anchoring and related resistance to internal fluid flow and migration through the utero-tubal junction, and self-organized behaviour while clinging onto the colliculus tubarius. 

      We have extensively revised the manuscript structure to clarify our findings and integrated our points in the introduction. Although we understand that our hypotheses may be considered speculative and that the causative relationship between the sperm hook and its role in sperm migration requires further experimental approaches, we believe that the image-based observations of the dynamic behaviours of spermatozoa are solid. We believe our findings will facilitate further studies and discussion in the field of postcopulatory sexual selection in rodents.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors): 

      The manuscript is written for an expert in a fairly small field. I recommend that the authors rewrite the manuscript to make it more accessible to people outside of the field. These suggestions include 

      (1) Provide a diagram of the female reproductive tract in Figure 1. 

      a. Indicate where sperm enter the tract and the location of the oocyte they are trying to reach. 

      b. Label all areas of the uterus that are mentioned in this study and be consistent about the label. 

      (2) All movies should have a diagram of the location of the uterus that is being imaged. 

      Thank you for the great suggestion. We have added a diagram of the female reproductive tract in the revised Figure 1A. In response to your comments 1a and b, we have indicated such information by including eggs in the ampulla and arrows that indicate sperm migration direction. We have also labelled the name of the specific areas that were studied in the manuscript.

      We are unsure how to integrate the diagram in all movies without reframing the videos, which could cause serious corruption of the files. More importantly, we think that adding the same diagram to all movies may complicate the visuals and distract from the annotations and subjects in each movie. Instead, we have referred to the common diagram (Figure 1A) in each movie caption, specifying where the video was taken. Thank you for the suggestion. With this information, we hope readers can now more easily understand where we made the observations.

      (3) The major questions in the field need to be better described in the introduction. 

      Thank you for your valuable suggestions and specific comments which have greatly helped improve our manuscript. We have revised our introduction and discussion sections by adding more literature reviews and integrating studies across a wider range of the postcopulatory sexual selection, as per your suggestion (LL 34-57, LL 385-398).

      (4) The major question that the authors are trying to address should be described in the introduction. 

      Thank you for the helpful suggestion. We have clarified in the introduction that our aim was to contribute to the field of postcopulatory sexual selection in rodents by advancing methodological progress and to stimulate discussion and future research on the function of the sperm hook in murine rodents (LL 76-94) based on our observations.

      (5) A discussion of the sperm hook should be provided. How many species have this structure (or similar structure)? 

      We have integrated your point into the revised discussion section. Essentially, most murine rodent species have sperm hooks (although their exact shapes differ). However, as there are over 500 species and not all of them have been tested, we do not know exactly how many of them have this structure. Therefore, we included references to papers that examined species variation in sperm hook characteristics and its possible correlation with sperm competition (LL 385-417) in the discussion. Additionally, we included papers by Breed (2004) and by Roldan et al (1992) that investigated murine rodents with a sperm hook in the introduction section (LL 58-61).

      (6) The figure legends must describe everything in the figure or movie. 

      Thank you for the helpful suggestion. We previously thought that our figure legends may be too long. We have included further information in the figure legends and movie captions. We have also revised the movies by adding some clips following our revision (Movie S1).

      Reviewer #2 (Recommendations For The Authors): 

      Here are some specific concerns I had about the clarity of approach to experiments and interpretations of results. 

      In the Introduction, the authors stated that the study was intended to determine the function of the hooks on the mouse sperm heads. However, in the Results section, the authors did not explain the rationale for the first set of experiments with respect to the overall objective of the study. In this experiment, the authors measured the velocities of sperm swimming in the uterus and found that the sperm moved faster when closer to the uterine wall (VCL, VSL). They concluded that migration along the uterine wall "may" be an efficient strategy for reaching the entrance to the uterotubal junction (UTJ) and did not explain how this related to the function of the hooks. 

      Thank you for your critical comment and guidance. We have changed the order of Figure 1 and Figure 2 and revised the result section to integrate your points. At the initial stage of the study, we expected to find evidence of the function of sperm trains in aiding sperm migration in the female uterus (which has not been observed in the live uterus; previous works were done in vitro with sperm extracted from the epididymis or from the uterus after mating). However, what we found was unexpected: dynamic sperm hook related movements facilitating sperm migration inside the female uterus by playing a mechanical role in sperm interaction with the uterine wall. These results, which were presented in the previous Figure 2, have been reorganized as the new Figure 1.

      Based on this observation, our research later moved to clarify whether such sperm-epithelium interaction indeed helps sperm migration. This led us to measure sperm kinetics in relation to their distance and angle to the uterine wall. We have revised our introduction and result parts by integrating these points. We hope that our revision will answer your questions. We have also reduced the use of ‘may’ or ‘can’ in the results section. In the revised manuscript, we have moved such hypotheses to the discussion section and focused on what we observed in the results section.
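
      As a geometric illustration of what "distance and angle to the uterine wall" can mean for a tracked sperm head, the following is a minimal sketch under stated assumptions, not the procedure used in the manuscript. It assumes the local wall can be approximated by a straight line through two annotated points and that the heading is taken from consecutive head positions; all names and values are hypothetical.

```python
import math


def distance_and_angle_to_wall(head, heading, wall_a, wall_b):
    """Return (distance, angle_deg) of a sperm head relative to a wall.

    head: (x, y) position of the sperm head.
    heading: (dx, dy) direction of travel, e.g. the difference between
             two consecutive head positions.
    wall_a, wall_b: two points defining the local wall, treated as an
                    infinite straight line (a simplifying assumption).
    angle_deg is folded into [0, 90]: 0 means moving parallel to the
    wall, 90 means moving perpendicular to it.
    """
    wx, wy = wall_b[0] - wall_a[0], wall_b[1] - wall_a[1]
    wall_len = math.hypot(wx, wy)
    heading_len = math.hypot(heading[0], heading[1])
    if wall_len == 0 or heading_len == 0:
        raise ValueError("wall points must differ and heading must be non-zero")
    # Perpendicular distance from the head to the wall line (2D cross product).
    rx, ry = head[0] - wall_a[0], head[1] - wall_a[1]
    distance = abs(wx * ry - wy * rx) / wall_len
    # Angle between heading and wall direction, ignoring orientation.
    cos_angle = abs(wx * heading[0] + wy * heading[1]) / (wall_len * heading_len)
    angle_deg = math.degrees(math.acos(min(1.0, cos_angle)))
    return distance, angle_deg


# Made-up example: horizontal wall, head 4 units away, moving diagonally.
distance, angle_deg = distance_and_angle_to_wall(
    head=(3.0, 4.0), heading=(1.0, 1.0), wall_a=(0.0, 0.0), wall_b=(10.0, 0.0)
)
```

      Pairing these two values with per-track speeds allows kinetics to be compared between sperm near the wall and sperm far from it.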

      The authors proposed that the sperm hook "may" play a crucial role in determining the direction of migration. When sperm encountered a uterine wall, significantly more changed migration direction toward the pro-hook direction than toward the anti-hook direction. In Figure 2B, sperm behavior is not visually understandable nor clearly explained. 

      Thank you for the helpful comments. We have removed “may” and “might” to make our claim clearer and more concise. We have also revised the previous Figure 2B by combining it with the previous Figure 2C (they have been combined into Figure 1C now). We have also revised Figure 1B by increasing the line thickness of the sperm trajectory of the pro-wall-hook direction and added the anti-wall-hook trajectory. We hope that these revisions make the figure easier to understand.

      In Figure 2E, are the authors showing that the tip of the hook is caught between two epithelial cells? Please clarify the meaning of this figure. 

      Please clarify the difference between "tapping" and "anchoring". 

      Thank you for the detailed comments. As you pointed out, we currently have no evidence that sperm can be caught in epithelial intercellular gaps. We have resolved this source of confusion by removing the gap in the revised figure (Figure 1E). We have also included the definitions of anchoring (LL 142-143) and tapping (LL 128-130). Anchoring facilitates the attachment of sperm to the uterine epithelia. Such anchoring also involves the catching of the sperm head in the inter-mucosal fold or gap, particularly at the entrance of the intramural UTJ at the end of the uterus. Tapping is the interaction between the head hook and the epithelia in which the sperm hook taps (or pats) on the surface. Sperm tapping may be a byproduct of flagellar beating when spermatozoa migrate in the pro-wall-hook direction along the uterine wall (epithelia), or it may play some role in sperm migration. As we currently cannot draw a conclusion, we did not integrate the possible function of tapping in the manuscript.

      The authors proposed that opposite sliding of neighboring mucosal folds lining the UTJ would cause small openings to form, through which only perhaps one sperm at a time could enter and pass through the UTJ into the uterus. This hypothesis was not actually tested. 

      Imaging inside deep tissue is challenging due to light scattering as it penetrates through biological tissue. While this is also true for the uterus, the intramural UTJ is especially difficult to image because the UTJ consists of several thick muscle and cell layers (see Movie S5A). Another challenge is that the peristaltic movement of the UTJ results in constant movement, making continuous tracking of single spermatozoa while they pass through the entirety of the UTJ impossible in our current experiments. We have moved this hypothesis to the discussion section and restated that this is a purely hypothetical model (LL 399-406). We hope that our model encourages the community to design or establish an improved ex-vivo observation system that may be able to test this hypothetical model in the near future.

      Next, the authors hypothesized that sperm that encounter the small openings in the UTJ may then be guided onward and the hooks could prevent backward slipping. This was also not tested. 

      As you’ve noted, the function of the sperm hook that aids in sliding and preventing backward slipping could not be tested directly in our ex-vivo observation platform that relies on natural movement of the living organ. However, we believe that these limitations also highlight the importance of continued research and the development of more advanced methodologies in this field.

      We would also like to note that we provide direct observations of spermatozoa resisting internal flow due to reproductive tract contractions in Movie S3A, B as well as Movie S5B. We referred to these movies and pointed out the role of anchoring (sperm attachment) in preventing sperm from being squeezed out (LL 140-149, LL 224-241). Unfortunately, we cannot conceive of how this behaviour could be tested further in any uterus-resembling microfluidic device or ex-vivo system. In line with your suggestion, we have rewritten the related result section and moved the related discussion from the results to the discussion section (LL 224-241, LL 399-417).

      The authors observed that large numbers of uterine sperm are attached to the entrance of the UTJ. Some sperm clustered and synchronized their flagellar beating. The authors speculated that this behavior served to push sperm in clusters onward through the UTJ. 

      We would like to note that we did not speculate that sperm clustering and synchronization could serve to push spermatozoa in a cluster onward through the UTJ. We only pointed out our observation in recorded videos that the flow generated by the clustered spermatozoa pushed away other spermatozoa, as seen in Movie S7 (LL 261-264). Although such sperm cooperation is possible (blocking passage of later sperm), we cannot draw that conclusion from our observation. The possibility you pointed out (pushing sperm onward through the UTJ) was suggested by Qu et al in 2021 [Cooperation-based sperm clusters mediate sperm oviduct entry and fertilization, Protein & Cell] based on their observations on cleared dead reproductive tracts.

      The authors found only a few sperm trains in the uterus, UTJ, and oviduct, so they could not measure sufficient numbers of samples to test whether sperm trains swim faster than single sperm. Without sufficient data, they concluded that the "sperm trains did not move faster than unlinked single spermatozoa." 

      We would like to take this opportunity to clarify our claims. We do not claim that our current experiments can give the final verdict on whether the sperm train hypothesis for faster swimming is correct or not. The phrase “sperm trains did not move faster” was not intended to mean that the sperm train hypothesis is invalid.  We did not draw a conclusion but dryly described the experimental data that we observed (LL 279-286).  We would once again like to emphasize that the main claim of our manuscript is not to rule out the sperm train hypothesis, but to present the various dynamic interactions of the sperm head with the female reproductive tract. To make the statement more balanced, we revised the sentence as “observed sperm trains did not move faster or slower than unlinked single spermatozoa” (LL 281-282).

      The authors hypothesized that the dense sperm clusters at the entrance into the UTJ could prevent the rival's sperm from entering the UTJ (due to plugging entrance and/or creating an outward flow to sweep back the rival's sperm), but they did not test it. 

      We agree that we were not able to test this possible function of the sperm cluster at the UTJ entrance. Following your concerns, we revised the result part (LL 256-264) by removing most of our discussion of the observed phenomena. We moved some of the interpretation to the discussion section instead (LL 421-437) and suggested that future work using appropriate microfluidic channel designs or sequential double mating experiments may provide additional tests (LL 443-447). However, we would like to point out that Movie S7C clearly shows surrounding spermatozoa being swept away from the sperm clusters. Since the sperm density is high, this is almost equivalent to a particle image velocimetry experiment, and we can clearly see the effect of the outward flow generated by the sperm clusters.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Weakness#1: The authors claim to have identified drivers that label single DANs in Figure 1, but their confocal images in Figure S1 suggest that many of those drivers label additional neurons in the larval brain. It is also not clear why only some of the 57 drivers are displayed in Figure S1.

      As described in the Results section, we screened 57 GAL4 driver lines based on previous reports. These included drivers that had been shown to label a single dopaminergic neuron (DAN) or a small subset of DANs in the larval or adult brain hemisphere, suggesting potential for specific DAN labeling in larvae.

      In Figure 1, TH-GAL4 was used to cover all neurons in the DL1 cluster, while R58E02 and R30G08 are well-known drivers for pPAM. The fly strains in Figure 1h, k, l, and m were reported as single-DAN strains in larvae[1], while the strains in Figure 1e, f, and g were reported to label only a few DANs in adult brains[2,3]. We examined these strains and found that only some of them labeled a single DAN in the 3rd instar larval brain hemisphere (Figure 1f, g, h, l and m). Among them, only the strains in Figure 1f and h labeled a single DAN in the brain hemisphere without labeling non-DANs. The other strains labeled non-DANs in addition to a single DAN (Figure 1g, l and m). Taking the ventral nerve cord (VNC) into consideration, the strain in Figure 1h also labeled neurons in the VNC (Figure S1e), while the strain in Figure 1f did not (Figure S1c).

      In summary, the driver shown in Figure 1f (R76F02AD;R55C10DBD, labeling DAN-c1) is the only line we identified that labels a single DAN in the 3rd instar larval brain hemisphere without additional labeling. The other lines shown in Figure 1 (g, h, l, m) label a single DAN but also include some non-DANs. Figure 1 focuses on strains that label a single or a pair of DANs.

      Labeling patterns for all 57 driver lines are summarized in Table 1. Figure S1 includes representative examples; full confocal images for all screened strains are available upon request, as stated in the figure legend.

      Weakness #2: Critically, R76F02-AD; R55C10-DBD labels more than one neuron per hemisphere in Figure S1c, and the authors cite Xie et al. (2018) to note that this driver labels two DANs in adult brains. Therefore, the authors cannot argue that the experiments throughout their paper using this driver exclusively target DAN-c1.

      Figure S1c shows a single dopaminergic (DA) neuron in each brain hemisphere. While additional GFP-positive signals were occasionally observed, they did not originate from the cell bodies of DA neurons, as these were not labeled by the tyrosine hydroxylase (TH) antibody. These additional GFP signals primarily appeared to be neurites, including axonal terminals, although we cannot rule out the possibility that some represent false-positive signals or weakly stained non-neuronal cell bodies. This interpretation is based on the analysis of 22 third-instar larval brains.

      To clarify this point in the manuscript, we added the following sentence to the Results section: “Based on the analysis of 22 brain samples, we observed this driver strain labels one neuron per hemisphere in the third-instar larval brain (Figure 2a–d, Figure S1c, Table S3).” Additionally, Table S3 was included to summarize the DAN-c1 labeling pattern across all 22 samples. An enlarged inset highlighting GFP-positive signals was also added to Figure S1c.

      Weakness #3: Missing from the screen of 57 drivers is the driver MB320C, which typically labels only PPL1-γ1pedc in the adult and should label DAN-c1 in the larva. If MB320C labels DAN-c1 exclusively in the larva, then the authors should repeat their key experiments with MB320C to provide more evidence for DAN-c1 involvement specifically.

      We thank the reviewer for this insightful suggestion. The MB320C driver primarily labels the PPL1-γ1pedc neuron in the adult brain, along with one or two additional weakly labeled cells. It would indeed be interesting to examine the expression pattern of this driver in third-instar larval brains. If it is found to label only DAN-c1 at this stage, we could consider using it to knock down D2R and assess whether this recapitulates our current findings.

      While we agree that this is a promising direction for future studies, we believe it is not essential for the current manuscript, given the specificity of the DAN-c1 driver (please see our response to Reviewer #3 for details). Nonetheless, we appreciate the reviewer’s suggestion, and we recognize that MB320C could be a valuable tool for future experiments.

      Weakness #4: The authors claim that the SS02160 driver used by Eschbach et al. (2020) labels other neurons in addition to DAN-c1. Could the authors use confocal imaging to show how many other neurons SS02160 labels? Given that both Eschbach et al. and Weber et al. (2023) found no evidence that DAN-c1 plays a role in larval aversive learning, it would be informative to see how SS02160 expression compares with the driver the authors use to label DAN-c1.

      We do not have our own images showing DANs in brains from SS02160 driver crosses. However, Extended Data Figure 1 in the paper of Eschbach et al. shows four strongly labeled neurons in each brain hemisphere[4], indicating that this driver does not label DAN-c1 alone.

      Weakness #5: The claim that DAN-c1 is both necessary and sufficient in larval aversive learning should be reworded. Such a claim would logically exclude any other neuron or even the training stimuli from being involved in aversive learning (see Yoshihara and Yoshihara (2018) for a detailed discussion of the logic), which is presumably not what the authors intended because they describe the possible roles of other DANs during aversive learning in the discussion.

      We agree with the reviewer that the terms “necessary” and “sufficient” may be too exclusive and could unintentionally exclude contributions from other neurons. As noted in the Discussion section, we acknowledge that additional dopaminergic neurons may also play roles in larval aversive learning. To reflect this, we have revised our wording to use “important” and “mediates” instead of the more definitive terms “necessary” and “sufficient,” making our conclusions more accurate and appropriately measured.

      Weakness #6: Moreover, if DAN-c1 artificial activation conveyed an aversive teaching signal irrespective of the gustatory stimulus, then it should not impair aversive learning after quinine training (Figure 2k). While the authors interpret Figure 2k (and Figure 5) to indicate that artificial activation causes excessive DAN-c1 dopamine release, an alternative explanation is that artificial activation compromises aversive learning by overriding DAN-c1 activity that could be evoked by quinine.

      This is an excellent point, and we agree that we cannot rule out the possibility that artificial activation interferes with aversive learning by overriding the natural activity of DAN-c1 that would normally be evoked by quinine. The observed results with TRPA1 could potentially be attributed to dopamine depletion, inactivation due to prolonged depolarization, or neural adaptation. However, we believe that our hypothesis - that over-excitation of DAN-c1 impairs learning - is more consistent with our experimental findings and with previously published data. Our rationale is as follows:

      (1) Associative learning in larvae occurs only when the conditioned stimulus (CS, e.g., an odor such as pentyl acetate) and unconditioned stimulus (US, e.g., quinine) are paired. In wild-type larvae, the CS depolarizes a subset of Kenyon cells in the mushroom body (MB), while the US induces dopamine (DA) release from DAN-c1 into the lower peduncle (LP) compartment (Figure 7a). When both stimuli coincide, calcium influx from CS activation and Gαs signaling via D1-type dopamine receptors activate the MB-specific adenylyl cyclase, rutabaga, which functions as a coincidence detector (Figure 7d).

      (2) Rutabaga converts ATP to cAMP, activating the PKA signaling pathway and modifying synaptic strength between Kenyon cells and mushroom body output neurons (MBONs) (Figure 7d). These changes in synaptic strength underlie learned behavioral responses to future presentations of the same odor.

      (3) Our results show that D2R is expressed in DAN-c1, and that D2R knockdown impairs aversive learning. Since D2Rs typically inhibit neuronal excitability and reduce cAMP levels[5], we hypothesize that D2R acts as an autoreceptor in DAN-c1 to restrict DA release. When D2R is knocked down, this inhibition is lifted, leading to increased DA release in response to the US (quinine). The resulting excess DA, in combination with CS-induced calcium influx, would elevate cAMP levels in Kenyon cells excessively - disrupting normal learning processes (Figure 7b). This is supported by studies showing that dunce mutants, which have elevated cAMP levels, also exhibit aversive learning deficits[6].

      (4) The TRPA1 activation results are consistent with our over-excitation model. When DAN-c1 was artificially activated at 34°C in the distilled water group, this mimicked the natural activation by quinine, producing an aversive learning response toward the odor (Figure 2k or new Figure 2i, DW group). Similarly, in the sucrose group, artificial activation mimicked quinine, producing a learning response that reflected both appetitive and aversive conditioning (Figure 2k, SUC group).

      (5) Over-excitation impairs learning in the quinine group. When DAN-c1 was activated during quinine exposure, both artificial and natural activation combined to produce excessive DA release. This over-excitation likely disrupted the cAMP balance in Kenyon cells, impairing learning and resulting in failure of aversive memory formation (Figure 2k, QUI group). This phenotype closely mirrors the effect of D2R knockdown in DAN-c1.

      (6) Optogenetic activation of DAN-c1 during aversive training similarly produced elevated DA levels due to both natural and artificial stimulation. This again would result in MBN over-excitation and a corresponding learning deficit. When optogenetic activation occurred during non-training phases (resting or testing), no additional DA was released during training, and aversive learning remained intact (Figure 5b).

      (7) Notably, when optogenetic activation was applied during training, we observed no aversive learning in the distilled water group and no reduction in the sucrose group (Figure 5c, 5d). We interpret this as evidence that the optogenetic stimulation was strong enough to cause elevated DA release in both groups, impairing learning in a manner similar to D2R knockdown or TRPA1 overactivation.

      (8) We extended this over-excitation framework to directly activate Kenyon cells (MBNs). Since MBNs are involved in both appetitive and aversive learning, their over-excitation disrupted both types of learning (Figure 6), further supporting our hypothesis.

      In summary, we propose that DAN-c1 activity is tightly regulated by D2R autoreceptors to ensure appropriate levels of dopamine release during aversive learning. Disruption of this regulation - either through D2R knockdown or artificial overactivation of DAN-c1 - results in excessive DA release, over-excitation of Kenyon cells, and impaired learning. This over-excitation model is consistent with both our experimental results and prior literature.
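
      The logic of this over-excitation hypothesis can be made explicit with a deliberately simplified toy model. The sketch below is an illustration only and not a fitted or published model: the release values, the 50% autoreceptor brake, and the cAMP window bounds are arbitrary assumptions chosen to reproduce the qualitative pattern described above, and all function names are hypothetical.

```python
def dopamine_release(us_present, artificial_drive=0.0, d2r_intact=True):
    """Toy DA release from DAN-c1 (arbitrary units, assumed values)."""
    natural_drive = 1.0 if us_present else 0.0  # quinine-evoked activity
    total_drive = natural_drive + artificial_drive
    # Assumption: the D2R autoreceptor halves release; knockdown lifts it.
    brake = 0.5 if d2r_intact else 1.0
    return total_drive * brake


def learning_outcome(cs_present, dopamine, lower=0.3, upper=0.8):
    """Toy coincidence detector: cAMP rises only when CS and DA coincide.

    Learning is assumed to require cAMP inside a window (lower..upper);
    too little gives no learning, too much gives impaired learning.
    """
    camp = dopamine if cs_present else 0.0
    if camp < lower:
        return "no learning"
    if camp > upper:
        return "impaired (over-excitation)"
    return "aversive learning"


# Qualitative scenarios; artificial_drive values are assumptions.
wild_type_qui = learning_outcome(True, dopamine_release(True))
d2r_kd_qui = learning_outcome(True, dopamine_release(True, d2r_intact=False))
trpa1_dw = learning_outcome(True, dopamine_release(False, artificial_drive=1.0))
trpa1_qui = learning_outcome(True, dopamine_release(True, artificial_drive=1.0))
strong_opto_dw = learning_outcome(True, dopamine_release(False, artificial_drive=2.0))
unpaired = learning_outcome(False, dopamine_release(True))
```

      In this toy setting, moderate artificial drive substitutes for quinine, whereas D2R knockdown, artificial drive combined with quinine, or strong artificial drive alone all push cAMP above the assumed window. The model does not distinguish this account from the alternative explanation raised by the reviewer.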

      Weakness #7: The authors should not necessarily expect that D2R enhancer driver strains would reflect D2R endogenous expression, since it is known that TH-GAL4 does not label p(PAM) dopaminergic neurons.

      As with TH-GAL4, it is possible that the D2R driver strains only partially reflect the expression pattern of endogenous D2R in larval brains. However, when we crossed the D2R driver strains with the GFP-tagged D2R strain, we observed co-localization in DM1 and DL2b dopaminergic neurons, as well as in mushroom body neurons (Figure S3c to h). In addition, D2R knockdown with D2R-miR directly supported the conclusion that the GFP-tagged D2R strain reflects the expression pattern of endogenous D2R (Figure 4b to d, signals were reduced in DM1). In summary, we think the D2R driver strains support the expression pattern we observed with the GFP-tagged D2R strain, especially in DM1 DANs.

      Weakness #8: Their observations of GFP-tagged D2R expression could be strengthened with an anti-D2R antibody such as that used by Lam et al., (1999) or Love et al., (2023).

      Love et al. (2023) used the antibody originally described by Draper et al.[6]. We attempted to use the same antibody in our experiments; however, we were unable to detect clear signals following staining. This may be due to a lack of specificity for neurons in the Drosophila larval brain or incompatibility with our staining protocol. Unfortunately, we were unable to locate a copy of the Lam (1999) paper for further reference.

      Weakness #9: Finally, the authors could consider the possibility other DANs may also mediate aversive learning via D2R. Knockdown of D2R in DAN-g1 appears to cause a defect in aversive quinine learning compared with its genetic control (Figure S4e). It is unclear why the same genetic control has unexpectedly poor aversive quinine learning after training with propionic acid (Figure S5a). The authors could comment on why RNAi knockdown of D2R in DAN-g1 does not similarly impair aversive quinine learning (Figure S5b).

      We re-analyzed the data related to DAN-g1. Interestingly, knockdown of D2R in DAN-g1 larvae trained with quinine (QUI) showed a significant difference in response index (R.I.) compared to the distilled water (DW) control group. However, it also differed significantly from the DAN-g1 genetic control group trained with QUI (two-way ANOVA with Tukey’s multiple comparisons, p = 0.0002), while it was not significantly different from the UAS-D2R-miR genetic control group (p = 0.2724). Furthermore, knockdown of D2R in DAN-g1 did not lead to aversive learning deficits when larvae were trained with a different odorant, propionic acid (ProA; Figure S5a). Similarly, using an RNAi line to knock down D2R in DAN-g1 did not result in learning impairment when larvae were trained with pentyl acetate (PA; Figure S5b). These inconsistencies may stem from differences in stimulus intensity across odorants, as well as the variable efficiency of the knockdown strategies (microRNA vs. RNAi). Based on these results, we propose that D2Rs in DAN-g1 may modulate larval aversive learning in a quantitative manner but do not play as critical a role as those in DAN-c1, where knockdown produces a clear qualitative effect. We have added this paragraph to the Discussion section of the manuscript.

      Reviewer #2 (Public review):

      Weakness#1: Is not completely clear how the system DAN-c1, MB neurons and Behavioral performance work. We can be quite sure that DAN-c1;Shits1 were reducing dopamine release and impairing aversive memory (Figure 2h). Similarly, DAN-c1;ChR2 were increasing dopamine release and also impaired aversive memory (Figure 5b). However, is not clear what is happening with DAN-c1;TrpA1 (Figure 2K). In this case the thermos-induction appears to impair the behavioral performance of all three conditions (QUI, DW and SUC) and the behavior is quite distinct from the increase and decrease of dopamine tone (Figure 2h and 5b).

      The study successfully examined the role of D2R in DAN-c1 and MB neurons in olfactory conditioning. The conclusions are well supported by the data, with the exception of the claim that dopamine release from DAN-c1 is sufficient for aversive learning in the absence of unconditional stimulus (Figure 2K). Alternatively, the authors need to provide a better explanation of this point.

      Please refer to our response to Weakness #6 of Reviewer #1 above.

      Reviewer #3 (Public review):

      Weakness #1: It is a strength of the paper that it analyses the function of dopamine neurons (DANs) at the level of single, identified neurons, and uses tools to address specific dopamine receptors (DopRs), exploiting the unique experimental possibilities available in larval Drosophila as a model system. Indeed, the result of their screening for transgenic drivers covering single or small groups of DANs and their histological characterization provides the community with a very valuable resource. In particular the transgenic driver to cover the DANc1 neuron might turn out useful. However, I wonder in which fraction of the preparations an expression pattern as in Figure 1f/ S1c is observed, and how many preparations the authors have analyzed. Also, given the function of DANs throughout the body, in addition to the expression pattern in the mushroom body region (Figure 1f) and in the central nervous system (Figure S1c) maybe attempts can be made to assess expression from this driver throughout the larval body (same for Dop2R distribution).

      We thank the reviewer for the positive comments and thoughtful suggestions.

      Regarding the R76F02AD; R55C10DBD strain, we examined 22 third instar larval brains expressing GFP, Syt-GFP, or Den-mCherry. All brains clearly labeled DAN-c1. In approximately half of the samples, only DAN-c1 was labeled. In the remaining samples, 1 to 5 additional weakly labeled soma were observed, typically without associated neurites. Only 1 or 2 strongly labeled non-DAN-c1 cells were occasionally detected. These additional labeled neurons were rarely dopaminergic. In the ventral nerve cord (VNC), 8 out of 12 samples showed no labeled cells. The remaining 4 samples had 2–4 strongly labeled cells. These results support our conclusion that the R76F02AD; R55C10DBD combination predominantly and specifically labels DAN-c1 in the third instar larval brain. As for the reviewer’s question about the expression pattern of R76F02AD; R55C10DBD and D2R in the larval body, we agree that this is a very interesting avenue for further investigation. However, our current study is focused on the central nervous system and larval learning behaviors. We hope to explore this question more fully in future work.

      We added the following sentence to the Results section: “Based on analysis of 22 brain samples, we believe this driver strain consistently labels one neuron per hemisphere in the third-instar larval brain (Figure 2a - d, Figure S1c, Table S3).” In addition, we included Table S3 to summarize the DAN-c1 labeling patterns observed across these samples.

      Weakness #2: A first major weakness is that the main conclusion of the paper, which pertains to associative memory (last sentence of the abstract, and throughout the manuscript), is not justified by their evidence. Why so? Consider the paradigm in Figure 2g, and the data in Figure 2h (22 degrees, the control condition), where the assay and the experimental rationale used throughout the manuscript are introduced. Different groups of larvae are exposed, for 30min, to an odour paired with either i) quinine solution (red bar), ii) distilled water (yellow bar), or iii) sucrose solution (blue bar); in all cases this is followed by a choice test for the odour on one side and a distilled-water blank on the other side of a testing Petri dish. The authors observe that odour preference is low after odour-quinine pairing, intermediate after odour-water pairing and high after odour-sucrose pairing. The differences in odour preference relative to the odour-water case are interpreted as reflecting odour-quinine aversive associations and odour-sucrose appetitive associations, respectively. However, these differences could just as well reflect non-associative effects of the 30-min quinine or sucrose exposure per se (for a classical discussion of such types of issues see Rescorla 1988, Annu Rev Neurosci, or regarding Drosophila Tully 1988, Behav Genetics, or with some reference to the original paper by Honjo & Furukubo-Tokunaga 2005, J Neurosci that the authors reference, also Gerber & Stocker 2007, Chem Sens).

      As it stands, therefore, the current 3-group type of comparison does not allow conclusions about associative learning.

      We adopted the single-odor larval learning paradigm from Honjo et al., who first developed and validated this method for studying larval olfactory associative learning[7,8]. To address the reviewer’s concern regarding potential non-associative effects from 30-minute exposure to quinine or sucrose, we refer to multiple lines of evidence provided in Honjo’s studies: (1) Honjo et al. demonstrated that only larvae receiving paired presentations of odor and unconditioned stimulus (quinine or sucrose) exhibited learned responses. Exposure to either stimulus alone, or temporally dissociated presentations, failed to induce any learning response. (2) When tested with a second, non-trained odorant, larvae only responded to the odorant previously paired with the unconditioned stimulus. This rules out generalized olfactory suppression and confirms odor-specific associative learning. (3) Well-characterized learning mutants (e.g., rutabaga, dunce) that show deficits in adult reciprocal odor learning also failed to exhibit learned responses in this single-odor paradigm, further supporting its validity. (4) In our study, we used two distinct odorants (pentyl acetate and propionic acid) and two independent D2R knockdown approaches (UAS-miR and UAS-RNAi). We consistently observed that D2R knockdown in DAN-c1 impaired aversive learning. Importantly, naïve olfactory, gustatory, and locomotor assays ruled out general sensory or motor defects. Comparisons with control groups (odor paired with distilled water) also ruled out non-associative effects such as habituation. Taken together, these results strongly support that the single-odor paradigm is a robust and reliable assay for assessing larval olfactory associative learning in Drosophila. We have added a section in the Discussion to clarify and defend the use of this paradigm in our study.

      Weakness #3: A second major weakness is apparent when considering the sketch in Figure 2g and the equation defining the response index (R.I.) (line 480). The point is that the larvae that are located in the middle zone are not included in the denominator. This can inflate scores and is not appropriate. That is, suppose from a group of 30 animals (line 471) only 1 chooses the odor side and 29, bedazzled after 30-min quinine or sucrose exposure or otherwise confused by a given opto- or thermogenetic treatment, stay in the middle zone... a P.I. of 1.0 would result.

      During the testing stage, larvae were given 5 min to move freely on the testing plate. Under most conditions, more than half of the larvae (>50%) explore the plate, while the rest stay in the middle zone and are not counted. We used 25-50 larvae in each learning assay, so around 10-30 larvae end up in the two semicircular areas. Indeed, in our raw data an R.I. of 1 seldom appears; most R.I. values fall between -0.2 and 0.8. We acknowledge that the R.I. is not a linear function of the counts, so it changes more steeply as it approaches -1 or 1. However, because most values fall between -0.2 and 0.8, we think such ‘border effects’ can be neglected when enough larvae (10-30) enter the calculation.
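
      The point under discussion can be illustrated with a minimal sketch. It assumes the R.I. takes the common two-zone form, (larvae on the odor side minus larvae on the blank side) divided by the sum of the two; the manuscript's own equation (line 480) is not reproduced in this response, so the formula, the function names, and the counts below are illustrative assumptions rather than the authors' code or data.

```python
def response_index(n_odor: int, n_blank: int) -> float:
    """R.I. from larvae counted in the two scored zones (assumed formula).

    Larvae remaining in the middle zone are never passed in, which mirrors
    the exclusion from the denominator that the reviewer points out.
    """
    counted = n_odor + n_blank
    if counted == 0:
        raise ValueError("no larvae in either scored zone")
    return (n_odor - n_blank) / counted


def counted_fraction(n_odor: int, n_blank: int, n_total: int) -> float:
    """Share of the tested larvae that actually enter the R.I. calculation."""
    return (n_odor + n_blank) / n_total


# Reviewer's extreme case: 30 larvae, 1 on the odor side, 29 in the middle zone.
extreme_ri = response_index(1, 0)
extreme_fraction = counted_fraction(1, 0, 30)

# Hypothetical plate: 30 larvae, 12 on the odor side, 8 on the blank side.
typical_ri = response_index(12, 8)
typical_fraction = counted_fraction(12, 8, 30)

print(extreme_ri, round(extreme_fraction, 3))
print(typical_ri, round(typical_fraction, 3))
```

      Under this assumed formula, listing the counted fraction next to each R.I. would show at a glance whether a score rests on only a few animals.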

      Weakness #4: Unless experimentally demonstrated, claims that the thermogenetic effector shibire/ts reduces dopamine release from DANs are questionable. This is because firstly, there might be shibire/ts-insensitive ways of dopamine release, and secondly because shibire/ts may affect co-transmitter release from DANs.

      The Shibire<sup>ts1</sup> gene encodes a thermosensitive mutant of dynamin. Expressing this mutant in target neurons blocks neurotransmitter release at ambient temperatures above 30°C, because it represses vesicle recycling[7]. It is a widely used tool for examining whether a target neuron is involved in a specific physiological function. We cannot rule out Shibire<sup>ts1</sup>-insensitive routes of dopamine release. However, blocking dopamine release from DAN-c1 with Shibire<sup>ts1</sup> was enough to change learning responses (Figure 2h). This result indicates that dopamine release from DAN-c1 during training is important for larval aversive learning, which supports our hypothesis.

      The second question, about potential co-transmitter release, is an excellent one. Recently, Yamazaki et al. reported that co-neurotransmitters in the dopaminergic system modulate adult olfactory memories in Drosophila[9], and we cannot rule out roles for co-released neurotransmitters or neuropeptides in larval learning. Ideally, observing real-time changes in dopamine release from DAN-c1 in wild-type and TH-knockdown larvae would answer this question. However, live imaging of dopamine release from a single dopaminergic neuron is not practical for us at this time. On the other hand, the roles of dopamine receptors in olfactory associative learning support the importance of dopamine in Drosophila learning. The D1 receptor, dDA1, has been shown to be involved in both appetitive and aversive learning in adults and larvae[10,11]. In our work, D2R in the mushroom body played important roles in both larval appetitive and aversive learning (Figure 6a). Together, this evidence points to the importance of dopamine in Drosophila olfactory associative learning. In addition, much remains unknown about co-released neurotransmitters and neuropeptides, as well as their potentially complex interactions and crosstalk. We believe that investigating co-released neurotransmitters and neuropeptides is beyond the scope of this study.

      Weakness #5: It is not clear whether the genetic controls when using the Gal4/ UAS system are the homozygous, parental strains (XY-Gal4/ XY-Gal4 and UAS-effector/ UAS-effector), or as is standard in the field the heterozygous driver (XY-Gal4/ wildtype) and effector controls (UAS-effector/ wildtype) (in some cases effector controls appear to be missing, e.g. Figure 4d, Figure S4e, Figure S5c).

      Almost all controls we used were homozygous parental strains. They did not show abnormal behavior in learning, naïve sensory, or locomotion assays. The only exception is the control for DAN-c1: larvae from the homozygous R76F02AD; R55C10DBD strain showed much reduced locomotion speed (Figure S6). To prevent this reduced locomotion speed from affecting learning performance, we used heterozygous R76F02AD; R55C10DBD/wildtype larvae as the control, which showed normal learning, naïve sensory, and locomotion abilities (Figure 4e to i).

      Figure 4d is a column graph quantifying the efficiency of D2R knockdown with miR. Because we needed to induce and quantify the knockdown in specific DANs (DM1), only TH-GAL4 could serve as the control group, not UAS-D2R-miR. The control groups missing from Figures S4e and S5c are shown in another figure (Figure 4e).

      We described this in the Materials and Methods part, “All control strains used in learning assays were homozygous (except DAN-c1×WT), while all experimental groups (D2R knockdown and thermogenetics) used were heterozygous by crossing the corresponding control strains”.

      We also reorganized Figures S4e and S5c to show them alongside their control groups, which makes them easier to understand.

      Weakness #6: As recently suggested by Yamada et al 2024, bioRxiv, high cAMP can lead to synaptic depression (sic). That would call into question the interpretation of low-Dop2R leading to high-cAMP, leading to high-dopamine release, and thus the authors interpretation of the matching effects of low-Dop2R and driving DANs.

      We appreciate the reviewer’s suggestion. We read this study, which also addresses a question we raised in the Discussion section: the discrepancy between cAMP elevation in mushroom body neurons and the reduced MBN-MBON synaptic plasticity after olfactory associative learning in Drosophila. The authors offer an explanation for the D1R-cAMP elevation-MBN-MBON LTD axis, which is helpful for understanding the learning mechanism. However, we do not think it offers an explanation for our D2R-related mechanisms. We have added this reference to the manuscript.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) Throughout the behavioral experiments, a defect in aversive learning is defined as a relative increase in the response index (RI) after olfactory training with quinine (red) and a defect in appetitive learning as a relative decrease in RI after training with sucrose (blue). Training with distilled water (yellow) is intended to be a control for comparisons within genotypes/treatment groups but causes interpretation issues if it is also affected by experimental manipulations.

      The authors typically make comparisons between quinine, water, and sucrose within each group, but this often forces readers to infer the key comparisons of interest. For example, the key comparison in Figure 2h is the statistically significant difference between the red groups, which differ only in the temperature used during training. Many other figure panels in the paper would also benefit from more direct statistical comparisons, particularly Figure 2k.

      While I recognize the value of the water control, I strongly recommend that the authors make statistical comparisons directly between genotypes/treatment groups where possible and to interpret results with more caution when the water RI score differs substantially between groups. Also, since the authors are conducting two-way ANOVAs before Dunnett's multiple comparisons tests, they ideally should report the p-value for the main effect of each factor, plus the interaction p-value between the two factors before making multiple comparisons.

      We appreciate the reviewer’s suggestion. In response, we re-analyzed all learning assay data in Figures 2 and 4 using two-way ANOVA followed by Tukey’s multiple comparisons test. Unlike our previous analysis, which only compared each experimental group to its corresponding DW control, we now compared all groups against one another. First, we found that most R.I. values from different temperature conditions (Figure 2) or genotypes (Figure 4) trained with DW were not significantly different, with the exception of the data in Figure 2i (formerly Figure 2k; discussed further below). The R.I. from DAN-c1 × D2R-miR larvae trained with QUI was significantly different from both genotype control groups (DAN-c1 × WT and UAS-D2R-miR), while no significant difference was observed between the two controls trained with QUI. Thus, this more comprehensive statistical approach supports the conclusions we previously reported. Second, as the reviewer noted, the new analysis allows for a more direct interpretation of our findings. For example, in the thermogenetic experiments using the Shibire<sup>ts1</sup> strain, the R.I. of DAN-c1 × UAS-Shibire<sup>ts1</sup> larvae trained with QUI at 34°C was not significantly different from the DW group at 34°C, but was significantly different from the QUI group at 22°C. Both findings support our conclusion that blocking dopamine release from DAN-c1 impairs larval aversive learning (Figure 2f).

      In the dTRPA1 activation experiments, the R.I. of DAN-c1 × UAS-dTRPA1 larvae trained with DW at 34°C was significantly lower than that of the DW group at 22°C and the QUI group at 34°C, but not significantly different from the QUI group at 22°C (Figure 2i). These results indicate that activating DAN-c1 during training is sufficient to drive aversive learning even in the absence of QUI. Interestingly, when DAN-c1 × UAS-dTRPA1 larvae were trained with QUI at 34°C, their R.I. was significantly higher than that of the DW group at 34°C and significantly different from the QUI group at 22°C, but not significantly different from the DW group at 22°C (Figure 2i). We interpret this as evidence that simultaneous activation of DAN-c1 by both QUI and dTRPA1 leads to over-excitation, which in turn impairs aversive learning.

      We have revised the figures (Figures 2, 4, 5, and 6) and updated the corresponding Results sections to reflect this new statistical analysis. Additionally, we now report the p-values for interaction, row factor, and column factor - either in Table S4 (for Figure 2) or in the figure captions for Figures 4, 5, 6, S4, S5, and S7.
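
      For readers unfamiliar with the three quantities reported here (row factor, column factor, and interaction), the following sketch computes the corresponding F statistics for a balanced two-way design using only the Python standard library. It is an illustration under stated assumptions, not the authors' analysis: the factor levels and R.I. values are hypothetical, the function name is made up for this example, and Tukey’s post hoc test is not implemented. Turning each F value into a p-value additionally needs the F distribution (for instance `scipy.stats.f.sf`), which is omitted here.

```python
from statistics import mean


def two_way_anova_balanced(cells):
    """F statistics for a balanced two-way ANOVA with replication.

    cells maps (row_level, column_level) -> list of replicate values;
    every combination must be present with the same number of replicates.
    """
    rows = sorted({r for r, _ in cells})
    cols = sorted({c for _, c in cells})
    n = len(next(iter(cells.values())))
    if len(cells) != len(rows) * len(cols) or any(len(v) != n for v in cells.values()):
        raise ValueError("design must be complete and balanced")
    if n < 2:
        raise ValueError("need at least two replicates per cell")

    grand = mean(x for v in cells.values() for x in v)
    row_mean = {r: mean(x for c in cols for x in cells[(r, c)]) for r in rows}
    col_mean = {c: mean(x for r in rows for x in cells[(r, c)]) for c in cols}
    cell_mean = {k: mean(v) for k, v in cells.items()}

    # Sums of squares for the two main effects, the interaction, and the error.
    ss_row = n * len(cols) * sum((row_mean[r] - grand) ** 2 for r in rows)
    ss_col = n * len(rows) * sum((col_mean[c] - grand) ** 2 for c in cols)
    ss_int = n * sum(
        (cell_mean[(r, c)] - row_mean[r] - col_mean[c] + grand) ** 2
        for r in rows for c in cols
    )
    ss_err = sum((x - cell_mean[k]) ** 2 for k, v in cells.items() for x in v)

    df_row, df_col = len(rows) - 1, len(cols) - 1
    df_int = df_row * df_col
    df_err = len(rows) * len(cols) * (n - 1)
    ms_err = ss_err / df_err
    return {
        "row_factor": (ss_row / df_row) / ms_err,
        "column_factor": (ss_col / df_col) / ms_err,
        "interaction": (ss_int / df_int) / ms_err,
        "df": (df_row, df_col, df_int, df_err),
    }


# Hypothetical data: rows = temperature, columns = training solution.
example = {
    ("22C", "DW"): [1, 3], ("22C", "QUI"): [3, 5],
    ("34C", "DW"): [5, 7], ("34C", "QUI"): [7, 9],
}
result = two_way_anova_balanced(example)
print(result)
```

      In this toy data set the cell means are purely additive, so the interaction F statistic is zero while both main effects are non-zero.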

      (2) The authors' motivation to find tools that label DANs other than DAN-c1 was unclear until much later in the paper when I saw the screening experiments in Figures S4 and S5. The authors could provide a clearer justification for why they focus on DAN-c1 in Figure 2 rather than another DAN for which they found a specific driver in Figure 1. The motivation for looking at individual pPAM neurons was also unclear.

      We sincerely appreciate the reviewer’s thoughtful suggestion. Our study was initially motivated by the goal of characterizing the expression pattern of D2R in the larval brain. From there, we aimed to identify DAN drivers that label specific pairs of dopaminergic neurons, enabling us to assess the functional role of D2R in distinct DAN subtypes through targeted knockdown experiments. This approach ultimately led us to focus on DAN-c1, as it was the only neuronal population for which D2R knockdown resulted in a learning deficit. We then returned to examine the functional significance of DAN-c1 in aversive learning. While we recognize that a more comprehensive narrative might be desirable, the current structure of our manuscript reflects the most logical progression of our work based on our research priorities and experimental outcomes. We did explore alternative manuscript structures - such as beginning with the D2R expression pattern - but found that the current format best conveys our findings and rationale.

      Regarding our motivation to study individual PAM neurons: we aimed to identify whether D2R plays a role in a specific pair of pPAM neurons involved in larval appetitive learning. However, we were unable to find a driver that exclusively labels DAN-j1, which we believe to be the key neuron in this context (see Figure 1). As a result, our investigation into appetitive learning did not progress beyond the observation of D2R expression in pPAM neurons (Figure 3d), and we did not proceed with learning assays in this context. While we acknowledge the limitations of our study, we believe that our focus on DAN-c1 is well-justified based on both our findings and the tools currently available. We respectfully note that a major restructuring of the manuscript would not necessarily clarify the rationale for focusing on DAN-c1, and therefore we have maintained the current organization.

      (3) The authors should also double-check and update the expression patterns of the drivers in Table 1 using references such as the FlyLight online resource. For example, MB438B labels PPL1-α'2α2, PPL1-α3, PPL1-γ1pedc according to FlyLight, not just PPL1-γ1pedc as initially reported by Aso and Hattori et al. (2014).

      We appreciate the reviewer’s suggestion. We have double-checked and updated the driver expression patterns in Table 1, using FlyLight data as a reference.

      (4) Interpreting overlaid green-and-red fluorescence confocal images would be difficult for any colorblind readers; I suggest that the authors consider using a more friendly color set.

      We thank the reviewer for the suggestion. In our study, we need three distinct colors to represent different channels. We also tested an alternative color scheme using cyan, magenta, and yellow (CMY) instead of the standard red, green, and blue (RGB). As a comparison (see below), we used a R76F02AD;R55C10DBD (DAN-c1) GFP-labeled brain as an example. In our evaluation, the RGB combination provided clearer visualization and appeared more natural, while the CMY scheme looked somewhat artificial. Therefore, we decided to retain the original RGB color scheme and did not modify the colors in the figures.

      Author response image 1.

      (5) For Figure 4d, counting each DAN as an individual N would violate the assumption of independence made by the unpaired t test, since multiple DANs are found in each brain and therefore are not independent. Instead, it would be better to count each individual N as the average intensity of the four DANs measured in each brain.

      We revised the analysis of microRNA efficiency by averaging the fluorescence intensity of DANs within each brain, treating each brain as a single sample. Based on this approach, we re-plotted Figure 4d.
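
      The revised unit of analysis can be sketched as follows: per-DAN intensities are collapsed into one mean per brain, and the brains become the independent samples. The intensity values, brain labels, and function name are hypothetical and serve only to illustrate the averaging step, not the authors' actual measurements or software.

```python
from collections import defaultdict
from statistics import mean


def per_brain_means(measurements):
    """Collapse (brain_id, intensity) pairs into one mean intensity per brain."""
    by_brain = defaultdict(list)
    for brain_id, intensity in measurements:
        by_brain[brain_id].append(intensity)
    return {brain_id: mean(values) for brain_id, values in by_brain.items()}


# Hypothetical intensities (arbitrary units): four DANs in each of two brains.
measurements = [
    ("brain_1", 10.0), ("brain_1", 12.0), ("brain_1", 14.0), ("brain_1", 16.0),
    ("brain_2", 20.0), ("brain_2", 22.0), ("brain_2", 24.0), ("brain_2", 26.0),
]
brain_means = per_brain_means(measurements)

# N is now the number of brains, not the number of measured DANs.
n_samples = len(brain_means)
print(brain_means, n_samples)
```

      With this grouping, eight DAN measurements yield N = 2 rather than N = 8, which is the independence correction the reviewer requested.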

      (6) Finally, the authors ought to make it clearer throughout the paper that they have implicated a pair of DAN-c1 neurons in aversive learning, not just a single DAN as currently stated in the title.

      We thank the reviewer for the suggestion about our wording. We have changed all instances of “single neuron” to “a pair of neurons”.

      Reviewer #2 (Recommendations for the authors):

      (1) The results section presents: "Activation of DAN-c1 with dTRPA1 at 34°C during training induced repulsion to PA in the distilled water group (Figure 2k). These data suggested that DAN-c1 excitation and presumably increased dopamine release is sufficient for larval aversive learning in the absence of gustatory pairing." An alternative interpretation is that 30 min of TrpA activation depletes synaptic vesicle pool, or inactivates neurons because of prolonged depolarization, or DAN shows firing rate adaptation (e.g. see Pulver et al. 2009; doi:10.1152/jn.00071.2009). In such a case DA release would be reduced and not increased. Therefore, the interpretation that DAN-c1 activation is both necessary and sufficient in larval aversive learning is difficult to be sustained.

      In this regard it is important to know how the sensory motor abilities are during a thermos-induction at 34°C during 30 min.

      We thank the reviewer for the thoughtful suggestion. Regarding the concern about potential dopamine depletion or neuronal inactivation, we believe a comparison with the Shibire<sup>ts1</sup> experiments helps clarify the interpretation. Activation of Shibire<sup>ts1</sup> during training with distilled water did not result in aversive learning (Figure 2f), which is a distinct phenotype from that observed with dTRPA1 activation (Figure 2i). This suggests that the phenotypes seen with dTRPA1 activation are not due to reduced dopamine release. Additionally, as the reviewer suggested, we have revised our conclusion to state that “DAN-c1 is important for larval aversive learning,” rather than claiming it is both necessary and sufficient.

      (2) The GRASP system can label the contact of a cell in close proximity like synaptic contacts, but also other situations like no synaptic contact. It would be useful to use a more specific synaptic labelling tool, like the trans-synaptic tracing system (Talay et al., 2017 https://doi.org/10.1016/j.neuron.2017.10.011), which provides a better label of synaptic contact.

      We really appreciate the reviewer’s suggestion. First, we acknowledge that there are several general methods to reveal synaptic connections between neurons: immunohistochemistry (IHC), neuron labeling, viral tracing, GRASP, and electron microscopy (EM). Among these, IHC is not sufficiently convincing, viral tracing is challenging and rarely used in Drosophila, and EM, while the most accurate, is prohibitively expensive for our current goals. For these reasons, we chose the GRASP system to demonstrate the synaptic connections from dopaminergic neurons to the mushroom body. Second, we utilized an activity-dependent version of the GRASP system, linking split-GFP1-10 with synaptic proteins (e.g., synaptobrevin)[12] rather than with cell surface proteins like CD4 or CD8. This version significantly reduces false positive signals compared to the previous version, which was tagged with cell surface proteins. While we admit that this method does not provide as solid evidence of synaptic connections as EM, it is the most efficient method available to us for showing the synaptic connections from dopaminergic neurons to the mushroom body. Finally, we thank the reviewer for suggesting the literature on trans-synaptic tracing methods. Unfortunately, this method is not suitable for our goal, as it labels the entire postsynaptic neuron. In our study, we use GRASP to identify the specific dopaminergic neurons based on the synaptic locations and compartments within the mushroom body lobe. We require a labeling system at the subcellular level because, as noted, DAN-c1 forms synapses specifically in the lower peduncle (LP) of the mushroom body lobe, which is part of the axonal bundles from mushroom body neurons. Using the trans-synaptic tracing method would label the entire mushroom body, making it impossible to distinguish DAN-c1 from other DL1 dopaminergic neurons.

      (3) Previously, Honjo et al (2009) used a petri dish of 8.5 cm and a filter paper for reinforcement of 5.5 cm. In this study the petri dish was 10 cm and the size of the filter paper was not informed. That is important information because it will determine the probability of conditioning.

      A square piece of filter paper (0.25 cm<sup>2</sup>) was used to hold odorants in this study. We have added this information to the Materials and Methods.

      (4) Statistic analysis of Behavioral performance of Fig 2H-I was made by ANOVA followed by Dunnett multiple comparisons test. Which was the control group? In each graph 2 independent Dunnett tests were performed against the DW control group?

      We have re-analyzed the data using a two-way ANOVA followed by Tukey’s multiple comparison test, as suggested by Reviewer #1. In Figure 2f-j (previously Figure 2h-l), the DW groups serve as the control groups. In our new analysis, we compared data across all groups using Tukey’s multiple comparison test, with particular focus on comparisons to the corresponding DW control groups.

      (5) The sample size in staining experiments of figures 1-4 were not informed.

      We have added Table S2 in the supplementary materials to provide the N numbers for brain samples used in the figures.

      (6) Color code in Fig 5 is missing, I assumed that is the same as in figure 4e

      We added the color code to the legend of Figure 5.

      (7) Line 506 "0.1% QH solutions" should be 0.1% QUI solutions

      Changed.

      (8) There is no information on the availability of data

      We added Data Availability Statement: Data will be made available on request.

      Reviewer #3 (Recommendations for the authors):

      (1) Axes of behavioural experiments should better show the full span of possible values (-1;1) to allow a fair assessment.

      We have adjusted the axes in all learning assay graphs to a range from -1 to 1 for consistency and clarity.

      (2) Ns should better be given within the figures.

      We have added Table S2 in the supplementary materials to provide the N numbers for brain samples used in the figures. Additionally, Tables S4 to S6 include the N numbers for the learning assays. While we initially considered including the N numbers within the figure captions, we found it challenging to present this information clearly and efficiently. Therefore, we decided to summarize the N numbers in the tables instead.

      (3) Dot- or box-plots would be better for visualizing the data than means and SEMs.

      We agree with the reviewer’s suggestion. In the behavioral assay graphs, both dot plots and mean ± SEM have been included for better visualization of the data.

      (4) The paper reads as if Dop2R would reduce neuronal activity, rather than "just" cAMP levels. Such a misunderstanding should be avoided.

      We appreciate the reviewer’s comment. Under most conditions, dopamine binding to D2Rs activates the Gαi/o pathway, which inhibits adenylyl cyclase (AC) and reduces cAMP levels. This reduction in cAMP ultimately leads to decreased neuronal activity. In other words, D2R activation typically has an inhibitory effect on neurons. Additionally, D2R can exert inhibitory effects through other signaling pathways, such as the inhibition of voltage-gated calcium channels via Gβγ subunits[5]. As noted in the Introduction, dopamine receptors are also involved in other signaling cascades, including PKC, MAPK, and CaMKII pathways. In the context of our study, based on the current understanding of molecular signaling in Drosophila olfactory associative learning, we still consider the D2R-mediated AC-cAMP-PKA signaling pathway the most important one. However, we cannot rule out the involvement of other signaling pathways.

      (5) It would be better if citations were more clearly separated into ones that refer to adult flies versus work on larvae.

      We separated the citations related to adult flies from those related to larvae.

      (6) Line 81-83. DopECR is not found in mammals, is it?

      You are correct. DopECR is not found in mammals. This non-canonical receptor shares structural homology with vertebrate β-adrenergic-like receptors. It can be activated rapidly by dopamine as well as insect ecdysteroids[13,14].

      (7) Line 99: Better "a" learning center (some forms of learning work without mushroom bodies).

      We have revised the text from "the learning center" to "a learning center," as suggested by the reviewer.

      (8) Supplemental figures should be numbered according to the sequence in which they are mentioned in the text.

      We have rearranged the sequence of supplemental figures to match the order in which they are referenced in the text.

      (9) It is striking that dTRPA1-driving DANc1 is punishing in the water condition but that this effect does not summate with quinine punishment (but rather seems to impair it). Maybe you can back this up by ChR- or Chrimson-driving DANc1? Or by silencing DANc1 by GtACR1?

      We appreciate the reviewer’s suggestion. Indeed, we observed similar but not identical results when we used ChR2 to activate DAN-c1 during the training stage (Figure 5b and c). We found that activating DAN-c1 with quinine (QUI) impaired aversive learning (Figure 5b), consistent with our findings using dTRPA1 activation of DAN-c1 when trained in QUI at 34°C (Figure 2i). We propose that the over-excitation of DAN-c1, whether induced by QUI or artificial manipulation (optogenetics and thermogenetics), impairs aversive learning, which aligns with our findings for D2R knockdown (Figure 4e). However, there are some differences between dTRPA1 and ChR2 activation. While dTRPA1 activation induced aversive learning when trained with distilled water (DW) at 34°C (Figure 2i), ChR2 did not induce aversive learning under the same conditions (Figure 5c). We believe this difference is due to the varying activation levels between the two manipulations. Our optogenetic stimulus may have been stronger than the thermogenetic one, potentially leading to over-excitation in the DW group, preventing aversive learning. In the QUI group, the more severe over-excitation impaired aversive learning, producing a phenotype similar to that observed with other over-excitation methods (e.g., thermogenetics or D2R knockdown), where the phenotype reached a maximum level. We have also addressed these points in the Discussion section.

      (10) Unless I got the experimental procedure wrong, isn't it surprising that Figure S7b does not uncover a punishing effect of driving TH-GAL4 neurons?

      This optogenetic experiment with ChR2 expression in TH-GAL4 neurons was a pioneering attempt to activate DAN-c1 using ChR2. As explained in response to question (9), the failure to observe a punishing effect in the DW group when TH-GAL4 neurons were activated during training may be due to our optogenetic stimulus being too strong. This likely resulted in over-excitation of DAN-c1 (among the neurons labeled by TH-GAL4), impairing aversive learning and preventing the appearance of typical aversive behaviors.

      (11) It seems that Figure 1f′ is repeated, in a mirrored manner, in Figure 2e.

      We have removed Figure 2e, as it was deemed redundant and not necessary for this section.

      References

      (1) Saumweber, T. et al. Functional architecture of reward learning in mushroom body extrinsic neurons of larval Drosophila. Nat Commun 9, 1104 (2018). https://doi.org/10.1038/s41467-018-03130-1

      (2) Aso, Y. & Rubin, G. M. Dopaminergic neurons write and update memories with cell-type-specific rules. Elife 5 (2016). https://doi.org/10.7554/eLife.16135

      (3) Xie, T. et al. A Genetic Toolkit for Dissecting Dopamine Circuit Function in Drosophila. Cell Rep 23, 652-665 (2018). https://doi.org/10.1016/j.celrep.2018.03.068

      (4) Eschbach, C. et al. Recurrent architecture for adaptive regulation of learning in the insect brain. Nat Neurosci 23, 544-555 (2020). https://doi.org/10.1038/s41593-020-0607-9

      (5) Neve, K. A., Seamans, J. K. & Trantham-Davidson, H. Dopamine receptor signaling. J Recept Signal Transduct Res 24, 165-205 (2004). https://doi.org/10.1081/rrs-200029981

      (6) Draper, I., Kurshan, P. T., McBride, E., Jackson, F. R. & Kopin, A. S. Locomotor activity is regulated by D2-like receptors in Drosophila: an anatomic and functional analysis. Dev Neurobiol 67, 378-393 (2007). https://doi.org/10.1002/dneu.20355

      (7) Honjo, K. & Furukubo-Tokunaga, K. Induction of cAMP response element-binding protein-dependent medium-term memory by appetitive gustatory reinforcement in Drosophila larvae. J Neurosci 25, 7905-7913 (2005). https://doi.org/10.1523/JNEUROSCI.2135-05.2005

      (8) Honjo, K. & Furukubo-Tokunaga, K. Distinctive neuronal networks and biochemical pathways for appetitive and aversive memory in Drosophila larvae. J Neurosci 29, 852-862 (2009). https://doi.org/10.1523/JNEUROSCI.1315-08.2009

      (9) Yamazaki, D., Maeyama, Y. & Tabata, T. Combinatory Actions of Co-transmitters in Dopaminergic Systems Modulate Drosophila Olfactory Memories. J Neurosci 43, 8294-8305 (2023). https://doi.org/10.1523/JNEUROSCI.2152-22.2023

      (10) Selcho, M., Pauls, D., Han, K. A., Stocker, R. F. & Thum, A. S. The role of dopamine in Drosophila larval classical olfactory conditioning. PLoS One 4, e5897 (2009). https://doi.org/10.1371/journal.pone.0005897

      (11) Kim, Y. C., Lee, H. G. & Han, K. A. D1 dopamine receptor dDA1 is required in the mushroom body neurons for aversive and appetitive learning in Drosophila. J Neurosci 27, 7640-7647 (2007). https://doi.org/10.1523/JNEUROSCI.1167-07.2007

      (12) Macpherson, L. J. et al. Dynamic labelling of neural connections in multiple colours by trans-synaptic fluorescence complementation. Nat Commun 6, 10024 (2015). https://doi.org/10.1038/ncomms10024

      (13) Abrieux, A., Duportets, L., Debernard, S., Gadenne, C. & Anton, S. The GPCR membrane receptor, DopEcR, mediates the actions of both dopamine and ecdysone to control sex pheromone perception in an insect. Front Behav Neurosci 8, 312 (2014). https://doi.org/10.3389/fnbeh.2014.00312

      (14) Lark, A., Kitamoto, T. & Martin, J. R. Modulation of neuronal activity in the Drosophila mushroom body by DopEcR, a unique dual receptor for ecdysone and dopamine. Biochim Biophys Acta Mol Cell Res 1864, 1578-1588 (2017). https://doi.org/10.1016/j.bbamcr.2017.05.015

    1. Author Response

      The following is the authors’ response to the original reviews.

      We would like to first thank the Editor as well as the two reviewers for their enthusiasm and careful evaluation of our manuscript. We also appreciate their thoughtful and constructive comments and suggestions. They did, however, have concerns regarding experimental design, data analysis, and over-interpretation of our findings. We endeavored to address these concerns through refinement of our framing, inclusion of additional new analyses, and rewriting some parts of our discussion section. We hope our response can better explain the rationale of our experimental design and data interpretation. In addition, we also acknowledge the limitations of our present study, so that it will benefit future investigations into this topic. Our detailed responses are provided below.

      Reviewer #1 (Public Review)

      This study examines whether the human brain uses a hexagonal grid-like representation to navigate in a non-spatial space constructed by competence and trustworthiness. To test this, the authors asked human participants to learn the levels of competence and trustworthiness for six faces by associating them with specific lengths of bar graphs that indicate their levels in each trait. After learning, participants were asked to extrapolate the location from the partially observed morphing bar graphs. Using fMRI, the authors identified brain areas where activity is modulated by the angles of morphing trajectories in six-fold symmetry. The strength of this paper lies in the question it attempts to address. Specifically, the question of whether and how the human brain uses grid-like representations not only for spatial navigation but also for navigating abstract concepts, such as social space, and guiding everyday decision-making. This question is of emerging importance.

      Thanks very much again for the evaluation and comments. Please find our revision plans for each comment below.

      The weak points of this paper are that its findings are not sufficiently supporting their arguments, and there are several reasons for this:

      (1) Does the grid-like activity reflect 'navigation over the social space' or 'navigation in sensory feature space'? The grid-like representation in this study could simply reflect the transition between stimuli (the length of bar graphs). Participants in this study associated each face with a specific length of two bars, and the 'navigation' was only guided by the morphing of a bar graph image. Moreover, any social cognition was not required to perform the task where they estimate the gridlike activity. To make social decision-making that was conducted separately, we do not know if participants needed to navigate between faces in a social space. Instead, they can recall bar graphs associated with faces and compute the decision values by comparing the length of bars. Notably, in the trust game in this study, competence and trustworthiness are not equally important to make a decision (Equation 1). The expected value is more sensitive to one over the other. This also suggests that the space might not reflect social values but perceptual differences.

      The Reviewer raises an interesting point. We apologize for not being clear enough to address this possibility in our original manuscript and we will improve the clarity in our revision. To address this issue, we would like to break it into two sub-questions and answer them separately: 1) Are participants merely memorizing the values associated with each avatar, or do they place the avatars on a two-dimensional map in their internal representation? 2) If so, are the two dimensions of this internal representation social dimensions relating to competence and trust, or sensory dimensions relating to bar height (i.e., social space or sensory space)?

      For the first question, we hope our analysis of the distance effect on the reaction time in the comparison task can address this issue. Specifically, it came from the idea that distance is a measure of similarity between two avatars in the 2D social space. The closer two avatars are, the more similar they are; hence, distinguishing them will be harder and result in longer reaction times. If participants are merely memorizing the avatars as six isolated instances without integrating them into a low-dimensional map, then the avatars should be equidistant (as if they were lying on the vertices of a 5-simplex) and would not show a distance effect. Therefore, we interpreted a stronger distance effect as a behavioural index of having a better internal map-like representation. This approach is adopted from the work by Park et al. (2020), where they used the distance effect to demonstrate that human brains map abstract relationships among entities from piecemeal learning.
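
      The distance-effect logic above can be illustrated with a minimal sketch. It is not the analysis code used in the study: the avatar coordinates, the simulated reaction times, and the helper names (`pairwise_distances`, `distance_effect_slope`) are all hypothetical, and a plain least-squares slope stands in for whatever regression model was actually fitted.

```python
import itertools
import numpy as np

def pairwise_distances(coords):
    """Euclidean distance for every unordered pair of avatars."""
    pairs = list(itertools.combinations(range(len(coords)), 2))
    dists = np.array([np.linalg.norm(coords[i] - coords[j]) for i, j in pairs])
    return pairs, dists

def distance_effect_slope(dists, rts):
    """Least-squares slope of reaction time on pairwise distance."""
    X = np.column_stack([np.ones_like(dists), dists])  # intercept + distance
    beta, *_ = np.linalg.lstsq(X, rts, rcond=None)
    return beta[1]

# Hypothetical (competence, trustworthiness) positions; NOT the study's values.
coords = np.array([[0.1, 0.2], [0.3, 0.9], [0.5, 0.4],
                   [0.7, 0.8], [0.9, 0.1], [0.6, 0.6]])
pairs, dists = pairwise_distances(coords)

# Simulated, noise-free reaction times that shorten as distance grows
# (illustrative assumption only).
rts = 1.5 - 0.2 * dists
slope = distance_effect_slope(dists, rts)
```

      A negative slope corresponds to the distance effect described above (closer pairs, longer reaction times). If the avatars were stored as isolated, equidistant items, reaction times would carry no information about map distance and the slope would be expected to be near zero.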

      For the second question of ‘social space’ vs. ‘sensory space’, our study adopted the paradigm developed by Constantinescu et al. (2016), who constructed a conceptual space in a similar way and found that such a space can be represented with a grid-like code in the entorhinal and prefrontal cortex. We stayed close to their original design and hoped that our work could provide, to some extent, a close replication of their result using non-spatial social concepts instead. Indeed, this led to a limitation of our study: participants passively traverse the artificial space rather than actively navigating it to make decisions or inferences. We also did not find evidence as sufficient as that reported in previous grid-like coding fMRI studies. This may be related to low signal quality in the medial temporal region, although we are not entirely sure. Nevertheless, we don’t think our findings contradict or disprove previous findings in any way. Here we would also like to point to the work by Park et al. (2021). Their task involved making novel inferences in a 2D social hierarchy space, and they found that a grid-like code in the entorhinal cortex and medial prefrontal cortex supports such novel inferences. Hence, we argue that results from these studies and partial evidence from our study collectively support the idea that the entorhinal cortex is important for representing abstract knowledge (spatial and non-spatial).

      (2) Does the brain have a common representation of faces in a social space? In this study, participants don't need to have a map-like representation of six faces according to their levels of social traits. Instead, they can remember the values of each trait. The evidence of neural representations of the faces in a 2-dimensional social space is lacking. The authors argued that the relationship between the reaction times and the distances between faces provides evidence of the formation of internal representations. However, this can be found without the internal representation of the relationships between faces. If the authors seek internal representations of the faces in the brain, it would be important to show that this representation is not simply driven by perceptual differences between bar graphs that participants may recall in association with each face.

      Considering these caveats, it is hard for me to agree if the authors provide evidence to support their claims.

      With regard to the common representation of faces, this is a potential limitation of our paradigm because our current task design didn’t include a stage of face presentation to properly test this question. With regard to the asymmetry between the two dimensions in determining expected value, we think that the prerequisite for identifying six-fold grid-like coding is an abstract space formed by orthogonal dimensions, i.e., competence and trustworthiness in our task are not correlated. In addition, the scanner task does not require computation of expected value. However, we do think it is worth investigating whether the extent to which each dimension contributes to decision-making and inference distorts the grid-like representation of the map. Our prediction is that the entorhinal cortex will maintain a representation of the map that is invariant to this aspect, so that it can support inferences in different contexts where different weights may be assigned to different dimensions. This will be an interesting hypothesis for future studies to test. We hope that our revision plans, with the above considerations, address the Reviewer’s comments.

      Reviewer #2 (Public Review)

      Summary:

      In this work, Liang et al. investigate whether an abstract social space is neurally represented by a grid-like code. They trained participants to 'navigate' around a two-dimensional space of social agents characterized by the traits of warmth and competence, then measured neural activity as participants imagined navigating through this space. The primary neural analysis consisted of three procedures: 1) identifying brain regions exhibiting the hexagonal modulation characteristic of a grid-like code, 2) estimating the orientation of each region's grid, and 3) testing whether the strength of the univariate neural signal increases when a participant is navigating in a direction aligned with the grid, compared to a direction that is misaligned with the grid.

      From these analyses, the authors find the clearest evidence of a grid-like code in the prefrontal cortex and weaker evidence in the entorhinal cortex.
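
      For readers unfamiliar with the procedure summarized above, the following is a minimal sketch of one widely used way to test for hexagonal modulation: estimate a grid orientation from sine and cosine regressors at six-fold periodicity in one half of the data, then test alignment in the other half. It is an assumption that the manuscript's pipeline resembles this; the simulated signal, the 20° orientation, and the function names are hypothetical and are not taken from the study.

```python
import numpy as np

def estimate_grid_orientation(theta, signal):
    """Fit signal ~ b0 + b1*cos(6*theta) + b2*sin(6*theta); return phi in radians."""
    X = np.column_stack([np.ones_like(theta), np.cos(6 * theta), np.sin(6 * theta)])
    beta, *_ = np.linalg.lstsq(X, signal, rcond=None)
    # The orientation is only defined modulo 60 degrees.
    return np.arctan2(beta[2], beta[1]) / 6.0

def alignment_effect(theta, signal, phi):
    """Regression weight of cos(6*(theta - phi)) in held-out data."""
    X = np.column_stack([np.ones_like(theta), np.cos(6 * (theta - phi))])
    beta, *_ = np.linalg.lstsq(X, signal, rcond=None)
    return beta[1]

def simulate_signal(theta, phi, amplitude=0.5, baseline=1.0):
    """Noise-free six-fold modulated signal (illustrative assumption only)."""
    return baseline + amplitude * np.cos(6 * (theta - phi))

rng = np.random.default_rng(0)
true_phi = np.deg2rad(20.0)                    # hypothetical grid orientation
theta_train = rng.uniform(0, 2 * np.pi, 200)   # trajectory angles, estimation half
theta_test = rng.uniform(0, 2 * np.pi, 200)    # trajectory angles, held-out half

phi_hat = estimate_grid_orientation(theta_train, simulate_signal(theta_train, true_phi))
effect = alignment_effect(theta_test, simulate_signal(theta_test, true_phi), phi_hat)
```

      Estimating the orientation and testing the alignment effect on separate data halves keeps the test from being circular. A positive alignment weight means the signal is larger for trajectories aligned with the estimated grid axes than for misaligned ones.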

      Strengths:

      The work demonstrates the existence of a grid-like neural code for a socially-relevant task, providing evidence that such coding schemes may be relevant for a variety of two-dimensional task spaces.

      Thank you very much again for your careful evaluation and thoughtful comments. Please find our response to the comments below.

      Weaknesses:

      In various parts of this manuscript, the authors appear to use a variety of terms to refer to the (ostensibly) same neural regions: prefrontal cortex, frontal pole, ventromedial prefrontal cortex (vmPFC), and orbitofrontal cortex (OFC). It would be useful for the authors to use more consistent terminology to avoid confusing readers.

      Thanks for pointing out the inconsistent use of terms. We will try to improve this in the revision of our manuscript.

      Claims about a grid code in the entorhinal cortex are not well-supported by the analyses presented. The whole-brain analysis does not suggest that the entorhinal cortex exhibits hexagonal modulation; the strength of the entorhinal BOLD signal does not track the putative alignment of the grid code there; multivariate analyses do not reveal any evidence of a grid-like representational geometry.

      On a conceptual level, it is not entirely clear how this work advances our understanding of gridlike encoding of two-dimensional abstract spaces, or of social cognition. The study design borrows heavily from Constantinescu et al. 2016, which is itself not an inherent weakness, but the Constantinescu et al. study already suggests that grid codes are likely to underlie two-dimensional spaces, no matter how abstract or arbitrary. If there were a hypothesis that there is something unique about how grid codes operate in the social domain, that would help motivate the search for social grid codes specifically, but no such theory is provided. The authors do note that warmth and competence likely have ecological importance as social traits, but other past studies have used slightly different social dimensions without any apparent loss of generality (e.g., Park et al. 2021). There are some (seemingly) exploratory analyses examining how individual difference measures like social anxiety and avoidance might affect the brain and behavior in this study, but a strong theoretical basis for examining these particular measures is lacking.

      We acknowledge that we used very similar dimensions to the work by Park et al. (2021). While Park and colleagues (2021) took a more innovative and rigorous approach, we tried to stay close to the original design by Constantinescu et al. (2016) with the hope that our work could provide, to some extent, a close replication of their result. Our data were collected before the 2021 paper came out and, as the comment points out, we did not find evidence as complete and convincing as in these previous grid-like coding fMRI papers. This may be due to low signal quality in the medial temporal region, although we are not entirely sure. But we don’t think our current findings contradict or disprove previous findings in any way.

      I found it difficult to understand the analyses examining whether behavior (i.e., reaction times) and individual difference measures (i.e., social anxiety and avoidance) can be predicted by the hexagonal modulation strength in some region X, conditional on region X having a similar estimated grid alignment with some other region Y. It is possible that I have misunderstood the authors' logic and/or methodology, but I do not feel comfortable commenting on the correctness or implications of this approach given the information provided in the current version of this manuscript.

      We apologize for not being clear enough in the manuscript and we will improve the clarity in our revision. This exploratory analysis aims to examine whether there is any correlation between the strength of the grid-like representation of the social value map and behavioral indicators of a map-like representation, and to test whether there is any correlation between the strength of the grid-like representation of this social value map and participants’ social traits. For the behavioral indicator, we used the distance effect in the reaction time of the comparison task outside the scanner. The closer a pair of avatars are, the more similar they are; hence, distinguishing them will be harder and result in longer reaction times when making comparison judgements. If participants are merely memorizing the avatars as six isolated instances without integrating them into a map, all avatars should be equidistant and there wouldn’t be a distance effect. We interpreted stronger grid-like activity as a neural index of a better representation of the 2D social space, and we interpreted a stronger distance effect as a behavioral index of having a better internal map-like representation.

      It was puzzling to see passing references to multivariate analyses using representational similarity analysis (RSA) in the main text, given that RSA is only used in analyses presented in the supplementary material.

      We speculated that RSA in the entorhinal ROI might be more sensitive than the whole-brain univariate analysis for identifying a grid-like code, because a previous paper on grid-like codes in olfactory space (Bao et al., 2019) didn’t identify a grid-like representation with univariate analysis but did identify it with RSA. However, we failed to find evidence of a grid-like code in the entorhinal ROI aligned to its own putative grid orientation with the RSA approach. We reported this result in the main text to show that we carried out a relatively thorough investigation of the hypothesis using various approaches, and we therefore decided to add references to the RSA approach in the main text as well.

      Reviewer #3 (Public Review)

      Liang and colleagues set out to test whether the human brain uses distance and grid-like codes in social knowledge using a design where participants had to navigate in a two-dimensional social space based on competence and warmth during an fMRI scan. They showed that participants were able to navigate the social space and found distance-based codes as well as grid-like codes in various brain regions, and the grid-like code correlated with behavior (reaction times).

      On the whole, the experiment is designed appropriately for testing for distance-based and grid-like codes and is relatively well-powered for this type of study, with a large amount of behavioral training per participant. They revealed that a number of brain regions correlated positively or negatively with distance in the social space, and found grid-like codes in the frontal polar cortex and posterior medial entorhinal cortex, the latter in line with prior findings on grid-like activity in the entorhinal cortex. The current paper seems quite similar conceptually and in design to previous work, most notably by Park et al., 2021, Nature Neuroscience.

      Thanks very much again for your careful evaluation and comments. Please find our response to the comments below.

      Below, I raise a few issues and questions on the evidence presented here for a grid-like code as the basis of navigating abstract social space or social knowledge.

      (1) The authors claim that this study provides evidence that humans use a spatial / grid code for abstract knowledge like social knowledge.

      This data does not specifically add anything new to this argument. As with almost all studies that test for a grid code in a similar "conceptual" space (not only the current study), the problem is that when the space is not a uniform, square/circular, 2-dimensional space, there is no reason the code will be perfectly grid-like, i.e., show six-fold symmetry. In real-world scenarios of social space (as well as navigation, semantic concepts), it must be higher dimensional - or at least more than two-dimensional. It is unclear if this generalizes to larger spaces where not all parts of the space are relevant. Modelling work from Tim Behrens' lab (e.g., Whittington et al., 2020) and Bradley Love's lab (e.g., Mok & Love, 2019) has shown/argued this to be the case. In experimental work, like in mazes from the Mosers' labs (e.g., Derdikman et al., 2009), or trapezoid environments from the O'Keefe lab (Krupic et al., 2015), there are distortions in mEC cells, which would not pass as grid cells in terms of the six-fold symmetry criterion.

      The authors briefly discuss the limitations of this at the very end but do not really say how this speaks to the goal of their study and the claim that social space or knowledge is organized as a grid code and if it is in fact used in the brain in their study and beyond. This issue deserves to be discussed in more depth, possibly referring to prior work that addressed this, and raising the issue for future work to address the problem - or if the authors think it is a problem at all.

      Thanks very much for the references to the papers that we haven’t considered enough in our discussion. We will endeavour to discuss the topic in more depth in our revision. In summary, we raise this discussion point because various research groups have found gridlike representations in 2D artificial conceptual space. We think that the next step for a stronger claim would be to find the representation of more spontaneous non-spatial maps.

      Data and analysis

      (2) Concerning the negative correlation of distance with activation in the fusiform gyrus and visual cortex: this is a slightly puzzling but potentially interesting finding. However, could this be related to reaction times? The larger the distance, the longer the reaction times, so the original finding might reflect larger activations with smaller distances.

      Thanks very much for the suggestion. However, we didn’t find a correlation between response time in the choice stage of the scanner task and the negative distance activation in the fusiform gyrus (Figures below). Meanwhile, because the morph period in each trial remains the same, the negative correlation of distance with activation in the fusiform gyrus could also be interpreted as a negative correlation of morphing speed with activation in the fusiform gyrus. Indeed, stronger negative activation indicates larger activation for smaller distances, but we are uncertain what it indicates concerning the functional role of the fusiform gyrus in our current task.

      Author response image 1.

      (3) Concerning the correlation of grid-like activity with behavior: is the correlation with reaction time just about how long people took (rather than a task-related neural signal)? The authors have only reported correlations with reaction time. The issue here is that the duration of reaction times also relates to the starting positions of each trial and where participants will navigate to. Considering the speed-accuracy tradeoff, could performance accuracy be negatively correlated with these grid consistency metrics? Or it could be positively correlated, which would suggest the grid signal reflects a good representation of the task.

      We apologize for not being clear enough in the manuscript and we will improve the clarity in our revision. The reaction time used to calculate the distance effect is from a task outside the scanner. The closer a pair of avatars are, the more similar they are; hence, distinguishing them will be harder and result in longer reaction times when making comparison judgements. If participants are merely memorizing the avatars as six isolated instances without integrating them into a map, all avatars should be equidistant and there wouldn’t be a distance effect. We interpreted stronger grid-like activity as a neural index of a better representation of the 2D social space, and we interpreted a stronger distance effect as a behavioural index of having a better internal map-like representation. This was the motivation behind this analysis.

      References

      Bao, X., Gjorgieva, E., Shanahan, L. K., Howard, J. D., Kahnt, T., & Gottfried, J. A. (2019). Grid-like Neural Representations Support Olfactory Navigation of a Two-Dimensional Odor Space. Neuron, 102(5), 1066-1075 e1065. https://doi.org/10.1016/j.neuron.2019.03.034

      Constantinescu, A. O., O'Reilly, J. X., & Behrens, T. E. J. (2016). Organizing conceptual knowledge in humans with a gridlike code. Science, 352(6292), 1464-1468. https://doi.org/10.1126/science.aaf0941

      Park, S. A., Miller, D. S., & Boorman, E. D. (2021). Inferences on a multidimensional social hierarchy use a grid-like code. Nat Neurosci, 24(9), 1292-1301. https://doi.org/10.1038/s41593-021-00916-3

      Park, S. A., Miller, D. S., Nili, H., Ranganath, C., & Boorman, E. D. (2020). Map Making: Constructing, Combining, and Inferring on Abstract Cognitive Maps. Neuron, 107(6), 1226-1238 e1228. https://doi.org/10.1016/j.neuron.2020.06.030

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      The authors present the cryo-EM structure of the PSI-fucoxanthin chlorophyll a/c-binding proteins (FCPs) supercomplex from the diatom Thalassiosira pseudonana CCMP1335 at a global resolution of 2.3 Å. This exceptional resolution allows the authors to construct a near-atomic model of the entire supercomplex and elucidate the molecular details of FCPs arrangement. The high-resolution structure reveals subunits not previously identified in earlier reconstructions and models, as well as sequence analysis of PSI-FCPIs from other diatoms and red algae. Additionally, the authors use their model in conjunction with a phylogenetic analysis to compare and contrast the structural features of the T. pseudonana supercomplex with those of Chaetoceros gracilis, uncovering key structural features that contribute to the efficiency of light energy conversion in diatoms.

      The study employs the advanced technique of single particle cryo-electron microscopy to visualize the complex architecture of the PSI supercomplex at near-atomic resolution and analyze the specific roles of FCPs in enhancing photosynthetic performance in diatoms.

      Overall, the approach and data are both compelling and of high quality. The paper is well written and will be of wide interest for comprehending the molecular mechanisms of photosynthesis in diatoms. This work provides valuable insights for applications in bioenergy, environmental conservation, plant physiology, and membrane protein structural biology.

      We thank you very much for your highly positive evaluation and comments on our manuscript.

      Reviewer #2 (Public Review):

      Summary:

      This manuscript elucidated the cryo-electron microscopic structure of a PSI supercomplex incorporating fucoxanthin chlorophyll a/c-binding proteins (FCPs), designated as PSI-FCPI, isolated from the diatom Thalassiosira pseudonana CCMP1335. Combining structural, sequence, and phylogenetic analyses, the authors provided solid evidence to reveal the evolutionary conservation of protein motifs crucial for the selective binding of individual FCPI subunits and provided valuable information about the molecular mechanisms governing the assembly and selective binding of FCPIs in diatoms.

      Strengths:

      The manuscript is well-written and presented clearly as well as consistently. The supplemental figures are also of high quality.

      Weaknesses:

      Only minor comments (provided in recommendations for authors) to help improve the manuscript.

      We thank you very much for your highly positive evaluation and comments on our manuscript.

      Reviewer #3 (Public Review):

      Summary:

      Understanding the structure and function of the photosynthetic machinery is crucial for grasping its mode of action. Photosystem I (PSI) plays a vital role in light-driven electron transfer, which is essential for generating cellular reducing power. A primary strategy to mitigate light and environmental stresses involves incorporating peripheral light-harvesting proteins. Among various lineages, the number of LHCIs and their protein and pigment compositions differ significantly in PSI-LHCI structures. However, it is still unclear how LHCIs recognize their specific binding sites in the PSI core. This study aims to address this question by obtaining a high-resolution structure of the PSI supercomplex, including fucoxanthin chlorophyll a/c-binding proteins (FCPs), referred to as PSI-FCPI, isolated from the diatom Thalassiosira pseudonana. Through structural and sequence analyses, distinct protein-protein interactions are identified at the interfaces between FCPI and PSI subunits, as well as among FCPI subunits themselves.

      Strengths:

      The primary strength of this work lies in its superb isolation and structural determination, followed by clear discussion and conclusions. However, the interactions among the protein complexes and their relevance in formulating general rules are not definitively established. While efficiency is a crucial aspect, preventing damage is equally important, and currently, we cannot infer this from the provided structures.

      Weaknesses:

      The interactions among the protein complexes and their relevance in formulating general rules are not definitively established. While efficiency is a crucial aspect, preventing damage is equally important, and currently, we cannot infer this from the provided structures.

      We thank you very much for your highly positive evaluation and comments on our manuscript. This study aims to decipher the interactions among the different protein subunits within the PSI-FCPI supercomplex, from which we hope to derive general rules. While we agree that preventing damage is equally important, it is unclear to us what kind of damage you are referring to, and we consider that this topic may need to be treated in a separate publication, as we cannot elucidate everything in one paper.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) Line 69: "Diatoms are one of the most important phytoplankton in aquatic environments and contribute to the primary production in the ocean remarkably." Check the sentence, something is missing.

      We modified the sentence as follows:

      "Diatoms are among the most essential phytoplankton in aquatic environments, playing a crucial role in the global carbon cycle, supporting marine food webs, and contributing significantly to nutrient cycling, thus ensuring the health and sustainability of marine ecosystems"

      (2) Supplementary Figure 1B: The SDS-PAGE gel shows multiple bands. Do the authors know the identity of these proteins, or have they considered analyzing the bands using mass spectrometry? The band at ~17 kDa is particularly intense. Could you comment on this? Have you tried running a Native-PAGE gel?

      We did not identify the protein bands by MS analysis. The protein bands in the PSI-FCPI supercomplex of this diatom were identified by Ikeda et al. 2013, and the bands in our sample were similar to those reported there. To explain this, we modified the sentences and cited Ikeda et al. 2013 in the revised manuscript (lines 89-91).

      "The PSI-FCPI supercomplexes were purified from the diatom T. pseudonana CCMP1335 and analyzed by biochemical and spectroscopic techniques (Fig. S1). Notably, the protein bands of PSI-FCPI closely resembled those reported in a previous study (31)."

      The ~17 kDa protein band appears to correspond to FCPIs, which were identified in Ikeda et al. 2013. We did not perform BN-PAGE of this sample; however, we performed trehalose density gradient centrifugation (Fig. S1A).

      (3) Can the authors comment on the position of the FCPI subunits in the PSI supercomplex in diatoms compared to the arrangement of LHCIs in complex with PSI in cyanobacteria, green algae, and angiosperms? This information would be useful to incorporate into the text.

      We previously compared the PSI-FCPI structures of the diatom C. gracilis to the PSI-LHCI structures of land plant, green alga, and red alga (Nagao et al., 2020). Also, Xu et al. 2020 compared the C. gracilis PSI-FCPI structure to the PSI-LHCI structures of land plant, green alga, and red alga. The binding sites between FCPIs and LHCIs are conserved to some extent. However, our recent study revealed that no orthologous relationship exists among LHCs bound to PSI between primitive red algae and diatoms (Kato et al., 2024). Consequently, we found that the information obtained from structural comparisons alone is extremely limited. To avoid misinterpretation, this study focused on comparing the structures and amino acid sequences of FCPIs between T. pseudonana and C. gracilis.

      (4) Line 104: Despite achieving high resolution, the authors modeled only six lipid densities (the PDB model contains actually 9 lipids, you should correct it in the text). Do you believe this is due to the detergent used for purification? Can you comment on the position, identity, and potential role of the lipids within your model?

      There are 6 lipids associated with the PSI core and 3 with FCPIs, giving a total of 9 lipids. This was described in our original text (lines 102-104 in the modified manuscript). Additionally, our structure reveals unidentified densities that likely represent lipids; they are modeled as 88 unknown lipids (UNLs). Thus, there are more lipids in the supercomplex. However, we also observed 4 β-DDM molecules (LMT), which were used as the detergent, in the structure. It is therefore possible that some lipids have dissociated and been replaced by detergent molecules. Many of the observed lipids are located between subunits, likely contributing to the stabilization of the complex.

      (5) Line 111: The global resolution is very high. Why does the unknown protein have such low resolution that it was impossible to model it properly and perform de novo identification from the density map? Is it due to a lower abundance of particles with this subunit bound? Have you tried improving this with 3D classification/ focus refinement /density modification?

      The Unknown subunit (UNK) is located peripherally, and its density is significantly lower compared to the neighboring subunits, which may suggest a low abundance. We applied density modification using Topaz for 3D map denoising, but the effect was minimal. As the low abundance of UNK may be the cause, 3D classification and focus refinement also had limited impact.

      (6) Figure 2A: It would be useful to show the density map for the subunit together with the model, especially to demonstrate visualization of the long loop.

      We added the model and map of Psa29 to Figure S4C in the revised manuscript.

      (7) Given the proximity of Psa29 to PsaC, is the protein involved in electron shuttling? If so, could you comment on this? In line 131, you state that Psa29 was not found in other organisms. Can the authors speculate on the potential role of this protein in diatoms?

      The function of Psa29 is unknown at present. However, Psa29 does not contain any cofactors, indicating that it does not contribute to electron transfer reactions. To understand the function of Psa29, a deletion mutant of this gene will be required to examine its functional and physiological roles in diatom photosynthesis. To explain this, we added the following sentences to the revised manuscript (lines 129-133):

      "However, the functional and physiological roles of Psa29 remain unclear at present. It is evident that Psa29 does not have any pigments, quinones, or metal complexes, suggesting no contribution of Psa29 to electron transfer reactions within PSI. Further mutagenesis studies will be necessary to investigate the role of Psa29 in diatom photosynthesis."

      (8) Line 163: "Among the FCPI subunits, only FCPI-1 has BCRs in addition to Fxs and Ddxs (Figure S6A). FCPI-1 is a RedCAP, which belongs to the LHC protein superfamily but is distinct from the LHC protein family (6, 7)." It would be useful if the authors could add the carotenoid model embedded in the cryoEM density map to the figure to show the features that led to modeling BCR instead of other carotenoids. Additionally, it would be helpful to include in the text why RedCAPs differ from LHCIs and their proposed role.

      We added the model and map of two BCRs in FCPI-1 (RedCAP) to Figure S4F in the revised manuscript.

      Phylogenetic analysis showed that RedCAPs are distinct from the LHC protein family. This has been explained in lines 163-164. Also, the functional and physiological roles of RedCAP remain unclear. To explain this, we added the sentence "; however, the functional and physiological roles of RedCAP remain unclear" to the revised manuscript (lines 164-165).

      (9) Line 185: "However, it is unknown (i) whether CgRedCAP is indeed bound to the C. gracilis PSI-FCPI supercomplex and (ii) if a loop structure corresponding to the Q96-T116 loop of TpRedCAP exists in CgRedCAP." Have the authors attempted to model the protein using AlphaFold? If so, are there significant differences? Could you speculate on the absence of RedCAP in C. gracilis? Do you believe it is due to using a different detergent or related to environmental factors?

      We did not model CgRedCAP using AlphaFold. Our recent study “Kato et al. 2024” proposed that CgRedCAP binds to the LHCI-1 site in the PSI-FCPI structure based on sequence comparison. There are two types of PSI-FCPI supercomplexes, one having 16 FCPIs and the other having 24 FCPs, from C. gracilis. The different antenna sizes may depend on the growth conditions of C. gracilis (Nagao et al. 2020). These explanations were already described in the manuscript (lines 243-246).

      (10) Line 193: Figure 8 is mentioned before Figures 4-7.

      We apologize for the error in the figure number. Figure 8 should have been Supplementary Figure 8, so we changed it to Fig. S8B in the revised manuscript.

      (11) Line 223: FCPI-4 interacts only with FCPI-5, primarily through the interaction of Y196/4 with the FCPI-5 backbone. Is this interaction facilitated by other factors such as lipids, carotenoids, or other ligands? Also, FCPI-4 occupies a peculiar position compared to other LHCIs proteins (it is peripheral to FCPI-4 and FCPI-5). Do you believe this could be due to a transient interaction with the complex? Could the presence of this protein be related to the growth conditions experienced by the plant? Are there any literature reports on environmental conditions influencing FCPI arrangements? Including this information in the text would be interesting.

      Y196/4 interacts with only backbones by hydrogen-bond interactions; therefore, other cofactors do not contribute to the interactions.

      We do not believe that the interaction of FCPI-4 is transient; rather, this binding appears to be stable within the complex. Given that the PSI-FCPI supercomplexes were isolated by anion exchange chromatography, FCPI-4 and FCPI-5 are tightly associated within this complex. However, it is important to note that the expression of diatom FCPI proteins can indeed vary depending on growth conditions, as highlighted in our previous study (Nagao et al., 2020). While the peculiar position of FCPI-4 may not be directly related to transient interactions, environmental conditions could still influence the overall arrangement and expression levels of FCPIs. This information has already been described in the manuscript (lines 243-246).

      (12) Given the high resolution of your map, the overall model quality does not seem to match the map quality. Specifically, the clash score (10) and sidechain outliers (3%) are elevated. Could you comment on this? Do you believe it is related to the high number of ligands?

      Our structure contains a total of 295 ligands, including cofactors, detergents, and unknown lipids. We believe the high clash score and number of sidechain outliers are due to the large number of ligands present.

      (13) Supplementary Figure 2: You should show the 3D classes that were discarded.

      According to your comment, we added the 3D classes that were discarded and the sentence "Red boxes highlight selected particles from each 3D classification." to Figure S2 and its legend in the revised manuscript.

      (14) Which masks were used for refinement? How were they generated, and which parameters were chosen? This information should be added to the Materials and Methods section. You should show the masks used during classification, for example.

      We used a 240 Å spherical mask for refinement and classification, without applying any reference mask as input. To explain this, we added the corresponding sentence to the Methods in the revised manuscript (lines 347-348) as follows:

      "A 240-Å spherical mask was used during the 3D classification and refinement processes."

      (15) Were any extra proteins detected in the early stages of the cryoEM analysis (i.e., 2D classification) that were discarded? Could you visualize the superior oligomeric states of the supercomplex?

      In the single-particle analysis, no larger particles than the analyzed complex were detected. The results of 2D classification using a sufficiently large spherical mask with a diameter of 320 Å are shown below.

      Author response image 1.

      (16) Have you tried using cryoSPARC for data analysis? If so, could you comment on that?

      We did not use cryoSPARC for data analysis.

      Reviewer #2 (Recommendations For The Authors):

      I have some minor comments below to help improve the manuscript. The line numbers below refer to those in the Word version of the manuscript.

      (1) Figure 1 legend, line 559, "membrane normal"? Panel A and B, structures with the same colors, do they refer to the closely related or interacted parts? For example, the red color for FCP1-1 in A and PsaA in B. If not, the authors may want to clarify it.

      The term 'membrane normal' refers to the direction perpendicular to the surface of a membrane. It is a concept frequently used in physics and biology to describe the orientation relative to the membrane's plane.

      Identical colors in Figure 1 were not intended to indicate closely related or interacting parts. Following your comment, we changed the colors of the subunits in the revised manuscript.

      (2) Line 109-117. "Psa28 is a novel subunit found in the C. gracilis PSI-FCPI structure, and its name follows the nomenclature as suggested previously (31).... After psaZ, the newly identified genes should be named psa27, psa28, etc., and the corresponding proteins are called Psa27, Psa28, etc... Psa28 was also named PsaR in the PSI-FCPI structure of C. gracilis (16)". It is confusing. Was Psa28 named twice, PsaR and Psa28? It would be helpful to add a simple explanation here.

      According to your comment, we modified the sentence as follows (lines 117-118):

      "However, Xu et al. named the subunit as PsaR in the PSI-FCPI structure of C. gracilis"

      (3) Line 134, "One of the Car molecules in PsaJ was identified as ZXT103 in the T. pseudonana PSI-FCPI structure but it is BCR112 in the C. gracilis PSI-FCPI structure (15)". Figure S4D mentioned BCR863 but did not mention BCR112. Figure S4C, D, it may need better explanations of the colors and labels, and indicate which parts are from T. pseudonana or C. gracilis.

      BCR112 was misnumbered; the correct number is BCR103. In response to your comments, we revised Figure S4C and D by labeling the characteristic pigments in the revised manuscript.

      (4) Figure S7, although mentioned in the legend, it would be helpful to label interaction pairs on the figure directly with corresponding colours.

      According to your comments, we modified the Figure and legends in the revised manuscript.

      (5) Figure 3E, it is better to avoid red/green colours in one figure as some readers may be colour-blind. It would also be helpful to label each FCPI with the same colour as its structure on the figure directly.

      According to your comments, we modified Figure 3E in the revised manuscript.

      (6) Line 185, "structures similar to the Q96-T116 loop in TpRedCAP found in the present study (Figure 8B).". The authors refer to Figure S8B? I have the same comment for line 186, Figure 8C.

      We apologize for the error in the figure number. Figure 8 should have been Supplementary Figure 8, so we changed it to Fig. S8B in the revised manuscript.

      (7) Line 270, "TpLhcq10 cannot bind at the FCPI-2 site". Why not use FCPI-3 for TpLhcq10?

      This means that the gene product of TpLhcq10 binds at the FCPI-3 site but not at the other sites such as FCPI-2. To avoid misreading, we modified the sentence as follows:

      "TpLhcq10 binds specifically at the FCPI-3 site but not at the other sites such as FCPI-2" (lines 278-279)

      Reviewer #3 (Recommendations For The Authors):

      I have no technical or conceptual suggestions at the current stage.

      Thank you.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      The Bagnat and Rawls groups' previous published work (Park et al., 2019) described the kinetics and genetic basis of protein absorption in a specialized cell population of young vertebrates termed lysosome-rich enterocytes (LREs). In this study they seek to understand how the presence and composition of the microbiota impacts the protein absorption function of these cells and reciprocally, how diet and intestinal protein absorption function impact the microbiome.

      Strengths of the study include the functional assays for protein absorption performed in live larval zebrafish, which provides detailed kinetics on protein uptake and degradation with anatomic precision, and the gnotobiotic manipulations. The authors clearly show that the presence of the microbiota or of certain individual bacterial members slows the uptake and degradation of multiple different tester fluorescent proteins.

      To understand the mechanistic basis for these differences, the authors also provide detailed single-cell transcriptomic analyses of cells isolated based on both an intestinal epithelial cell identity (based on a transgenic marker) and their protein uptake activity. The data generated from these analyses, presented in Figures 3-5, are valuable for expanding knowledge about zebrafish intestinal epithelial cell identities, but of more limited interest to a broader readership. Some of the descriptive analysis in this section is circular because the authors define subsets of LREs (termed anterior and posterior) based on their fabp2 expression levels, but then go on to note transcriptional differences between these cells (for example in fabp2) that are a consequence of this initial subsetting.

      Inspired by their single-cell profiling and by previous characterization of the genes required for protein uptake and degradation in the LREs, the authors use quantitative hybridization chain reaction RNA-fluorescent in situ hybridization to examine transcript levels of several of these genes along the length of the LRE intestinal region of germ-free versus mono-associated larvae. They provide good evidence for reduced transcript levels of these genes that correlate with the reduced protein uptake in the mono-associated larval groups.

      The final part of the study (shown in Figure 7) characterized the microbiomes of 30-day-old zebrafish reared from 6-30 days on defined diets of low and high protein and with or without homozygous loss of the cubn gene required for protein uptake. The analysis of these microbiomes notes some significant differences between fish genotypes by diet treatments, but the discussion of these data does not provide strong support for the hypothesis that "LRE activity has reciprocal effects on the gut microbiome". The most striking feature of the MDS plot of Bray Curtis distance between zebrafish samples shown in Figure 7B is the separation by diet independent of host genotype, which is not discussed in the associated text. Additionally, the high protein diet microbiomes have a greater spread than those of the low protein treatment groups, with the high protein diet cubn mutant samples being the most dispersed. This pattern is consistent with the intestinal microbiota under a high protein diet regimen and in the absence of protein absorption machinery being most perturbed in stochastic ways than in hosts competent for protein uptake, consistent with greater beta dispersal associated with more dysbiotic microbiomes (described as the Anna Karenina principle here: https://pubmed.ncbi.nlm.nih.gov/28836573/). It would be useful for the authors to provide statistics on the beta dispersal of each treatment group.

      Overall, this study provides strong evidence that specific members of the microbiota differentially impact gene expression and cellular activities of enterocyte protein uptake and degradation, findings that have a significant impact on the field of gastrointestinal physiology. The work refines our understanding of intestinal cell types that contribute to protein uptake and their respective transcriptomes. The work also provides some evidence that microbiomes are modulated by enterocyte protein uptake capacity in a diet-dependent manner. These latter findings provide valuable datasets for future related studies.

      We thank the Reviewer for their thorough and kind assessment. We appreciate the suggestion for edits and for pointing out areas that needed further clarification.

      One point in need of further explanation is the use of fabp6 (referred to as fabp2 by the reviewer) to define anterior LREs and their gene expression pattern, which includes high levels of fabp6, something that was deemed a “circular argument” by the reviewer. The rationale for using fabp6 as a reference is that we were able to define its spatial pattern in relation to other LRE markers and the neighboring ileocyte population using transgenic markers (Lickwar et al., 2017; Wen et al., 2021). Thus, far from being a circular argument, using fabp6 allowed us to identify other markers that are differentially expressed between anterior and posterior LREs, which share a core program that we highlight in our study. In the revised manuscript, we clarified this point (lines 166 – 169).

      We followed the Reviewer’s suggestion to test if LRE activity and dietary protein affected beta dispersal. Our analyses revealed that beta dispersion was not significantly different between our experimental conditions. We added details about this analysis (lines 384 – 386) and a new supplemental figure panel (Figure S7C).
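
      To illustrate what a beta dispersion comparison involves, the following is a minimal sketch in Python and not the analysis used in the study, whose software and parameters are not specified here. It assumes that each sample is a vector of taxon abundances, measures the dispersion of a group as its mean pairwise Bray-Curtis distance, and compares two groups with a simple label-permutation test. All function names are hypothetical. Established tools (for example, betadisper in the R package vegan) instead use distances to group centroids in principal coordinate space, so this sketch is a simplification.

```python
from itertools import combinations
import random


def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two abundance vectors."""
    num = sum(abs(a - b) for a, b in zip(u, v))
    den = sum(a + b for a, b in zip(u, v))
    return num / den if den else 0.0


def mean_within_distance(samples):
    """Mean pairwise Bray-Curtis distance among the samples of one group."""
    pairs = list(combinations(samples, 2))
    if not pairs:
        raise ValueError("a group needs at least two samples")
    return sum(bray_curtis(u, v) for u, v in pairs) / len(pairs)


def dispersion_permutation_test(group_a, group_b, n_perm=999, seed=0):
    """Compare the dispersion of two groups by permuting group labels.

    Returns the observed absolute difference in mean within-group
    distance and a permutation p-value. Note that label permutation is
    also sensitive to differences in group location, so this is only a
    rough illustration of the idea.
    """
    rng = random.Random(seed)
    observed = abs(mean_within_distance(group_a) - mean_within_distance(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = abs(
            mean_within_distance(pooled[:n_a]) - mean_within_distance(pooled[n_a:])
        )
        if diff >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)
```

      A large p-value from such a test would be consistent with the statement above that dispersion did not differ significantly between conditions, although the published analysis should be consulted for the actual method.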

      Reviewer #2 (Public review):

      Summary:

      The authors set out to determine how the microbiome and host genotype impact host protein-based nutrition.

      Strengths:

      The quantification of protein uptake dynamics is a major strength of this work and the sensitivity of this assay shows that the microbiome and even mono-associated bacterial strains dampen protein uptake in the host by causing down-regulation of genes involved in this process rather than a change in cell type.

      The use of fluorescent proteins in combination with transcript clustering in the single cell seq analysis deepens our understanding of the cells that participate in protein uptake along the intestine. In addition to the lysosome-rich enterocytes (LRE), subsets of enteroendocrine cells, acinar, and goblet cells also take up protein. Intriguingly, these non-LRE cells did not show lysosomal-based protein degradation; but importantly, the transcripts upregulated in these cells include dab2 and cubn, genes shown previously to be essential for protein uptake.

      The derivation of zebrafish mono-associated with single strains of microbes paired with HCR to localize and quantify the expression of host protein absorption genes shows that different bacterial strains suppress these genes to variable extents.

      The analysis of microbiome composition, when host protein absorption is compromised in cubn-/- larvae or by reducing protein in the food, demonstrates that changes to host uptake can alter the abundance of specific microbial taxa like Aeromonas.

      Weaknesses:

      The finding that neurons are positive for protein uptake in the single-cell data set is not adequately discussed. It is curious because the cldn:GFP line used for sorting does not mark neurons and if the neurons are taking up mCherry via trans-synaptic uptake from EECs, those neurons should be mCherry+/GFP-; yet methods indicate GFP+ and GFP+/mCherry+ cells were the ones collected and analyzed.

      We thank the Reviewer for the kind and positive assessment of our work, for suggestions to improve the accessibility and clarity of the manuscript, and for pointing out an issue related to a neuronal population that needed further clarification.

      It turns out that there is a population of neurons that express cldn15la. They are not easily visualized by microscopy because IECs express this gene at much higher levels. However, endogenous cldn15la transcripts can be found in neurons, as shown in a recently published dataset (PMID: 35108531) as well as in this study. We added a discussion point to clarify this issue (lines 463 – 465).

      Reviewer #3 (Public review):

      Summary:

      Childers et al. address a fundamental question about the complex relationship within the gut: the link between nutrient absorption, microbial presence, and intestinal physiology. They focus on the role of lysosome-rich enterocytes (LREs) and the microbiota in protein absorption within the intestinal epithelium. By using germ-free and conventional zebrafishes, they demonstrate that microbial association leads to a reduction in protein uptake by LREs. Through impressive in vivo imaging of gavaged fluorescent proteins, they detail the degradation rate within the LRE region, positioning these cells as key players in the process. Additionally, the authors map protein absorption in the gut using single-cell sequencing analysis, extensively describing LRE subpopulations in terms of clustering and transcriptomic patterns. They further explore the monoassociation of ex-germ-free animals with specific bacterial strains, revealing that the reduction in protein absorption in the LRE region is strain-specific.

      Strengths:

      The authors employ state-of-the-art imaging to provide clear evidence of the protein absorption rate phenotype, focusing on a specific intestinal region. This innovative method of fluorescent protein tracing expands the field of in vivo gut physiology.

      Using both conventional and germ-free animals for single-cell sequencing analysis, they offer valuable epithelial datasets for researchers studying host-microbe interactions. By capitalizing on fluorescently labelled proteins in vivo, they create a new and specific atlas of cells involved in protein absorption, along with a detailed LRE single-cell transcriptomic dataset.

      Weaknesses:

      While the authors present tangible hypotheses, the data are primarily correlative, and the statistical methods are inadequate. They examine protein absorption in a specific, normalized intestinal region but do not address confounding factors between germ-free and conventional animals, such as size differences, transit time, and oral gavage, which may impact their in vivo observations. This oversight can lead to bold conclusions, where the data appear valuable but require more nuance.

      The sections of the study describing the microbiota or attempting functional analysis are elusive, with related data being overinterpreted. The microbiome field has long used 16S sequencing to characterize the microbiota, but its variability due to experimental parameters limits the ability to draw causative conclusions about the link between LRE activity, dietary protein, and microbial composition. Additionally, the complex networks involved in dopamine synthesis and signalling cannot be fully represented by RNA levels alone. The authors' conclusions on this biological phenomenon based on single-cell data need support from functional and in vivo experiments.

      We thank the Reviewer for their assessment and for pointing out some areas that needed to be explained better and/or discussed.

      The Reviewer mentions some potential confounding factors (i.e., size differences, transit time, oral gavage) in the gnotobiology experiments. We would like to convey that these aspects have been addressed in our experimental design and are now clarified in the revised manuscript: 1- larval sizes were recorded and found to be similar between GF and monoassociated larvae (Figure S6A); 2- while intestinal transit time may be affected by microbes and is a topic of interest, in our assay luminal mCherry cargo is present at high levels throughout the gut and is not limiting at any point during the experiment; 3- gavage, which is necessary for quantitative assays, is indeed an experimental manipulation that may somehow alter the subjects (the same is true for microscopy and virtually any research method). However, it cannot explain differences between GF and CV or alter our conclusions via microbial or dietary effects. We now elaborate on the former point in the revised discussion (line 426). A new panel has been added to Fig. S6 to show that standard length was similar in GF and monoassociated larvae (Figure S6A).

      We are aware that microbial community composition is often highly variable between experiments and this necessitates adequately high biological replication and inclusion of internal controls to allow conclusions to be drawn. Nevertheless, studies evaluating the utility of 16S rRNA gene sequencing have found that this analysis reveals important impacts of environmental factors on the gut microbiome (PMIDs: 21346791, 31409661, 31324413). Our results provide further evidence that 16S rRNA gene sequencing remains a useful method to detect perturbations to the zebrafish gut microbiome. Reproducing previous findings, we detected many of the core zebrafish microbiota strains in our samples that have been identified by other studies (PMIDs: 26339860, 21472014, 17055441). To ensure the robustness of our results, we included several biological replicates for each condition, co-housed genotypes and included large sample sizes to minimize environmental variability between groups. In response to this reviewer concern, we have added a supplemental beta diversity plot and statistical analyses showing that the microbiomes in our larvae were significantly different from the diets or tank water (Figure S7A). This analysis shows that the host environment influenced microbial community composition (lines 376 – 378). We also added an additional supplemental panel and performed analysis showing that the experimental replicates (i.e., different tanks) were not a significant source of variation in this study (lines 378 – 380) (Figure S7B). This result underscores that the microbiota in these larvae were influenced by both the host and diet.

      Regarding dopamine pathways, we acknowledge that it involves complex biology that will require dedicated studies. In this work, we simply point out gene expression patterns we find interesting as they may inform future studies.

      Finally, the Reviewer mentions the use of inadequate statistical methods for some analyses without specifying them or indicating alternative analyses; only the need to justify the use of two-way ANOVA is made explicit. On this point, we respectfully disagree and would like to emphasize that we use statistical methods that are standard in the field (PMID: 37707499). We nevertheless added a justification for the use of two-way ANOVA where appropriate (lines 635-637, 653-654, 773-776). The two-way ANOVA test was used to compare fluorescence profiles of gavaged cargoes or HCR probes along the length of the LRE region. This test accounts for differences in fluorescence between experimental conditions in segments (30 μm) along the LRE region (~300 μm). This allows us to capture differences in fluorescence between experimental conditions while accounting for heterogeneity in the LRE region. Please see our comment below for more information about our use of the two-way ANOVA.
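
      As an illustration of the segment structure described above, the following Python sketch bins a fluorescence profile into 30 μm segments along a 300 μm region, so that each larva contributes one mean value per segment. This is a hypothetical example and not the analysis code used in the study: the function name, the input layout (positions in μm paired with intensities), and the treatment of empty segments are assumptions. The resulting condition-by-segment table is the kind of input a two-way ANOVA with factors "condition" and "segment" would take.

```python
def bin_profile(positions_um, intensities, segment_um=30.0, region_um=300.0):
    """Average intensities within consecutive segments along the region.

    positions_um: distance of each measurement from the start of the region.
    intensities:  fluorescence value measured at each position.
    Returns one mean per segment; segments without data are None.
    """
    n_bins = int(region_um // segment_um)
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for x, y in zip(positions_um, intensities):
        if 0 <= x < region_um:
            # Guard against floating-point edge cases at the last segment.
            i = min(int(x // segment_um), n_bins - 1)
            sums[i] += y
            counts[i] += 1
    return [s / c if c else None for s, c in zip(sums, counts)]


# Usage: one binned profile per larva, grouped by (hypothetical) condition labels.
profiles = {
    "GF": [bin_profile([0, 15, 30, 45], [1.0, 3.0, 5.0, 7.0])],
    "CV": [bin_profile([0, 15, 30, 45], [0.5, 1.5, 2.5, 3.5])],
}
```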

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Please provide in the materials and methods the strain identifiers and sources of the bacteria used in the study.

      Thank you for the suggestions. Strain identifiers and source information were added to the methods (lines 576-579).

      Reviewer #2 (Recommendations for the authors):

      (1) This is a very satisfying and thorough analysis of the reciprocal influence of diet, microbiome, and host genotype on protein absorption by the host. Below I make suggestions that mainly relate to making the paper more accessible to a broader audience.

      (2) Line 233 Starts a section that reports the findings of the scRNA dataset. The writing is inconsistent with respect to how the genes are listed: whether abbreviation only or spelled out followed by abbreviation. I prefer the latter. For example, slc10a2 is a bile acid Na cotransporter but for those not in the know, they would have to look this up. Perhaps adding a supplementary table that provides a gene list of those discussed in the text with abbreviation/spelled-out, and KEGG terms.

      Thank you for pointing out inconsistent gene labeling. We have revised the text with spelled out gene names followed by abbreviations.

      (3) Line 461 Where did the neurons come from when you were sorting cldn+ cells?

      Neuronal expression of cldn15la was detected in our data and other published datasets (PMID: 37995681, 35108531). We added a note to the text clarifying that neuronal cells can express cldn15la (lines 463-465).

      (4) Line 561 1x tricaine should be converted to percentage in solution or concentration throughout.

      The tricaine concentration was 0.2 mg/mL. We added this detail to the methods (line 596).

      (5) Line 612 Please clarify how normalizations are carried out: is it to the peak value in the germ-free condition? CV never reaches 1.

      AUC values were normalized to the peak value in the GF condition at 60 minutes PG. We clarified this step in the methods (lines 618-619).

      (6) Line 654-663 I think mCherry here should be mTurquoise?

      Thank you for catching this typo. We corrected it in the text.

      (7) In Figure 1 Please consider adding a color so that magenta does not represent BOTH germ-free AND mCherry.

      Due to the many colors of fluorescent proteins and HCR probes in this paper, we were not able to find an alternative plot line color to represent GF.

      (8) In Figure 2 I suggest consistency with respect to the order you present GF/CV

      Figure 1 GF->CV

      Figure 2 CV->GF

      My preference is GF->CV

      Images in Figure 2 were re-ordered following the reviewer’s recommendation.

      Here, 20 minute time point also appears qualitatively different between GF and CV.

      There can be slight differences in LREs between individuals. These images were selected because they represented the average differences in the amount of mTurquoise degradation activity that occurred between 20 – 60 minutes post-flushing in the GF and CV conditions.

      In Figure 3E Figure legend refers to being able to see BSA in vacuoles. The image should be modified to show this- currently too small.

      In response, we enlarged the confocal microscopy images showing DQ red BSA in the LRE region (Figure 3E). We added a panel with confocal microscopy images of the LREs in 6 dpf larva gavaged with DQ red BSA (Figure S3F). These images show that DQ red BSA fluorescence was localized to the LRE lysosomal vacuole.

      In Figure 5D, Posterior LRE should be pink not green in the key to the right of the heatmap.

      Thank you for catching this error. We have corrected the colors (Figure 5D).

      Reviewer #3 (Recommendations for the authors):

      (1) Introduction and context:

      Expand the introduction to include more background on microbial-mediated protein absorption, with references to relevant findings in Drosophila. This will provide a stronger foundation for the study's contributions to the field.

      Thank you for this suggestion. We added information about microbe-mediated amino acid harvest in Drosophila to the introduction (lines 49-53).

      (12) Methodological suggestions:

      Measure and report differences between germ-free (GF) and conventional (CV) animals, such as transit time, to account for potential confounding factors in protein absorption dynamics.

      We respectfully assert that a transit assay is not required for this study and could actually create confusion as an effect in transit time could be interpreted as a contributing factor when it is in fact not the case due to the experimental design. This is because the concentration of luminal protein was equivalent in GF and CV larvae (Figure S1E), so the LREs had equal saturating access to those proteins in both conditions. Furthermore, we showed the microbiota did not degrade fluorescent protein (Figure S1F). Therefore, we feel confident that there was lower protein uptake in the LREs of CV larvae because the microbiome exerted regulatory effects on LRE activity.

      Provide detailed information on the gating strategy used for single-cell sorting to enhance the dataset's utility and support claims about cell changes.

      The methods we used for sorting cells were previously described (PMID: 31474562). In this manuscript, we describe them under the heading “Fluorescence activated cell sorting for single cell RNA-sequencing.”

      Explain the "GeneRatio" metric in figure legends for clarity.

      The GeneRatio is the ratio of genes associated with each individual GO term to the number of genes associated with the domain. An explanation was added to the caption (Figure S3C).
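
      The GeneRatio definition above can be illustrated with a minimal sketch. It assumes the ratio is taken over the query genes (for example, cluster marker genes) that carry at least one annotation in the GO domain; the function name, argument names, and gene labels are hypothetical and are not taken from the manuscript.

```python
def gene_ratio(term_genes, query_genes, annotated_genes):
    """Fraction of annotated query genes that belong to one GO term.

    term_genes      -- genes associated with the individual GO term
    query_genes     -- genes of interest (hypothetical input list)
    annotated_genes -- all genes annotated anywhere in the GO domain
    """
    # Only query genes that have an annotation in the domain form the denominator.
    annotated_query = set(query_genes) & set(annotated_genes)
    if not annotated_query:
        raise ValueError("no query gene is annotated in this domain")
    # The numerator counts those annotated query genes that fall in the term.
    hits = annotated_query & set(term_genes)
    return len(hits) / len(annotated_query)


# Hypothetical gene labels, for illustration only.
example_ratio = gene_ratio(
    term_genes={"a", "b", "x"},
    query_genes={"a", "b", "c", "d", "e"},
    annotated_genes={"a", "b", "c", "d", "x", "y"},
)
```

      In this toy input, four query genes are annotated in the domain and two of them belong to the term.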

      (13) Visual and statistical improvements:

      Include images of labeled peptidases within lysosome-rich enterocytes (LREs) to reinforce findings.

      Thank you for the suggestion. We added images of labeled peptidases in the LRE region (Figure S6E-D).

      For Panels 4-F and 5-D, consider using violin plots of selected genes to improve clarity and emphasize major ideas.

      In Figure 4F, the heatmap shows multiple genes were upregulated in mCherry-positive cells. We tried the plotting suggested by the reviewer and felt that violin plots could not convey this message as clearly. Likewise, the heatmap in Figure 5D effectively shows the gradient of expression between ileocytes, anterior and posterior LREs.

      Strengthen statistical analysis by employing more rigorous methods and justifying their selection, such as using two-way ANOVA where appropriate.

      The two-way ANOVA was used to quantify protein uptake or HCR probe fluorescence along the length of the LRE region. This statistical test allowed us to compare differences in fluorescence between experimental conditions in multiple LRE segments (see Author response image 1 below for example). As our assays show, the LRE region is heterogeneous, with segments showing different levels of activity and gene expression. The two-way ANOVA is appropriate because it allows us to account for this heterogeneity by comparing fluorescence across multiple segments.

      Author response image 1.

      Our figures display these fluorescence levels in line plots (above, left) rather than bar plots (above, right). The results are easier to visualize and interpret in line plots, and they display the fluorescence profiles in greater detail.
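
      The segment-wise comparison described above can be sketched as a balanced two-way ANOVA with experimental condition and LRE segment as the two factors. This is an illustrative sketch only: it assumes the same number of replicate larvae in every condition/segment cell, treats replicates as independent, reports F statistics without p-values, and uses made-up fluorescence values and a hypothetical function name. It is not the authors' analysis pipeline.

```python
from statistics import mean


def two_way_anova_f(data):
    """F statistics for a balanced two-way ANOVA with replication.

    data[condition][segment] is a list of replicate fluorescence values;
    every cell must hold the same number (>= 2) of replicates.
    """
    conditions = list(data)
    segments = list(data[conditions[0]])
    a, b = len(conditions), len(segments)
    n = len(data[conditions[0]][segments[0]])  # replicates per cell

    grand = mean([v for c in conditions for s in segments for v in data[c][s]])
    cond_mean = {c: mean([v for s in segments for v in data[c][s]]) for c in conditions}
    seg_mean = {s: mean([v for c in conditions for v in data[c][s]]) for s in segments}
    cell_mean = {(c, s): mean(data[c][s]) for c in conditions for s in segments}

    # Sums of squares for the two main effects, the interaction, and the residual.
    ss_cond = b * n * sum((cond_mean[c] - grand) ** 2 for c in conditions)
    ss_seg = a * n * sum((seg_mean[s] - grand) ** 2 for s in segments)
    ss_inter = n * sum(
        (cell_mean[c, s] - cond_mean[c] - seg_mean[s] + grand) ** 2
        for c in conditions for s in segments
    )
    ss_error = sum(
        (v - cell_mean[c, s]) ** 2
        for c in conditions for s in segments for v in data[c][s]
    )

    ms_error = ss_error / (a * b * (n - 1))
    return {
        "condition": (ss_cond / (a - 1)) / ms_error,
        "segment": (ss_seg / (b - 1)) / ms_error,
        "interaction": (ss_inter / ((a - 1) * (b - 1))) / ms_error,
    }


# Made-up fluorescence values: 2 conditions x 3 segments x 2 replicates.
example = {
    "GF": {"seg1": [10, 12], "seg2": [20, 22], "seg3": [14, 16]},
    "CV": {"seg1": [8, 10], "seg2": [12, 14], "seg3": [10, 12]},
}
f_stats = two_way_anova_f(example)
```

      The segment factor absorbs the variation along the LRE region, so the condition effect is evaluated against the residual variation within each segment rather than against the pooled profile. Converting the F statistics to p-values requires the F distribution, which is outside this stdlib-only sketch.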

      (14) Technical corrections:

      Correct figure references: Figure 5 about tryptophan metabolism should be 5A, S5G-S5H.

      We corrected the figure references.

      Line 518: Spell out "heterozygotes" instead of using "hets".

      We changed the term from “hets” to “heterozygotes.”

      (15) Revise Figure S2 citation to match the actual figure labeling.

      We corrected the text to indicate “Figure S2” rather than “Figure S2A.”

      Additional manuscript modification

      · Figure panels 3B-C, S3A-B, 4A-C: Two clusters were relabeled with improved descriptors based on our updated annotations. The clusters “Pharynx-esophagus-cloaca 1” (PEC1) and PEC2 were relabeled as “Pharynx-cloaca 1” and “Pharynx-cloaca 2.”

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1:  

      Overall, the conclusions appear appropriately supported by the data, and the data appear of high quality.

      Strengths:

      The particular strengths of the paper include an impressive combination of genomic and imaging-based approaches and insightful genetically engineered cell systems. The manuscript reports interesting and potentially important findings. The text is generally very well written, the ideas are clearly explained, and the reasoning is easy to follow.

      Weaknesses:

      The main weakness seems to be that the heat and ethanol shock approaches likely elicit pleiotropic effects, and therefore it is a challenge to test the causal relationship between various observations. Nevertheless, even as indirect effects might contribute to some of the authors' observations, the results are definitively worth reporting.  

      We agree that these two proteotoxic stresses can impact cell physiology in multiple ways and discuss this on lines 132-143 and 500-519. Moreover, in this revision we have more rigorously quantified the extent of proteotoxic stress elicited by the 39°C heat shock and 8.5% ethanol stress (Figure 1E; see response 1 to Reviewer 2). We have additionally added new Figure 2 that reveals an important difference in the way Hsf1 and its negative regulator, the Hsp70 co-chaperone Sis1, respond to HS and ES. This difference is evident at two different intensities for each stress as described in more detail below (see response 1 to Reviewer 2).

      Presentation of some of the data could be improved.

      We agree and have made improvements/data additions to multiple figures: Figure 1E; Figures 3A, B; Figures 4A, B; Figure 7 (data drawn from original Fig. 6 and Fig. 6 – fig. suppl. 1 and reorganized); Fig. 8B; Figure 9; Figure 10. Corresponding enhancements to the supplemental figures have been made as well. 

      Reviewer #2:  

      (1) The central finding of the study highlights the different dynamics of Hsf1, Pol II, and gene organization in response to heat shock versus ethanol stress. However, one important limitation to consider is that the two chosen conditions may not be directly comparable. For a balanced assessment, the authors should ideally expose yeast to various ethanol concentrations and different heat shock temperatures, ensuring the observed differences stem from the nature of the stressor rather than suboptimal stress intensity. At the very least, an additional single ethanol concentration point on each side of 8.5% should be investigated to ensure that 8.5% is near the optimum. In fact, comparing the number of Hsp104 foci in the two conditions in Fig. 1E and F suggests that the yeast is likely experiencing different intensities of stress for the chosen heat shock condition and ethanol concentration used in this study.

      We thank the reviewer for this important suggestion. In this revision, we have included an enhanced analysis of the yeast cellular response to each of these stresses. As illustrated in revised Figure 1, the two stresses used throughout this study – 39°C heat shock and 8.5% ethanol stress – both elicit a proteotoxic response, as assayed by the de novo formation of Hsp104 clusters. While 10 min exposure to 8.5% ethanol results in the formation of multiple discrete (spherical) foci, a 10 min exposure to the elevated temperature leads to the appearance of multiple, largely diffuse Hsp104 clusters, some of which are spherical (new Fig. 1D). The difference in morphology notwithstanding, we have attempted to quantify these clusters using Imaris v. 10.0.1 image analysis software; the results are depicted in Fig. 1E. Such quantification suggests that 8.5% ethanol elicits a more intense stress than exposure to 39°C. A caveat is that it is unclear whether diffuse Hsp104 clusters are comparable to compact Hsp104 foci (see response 3 below).

      Beyond the apparent difference in intensity, a new analysis presented in new Figure 2 reveals that heat shock, elicited by temperature upshift to either 39°C or 42°C, induces relocalization of the J-protein Sis1 – a key negative regulator of Hsf1 – from the nucleoplasm to the nucleolar periphery. Sis1’s perinucleolar ring localization agrees with previous findings of 39°C heat-shocked cells (Feder et al., 2021). Ethanol stress, whether 5% or 8.5%, initially causes Sis1 to relocalize diffusely throughout the nucleus and cytosol. At 10 min, Sis1 localizes to the periphery of the nucleus, thereby providing a marked contrast to what is observed in response to heat shock. These new results are described on lines 174-191.

      Taking these two observations together, we asked whether a less severe ethanol stress (5%) would induce Hsf1 puncta. It does, and as rapidly as 8.5% ethanol (data are presented in revised Figure 8-figure supplement 1). Interestingly, in the presence of 5% ethanol, Hsf1 puncta begin to dissolve at 30 min. This strongly contrasts with the case when cells are exposed to 8.5% ethanol (Figure 8; Figure 8-figure supplement 1). As we state in this revision (lines 414-424), the sustained presence of condensates that we originally observed is likely the consequence of the intensity of the proteotoxic stress elicited by exposure to 8.5% ethanol; analogous responses to these two stress conditions have been observed before (lines 495-501). 

      (2) A second significant concern is the use of the term "Hsf1 condensate". Chowdhary et al.'s 2022 Molecular Cell study highlighted an inhomogeneous distribution and rapid dynamics of Hsf1 clustering upon heat shock, with sensitivity to 1,6-hexanediol, which is interpreted as evidence for condensation by LLPS. However, this interpretation has been criticized severely by McSwiggen et al. Genes Dev 2019 and Musacchio EMBO J 2022. It is important to mention that 1,6-hexanediol is known to affect chromatin organization (Itoh et al. Life Science Alliance 2021). Describing such clusters as 'condensates' without further experimental evidence is premature.

      While we appreciate and largely agree with the point made by this reviewer, we prefer to maintain the term “condensate”. Banani et al. (2017) originally defined “biomolecular condensate” to mean self-organized membrane-free compartments that concentrate specific biomolecules. It was never meant to imply LLPS, although its widespread use in the literature has led to that implication. We clarify our use of this term on lines 99-104.

      (3) Figure 1: Why does ethanol stress at 0 min display a larger number of Hsp104 foci per cell than heat shock at the same time? How are foci defined by the authors? In Fig. 1D, there are many smaller puncta. A comparative assessment of the number and size of foci for heat shock and ethanol stress would be beneficial.

      We thank the reviewer for raising this point and have addressed it as follows.  First, we repeated the assay with a different strain (DPY1561) and increased the number of cells assayed from 40 to 200. This larger sample size created the same T=0 baseline for both stresses (Figure 1E). Second, we define Hsp104 foci as diffraction-limited structures with a diameter of ~0.4 µm (lines 747-749).  Third, employing Imaris v. 10.0.1, we quantified foci size (= volume) and a summary graph has been added to Figure 1E that also displays the number of foci per cell. In the legend to this figure, we point out that to conduct this analysis we assumed that the diffuse Hsp104 clusters seen in HS cells are comparable to the compact Hsp104 foci in ES cells (lines 1169-1171). 

      (4) Figure 2: Selecting a housekeeping gene with consistent expression levels is crucial for meaningful qPCR analysis. Do SCR1 mRNA levels fluctuate during heat shock or ethanol stress?  

      We thank the reviewer for this question. In revised Figure 3 – figure supplement 1C we provide a new graph (reproduced here) revealing that the levels of SCR1 do not significantly change under either heat shock or ethanol stress relative to the non-stressed control (0 min). One-way ANOVA analysis was performed for both HS and ES and p values were 0.094 and 0.083, respectively (calculated using GraphPad Prism 8).

      (5) Additionally, certain genes, such as TMA10 and SSA4, lack visible bars at time 0. Are these levels undetectable? The varying y-axis scales are confusing; presenting data as relative fold changes could offer a clearer perspective.

      Transcript levels for all genes evaluated here are detectable, even in the basal unstressed state. They are not visible on the histogram for certain genes at T= 0 due to the prodigious fold-increase in RNA elicited by heat shock.  However, to address this concern, we have added a bar graph inset displaying basal transcript levels for each gene in revised Figure 3. We reproduce data for SSA4 and TMA10 in the graphs below. In addition, we present transcript levels in new Figure 3 - figure supplement 1 for cells subjected to ethanol stress to allow a better appreciation of their increase over time. 

      Author response image 1.

      (6) Line 239: The evidence for chromatin compaction is unconvincing. An increase in H3 occupancy by ChIP might indicate a reduction in histone exchange dynamics but may not relate to overall chromatin compaction. The authors use H2A-mCherry to suggest a decrease in chromatin volume, but this data is not persuasive. Did the authors observe any changes in nuclear size? Perhaps quantifying chromatin compaction more directly, using signal intensity per volume, would be informative.

      To address this concern, we attempted to quantify integrated density for H2A-mCherry using ImageJ software. While the volume decreased for both stresses, the integrated density only increased for ethanol stress. We speculate that this may be due to photobleaching, which has been reported for heat shock. The combination of heat and acidic pH contributes to loss of fluorescence signal (Alkaabi et al., 2005). While the integrated density supports the idea of global chromatin compaction in the ethanol stress condition, given the above concerns with the HS sample we elected not to present these data.

      (7) Line 340: The claim of a "strong spatiotemporal correlation" isn't evident from the data. Could correlation coefficients be provided? There is potential anti-correlation in Fig. 6 - Figure Supplement 1C.

      We thank the reviewer for this excellent suggestion. We now present an analysis of the correlation between HSP104 – HSP12 coalescence and HSP104 transcription for both HS and ES time courses, using single cell data of Figures 7D, 7E and Figure 7- suppl. 1D.  This analysis is presented in new Figure 7F.

      (8) Figure 8: The WT data in Fig 8 seem inconsistent with Fig. 4 (e.g. the interaction frequency for HSP104 and SSA2). Are these fluctuations between experiments, or are they side effects of IAA treatment? The use of ethanol as an IAA solvent vehicle raises concerns. It would be beneficial if the authors could demonstrate that 1.7% ethanol in the control does not induce ethanol stress.

      We acknowledge that there existed an inconsistency in the magnitude of intergenic interaction frequencies reported in the two experiments for HSP104 and SSA2. Some of this might be attributed to the fact that different strains were used, W303-1B in Figure 4 and LRY016 (W303-1B; LEU2::pGPD1osTIR1) in Figure 8. Nonetheless, in each experiment there was a prodigious fold-increase in interaction frequency over the no stress (T= 0 min) control for both HS and ES conditions and moreover, in each experiment the magnitude of this interaction was greater for the 2.5 min HS sample vs. the 10 min ES sample. However, to obviate this concern, we have removed the HSP104-SSA2 analysis from Figure 9 (corresponds to original Fig. 8).

      Regarding the second point, we cannot entirely rule out the concern that the 1.7% ethanol vehicle might impact 3C interaction frequencies. It is unlikely to be significant, however, given that most other pairwise tests evaluated in the two experiments (Figs. 5 and 9) resulted in similar 3C values. In particular, there was no consistent trend towards higher (or lower) interaction frequencies in the IAA experiment of Fig. 9.  

      Reviewer #3:  

      This is an interesting manuscript that builds off of this group's previous work focused on the interface between Hsf1, heat shock protein (HSP) mRNA production, and 3D genome topology. Here the group subjects the yeast Saccharomyces cerevisiae to either heat stress (HS) or ethanol stress (ES) and examines Hsf1 and Pol II chromatin binding, Histone occupancy, Hsf1 condensates, HSP gene coalescence (by 3C and live cell imaging), and HSP mRNA expression (by RT-qPCR and live cell imaging). The manuscript is well written, and the experiments seem well done, and generally rigorous, with orthogonal approaches performed to support conclusions…While identifying a mechanistic basis for the results [presented here] would be a tough task perhaps beyond the scope of this study, it would nevertheless be helpful to place these results in context with a series of other studies…importantly, this work left out PMID: 32015439 (HSF1 phase transition mediates stress adaptation and cell fate decisions) which is particularly relevant considering that it shows that it is human HSF1 condensate resolution rather than simple condensate formation that is associated with HSF1 transcriptional activity - which is similar to the findings here with this particular dose of HS resulting in resolution and high transcriptional activity versus ES resulting in resolution failure and lower activity. 

      We thank the Reviewer for pointing out this oversight. In this revision, we cite Gaglia et al., 2020 and several others reporting HSF1 foci formation in human cells exposed to heat shock. The single cell analysis of Gaglia et al argued that dissolution of large HSF1 foci (aka “nuclear stress bodies”), typically several µm in diameter and localized over satellite III DNA repeats (Jolly et al., 1997, 2002), correlates with HSP gene activation. Importantly, these condensates are postulated to act as reservoirs of HSF1, sequestered away from HSP genes (Gaglia et al., 2020).  In contrast, Zhang et al., 2022 has shown that human HSF1 inducibly forms small condensates (~300 nm) that localize over HSP genes and whose formation directly correlates with HSP gene activation (we discuss the Jolly, Gaglia and Zhang findings on lines 382-394). Likewise, our work shows that in yeast, Hsf1 inducibly forms small, dynamic clusters that colocalize with HSR genes within 2.5 min of exposure to elevated temperature; these dissolve ~20-60 min later (Figure 8 and Figure 8-supp. 1). In concert with Hsf1 condensate formation, HSR gene repositioning and transcription/ Pol II recruitment are likewise evident within 2.5 min. Therefore, in HS cells there exists coordinate induction of condensate formation, Pol II recruitment, transcription and intergenic interactions (for a detailed kinetic analysis of HSR gene interactions, see Figures 5 and 6 of Chowdhary et al, 2017).  This tight temporal relationship is absent in ethanol stressed cells (Figures 3, 4, 5, 6, 7, 8; summarized in Figure 10 and Table 1).

      It is also worth noting that the stresses themselves are quite different - ethanol can be used as a carbon source and so beyond inducing proteotoxic stress, the yeast are presumably adapting to this distinct metabolic state. Basically, it is not clear whether these differences are due to the dose of stress, versus we are looking at an early timepoint as ES initiates a genome-wide chromatin restructuring and gene expression reprogramming that goes beyond a response to proteotoxic stress. This reviewer is not suggesting a barrage of new experiments, but perhaps discussion points to contextualize results.

      We thank the reviewer for this suggestion and in our revised manuscript discuss these issues (lines 414-424 and 486-498 [5% vs. 8.5% ethanol]; lines 500-519 [ethanol as a metabolite]).

      Recommendations for the authors:

      Reviewer #1:

      (1) In Figure 1E, the number of foci in control (0 min) cells is very different for the two conditions. Could the authors clarify/check this? Based on the mean numbers at time point 0, the control cells for the ethanol treatment already contain about 10-20 Hsp104 foci, compared to around 5 foci per cell in the control for heat shock.

      We thank the reviewer for raising this point and have repeated the assay with a different strain (DPY1561). As shown in Figure 1E, we have confirmed that the control samples have a similar number of foci.

      (2) In the same Figure 1E, is the P-value relative to the control or the same time point in the other treatment? A comparison across treatments would be necessary to support the claim in lines 168-171 of the text.

      The statistical analysis (Mann Whitney test) was performed by comparing each stress timepoint to the no stress control. We clarify this in the figure legend. 
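
      The comparison of each stress time point to the no-stress control can be sketched as follows. The sketch computes only the Mann-Whitney U statistic (ties counted as one half) for each time point against the 0 min sample; the foci counts and function names are hypothetical, and p-value calculation is deliberately left out.

```python
def mann_whitney_u(control, treated):
    """U statistic for `treated` relative to `control`.

    Counts the (control, treated) pairs in which the treated value is larger;
    tied pairs contribute one half.
    """
    u = 0.0
    for x in control:
        for y in treated:
            if y > x:
                u += 1.0
            elif y == x:
                u += 0.5
    return u


def compare_to_control(foci_by_time, control_time=0):
    """Compare every stress time point with the no-stress control sample."""
    control = foci_by_time[control_time]
    return {
        t: mann_whitney_u(control, values)
        for t, values in foci_by_time.items()
        if t != control_time
    }


# Hypothetical foci-per-cell counts keyed by minutes of stress exposure.
example_counts = {0: [1, 2, 2], 10: [3, 4, 2]}
u_by_time = compare_to_control(example_counts)
```

      Because every time point is tested against the same control, this design answers whether a stress changes foci number over time; it does not by itself compare heat shock with ethanol stress at a given time point.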

      (3) In Figure 1D, the heat-shock condition shows the same cells that are used in the control, but the cells in the ethanol-shock condition are different. This is a bit visually misleading compared to the experimental setup shown in panel 1C. The authors could show the control cells for the ethanol condition as well.

      We thank the reviewer for this excellent suggestion and have added the 0 min image for the ethanol stress conditions.

      (4) In Figure 7B adding images at 60min would help underscore the point that the condensates are stable in ethanol shocked cells.

      We appreciate this suggestion as well and have included a 60 min timepoint for both stresses (Figure 8B). 

      Reviewer #2:

      (1) Line 113: Has it not been established that yeast Hsf1 is constitutively trimeric?

      In yeast, only a fraction of Hsf1 is thought to be constitutively trimeric and it is this species that binds high-affinity HSEs even under non-stressful conditions (Giardina & Lis, 1995; Pincus et al., 2018). We have added this clarification to the text (lines 121-123). 

      (2) Ethanol can precipitate proteins, especially in rich media like YPD. Did the authors notice any protein precipitation? If yes, how do they account for effects due to nutrient loss by precipitation?

      This is an interesting point, but we did not notice any precipitates in either rich or synthetic liquid media containing 8.5% (v/v) ethanol for any of the time points used in the experiments.

      (3) Figure 3: The figure appears incomplete. Can enhancer, promoter, coding region, and 3'UTR be shown consistently for all genes examined?

      In response to this point, we have simplified this figure (new Fig. 4) by uniform presentation of factor occupancy at enhancer, promoter, and coding region loci for all but one of the genes evaluated. For HSP12 (330 bp), we were unable to distinguish promoter from coding region since the average sonicated chromatin fragment obtained using a Bioruptor is ~300 bp. Therefore, we evaluated only the HSP12 coding region for Pol II and histone H3 occupancy. 

      (4) Figure 4: The comparison between heat shock at 2.5 min and ethanol stress at later points is puzzling. Why not use consistent time points as in Fig. 3?

      Time points for the two stresses examined in this figure (new Fig. 5) were selected to represent times of peak intergenic interaction between HSR genes. These times were derived from our earlier analysis of 3C interactions during a heat shock time course (Figs. 5, 6 of Chowdhary et al., 2017) and ES data presented in this study, including Fig. 4 (Pol II ChIP time course) and Fig. 6 (3C time course). Data presented in Figs. 5 and 6 are consistent with the notion that intergenic interactions in cells subjected to ethanol stress are delayed relative to those observed in heat shocked cells, peaking in most cases at ~10 min (vs. ~2.5 min for heat stress (Chowdhary et al., 2017)).  

      (5) Figure 5: Fig. 5B top panel seems to show color inconsistencies for bars at 0 and 120 min. Also, the x-axis on the top left panel seems to have a typo; should it read "10," not "0?"

      We thank the reviewer for the observation. We changed the graphs in new Figure 6 to display the same color for all time points.  We also fixed the typo. 

      (6) Line 302: The evidence presented supports maximal mRNA levels, but the claim of "maximal transcription" requires support from nascent RNA analysis.

      We agree that RT-qPCR measures mRNA abundance, not nascent transcription. We have changed the text to refer to “transcript levels” where pertinent (lines 301-302; 1331-1332).

      (7) How long do loci remain coalescent during heat shock versus ethanol stress? Both 3C and imaging analyses do not differentiate between frequency and duration, which seems essential for understanding interaction dynamics.

      We thank the reviewer for this excellent question. In new Fig. 7D,E (data drawn from Fig. 6 – fig. suppl. 1), HSR gene coalescence detected in single cells over a HS or ES time course is charted.  Interpretable data exist for a small number of cells. Moreover, for both HS and ES states, in certain cells coalescence between the representative Hsf1 target genes HSP104 and HSP12 dissolves and then reappears. With this caveat in mind, the data suggest that HSP104-HSP12 coalescence can last at least 15 min in HS cells and up to 30 min in ES cells. We have not emphasized this point in the manuscript since a far more comprehensive analysis – beyond the scope of this study – is required.

      (8) For longer analyses, how do the authors accommodate potential ethanol concentration changes due to evaporation?

      For liquid cultures, we relied on maintaining minimal changes in the vapor pressure within the experimental vessel; to facilitate that, flasks were tightly covered to minimize evaporation and temperature was kept at 25°C. For most molecular analyses (RT-qPCR, ChIP, 3C), we confined our analysis to the first 60 min. For microscopy, the samples were encased within a concave slide, covered by a coverslip, as illustrated below. In addition, to tightly seal the coverslip on the slide we used petrolatum.  This arrangement minimized evaporation.

      Author response image 2.

      (9) Figure 9: This legend seems to have an incomplete sentence: "(represented using ...)."

      We have substituted an entirely new model in this revised manuscript (new Figure 10) that omits the use of an ellipsis. (We had used it to symbolize a delay in the appearance of HSR gene transcription in ES cells.)

      References  

      Alkaabi, K. M., Yafea, A., & Ashraf, S. S. (2005). Effect of pH on thermal- and chemical-induced denaturation of GFP. Applied Biochemistry and Biotechnology, 126(2), 149–156. https://doi.org/10.1385/ABAB:126:2:149

      Chowdhary, S., Kainth, A. S., & Gross, D. S. (2017). Heat Shock Protein Genes Undergo Dynamic Alteration in Their Three-Dimensional Structure and Genome Organization in Response to Thermal Stress. Molecular and Cellular Biology, 37(24), 1–23. https://doi.org/10.1128/mcb.00292-17

      Feder, Z. A., Ali, A., Singh, A., Krakowiak, J., Zheng, X., Bindokas, V. P., Wolfgeher, D., Kron, S. J., & Pincus, D. (2021). Subcellular localization of the J-protein Sis1 regulates the heat shock response. Journal of Cell Biology, 220(1), e202005165. https://doi.org/10.1083/JCB.202005165

      Gaglia, G., Rashid, R., Yapp, C., Joshi, G. N., Li, C. G., Lindquist, S. L., Sarosiek, K. A., Whitesell, L., Sorger, P. K., & Santagata, S. (2020). HSF1 phase transition mediates stress adaptation and cell fate decisions. Nature Cell Biology, 22(2), 151–158. https://doi.org/10.1038/s41556-019-0458-3

      Giardina, C., & Lis, J. T. (1995). Dynamic protein-DNA architecture of a yeast heat shock promoter. Molecular and Cellular Biology, 15(5), 2737–2744. https://doi.org/10.1128/mcb.15.5.2737

      Jolly, C., Konecny, L., Grady, D. L., Kutskova, Y. A., Cotto, J. J., Morimoto, R. I., & Vourc’h, C. (2002). In vivo binding of active heat shock transcription factor 1 to human chromosome 9 heterochromatin during stress. Journal of Cell Biology, 156(5), 775–781. https://doi.org/10.1083/jcb.200109018

      Jolly, C., Morimoto, R. I., Robert-Nicoud, M., & Vourc’h, C. (1997). HSF1 transcription factor concentrates in nuclear foci during heat shock: Relationship with transcription sites. Journal of Cell Science, 110(23), 2935–2941. https://doi.org/10.1242/jcs.110.23.2935

      Pincus, D., Anandhakumar, J., Thiru, P., Guertin, M. J., Erkine, A. M., & Gross, D. S. (2018). Genetic and epigenetic determinants establish a continuum of Hsf1 occupancy and activity across the yeast genome. Molecular Biology of the Cell, 29(26), 3168–3182. https://doi.org/10.1091/mbc.E18-06-0353

      Zhang, H., Shao, S., Zeng, Y., Wang, X., Qin, Y., Ren, Q., Xiang, S., Wang, Y., Xiao, J., & Sun, Y. (2022). Reversible phase separation of HSF1 is required for an acute transcriptional response during heat shock. Nature Cell Biology, 24(3), 340–352. https://doi.org/10.1038/s41556-022-00846-7

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1:

      This study of mixed glutamate/GABA transmission from axons of the supramammillary nucleus to dentate gyrus seeks to sort out whether the two transmitters are released from the same or different synaptic vesicles. This conundrum has been examined in other dual-transmission cases and even in this particular pathway, there are different views. The authors use a variety of electrophysiological and immunohistochemical methods to reach the surprising (to me) conclusion that glutamate and GABA-filled vesicles are distinct yet released from the same nerve terminals. The strength of the conclusion rests on the abundance of data (approaches) rather than the decisiveness of any one approach, and I came away believing that the boutons may indeed produce and release distinct types of vesicles, but have reservations.

      We thank the reviewer for his/her evaluation of our work. To date, several studies have reported that various combinations of two transmitters are co-released from different synaptic vesicles in the central nervous system. In this regard, we think that co-transmission of glutamate and GABA from different synaptic vesicles is not surprising. To better explain to the reader what is known about co-release of dual transmitters in the brain, we have added new sentences to the Introduction describing segregated co-release of two neurotransmitters at other synapses (lines 63-80).

      Accepting the conclusion, one is now left with another conundrum, not addressed even in the discussion: how can a single bouton sort out VGLUTs and VIAATs to different vesicles, position them in distinct locations with nm precision, and recycle them without mixing? And why do it this way instead of with single vesicles having mixed chemical content? For example, could a quantitative argument be made that separate vesicles allow for higher transmitter concentrations? I feel the paper needs to address these problems with some coherent discussion, at minimum. 

      Although these questions are important and interesting, little is known about the molecular mechanisms by which VGluT2 and VIAAT are sorted to different vesicles and by which each type of synaptic vesicle is segregated. That is why we did not mention sorting mechanisms in the original manuscript. Nevertheless, in response to the reviewer’s suggestion, we have added new sentences to the Discussion describing possible mechanisms for the sorting and segregation of VGluT2 and VIAAT (lines 439-462).

      As for the question of why glutamate and GABA are released from different synaptic vesicles, we mention the functional roles of separate release of the two transmitters, as opposed to release from single vesicles, several times: in the Introduction (lines 94-100), Results (lines 300-302), and Discussion (lines 406-408, 521-522). Although transmitter concentration in the vesicles is an interesting point to consider, we think this issue is beyond the scope of the present study. Given that manipulation of vesicular transmitter content is technically possible (Hori and Takamori, 2021), this issue awaits further investigation.

      Major concerns: 

      (1) Throughout the paper, the authors use repetitive optogenetic stimulation to activate SuM fibers and co-release glutamate and GABA. There are several issues here: first, can the authors definitively assure the reader that all the short-term plasticity is presynaptic and not due to ChR2 desensitization? This has not been addressed. Second, can the authors also say that all the activated fibers release both transmitters? If for example 20% of the fibers retained a one-transmitter identity and had distinct physiological properties, could that account for some of the physiological findings?

      Thank you for raising this important point. To examine whether repetitive light illumination induces ChR2 desensitization, we recorded the fiber volley extracellularly. Paired pulses and trains of 10 stimuli at 5, 10, and 20 Hz reliably evoked fiber volleys of similar amplitude throughout light stimulation. These results indicate that repetitive light stimulation reliably activates ChR2 and elicits action potentials in SuM axons. These new findings are now included in Figure 1-figure supplement 2 and Figure 5-figure supplement 2. We also previously demonstrated, by direct patch-clamp recordings from ChR2-expressing hippocampal mossy fiber terminals, that 125 light stimuli at 25 Hz reliably elicited action potentials (Fig. S1: Fukaya et al., 2023). Therefore, we believe that if the expression level of ChR2 is high, repetitive light stimulation reliably induces action potentials and drives synaptic transmission with high efficiency.

      We found that most SuM terminals (95%) contain both VGluT2 and VIAAT (Figure 1E). This anatomical evidence strongly indicates that most SuM terminals are able to release both glutamate and GABA, and that SuM fibers with a single-transmitter identity are a minor population.

      (2) PPR differences in Figures 1F-I are statistically significant but still quite small. You could say they are more similar than different in fact, and residual differences are accounted for by secondary factors like differential receptor saturation. 

      In this experiment, the light intensity was adjusted to yield less than 80% of the maximum response, as described in the Methods sections of the original and revised manuscripts, which minimizes the possibility of receptor saturation. We also excluded the possibility that the PPR differences are attributable to differential receptor saturation and desensitization by using a low-affinity AMPA receptor antagonist and a low-affinity GABAA receptor antagonist (Figure 5-figure supplement 3). These results indicate that the PPR differences are of presynaptic origin.

      (3) The logic of the GPCR experiments needs a better setup. I could imagine different fibers released different transmitters and had different numbers of mGluRs, so that one would get different modulations. On the assumption that all the release is from a single population of boutons, then either the mGluRs are differentially segregated within the bouton, or the vesicles have differential responsiveness to the same modulatory signal (presumably a reduced Ca current). This is not developed in the paper. 

      Based on our minimal stimulation results and anatomical analysis, we believe that many SuM terminals contain both glutamate and GABA. Therefore, both forms of transmission can be modulated by mGluRs and GABAB receptors within the same terminals. As the reviewer pointed out, differential responsiveness of glutamate-containing and GABA-containing vesicles to the GPCR signal could be one of the molecular mechanisms underlying the differential effects of GPCRs on EPSCs and IPSCs. In addition, the spatial coupling between GPCRs and the active zones for glutamate and GABA in the same SuM terminals may differ, which may give rise to differential modulation of glutamate and GABA release. These possible mechanisms are now described in the Discussion (lines 469-476).

      (4) The biphasic events of Figures 3 and S3: I find these (unaveraged) events a bit ambiguous. Another way to look at them is that they are not biphasic per se but rather are not categorizable. Moreover, these events are really tiny, perhaps generated by only a few receptors whose open probability is variable, thus introducing noise into the small currents. 

      We agree with the reviewer that some events are tiny and that some small currents could be masked by background noise. We understand that detecting biphasic events by minimal stimulation has technical limitations. We detected biphasic events automatically, defining them as an EPSC-IPSC sequence only if an outward peak current following an inward current appeared within 20 ms of light illumination, as described in the Methods section; we therefore cannot exclude the possibility that the detected biphasic events include false biphasic responses. To compensate for these technical limitations, we also used strontium-induced asynchronous release as a second approach and obtained results similar to those of the minimal stimulation experiments (Figures 3E and 3F). Furthermore, we confirmed that the amplitudes and kinetics of minimal light stimulation-evoked EPSCs or IPSCs were not altered by blockade of their counterpart currents (Figure 3-figure supplement 2). Even if false biphasic responses were accidentally included in the analysis, biphasic events remain a minor population, and we successfully detected discernible independent EPSCs and IPSCs, which were the major population of uniquantal release-mediated synaptic responses. Thus, multiple lines of evidence support distinct release of glutamate and GABA from SuM terminals.
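
      The detection rule described above can be sketched as follows. This is a minimal illustration, not the authors' analysis code: the function name `classify_event`, the 5 pA amplitude threshold, the sampling interval, and the synthetic square-pulse sweeps are all assumptions made for the example. Only the 20 ms window after light onset and the inward-then-outward ordering come from the response. The sweeps are assumed to be baseline-subtracted, with EPSCs inward (negative) and IPSCs outward (positive).

```python
def classify_event(trace_pa, dt_ms, onset_ms, window_ms=20.0, threshold_pa=5.0):
    """Classify one baseline-subtracted sweep as 'EPSC', 'IPSC', 'biphasic', or 'failure'.

    threshold_pa is a hypothetical detection threshold, not a value from the manuscript.
    """
    start = int(round(onset_ms / dt_ms))
    stop = int(round((onset_ms + window_ms) / dt_ms))
    window = trace_pa[start:stop]  # samples within window_ms of light onset
    if not window:
        return "failure"
    # Index of the most negative sample (candidate inward peak).
    i_min = min(range(len(window)), key=window.__getitem__)
    has_inward = window[i_min] <= -threshold_pa
    # An outward peak only counts as part of a biphasic event if it follows the inward peak.
    later = window[i_min + 1:] if has_inward else window
    has_outward = bool(later) and max(later) >= threshold_pa
    if has_inward and has_outward:
        return "biphasic"
    if has_inward:
        return "EPSC"
    if has_outward:
        return "IPSC"
    return "failure"


def make_trace(n_samples, pulses):
    """Build a synthetic sweep from (start_index, stop_index, amplitude_pa) square pulses."""
    trace = [0.0] * n_samples
    for start, stop, amplitude_pa in pulses:
        for i in range(start, stop):
            trace[i] = amplitude_pa
    return trace


DT_MS, ONSET_MS = 0.1, 10.0  # hypothetical sampling interval and light onset
epsc_only = make_trace(500, [(130, 140, -20.0)])
ipsc_only = make_trace(500, [(150, 170, 20.0)])
biphasic = make_trace(500, [(130, 140, -20.0), (160, 180, 20.0)])
late_event = make_trace(500, [(400, 420, -20.0)])  # falls outside the 20 ms window
labels = [classify_event(t, DT_MS, ONSET_MS)
          for t in (epsc_only, ipsc_only, biphasic, late_event)]
```

      A fixed threshold of this kind is exactly where noise can produce false biphasic calls in small currents, which is the limitation acknowledged above.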

      (5) Figure 4 indicates that the immunohistochemical analysis is done on SuM terminals, but I do not see how the authors know that these terminals come from SuM vs other inputs that converge in DG. 

      We thank the reviewer for raising an important point. As shown in Figure 4A, B, almost all VGluT2-positive terminals in the GC layer co-expressed VIAAT. We are aware that VTA neurons reportedly project to the GC layer of the DG and co-release glutamate and GABA (Ntamati and Luscher, 2016). Contrary to this report, our retrograde tracing analysis did not reveal direct projections from the VTA to the DG. These new data are now included in Figure 4-figure supplement 1. We also added pre-embedding immunogold EM analysis, in which SuM terminals were virally labeled with eYFP, confirming that they form both asymmetric and symmetric synapses (revised Figure 4F). Together, these new data demonstrate that SuM terminals in the GC layer form both asymmetric and symmetric synapses. While our results strongly suggest that VGluT2-positive terminals and SuM terminals in the GC layer are nearly identical, we cannot fully exclude the possibility that other inputs originating from unidentified brain regions co-express VGluT2 and VIAAT in the GC layer. Therefore, in Figure 4 of the revised manuscript, we refer to “VGluT2-positive terminals” instead of “SuM terminals”.

      (6) Figure 4E also shows many GluN1 terminals not associated with anything, not even Vglut, and the apparent numbers do not mesh with the statistics. Why? 

      In triple immunofluorescence for VGluT2, VIAAT, and GluN1, free GluN1 puncta were predominantly observed in the molecular layer. Given that VGluT2-positive terminals are sparse in the molecular layer, these GluN1 puncta are primarily associated with VGluT1, the dominant subtype. In this study, we focused the analysis of GluN1 puncta specifically on the GC layer, excluding the molecular layer. To avoid miscommunication, we changed the original Figure 4E to the new Figure 4G, which focuses on the GC layer and aligns with the quantitative analysis. Additionally, we used ultrathin sections (100-nm-thick) to enhance spatial resolution, which limits the detection of co-localization events within this confined spatial range, as noted in the Discussion (line 485-488).

      (7) Do the conclusions based on the fluorescence immuno mesh with the apparent dimensions of the EM active zones and the apparent intermixing of labeled vesicles in immuno EM? 

      To further support our immunofluorescence results, we performed an EM study and found that a single SuM terminal formed both asymmetric and symmetric synapses on a GC soma (revised Figures 4E and 4F). These new data and our immunofluorescence results clearly indicate that a single SuM terminal forms both glutamatergic and GABAergic synapses on a GC and co-releases glutamate and GABA.

      As the reviewer pointed out, our immuno-EM shows that VGluT2- and VIAAT-labeled vesicles appear to intermix in asymmetric and symmetric synapses. Accordingly, in the revised manuscript, Figure 7 has been modified to show the intermixing of glutamate- and GABA-containing vesicles in the SuM terminal. It should be noted that, because of low labeling efficiency, our immuno-EM images do not represent the whole picture of synaptic vesicles for glutamate and GABA. There could be a biased distribution of vesicles close to their release sites (more VGluT2-containing vesicles close to asymmetric synapses and more VIAAT-containing vesicles close to symmetric synapses), as reported previously (Root et al., 2018). Additionally, our results could be explained by another mechanism: co-release of glutamate and GABA from the same vesicles, with one transmitter undetected due to the absence of its postsynaptic receptor. This possibility is now mentioned in the Discussion (lines 512-520). The detailed vesicle configuration in a single SuM terminal will have to be investigated in future studies.

      (8) Figure 6 is not so interesting to me and could be removed. It seems to test the obvious: EPSPs promote firing and IPSPs oppose it. 

      We believe these results are necessary for the following two reasons. First, we showed that the balance of glutamate/GABA co-transmission changes dynamically in a frequency-dependent manner (Figure 5). In terms of physiological significance, it is important to demonstrate how these frequency-dependent dynamic changes affect GC firing. Therefore, we believe that Figure 6, which shows how SuM inputs modulate GC firing during repetitive SuM stimulation, is necessary for this paper. Second, we previously reported the excitatory effects of SuM inputs on GC firing, suggesting important roles for the glutamatergic transmission of SuM inputs in synaptic plasticity (Hashimotodani et al., 2018; Hirai et al., 2022; Tabuchi et al., 2022). In contrast, how GABAergic co-transmission contributes to SuM-GC synaptic plasticity and DG information processing was not well understood. Our results in Figure 6, which demonstrate the inhibitory effects of GABAergic co-transmission on GC firing during high-frequency repetitive SuM input activity, clearly show the contribution of GABAergic co-transmission to short-term plasticity at SuM-GC synapses. For these reasons, we would like to keep Figure 6. We hope that our explanations convince the reviewer.

      Reviewer #2:

      Summary:

      In this study, the authors investigated the release properties of glutamate/GABA co-transmission at the supramammillary nucleus (SuM)-granule cell (GC) synapses using in vitro electrophysiology and anatomical approaches at the light and electron microscopy level. They found that SuM to dentate granule cell synapses, which co-release glutamate and GABA, exhibit distinct differences in paired-pulse ratio, Ca2+ sensitivity, presynaptic receptor modulation, and Ca2+ channel-vesicle coupling configuration for each neurotransmitter. The study shows that glutamate/GABA co-release produces independent glutamatergic and GABAergic synaptic responses, with postsynaptic targets segregated. They show that most SuM boutons form distinct glutamatergic and GABAergic synapses in close proximity, characterized by GluN1 and GABAAα1 receptor labeling, respectively. Furthermore, they demonstrate that glutamate/GABA co-transmission exhibits distinct short-term plasticity, with glutamate showing frequency-dependent depression and GABA showing frequency-independent stable depression.

      Their findings suggest that these distinct modes of glutamate/GABA co-release by SuM terminals serve as frequency-dependent filters of SuM inputs. 

      Strengths:

      The conclusions of this paper are mostly well supported by the data. 

      We thank the reviewer for their positive and constructive comments on our manuscript.

      Weaknesses: 

      Some aspects of Supplementary Figure 1A and the table need clarification. Specifically, the claim that the authors have stimulated an axon fiber rather than axon terminals is not convincingly supported by the diagram of the experimental setup. Additionally, the antibody listed in the primary antibodies section recognizes the gamma2 subunit of the GABAA receptor, not the alpha1 subunit mentioned in the results and Figure 4. 

      We have now answered these questions in the Recommendations section below.

      Reviewer #3:

      Summary: 

      In this manuscript, Hirai et al. investigated the release properties of glutamate/GABA co-transmission at SuM-GC synapses and reported that glutamate/GABA co-transmission exhibits distinct short-term plasticity with segregated postsynaptic targets. Using optogenetics, whole-cell patch-clamp recordings, and immunohistochemistry, the authors reveal distinct transmission modes of glutamate/GABA co-release as frequency-dependent filters of incoming SuM inputs.

      Strengths: 

      Overall, this study is well-designed and executed; conclusions are supported by the results. This study addressed a long-standing question of whether GABA and glutamate are packaged in the same vesicles and co-released in response to the same stimuli in the SuM-GC synapses (Pedersen et al., 2017; Hashimotodani et al., 2018; Billwiller et al., 2020; Chen et al., 2020; Li et al., 2020; Ajibola et al., 2021). Knowledge gained from this study advances our understanding of neurotransmitter co-release mechanisms and their functional roles in the hippocampal circuits. 

      Weaknesses:

      No major issues are noted. Some minor issues related to data presentation and experimental details are listed below. 

      We appreciate the reviewer’s positive view of our study. We respond in more detail in the Recommendations section below.

      Recommendations for the authors:

      Reviewer #1:

      (1) The blue color for VIAAT in panel 1C is extremely hard to see. 

      Thank you for pointing this out. We have changed the color for VIAAT to cyan in Figure 1C and D of the revised manuscript.

      (2) Line 329 "perforant" not "perfomant".  

      We appreciate the reviewer’s careful attention. We have corrected this misspelling in the revised manuscript.

      Reviewer #2:

      To convincingly demonstrate that the authors stimulated SuM axon fiber instead of SuM terminals (Supplementary Figures 1A), they should provide an image showing the distribution of SuM-labeled fibers and axon terminals reaching the dentate gyrus (DG) and the trace of the optic fiber, rather than providing a diagram of the experimental setup.

      We appreciate the reviewer’s suggestion. We have now provided a new image of the experimental setup (Figure 1-figure supplement 1A) showing a single GC, the distribution of SuM fibers in the GC layer, and the illumination area at each location. Because SuM inputs make synapses onto the GC soma and onto dendrites close to the GC cell body, SuM-GC synapses on the recorded GCs are confined to a very limited area. This characteristic synaptic localization allowed us to position the illumination area so that light was not applied to the SuM terminals on the recorded GCs. The delayed onsets of EPSCs/IPSCs evoked by over-axon stimulation (Figure 1-figure supplement 1C, D) also support the conclusion that the SuM terminals on the recorded GCs were outside the illumination area.

      Additionally, the authors should clarify the discrepancy between the antibody mentioned in the list of primary antibodies, which recognizes the gamma2 subunit of the GABAA receptor, and the alpha1 subunit of the GABAA receptor mentioned in the results and Figure 4. 

      We apologize for this mistake. As described in the main text and figure, we used the antibody against the α1 subunit of the GABAA receptor. Table S1 has been corrected in the revised version of the paper.

      Reviewer #3:

      (1) In Figure 1, the authors used two [Ca2+]o concentrations to study the EPSC and IPSC amplitudes. How does the Ca2+ concentration affect the PPR in the EPSC and IPSC, respectively? 

      Given that lowering the extracellular Ca2+ concentration reduces the release probability, 1 mM extracellular Ca2+ is expected to increase the PPR compared with 2.5 mM. Indeed, we observed that lowering the extracellular Ca2+ concentration increased the 2nd to 10th synaptic responses (both EPSCs and IPSCs) during train stimulation (Figure 5).

      (2) In Figure 2D, does baclofen also have a dose-dependent effect on the inhibition of the EPSC and IPSC similar to the DCG-IV in Figure 2C? 

      Thank you for your question. Because we aimed to demonstrate the differential inhibitory effects of baclofen at a given concentration on glutamatergic and GABAergic co-transmission, we did not examine dose dependence in detail. In response to the reviewer’s comment, we tested the effects of a higher concentration of baclofen on EPSCs and IPSCs. As shown in the figure below, 50 µM baclofen inhibited EPSCs and IPSCs to a similar extent. Therefore, by comparing the inhibitory effects of two concentrations of baclofen (5 and 50 µM), we believe that baclofen, like DCG-IV, has a dose-dependent inhibitory effect on both EPSCs and IPSCs.

      Author response image 1.

      (3) In Figure 2E, statistical labels, such as "*" or "n.s." (not significant), should be provided on the plots to facilitate the reading of figures. 

      In response to the reviewer’s comment, we have added statistical labels to Figure 2E.

      (4) In Figure 3A, the latency of the evoked EPSC for the lower light stimulation groups seems to be much slower than the one shown on the left or other figures in the paper, such as Figure 1F. Please double-check if the blue light stimulation label is placed in the right location.

      Corrected, thanks.

      (5) The use of minimal light stimulation in optogenetic experiments is not appropriately justified or described. More detailed information should be provided, such as whether the optogenetic stimulation is performed on the axon or the terminals of the SuM. 

      We appreciate the reviewer’s suggestion. To effectively detect stochastic synaptic responses, the light stimulation was applied to the terminals of the SuM. We have now stated this in the revised manuscript (line 212). We have also further described the justification for using minimal light stimulation (lines 207-209).

      References

      Fukaya R, Hirai H, Sakamoto H, Hashimotodani Y, Hirose K, Sakaba T (2023) Increased vesicle fusion competence underlies long-term potentiation at hippocampal mossy fiber synapses. Sci Adv 9:eadd3616.

      Hashimotodani Y, Karube F, Yanagawa Y, Fujiyama F, Kano M (2018) Supramammillary Nucleus Afferents to the Dentate Gyrus Co-release Glutamate and GABA and Potentiate Granule Cell Output. Cell Rep 25:2704-2715 e2704.

      Hirai H, Sakaba T, Hashimotodani Y (2022) Subcortical glutamatergic inputs exhibit a Hebbian form of long-term potentiation in the dentate gyrus. Cell Rep 41:111871.

      Hori T, Takamori S (2021) Physiological Perspectives on Molecular Mechanisms and Regulation of Vesicular Glutamate Transport: Lessons From Calyx of Held Synapses. Front Cell Neurosci 15:811892.

      Ntamati NR, Luscher C (2016) VTA Projection Neurons Releasing GABA and Glutamate in the Dentate Gyrus. eNeuro 3.

      Root DH, Zhang S, Barker DJ, Miranda-Barrientos J, Liu B, Wang HL, Morales M (2018) Selective Brain Distribution and Distinctive Synaptic Architecture of Dual Glutamatergic-GABAergic Neurons. Cell Rep 23:3465-3479.

      Tabuchi E, Sakaba T, Hashimotodani Y (2022) Excitatory selective LTP of supra-mammillary glutamatergic/GABAergic co-transmission potentiates dentate granule cell firing. Proc Natl Acad Sci U S A 119:e2119636119.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      The manuscript by Goetz et al. takes a new perspective on sensory information processing in cells. In contrast to previous studies, which have used population data to build a response distribution and which estimate sensory information at about 1 bit, this work defines sensory information at the single cell level. To do so, the authors take two approaches. First, they estimate single cells' response distributions to various input levels from time-series data directly. Second, they infer these single-cell response distributions from the population data by assuming a biochemical model and extracting the cells' parameters with a maximum-entropy approach. In either case, they find, for two experimental examples, that single-cell sensory information is much higher than 1 bit, and that the reduction to 1 bit at the population level is due to the fact that cells' response functions are so different from each other. Finally, the authors identify examples of measurable cell properties that do or do not correlate with single-cell sensory information.

      The work brings an important and distinct new insight to a research direction that generated strong interest about a decade ago: measuring sensory information in cells and understanding why it is so low. The manuscript is clear, the results are compelling, and the conclusions are well supported by the findings. Several contributions should be of interest to the quantitative biology community (e.g., the demonstration that single cells' sensory information is considerably larger than previously implied, and the approach of inferring single-cell data from population data with the help of a model and a maximum-entropy assumption).

      We thank the reviewer for the excellent summary of our research.

      Reviewer #2 (Public Review):

      In this paper the authors present an existing information theoretic framework to assess the ability of single cells to encode external signals sensed through membrane receptors.

      The main point is to distinguish actual noise in the signaling pathway from cell-cell variability, which could be due to differences in their phenotypic state, and to formalize this difference using information theory.

      After correcting for this cellular variability, the authors find that cells may encode more information than one would estimate from ignoring it, which is expected. The authors show this using simple models of different complexities, and also by analyzing an imaging dataset of the IGF/FoxO pathway.

      The implications of the work are limited because the analysed data is not rich enough to draw clear conclusions. Specifically,

      • the authors do not distinguish what could be methodological noise inherent to microscopy techniques (segmentation etc), and actual intrinsic cell state. It's not clear that cell-cell variability in the analyzed dataset is not just a constant offset or normalization factor. Other authors (e.g. Gregor et al Cell 130, 153-164) have re-centered and re-normalized their data before further analysis, which is more or less equivalent to the idea of the conditional information in the sense that it aims to correct for this experimental noise.

      We thank the reviewer for the comment. However, we do not believe our analysis is a consequence of normalization artifacts. Prior to modeling the single cell data, we removed well-dependent background fluorescence. This should take care of technical variation related to overall offsets in the data. We agree with the reviewer that background subtraction may not fully account for technical variability. For example, some of the cell-to-cell variability may be ascribed to issues such as incorrect segmentation. Unfortunately, however, attempting to remove this technical variability through cell-specific normalization, as suggested by the reviewer (ref. 1), would to a very large extent diminish the true biological effects related to extensivity (cell size, total protein abundance). We note that these effects are a direct function of cell-state variables (see, for example, Cohen-Saidon et al. (ref. 2), who use cell-state specific normalization to improve signaling fidelity). Therefore, an increase in mutual information after normalization reflects not only the removal of technical noise but also the effect of cell-state variables.

      Nonetheless, as the reviewer suggested, we performed a cell-specific normalization wherein the mean nuclear FoxO level in each cell (in the absence of IGF) was normalized to one. Then, for each ligand concentration, we collated the FoxO responses across all cells and computed the channel capacity corresponding to the cell-state agnostic mutual information, I_CSA. As expected, I_CSA increased from ∼0.9 bits to ∼1.3 bits when cell-specific normalization was performed (Author response image 1). However, this value is significantly lower than the average cell-state specific mutual information, ⟨I_Cee⟩, of ∼1.95 bits. Finally, we note that cell-specific normalization does not change the calculations of channel capacity at the single cell level, as these calculations do not depend on linear transformations of the data (centering and normalization). Therefore, we do not think that our analysis of experimental data suffers from artifacts related to microscopy.
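
      The invariance invoked in the last step can be written out explicitly. The following is a sketch under the assumption that the single-cell response x is continuous and that the transformation is an invertible affine map with constants a ≠ 0 and b (symbols introduced here for illustration); u denotes the input, as elsewhere in this response.

```latex
% Rescaled and recentred response: x' = a x + b, with a \neq 0.
% Both the conditional and the marginal density pick up the same Jacobian factor:
\[
  p(x' \mid u) = \frac{1}{|a|}\, p(x \mid u), \qquad
  p(x') = \frac{1}{|a|}\, p(x).
\]
% The factor cancels inside the logarithm, and dx' = |a|\,dx cancels it outside:
\[
  I(X';U) = \sum_u p(u) \int p(x' \mid u)\,
            \log_2 \frac{p(x' \mid u)}{p(x')}\, dx'
          = \sum_u p(u) \int p(x \mid u)\,
            \log_2 \frac{p(x \mid u)}{p(x)}\, dx
          = I(X;U).
\]
% The equality holds for every input distribution p(u), so the maximum over p(u),
% i.e. the channel capacity, is unchanged as well.
```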

      Author response image 1. Left: nuclear FoxO response averaged over all cells in the population across different ligand concentration. Right: nuclear FoxO response was first normalized at the single cell level and then averaged over all cells in the population across different ligand concentrations.
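
      A toy calculation can illustrate why pooling cells lowers the estimated information and why cell-specific normalization raises the pooled value without affecting the single-cell value. This is a hedged sketch, not the analysis applied to the FoxO data: the two noise-free cells, their gains, the four input levels, and the uniform input distribution are all assumptions chosen so that the numbers can be checked by hand, and the quantity computed is mutual information at a uniform input rather than a channel capacity.

```python
from collections import Counter
from math import log2


def mutual_information_bits(pairs):
    """Mutual information I(U;X) in bits from equally weighted (u, x) samples."""
    n = len(pairs)
    joint = Counter(pairs)
    count_u = Counter(u for u, _ in pairs)
    count_x = Counter(x for _, x in pairs)
    return sum(
        (c / n) * log2((c / n) / ((count_u[u] / n) * (count_x[x] / n)))
        for (u, x), c in joint.items()
    )


inputs = [1, 2, 3, 4]                        # hypothetical ligand levels
gains = {"cell_A": 1.0, "cell_B": 2.0}       # hypothetical cell-state dependent gains

# Noise-free toy responses x = gain * u for each cell.
per_cell = {name: [(u, g * u) for u in inputs] for name, g in gains.items()}

# Pooling ignores cell identity; normalization divides out each cell's gain first.
pooled_raw = [pair for pairs in per_cell.values() for pair in pairs]
pooled_normalized = [
    (u, x / gains[name]) for name, pairs in per_cell.items() for u, x in pairs
]

info_per_cell = {name: mutual_information_bits(p) for name, p in per_cell.items()}
info_pooled_raw = mutual_information_bits(pooled_raw)
info_pooled_normalized = mutual_information_bits(pooled_normalized)
```

      In this toy case each cell transmits 2 bits, the raw pooled value drops to 1.5 bits because responses from different cells overlap, and normalization restores the full 2 bits. In the real data normalization recovers only part of the gap, consistent with cell-state differences that go beyond a single scale factor.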

      • in the experiment, each condition is shown only once and sequentially. This means that the reproducibility of the response upon repeated exposures in a single cell was not tested, casting doubt on the estimate of the response fidelity (estimated as the variance over time in a single response).

      The reviewer raises an excellent question about the persistence of cell states. To verify that cell states are indeed conserved at the time scale of the experiment, we reanalyzed data generated by Gross et al. (ref. 3), wherein cells were perturbed with IGF (37.5 pM), followed by a washout that allowed the cells to return to pre-stimulation nuclear FoxO levels, followed by a re-perturbation with the same amount of IGF. The nuclear FoxO response was measured at the single cell level after 90 minutes of IGF exposure both times. Since the response x to the same input u was measured twice in the same cell (x1 and x2), we could evaluate the intrinsic variability in the response at the single cell level. We then compared this intrinsic variability to the extrinsic, cell-state dependent variability in the population.

      To do so, we computed for each cell the difference between the two responses, δ = x1 - x2. Author response image 2 shows the histogram p(δ) as computed from the data (pink) and the same histogram computed from the model that was trained on the single cell data (blue). We also computed p(δ0), the distribution of the difference between the responses of two different cells, both from the data and from the model.

      As we see in Author response image 2, the distribution p(δ) is significantly narrower than p(δ0), suggesting that intracellular variability is significantly smaller than across-population variability and that cells’ responses to the same stimulus are quite conserved, especially when compared to responses in randomly picked pairs of cells. This shows that cell states and the corresponding response to extracellular perturbations are conserved, at least at the time scale of the experiment. Therefore, our estimates of cell-to-cell variability in signaling fidelity are stable and reliable. We have now incorporated this discussion in the manuscript (lines 275-281).

      Author response image 2. Left: Cells were treated with 37.5 pM of IGF for 90 minutes, washed out for 120 minutes and again treated with 37.5 pM of IGF. Nuclear FoxO was measured during the treatment and the washout. The distributions on the left show the difference in FoxO levels in single cells after the two 90-minute IGF stimulations (pink: data, blue: model). Right: Distribution of the difference in FoxO levels in two randomly picked cells after 90 minutes of exposure to 37.5 pM IGF.
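
      The comparison between p(δ) and p(δ0) described above can be sketched on synthetic data. This is only an illustration under stated assumptions, not a reanalysis of the Gross et al. data: each simulated cell is given a hypothetical persistent offset (its "cell state") plus independent measurement noise on each of the two exposures, and all numerical values are invented.

```python
import numpy as np

rng = np.random.default_rng(0)
n_cells = 2000

# Hypothetical persistent cell-state offset, shared by both exposures.
cell_state = rng.normal(0.0, 0.3, size=n_cells)

# Responses to the same input measured twice in each cell, with
# independent intrinsic noise on each exposure (made-up noise level).
x1 = 1.0 + cell_state + rng.normal(0.0, 0.05, size=n_cells)
x2 = 1.0 + cell_state + rng.normal(0.0, 0.05, size=n_cells)

# delta: difference between repeated responses of the same cell.
delta = x1 - x2

# delta0: difference between responses of two different cells; np.roll
# pairs every cell with its neighbor so no cell is paired with itself.
delta0 = x1 - np.roll(x2, 1)

spread_within = delta.std()
spread_between = delta0.std()
```

      When cell states persist between exposures, as assumed here, `spread_within` is much smaller than `spread_between`, mirroring the narrower p(δ) reported in the response.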

      • another dataset on the EGF/EGFR pathway is analyzed, but no conclusion can be drawn from it because single-cell information cannot be directly estimated from it. The authors instead use a maximum-entropy Ansatz, which cannot be validated for lack of data.

      We thank the reviewer for this comment. We agree with the reviewer that we have not verified our predictions for the EGF/EGFR pathway. That study was meant to show the potential generality of our analysis. We look forward to validating our predictions for the EGF/EGFR pathway in future studies.

      Reviewer #3 (Public Review):

      Goetz, Akl and Dixit investigated the heterogeneity in the fidelity of sensing the environment by individual cells in a population using computational modeling and analysis of experimental data for two important and well-studied mammalian signaling pathways: (insulin-like growth factor) IGF/FoxO and (epidermal growth factor) EGF/EGFR mammalian pathways. They quantified this heterogeneity using the conditional mutual information between the input (e.g., level of IGF) and output (e.g., level of FoxO in the nucleus), conditioned on the "state" variables which characterize the signaling pathway (such as abundances of key proteins, reaction rates, etc.) First, using a toy stochastic model of a receptor-ligand system - which constitutes the first step of both signaling pathways - they constructed the population average of the mutual information conditioned on the number of receptors and maximized over the input distribution and showed that it is always greater than or equal to the usual or "cell state agnostic" channel capacity. They constructed the probability distribution of cell state dependent mutual information for the two pathways, demonstrating agreement with experimental data in the case of the IGF/FoxO pathway using previously published data. Finally, for the IGF/FoxO pathway, they found the joint distribution of the cell state dependent mutual information and two experimentally accessible state variables: the response range of FoxO and total nuclear FoxO level prior to IGF stimulation. In both cases, the data approximately follow the contour lines of the joint distribution. Interestingly, high nuclear FoxO levels, and therefore lower associated noise in the number of output readout molecules, is not correlated with higher cell state dependent mutual information, as one might expect.

      This paper contributes to the vibrant body of work on information theoretic characterization of biochemical signaling pathways, using the distribution of cell state dependent mutual information as a metric to highlight the importance of heterogeneity in cell populations. The authors suggest that this metric can be used to infer "bottlenecks" in information transfer in signaling networks, where certain cell state variables have a lower joint distribution with the cell state dependent mutual information.

      The utility of a metric based on the conditional mutual information to quantify fidelity of sensing and its heterogeneity (distribution) in a cell population is supported in the comparison with data. Some aspects of the analysis and claims in the main body of the paper and SI need to be clarified and extended.

      1. The authors use their previously published (Ref. 32) maximum-entropy based method to extract the probability distribution of cell state variables, which is needed to construct their main result, namely p_CeeMI (I). The salient features of their method, and how it compares with other similar methods of parameter inference should be summarized in the section with this title. In SI 3.3, the Lagrangian, L, and Rm should be defined.

      We thank the reviewer for the comment and apologize for the omission. We have now rewritten the manuscript to include references to previous reviews of works that infer probability distributions of cell state variables (ref. 4; lines 156-168). Notably, as we argued in our previous work (ref. 5), no current method can efficiently estimate the joint distribution over parameters that is consistent with measured single cell data and models of signaling networks. Therefore, we could not use multiple approaches to infer parameter distributions. We have now expanded our discussion of the method in the supplementary information sections.

      1. Throughout the text, the authors refer to "low" and "high" values of the channel capacity. For example, a value of 1-1.5 bits is claimed to be "low". The authors need to clarify the context in which this value is low: In some physically realistic cases, the signaling network may need to simply distinguish between the presence or absence of a ligand, in which case this value would not be low.

      We agree with the reviewer that small values of channel capacities might be sufficient for cells to carry out some tasks, in which case a low channel capacity does not necessarily indicate a network not performing its task. Indeed, how much information is needed for a specific task is a related but distinct question from how much information is provided through a signaling network. Both questions are essential to understand a cell's signaling behavior, with the former being far less easy to answer in a way which is generalizable. In contrast, the latter can be quantitatively answered using the analysis presented in our manuscript.

      1. Related to (2), the authors should comment on why in Fig. 3A, I_Cee=3. Importantly, where does the fact that the network is able to distinguish between 2^3 ligand levels come from? Is this related to the choice (and binning) of the input ligand distribution (described in the SI)?

      We thank the reviewer for the comment. The network can distinguish between all inputs used in the in silico experiment precisely because the noise at the cellular level is small enough that there is negligible overlap between single cell response distributions. Indeed, the mutual information will not increase with the number of equally spaced inputs in a sub-linear manner, especially when the input number is very high.

      1. The authors should justify the choice of the gamma distribution in a number of cases (eg. distribution of ligand, distribution cell state parameters, such as number of receptors, receptor degradation rate, etc.).

      We thank the reviewer for the comment. We note that previous work on protein abundances and gene expression levels (e.g., see ref. 6) has reported distributions with positive skews that can be fit well with gamma distributions or log-normal distributions. Moreover, many stochastic models of protein abundance levels and signaling networks are also known to result in abundances that are distributed according to a negative binomial distribution, the discrete counterpart of the gamma distribution. Therefore, we chose gamma distributions in our study. We have now clarified this point in the Supplementary Information. At the same time, the gamma distribution only serves as a regularization for the finite data and, in principle, our analysis and conclusions do not depend on the choice of gamma distribution for abundances of proteins, ligands, and cell parameters.
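
      The link between the gamma and negative binomial distributions mentioned above can be illustrated with a standard result: Poisson counts whose rate is itself gamma distributed follow a negative binomial distribution, with mean kθ and variance kθ + kθ². The sketch below checks these moments by simulation; the shape and scale values are arbitrary illustrative choices, not parameters from the manuscript.

```python
import numpy as np

rng = np.random.default_rng(0)

# Arbitrary illustrative gamma parameters (shape k, scale theta).
shape_k, scale_theta = 2.0, 5.0
n_samples = 200_000

# Gamma-distributed rates (e.g. a continuous abundance level) ...
rates = rng.gamma(shape_k, scale_theta, size=n_samples)
# ... and Poisson counts drawn at those rates (discrete molecule numbers).
counts = rng.poisson(rates)

# Negative binomial moments implied by the gamma-Poisson mixture.
expected_mean = shape_k * scale_theta
expected_var = expected_mean + shape_k * scale_theta**2

sample_mean = counts.mean()
sample_var = counts.var()
```

      The sample moments should land close to `expected_mean` and `expected_var`, with the variance exceeding the mean as expected for positively skewed, overdispersed abundances.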

      1. Referring to SI Section 2, it is stated that the probability of the response (receptor binding occupancy) conditioned on the input ligand concentration and number of receptors is a Poisson distribution. Indeed this is nicely demonstrated in Fig. S2. Therefore it is the coefficient of variation (std/mean) that decreases with increasing R0, not the noise (which is strictly the standard deviation) as stated in the paper.

      We thank the reviewer for the comment. We have now corrected our text.
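
      The reviewer's distinction between the noise and the coefficient of variation follows directly from the Poisson form of the response distribution. As a short worked illustration, assume (this parametrization is ours, not taken from the SI) that at fixed ligand concentration L the mean number of bound receptors B is proportional to the receptor number R_0, with an occupancy fraction f(L):

```latex
\begin{align}
  B \mid L, R_0 &\sim \mathrm{Poisson}(\lambda), \qquad \lambda = R_0\, f(L), \\
  \sigma_B &= \sqrt{\lambda} = \sqrt{R_0\, f(L)}, \\
  \mathrm{CV}_B &= \frac{\sigma_B}{\lambda} = \frac{1}{\sqrt{R_0\, f(L)}} .
\end{align}
```

      Under this assumption the standard deviation grows as the square root of R_0 while the coefficient of variation falls as its inverse square root, so only the relative noise decreases with increasing receptor number.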

      1. In addition to explicitly stating what the input (IGF level) and the output (nuclear GFP-tagged FoxO level) are, it would be helpful if it is also stated what is the vector of state variables, theta, corresponding to the schematic diagram in Fig. 2C.

      We thank the reviewer for the comment. We have now corrected our text in the supplementary material as well as the main text (Figure 2 caption).

      1. Related to Fig. 2C, the statement in the caption: "Phosphorylated Akt leads to phosphorylation of FoxO which effectively shuttles it out of the nucleus." needs clarification: From the figure, it appears that pFoxO does not cross the nuclear membrane, in which case it would be less confusing to say that phosphorylation prevents reentry of FoxO into the nucleus.

      We thank the reviewer for the comment. We have now corrected our text (Figure 2 caption).

      1. The explanations for Fig. 2D, E and insets are sparse and therefore not clear. The authors should expand on what is meant by model and experimental I(theta). What is CC input dose? Also in Fig. 2E, the overlap between the blue and pink histograms means that the value of the blue histogram for the final bin - and therefore agreement or lack thereof with the experimental result - is not visible. Also, the significance of the values 3.25 bits and 3 bits in these plots should be discussed in connection with the input distributions.

      We thank the reviewer for the comment. We have now corrected our text (Figure 2 caption and lines 249-251).

      1. While the joint distribution of the cell state dependent mutual information and various biochemical parameters is given in Fig. S7, there is no explanation of what these results mean, either in the SI or main text. Related to this, while a central claim of the work is that establishing this joint distribution will allow determination of cell state variables that differentiate between high and low fidelity sensing, this claim would be stronger with more discussion of Figs. 3 and S7. The related central claim that cell state dependent mutual information leads to higher fidelity sensing at the population level would be made stronger if it can be demonstrated that in the limit of rapidly varying cell state variables, the I_CSA is retrieved.

      We thank the reviewer for this excellent comment. We have now added more discussion about interpreting the correlation between cell state variables and cell-state specific mutual information (lines 294-306). We also appreciate the suggestion about a toy model calculation to show that dynamics of cell state variables affects cell state specific mutual information. We have now performed a simple calculation to show how dynamics of cell state variables affects cells’ sensing ability (lines 325-363). Specifically, we constructed a model of a receptor binding to the ligand wherein the receptor levels themselves changed over time through a slow process of gene expression (Author response image 3, main text Figure 4). In this model, the timescales of fluctuations of ligand-free receptors on the cell surface can be tuned by speeding up/slowing down the degradation rate of the corresponding mRNA while keeping the total amount of steady state mRNA constant. As shown in Author response image 3, the dependence of cell-specific mutual information on cell state variable diminishes when the time scale of change of cell state variables is fast.

      Author response image 3. Cell state dynamics governs cell state conditioned mutual information. A. In a simple stochastic model, receptor mRNA is produced at a constant rate from the DNA and then translated into ligand-free receptors. The number of ligand-bound receptors after a short exposure to ligands is considered the output. B. A schematic showing dynamics of receptor numbers when mRNA dynamics are slower compared to signaling time scales. C. Conditioning on receptor numbers leads to differing abilities in sensing the environment when the time scale of mRNA dynamics τ is slow. In contrast, when the mRNA dynamics are fast (large τ⁻¹), conditioning on cell state variables does not lead to differences in sensing abilities.

      Reviewer #1 (Recommendations For The Authors):

      My major concerns are mainly conceptual, as described below. With proper attention to these concerns, I feel that this manuscript could be a good candidate for the eLife community.

      Major concerns:

      1. The manuscript convincingly demonstrates that cells are good sensors after all, and that heterogeneity makes their input-output functions different from each other. This raises the question of what happens downstream of sensing. For single-celled organisms, where it may be natural to define behavioral consequences at the single-cell level, it may very well be relevant that single-cell information is high, even if cells respond differently to the environment. But for cells in multicellular organisms, like those studied here, I imagine that most behavioral consequences of sensing occur at the multicellular level. Thus, many cells' responses are combined into a larger response. Because their responses are different, their high-information individual responses may combine into a low-information collective response. In fact, one could argue that a decent indicator of the fidelity of this collective response is indeed the population-level information measure estimated in previous works. Thus, a fundamental question that the authors must address is: what is the ultimate utility of reliable, but heterogeneous, responses for a multicellular system? This question has an important bearing for the relevance of their findings.

      We thank the reviewer for this thought-provoking comment. We agree that the fidelity with which cells sense their environment, especially those in multicellular organisms, may not always need to be very high. We speculate that when the biological function of a collection of cells can be expressed as an average over the response of individual cells, high-information but heterogeneous cells can be considered equivalent to low-information homogeneous cells. Examples of such functions are population differentiation to maintain relative proportions of different cell types in a tissue, or production of a certain amount of extracellular enzyme.

      In contrast, we believe that when the biological function involves collective action, spatial patterning, or temporal memory, the difference between reliable but heterogeneous population and unreliable homogeneous population will become significant. We plan to explore this topic in future studies.

      1. The authors demonstrate that the agreement is good between their inference approach and the direct estimation of response distributions from single-cell time series data. In fact, the agreement is so good that it raises the question of why one would need the inference approach at all. Is it because single-cell time series data is not always available? Is that why the authors used it for one example and not the other? The validation is an asset, but I imagine that the inference approach is complicated and may make assumptions that are not always true. Thus, its utility and appropriate use must be clarified.

      We thank the reviewer for the comment. As the reviewer correctly pointed out, live cell imaging data is not always available and has limited scope. Specifically, optical resolution limits measurements of multiple targets. Moreover, typical live cell measurements measure total abundance or localization and not post-translational modifications (phosphorylation, methylation, etc.), which are crucial to signaling dynamics. The most readily available single cell data, such as those measured using single cell RNA sequencing, immunofluorescence, or flow cytometry, are necessarily snapshots. Therefore, computational models that can connect underlying signaling networks to snapshot data become essential when imputing single cell trajectories. In addition, the modeling also allows us to identify network parameters that correlate most strongly with cellular heterogeneity. We have now clarified this point in the manuscript (lines 366-380).

      Minor comments:

      1. I would point out that the maximum values in the single-cell mutual information distributions (Fig 2D and E) correspond to log2 of the number of inputs levels, corresponding to perfect distinguishability of each of the equally-weighted input states. It is clear that many of the mutual information values cluster toward this maximum, and it would help readers to point out why.

      We thank the reviewer for the comment. We have now included a discussion about the skew in the distribution in the text (lines 251-260).
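
      The upper bound the reviewer refers to follows from the standard decomposition of mutual information. As a brief worked statement, with N denoting the number of input levels (our notation):

```latex
\begin{align}
  I(u;x) &= H(u) - H(u \mid x) \;\le\; H(u) \;\le\; \log_2 N, \\
  I(u;x) &= \log_2 N \iff p(u) = \tfrac{1}{N} \ \text{for all } u
           \ \text{and}\ H(u \mid x) = 0 .
\end{align}
```

      The bound is reached only when the inputs are equally weighted and every response identifies its input without ambiguity; for example, N = 8 perfectly distinguishable inputs give 3 bits. Cells whose response distributions barely overlap therefore pile up near this maximum.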

      1. Line 216 references Fig 2C for the EGF/EGFR pathway, but Fig 2C shows the FoxO pathway. In fact, I did not see a schematic of the EGF/EGFR pathway. It may be helpful to include one, and for completeness perhaps also one for the toy model, and organize the figures accordingly.

      We thank the reviewer for the comment. We did not include three separate schematics because the schematics of the EGF/EGFR model and the toy model are subsets of the schematic of the IGF/FoxO model. We have now clarified this point in the manuscript (Figure 2 caption).

      Reviewer #2 (Recommendations For The Authors):

      • the simple model of Fig. 2A would gain from a small cartoon explaining the model and its parameters.

      We thank the reviewer for the comment. We did not include a schematic for the toy model as it is a subset of the schematic of the IGF/FoxO model. The schematic of the toy model is included in the supplementary information.

      • L should be called u, and B should be called x, to be consistent with the rest of the notations in the paper.

      We have decided to keep the notation originally presented in the manuscript.

      • legend of 2E and D should be clarified. "CC input dose" is cryptic. The x axis is the input dose, the y axis is its distribution at the argmax of I. CC is the max of I, not its argmax. Likewise "I" in the legend for the colors should not be used to describe the insets, which are input distributions.

      We have now changed this in the manuscript.

      • the data analysis of the IGF/FoxO pathway should be explained in the main text, not the SI. Otherwise it's impossible to understand how one arrives at, or how to interpret, figure 2E, which is central to the paper. For instance the fact that p(x|u,theta) is assumed to be Gaussian, and how the variance and mean are estimated from the actual data is very important to understand the significance of the results.

      While we have added more details in the manuscript in various places, for the sake of brevity and clarity, we have decided to keep the details of the calculations in the supplementary materials.

      • there's no Method's section. Most of the paper's theoretical work is hidden in the SI, while it should be described in the methods.

      We thank the reviewer for the comment. However, we believe that adding a methods section will break the narrative of the paper. The methods are described in the supplementary materials in sufficient detail to reproduce our results. Additionally, we provide a link to the GitHub page that has all scripts related to the manuscript.

      PS: please submit a PDF of the SI for review, so that people can read it on any platform (as opposed to a word document, especially with equations)

      We have now done this.

      Reviewer #3 (Recommendations For The Authors):

      1. Subplots in Fig. 1, inset in Fig. 3 are not legible due to small font.

      We have now increased the font.

      1. Mean absolute error in Fig. S5 and relative error in related text should be clarified.

      We have now clarified this in the manuscript.

      1. Acronyms (MACO, MERIDIAN) should be defined.

      We have now made these changes.

      References

      1. Gregor T, Tank DW, Wieschaus EF, Bialek W. Probing the limits to positional information. Cell. 2007;130(1):153-64. doi: 10.1016/j.cell.2007.05.025. PubMed PMID: WOS:000248587000018.

      2. Cohen-Saidon C, Cohen AA, Sigal A, Liron Y, Alon U. Dynamics and Variability of ERK2 Response to EGF in Individual Living Cells. Mol Cell. 2009;36(5):885-93. doi: 10.1016/j.molcel.2009.11.025. PubMed PMID: WOS:000272965400020.

      3. Gross SM, Dane MA, Bucher E, Heiser LM. Individual Cells Can Resolve Variations in Stimulus Intensity along the IGF-PI3K-AKT Signaling Axis. Cell Syst. 2019;9(6):580-8 e4.

      4. Loos C, Hasenauer J. Mathematical modeling of variability in intracellular signaling. Current Opinion in Systems Biology. 2019;16:17-24.

      5. Dixit PD, Lyashenko E, Niepel M, Vitkup D. Maximum Entropy Framework for Predictive Inference of Cell Population Heterogeneity and Responses in Signaling Networks. Cell Syst. 2020;10(2):204-12 e8.

      6. Taniguchi Y, Choi PJ, Li GW, Chen H, Babu M, Hearn J, Emili A, Xie XS. Quantifying E. coli proteome and transcriptome with single-molecule sensitivity in single cells. Science. 2010;329(5991):533-8. doi: 10.1126/science.1188308. PubMed PMID: 20671182; PMCID: PMC2922915.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors attempted to dissect the function of a long non-coding RNA, lnc-FANCI-2, in cervical cancer. They profiled lnc-FANCI-2 in different cell lines and tissues, generated knockout cell lines, and characterized the gene using multiple assays.

      Strengths:

      A large body of experimental data has been presented and can serve as a useful resource for the scientific community, including transcriptomics and proteomics datasets. The reported results also span different parts of the regulatory network and open up multiple avenues for future research.

      Thanks for your positive comments on the strengths.

      Weaknesses:

      The write-up is somewhat unfocused and lacks deep mechanistic insights in some places.

      Because lnc-FANCI-2 is a novel lncRNA that had never been explored in any functional study, our report found that it regulates RAS signaling. Thus, this report focuses on lnc-FANCI-2 and the RAS signaling pathway but also includes some important screening data, which help our readers understand how we arrived at RAS signaling.

      Reviewer #2 (Public review):

      The study by Liu et al provides a functional analysis of lnc-FANCI-2 in cervical carcinogenesis, building on their previous discovery of FANCI-2 being upregulated in cervical cancer by HPV E7.

      The authors conducted a comprehensive investigation by knocking out (KO) FANCI-2 in CaSki cells and assessing viral gene expression, cellular morphology, altered protein expression and secretion, altered RNA expression through RNA sequencing (verification of which by RT-PCR is well appreciated), protein binding, etc. Verification experiments by RT-PCR, Western blot, etc are notable strengths of the study.

      The KO and KD were related to increased Ras signaling and EMT and reduced IFN-γ/α responses.

      Thanks for your positive comments. It did take us a few years to reach this scientific point for understanding of lnc-FANCI-2 function.

      Although the large amount of data is well acknowledged, it is a limitation that most data come from CaSki cells, in which FANCI-2 localization is different from SiHa cells and cancer tissues (Figure 1). The cytoplasmic versus nuclear localization is somewhat puzzling.

      Regarding lnc-FANCI-2 localization, it can be both cytoplasmic and nuclear in cervical cancer tissues, HPV16- or HPV18-infected keratinocytes, and the HPV16+ cervical cancer cell line CaSki, which contains multiple integrated HPV16 DNA copies. Surprisingly, however, it is mostly detectable in the nucleus in HPV16+ SiHa cells, which contain only one copy of integrated HPV16 DNA (Yu, L., et al. mBio 15: e00729-24, 2024). Regardless, knockdown of lnc-FANCI-2 expression in SiHa cells induces RAS signaling, leading to an increase in the expression of p-AKT and p-Erk1/2 (suppl. Fig. S6B).

      Reviewer #3 (Public review):

      Summary:

      A long noncoding RNA, lnc-FANCI-2, was previously reported by this group to be regulated by the HPV E7 oncoprotein and a cellular transcription factor, YY1. The current study shows that the function of lnc-FANCI-2 in HPV-16-positive cervical cancer is to intrinsically regulate RAS signaling, thereby furthering our understanding of additional cellular alterations during HPV oncogenesis. The authors used advanced technical approaches such as KO, transcriptome, IRPCRP, and LC-MS/MS analyses in the current study and concluded that KO of lnc-FANCI-2 significantly increases RAS signaling, especially phosphorylation of Akt and Erk1/2.

      Strengths:

      (1) HPV E6E7 are required for full immortalization and maintenance of the malignant phenotype of cervical cancer, but they are NOT sufficient for full transformation and tumorigenesis. This study helps further understanding of other cellular alterations in HPV oncogenesis.

      (2) lnc-FANCI-2 is upregulated in cervical lesion progression from CIN1, CIN2-3 to cervical cancer, cancer cell lines, and HPV transduced cell lines.

      (3) Viral E7 of high-risk HPVs and host transcription factor YY1 are two major factors promoting lnc-FANCI-2 expression.

      (4) Proteomic profiling of cytosolic and secreted proteins showed inhibition of MCAM, PODXL2, and ECM1 and increased levels of ADAM8 and TIMP2 in KO cells.

      (5) RNA-seq analyses revealed that KO cells exhibited significantly increased RAS signaling but decreased IFN pathways.

      (6) Increased phosphorylated Akt and Erk1/2, IGFBP3, MCAM, VIM, and CCND2 (cyclin D2) and decreased RAC3 were observed in KO cells.

      Thanks for your positive comments. It has taken us almost nine years to reach this point to gradually understand lnc-FANCI-2 functions, which are more complex than our initial thoughts.  

      Weaknesses:

      (1) Although the authors observed increased lnc-FANCI-2 in HPV16- and HPV18-transduced cells, as well as in cervical cancer tissues, HPV18-positive HeLa cells exhibited a different expression pattern of lnc-FANCI-2.

      Both HPV16 and HPV18 infections induce lnc-FANCI-2 expression in keratinocytes (Liu H., et al. PNAS, 2021). However, HPV18+ cervical cancer cell lines HeLa and C4II cells (Figure S1A and S1B) do not express lnc-FANCI-2 as we see in HPV-negative cell lines such as HCT116, HEK293, HaCaT, and BCBL1 cells. Although we don’t know why, our preliminary data show that the lnc-FANCI-2 promoter functions well and is sensitive to YY1 binding in lnc-FANCI-2 expressing CaSki and C33A cells in our dual luciferase assays but is much less sensitive to YY1 binding in HeLa and HCT116 cells, indicating some unknown cellular factors negatively regulating lnc-FANCI-2 promoter activity.

      Author response image 1.

      A firefly luciferase (FLuc) reporter containing either the wild-type (−600 wt) or YY1-binding-site-mutated lnc-FANCI-2 promoter was evaluated in CaSki, HeLa, C33A, and HCT116 cells for its promoter activity, with Renilla luciferase (RLuc) activity driven by a TK promoter serving as an internal control. The two YY1-binding motifs (A and B) with a X for mutation are illustrated in the right diagram.

      (2) Previous studies and data in the current study showed steadily increasing lnc-FANCI-2 during cancer progression; however, the authors did not observe significant changes in cell behaviors (both morphology and proliferation) in lnc-FANCI-2 KO cells.

      Thanks. We do see decreases in cell proliferation, colony formation, and cell migration, accompanied by increased cell senescence, in the lnc-FANCI-2 KO cells compared with the parental WT cells. These data are now added to the revised Fig. 1 and the revised supplemental Fig. S3.

      (3) The authors observed the significant changes of RAS signaling (downstream) in KO cells, but they provided limited interpretations of how these results contributed to full transformation or tumorigenesis in HPV-positive cancer.

      As we state in the title, lnc-FANCI-2 intrinsically restricts RAS signaling and phosphorylation of Akt and Erk in HPV16-infected cervical cancer. Presumably, high RAS-AKT-ERK signaling inhibits tumor cell survival through senescence induction, as we show in our new Figure 1 and supplemental Fig. S3. A similar finding was reported in a lung cancer study (Patricia Nieto, et al. Nature 548: 239-243, 2017).

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Major comments:

      (1) A major issue is that parts of the manuscript read like a collection of experimental results. However, some of the results do not contribute directly to the central story. Besides confusing the reader, the large amount of apparently disparate results can raise more questions. For example:

      a) Why is lnc-FANCI-2 highly expressed in HPV16-infected cervical cancer cell lines (but not in HPV18-infected cells)?

      b) How do p53 and RB repress the expression of lnc-FANCI-2?

      c) What regulates the sub-cellular localization of lnc-FANCI-2?

      d) How does lnc-FANCI-2 negatively regulate RAS signalling?

      e) How does MAP4K4 bind to lnc-FANCI-2?

      f) Do lnc-FANCI-2 and MAP4K4 require each other to regulate RAS signalling?

      g) How does RAS signalling regulate the transcription of MCAM and IGFBP3?

      h) How does MCAM feedback on RAS? Do the different MCAM isoforms impact on RAS signalling differently?

      i) How does IGFBP3 feedback on ERK but not AKT?

      j) How do the other mentioned proteins like ADAM8 fit into the regulatory network?

      k) Each question will require a lot more work to address. I think it would be good if the authors could think through carefully what the key message(s) in the current manuscript should be and then present a more focused write-up.

      Thanks for the critical comments. Because this is the first study to explore lnc-FANCI-2 functions, we would like to be comprehensive. We believe these data are important to guide any future studies. We really appreciate our reviewer listing many questions related to HPV infection, cell biology, RAS signaling, and cancer biology in questions a to k. Addressing each question in a satisfactory way will require a separate study, but fortunately, our report points out such directions with some preliminary data for future studies. Below are our responses to each question from a to k:

      a) Both HPV16 and HPV18 infection induce lnc-FANCI-2 expression in keratinocytes (Liu H., et al. PNAS, 2021). However, the HPV18+ cervical cancer cell lines HeLa and C4II (Figure S1A and S1B) do not express lnc-FANCI-2, as is also the case for HPV-negative cell lines such as HCT116, HEK293, HaCaT, and BCBL1. Although we don’t know why, our preliminary data show that the lnc-FANCI-2 promoter functions well and is sensitive to YY1 binding in lnc-FANCI-2-expressing CaSki and C33A cells but is much less sensitive to YY1 in HeLa and HCT116 cells, indicating that some unknown cellular factors negatively regulate lnc-FANCI-2 promoter activity.

      b) We don’t know whether p53 and pRB repress the expression of lnc-FANCI-2, although C33A cells bearing a mutant p53 and a mutant pRB express a high amount of lnc-FANCI-2. However, KD of E2F1 had no effect on lnc-FANCI-2 promoter activity in CaSki cells (Liu, H., et al. PNAS, 2021).

      c) RNA cellular localization can be affected by many factors, including splicing, export, and polyadenylation. As lnc-FANCI-2 is a long non-coding RNA, the regulation of its cellular localization could be more complicated than that of mRNAs and thus could be a future research direction.

      d) The conclusion that lnc-FANCI-2 negatively regulates RAS signaling is based on both lnc-FANCI-2 KO and KD studies. Please see the proposed hypothetical model in Figure 8E.

      e) MAP4K4 binding to lnc-FANCI-2 was demonstrated by our IRPCRP-Mass spectrometry (Fig. 8A and 8C), although the exact binding site on lnc-FANCI-2 was not explored. As you probably know, many enzymes have turned out to be RNA-binding proteins (Castello A., et al. Trends Endocrinol. Metab. 26: 746-757, 2015; Hentze MW., et al. Nat. Rev. Mol. Cell Biol. 19: 327-341, 2018).

      f) Yes, they rely slightly on each other in regulating RAS signaling. We found that KD of MAP4K4 in parental CaSki cells (Figure 8D) had a greater effect on RAS signaling (MCAM, IGFBP3, p-Akt) than it did in lnc-FANCI-2 KO ΔPr-A9 cells. In contrast, the latter displayed more p-Erk1/2 than that induced by KD of lnc-FANCI-2 in the parental CaSki cells (Figure S7C).

      g) We believe RAS signaling most likely regulates the transcription of MCAM and IGFBP3 through phosphorylated transcription factors (Figure 8E diagram).

      h) As a signaling molecule with at least 13 ligands/coreceptors (Joshkon A., et al. Biomedicines 8: 633, 2020), the increased MCAM appears to sustain RAS signaling (Fig. 7J and Fig. 8E). We assume that the full-length cytoplasmic MCAM plays a predominant role in RAS signaling because it is more abundant than the cleaved nuclear MCAM, which lacks both the transmembrane and cytoplasmic regions. In addition, RAS signaling mainly occurs in the cytosol.

      i) The exact mechanism remains unknown. Lnc-FANCI-2 KO cells exhibit high expression levels of IGFBP3 RNA and protein and of p-Erk1/2, but not so much of p-Akt, possibly because IGFBP3 regulates MAPK for Erk phosphorylation but has much less effect on PI3K for Akt phosphorylation.

      j) The dysregulation of RAS signaling and ADAM protein activity is implicated in various cancers. ADAM proteins can modulate RAS signaling by cleaving and releasing ligands that activate or inactivate RAS-related pathways (Schafer B., et al. JBC 279: 47929-38, 2004; Ohtsu H., et al. Am J Physiol Cell Physiol 291: C1-C10, 2006; Dang M, et al. JBC 286: 17704-17713, 2011; Kleino I, et al. PLoS One 10: e0121301, 2015). Some ADAM proteins are involved in the migration and invasion of cancer cells, and their loss can promote the degradation of KRAS (Huang Y-K., et al. Nat Cancer 5: 400-419, 2024). In this revision, we have included a brief discussion of ADAMs and RAS signaling.

      k) We agree with our reviewer that each question will require a lot more work to address. However, as this study explores the lnc-FANCI-2 function for the first time, we prefer to retain the data that we have selectively included in this write-up. We hope reviewer 1 will be satisfied with our response to each question from a to j.

      (2) Figures S1A & S1C - Replicates are needed.

      Yes, we have repeated all of the experiments. The quantification shown in Figure S1A and S1C was performed in triplicate, and error bars have been added to the updated figure.

      (3) Figure S1D - There seems to be some lnc-FANCI-2 RNA in the nucleus of CaSki cells as well. Please quantify the relative amount of lnc-FANCI-2 in the nucleus vs cytoplasm.

      Yes, a small fraction of lnc-FANCI-2 is in the nucleus of CaSki cells, as we reported (Liu H., et al. PNAS, 2021, Movies S1 and S2). We quantified the relative amount of lnc-FANCI-2 in the nucleus vs cytoplasm by fractionation and RT-qPCR in Figure S1C.

      (4) Figure S2B - (a) For ΔPr-A9 cells, it looks like there is an increase in E6 and a decrease in E7, instead of "little change" as the authors claimed. (b) I suggest checking the protein levels for all the control and KO clones.

      Thanks for the questions. We had some variation in E6 and E7 detection, and the submitted blot was one representative result. We regrew the lnc-FANCI-2 KO clones A9 and B3 and reexamined the expression of HPV16 E6/E7 proteins and their downstream targets, p53 and E2F1. As shown in the new Figure S3A expt II, we again saw some variation in detection (~20-30%), and this variation is not accompanied by a noticeable change in the downstream targets. Thus, we do not consider these changes significant enough to draw a conclusion in our study; they most likely arise from sampling in the assays.

      (5) In the Proteome Profiler Human sReceptor Array analysis, multiple proteins were highlighted as having at least 30% change. But it is unclear how they relate to RAS signaling.

      Thanks for this comment. Cellular soluble receptors are essential for RAS signaling, the EMT pathway, and IFN responses. For example, the dysregulation of RAS signaling and ADAM protein activity is implicated in various cancers. ADAM proteins can modulate RAS signaling by cleaving and releasing ligands that activate or inactivate RAS-related pathways (Schafer B., et al. JBC 279: 47929-38, 2004; Ohtsu H., et al. Am J Physiol Cell Physiol 291: C1-C10, 2006; Dang M, et al. JBC 286: 17704-17713, 2011; Kleino I, et al. PLoS One 10: e0121301, 2015). Some ADAM proteins are involved in the migration and invasion of cancer cells, and their loss can promote the degradation of KRAS (Huang Y-K., et al. Nat Cancer 5: 400-419, 2024). In this revision, we have included a brief discussion of ADAMs and RAS signaling.

      (6) Does knockdown of MAP4K4 lead to an increase in MCAM and IGFBP3?

      Yes, MAP4K4 KD in parental WT CaSki cells does lead to an increase in MCAM (~70%) and IGFBP3 (~30%), similar to the knockdown of lnc-FANCI-2, as shown in the revised Figure 8D.

      Minor comments:

      (7) In the opinion of this reviewer the title is somewhat unwieldy.

      Thanks. We have shortened the title to “The lnc-FANCI-2 intrinsically restricts RAS signaling in HPV16-infected cervical cancer”.

      (8) The abstract can be more focused and doesn't have to mention so many gene names. In fact, the significance paragraph works better as an abstract. For the significance, the authors can provide another write-up on the implications of their research instead.

      Thanks. We have revised the abstract and added the implications of this research.

      (9) The last sentence of the introduction feels a little abrupt. It would be good to elaborate a little more on the key findings.

      Thanks for this critical comment. We have revised it as follows: In this report, we demonstrate that lnc-FANCI-2 in HPV16-infected cells controls RAS signaling by interaction with MAP4K4 and other RNA-binding proteins. Ablation of lnc-FANCI-2 in the cells promotes RAS signaling and phosphorylation of Akt and Erk. High levels of lnc-FANCI-2 and low levels of MCAM expression in cervical cancer patients correlate with improved survival, indicating that lnc-FANCI-2 plays a critical role in regulating RAS signaling to affect cervical cancer progression and patient outcomes.

      (10) Typo on line 191: Should be ADAM8 and not ADMA8.

      Corrected.

      Reviewer #2 (Recommendations for the authors):

      The paper contains a vast amount of data and would greatly benefit from an expanded version of the schematic of Figure 8E summarizing the main results. Including additional details on FANCI-2 regulation by HPV (primarily from previous studies) and its implications for HPV16-driven carcinogenesis would provide a more comprehensive overview.

      Thanks for the suggestion. We have modified Figure 8E to include HR-HPV E7 and YY1 in the regulation of lnc-FANCI-2 transcription.

      Further specific comments:

      (1) The introduction may be shortened to increase readability (e.g. lines 77-90; 94-105).

      We have shortened the introduction by deleting lines 94-105 of our initial submission.

      (2) Lines 55-57 the number of cervical cancer diagnoses and mortality need to be updated to the latest literature. The reference is from 2012.

      Thanks. We have revised and updated accordingly with a new citation (Bray F., et al: Global cancer statistics 2022: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J Clin 74, 229-263 (2024)).

      (3) Line 61: Progression rate of CIN3 is incorrect (31% in 30 years according to reference 5).

      Thanks. Corrected.

      (4) Lines 108-112 are difficult to understand and should be rewritten.

      Thanks. Revised accordingly.

      (5) Line 116 Is this correct or should 'but' be 'and'?

      Thanks. Corrected accordingly.

      (6) Figure 1A top: The difference between cervical cancer and normal areas is hard to see in the top figure. The region labeled as "normal" does not resemble typical differentiating epithelium or normal glandular epithelium, though this is difficult to assess accurately from the image provided. I suggest adding HE staining and also the histotypes.

      We have added an H&E staining panel in the corresponding region to Figure 1A, which clearly shows the normal and cancer regions. Both cervical cancer tissues were cervical squamous cell carcinoma.

      (7) HFK-HPV16 & 18 cells (Figure 1B) are not described in the Materials & Methods.

      Thanks. We revised our Materials and Methods by citing our two previous publications.

      (8) Figure 2E (RNA scope on FANCI-2 KO) only shows 2 to 3 cells, which makes it somewhat difficult to assess downregulated expression in the KO. I suggest replacing these with pictures showing more cells (i.e. >10) to strengthen the results.

      We have replaced the image in Figure 2E to include more cells.

      (9) The spindle-like morphology in deltaPr-A9 cells shown in FigS2A is not very distinct. Including images at higher magnification could help clarify this feature.

      Good comment. We have enlarged the images for a better view and revised the text.

      (10) Both protein and RNA expression analysis have been performed on WT CaSki cells and FANCI-2 KO cells. If I am correct there is little overlap between the significantly changed gene products. What does this mean? Have you looked into the comparison?

      The DEGs identified from RNA-seq indicated a genome-wide transcriptome change, while the protein array we used only covered 105 soluble protein receptors. However, we did find that 9/15 (60%) membrane proteins in cell lysates (PODXL2, ECM1, NECTIN2, MCAM, ADAM9, CDH5, ADAM10, ITGA5, NOTCH1, SCARF2, ADAM8, TIMP2, LGALS3BP, CDH13, and ITGB6) exhibited consistent changes in expression (underlined) by both RNA-seq and protein array assays. We have revised the text with this information (page 11). The inconsistent expression of the other six proteins (40%) between the two assays could be due to post-translational mechanisms, such as protein stability, modifications, and secretion.

      (11) Figure S7, which represents TCGA data and survival is quite complex. It would be more effective to display a similar figure for FANCI-2, as was done for MCAM in Figure 7I, to simplify the comparison and enhance clarity.

      Thanks. However, the suggested figure for lnc-FANCI-2 was already published in our PNAS paper (Liu H., et al. PNAS, 2021). Figure S8 in this revision is the result of our in-house GradientScanSurv pipeline, a new way to correlate expression and survival more accurately.

      What do the Figures look like if you analyse only HPV16+ patients versus HPV18+ patients, considering that FANCI-2 upregulation in cell lines is related to HPV16 and not 18? Is there an effect of histotype? Or tumor stage?

      HPV18-infected keratinocytes express a high level of lnc-FANCI-2. The lack of lnc-FANCI-2 expression in the two HPV18+ cell lines, HeLa and C4II, and in HPV-negative cell lines such as HCT116 could be due to the presence of some unknown repressive factors. We found in our dual luciferase assays that the lnc-FANCI-2 promoter functions well in responding to YY1 binding in CaSki and C33A cells, which express lnc-FANCI-2, but does not do so in HeLa and HCT116 cells.

      (12) It remains puzzling that FANCI-2 upregulation was previously shown to already occur in CIN lesions and increase further in cervical cancer, while the current data indicate that FANCI-2 suppresses AKT activation. If I am correct Akt activation has been linked to cervical carcinogenesis. Similarly, line 434 states that increased MCAM might promote cervical tumorigenesis, implying that low FANCI-2 would stimulate tumorigenesis. If I understand correctly, the increase in FANCI-2 observed in CIN lesions would reflect a "brake" on the carcinogenic pathway and its sustained increase in cancer might indicate that growth is still (partly) controlled. As mentioned earlier, a Figure illustrating the relation between FANCI-2, HPV, and the carcinogenic process would be beneficial for clarity.

      Yes. Increased MCAM, but low level of lnc-FANCI-2, correlates with poor cervical cancer survival. We have revised Figure 8E to illustrate this relation better.  

      (13) May part of the potentially conflicting findings be explained by CaSki cells being of metastatic origin? Related to this, does the expression of FANCI-2 or MCAM depend on the tumor stage?

      Thanks for this important suggestion. Unfortunately, we found that the expression of lnc-FANCI-2 and MCAM is not associated with cervical cancer stage based on the TCGA data (http://gepia.cancer-pku.cn/index.html). See the data below:

      Author response image 2.

      Despite some lingering uncertainty, the extensive experiments conducted using KO and KD cells do provide compelling evidence that lnc-FANCI-2 function is linked to RAS signaling and EMT.

      Thanks for your positive review and instructive comments.

      Reviewer #3 (Recommendations for the authors):

      (1) The authors observed increased lnc-FANCI-2 in HPV 16 and 18 transduced cells and in other cervical cancer tissues as well, but HPV-18 positive HeLa cells exhibited different expression of lnc-FANCI-2. I suggest the authors provide more discussion of this difference, for example, HPV genotypes, HPV genome status in host cells, or cell types.

      Thanks. We found that keratinocyte infection with HPV16, HPV18, and other HR-HPVs could induce lnc-FANCI-2 expression (Liu H., et al. PNAS, 2021). In this report, we found that HPV18+ HeLa and C4II cells and other HPV-negative cell lines do not express it. Our preliminary data from lnc-FANCI-2 promoter activity assays showed the presence of negative regulatory factor(s) in cells that do not express lnc-FANCI-2. See the data in Author response image 1.

      We have revised our discussion by including these luciferase data as data not shown.

      (2) I suggest the authors discuss more details on how the changes of RAS signaling in KO cells help our further understanding of the molecular mechanisms for HPV-associated full-cell transformation and malignancy in addition to the well-known functions of HPV E6 and E7.

      Thanks. We have modified Figure 8E as suggested by reviewer 2 and revised the discussion further.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1:

      Summary:

      This paper performs fine-mapping of the silkworm mutants bd and its fertile allelic version, bdf, narrowing down the causal intervals to a small interval of a handful of genes. In this region, the gene orthologous to mamo is impaired by a large indel, and its function is later confirmed using expression profiling, RNAi, and CRISPR KO. All these experiments convincingly show that mamo is necessary for the suppression of melanic pigmentation in the silkworm larval integument. The authors also use in silico and in vitro assays to probe the potential effector genes that mamo may regulate.

      Strengths:

      The genotype-to-phenotype workflow, combining forward (mapping) and reverse genetics (RNAi and CRISPR loss-of-function assays) linking mamo to pigmentation, is extremely convincing.

      Response: Thank you very much for your affirmation of our work. The reviewer discussed the parts of our manuscript that involve evolution sentence by sentence. We have further refined the description in this regard and improved the logical flow. Thank you again for your help.

      Weaknesses:

      1) The last section of the results, entitled "Downstream target gene analysis" is primarily based on in silico genome-wide binding motif predictions.

      While the authors identify a potential binding site using EMSA, it is unclear how much this general approach over-predicted potential targets. While I think this work is interesting, its potential caveats are not mentioned. In fact the Discussion section seems to trust the high number of target genes as a reliable result. Specifically, the authors correctly say: "even if there are some transcription factor-binding sites in a gene, the gene is not necessarily regulated by these factors in a specific tissue and period", but then propose a biological explanation that not all binding sites are relevant to expression control. This makes a radical short-cut that predicted binding sites are actual in vivo binding sites. This may not be true, as I'd expect that only a subset of binding motifs predicted by Positional Weight Matrices (PWM) are real in vivo binding sites with a ChIP-seq or Cut-and-Run signal. This is particularly problematic for PWM that feature only 5-nt signature motifs, as inferred here for mamo-S and mamo-L, simply because we can expect many predicted sites by chance.
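
      The chance-match concern can be made concrete with a rough back-of-the-envelope sketch. It assumes a random sequence with uniform base composition, exact matching to a single fixed k-mer, both strands scanned, and a 2,000-nt search window per gene; the function name and defaults are hypothetical and chosen only for illustration. Under these assumptions a 5-nt motif is expected about 3.9 times per window by chance alone, whereas longer motifs are expected far less often.

```python
def expected_chance_hits(motif_len: int, region_len: int = 2000,
                         both_strands: bool = True) -> float:
    """Expected number of exact matches to one fixed k-mer in a random region.

    Assumes each base is A, C, G, or T with probability 1/4, independently.
    Palindromic motifs would be double-counted when both strands are scanned.
    """
    # Number of start positions at which a k-mer fits inside the region
    positions = max(region_len - motif_len + 1, 0)
    strands = 2 if both_strands else 1
    # Probability of an exact match at any one position is (1/4) ** k
    return positions * strands / 4 ** motif_len


if __name__ == "__main__":
    for k in (5, 9, 15):
        print(f"{k}-nt motif: {expected_chance_hits(k):.3g} expected chance hits")
```

      Real genomes are not uniform in base composition and PWM scoring tolerates mismatches, so the true background rate is usually higher than this idealized estimate.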

      Response: Thank you very much for your careful work. The analysis and identification of transcription factor-binding sites is an important issue in gene regulation research. Techniques such as ChIP-seq can be used to experimentally identify the binding sites of transcription factors (TFs). However, reports using these techniques often only detect specific cell types and developmental stages, resulting in a limited number of downstream target genes for some TFs. Interestingly, TFs may regulate different downstream target genes in different cell types and developmental stages.

      Previous research has suggested that the ZF-DNA binding interface can be understood as a “canonical binding model”, in which each finger contacts DNA in an antiparallel manner. The binding sequence of the C2H2-ZF motif is determined by the amino acid residue sequence of its α-helical component. Considering the first amino acid residue in the α-helical region of the C2H2-ZF domain as position 1, positions -1, 2, 3, and 6 are key amino acids for recognizing and binding DNA. The residues at positions -1, 3, and 6 specifically interact with base 3, base 2, and base 1 of the DNA sense sequence, respectively, while the residue at position 2 interacts with the complementary DNA strand (Wolfe SA et al., 2000; Pabo CO et al., 2001). Based on this principle, the binding sites of C2H2-ZF have good reference value. For the 5-nt PWM sequence, we referred to the study of D. melanogaster, which was identified by EMSA (Shoichi Nakamura et al., 2019). In the new version, we have rewritten this section.

      Pabo CO, Peisach E, Grant RA. Design and selection of novel Cys2His2 zinc finger proteins. Annu Rev Biochem. 2001;70:313-340.

      Wolfe SA, Nekludova L, Pabo CO. DNA recognition by Cys2His2 zinc finger proteins. Annu Rev Biophys Biomol Struct. 2000;29:183-212.

      Nakamura S, Hira S, Fujiwara M, et al. A truncated form of a transcription factor Mamo activates vasa in Drosophila embryos. Commun Biol. 2019;2:422. Published 2019 Nov 20.

      2) The last part of the current discussion ("Notably, the industrial melanism event, in a short period of several decades ... a more advanced self-regulation program") is flawed with important logical shortcuts that assign "agency" to the evolutionary process. For instance, this section conveys the idea that phenotypically relevant mutations may not be random. I believe some of this is due to translation issues in English, as I understand that the authors want to express the idea that some parts of the genome are paths of least resistance for evolutionary change (e.g. the regulatory regions of developmental regulators are likely to articulate morphological change). But the language and tone is made worse by the mention that in another system, a mechanism involving photoreception drives adaptive plasticity, making it sound like the authors want to make a Lamarckian argument here (inheritance of acquired characteristics), or a point about orthogenesis (e.g. the idea that the environment may guide non-random mutations).

      Because this last part of the current discussion suffers from confused statements on modes and tempo of regulatory evolution and is rather out of topic, I would suggest removing it.

      In any case, it is important to highlight here that while this manuscript is an excellent genotype-to-phenotype study, it has very few comparative insights on the evolutionary process. The finding that mamo is a pattern or pigment regulatory factor is interesting and will deserve many more studies to decipher the full evolutionary study behind this Gene Regulatory Network.

      Response: Thank you very much for your careful work. In this part of the manuscript, we introduced some assumptions that make the statement slightly unconventional. The color pattern of insects is an adaptive trait. The bd and bdf mutants used in the study are formed spontaneously. As a frequent variation and readily observable phenotype, color patterns have been used as models for evolutionary research (Wittkopp PJ et al., 2011). Darwin's theory of natural selection has epoch-making significance. I deeply believe in the theory that species strive to evolve through natural selection. However, with the development of molecular genetics, Darwinism’s theory of undirected random mutations and slow accumulation of micromutations resulting in phenotype evolution has been increasingly challenged.

      The prerequisite for undirected random mutations and micromutations is excessive reproduction to generate a sufficiently large population. A sufficiently large population can contain sufficient genotypes to face various survival challenges. However, it is difficult to explain how some small groups and species with relatively low fertility rates have survived thus far. More importantly, the theory cannot explain the currently observed genomic mutation bias. In scientific research, every theory is constantly being modified to adapt to current discoveries. The most famous example is the debate over whether light is a particle or a wave, which has lasted for hundreds of years. However, in the 20th century, both sides seemed to compromise with each other, believing that light has a wave‒particle duality.

      In summary, we have rewritten this section to reduce unnecessary assumptions.

      Wittkopp PJ, Kalay G. Cis-regulatory elements: molecular mechanisms and evolutionary processes underlying divergence. Nat Rev Genet. 2011;13(1):59-69.

      Minor Comment:

      The gene models presented in Figure 1 are obsolete, as there are more recent annotations of the Bm-mamo gene that feature more complete intron-exon structures, including for the neighboring genes in the bd/bdf intervals. It remains true that the mamo locus encodes two protein isoforms.

      An example of the Bm-mamo locus annotation can be found at: https://www.ncbi.nlm.nih.gov/gene/101738295

      RNAseq expression tracks (including from larval epidermis) can be displayed in the embedded genome browser from the link above using the "Configure Tracks" tool.

      Based on these more recent annotations, I would say that most of the work on the two isoforms remains valid, but FigS2, and particularly Fig.S2C, need to be revised.

      Response: Thank you very much for your careful work. In this study, we referred to the predicted genes of SilkDB, NCBI and Silkbase. In different databases, there are varying degrees of differences in the number of predicted genes and the length of gene mRNA. Because the SilkDB database is based on the first silkworm genome, it has been used for the longest time and has a relatively large number of users. In the revised manuscript, we have added the predicted genes of NCBI and Silkbase in Figure S1.

      Author response image 1.

      The predicted genes and qPCR analysis of candidate genes in the responsible genomic region for the bd mutant. (A) The predicted genes in SilkDB; (B) the predicted genes in GenBank; (C) the predicted genes in Silkbase; (D) analysis of nucleotide differences in the responsible region of bd; (E) investigation of the expression level of candidate genes.

      Reviewer #2 (Public Review):

      Summary:

      The authors tried to identify new genes involved in melanin metabolism and its spatial distribution in the silkworm Bombyx mori. They identified the gene Bm-mamo as playing a role in caterpillar pigmentation. By functional genetic and in silico approaches, they identified putative target genes of the Bm-mamo protein. They showed that numerous cuticular proteins are regulated by Bm-mamo during larval development.

      Strengths:

      • preliminary data about the role of cuticular proteins to pattern the localization of pigments

      • timely question

      • challenging question because it requires the development of future genetic and cell biology tools at the nanoscale

      Response: Thank you very much for your affirmation of our work. The reviewer's familiarity with the color patterns of Lepidoptera is helpful, and the recommendation raised has provided us with very important assistance. This has allowed us to make significant progress with our manuscript.

      Weaknesses:

      • statistical sampling limited

      • the discussion would gain in being shorter and refocused on a few points, especially the link between cuticular proteins and pigmentation. The article would be better if the last evolutionary-themed section of the discussion is removed.

      A recent paper has been published on the same gene in Bombyx mori (https://www.sciencedirect.com/science/article/abs/pii/S0965174823000760) in August 2023. The authors must discuss and refer to this published paper through the present manuscript.

      Response: Thank you very much for your careful work. First, we believe that competitive research is sometimes coincidental and sometimes intentional. Our research began in 2009, when we began to configure the recombinant population. In 2016, we published an article on comparative transcriptomics (Wu et al. 2016). The article mentioned above has a strong interest in our research and is based on our transcriptome analysis for further research, with the aim of making a preemptive publication. To discourage such behavior, we cannot cite it and do not want to discuss it in our paper.

      Songyuan Wu et al. Comparative analysis of the integument transcriptomes of the black dilute mutant and the wild-type silkworm Bombyx mori. Sci Rep. 2016 May 19:6:26114. doi: 10.1038/srep26114.

      Reviewer #1 (Recommendations For The Authors):

      1) please consider using a more recent annotation model of the B. mori genome to revise your Result Section 1, Fig.1, and Fig. S2. https://www.ncbi.nlm.nih.gov/gene/101738295

      Specifically, you used BGIM_ gene models, while the current annotation such as the one above featured in the NCBI database provides more accurate intron-exon structures without splitting mamo into two genes. I believe this can be done with minor revisions of the figures, and you could keep the BGIM_ gene names for the text.

      Response: Thank you very much for your careful work. The GenBank of NCBI (National Center for Biotechnology Information) is a very good database that we often used and referred to in this research. Our research started in 2009, so we mainly referred to the SilkDB database (Jun Duan et al., 2010), although we also consulted other databases, such as NCBI and Silkbase (https://silkbase.ab.a.u-tokyo.ac.jp/cgi-bin/index.cgi). Because the SilkDB database was constructed based on the first published silkworm genome data, it has been used for the longest time and has a relatively large number of users. Researchers are still using these data (Kejie Li et al., 2023).

      The problem with predicting the mamo gene as two genes (BGIBMGA012517 and BGIBMGA012518) in SilkDB is mainly due to the presence of alternative splicing of the mamo gene. BGIBMGA012517 corresponds to the shorter transcript (mamo-s) of the mamo gene. Due to the differences in sequencing individuals, sequencing methods, and methods of gene prediction, there are differences in the number and sequence of predicted genes in different databases. We added the pattern diagram of predicted genes from NCBI and Silkbase, and the expression levels of new predicted genes are shown in Supplemental Figure S1.

      Jun Duan et al., SilkDB v2.0: a platform for silkworm (Bombyx mori) genome biology. Nucleic Acids Res. 2010 Jan;38(Database issue): D453-6. doi: 10.1093/nar/gkp801.

      Kejie Li et al., Transcriptome analysis reveals that knocking out BmNPV iap2 induces apoptosis by inhibiting the oxidative phosphorylation pathway. Int J Biol Macromol. 2023 Apr 1;233:123482. doi: 10.1016/j.ijbiomac.2023.123482. Epub 2023 Jan 31.

      Author response image 2.

      The predicted genes and qPCR analysis of candidate genes in the responsible genomic region for the bd mutant. (A) The predicted genes in SilkDB; (B) the predicted genes in GenBank; (C) the predicted genes in Silkbase; (D) analysis of nucleotide differences in the responsible region of bd; (E) investigation of the expression level of candidate genes.

      2) As I mentioned in my public review, I strongly believe the interpretation of the PWM binding analyses require much more conservative statements taking into account the idea that short 5-nt motifs are expected by chance. The work in this section is interesting, but the manuscript would benefit from a quite significant rewrite of the corresponding Discussion section, making it that the in silico approach is prone to the identification of many sites in the genomes, and that very few of those sites are probably relevant for probabilistic reasons. I would recommend statements such as "Future experiments assessing the in vivo binding profile of Bm-mamo (eg. ChIP-seq or Cut&Run), will be required to further understand the GRNs controlled by mamo in various tissues".

      Response: Thank you very much for your careful work. Previous research has suggested that the ZF-DNA binding interface can be understood as a “canonical binding model”, in which each finger contacts DNA in an antiparallel manner. The binding sequence of the C2H2-ZF motif is determined by the amino acid residue sequence of its α-helical component. Considering the first amino acid residue in the α-helical region of the C2H2-ZF domain as position 1, positions -1, 2, 3, and 6 are key amino acids for recognizing and binding DNA. The residues at positions -1, 3, and 6 specifically interact with base 3, base 2, and base 1 of the DNA sense sequence, respectively, while the residue at position 2 interacts with the complementary DNA strand (Wolfe SA et al., 2000; Pabo CO et al., 2001). Based on this principle, the prediction of DNA recognition motifs of C2H2-type zinc finger proteins currently has good accuracy.

      The predicted DNA binding sequence (GTGCGTGGC) of the mamo protein in Drosophila melanogaster was highly consistent with that of silkworms. In addition, in D. melanogaster, bases 1 to 7 (GTGCGTG) of the predicted DNA binding sequence of mamo were highly similar to the DNA binding sequence obtained from EMSA experiments (Seiji Hira et al., 2013). Furthermore, in another study on the mamo protein of D. melanogaster, five bases (TGCGT) were used as the core DNA recognition sequence of the mamo protein (Shoichi Nakamura et al., 2019). The JASPAR database (https://jaspar.genereg.net) also contains some shorter (4-6 nt) DNA recognition sequences; for example, the DNA binding sequence of Ubx in D. melanogaster is TAAT (ID MA0094.1). However, we used longer DNA binding motifs (9 nt and 15 nt) of mamo to study the 2 kb genomic regions near each predicted gene. Over 70% of predicted genes were found to have these feature sequences near them. This analysis was carried out with commonly used software and procedures. Because target protein is abundant, the DNA is accessible, suppressors are absent, and the ionic environment is suitable, zinc finger transcription factors are more likely to bind to specific DNA sequences in vitro than in vivo. Using ChIP-seq or Cut&Run to analyze various tissues and developmental stages in silkworms could yield a comprehensive DNA-binding map of mamo and allow some false positives generated by the predictions to be excluded. Thank you for your suggestion. We will conduct this work in the next stage of our research. In addition, for brevity, we deleted the predicted data (Supplemental Tables S7 and S8) that used shorter motifs.

      Pabo CO, Peisach E, Grant RA. Design and selection of novel Cys2His2 zinc finger proteins. Annu Rev Biochem. 2001;70:313-340.

      Wolfe SA, Nekludova L, Pabo CO. DNA recognition by Cys2His2 zinc finger proteins. Annu Rev Biophys Biomol Struct. 2000;29:183-212.

      Anton V Persikov et al., De novo prediction of DNA-binding specificities for Cys2His2 zinc finger proteins. Nucleic Acids Res. 2014 Jan;42(1):97-108. doi: 10.1093/nar/gkt890. Epub 2013 Oct 3.

      Seiji Hira et al., Binding of Drosophila maternal Mamo protein to chromatin and specific DNA sequences. Biochem Biophys Res Commun. 2013 Aug 16;438(1):156-60. doi: 10.1016/j.bbrc.2013.07.045. Epub 2013 Jul 20.

      Shoichi Nakamura et al., A truncated form of a transcription factor Mamo activates vasa in Drosophila embryos. Commun Biol. 2019 Nov 20;2: 422. doi: 10.1038/s42003-019-0663-4. eCollection 2019.
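      The concern about short motifs arising by chance can be made concrete with a back-of-the-envelope calculation. The sketch below is not the PWM scan used in the manuscript: it assumes exact matching of a fixed, non-palindromic motif, equal base frequencies, and a Poisson approximation, all of which are simplifying assumptions introduced here for illustration. The function names are hypothetical.

```python
import math


def expected_hits(motif_len: int, region_len: int = 2000,
                  both_strands: bool = True) -> float:
    """Expected exact matches of one fixed motif in a random sequence.

    Assumes each base occurs with probability 0.25 and that the motif is
    not its own reverse complement (otherwise strands are double counted).
    """
    positions = region_len - motif_len + 1
    strands = 2 if both_strands else 1
    return strands * positions * 0.25 ** motif_len


def prob_at_least_one(motif_len: int, region_len: int = 2000,
                      both_strands: bool = True) -> float:
    """Poisson approximation to the chance of one or more matches."""
    return 1.0 - math.exp(-expected_hits(motif_len, region_len, both_strands))


# Motif lengths discussed in the response: the 5-nt core and the
# 9-nt and 15-nt predicted motifs, each scanned over a 2 kb region.
for k in (5, 9, 15):
    print(k, expected_hits(k), prob_at_least_one(k))
```

      Under these assumptions a 5-nt motif is expected roughly four times per 2 kb region on chance alone, whereas an exact 9-nt match is expected far less than once. A thresholded PWM scan tolerates mismatches, so its chance hit rate would be higher than the exact-match figures given here.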

      3) In my opinion, the last section of the Discussion needs to be completely removed ("Notably, the industrial melanism event, in a short period of several decades ... a more advanced self-regulation program"), as it is over-extending the data into evolutionary interpretations without any support. I would suggest instead writing a short paragraph asking whether the pigmentary role of mamo is a Lepidoptera novelty, or if it could have been lost in the fly lineage.

      Below, I tried to comment point-by-point on the main issues I had.

      Wu et al: Notably, the industrial melanism event, in a short period of several decades, resulted in significant changes in the body color of multiple Lepidoptera species(46). Industrial melanism events, such as changes in the body color of pepper moths, are heritable and caused by genomic mutations(47).

      Yes, but the selective episode was brief, and the relevant "carbonaria" mutations may have existed for a long time at low-frequency in the population.

      Response: Thank you very much for your careful work. Moth species often have melanic variants at low frequencies outside industrial regions. Recent molecular genetic work has revealed that the melanic (carbonaria) allele of the peppered moth had a single origin in Britain. Further research indicated that the mutation event causing industrial melanism of the peppered moth (Biston betularia) in the UK is the insertion of a transposable element into the first intron of the cortex gene. Interestingly, statistical inference based on the distribution of recombined carbonaria haplotypes indicates that this transposition event occurred in approximately 1819, a date highly consistent with a detectable frequency being achieved in the mid-1840s (Arjen E Van't Hof, et al., 2016). This molecular evidence suggests that the single-origin melanic mutant (carbonaria) arose in the UK around the period of industrial development rather than being an ancient genotype. We have rewritten this part of the manuscript.

      Arjen E Van't Hof, et al., The industrial melanism mutation in British peppered moths is a transposable element. Nature. 2016 Jun 2;534(7605):102-5. doi: 10.1038/nature17951.

      Wu et al: If relying solely on random mutations in the genome, which have a time unit of millions of years, to explain the evolution of the phenotype is not enough.

      What you imply here is problematic for several reasons.

      First, as you point out later, some large-effect mutations (e.g. transpositions) can happen quickly.

      Second, it's unclear what "the time units of million of years" means here... mutations occur, segregate in populations, and are selected. The speed of this process depends on the context and genetic architectures.

      Third, I think I understand what you mean with "to explain the evolution of the phenotype is not enough", but this would probably need a reformulation and I don't think it's relevant to bring it here. After all, you used loss-of-function mutants to explain the evolution of artificially selected mutants. The evolutionary insights from these mutants are limited. Random mutations at the mamo locus are perfectly sufficient here to explain the bd and bdf phenotypes and larval traits.

      Response: Thank you very much for your careful work. Charles Darwin himself argued that “natural selection can act only by taking advantage of slight successive variations; she can never take a leap, but must advance by the shortest and slowest steps” (Darwin, C. R. 1859). This ‘micromutational’ view of adaptation proved extraordinarily influential. However, the accumulation of micromutations is a lengthy process, which requires a very long time to evolve a significant phenotype. This may account for only a proportion of cases. Interestingly, recent molecular biology studies have shown that the evolution of some morphological traits involves a modest number of genetic changes (H Allen Orr. 2005).

      One example is the analysis of the genetic basis of armor-plate reduction and pelvic reduction in the three-spined stickleback (Gasterosteus aculeatus) in postglacial lakes. Although the marine form of this species has thick armor, the lake populations (which were recently derived from the marine form) do not. The repeated independent evolution of the lake morphology has resulted in reduced armor plates and pelvic structures, and there is no doubt that these morphological changes are adaptive. Research has shown that pelvic loss in different natural populations of three-spined stickleback occurs through regulatory mutations that delete a tissue-specific enhancer (Pel) of the pituitary homeobox transcription factor 1 (Pitx1) gene. The researchers genotyped 13 pelvic-reduced populations of three-spined stickleback from disparate geographic locations. Nine of the 13 pelvic-reduced populations had sequence deletions of varying lengths, all of which were located at the Pel enhancer. In our view, random mutation across the genome alone is unlikely to produce such similar mutations in different populations. The authors suggested that the Pitx1 locus of the stickleback genome may be prone to double-stranded DNA breaks that are subsequently repaired by NHEJ (Yingguang Frank Chan et al., 2010).

      The bd and bdf mutants used in the study arose spontaneously. Natural mutation is one of the driving forces of evolution. Nevertheless, we have rewritten the content of this section.

      Darwin, C. R. The Origin of Species (J. Murray, London, 1859).

      H Allen Orr. The genetic theory of adaptation: a brief history. Nat Rev Genet. 2005 Feb;6(2):119-27. doi: 10.1038/nrg1523.

      Yingguang Frank Chan et al., Adaptive evolution of pelvic reduction in sticklebacks by recurrent deletion of a Pitx1 enhancer. Science. 2010 Jan 15;327(5963):302-5. doi: 10.1126/science.1182213. Epub 2009 Dec 10.

      Wu et al: Interestingly, the larva of peppered moths has multiple visual factors encoded by visual genes, which are conserved in multiple Lepidoptera, in the skin. Even when its compound eyes are covered, it can rely on the skin to feel the color of the environment to change its body color and adapt to the environment(48). Therefore, caterpillars/insects can distinguish the light wave frequency of the background. We suppose that perceptual signals can stimulate the GRN, the GRN guides the expression of some transcription factors and epigenetic factors, and the interaction of epigenetic factors and transcription factors can open or close the chromatin of corresponding downstream genes, which can guide downstream target gene expression.

      This is extremely confusing because you are bringing in a plastic trait here. It's possible there is a connection between the sensory stimulus and the regulation of mamo in peppered moths, but this is a mere hypothesis. Here, by mentioning a plastic trait, this paragraph sounds as if it was making a statement about directed evolution, especially after implying in the previous sentence that (paraphrasing) "random mutations are not enough". To be perfectly honest, the current writing could be misinterpreted and co-opted by defenders of the Intelligent Design doctrine. I believe and trust this is not your intention.

      Response: Thank you very much for your careful work. The plasticity of the body color of peppered moth larvae is very interesting, but we mainly wanted to emphasize that their skin expresses the products of visual genes, which can sense the color of the environment by perceiving light. Moreover, these genes are conserved in many insects. Human skin can also perceive light through opsins, suggesting that they might initiate light–induced signaling pathways (Haltaufderhyde K et al., 2015). This indicates that the perception of environmental light by the skin of animals, and the induction of feedback through signaling pathways, is a common phenomenon. For clarity, we have rewritten this section of the manuscript.

      Haltaufderhyde K, Ozdeslik RN, Wicks NL, Najera JA, Oancea E. Opsin expression in human epidermal skin. Photochem Photobiol. 2015;91(1):117-123.

      Wu et al: In addition, during the opening of chromatin, the probability of mutation of exposed genomic DNA sequences will increase (49).

      Here again, this is veering towards a strongly Lamarckian view with the environment guiding specific mutation. I simply cannot see how this would apply to mamo, nothing in the current article indicates this could be the case here. Among many issues with this, it's unclear how chromatin opening in the larval integument may result in heritable mutations in the germline.

      Response: Thank you very much for your careful work. Previous studies have shown that there is a mutation bias in the genome; compared with intergenic regions, the mutation frequency is reduced by half inside gene bodies and by two-thirds in essential genes. The same study also compared the mutation rates of genes with different functions. The mutation rate in the coding regions of essential genes (such as those involved in translation) is the lowest, and the mutation rate in the coding regions of genes with specialized functions (such as environmental response) is the highest. These patterns are mainly affected by features of the epigenome (J Grey Monroe et al., 2022).

      In eukaryotes, chromatin is organized as repeating units of nucleosomes, each consisting of a histone octamer and the surrounding DNA. This structure can protect DNA. When a gene is activated, the chromatin region of this gene is locally opened, becoming an accessible region. Research has found that DNA accessibility can lead to a higher mutation rate in the region (Radhakrishnan Sabarinathan et al., 2016; Schuster-Böckler B et al., 2012; Lawrence MS et al., 2013; Polak P et al., 2015). In addition, mamo belongs to the BTB-ZF protein family, whose members can recruit chromatin-modifying factors such as DNA methyltransferase 1 (DNMT1), cullin3 (CUL3), histone deacetylase 1 (HDAC1), and histone acetyltransferase 1 (HAT1) to perform chromatin remodeling at specific genomic sites. Although mutation rates can be predicted from the epigenetic characteristics of chromatin, the forms of the mutations are diverse and random. Therefore, this does not violate randomness. For clarity, we have rewritten this section of the manuscript.

      J Grey Monroe et al., Mutation bias reflects natural selection in Arabidopsis thaliana. Nature. 2022 Feb;602(7895):101-105.

      Sabarinathan R, Mularoni L, Deu-Pons J, Gonzalez-Perez A, López-Bigas N. Nucleotide excision repair is impaired by binding of transcription factors to DNA. Nature. 2016;532(7598):264-267.

      Schuster-Böckler B, Lehner B. Chromatin organization is a major influence on regional mutation rates in human cancer cells. Nature. 2012;488(7412):504-507.

      Lawrence MS, Stojanov P, Polak P, et al. Mutational heterogeneity in cancer and the search for new cancer-associated genes. Nature. 2013;499(7457):214-218.

      Polak P, Karlić R, Koren A, et al. Cell-of-origin chromatin organization shapes the mutational landscape of cancer. Nature. 2015;518(7539):360-364.

      Mathew R, Seiler MP, Scanlon ST, et al. BTB-ZF factors recruit the E3 ligase cullin 3 to regulate lymphoid effector programs. Nature. 2012;491(7425):618-621.

      Wu et al: Transposon insertion occurs in a timely manner upstream of the cortex gene in melanic pepper moths (47), which may be caused by the similar binding of transcription factors and opening of chromatin.

      No, we do not think that the peppered moth mutation is Lamarckian at all, as seems to be inferred here (notice that by mentioning the peppered moth twice, you are juxtaposing a larval plastic trait and then a purely genetic wing trait, making it even more confusing). Also, the "in a timely manner" is superfluous, because all the data are consistent with a chance mutation being eventually picked up by strong directional selection. The mutation and selection did NOT occur at the same time.

      Response: Thank you very much for your careful work. The transposon insertion into the first intron of the cortex gene that underlies industrial melanism in the peppered moth occurred in approximately 1819, close to the period of industrial development in the UK (Arjen E Van't Hof, et al., 2016). In multiple species of Heliconius, the cortex gene is the shared genetic basis for the regulation of wing coloring patterns. Interestingly, the SNPs in cortex associated with the wing color pattern do not overlap among different Heliconius species, such as H. erato demophoon and H. erato favorinus, which suggests that the mutations of this gene have different origins (Nadeau NJ et al., 2016). In addition, in Junonia coenia (van der Burg KRL et al., 2020) and Bombyx mori (Ito K et al., 2016), the cortex gene is a candidate for regulating changes in wing coloring patterns. Overall, the cortex gene is an evolutionary hotspot for the variation of wing coloring patterns in multiple butterflies and moths. In addition, the variations in cortex are diverse in these species, including SNPs, indels, transposon insertions, inversions, etc. This indicates that although there are evolutionary hotspots in the insect genome, the variation within them is random. Therefore, this is not completely detached from randomness.

      Arjen E Van't Hof, et al., The industrial melanism mutation in British peppered moths is a transposable element. Nature. 2016 Jun 2;534(7605):102-5. doi: 10.1038/nature17951.

      Nadeau NJ, Pardo-Diaz C, Whibley A, et al. The gene cortex controls mimicry and crypsis in butterflies and moths. Nature. 2016;534(7605):106-110.

      van der Burg KRL, Lewis JJ, Brack BJ, Fandino RA, Mazo-Vargas A, Reed RD. Genomic architecture of a genetically assimilated seasonal color pattern. Science. 2020;370(6517):721-725.

      Ito K, Katsuma S, Kuwazaki S, et al. Mapping and recombination analysis of two moth colour mutations, Black moth and Wild wing spot, in the silkworm Bombyx mori. Heredity (Edinb). 2016;116(1):52-59.

      Wu et al: Therefore, we proposed that the genetic basis of color pattern evolution may mainly be system-guided programmed events that induce mutations in specific genomic regions of key genes rather than just random mutations of the genome.

      While the mutational target of pigment evolution may involve a handful of developmental regulator genes, you do not have the data to infer such a strong conclusion at the moment.

      The current formulation is also quite strong and teleological: "system-guided programmed events" imply intentionality or agency, an idea generally assigned to the anti-scientific Intelligent Design movement. There are a few examples of guided mutations, such as the adaptation phase of gRNA motifs in bacterial CRISPR assays, where I could see the term "system-guided programmed events" to be applicable. But it is irrelevant here.

      Response: Thank you very much for your careful work. The CRISPR-Cas9 system is indeed very well known. In addition, recent studies have found Cas9-like gene editing systems in eukaryotes, such as Fanzor. Fanzor (Fz) was reported in 2013 as a eukaryotic TnpB-IS200/IS605 protein of transposon origin, and it was initially thought that the Fz protein (and prokaryotic TnpBs) might regulate transposon activity through methyltransferase activity. Fz has recently been found to be a eukaryotic programmable RNA-guided endonuclease, analogous to CRISPR‒Cas systems (Saito M et al., 2023). Although this system has been found in fungi and mollusks, it raises hopes of finding similar systems in other higher animals. However, before these gene-editing systems became popular, zinc finger nucleases (ZFNs) were already being studied as a gene-editing system in many species. The mechanism by which a ZFN recognizes DNA depends on its zinc finger motifs (Urnov FD et al., 2005). This is consistent with the mechanism by which transcription factors recognize DNA-binding sites.

      Furthermore, a very important evolutionary event in sexual reproduction is chromosome recombination during meiosis, which helps to produce more diverse allele combinations. Current research has found that this recombination event is not random. In mice and humans, the PRDM9 transcription factor specifies the sites of double-stranded breaks (DSBs) in meiotic recombination. PRDM9 is a histone methyltransferase consisting of three main regions: an amino-terminal region resembling the family of synovial sarcoma X (SSX) breakpoint proteins, which contains a Krüppel-associated box (KRAB) domain and an SSX repression domain (SSXRD); a PR/SET domain (a subclass of SET domains), surrounded by a pre-SET zinc knuckle and a post-SET zinc finger; and a long carboxy-terminal C2H2 zinc finger array. In most mammalian species, during early meiotic prophase, PRDM9 can determine recombination hotspots by H3K4 and H3K36 trimethylation (H3K4me3 and H3K36me3) of nucleosomes near its DNA-binding site. Subsequently, meiotic DNA DSBs are formed at hotspots through the combined action of SPO11 and TOPOVIBL. In addition, some proteins (such as RAD51) are involved in repairing the breaks. In summary, programmed events in which DSBs are induced and repaired are widely present in organisms (Bhattacharyya T et al., 2019).

      These studies indicate that on the basis of randomness, the genome also exhibits programmability.

      Saito M, Xu P, Faure G, et al. Fanzor is a eukaryotic programmable RNA-guided endonuclease. Nature. 2023;620(7974):660-668.

      Urnov FD, Miller JC, Lee YL, et al. Highly efficient endogenous human gene correction using designed zinc-finger nucleases. Nature. 2005;435(7042):646-651.

      Bhattacharyya T, Walker M, Powers NR, et al. Prdm9 and Meiotic Cohesin Proteins Cooperatively Promote DNA Double-Strand Break Formation in Mammalian Spermatocytes [published correction appears in Curr Biol. 2021 Mar 22;31(6):1351]. Curr Biol. 2019;29(6):1002-1018.e7.

      Wu et al: Based on this assumption, animals can undergo phenotypic changes more quickly and more accurately to cope with environmental changes. Thus, seemingly complex phenotypes such as cryptic coloring and mimicry that are highly similar to the background may have formed in a short period. However, the binding sites of some transcription factors widely distributed in the genome may be reserved regulatory interfaces to cope with potential environmental changes. In summary, the regulation of genes is smarter than imagined, and they resemble a more advanced self-regulation program.

      Here again, I can agree with the idea that certain genetic architectures can evolve quickly, but I cannot support the concept that the genetic changes are guided or accelerated by the environment. And again, none of this is relevant to the current findings about Bm-mamo.

      Response: Thank you very much for your careful work. Darwin's theory of natural selection has epoch-making significance. I deeply believe in the theory that species strive to evolve through natural selection. However, with the development of molecular genetics, Darwinism’s theory of undirected random mutations and slow accumulation of micromutations resulting in phenotype evolution has been increasingly challenged.

      The prerequisite for undirected random mutations and micromutations is excessive reproduction to generate a sufficiently large population. A sufficiently large population can contain sufficient genotypes to face various survival challenges. However, it is difficult to explain how some small groups and species with relatively low fertility rates have survived thus far. More importantly, the theory cannot explain the currently observed genomic mutation bias. In scientific research, every theory is constantly being modified to adapt to current discoveries. The most famous example is the debate over whether light is a particle or a wave, which has lasted for hundreds of years. However, in the 20th century, both sides seemed to compromise with each other, believing that light has a wave‒particle duality.

      Epigenetics has developed rapidly since 1987. It is widely defined as stable inheritance caused by changes in chromosomal conformation without alteration of the DNA sequence, which distinguishes it from genetic research on variation in gene sequences. However, an increasing number of studies have found that histone modifications can affect gene sequence variation. In addition, both histones and epigenetic factors are themselves encoded by genes in the genome. Therefore, genetics and epigenetics should be regarded as interacting rather than parallel. Moreover, some transcription factors play an important role in epigenetic modifications. Meiotic recombination is a key process that ensures the correct separation of homologous chromosomes through DNA double-stranded break repair mechanisms. The transcription factor PRDM9 can determine recombination hotspots by H3K4 and H3K36 trimethylation (H3K4me3 and H3K36me3) of nucleosomes near its DNA-binding site (Bhattacharyya T et al., 2019). Interestingly, mamo has been identified as an important candidate factor for meiosis hotspot setting in Drosophila (Winbush A et al., 2021).

      Bhattacharyya T, Walker M, Powers NR, et al. Prdm9 and Meiotic Cohesin Proteins Cooperatively Promote DNA Double-Strand Break Formation in Mammalian Spermatocytes [published correction appears in Curr Biol. 2021 Mar 22;31(6):1351]. Curr Biol. 2019;29(6):1002-1018.e7.

      Winbush A, Singh ND. Genomics of Recombination Rate Variation in Temperature-Evolved Drosophila melanogaster Populations. Genome Biol Evol. 2021;13(1): evaa252.

      Reviewer #2 (Recommendations For The Authors):

      Major comments

      Response: Thank you very much for your careful work. First, we believe that competitive research is sometimes coincidental and sometimes intentional. Our research began in 2009, when we began to construct the recombinant population. In 2016, we published an article on comparative transcriptomics (Wu et al. 2016). The authors of the article mentioned above took a strong interest in our research and built on our transcriptome analysis, with the aim of publishing preemptively.

      To discourage such behavior, we cannot cite it and do not want to discuss it in our paper.

      Songyuan Wu et al. Comparative analysis of the integument transcriptomes of the black dilute mutant and the wild-type silkworm Bombyx mori. Sci Rep. 2016 May 19;6:26114. doi: 10.1038/srep26114.

      • line 52-54. The numerous biological functions of insect coloration have been thoroughly investigated. It is reasonable to expect more references for each function.

      Response: Thank you very much for your careful work. We have made the appropriate modifications.

      Sword GA, Simpson SJ, El Hadi OT, Wilps H. Density-dependent aposematism in the desert locust. Proc Biol Sci. 2000;267(1438):63-68. … Behavior.

      Barnes AI, Siva-Jothy MT. Density-dependent prophylaxis in the mealworm beetle Tenebrio molitor L. (Coleoptera: Tenebrionidae): cuticular melanization is an indicator of investment in immunity. Proc Biol Sci. 2000;267(1439):177-182. … Immunity.

      N. F. Hadley, A. Savill, T. D. Schultz, Coloration and Its Thermal Consequences in the New-Zealand Tiger Beetle Neocicindela-Perhispida. J Therm Biol. 1992; 17, 55-61. … Thermoregulation.

      Y. G. Hu, Y. H. Shen, Z. Zhang, G. Q. Shi, Melanin and urate act to prevent ultraviolet damage in the integument of the silkworm, Bombyx mori. Arch Insect Biochem. 2013; 83, 41-55. … UV protection.

      M. Stevens, G. D. Ruxton, Linking the evolution and form of warning coloration in nature. P Roy Soc B-Biol Sci. 2012; 279, 417-426. … Aposematism.

      K. K. Dasmahapatra et al., Butterfly genome reveals promiscuous exchange of mimicry adaptations among species. Nature. 2012; 487, 94-98. … Mimicry.

      Gaitonde N, Joshi J, Kunte K. Evolution of ontogenic change in color defenses of swallowtail butterflies. Ecol Evol. 2018;8(19):9751-9763. Published 2018 Sep 3. … Crypsis.

      B. S. Tullberg, S. Merilaita, C. Wiklund, Aposematism and crypsis combined as a result of distance dependence: functional versatility of the colour pattern in the swallowtail butterfly larva. P Roy Soc B-Biol Sci. 2005; 272, 1315-1321. … Aposematism and crypsis combined.

      • line 59-60. This general statement needs to be rephrased. I suggest remaining simple by indicating that insect coloration can be pigmentary, structural, or bioluminescent. About the structural coloration and associated nanostructures, the authors could cite recent reviews, such as: Seago et al., Interface 2009 + Lloyd and Nadeau, Current Opinion in Genetics & Development 2021 + "Light as matter: natural structural colour in art" by Finet C. 2023. I suggest doing the same for recent reviews that cover pigmentary and bioluminescent coloration in insects. The very recent paper by Nishida et al. in Cell Reports 2023 on butterfly wing color made of pigmented liquid is also unique and worth considering.

      Response: Thank you very much for your careful work. We have made the appropriate modifications.

      Insect coloration can be pigmentary, structural, or bioluminescent. Pigments are mainly synthesized by the insects themselves and form solid particles that are deposited in the cuticle of the body surface and the scales of the wings (10, 11). Interestingly, recent studies have found that bile pigments and carotenoid pigments synthesized through biological synthesis are incorporated into body fluids and passed through the wing membranes of two butterflies (Siproeta stelenes and Philaethria diatonica) via hemolymph circulation, providing color in the form of liquid pigments (12). The pigments form colors by selective absorption and/or scattering of light depending on their physical properties (13). However, structural color refers to colors, such as metallic colors and iridescence, generated by optical interference and grating diffraction of the microstructure/nanostructure of the body surface or appendages (such as scales) (14, 15). Pigment color and structural color are widely distributed in insects and can only be observed by the naked eye in illuminated environments. However, some insects, such as fireflies, exhibit colors (green to orange) in the dark due to bioluminescence (16). Bioluminescence occurs when luciferase catalyzes the oxidation of small molecules of luciferin (17). In conclusion, the color patterns of insects have evolved to be highly sophisticated and are closely related to their living environments. For example, cryptic color can deceive animals via high similarity to the surrounding environment. However, the molecular mechanism by which insects form precise color patterns to match their living environment is still unknown.

      • RNAi approach. I have no doubt that obtaining phenocopies by electroporation might be difficult. However, I find the final sampling a bit limited to draw conclusions from the RT-PCR (n=5 and n=3 for phenocopies and controls). Three control individuals is a very low number. Moreover, it would be nice to see the variability on the plot, using for example violin plots.

      Response: Thank you very much for your careful work. In the RNAi experiment, we injected more than 20 individuals in the experimental group and control group. We have added the RNAi data in Figure 4.

      Author response table 1.

      • Figure 6. Higher magnification images of Dazao and Bm-mamo knockout are needed, as shown in Figure 5 on RNAi.

      Response: Thank you very much for your careful work. We have added enlarged images.

      Author response image 3.

      • Phylogenetic analysis/Figure S6. I am not sure to what extent the sampling is biased or not, but if not, it is noteworthy that mamo does not show duplicated copies (negative selection?). It might be interesting to discuss this point in the manuscript.

      Response: Thank you very much for your careful work. mamo belongs to the BTB/POZ zinc finger family. The members of this family exhibit significant expansion in vertebrates. For example, there are 3 members in C. elegans, 13 in D. melanogaster, 16 in Bombyx mori, 58 in M. musculus and 63 in H. sapiens (Wu et al, 2019). These members contain conserved BTB/POZ domains but vary in number and amino acid residue compositions of the zinc finger motifs. Due to the zinc finger motifs that bind to different DNA recognition sequences, there may be differences in their downstream target genes. Therefore, when searching for orthologous genes from different species, we required high conservation of their zinc finger motif sequences. Due to these strict conditions, only one orthologous gene was found in these species.

      • Differentially-expressed genes and CP candidate genes (line 189-191). The manuscript would gain in clarity if the authors explained their procedure in more detail. For instance, they moved from a list of 191 genes to CP genes only. Can they say a little bit more about the non-CP genes that are differentially expressed? Maybe quantify the number of CPs among the total number of differentially-expressed genes to show that CPs are the main class?

      Response: Thank you very much for your careful work. The nr (Nonredundant Protein Sequence Database) annotations for the 191 differentially expressed genes have been added to Supplemental Table S3. Among them, there were 19 cuticular protein genes, 17 antibacterial peptide genes, 6 transporter genes, 5 transcription factor genes, 5 cytochrome genes, 53 enzyme-encoding genes and others. Because CP genes were significantly enriched among the differentially expressed genes (DEGs), and previous studies have found that BmorCPH24 can affect pigmentation, we first investigated the CP genes.
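      The share of each annotation class among the 191 DEGs can be tallied directly from the counts given above. This is a minimal sketch: the category labels and the function name are chosen here for illustration, and the "other" class is simply the remainder after subtracting the listed categories from 191.

```python
TOTAL_DEGS = 191

# Counts as stated in the response; labels are shortened for illustration.
ANNOTATED_COUNTS = {
    "cuticular protein": 19,
    "antibacterial peptide": 17,
    "transporter": 6,
    "transcription factor": 5,
    "cytochrome": 5,
    "enzyme-encoding": 53,
}


def category_shares(counts: dict, total: int) -> dict:
    """Return {category: (count, percent of total)} including the remainder."""
    other = total - sum(counts.values())
    if other < 0:
        raise ValueError("category counts exceed the stated total")
    full = dict(counts, other=other)
    return {name: (n, round(100 * n / total, 1)) for name, n in full.items()}


shares = category_shares(ANNOTATED_COUNTS, TOTAL_DEGS)
for name, (n, pct) in shares.items():
    print(f"{name}: {n} genes ({pct}%)")
```

      These proportions describe only the composition of the DEG list. A formal statement of enrichment would additionally require the number of genes in each class across the whole annotated genome, which is not given in this response.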

      • Interaction between Bm-mamo. It is not clear why the authors chose to investigate the physical interaction of Bm-mamo protein with the putative binding site of yellow, and not with the sites upstream of tan and DDC. Do the authors test one interaction and assume the conclusion stands for the y, tan and DDC?

      Response: Thank you very much for your careful work. In D. melanogaster, the yellow gene is the most studied pigment gene. The upstream and intron sequences of the yellow gene have been identified as containing multiple cis-regulatory elements. Due to the important pigmentation role of the yellow gene and its variable cis-regulatory sequence among different species, it has been considered a research model for cis-regulatory elements (Laurent Arnoult et al. 2013, Gizem Kalay et al. 2019, Yaqun Xin et al. 2020, Yann Le Poul et al. 2020). We use yellow as an example to illustrate the regulation of the mamo gene. We added this description to the discussion.

      Laurent Arnoult et al. Emergence and diversification of fly pigmentation through evolution of a gene regulatory module. Science. 2013 Mar 22;339(6126):1423-6. doi: 10.1126/science.1233749.

      Gizem Kalay et al. Redundant and Cryptic Enhancer Activities of the Drosophila yellow Gene. Genetics. 2019 May;212(1):343-360. doi: 10.1534/genetics.119.301985. Epub 2019 Mar 6.

      Yaqun Xin et al. Enhancer evolutionary co-option through shared chromatin accessibility input. Proc Natl Acad Sci U S A. 2020 Aug 25;117(34):20636-20644. doi: 10.1073/pnas.2004003117. Epub 2020 Aug 10.

      Yann Le Poul et al. Regulatory encoding of quantitative variation in spatial activity of a Drosophila enhancer. Sci Adv. 2020 Dec 2;6(49):eabe2955. doi: 10.1126/sciadv.abe2955. Print 2020 Dec.

      • Please note that some controls are missing for the EMSA experiments. For instance, the putative binding-sites should be mutated and it should be shown that the interaction is lost.

      Response: Thank you very much for your careful work. In this study, we found that the DNA recognition sequence of mamo is highly conserved across multiple species. In D. melanogaster, studies have found that mamo can directly bind to the intron of the vasa gene to activate its expression. The DNA recognition sequence they use is TGCGT (Shoichi Nakamura et al. 2019). We chose a longer sequence, GTGCGTGGC, to detect the binding of mamo. This binding mechanism is consistent across species.
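
      To make the relationship between the two recognition sequences concrete, the following is a minimal sketch of how the 5-bp core TGCGT and the longer GTGCGTGGC sequence could be located on both strands of a DNA sequence. The example sequence and the function names are hypothetical and are not taken from the manuscript; a mutated-site control, as the reviewer requests, would be a sequence in which such a scan no longer returns a hit.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[base] for base in reversed(seq.upper()))

def find_motif(seq, motif):
    """Return (0-based start, strand) for each exact motif match on either strand."""
    seq = seq.upper()
    motif = motif.upper()
    rc = reverse_complement(motif)
    hits = []
    for i in range(len(seq) - len(motif) + 1):
        window = seq[i:i + len(motif)]
        if window == motif:
            hits.append((i, "+"))
        if window == rc and rc != motif:
            hits.append((i, "-"))
    return hits

# Hypothetical sequence: one forward and one reverse-strand copy of the site
example = "AATTGTGCGTGGCAAGCCACGCACTT"

long_hits = find_motif(example, "GTGCGTGGC")
core_hits = find_motif(example, "TGCGT")
print("GTGCGTGGC:", long_hits)
print("TGCGT:", core_hits)
```

      Because the 9-bp sequence contains the 5-bp core, every hit for the longer sequence also yields a core hit one base downstream on the same strand.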

      • Figure 7 and supplementary data. How were the names of the CPs attributed? According to automatic genome annotation of Bm genes and proteins? Based on the Drosophila genome and associated gene names? Did the authors perform phylogenetic analyses to name the different CP genes?

      Response: Thank you very much for your careful work. The naming of CPs is based on their conserved motif and their arrangement order on the chromosome. In previous reports, sequence identification and phylogenetic analysis of CPs have been carried out in silkworms (Zhengwen Yan et al. 2022, Ryo Futahashi et al. 2008). The members of the same family have sequence similarity between different species, and their functions may be similar. We have completed the names of these genes in the text, for example, changing CPR2 to BmorCPR2.

      Zhengwen Yan et al. A Blueprint of Microstructures and Stage-Specific Transcriptome Dynamics of Cuticle Formation in Bombyx mori. Int J Mol Sci. 2022 May 5;23(9):5155.

      Ningjia He et al. Proteomic analysis of cast cuticles from Anopheles gambiae by tandem mass spectrometry. Insect Biochem Mol Biol. 2007 Feb;37(2):135-46.

      Maria V Karouzou et al. Drosophila cuticular proteins with the R&R Consensus: annotation and classification with a new tool for discriminating RR-1 and RR-2 sequences. Insect Biochem Mol Biol. 2007 Aug;37(8):754-60.

      Ryo Futahashi et al. Genome-wide identification of cuticular protein genes in the silkworm, Bombyx mori. Insect Biochem Mol Biol. 2008 Dec;38(12):1138-46.

      • Discussion. I think the discussion would gain in being shorter and refocused on the understudied role of CPs. Another non-canonical aspect of the discussion is the reference to additional experiments (e.g., parthenogenesis line 290-302, figure S14). This is not the place to introduce more results, and it breaks the flow of the discussion. I encourage the authors to reshuffle the discussion: 1) summary of their findings on mamo and CPs, 2) link between pigmentation mutant phenotypes, pigmentation pattern and CPs, 3) general discussion about the (evo-)devo importance of CPs and link between pigment deposition and coloration. Three important papers should be mentioned here:

      1) Matsuoka Y and A Monteiro (2018) Melanin pathway genes regulate color and morphology of butterfly wing scales. Cell Reports 24: 56-65... Yellow has a pleiotropic role in cuticle deposition and pigmentation.

      2) https://arxiv.org/abs/2305.16628... Link between nanoscale cuticle density and pigmentation

      3) https://www.cell.com/cell-reports/pdf/S2211-1247(23)00831-8.pdf... Variation in pigmentation and implication of endosomal maturation (gene red).

      Response: Thank you very much for your careful work. We have rewritten the discussion section.

      1) We have summarized our findings.

      Bm-mamo may affect the synthesis of melanin in epidermis cells by regulating yellow, DDC, and tan; regulate the maturation of melanin granules in epidermis cells through BmMFS; and affect the deposition of melanin granules in the cuticle by regulating CP genes, thereby comprehensively regulating the color pattern in caterpillars.

      2) We describe the relationship among the pigmentation mutation phenotype, pigmentation pattern, and CP.

      Previous studies have shown that the lack of expression of BmorCPH24, which encodes important components of the endocuticle, can lead to dramatic changes in body shape and a significant reduction in the pigmentation of caterpillars (53). We crossed Bo (BmorCPH24 null mutation) and bd to obtain F1(Bo/+Bo, bd/+), then self-crossed F1 and observed the phenotype of F2. The lunar spots and star spots decreased, and light-colored stripes appeared on the body segments, but the other areas still had significant melanin pigmentation in double mutation (Bo, bd) individuals (Fig. S13). However, in previous studies, introduction of Bo into L (ectopic expression of wnt1 results in lunar stripes generated on each body segment) (24) and U (overexpression of SoxD results in excessive melanin pigmentation of the epidermis) (58) strains by genetic crosses can remarkably reduce the pigmentation of L and U (53). Interestingly, there was a more significant decrease in pigmentation in the double mutants (Bo, L) and (Bo, U) than in (Bo, bd). This suggests that Bm-mamo has a stronger ability than wnt1 and SoxD to regulate pigmentation. On the one hand, mamo may be a stronger regulator of the melanin metabolic pathway, and on the other hand, mamo may regulate other CP genes to reduce the impact of BmorCPH24 deficiency.

      3) We discussed the importance of (evo-) devo in CPs and the relationship between pigment deposition and coloring.

      CP genes usually account for over 1% of the total genes in an insect genome and can be categorized into several families, including CPR, CPG, CPH, CPAP1, CPAP3, CPT, CPF and CPFL (68). The CPR family is the largest group of CPs, containing a chitin-binding domain called the Rebers and Riddiford motif (R&R) (69). The variation in the R&R consensus sequence allows subdivision into three subfamilies (RR-1, RR-2, and RR-3) (70). Among the 28 CPs, 11 RR-1 genes, 6 RR-2 genes, 4 hypothetical cuticular protein (CPH) genes, 3 glycine-rich cuticular protein (CPG) genes, 3 cuticular protein Tweedle motif (CPT) genes, and 1 CPFL (like the CPFs in a conserved C-terminal region) gene were identified. The RR-1 consensus is usually more variable among species than the RR-2 consensus, which suggests that RR-1 may have a species-specific function. RR-2 genes often cluster into several branches, which may be due to gene duplication events in co-orthologous groups and may result in conserved functions between species (71). CPH genes are classified as such because they lack known motifs. In the epidermis of Lepidoptera, the CPH genes often have high expression levels. For example, BmorCPH24 had the highest expression level in the silkworm larval epidermis (72). The CPG protein is rich in glycine. The CPH and CPG genes are less commonly found in insects outside the order Lepidoptera (73). This suggests that they may provide species-specific functions in Lepidoptera. CPT contains a Tweedle motif, and the TweedleD1 mutation has a dramatic effect on body shape in D. melanogaster (74). The CPFL members are relatively conserved among species and may be involved in the synthesis of larval cuticles (75). CPT and CPFL may have relatively conserved functions among insects. The CP genes are a group of rapidly evolving genes, and their copy numbers may undergo significant changes in different species. In addition, RNAi experiments on 135 CP genes in the brown planthopper (Nilaparvata lugens) showed that deficiency of 32 CP genes leads to significant defective phenotypes, such as lethality and developmental retardation. It is suggested that these 32 CP genes are indispensable, and that other CP genes may have redundant and complementary functions (76). In previous studies, it was found that the construction of the larval cuticle of silkworms requires the precise expression of over two hundred CP genes (22). The production, interaction, and deposition of CPs and pigments are complex and precise processes, and our research shows that Bm-mamo plays an important regulatory role in this process in silkworm caterpillars. To further understand the role of CPs, future work should aim to identify the function of important cuticular protein genes and the deposition mechanism in the cuticle.

      Minor comments - Title. At this stage, there is no evidence that Bm-mamo regulates caterpillar pigmentation outside of Bombyx mori. I suggest specifying 'silkworm caterpillars' in the title.

      Response: Thank you very much for your careful work. We have modified the title.

      • Abstract, line 29. Because the knowledge on pigmentation pathway(s) is advanced, I would suggest writing 'color pattern is not fully understood' instead of 'color pattern is not clear'.

      Response: Thank you very much for your careful work. We have modified this sentence.

      • line 29. I suggest 'the transcription factor' rather than 'a transcription factor'.

      Response: Thank you very much for your careful work. We have modified this sentence.

      • line 30. If you want to mention the protein, the name 'Bm-mamo' should not be italicized.

      Response: Thank you very much for your careful work. We have modified this sentence.

      • line 30. 'in the silkworm'.

      Response: Thank you very much for your careful work. We have modified this sentence.

      • line 31. 'mamo' should not be italicized.

      Response: Thank you very much for your careful work. We have modified this sentence.

      • line 31. 'in Drosophila' rather 'of Drosophila'.

      Response: Thank you very much for your careful work. We have modified this sentence.

      • line 32. Bring detail if the gamete function is conserved in insects? In all animals?

      Response: Thank you very much for your careful work. The sentence was changed to “This gene has a conserved function in gamete production in Drosophila and silkworms and evolved a pleiotropic function in the regulation of color patterns in caterpillars.”

      • Introduction, line 51. I am not sure what the authors mean by 'under natural light'. Please rephrase.

      Response: Thank you very much for your careful work. We have deleted “under natural light”.

      • line 43. I find that the sentence 'In some studies, it has been proven that epidermal proteins can affect the body shape and appendage development of insects' is not necessary here. Furthermore, this sentence breaks the flow of the teaser.

      Response: Thank you very much for your careful work. We have deleted this sentence.

      • line 51-52. 'Greatly benefit them' should be rephrased in a more neutral way. For example, 'colours pattern have been shown to be involved in...'.

      Response: Thank you very much for your careful work. We have modified to “and the color patterns have been shown to be involved in…”

      • line 62. CPs are secreted by the epidermis, but I would say that CPs play their structural role in the cuticle, not directly in the epidermis. I suggest rephrasing this sentence and adding references.

      Response: Thank you very much for your careful work. We have modified “epidermis” to “cuticle”.

      • line 67. Please indicate that pathways have been identified/reported in Lepidoptera (11). Otherwise, the reader does not understand if you refer to previous biochemical in Drosophila for example.

      Response: Thank you very much for your careful work. We have modified this sentence. “Moreover, the biochemical metabolic pathways of pigments used for color patterning in Lepidoptera…have been reported.”

      • line 69. Missing examples of pleiotropic factors and associated references. For example, I suggest adding: engrailed (Dufour, Koshikawa and Finet, PNAS 2020) + antennapedia (Prakash et al., Cell Reports 2022) + optix (Reed et al., Science 2011), etc. Need to add references for clawless, abdominal-A.

      Response: Thank you very much for your careful work. We have made modifications.

      • line 76. The simpler term moth might be enough (instead of Lepidoptera).

      Response: Thank you very much for your careful work. We have modified this to “insect”.

      • line 96. I would simplify the text by writing "Then, quantitative RT-PCR was performed..."

      Response: Thank you very much for your careful work. We have modified this sentence.

      • line 112. 'Predict' instead of 'estimate'?

      Response: Thank you very much for your careful work. We have modified this sentence.

      • line 113. I would rather indicate the full name first, then indicate mamo between brackets.

      Response: Thank you very much for your careful work. We have modified this sentence.

      • line 144. The Perl script needs to be made accessible on public repository.

      Response: Thank you very much for your careful work.

      • line 147-150. Too many technical details here. The details are already indicated in the material and methods section. Furthermore, the details break the flow of the paragraph.

      Response: Thank you very much for your careful work. We have modified this section.

      • line 152. Needs to make the link with the observed phenotypes in Figure 1. Just needs to state that RNAi phenocopies mimic the mutant alleles.

      Response: Thank you very much for your careful work. We have modified this sentence.

      • line 153-157. Too many technical details here. The details are already indicated in the material and methods section. Furthermore, the details break the flow of the paragraph.

      Response: Thank you very much for your careful work. We have simplified this paragraph.

      • line 170. Please rephrase 'conserved in 30 species' because it might be understood as conserved in 30 species only, and not in other species.

      Response: Thank you very much for your careful work. We have modified this sentence.

      • line 182. Maybe explain the rationale behind restricting the analysis to +/- 2kb. Can you cite a paper that shows that most of binding sites are within 2kb from the start codon?

      Response: Thank you very much for your careful work. We have modified this sentence.

      • line 182. '14,623 predicted genes'.

      Response: Thank you very much for your careful work. We have modified this sentence.

      • line 183. '10,622 genes'

      Response: Thank you very much for your careful work. We have modified this sentence.

      • line 183. Redundancy. Please remove 'silkworm' or 'B. mori'.

      Response: Thank you very much for your careful work. We have modified this sentence.

      • line 187. '10,072 genes'

      Response: Thank you very much for your careful work. We have modified this sentence.

      • line 188. '9,853 genes'

      Response: Thank you very much for your careful work. We have modified this sentence.

      • line 200. "Therefore, the differential...in caterpillars" is a strong statement.

      Response: Thank you very much for your careful work. We have modified this sentence.

      • line 204. Remove "The" in front of eight key genes. Also, needs a reference... maybe a recent review on the biochemical pathway of melanin in insects.

      Response: Thank you very much for your careful work. We have modified this sentence.

      • line 220. This sentence is too general and vague. Please make explicit what you mean by "in terms of evolution". Number of insect species? Diversity of niche occupancy? Morphological, physiological diversity?

      Response: Thank you very much for your careful work. We have modified this sentence.

      • line 285. The verb "believe" should be replaced by a more neutral one.

      Response: Thank you very much for your careful work. We have modified this sentence.

      • line 354-355. This sentence needs to be rephrased in a more objective way.

      Response: Thank you very much for your careful work. We have rewritten this sentence.

      • line 378. Missing reference for MUSCLE.

      Response: Thank you very much for your careful work. We have modified this sentence.

      • line 379. Pearson model?

      Response: Thank you very much for your careful work. We have modified this sentence.

      • line 408. "The CRISPRdirect online software was used...".

      Response: Thank you very much for your careful work. We have modified this sentence.

      • Figure 1. In the title, I suggest indicating Dazao, bd, bdf as it appears in the figure. Needs to specify 'silkworm larval development'.

      Response: Thank you very much for your careful work. We have modified this figure title.

      • Figure 3. In the title, is the word 'pattern' really necessary? In the legend, please indicate the meaning of the acronyms AMSG and PSG.

      Response: Thank you very much for your careful work. We have modified this figure legend.

      • Figure S7A. Typo 'Znic finger 1', 'Znic finger 2', 'Znic finger 3',

      Response: Thank you very much for your careful work. We have fixed these typos.

    1. Author Response:

      Reviewer #1 (Public Review):

      Summary:

      The authors identified that genetic and pharmacological inhibition of CERS1, an enzyme implicated in ceramide biosynthesis, worsens muscle fibrosis and inflammation during aging.

      Strengths:

      The study points out an interesting issue on excluding CERS1 inhibition as a therapeutic strategy for sarcopenia. Overall, the article is well written and clear.

      Weaknesses:

      Many of the experiments confirmed previously published data, which also show a decline of CERS1 in ageing and the generation and characterization of a muscle-specific knockout mouse line. The mechanistic insights into how the increased amount of long ceramides (cer c24) and the decrease of shorter ones (cer c18) might influence muscle mass, force production, fibrosis and inflammation in aged mice have not been addressed.

      We thank the reviewer for the assessment and would like to point out that Cers1 had not previously been studied in the context of aging. Moreover, our unbiased pathway analyses in human skeletal muscle implicate CERS1 for the first time with myogenic differentiation, which we validate in cell culture systems. To improve mechanistic insights, as suggested by Reviewer #1, we performed more experiments to gain insights how Cers1 derived c18, and Cers2 derived c24 ceramide species affect myogenesis. We recently showed that knocking out Cers2 reduces c24:0/c24:1 and promotes muscle cell maturation (PMID: 37118545, Fig. 6m-r and Supplementary Fig. 5e). This suggests that the very long chain ceramides c24 might indeed be driving the effect we see upon Cers1 inhibition because we observe an accumulation of c24 ceramides upon Cers1 (c18) inhibition (Fig 2B, Fig 3B, Fig 4A, Fig S3E), which is associated with impaired muscle maturation (Fig 4B-C, Fig S3G-I, Fig S4G-I). To study whether impaired muscle cell differentiation upon Cers1 inhibition is dependent on Cers2, we knocked-down Cers1 alone, or in combination with the knockdown of Cers2. Results show that reduced muscle cell maturation mediated by Cers1KD is rescued by the simultaneous knockdown of Cers2 as shown by gene expression analyses and immunohistochemical validation and quantification. Hence, we believe that reducing Cers1 function during aging might lead to an increase in sphingosine levels as has been shown previously (PMID: 31692231). Increased sphingosine triggers cell apoptosis due to its toxicity (PMID: 12531554). Therefore, channeling accumulating sphingosine towards C24 ceramides may avoid toxicity but, as we show in this manuscript, will reduce the myogenic potential in muscle. However, if also C24 production is blocked by Cers2 inhibition, sphingosine is forced towards the production of other, potentially less toxic or myogenesis-impairing ceramides. 
      We added these new data to the revised manuscript as new Fig 5D-E and new Fig S5G-I.

      Reviewer #2 (Public Review):

      Summary:

      The manuscript by Wohlwend et al. investigates the implications of inhibiting ceramide synthase Cers1 on skeletal muscle function during aging. The authors propose a role for Cers1 in muscle myogenesis and aging sarcopenia. Both pharmacological and AAV-driven genetic inhibition of Cers1 in 18-month-old mice lead to reduced C18 ceramides in skeletal muscle, exacerbating age-dependent features such as muscle atrophy, fibrosis, and center-nucleated fibers. Similarly, inhibition of the Cers1 orthologue in C. elegans reduces motility and causes alterations in muscle morphology.

      Strengths:

      The study is well-designed, carefully executed, and provides highly informative and novel findings that are relevant to the field.

      Weaknesses:

      The following points should be addressed to support the conclusions of the manuscript.

      (1) It would be essential to investigate whether P053 treatment of young mice induces age-dependent features besides muscle loss, such as muscle fibrosis or regeneration. This would help determine whether the exacerbation of age-dependent features solely depends on Cers1 inhibition or is associated with other factors related to age-dependent decline in cell function. Additionally, considering the reported role of Cers1 in whole-body adiposity, it is necessary to present data on mouse body weight and fat mass in P053-treated aged mice.

      We thank the reviewer for suggesting that we study Cers1 inhibition in young mice. In fact, a previous study shows that muscle-specific Cers1 knockout in young mice impairs muscle function (PMID: 31692231). Similar to our observation, these authors report reduced muscle fiber size and muscle force. Therefore, we do not believe that our observed effects of Cers1 inhibition in aged mice are specific to aging, although the phenotypic consequences are accentuated in aged mice. As requested by the reviewer, we attached the mouse body weights and fat mass (Author response image 1A-B). The reduced fat mass upon P053 treatment is in line with previously reported reductions in fat mass in chow diet or high fat diet fed young mice upon Cers1 inhibition (PMID: 30605666, PMID: 30131496), again suggesting that the effect of Cers1 inhibition might not be specific to aging.

      Author response image 1.

      (A-B) Body mass (A) and fat mass as % of body mass (B) were measured in 22mo C57BL/6J mice intraperitoneally injected with DMSO or P053 using EchoMRI (n=7-12 per group). (C-D) Grip strength measurements in all limbs (C) or only the forelimbs (D) in 24mo C57BL/6J mice intramuscularly injected with AAV9 particles containing scramble, or shRNA targeting Cers1 (n=8 per group). (E-F) Pax7 gene expression in P053 or AAV9 treated mice (n=6-7 per group) (E), or in mouse C2C12 muscle progenitor cells treated with 25nM scramble or Cers1 targeting shRNA (n=8 per group) (F). (G) Proliferation as measured by luciferase intensity in mouse C2C12 muscle cells treated with 25nM scramble or Cers1 targeting shRNA (n=24 per group). Each column represents one biological replicate. (H) Overlaid FACS traces of Annexin-V (BB515, left) and Propidium Iodide (Cy5, right) of mouse C2C12 myotubes treated with 25nM scramble or Cers1 targeting shRNA (n=3 per group). Quantification right: early apoptosis (Annexin+-PI-), late apoptosis (Annexin+-PI+), necrosis (Annexin--PI+), viability (Annexin--PI-). (I) Normalized Cers2 gene expression in mouse C2C12 muscle cells treated with 25nM scramble or Cers1 targeting shRNA (n=6-7 per group). (J-K) Representative mitochondrial respiration traces of digitonin-permeabilized mouse C2C12 muscle cells treated with DMSO or P053 (J) with quantification of basal, ATP-linked, proton leak respiration as well as spare capacity and maximal capacity linked respiration (K) (n=4 per group). (L) Reactive oxygen production in mitochondria of mouse C2C12 muscle cells treated with DMSO or P053. (M) Enriched gene sets related to autophagy and mitophagy in 24mo C57BL/6J mouse muscles intramuscularly injected with AAV9 particles containing scramble, or shRNA targeting Cers1 (left), or intraperitoneally injected with DMSO or P053 (right). Color gradient indicates normalized effect size. Dot size indicates statistical significance (n=6-8 per group). (N) Representative confocal Proteostat® stainings with quantifications of DMSO and P053 treated mouse muscle cells expressing APPSWE (top) and human primary myoblasts isolated from patients with inclusion body myositis (bottom). (O) Stillness duration during a 90-second interval in adult day 5 C. elegans treated with DMSO or 100 µM P053. (P) Lifespan of C. elegans treated with DMSO or P053 (n=144-147 per group, for method details see main manuscript page 10).

      (2) As grip and exercise performance tests evaluate muscle function across several muscles, it is not evident how intramuscular AAV-mediated Cers1 inhibition solely in the gastrocnemius muscle can have a systemic effect or impact different muscles. This point requires clarification.

      The grip strength measurements presented in the manuscript come from hindlimb grip strength, as pointed out in the Methods section. We measured grip strength in all four limbs, as well as in the forelimbs only (Author response image 1C-D). While forelimb strength did not change, hindlimb grip strength was significantly different in AAV-Cers1KD compared to the scramble control AAV (Fig 3I), which is in line with the fact that we only injected the AAV in the hindlimbs. This is similar to our previous data, where we saw altered muscle function upon IM AAV delivery in the gastrocnemius (PMID: 34878822, PMID: 37118545). The gastrocnemius likely has the largest contribution to hindlimb grip strength given its size, and possibly even to overall grip strength, as suggested by a trend of reduced grip strength in all four limbs (Author response image 1C). We also suspect that the hindlimb muscles have the largest contribution to uphill running, as we could also see an effect on running performance. While we carefully injected a minimal amount of AAV into the gastrocnemius to avoid leakage, we cannot completely rule out that some AAV might have spread to other muscles. We added this information to the discussion of the manuscript as a potential limitation of the study.

      (3) To further substantiate the role of Cers1 in myogenesis, it would be crucial to investigate the consequences of Cers1 inhibition under conditions of muscle damage, such as cardiotoxin treatment or eccentric exercise.

      While it would be interesting to study Cers1 in the context of muscle regeneration, and possibly mouse models of muscular dystrophy, we think such work would go beyond the scope of the current manuscript.

      (4) It would be informative to determine whether the muscle defects are primarily dependent on the reduction of C18-ceramides or the compensatory increase of C24-ceramides or C24-dihydroceramides.

      To improve mechanistic insights, as suggested by Reviewer #2, we performed more experiments to gain insights how Cers1 derived c18, and Cers2 derived c24 ceramide species affect myogenesis. We recently showed that knocking out Cers2 reduces c24:0/c24:1 and promotes muscle cell maturation (PMID: 37118545, Fig. 6m-r and Supplementary Fig. 5e). This suggests that the very long chain ceramides c24 might indeed be driving the effect we see upon Cers1 inhibition because we observe an accumulation of c24 ceramides upon Cers1 (c18) inhibition (Fig 2B, Fig 3B, Fig 4A, Fig S3E), which is associated with impaired muscle maturation (Fig 4B-C, Fig S3G-I, Fig S4G-I). To study whether impaired muscle cell differentiation upon Cers1 inhibition is dependent on Cers2, we knocked-down Cers1 alone, or in combination with the knockdown of Cers2. Results show that reduced muscle cell maturation mediated by Cers1KD is rescued by the simultaneous knockdown of Cers2 as shown by gene expression analyses and immunohistochemical validation and quantification. We added these data to the manuscript as new Fig 5D-E, new Fig S5G-I. These data, together with our previous results showing that Degs1 knockout reduces myogenesis (PMID: 37118545, Fig. 6s-x and Fig. 7) suggest that C24/dhC24 might contribute to the age-related impairments in myogenesis. We added the new results to the revised manuscript.

      (5) Previous studies from the research group (PMID 37118545) have shown that inhibiting the de novo sphingolipid pathway by blocking SPTLC1-3 with myriocin counteracts muscle loss and that C18-ceramides increase during aging. In light of the current findings, certain issues need clarification and discussion. For instance, how would myriocin treatment, which reduces Cers1 activity because of the upstream inhibition of the pathway, have a positive effect on muscle? Additionally, it is essential to explain the association between the reduction of Cers1 gene expression with aging (Fig. 1B) and the age-dependent increase in C18-ceramides (PMID 37118545).

      Blocking the upstream enzyme of the ceramide pathway (SPT1) shuts down the entire pathway that is overactive in aging, and therefore seems beneficial for muscle aging. While inhibiting most enzymes in the ceramide pathway that we have studied so far (SPTLC1, CERS2) revealed muscle benefits in terms of myogenesis, inflammation (PMID: 35089797; PMID: 37118545) and muscle protein aggregation (PMID: 37196064), the CERS1 enzyme shows opposite effects. This is also visible in the direction of CERS1 expression compared to the other enzymes in one of our previously published studies (PMID: 37118545, Fig. 1e and Fig. 1f). In the current study, we show that Cers1 inhibition indeed exacerbates age-related impairments in myogenesis and inflammation, as opposed to the inhibition of Sptlc1 or Cers2. As the reviewer points out, both C18- and C24-ceramides seem to accumulate upon muscle aging. We think this is due to an overall overactive ceramide biosynthesis pathway. Blocking C18-ceramides via Cers1 inhibition results in the accumulation of C24-ceramides and worsens muscle phenotypes (see reply to question #4). On the other hand, blocking C24-ceramides via Cers2 inhibition improves muscle differentiation. These observations, together with the finding that Cers1-mediated inhibition of muscle differentiation is dependent on proper Cers2 function (new Fig 5D-E, new Fig S5G-I), point towards C24-ceramides as the main culprit of reduced muscle differentiation. Hence, at least a significant part of the benefits of blocking SPTLC1 might have been related to reducing very long-chain ceramides. We believe that reduced Cers1 expression in skeletal muscle upon aging, observed by us and others (PMID: 31692231), might reflect a compensatory mechanism to make up for an overall overactive ceramide flux in aged muscles. Reducing Cers1 function during aging might lead to an increase in sphingosine levels, as has been shown previously (PMID: 31692231). Increased sphingosine triggers cell apoptosis due to its toxicity (PMID: 12531554). Therefore, channeling accumulating sphingosine towards C24 ceramides may avoid toxicity but, as we show in this manuscript, will reduce the myogenic potential in muscle. However, if C24 production is also blocked by Cers2 inhibition (new Fig 5D-E, new Fig S5G-I), sphingosine is forced towards the production of other, potentially less toxic or myogenesis-impairing ceramides. These data are now added to the revised manuscript (see page 7). Details were added to the discussion of the manuscript (see page 8).

      Addressing these points will strengthen the manuscript's conclusions and provide a more comprehensive understanding of the role of Cers1 in skeletal muscle function during aging.

      Reviewer #1 (Recommendations For The Authors):

      The authors identified that genetic and pharmacological inhibition of CERS1, an enzyme implicated in ceramide biosynthesis, worsens muscle fibrosis and inflammation during aging.

      Even though many of the experiments only confirm previously published data (ref 21, 11, 37, 38), which also show a decline of CERS1 in ageing and the generation and characterization of a muscle-specific knockout mouse line, the study raises an interesting issue by excluding CERS1 inhibition as a therapeutic strategy for sarcopenia and opens new questions about how inhibition of SPTLC1 (upstream of CERS1) has beneficial effects in healthy aging (ref 15 published by the same authors).

      Overall, the article is well written and clear. However, there is a major weakness. The mechanistic insights into how the increased amount of long ceramides (C24) and the decrease of shorter ones (C18) might influence muscle mass, force production, fibrosis and inflammation in aged mice have not been addressed. At the present stage the manuscript is descriptive and confirmatory of the CERS1-mediated function in preserving muscle mass. The authors should consider the following points:

      Comments:

      (1) Muscle data

      (a) The effect of CERS1 inhibition on myotube formation must be better characterized. Which step of myogenesis is affected? Is stem cell renewal, MyoD replication/differentiation, myoblast fusion or increased cell death the major culprit of the small myotubes? Minor points: Figure S1C: show C14:00 level at 200 h; text of Fig S2A and 1F: MRF4 and Myogenin are not early genes in myogenesis, please correct; Fig S2B and 2C: changes in transcripts do not mean changes in protein or myotube differentiation, and therefore the authors must test myotube formation and myosin expression.

      Cers1 inhibition seems to affect differentiation and myoblast fusion. To test the other suggested effects, we performed additional experiments as described below. Inhibiting Cers1 systemically with the pharmacological inhibitor of Cers1 (P053) or with intramuscular delivery of AAV expressing a short hairpin RNA (shRNA) against Cers1 in mice did not affect Pax7 transcript levels (Author response image 1E). Moreover, we also did not observe an effect of shRNA targeting Cers1 on Pax7 levels in mouse C2C12 muscle progenitor cells (Author response image 1F). To characterize the effect of Cers1 inhibition on muscle progenitor proliferation/renewal, we used scramble shRNA or shRNA targeting Cers1 in C2C12 muscle progenitors and measured proliferation using CellTiter-Glo (Promega). Results showed that Cers1KD had no significant effect on cell proliferation (Author response image 1G). Next, we assayed cell death in differentiating C2C12 myotubes deficient in Cers1 using FACS analysis of Annexin V (left) and propidium iodide (right). We found no difference in early apoptosis, late apoptosis, necrosis, or muscle cell viability, suggesting that cell death can be ruled out as an explanation for the smaller myotubes (Author response image 1H). These findings support the notion that the inhibitory effect of Cers1 knockdown on muscle maturation is primarily based on effects on myogenesis rather than on apoptosis. Our data in the manuscript also suggest that Cers1 inhibition affects myoblast fusion, as shown by reduced myonucleation upon Cers1KD (Fig S3H right, Fig S5I).

      (b) The phenotype of CERS1 knockdown is milder than that of P053-treated mice (Fig S5D and Figure 3F, 3H are not significant) despite similar changes of Cer18:0, Cer24:0, Cer24:1 concentration in muscles. Why?

      Increases in very long chain ceramides were in fact larger upon P053 administration compared to AAV-mediated knockdown. For example, Cer24:0 levels increased by >50% upon P053 administration, compared to 20% by AAV injections. Moreover, dhC24:1 increased by 6.5-fold vs 2.5-fold upon P053 vs AAV treatment, respectively. These differences might not only explain the slightly attenuated phenotypes in the AAV-treated mice but also underline the notion that very long chain ceramides might cause muscle deterioration. We believe inhibiting the enzymatic activity of Cers1 (P053), as compared to degrading Cers1 transcripts, is a more efficient strategy to reduce ceramide levels. However, we cannot completely rule out multi-organ, systemic effects of P053 treatment beyond its direct effect on muscle. We added these details to the discussion of the revised manuscript (see page 8 of the revised manuscript).

      (c) The authors talk about a possible compensation of the CERS2 isoform but they never showed mRNA expression levels or CERS2 protein levels after treatment. Is CERS2 more highly expressed when CERS1 is downregulated in skeletal muscle?

      We appreciate the suggestion of the reviewer. We found no change in Cers2 mRNA levels upon Cers1 inhibition in mouse C2C12 myoblasts (Author response image 1I). We would like to point out that mRNA abundance might not be the optimal readout for enzymes, because it does not necessarily reflect enzymatic activity. Therefore, we think metabolite levels are a better proxy of enzymatic activity. It should also be pointed out that “compensation” might not be an accurate description, as sphingoid base substrate might simply be more available upon Cers1KD and hence more substrate might be present for Cers2 to synthesize very long chain ceramides. This “re-routing” has been previously described in the literature and hypothesized to help avoid toxic (dh)sphingosine accumulation (PMID: 30131496). Therefore, we changed the wording in the revised manuscript to be more precise.

      (d) Force measurement of AAV CERS1 downregulated muscles could be a plus for the study (assay function of contractility)

      In the current study we measured grip strength in mice, which had previously been shown to be a good proxy of muscle strength and general health (PMID: 31631989). Indeed, our results of reduced muscle grip strength are in line with previous work that shows reduced contractility in muscles of Cers1 deficient mice (PMID: 31692231).

      (e) How are degradation pathways affected by the downregulation of CERS1? Is autophagy/mitophagy affected? How are mTOR and protein synthesis affected? There is a recent paper that showed that CerS1 silencing leads to a reduction in C18:0-Cer content, with a subsequent increase in the activity of the insulin pathway, and an improvement in skeletal muscle glucose uptake. Could it be possible that CERS1 downregulation increases mTOR signalling and decreases the autophagy pathway? Measuring autophagic flux using colchicine in vivo would be useful to test this hypothesis.

      Cers1 in skeletal muscle has indeed been linked to metabolic homeostasis (see PMID: 30605666). In line with their finding in young mice, we also find reduced fat mass upon P053 treatment in aged mice (Author response image 1A-B). We also looked into mitochondrial bioenergetics upon blocking Cers1 with P053 treatment using O2k oxygraphy (Author response image 1J-L). Results show that Cers1 inhibition in mouse muscle cells increases mitochondrial respiration, similar to what has been shown before (PMID: 30131496). However, we also found that reactive oxygen species production in mouse muscle cells is increased upon P053 treatment, suggesting the presence of dysfunctional mitochondria upon inhibiting Cers1 with P053. We next looked into the mitophagy/autophagy degradation pathways suggested by the reviewer and did not find convincing evidence that Cers1 has a major impact on autophagy- or mitophagy-derived gene sets in mice treated with shRNA against Cers1 or with the Cers1 pharmacological inhibitor P053 (Author response image 1M).

      We then assessed the effect of Cers1 inhibition on transcript levels related to mTORC1/protein synthesis, as suggested by the reviewer. Cers1 knockdown in differentiating mouse muscle cells showed only a weak trend to reduce mTORC1 and its downstream targets (new Fig S4A). In line with this, there was no notable difference in protein synthesis in differentiating, Cers1-deficient mouse C2C12 myoblasts as assessed by L-homopropargylglycine (HPG) amino acid labeling using confocal microscopy (new Fig S4B) or FACS analyses (new Fig S4C). However, Cers1KD increased transcripts related to the myostatin-Foxo1 axis as well as the ubiquitin proteasome system (e.g. atrogin-1, MuRF1) (new Fig S4D), suggesting Cers1 inhibition increases protein degradation. We added these details to the revised manuscript on page 7. We recently implicated the ceramide pathway in regulating muscle protein homeostasis (PMID: 37196064). Therefore, we assessed the effect of Cers1 inhibition with the P053 pharmacological inhibitor on protein folding in muscle cells using the Proteostat dye, which intercalates into the cross-beta spine of quaternary protein structures typically found in misfolded and aggregated proteins. Interestingly, inhibiting Cers1 further increased misfolded proteins in C2C12 mouse myoblasts expressing the Swedish mutation in APP and in human myoblasts isolated from patients with inclusion body myositis (Author response image 1N). These findings suggest that deficient Cers1 might upregulate protein degradation to compensate for the accumulation of misfolded and aggregating proteins, which might contribute to the impaired muscle function observed upon Cers1 knockdown. Further studies are needed to disentangle the underlying mechanisms.

      (f) The balance of ceramides has been found to play roles in mitophagy and fission, with an impact on cell fate and metabolism. Did the authors check how mitochondrial morphology, mitophagy or mitochondrial dynamics (fission and fusion) are altered in CERS1 knockdown muscles? There is growing evidence linking mitochondrial dysfunction to the development of fibrosis and inflammation.

      Previously, CERS1 has been studied in the context of metabolism and mitochondria (for reference, please see PMID: 26739815, PMID: 29415895, PMID: 30605666, PMID: 30131496). In summary, these studies demonstrate that C18 ceramide levels are inversely related to insulin sensitivity in muscle and mitochondria, and that Cers1 inhibition improves insulin-stimulated suppression of hepatic glucose production and reduced high-fat diet induced adiposity. Moreover, improved mitochondrial respiration, citrate synthase activity and increased energy expenditure were reported upon Cers1 inhibition. Lack of Cers1 specifically in skeletal muscle was also reported to improve systemic glucose homeostasis. While these studies agree on the effect of Cers1 inhibition on fat loss, results on glucose homeostasis and insulin sensitivity differ depending on whether a pharmacologic or a genetic approach was used to inhibit Cers1. The current manuscript describes the effect of CERS1 on muscle function and myogenesis because these were the most strongly correlated pathways with CERS1 in human skeletal muscle (Fig 1C) and impact of Cers1 on these pathways is poorly studied, particularly in the context of aging. Therefore, we would like to refer to the mentioned studies investigating the effect of CERS1 on mitochondria and metabolism.

      (2) C.elegans data:

      (a) The authors checked maternal RNAi protocol to knockdown lagr-1 and showed alteration of muscle morphology at day 5. They also give pharmacological exposure of P053 drug at L4 stage. Furthermore, the authors also used a transgenic ortholog lagr-1 to perform the experiments. All of them were consistent showing a reduced movement. It would be important to show rescue of the muscle phenotype by overexpressing CERS1 ortholog in knockdown transgenic animals.

      We used RNAi to knockdown the Cers1 orthologue, lagr-1, in C.elegans. Therefore, we do not have transgenic animals. Overexpressing lagr-1 in the RNAi treated animals would also not be possible as the RNA from the overexpression would just get degraded.

      (b) The authors showed data about distance of C.elegans. It would be interesting to specify if body bends, reversals and stillness are affected in RNAi and transgenic Knockdown worms.

      As suggested by the reviewer, we measured thrashing and stillness and found reduced thrashing (new Fig S5B) and a trend towards an increase in stillness (Author response image 1O) in P053-treated worms on day 5 of adulthood, which is the day we observed significant differences in muscle morphology and movement (Fig 4D-E, Fig S5A). These data are now included in the revised manuscript.

      (c) Is there an effect on lifespan extension by knocking down CERS1?

      We performed two independent lifespan experiments in C.elegans treated with the Cers1 inhibitor P053 and found reduced lifespan in both replicate experiments (for second replicate, see Author response image 1P). We added these data to the revised manuscript as new Fig 4H.

      How do the authors explain the beneficial effect of sptlc1 inhibition on healthy aging muscle? Discuss more during the article if there is no possible explanation at the moment.

      We believe that blocking the upstream enzyme of the ceramide pathway (SPT1) shuts down the entire pathway that is overactive in aging, and therefore is more beneficial for muscle aging. Our current work suggests that at least a significant part of Sptlc1-KD benefits might stem from blocking very long chain ceramides. While SPTLC1 and CERS2 revealed muscle benefits in terms of myogenesis, inflammation (PMID: 35089797; PMID: 37118545) and muscle protein aggregation (PMID: 37196064), the CERS1 enzyme shows opposite effects, which is also visible in Fig 1e and Fig 1f of PMID: 37118545. In the current study, we show that Cers1 inhibition indeed exacerbates aging defects in myogenesis and inflammation as opposed to the inhibition of Sptlc1 or Cers2. The fact that the effect of Cers1 on inhibiting muscle differentiation is dependent on the clearance of Cers2-derived C24-ceramides suggests that reducing very long chain ceramides might be crucial for healthy muscle aging. We added details to the discussion.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Weaknesses:

      (1) Analysis of transcript expression is limited to the CT-peptide encoding gene, while no gene expression analysis was attempted for the three identified receptors. Differences in the activation of downstream signaling pathways between the three receptors are also questionable due to a lack of clarity in the statistical analysis and variation in the control and experimental data in heterologous assays. Together, this makes it difficult to propose a mechanism underlying differences in the functions of the two CT-like peptides in muscle control and growth regulation.

      We appreciate the reviewer's rigorous critique. The manuscript has been comprehensively revised as follows:

      (1) For the expression analysis of the three identified receptors, the updated results are presented in Figure 5, with the detailed descriptions in Results section 2.4 (line 287-290) and Materials and Methods section 4.5 (line 767).

      (2) For the statistical tests and methodological clarity, statistical tests were indeed performed for all experiments. However, we acknowledge that the original labeling methods required enhanced methodological clarity, and we apologize for any confusion caused. All figures have been revised to improve the visibility of differences, and statistical test information has been added to both the figure legends and the Materials and methods section “4.10 Statistical Analysis” (line 900-910).

      (3) For the variation in the control and experimental data, the minor observed variations in control conditions across experiments primarily arise from two methodological factors: 1) Each experimental set used cells transfected with distinct receptor subtypes (e.g., AjPDFR1 vs. AjPDFR2), inherently introducing baseline variability due to differential receptor expression profiles. 2) Independent cell culture batches were employed for replicate experiments to ensure biological reproducibility. Importantly, these minor variations did not compromise the statistical significance of downstream signaling differences (p < 0.01 for all comparative analyses). Therefore, differences in the activation of downstream signaling pathways between the three receptors are reliable.

      (2) The authors also suggest a putative orexigenic role for the CT-like peptidergic system in feeding behavior. This effect is not well supported by the experimental data provided, as no detailed analysis of feeding behavior was carried out (only indirect measurements were performed that could be influenced by other peptidergic effects, such as on muscle relaxation) and no statistically significant differences were reported in these assays.

      Thank you for the reviewer’s valuable comments. Our revised manuscript now includes the following multidimensional analyses to strengthen evidence of the orexigenic role of AjCT2: Firstly, in sea cucumbers, the mass of remaining bait is a common indicator of feeding condition. After long-term AjCT2 injection, this value was significantly decreased in comparison with the control group during phase V (Figure 8A-figure supplement 1), which indicates that AjCT2 promotes feeding in A. japonicus. Correspondingly, in long-term loss-of-function experiments (newly added in the revised manuscript), the remaining bait in the siAjCTP1/2-1 group was significantly increased in comparison with the siNC group from phase II to IV (Figure 10B). The detailed descriptions of these supplementary experiments have been added to Results Section 2.6 (lines 390-396) and Materials and Methods Section 4.9 (lines 879-888).

      Secondly, after 24 days of continuous injections of siAjCTP1/2-1, we monitored the feeding behavior of these sea cucumbers over three consecutive days. Each day, we removed residual bait and feces, then repositioned fresh food at the tank center. We calculated the aggregation percentage (AP) of sea cucumbers around the food during the feeding peak (2:00-4:00) each day, which is the most reliable indicator of feeding behavior in this species. The results showed that the AP in the siAjCTP1/2-1 group was significantly lower than that in the control group. Post-dissection observations revealed reduced intestinal food content and significant intestinal degeneration in the siAjCTP1/2-1 group (the figure has been added below). These results indicate that long-term functional loss of AjCT2 reduces food intake and influences the feeding behavior of A. japonicus.

      In response to the comment regarding “No statistically significant differences were reported in these assays”, we have modified the figures to clearly visualize the differences and added statistical test details in both the figure legends and the Materials and methods section “4.10 Statistical analysis” (lines 900–910).

      Author response image 1.

      The feeding behavior of A. japonicus after long-term loss-of-function of AjCT2. (A) A record of feeding behavior. The red arrow refers to the food and the red box represents the feeding area. The numbers in the figure represent individuals entering the feeding area. (B) The aggregation percentage (AP) of sea cucumbers around the food during the feeding peak (2:00-4:00) (n=3 days). (C) The degenerated intestine of sea cucumber after 24 days of siAjCTP1/2-1 injection. Data in the graph represent the mean ± standard deviation. *Significant differences between groups (p < 0.05). Control: siNC injection group; CT-SiRNA: siAjCTP1/2 injection group.
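
      As a minimal sketch of how the aggregation percentage in panel B could be computed, the snippet below assumes that AP is the share of individuals counted inside the feeding area relative to all individuals in the tank, summarized as mean ± standard deviation over the observation days. That definition, the function name and the counts are illustrative assumptions, not values or code from the study.

```python
import statistics


def aggregation_percentage(in_feeding_area: int, total: int) -> float:
    """Return the percentage of individuals inside the feeding area.

    Assumed definition: 100 * (individuals in feeding area) / (all individuals).
    """
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= in_feeding_area <= total:
        raise ValueError("in_feeding_area must be between 0 and total")
    return 100.0 * in_feeding_area / total


# Hypothetical counts (in feeding area, total) for three observation days.
daily_counts = [(6, 10), (7, 10), (5, 10)]

daily_ap = [aggregation_percentage(n, total) for n, total in daily_counts]
mean_ap = statistics.mean(daily_ap)   # mean over the three days
sd_ap = statistics.stdev(daily_ap)    # sample standard deviation (n - 1)

print(f"AP = {mean_ap:.1f} ± {sd_ap:.1f} %")
```

      With n = 3 days per group, the sample standard deviation is used here; whether the original analysis used the sample or population form is not stated.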

      (3) Overall, details regarding statistical analyses are not (clearly) specified in the manuscript, and there are several instances where statements are not supported by literature evidence.

      Thank you for the reviewer’s comments. Again, we sincerely apologize for the confusion caused. To clarify, statistical tests were performed for all experiments. However, the original labeling may have been somewhat messy. We have revised all figures to enhance the visibility of differences and provided detailed statistical test information in both the figure legends and the Materials and Methods section titled “4.10 Statistical Analysis” (lines 900–910). Additionally, we have supplemented the revised manuscript with further literature evidence to support our statements: (1) citations to Furuya et al. (2000), Johnson et al. (2005), Jékely (2013) and Mirabeau et al. (2013) have been added to clarify the foundational studies on DH31 and DH31 receptors in invertebrates (lines 73-74); (2) Conzelmann et al. (2013) and Furuya et al. (2000) were cited to validate the presence of two different types of CT-related peptides in protostomes: CT-type peptides (with an N-terminal disulphide bridge) and DH31-type peptides (lacking this feature) (lines 78-79); (3) Johnson et al. (2005) was referenced to support the dual ligand-receptor interactions of DH31 in Drosophila, specifically its binding to both CG17415 (a CTR/CLR-related protein) and CG13758 (the PDF receptor) (line 94); (4) Johnson et al. (2005) and Goda et al. (2019) were cited to reinforce the functional significance of dual DH31 receptor pathways in Drosophila, as extensively studied in prior research (lines 95-97).

      Reviewer #2 (Public review):

      Weaknesses:

      (1) The authors claim that A. japonicus CTs activate "PDF" receptors and suggest that this cross-talk is evolutionarily ancient since a similar phenomenon also exists in the fly Drosophila melanogaster. These conclusions are not fully supported for several reasons. The authors perform phylogenetic analysis to show that the two "PDF" receptors form an independent clade. This clade is sister to the clade comprising CT receptors. This phylogenetic analysis suffers from several issues. Firstly, the phylogenies lack bootstrap support. Secondly, the resolution of the phylogeny is poor because representative members from diverse phyla have not been included. For instance, insect or other protostomian PDF receptors have not been included so how can the authors distinguish between "PDF" receptors or another group of CT receptors? Thirdly, no in vivo evidence has been presented to support that CT can activate "PDF" receptors in vivo.

      We thank the reviewers for their constructive comments. As suggested, we expanded our taxon sampling to include more representative members across diverse phyla and reanalyzed the phylogenetic relationships (including bootstrap tests) in Figure 1C. The revised analysis revealed two distinct clades: one containing CTR/CLR-type receptors and the other PDF-type receptors. Specifically, AjCTR clustered within the CTR/CLR-type receptor group, while AjPDFR1 and AjPDFR2 were placed in the PDF-type receptor clade. The full species names for all taxa are provided in Supplementary Table 2.

      To provide in vivo evidence supporting CT-mediated activation of "PDF" receptors, we conducted the following experiments: Firstly, we confirmed that AjPDFR1 and AjPDFR2 were functional receptors of AjCT1 and AjCT2 (Figure 2, 3 and 4). Secondly, injection of AjCT2 and siAjCTP1/2-1 in vivo induced corresponding changes in AjPDFR1 and AjPDFR2 expression levels in the intestine (Figure 8C, 9A, 9B and 9C).

      (2) The source of CT which mediates the effects on longitudinal muscles and intestine is unclear. Is it autocrine or paracrine signaling by CT from the same tissue or is it long-range hormonal signaling?

      Thank you for this feedback. We have now analysed CT-type neuropeptide expression in A. japonicus using immunohistochemistry with the antiserum to the A. rubens CT-type peptide ArCT, which has previously been shown to cross-react with CT-type neuropeptides in other echinoderms (Aleotti et al., 2022). We have added related descriptions in the following sections: Results (section 2.4, lines 299-336), Discussion (section 3.3, lines 545-554) and Materials and methods (section 4.6, lines 785-817). Consistent with this previous finding, the ArCT antiserum labelled neuronal cells and fibers in the central and peripheral nervous system and in the digestive system of A. japonicus (Figure 6). The specificity of immunostaining was confirmed by performing pre-absorption tests with the ArCT antigen peptide (Figure 6-figure supplement 1). The detection of immunostaining in the innervation of the intestine is consistent with PCR results and the relaxing effect of AjCT2 on intestine preparations. Interestingly, no immunostaining was observed in longitudinal muscle, which is inconsistent with the detection of AjCT1/2 transcripts in this tissue. This may reflect differences in the sensitivity of the methods employed to detect transcripts (PCR) and mature peptide (immunohistochemistry). The absence of ArCT-like immunoreactivity in the longitudinal muscles suggests that AjCT1 and AjCT2 may exert relaxing effects on this tissue in vivo via hormonal signaling mechanisms. However, because AjCT1/2 expression in the longitudinal muscles may be below the detection threshold of the ArCT antibodies, we cannot rule out the possibility that AjCT1/2 are released within the longitudinal muscles physiologically.

      (3) Pharmacology experiments showing the effects of CT1 and CT2 on ACh-induced contractions were performed. Sample traces have been provided but no traces with ACh alone have been included. How long do ACh-induced contractions persist? These controls are necessary to differentiate between the eventual decay of ACh effects and relaxation induced by CT1 and CT2. The traces also do not reflect the results portrayed in dose-response curves. For instance, in Figure 6B, maximum relaxation is reported for 10-6M. Yet, the trace hardly shows any difference before and after the addition of 10-6M peptide. The maximum effect in the trace appears to be after the addition of 10-8M peptide.

      Thank you for the reviewer’s comments. As requested, we have included representative traces of ACh-induced contraction of longitudinal muscle and intestinal preparations (Figure 7-figure supplement 1B and 1C). Notably, the positive control (ACh) maintained contraction effects for at least 15 minutes, consistent with its known pharmacological properties. Regarding Figure 7B (previous Figure 6B), the trace illustrates the cumulative effects of successive neuropeptide treatments at increasing concentrations. A gradual reduction in response amplitude was observed at the highest peptide concentration, likely reflecting receptor desensitization, a phenomenon previously reported for neuropeptide Y and oxytocin (Tsurumaki et al., 2003; Arrowsmith and Wray, 2014). These results are now explicitly described in the Results Section 2.5 (lines 340-345 and 348-352) and discussed in Section 3.3 (lines 569-574). In response to the reviewer’s suggestion, we further tested the pharmacological effects of AjCT2 at 10⁻⁶ M. As shown in Figure 7-figure supplement 1A, this concentration induced maximal relaxation, confirming its dose-dependent efficacy.

      (4) I am unsure how differences in wet mass indicate feeding and growth differences since no justification has been provided. Couldn't wet mass also be influenced by differences in osmotic balance, a key function of calcitonin-like peptides in protostomian invertebrates? The statistical comparisons have not been included in Figure 7B.

      We appreciate the reviewer's insightful comments. We fully concur that wet mass constitutes an inadequate indicator for evaluating feeding and growth variations. Consequently, we reassessed A. japonicus growth parameters using two established metrics, weight gain rate (WGR) and specific growth rate (SGR), to delineate differences between experimental and control groups. Notably, the high-concentration AjCT2 injection group exhibited statistically significant increases in both WGR and SGR relative to controls (Figure 8A). This demonstrates a putative physiological role of AjCT2 signaling in enhancing feeding efficiency and growth performance in A. japonicus. Detailed methodologies are provided in the Materials and methods Section 4.8 (lines 847-851), with corresponding results presented in the Results Section 2.6 (lines 370-375). In addition, Cong et al. (2024) reported a holotocin-induced osmoregulatory function in A. japonicus, manifested by significant wet weight elevation and body bloating. However, our AjCT2 intervention showed no such phenotypic alterations, suggesting that AjCT2 likely does not participate in osmotic balance regulation, at least under these experimental conditions. Crucially, the observed WGR and SGR enhancements following AjCT2 administration were not caused by osmoregulatory effects.
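
      For readers unfamiliar with the two growth metrics, the sketch below uses the definitions that are conventional in aquaculture studies: WGR as the percentage increase over initial body weight, and SGR as the daily percentage change of log-transformed body weight. It is an assumption that these are the exact formulas given in Materials and methods Section 4.8; the function names and the example weights are hypothetical.

```python
import math


def weight_gain_rate(initial_weight: float, final_weight: float) -> float:
    """WGR (%) = 100 * (Wt - W0) / W0, using the conventional definition."""
    if initial_weight <= 0:
        raise ValueError("initial_weight must be positive")
    return 100.0 * (final_weight - initial_weight) / initial_weight


def specific_growth_rate(initial_weight: float, final_weight: float, days: float) -> float:
    """SGR (%/day) = 100 * (ln Wt - ln W0) / t, using the conventional definition."""
    if initial_weight <= 0 or final_weight <= 0:
        raise ValueError("weights must be positive")
    if days <= 0:
        raise ValueError("days must be positive")
    return 100.0 * (math.log(final_weight) - math.log(initial_weight)) / days


# Hypothetical example: an animal growing from 100 g to 120 g over 24 days.
wgr = weight_gain_rate(100.0, 120.0)
sgr = specific_growth_rate(100.0, 120.0, 24)
print(f"WGR = {wgr:.1f} %, SGR = {sgr:.3f} %/day")
```

      Because SGR is computed on log-transformed weights, it is less sensitive than WGR to differences in initial body size between groups.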

      (5) While the authors succeeded in knocking down CT, the physiological effects of reduced CT signaling were not examined.

      Thank you for the reviewer’s comment. Following the reviewer’s suggestions, we have added experiments to investigate the physiological effects of long-term reduced CT signaling, including measuring the dry weight of remaining bait and excrement, calculating the weight gain rate and specific growth rate, and testing the expression levels of three growth factors (AjMegf6, AjGDF-8 and AjIgf) to further assess AjCT2’s role in feeding and growth. The results demonstrated that weight gain rate and specific growth rate in the siAjCTP1/2-1 group were significantly decreased (Figure 10A). Correspondingly, except in phase I, the siAjCTP1/2-1 group exhibited a significant increase in remaining bait and a decrease in excrement during phases II-VI (Figure 10B). Furthermore, the growth inhibitory factor AjGDF-8 was significantly up-regulated and the growth promoting factor AjMegf6 was significantly down-regulated in the siAjCTP1/2-1 group (Figure 10C). These findings further support the potential physiological role of AjCT2 signaling in promoting feeding and growth in A. japonicus. The added results are presented in Figure 10, with related descriptions in Section 2.6 (Results, lines 390-396), Section 3.4 (Discussion, lines 597-603) and Section 4.9 (Materials and Methods, lines 879-888).

      Reviewer #1 (Recommendations for the authors):

      (1) The abstract states that loss-of-function tests (RNAi knockdown) reveal a potential physiological role for AjCT2 signaling in promoting feeding and growth in A. japonicus. However, RNAi knockdown was only followed by analysis of transcript expression of CT-like receptors and not by the assessment of feeding or growth.

      Thank you for this helpful feedback. In the revised manuscript, we have supplemented the experiments to investigate the physiological effects of long-term reduced CT signaling, as suggested by the reviewer. These include measuring the dry weight of remaining bait and excrement, calculating the weight gain rate and specific growth rate, and testing the expression levels of the three growth factors (AjMegf6, AjGDF-8 and AjIgf) to further assess the function of AjCT2 on feeding and growth in A. japonicus. The results are as follows:

      (1) The weight gain rate and specific growth rate in the siAjCTP1/2-1 group were significantly decreased (Figure 10A).

      (2) Correspondingly, except in phase I, the siAjCTP1/2-1 group had significantly increased remaining bait and decreased excrement during phases II-VI (Figure 10B).

      (3) The growth inhibitory factor AjGDF-8 was significantly up-regulated, while the growth promoting factor AjMegf6 was significantly down-regulated in the siAjCTP1/2-1 group (Figure 10C).

      These findings further support the potential physiological role of AjCT2 signaling in promoting feeding and growth in A. japonicus. We have incorporated these results into Figure 10 and added related descriptions in the following sections: Results (section 2.6, lines 390-396), Discussion (section 3.4, lines 597-603) and Materials and methods (section 4.9, lines 879-888).

      Regarding the original statement in the abstract (“Furthermore, in vivo pharmacological experiments and loss-of-function tests revealed a potential physiological role for AjCT2 signaling in promoting feeding and growth in A. japonicus.”), we consider that this sentence effectively summarizes our findings. Therefore, we have retained it in the revised manuscript while supplementing the missing experimental details as requested.

      (2) Information on the statistical tests that were performed is lacking for most experiments. It is recommended to include this information in the figure legends, in addition to the methods section. Details on the phylogenetic analysis (parameters and statistics used) and calculation of half maximal effective concentrations (calculation methods and confidence intervals) also need to be included in the manuscript.

      Thank you for this constructive feedback. As the reviewer suggested, statistical test information has been incorporated into both the figure legends and the “4.10 Statistical Analysis” subsection of the Materials and methods (lines 900-910). Specifically:

      (1) Phylogenetic analysis details (parameters and statistical approaches) are now provided in the Materials and methods section 4.2 (lines 675-682);

      (2) Bootstrap test results supporting the phylogenetic trees have been added to Figure 1B and 1C;

      (3) Half-maximal effective concentration (EC₅₀) calculations, including methodologies and confidence intervals, are documented in both the Figure 2B legend and the “4.10 Statistical Analysis” section (lines 900-910).

      (3) In some figures (e.g. Figure 5A, 7A), the n number indicated does not match the number of data points shown in the figure panel. It is not clear what n represents here. In Figure 6B, an x-axis label is missing. In some figure legends (e.g. Figure 4 - Figure Supplement 1), the error bars and significance levels are not defined.

      We apologize for this error; we have corrected all errors in the "n" values given in the manuscript’s figure legends. In addition, the x-axis label has been added to Figure 7B (previously Figure 6B), and error bars and significance levels are now clearly defined in all figure legends.

      (4) It would be useful to explain what the difference is between the Cre and SRE luciferase assay and why these two assays were used to study receptor-activated signaling cascades. The source of the synthetic peptides is mentioned, but it is recommended to also state the purity of the synthetic peptides.

      Thank you for the valuable comments. As stated in the introduction (lines 66-69), “binding of CT to CTR in the absence of RAMPs can activate signaling via several downstream pathways, including cAMP accumulation, Ca²⁺ mobilization, and ERK activation.” Based on this established mechanism, we selected the cAMP and Ca²⁺ signaling pathways as biomarkers for studying receptor-activated cascades, with the following experimental rationale: the CRE-Luc reporter system functions as a cAMP response element detector, and the SRE-Luc reporter system serves as an indicator of intracellular Ca²⁺ levels. In the CRE-Luc assay, when the receptor is activated by a ligand, it couples with Gαs protein to activate the cAMP/PKA signaling pathway. The accumulation of cAMP can lead to the phosphorylation of PKA, which then enhances the transcription of CRE-containing genes. Therefore, a significant increase in CRE-Luc activity directly correlates with cAMP accumulation. Similarly, SRE-Luc activity reflects dynamic changes in intracellular Ca²⁺ levels. We have added this explanation to the Materials and methods section 4.4 (lines 715-721). The purity of the synthetic peptides was >95%, and we have also added this information to section 4.4 (line 715), as the reviewer suggested.

      (5) In Figure 3B, it is difficult to see receptor internalization in response to the application of synthetic CT-like peptides, and a control condition (without peptide application) is lacking.

      Thank you for the reviewer’s comment. The control condition (without peptide application) was added in Figure 3-figure supplement 1, which shows the localization of pEGFP-N1/receptors in the cell membrane. Upon stimulation with synthetic CT-like peptides (Materials and methods section 2.3), the receptors exhibit clear internalization into the cytoplasm, as visualized in Figure 3B through comparative analysis.

      (6) Differences in the activation of downstream signaling cascades between the three receptors are questionable because there is substantial variation in the experimental data and control conditions in different experiments (for example, in Figures 3A and 4A). To better represent this variation, it is recommended to plot individual data points onto the bar graphs in all figures and to nuance the interpretation of putative differences in downstream signaling of different receptors. Differences in the physiological roles of CT-like peptides may be explained by various mechanisms, including differences in peptide/receptor expression or in the potency of peptides to activate different receptors in vivo. It would be useful to elaborate on these different explanations in the discussion.

      We appreciate the reviewer's critical assessment. The observed variations in control conditions across experiments (e.g., Figures 3A and 4A) primarily arise from two methodological factors: ① Each experimental set used cells transfected with distinct receptor subtypes (e.g., AjPDFR1 vs. AjPDFR2), inherently introducing baseline variability due to differential receptor expression profiles. ② Independent cell culture batches were employed for replicate experiments to ensure biological reproducibility. Importantly, these minor variations did not compromise the statistical significance of downstream signaling differences (p < 0.01 for all comparative analyses). Following the reviewer’s suggestion, we have plotted individual data points onto the bar graphs in all figures.

      In addition, following the reviewer’s suggestion, we have expanded the discussion on receptor-specific signaling cascades in Section 3.4 (lines 589-609). Key findings include the following. In vivo pharmacological assays demonstrated that only high concentrations of AjCT2 significantly enhanced feeding and growth rates in A. japonicus. In contrast, neither a low concentration of AjCT2 nor any concentration of AjCT1 (low or high) induced detectable effects. Furthermore, long-term knockdown of AjCTP1/2 further validated the essential role of AjCT2 in regulating feeding and growth in this species. To elucidate the receptor mediating AjCT2’s feeding- and growth-promoting effects, we selected AjPDFR2 based on its distinct activation profile: AjCT2 selectively activated AjPDFR2, inducing downstream ERK1/2 phosphorylation, whereas AjCT1 exhibited no activity toward this receptor. Given this receptor specificity, we performed AjPDFR2 knockdown experiments, which revealed phenotypic changes consistent with those in AjCTP1/2 knockdown animals, including significantly reduced WGR and SGR, alongside increased remaining bait accumulation and diminished excrement output compared to control. Collectively, these results support a model wherein AjCT2 promotes feeding and growth in A. japonicus via AjPDFR2-dependent activation of the cAMP/PKA/ERK1/2 and Gαq/Ca²⁺/PKC/ERK1/2 cascades. Considering the inherent complexity of neuropeptide signaling systems, which involve multiple GPCR subtypes coupled to diverse signaling cascades, ligands bound to the same receptor may activate distinct G protein subforms within a single cell (Møller et al., 2003; Mendel et al., 2020). Receptor activation modes may be modulated by structural polymorphisms or binding site diversity (Wong et al., 2000; Changeux, 2010), as well as by the differential efficacy of peptides in activating receptors in vivo.

      (7) For the peptide injection experiments, it is recommended to explain the different animal groups in the results section. In addition, injection in the control condition seems to have a small effect on the wet weight. Therefore, it would be useful to compare control-injected and peptide-injected groups after injection.

      Thank you for the reviewer’s comments. We have provided an expanded explanation of the animal group classifications in Section 2.6 (lines 367–375). We fully agree that a comparative analysis between the experimental and control groups post-injection is essential. However, since wet weight measurement is suboptimal for demonstrating feeding and growth variations, we re-evaluated the data using two validated metrics: weight gain rate (WGR) and specific growth rate (SGR) of A. japonicus. The results revealed that the high-concentration AjCT2 injection group exhibited significantly elevated weight gain rate and specific growth rate compared to the control group, suggesting a potential role of AjCT2 signaling in promoting feeding and growth in A. japonicus. These results are presented in Figure 8A, with detailed descriptions in Results Section 2.6 (lines 370–375) and methodology in Materials and Methods Section 4.8 (lines 847-851).
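
      For readers unfamiliar with these two metrics, the sketch below shows how WGR and SGR are commonly computed in aquaculture studies. It assumes the conventional definitions, WGR (%) = (Wt - W0) / W0 × 100 and SGR (%/day) = (ln Wt - ln W0) / t × 100; the exact formulas used in the manuscript are those given in its Section 4.8 and may differ. The function names and the example weights are hypothetical and are not data from the study.

```python
import math


def weight_gain_rate(initial_weight: float, final_weight: float) -> float:
    """Weight gain rate in percent (conventional definition, assumed here)."""
    if initial_weight <= 0:
        raise ValueError("initial_weight must be positive")
    return (final_weight - initial_weight) / initial_weight * 100.0


def specific_growth_rate(initial_weight: float, final_weight: float, days: float) -> float:
    """Specific growth rate in percent per day (conventional definition, assumed here)."""
    if initial_weight <= 0 or final_weight <= 0:
        raise ValueError("weights must be positive")
    if days <= 0:
        raise ValueError("days must be positive")
    return (math.log(final_weight) - math.log(initial_weight)) / days * 100.0


if __name__ == "__main__":
    # Hypothetical weights (g) and trial length (days), for illustration only.
    w0, wt, t = 10.0, 12.0, 24
    print(f"WGR = {weight_gain_rate(w0, wt):.2f} %")
    print(f"SGR = {specific_growth_rate(w0, wt, t):.3f} %/day")
```

      Because both metrics are normalised to the starting weight, they allow groups with different initial body sizes to be compared, which a raw wet-weight comparison does not.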

      (8) Regarding the RNAi knockdown experiments, it is not clear from the methods section what the siNC control exactly is, and how the interference rate is calculated.

      Thank you for this comment. The siNC control was an siRNA that does not target any gene in A. japonicus, and interference rates were quantified by the 2^(-ΔΔCT) method to assess siRNA inhibition efficiency. These methodological details have been incorporated into Materials and Methods Section 4.9 (lines 866-867 and 874-876) for enhanced clarity.
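
      As an illustration of the 2^(-ΔΔCT) calculation, the sketch below computes the relative expression of a target gene in an siRNA-treated sample against the siNC control, each normalised to a reference gene. Expressing the interference rate as (1 - relative expression) × 100 is an assumption made here for illustration; the manuscript's Section 4.9 gives the definition actually used. The function names and Ct values are hypothetical.

```python
def relative_expression(ct_target_treated: float, ct_ref_treated: float,
                        ct_target_control: float, ct_ref_control: float) -> float:
    """Relative expression of the target gene by the 2^(-ddCt) method."""
    delta_ct_treated = ct_target_treated - ct_ref_treated   # normalise treated sample
    delta_ct_control = ct_target_control - ct_ref_control   # normalise control sample
    delta_delta_ct = delta_ct_treated - delta_ct_control
    return 2.0 ** (-delta_delta_ct)


def interference_rate(rel_expr: float) -> float:
    """Knockdown efficiency in percent (assumed definition: 1 - relative expression)."""
    return (1.0 - rel_expr) * 100.0


if __name__ == "__main__":
    # Hypothetical Ct values, for illustration only.
    rel = relative_expression(ct_target_treated=26.0, ct_ref_treated=18.0,
                              ct_target_control=24.0, ct_ref_control=18.0)
    print(f"relative expression = {rel:.2f}")
    print(f"interference rate   = {interference_rate(rel):.1f} %")
```

      In this hypothetical case the target Ct rises by two cycles relative to the reference gene, which corresponds to a four-fold reduction in transcript abundance.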

      Reviewer #2 (Recommendations for the authors):

      (1) Both the phylogenies are missing bootstrap tests. Please include this analysis. The phylogenetic analyses should also include other Family B ligands and receptors from both vertebrates and invertebrates because it is widely assumed that PDF is related to VIP given their shared roles in circadian clock and gut regulation. Therefore, this analysis needs to be more comprehensive than currently presented. Drosophila melanogaster receptors have also been excluded in spite of the Drosophila PDFR exhibiting ligand promiscuity. The legend should also include the full species names of the various taxa (or modify the figure to include full names) instead of referring to another table. The supplementary table was not available to this reviewer.

      Thank you for the reviewer’s constructive comments. According to the reviewer’s suggestion, we have incorporated the VIPRs and Drosophila melanogaster receptors into the comparative analysis and reanalyzed the phylogenies in Figure 1C, and both phylogenies included bootstrap tests (Figure 1B, 1C) in the revised manuscript. The full species names of the various taxa are listed in supplementary tables 1 and 2 in the revised manuscript.

      (2) Expression data indicate that AjCTP1/2 is expressed in both the longitudinal muscles and intestine. What are the cell types that express AjCTP1/2? Given that the authors show an effect of CT1 and CT2 on both of these tissues, it would be important to know whether this is local regulation (paracrine or autocrine) vs long-distance hormonal control by the nervous system. This can be addressed by performing in situ hybridization or immunohistochemistry of CT (using Asterias rubens CT antibody: https://doi.org/10.3389/fnins.2018.00382) on these tissues.

      Thank you for this feedback. We have now analysed CT-type neuropeptide expression in A. japonicus using immunohistochemistry with the antiserum to the A. rubens CT-type peptide ArCT, which has previously been shown to cross-react with CT-type neuropeptides in other echinoderms (Aleotti et al., 2022). We have added related descriptions in the following sections: Results (section 2.4, lines 299-336), Discussion (section 3.3, lines 545-554) and Materials and methods (section 4.6, lines 785-817). Consistent with this previous finding, the ArCT antiserum labelled neuronal cells and fibers in the central and peripheral nervous system and in the digestive system of A. japonicus (Figure 6). The specificity of immunostaining was confirmed by performing pre-absorption tests with the ArCT antigen peptide (Figure 6-figure supplement 1). The detection of immunostaining in the innervation of the intestine is consistent with PCR results and the relaxing effect of AjCT2 on intestine preparations. Interestingly, no immunostaining was observed in longitudinal muscle, which is inconsistent with the detection of AjCT1/2 transcripts in this tissue. This may reflect differences in the sensitivity of the methods employed to detect transcripts (PCR) and mature peptide (immunohistochemistry). The absence of ArCT-like immunoreactivity in the longitudinal muscles suggests that AjCT1 and AjCT2 may exert relaxing effects on this tissue in vivo via hormonal signaling mechanisms. However, because AjCT1/2 expression in the longitudinal muscles may be below the detection threshold of the ArCT antibodies, we cannot rule out the possibility that AjCT1/2 are released within the longitudinal muscles physiologically.

      (3) While Drosophila DH31 can activate both PDF and DH31 receptors, the EC50 values differ drastically. Importantly, there is an independent gene encoding PDF which is a more sensitive ligand for the PDF receptor. This is in stark contrast to the situation presented here where the authors have yet to identify the PDF gene in their system. Outside Drosophila this cross signaling between the two systems has not been observed in any species. Based on this, I would argue that the ability of CTs to activate PDFR is not an evolutionary ancient property but rather an example of convergent evolution if supported by more evidence.

      We sincerely appreciate the reviewer's insightful comments. We agree that we cannot rule out the possibility that the ability of CT-type peptides to activate PDF-type receptors in Drosophila and A. japonicus has arisen independently. Therefore, we have modified the text in the discussion accordingly so that this alternative explanation for the effects of CT-type peptides on PDF-type receptors is also presented: “Alternatively, the ability of CT-type neuropeptides to act as ligands for PDF-type receptors in D. melanogaster and A. japonicus may have evolved independently. Further studies on a wider variety of both protostome (e.g. molluscs, annelids) and deuterostome taxa (e.g. other echinoderms, hemichordates) are needed to address this issue.”

      (4) AjCT1 and CT2 can activate the two PDF receptors ex vivo. However, their EC50 values are larger and the responses are lower compared to those seen for the CT receptor. Similar cross-talk between closely related peptide families is often observed in ex vivo systems (see: https://doi.org/10.1016/j.bbrc.2010.11.089 , https://doi.org/10.1073/pnas.162276199 , https://doi.org/10.1093/molbev/mst269 and others). However, very few signaling systems exhibit this type of cross-talk in vivo. Without any in vivo evidence, I suspect that the more likely possibility is that the bona fide endogenous ligand for PDF receptors remains to be discovered. The authors could, however, perform peptide and receptor knockdown experiments and show overlap in phenotypes following CT knockdown and PDFR knockdown to support their claim.

      We sincerely appreciate the reviewer's insightful critique. According to the reviewer’s suggestion, we have added AjCTP1/2 and AjPDFR2 knockdown experiments, in which we measured the dry weight of remaining bait and excrement and calculated the weight gain rate and specific growth rate to assess phenotypic changes. The results showed that the weight gain rate and specific growth rate were significantly decreased in both experimental groups (Figures 10A and 11B). Correspondingly, except in phase I, the siAjCTP1/2-1 group had significantly increased remaining bait and decreased excrement in phases II-VI (Figure 10B). In the siAjPDFR2-1 group, the remaining bait weight was significantly increased (except during phase I), while the weight of excrement was significantly decreased in phases V and VI (Figure 11C). Therefore, the AjCT and AjPDFR2 knockdown experiments showed overlapping phenotypes, providing evidence that AjCT does act as an endogenous ligand for PDFR. These results were added in Figures 10 and 11. The related descriptions were added in the Results sections 2.6 (lines 390-396) and 2.7 (lines 427-439) and the Materials and methods section 4.9 (lines 879-898). We acknowledge, however, that other peptides, in addition to AjCT1 and AjCT2, may also act as ligands for AjPDFR1 and AjPDFR2 in vivo, and ongoing studies in the Chen (OUC) and Elphick (QMUL) labs are attempting to address this issue.

      (5) Why are receptor transcripts upregulated following peptide injection? Usually, increased ligand levels/signaling result in a compensatory decrease in receptor levels. These negative feedback loops maintain optimum signaling levels. Since the authors have successfully implemented RNAi for this CT precursor, what are the phenotypes on growth and feeding?

      We thank the reviewer for raising these critical points. First, our findings align with established mechanisms of neuropeptide-induced receptor modulation (see Tiptanavattana et al. 2022). Second, based on the reviewer’s suggestion, we have added experiments to detect phenotypic changes in growth and feeding under long-term reduced CT signaling, including measuring the dry weight of remaining bait and excrement, calculating the weight gain rate and specific growth rate, and testing the expression levels of the three growth factors (AjMegf6, AjGDF-8 and AjIgf). The results showed that the weight gain rate and specific growth rate in the siAjCTP1/2-1 group were significantly decreased (Figure 10A). Correspondingly, except in phase I, the siAjCTP1/2-1 group had more remaining bait and less excrement in phases II-VI (Figure 10B). Furthermore, the growth inhibitory factor AjGDF-8 was significantly up-regulated and the growth promoting factor AjMegf6 was significantly down-regulated in the siAjCTP1/2-1 group (Figure 10C). We have added these results in Figure 10, with detailed descriptions in the Results section 2.6 (lines 390-396) and in the Materials and methods section 4.9 (lines 879-888). After long-term continuous injections of siAjCTP1/2-1, we further recorded the feeding behavior of these sea cucumbers for three consecutive days. The remaining bait and feces were cleaned and the food was re-placed in the middle of the tank each day. We calculated the aggregation percentage (AP) of sea cucumbers around the food during the peak feeding period (2:00-4:00) each day, which is the best indicator of sea cucumber feeding behavior. The results showed that the AP in the siAjCTP1/2-1 group was significantly lower than that in the control group.
      After dissection, we also found that the intestines of the siAjCTP1/2-1 group contained less food and were significantly degenerated (see author response image 1). All these results support the conclusion that long-term functional loss of AjCT2 negatively influences the feeding and growth of A. japonicus.
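
      The aggregation percentage mentioned above can be sketched as follows. The response does not give the formula, so this assumes the simplest reading: the number of animals counted within a defined zone around the food, divided by the total number of animals in the tank, times 100. The function name, the notion of a counting zone and the example counts are all hypothetical.

```python
def aggregation_percentage(n_near_food: int, n_total: int) -> float:
    """Percentage of animals located around the food (assumed definition)."""
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    if not 0 <= n_near_food <= n_total:
        raise ValueError("n_near_food must be between 0 and n_total")
    return n_near_food / n_total * 100.0


if __name__ == "__main__":
    # Hypothetical counts from one observation during the peak feeding period.
    print(f"AP = {aggregation_percentage(n_near_food=5, n_total=20):.1f} %")
```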

      Other comments:

      (6) What criteria do the authors use to classify some proteins as "type", some as "like" and others as "related"? In my opinion, DH31 could be referred to as CT-like or CT-type. Please use one term for clarity unless there is a scientific explanation behind this terminology.

      Thank you for the reviewer’s comment. If you look at the paper by Cai et al. (2018) you will see in Figure 14 that CT-type peptides and DH31-type peptides are paralogous, probably due to a gene duplication in the common ancestor of the protostomes. The CT-related peptides in protostomes that have a disulphide bridge we would describe as CT-type because they have conserved a feature that is found in CT-type peptides in deuterostomes. Whereas the DH31 peptides we would describe as CT-like. But there is not a formal rule on this. It is possible the duplication event that gave rise to DH31 and CT-type peptides occurred in the common ancestor of the Bilateria but DH31-type signaling was lost in deuterostomes. On the other hand, if the gene duplication that gave rise to DH31-type peptides and CT-type peptides in protostomes did occur in a common ancestor of the protostomes, then DH31 and CT-type peptides in protostomes could be described as co-orthologs of CT-type peptides in deuterostomes. In this case, both CT peptides and DH31 peptides in protostomes could be described as CT-type. Here is a useful link for explanation of terms: https://omabrowser.org/oma/type/

      (7) Was genomic DNA removal step performed before cDNA synthesis for qRT-PCR?

      Thank you for the reviewer’s comment. The genomic DNA removal step was performed before cDNA synthesis for qRT-PCR and we have added the information in the section 4.5 (line 774-776).

      (8) Line 70: The presence of calcitonin-like peptides (DH31) and DH31 receptors in invertebrates was discovered long before the discoveries by Jekely 2013 and Mirabeau and Joly 2013. Please credit these original studies: https://pubmed.ncbi.nlm.nih.gov/10841553/ and https://pubmed.ncbi.nlm.nih.gov/15781884/.

      Thank you for the reviewer’s comment. We have credited these original studies in the revised manuscript.

      (9) Lines 72-74: Please cite https://pubmed.ncbi.nlm.nih.gov/24359412/.

      Thank you for the reviewer’s comment. We have cited it in the revised manuscript.

      (10) Line 87: Please cite https://pubmed.ncbi.nlm.nih.gov/15781884/.

      Thank you for the reviewer’s comment. We have cited it in the revised manuscript.

      (11) Lines 89-91: The functional significance of DH31 signalling to PDFR in Drosophila is known. See: https://pubmed.ncbi.nlm.nih.gov/15781884/ and https://pubmed.ncbi.nlm.nih.gov/30696873/. There are several studies that have shown the functions of DH31 signalling via DH31R.

      Thank you for the reviewer’s comment. We have corrected it and added all these studies in the revised manuscript.

      (12) Figure 1 Supplement 1: The tertiary models for CT1 and CT2 look completely different. This prediction is not in line with both ligands activating the same receptor.

      Thank you for the reviewer’s comment. We have deleted this supplementary figure.

      (13) Figure 1 Supplement 3 legend: Please add panel labels next to the corresponding receptor.

      Thank you for the reviewer’s comment. We have added panel labels next to the corresponding receptors as you suggested.

      (14) Figure 2: What does CO refer to?

      Thank you for the reviewer’s comment. CO (Control) refers to the stimulation of HEK293T transfected cells with serum-free DMEM, and we have added the detailed information in Figure 2 legend (line 251-252).

      (15) Figure 3: Due to the low magnification of the cells, it is difficult to see the localization of the receptor. It would also be more appropriate to use a membrane marker rather than DAPI which does not label the cytoplasm or membrane where the receptor can be found.

      We appreciate the reviewer's insightful comment regarding the experimental controls. The baseline receptor localization data under non-stimulated conditions are presented in Figure 3-figure supplement 1, demonstrating constitutive membrane distribution of pEGFP-N1-tagged receptors. Upon stimulation with synthetic CT-like peptides, qualitative imaging analysis revealed significant ligand-induced receptor internalization into the cytoplasm (Figure 3B).

      (16) Figure 9: Please include PDF precursor and receptor as separate columns. Also, Drosophila CT/DH31 receptors have been characterized.

      Thank you for the reviewer’s comment. We have added the PDF precursor, predicted peptides and receptors as separate columns in Figure 12 of the revised manuscript. We have also corrected the erroneous summary of Drosophila CT/DH31 receptors according to your suggestions.

      (17) Table 1: It is not very clear why there are multiple columns for ERK1/2 with different outcomes.

      Thank you for the reviewer’s comment. Although cAMP/PKA or Gαq/Ca²⁺/PKC signaling is activated after ligand binding to receptors, the downstream ERK1/2 cascade is not necessarily activated. Therefore, in Table 1 we listed separately the activation status of cAMP/PKA and its downstream ERK1/2 cascade, and of Gαq/Ca²⁺/PKC and its downstream cascade. We have revised Table 1 to make it clearer in the revised manuscript.

    1. Author Response

      The following is the authors’ response to the original reviews.

      We thank the reviewers for their time and insightful and constructive comments. We are pleased that the reviewers found that this study “opens the way for novel future work” and that the findings are “interesting”. We have experimentally addressed the points raised by the reviewers and have substantially revised the manuscript by modifying 30 figure panels. The reviewers’ points are specifically addressed below.

      1) The authors concluded that an accumulation of Ly6Clo monocytes occurred in the Rbpjfl/fl Lyz2cre/cre mouse by examining the percentage of cells among CD45+ cells in Figure 1. It would be helpful if the authors could give an account of the total cell count numbers of monocyte subsets per ml of blood and in the bone marrow to give the readers a better idea of the extent of increase as cell percentages among CD45+ cells may be influenced by the number of other immune subsets.

      We thank the reviewer for raising these points. In this research, we crossed Rbpjfl/fl mice with Lyz2-Cre mice, which carry the Cre recombinase inserted in the Lysozyme-M (Lyz2) gene locus, resulting in the selective deletion of RBP-J in myeloid cells, such as monocytes, macrophages and granulocytes. We then proceeded to examine the neutrophil levels in the bone marrow and blood. The percentage of neutrophils observed was found to be similar to that of control mice, which was in line with the findings reported in the literature (Metzemaekers et al. 2020). Furthermore, the proportion of Ly6Chi monocytes in RBP-J deficient mice was found to be similar to that of control mice, which is consistent with the literature (Ginhoux et al. 2014). Based on these results, we considered that the changes observed in the proportion of Ly6Clo monocytes could reliably indicate the alterations occurring in Ly6Clo monocytes within the Rbpjfl/flLyz2cre/cre mice.

      2) The authors demonstrated no significant differences in bone marrow progenitor and monocyte numbers, therefore concluding that monocyte egress from the bone marrow did not contribute to the increase in Ly6Clo monocyte numbers in the blood (Figure 1B-D). As it is unclear what is the exact cell number increase in the blood, the changes in bone marrow monocyte numbers might be too small to be reflected in their percentage calculations. In light that CCR2 was also found to play a role in Ly6Clo monocyte homeostasis in Rbpjfl/fl Lyz2cre/cre mice, could the authors demonstrate if Rbpj-deficient Ly6Clo monocytes might be more responsive to CCL2 through transwell experiments? This would also provide readers a more in-depth mechanism of how an increase in CCR2 on Rbpj-deficient Ly6Clo monocytes leads to their accumulation in the periphery.

      The experimental results regarding the proportion of monocytes and precursor cells in the bone marrow were derived from multiple experiments. The data obtained from individual experiments as well as the final integrated data did not reveal significant differences between the control mice and Rbpjfl/flLyz2cre/cre mice. Therefore, we believed that even if there were small changes in cell numbers, these differences could still be reflected through alterations in their proportions. We attempted transwell experiments, but unfortunately, they were not technically successful. Nearly all sorted Ly6Clo monocytes attached to the transwell membrane, making it challenging to draw a conclusion regarding the responsiveness of RBP-J deficient Ly6Clo monocytes to CCL2.

      3) In the parabiosis experiment conducted in Figure 3C-E, the authors provide conclusive evidence that the accumulation of Rbpj-deficient Ly6Clo monocytes was cell intrinsic as Rbpj-deficient Ly6Clo monocytes continued to accumulate in the blood of control counterparts. Monocytes have also been shown to accumulate in the spleen and re-enter or home back to the bone marrow. Assessing if there is a change in monocyte homing abilities in Rbpj-deficient Ly6Clo monocytes by examining their numbers in the spleen and bone marrow of control parabiotic mice would substantiate their claims that the defect was cell intrinsic and provide further understanding for the readers of why Rbpj-deficient Ly6Clo monocytes accumulate in the blood.

      We thank the reviewer for bringing out this interesting point. We also analyzed the proportions of GFP- Ly6Chi monocytes and Ly6Clo monocytes in the bone marrow of parabiotic mice. The experimental results revealed that there were no significant differences in the proportion of GFP- monocytes between the control mice and the KO animals (see the figure A below). We also detected the expression of CXCR4 in bone marrow Ly6Clo monocytes. Rbpjfl/flLyz2cre/cre mice exhibited normal expression of CXCR4 (see Author response image 1 below), which participates in the homing of classical and nonclassical monocytes to bone marrow and spleen monocyte reservoirs (Chong et al. 2016). The homing abilities of RBP-J deficient Ly6Clo monocytes may not have changed.

      Author response image 1.

      4) Authors should provide cell counts for Figure 5B to demonstrate the extent CCR2 depletion affects the number of Ly6Clo monocytes in Rbpjfl/fl Lyz2cre/cre mice as explained in point 1.

      As mentioned before, we believed that the proportion of circulating monocytes could, to some extent, provide evidence of the impact of CCR2 deficiency on Ly6Clo monocytes.

      Reviewer #2

      1) The confirmation of knockout in supplemental figure 1A shows only a two third knockdown when this should be almost totally gone. Perhaps poor primer design, cell sorting error or low Cre penetrance is to blame, but this is below the standard one would expect from a knockout.

      Kang et al. (PMID: 31944217) evaluated the knockout efficiency of Rbpj in sorted colonic macrophages of Rbp-jfl/flLyz2cre/cre mice using qPCR and immunoblotting. The qPCR result indicated a two-thirds knockdown, while the immunoblotting results demonstrated efficient deletion of RBP-J protein in Rbp-jfl/flLyz2cre/cre mice. As pointed out by the reviewer, the observed two-thirds knockdown, which is lower than the expected complete knockout, may be attributed to primer design.

      2) Many figures (e.g. 1A) only show proportional data (%) when the addition of cell numbers would also be informative

      We thank the reviewer for raising this point. Indeed, multiple articles studying monocytes only show changes in cell proportions. As mentioned above, we believe that analyzing the proportion of circulating monocytes can offer valuable evidence of the influence of RBP-J deficiency on Ly6Clo monocytes.

      3) Many figures only have an n of 1 or 2 (e.g. 2B, 2C)

      Here, we employed annexin V (AnnV) and propidium iodide (PI) staining to evaluate apoptosis and cell death in Ly6Chi and Ly6Clo blood monocytes from control and RBP-J deficient mice. The results showed no significant difference in the levels of apoptosis and cell death between the two groups (see Author response image 2 below). The statistical data for Ki-67 expression were obtained from multiple experiments, and Ki-67 expression showed no significant difference between the control and RBP-J deficient mice (see the figure B below). In Figure 2C, each dot represents 2-3 mice, and there were no differences observed between control and RBP-J deficient mice at multiple time points during the repeated measurements.

      Author response image 2.

      4) Sometimes strong statements were based on the lack of statistical significance, when more n number could have changed the interpretation (e.g. 2G, 3E)

      We have derived the corresponding conclusions based on the observed experimental results.

      5) There is incomplete analysis (e.g. Network analysis) and interpretation of RNAsequencing results (figure 4), the difference between the genotypes in both monocyte subsets would provide a more complete picture and potentially reveal mechanisms

      We thank the reviewer for raising this point. We agree that a more comprehensive analysis, including a comparison between the genotypes in both monocyte subsets, would provide a deeper understanding and potentially uncover underlying mechanisms. Having observed alterations in blood Ly6Clo monocytes in RBP-J deficient mice, we focused primarily on analyzing the differentially expressed genes within this subset of monocytes to gain further insights into its specific characteristics and behavior. We also uploaded the sequencing data sets to the Gene Expression Omnibus under accession number GSE208772 so that interested researchers can access and download the data.

      6) The experiments in Figures 5 and 7 are missing a control (Lyz2cre/cre Ccr2RFP/RFP or the Rbpj+/+ versions) and may have been misinterpreted. For example if the control (RBP-J WT, CCR2 KO) was used then it would almost certainly show falling Ly6C low numbers compared to RBP-J WT CCR2 WT, but RBP-J KO CCR2 KO would still have more Ly6c low monocytes than RBP-J WT, CCR2 KO - meaning that the RBP-J function is independent of CCR2. I.e. Ly6c low numbers are mostly dependent on CCR2 but this is irrespective of RBP-J.

      The diminished Ly6Clo monocytes in Rbpjfl/flLyz2cre/creCcr2RFP/RFP (DKO) mice can be divided into two distinct subpopulations: one portion originates from Ly6Chi monocytes, while the other comprises Ly6Clo monocytes characterized by heightened CCR2 expression. The Ly6Clo monocytes that remain in DKO mice exhibit CCR2 expression levels within the normal range when compared to Lyz2cre/cre mice, but lower levels compared to RBP-J deficient mice (Figure 5A). These findings suggest that RBP-J exerts regulatory influence over Ly6Clo monocytes, at least in part, through CCR2.

      7) Figure 6 was difficult to interpret because of the lack of shown gating strategy. This reviewer assumes that alveolar macrophages were gated out of analysis

      The gating strategy of lung interstitial macrophage in the manuscript Figure 6 was consistent with the published work (Schyns et al, cited in the manuscript). We also measured alveolar macrophages (AM) from control and RBP-J deficient mice bronchoalveolar lavage fluid. At the resting state, RBP-J deficient mice exhibited normal AM frequency and number (see Author response image 3 below).

      Author response image 3.

      8) The statements around Figure 7 are not completely supported by the evidence, i) a significant proportion of CD16.2+ cells were CCR2 independent and therefore potentially not all recently derived from monocytes, and ii) there is nothing to suggest that the source was not Ly6C high monocytes that differentiated - the manuscript in general seems to miss the point that the source of the Ly6C low cells is almost certainly the Ly6C high monocytes - which further emphasises the importance of both cells in the sequencing analysis

      Schyns et al and Sabatel et al showed that the numbers of IM and CD16.2+ were similar in Ccr2 sufficient and Ccr2-/- mice, demonstrating that CD16.2+ cells were Ccr2 independent. The number of CD16.2+ cells was significantly reduced in Rbpjfl/flLyz2cre/creCcr2RFP/RFP mice as compared to Rbpjfl/flLyz2cre/cre mice, in line with the decreased number of lung Ly6Clo monocytes and blood Ly6Clo monocytes, showing that CD16.2+ cells depended on Ccr2 for their presence in Rbpjfl/flLyz2cre/cre mice.

      9) The authors did not refer to or cite a similar 2020 study that also investigated myeloid deletion of Rbpj (Qin et al. 2020 - https://doi.org/10.1096/fj.201903086RR). Qin et al identified that Ly6Clo alveolar macrophages were decreased in this model - it is intriguing to synthesise these two studies and hypothesise that the ly6c low monocytes steal the lung niche, but this was not discussed

      We thank the reviewer for bringing this study to our attention. According to their findings, myeloid-specific RBP-J deficiency resulted in a decrease in Ly6CloCD11bhi alveolar macrophages but an increase in Ly6CloCD11blo alveolar macrophages after bleomycin treatment, while the total number of alveolar macrophages showed no significant difference. These results suggest that RBP-J may play a role in regulating the balance between these specific alveolar macrophage subsets in response to bleomycin-induced injury, without affecting the overall population of alveolar macrophages. This may be different from what we observe in interstitial macrophages under resting conditions.

      Reviewer #3

      1) It is curious that the authors do not see the increase in circulating monocytes reflected in the spleen however, the n-number is 2. Increasing the n-number would enable the author to understand the data which is not interpretable at the moment. There are multiple other places in which a low n-number makes it hard to fully understand the biology (eg Figure 2C&E)

      Although we only counted the number of splenic monocyte subsets in two mice, the proportion of splenic monocyte subsets was calculated from a larger number of mice in our study.

      2) Given that Ly6Clow monocytes are thought to be longer lived than Ly6C+ and there is still considerable labelling of Ly6Clow monocytes at the end of the 96 hours analysed in the EdU experiment, it is not possible to determine from the data here whether RBPJ deficiency increases life span. Could it be that differences in %EdU+ cells would only be seen at later time points? If the timeline was extended, could it be that differences in %EdU+ become apparent

      Based on the latex bead experiment, we observed that the presence of latex+ Ly6Clo monocytes at 7 days in control and RBP-J deficient mice did not differ, indicating that the lifespan of Ly6Clo monocytes did not increase.

      3) Similarly for the latex bead experiment. Given that there is only n=2 at the first time point and only ~30% of Ly6Clow monocytes are Latex+, it is very hard to conclusively claim that RBP-J does not influence monocyte survival or proliferation. An interesting experiment to assess whether RBP-J is increasing monocyte survival could be an adoptive transfer model in which Ly6Clow monocytes are injected into a congenic mouse and tracked over time.

      In RBP-J deficient mice, there was an increase in the proportion of Ly6Clo monocytes. We hypothesized that this lower proportion of latex+ cells might make it easier to observe differences, but clearly, in our experiment, no differences were observed between control and RBP-J deficient mice.

      4) RNA-seq: Ccr2 and Itgax are not the top hits. The authors do not investigate the top hits which may provide very interesting insight into how RBP-J influences monocyte biology.

      We thank the reviewer for raising these points. We also analyzed some of the most strongly changed genes. The top two genes in the downregulated gene list are Hes1 and Nrarp, which are regulated by the Notch pathway (Krebs et al 2001 and Radtke et al 2010). We tested blood monocytes, but the population of monocyte subsets displayed no differences between Hes1fl/flRbp-jfl/flLyz2cre/cre and Rbp-jfl/flLyz2cre/cre mice (data not shown). As shown in Figure 2- figure supplement 1A, expression of Nr4a1 showed no significant differences between control and RBP-J deficient mice. The top gene in the upregulated gene list is Erdr1, which has been reported to play a role in cellular survival (Soto et al 2017), while blood monocyte subsets in RBP-J deficient mice displayed normal survival.

      5) The PCA plot in figure 4C- it would be interesting to see where all the biological replicates fall.

      We agree with the reviewer’s assessment that observing the positions of all biological replicates on the PCA plot may indeed yield valuable insights. However, it is worth noting that the upregulated and downregulated genes also offer suggestive hints.

      6) Based on CCR2 expression and CD11c expression, monocytes from RBP-J deficient mice look more like Ly6C+ monocytes - could it be that RBP-J is increasing conversion from Ly6C+ monocytes to Ly6Clow? Or could it be that Ly6Clow monocytes are heterogeneous and RBP-J is increasing survival or conversion of one subtype of Ly6Clow monocytes but looking at all Ly6Clow monocytes together is masking this?

      Ly6Clo monocytes can be subdivided into different subpopulations depending on surface markers, such as CD43, MHC-II, CD11c and CCR2 (Jakubzick et al 2013 and Ginhoux et al. 2014). Carlin et al found that a subset of blood Ly6Clow cells was independent of both Ccr2 and Nr4a1. As the reviewer notes, Ly6Clo monocytes are heterogeneous. Therefore, there is a possibility of altered survival in a certain group of Ly6Clo monocytes.

      7) The data presented here suggest that lung CD16.2+ interstitial macrophages are derived from Ly6Clow monocytes which are increased via CCR2. Although the data are suggestive, they are not conclusive, lineage tracing and CCR2 blockade or better, conditional CCR2 deficiency would help to strengthen the claim.

      Schyns et al showed that the number of CD16.2+ was similar in Ccr2 sufficient and Ccr2-/- mice, demonstrating that CD16.2+ cells were Ccr2 independent. However, the number of CD16.2+ cells was significantly reduced in Rbpjfl/flLyz2cre/creCcr2RFP/RFP mice as compared to Rbpjfl/flLyz2cre/cre mice, in line with the decreased number of lung Ly6Clo monocytes and blood Ly6Clo monocytes. Moreover, the turnover of lung Ly6Chi and Ly6Clo monocytes was normal. These results indicate that CD16.2+ cells depended on Ccr2 for their presence in Rbpjfl/flLyz2cre/cre mice.

      8) The figures could do with more headings/ more detailed legends to help the reader, for example including what is BM, what is blood, what is spleen. Figure 2E needs the days labelled on or above the histograms.

      We thank the reviewer for raising this important point. We have now added additional detailed legends to the figure.

      9) Gating strategies should be included to help the reader understand which cells you are looking at, especially for Figure 6&7.

      The gating strategy for Figures 6 and 7 followed the method reported in the literature, which included the identification of alveolar macrophages. Additionally, we labeled the markers for cell populations in the figure.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This study aims to understand the malaria antigen-specific cTfh profile of children and adults living in a malaria holoendemic area. PBMC samples from children and adults were unstimulated or stimulated with PfSEA-1A or PfGARP in vitro for 6h and analysed by a cTfh-focused panel. Unsupervised clustering and analysis on cTfh were performed.

      The main conclusions are:

      (1) the cohort of children has more diverse (cTfh1/2/17) recall responses compared to the cohort of adults (mainly cTfh17) and

      (2) Pf-GARP stimulates better cTfh17 responses in adults, thus a promising vaccine candidate.

      Strengths:

      This study is in general well-designed and with excellent data analysis. The use of unsupervised clustering is a nice attempt to understand the heterogeneity of cTfh cells. Figure 9 is a beautiful summary of the findings.

      Weaknesses:

      (1) Most of my concerns are related to using PfSEA-1A and PfGARP to analyse cTfh in vitro stimulation response. In vitro, stimulation on cTfh cells has been frequently used (e.g. Dan et al, PMID: 27342848), usually by antigen stimulation for 9h and analysed CD69/CD40L expression, or 18h and CD25/OX40. However, the authors use a different strategy that has not been validated to analyse in vitro stimulated cTfh. Also, they excluded CD25+ cells which might be activated cTfh. I am concerned about whether the conclusions based on these results are reliable.

      It has been shown that cTfh cells can hardly produce cytokines by Dan et al. However, in this paper, the authors report the significant secretion of IL-4 and IFNg on some cTfh clusters after 6h stimulation. If the stimulation is antigen-specific through TCR, why cTfh1 cells upregulate IL-4 but not IFNg in Figure 6? I believe including the representative FACS plots of IL-4, IFNg, IL21 staining, and using %positive rather than MFI can make the conclusion more convincing. Similarly, the author should validate whether TCR stimulation under their system for 6h can induce robust BCL6/cMAF expression in cTfh cells. Moreover, there is no CD40L expression. Does this mean TCR stimulation mediated BCl6/cMAF upregulation and cytokine secretion precede CD40L expression?

      In summary, I am particularly concerned about the method used to analyse PfSEA-1A and PfGARP-specific cTfh responses because it lacks proper validation. I am unsure if the conclusions related to PfSEA-1A/PfGARP-specific responses are reliable.

      An unfortunate reality of these types of complex immunologic studies is that it takes time to optimize a multiparameter flow cytometry panel, run this number of samples, and then conduct the analysis (not to mention the time it takes for a manuscript to be accepted for peer-review). An unexpected delay, frankly, was the COVID-19 pandemic when non-essential research lab activities were put on hold. We designed our panel in 2019 and referred to the “T Follicular Helper Cells” Methods and Protocols book from Springer 2015. Obviously the field of human immunology took a huge leap forward during the pandemic as we sought to characterize components of protective immunity, and as a result there are several new markers we will choose for future studies of Tfh subsets. We agree with the reviewer that cytokine expression kinetics differ depending on the in vitro stimulation conditions. Due to small blood volumes obtained from healthy children, we were limited in the number of timepoints we could test. However, since we were most interested in IL21 expression, we found 6 hrs to be the best in combination with the other markers of interest during our optimization experiments. We did find IFNg expression from non-Tfh cells, therefore we believe our stimulation conditions worked.

      Dan et al used stimulated tonsil cells to assess the CXCR5<sup>pos</sup>PD1<sup>pos</sup>CD45RA<sup>neg</sup> Tfh and CXCR5<sup>neg</sup> CD45RA<sup>neg</sup> non-Tfh, whereas in our study we evaluated CXCR5<sup>pos</sup>PD1<sup>pos</sup>CD45RA<sup>neg</sup> Tfh from PBMCs. The PBMC work of Dan et al used EBV/CMV or other pathogen product stimuli and only gated on CD25<sup>pos</sup>OX40<sup>pos</sup> cells, which are not the cells we are assessing in our study. This might explain in part the differences in cytokine kinetics, as we evaluated CD25<sup>neg</sup> PBMCs only. However, we agree that more recent studies focused on CXCR5<sup>pos</sup>PD1<sup>pos</sup> cells included more activation-induced markers (AIM), which are missing in our study and limit the depth of our analysis.

      Percentage of positive cells and MFI are complementary data. Indeed, the percentage of positive cells only indicates which cells express the marker of interest without giving a quantitative value of this expression. MFI indicates how much the marker of interest is expressed by cells, which is important as it can indicate the degree of activation or exhaustion per cell. Meta-cluster analysis is not ideal to assess the percentage of positivity, whereas it does provide essential information regarding the intensity of expression. We added supplemental figures 14 (Bcl6 and cMAF), 15 (IFNg and IL21) and 16 (IL4 and IL21), in which positive cells were manually gated directly from the total CXCR5<sup>pos</sup>CD4<sup>pos</sup>CD45RA<sup>neg</sup>CD25<sup>neg</sup> TfH based on the FMO or negative control, and we overlaid the positive cells on the UMAP of all the CXCR5<sup>pos</sup>CD4<sup>pos</sup>CD45RA<sup>neg</sup>CD25<sup>neg</sup> meta-clusters. Results from the manual gating are consistent with the results we show using clustering. In addition, the manual gating helps to better visualize that antigen-specific IL21 expression was statistically significant in children, whereas the high background observed for adults did not reveal higher expression after stimulation, perhaps suggesting an upper threshold of cytokine expression (supplemental figure 15). The following sentence has been added in the methods at the end of the “OMIQ analysis” section: “However, the percentage of positive IFN𝛾, IL-4, IL-21, Bcl6, or cMAF using manual gating can be found in Supplemental Figures 14, 15, and 16 along with the overlay of the gated positive cells on the CD4<sup>pos</sup>CXCR5<sup>pos</sup>CD25<sup>neg</sup> UMAP and the cytoplots of the gated positive cells for each meta-cluster (Supplemental Figures 14, 15, and 16).”
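      To illustrate why the two readouts are complementary, the following minimal sketch computes both for two made-up populations. It is not the analysis pipeline used in the study: the intensity values, the threshold, and the function names `percent_positive` and `mfi` are hypothetical, and MFI is taken here as the median fluorescence intensity of all events, which is only one common convention.

```python
from statistics import median


def percent_positive(intensities, threshold):
    """Share of events (in %) whose intensity exceeds a gating threshold."""
    positive = sum(1 for value in intensities if value > threshold)
    return 100.0 * positive / len(intensities)


def mfi(intensities):
    """Median fluorescence intensity over all events in the population."""
    return median(intensities)


# Hypothetical, made-up intensities (arbitrary units) for two populations.
threshold = 100.0  # illustrative gate, e.g. placed using an FMO control
dim_positive = [50, 80, 120, 130, 140, 150]
bright_positive = [50, 80, 900, 1000, 1100, 1200]

for name, events in [("dim", dim_positive), ("bright", bright_positive)]:
    print(name, round(percent_positive(events, threshold), 1), mfi(events))
```

      In this toy case both populations have the same percentage of positive events, yet their MFI differs several-fold, which is the kind of per-cell difference in expression that the percentage alone cannot show.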

      Indeed, cMAF can be induced by TCR signaling, ICOS and IL6 (Imbratta et al., 2020). However, in our study populations, ICOS was expressed (see Author response image 1, panel A) in the absence of any stimulation, suggesting that CXCR5<sup>pos</sup>CD4<sup>pos</sup>CD25<sup>neg</sup>CD45RA<sup>neg</sup> cells were already capable of expressing cMAF. Indeed, after gating Bcl6 and cMAF positive cells based on their FMOs (Author response image 1, panel B and C, respectively), we overlaid positive cells on the CXCR5<sup>pos</sup>CD4<sup>pos</sup>CD25<sup>neg</sup>CD45RA<sup>neg</sup> cells UMAP, and we can see that most of our cells already express cMAF alone (Author response image 1, panel D) or co-express cMAF and Bcl6 (Author response image 1, panel E), confirming that they are TfH cells, whereas very few cells expressed Bcl6 alone (Author response image 1, panel F). Because we knew that cT<sub>FH</sub> already express Bcl6 and cMAF, we focused our analysis on the intensity of their expression to assess whether our vaccine candidates were inducing more expression of these transcription factors.

      Author response image 1.

      (2) The section between lines 246-269 is confusing. Line 249, comparing the abundance after antigen stimulation is improper because 6h stimulation (under Golgi stop) should not induce cell division. I think the major conclusions are contained in Figure 5e, that (A) antigen stimulation will not alter cell number in each cluster and (B) children have more MC03, 06 and fewer MC02, etc.). The authors should consider removing statements between lines 255-259 because the trends are the same regardless of stimulations.

      We agree that there is no cell division after 6h and that the different meta clusters did not proliferate after such a short in vitro stimulation. The use of the word ‘abundance’ in the context of cluster analysis refers to comparing the contribution of events by each group to the concatenated data. After the meta clusters are defined and then deconvoluted by study group, certain meta clusters could be more abundant in one group compared to another, meaning that this group contributed more events to a particular metacluster.
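      As a minimal sketch of this notion of abundance, the example below computes, for each sample, the percentage of its events assigned to each metacluster. The metacluster names are borrowed from the text, but the event counts and the function name `cluster_abundance` are hypothetical and do not reflect the study data or the OMIQ implementation.

```python
from collections import Counter


def cluster_abundance(labels):
    """Return {cluster: % of this sample's events assigned to that cluster}."""
    counts = Counter(labels)
    total = len(labels)
    return {cluster: 100.0 * n / total for cluster, n in counts.items()}


# Hypothetical per-event metacluster labels for two illustrative samples.
child_events = ["MC03"] * 6 + ["MC02"] * 2 + ["MC09"] * 2
adult_events = ["MC03"] * 2 + ["MC02"] * 6 + ["MC09"] * 2

child = cluster_abundance(child_events)
adult = cluster_abundance(adult_events)
for cluster in sorted(set(child) | set(adult)):
    print(cluster, child.get(cluster, 0.0), adult.get(cluster, 0.0))
```

      Abundance in this sense is a proportion of events, so it can differ between groups or conditions without any change in the total number of cells.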

      Dimensionality reduction is more nuanced than manual gating and reveals a continuum of marker expression between the cell subsets, as there is no hard “straight line” threshold of the kind used in 2D gating. Because of this, differences in marker expression levels after stimulation are revealed as cells shifting from one cluster to another, thereby changing cluster abundance.

      To clarify how this type of analysis is interpreted, we have modified lines 255-259 as follows:

      “In contrast, the quiescent PfSEA-1A- and PfGARP-specific cT<sub>FH</sub>2-like cluster (MC02) was significantly more abundant in adults compared to children (Figure 5c and 5d, pf<0.05). Interestingly, following PfGARP stimulation, the activated cT<sub>FH</sub>1/17-like subset (MC09) became more abundant in children compared to adults (Figure 5d, pf<0.05 with a False Discovery Rate=0.08), but no additional subsets shifted phenotype after PfSEA-1A stimulation (Figure 5c).”

      Reviewer #2 (Public Review):

      Summary:

      Forconi et al explore the heterogeneity of circulating Tfh cell responses in children and adults from malaria-endemic Kenya, and further compare such differences following stimulation with two malaria antigens. In particular, the authors also raised an important consideration for the study of Tfh cells in general, which is the hidden diversity that may exist within the current 'standard' gating strategies for these cells. The utility of multiparametric flow cytometry as well as unbiased clustering analysis provides a potentially potent methodology for exploring this hidden depth. However, the current state of analysis presented does not aid the understanding of this heterogeneity. This main goal of the study could hopefully be achieved by putting all the parameters used in one context, before dissecting such differences into their specific clinical contexts.

      Strengths:

      Understanding the full heterogeneity of Tfh cells in the context of infection is an important topic of interest to the community. The study included clinical groupings such as age group differences and differences in response to different malaria antigens to further highlight context-dependent heterogeneity, which offers new knowledge to the field. However, improvements in data analyses and presentation strategies should be made in order to fully utilize the potential of this study.

      Weaknesses:

      In general, most studies using multiparameter analysis coupled with an unbiased grouping/clustering approach aim to describe differences between all the parameters used for defining groupings, prior to exploring differences between these groupings in specific contexts. However, the authors have opted to separate these into sections using "subset chemokine markers", "surface activation markers" and then "cytokine responses", yet nuances within all three of these major groups were taken into account when defining the various Tfh identities. Thus, it would make sense to show how all of these parameters are associated with one another within one specific context to first logically establish to the readers how can we better define Tfh heterogeneity. When presented this way, some of the identities such as those that are less clear such as "MC03/MC04/ MC05/ MC08" may even be better revealed. once established, all of these clusters can then be subsequently explored in further detail to understand cluster-specific differences in children vs adults, and in the various stimulation conditions. Since the authors also showed that many of the activation markers were not significantly altered post-stimulation thus there is no real obstacle for merging the entire dataset for the first part of this study which is to define Tfh heterogeneity in an unbiased manner regardless of age groups or stimulation conditions. Other studies using similar approaches such as Mathew et al 2020 (doi: 10.1126/science.abc8) or Orecchioni et al 2017 (doi: 10.1038/s41467-017-01015-3) can be referred to for more effective data presentation strategies.

      Accordingly, the expression of cytokines and transcription factors can only be reliably detected following stimulation. However, the underlying background responses need to be taken into account for understanding "true" positive signals. The only raw data for this was shown in the form of a heatmap where no proper ordering was given to ensure that readers can easily interpret the expression of these markers following stimulation relative to no stimulation. Thus, it is difficult to reliably interpret any real differences reported without this. Finally, the authors report differences in either cluster abundance or cluster-specific cytokine/ transcription factor expression in Tfh cell subsets when comparing children vs adults, and between the two malaria antigens. The comparisons of cytokine/transcription factor between groups will be more clearly highlighted by appropriately combining groupings rather than keeping them separate as in Figures 6 and 7.

      Thank you for sharing these references. Similar to the SPADE clustering and ViSNE dimensionality reduction algorithms used in Orecchioni et al, we used all the extracellular markers from our panel in our FlowSOM algorithm with consensus meta-clustering, which includes both the chemokine receptors and activation markers, even though they are presented separately in our manuscript across Figures 3 and 4. This was explained in the methods section (lines 573 - 587). We then chose the UMAP algorithm for visual dimensionality reduction of the meta-clusters generated by FlowSOM consensus meta-clustering, as explained under the “OMIQ analysis” subpart of our methods (lines 588- 604). Therefore, we believe we have conducted the analysis as this reviewer suggests, even if we chose to show the figures that were informative to our story. The heatmaps make it possible to see which combinations of markers respond or not to the different conditions and between groups; all the raw data are presented in supplemental figures 10 to 13, which show, using bar plots, the differences expressed in the heatmaps. We believe this strengthens our interpretation of the results.

      Regarding the transcription factor and cytokine background, we added supplemental figures 14, 15 and 16, where we used manual gating to select Bcl6, cMAF, IFNg, IL21 or IL4 positive cells directly from total CXCR5<sup>pos</sup>CD4<sup>pos</sup>CD45RA<sup>neg</sup>CD25<sup>neg</sup> TfH cells based on the FMO or negative control, and we overlaid the positive cells on the UMAP of all the CXCR5<sup>pos</sup>CD4<sup>pos</sup>CD45RA<sup>neg</sup>CD25<sup>neg</sup> meta-clusters. Moreover, all the dot plots (with their statistics) used for the heatmaps in Figures 6 and 7 can be found in supplemental figures 10, 11, 12 and 13. These supplemental figures address the concerns above by showing the difference in signal between unstimulated and stimulated conditions.

      Reviewer #3 (Public Review):

      Summary:

      The goal of this study was to carry out an in-depth granular and unbiased phenotyping of peripheral blood circulating Tfh specific to two malaria vaccine candidates, PfSEA-1A and PfGARP, and correlate these with age (children vs adults) and protection from malaria (antibody titers against Plasmodium antigens.). The authors further attempted to identify any specific differences in the Tfh responses to these two distinct malaria antigens.

      Strengths:

      The authors had access to peripheral blood samples from children and adults living in a malaria-endemic region of Kenya. The authors studied these samples using in vitro restimulation in the presence of specific malaria antigens. The authors generated a very rich data set from these valuable samples using cutting-edge spectral flow cytometry and a 21-plex panel that included a variety of surface markers, cytokines, and transcription factors.

      Weaknesses:

      - Quantifying antigen-specific T cells by flow cytometry requires the use of either 1- tetramers or 2- in vitro restimulation with specific antigens followed by identification of TCR-activated cells based on de-novo expression of activation markers (e.g. intracellular cytokine staining and/or surface marker staining). Although authors use an in vitro restimulation strategy, they do not focus their study on cells de-novo expressing activation markers as a result of restimulation; therefore, their study is not really on antigen-specific cTfh. Moreover, the authors report no changes in the expression of activation markers commonly used to identify antigen-specific T cells upon in vitro restimulation (including IFNg and CD40L); therefore, it is not clear if their in vitro restimulation with malaria antigens actually worked.

      We understand the reviewer’s point of view and apologize for any confusion. IFNg was expressed but not statistically different between groups. Indeed, looking at the CD8 T cells and using manual gating, we were able to show that IFNg was increased, although not statistically significantly, upon stimulation from CD4<sup>pos</sup>CXCR5<sup>pos</sup> cells (supplemental figure 15, panel C), confirming our primary observation using clustering analysis. These results showed that our malaria antigens induced an IFNg response in some participants, but not all of them, revealing heterogeneity in this response among individuals within the same group.

      Regarding CD40L, in supplemental figure 7, we can see that some of our meta-clusters expressed more CD40L upon stimulation, but again without leading to statistical differences between groups. Combined with the increased expression of other cytokines and transcription factors, this shows that our stimulation did indeed work. However, because of the high variation within groups, there were no statistical differences across our groups. Because CD40L is not the only marker showing specific T cell activation, and not all T cells respond using this marker alone, a more comprehensive multimarker AIM panel might have highlighted differences between groups. We recognize the limitations of our study and believe that future studies will benefit from more activation markers commonly used to identify antigen-specific T cells, such as CD69, OX40, and 4-1BB (AIM panel), among other markers.

      - CXCR5+CD4+ memory T cells have been shown to present multi-potency and plasticity, capable of differentiating to non-Tfh subsets upon re-challenge. Although authors included in their flow panel a good number of markers commonly used in combination to identify Tfh (CXCR5, PD-1, ICOS, Bcl-6, IL-21), they only used one single marker (CXCR5) as their basis to define Tfh, thus providing a weak definition for Tfh cells and follow up downstream analysis.

      We apologize for the confusion. Even though we subsampled on the CD4<sup>pos</sup>CXCR5<sup>pos</sup> CD25<sup>neg</sup> cells to run our FlowSOM, we showed the different levels of expression across meta-clusters (figure 4 panels A and B) of PD1 (Tfh being PD1 positive cells) and ICOS (indicating the activation stage of the Tfh, “T Follicular Helper Cells” Methods and Protocols book from Springer 2015). We also included an overlay of the manually gated double positive Bcl6-cMAF cells on the CXCR5<sup>pos</sup>CD45RA<sup>neg</sup>CD25<sup>neg</sup> CD4 T cell UMAP plot to show that most of them express Bcl6 (supplemental figure 14). Interestingly, the manually gated IL21 positive cells were less abundant, particularly for children (supplemental figure 15). Because we were not able to include all the markers that are now used to define Tfh cells, we referred to our cell subsets as “TFH-like”. This is an acknowledged limitation of our study. Due to the limited blood volume obtained from children and the cost of running multiplex flow cytometry assays, our results showing antigen-specific heterogeneity of Tfh subsets will have to be validated in future studies that include these additional defining markers.

      - Previous works have used FACS-sorting and in vitro assays for cytokine production and B cell help to study the functional capacity of different cTfh subsets in blood from Plasmodium-infected individuals. In this study, authors do not carry out any such assays to isolate and evaluate the functional capacity of the different Tfh subsets identified. Thus, all the suggestions for the role that these different cTfh subsets may have in vivo in the context of malaria remain highly hypothetical.

      Unfortunately, low blood volumes obtained from children prevented us from running in vitro functional assays, and the study design did not allow us to correlate them with protection. However, since the function of identified Tfh subsets from malaria-exposed individuals has been evaluated using Pf lysates in other studies, we referenced them when interpreting the differences we reported in Tfh subset recognition between malaria antigens. If either of these antigens moves forward into vaccine trials, then evaluating their function would be important.

      - The authors have not included malaria unexposed control groups in their study, and experimental groups are relatively small (n=13).

      This study design did not include the recruitment of malaria-naive negative controls, as its goal was to compare the quality and abundance of malaria antigen-specific responses to the potential new vaccine targets PfSEA-1A and PfGARP between malaria-exposed children and adults. We did, however, test 3 malaria-naive adults and found no non-specific activation after stimulation with these two malaria antigens. Since this was done as part of our assay optimization, we did not feel the need to show these negative findings.

      And even with our small sample size, we demonstrated significant age-associated differences in malaria antigen-specific responses from cT<sub>FH</sub>-like subsets.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Minor points are:

      (1) Line 88, cTfh cells are not only from GC-Tfh, they have GC-independent origin (He et al, PMID: 24138884).

      The following sentence was added at line 88: “Interestingly, cT<sub>FH</sub> cells can also come from peripheral cT<sub>FH</sub> precursor CCR7<sup>low</sup>PD1<sup>high</sup>CXCR5<sup>pos</sup> cells; thus, they also have a GC-independent origin (He, Cell, 2013 PMID: 24138884)”.

      (2) I believe all participants were free of blood-stage infection upon enrolment. But can authors clearly state this information between lines 151-159?

      We mentioned in the methods, line 495-496: “Participants were eligible if they were healthy and not experiencing any symptoms of malaria at the time venous blood was collected”. However, using qPCR we found 5 children with blood-stage malaria. As shown in Author response image 2, comparing malaria-free children to blood-stage children, no differences were observed without any stimulation. However, MC03 is more abundant upon malaria antigen stimulation in the blood-stage group, whereas MC04 is more abundant in the malaria-free group upon PfGARP stimulation only, confirming that our stimulation worked.

      Author response image 2.

      Reviewer #3 (Recommendations For The Authors):

      (1) The strategy for gating on antigen-specific cTfh cells needs to be revised. The correct approach would be to gate on those cells that respond by de-novo expression of activation markers upon antigen restimulation (also termed activation-induced markers. e.g. CD69, CD40L, CXCL13 and IL-21, Niessl 2020; CD69, CD40L, CD137 and OX40, Lemieux 2023; CD137 and OX40, Grifoni 2020). As it stands, the study is not really on antigen-specific T cells, but rather on the overall CD4 T cell compartment plus or minus antigenic stimulation.

      We recognize the limitation in our flow panel design, which prevents us from performing this gating. We originally based our panel design on the “T follicular helper cells methods and protocols” book (Springer 2015), which used CD45RA, CD25, CXCR5, CCR6, CXCR3, CCR7, ICOS and PD1 to define cT<sub>FH</sub>. We had already optimized our 21-color panel, purchased reagents and started to run our experiments by the time the publications by Niessl, Lemieux and Grifoni modified how TFH cells are defined. Indeed, we optimized and performed our assay from November 2019 to March 2020, and finished running the samples during the first quarantine. Because of the urgent need for research on SARS-CoV-2, in which we were involved from that time onward, the analysis of our TFH work was substantially delayed. Moreover, 2020 is also the year when many TFH papers came out with better ways to define cT<sub>FH</sub> and responses to antigen stimulation. In our future studies, our panel will include AIM.

      (2) It is not clear if the antigenic stimulation actually worked. Does the proportion of IFNg+ or IL-4+ or IL-21+ or CD40L+ or CD25+ CD4 or CD8 T cells increase following in vitro antigen restimulation?

      Yes, using manual gating, we are able to show an increase in IL4 (supplemental figure 16 panel B and C) and IL21 (supplemental figure 15 panel J and K) production in both children and adults. We did not observe significant production of IFNg (supplemental figure 15, panel C) or changes in CD40L expression (supplemental figure 7) after malaria antigen stimulation; however, our positive control SEB worked. So, yes, our stimulation assay worked, but these 2 malaria antigens did not significantly induce these markers. It could be that the responses are too low to detect in every participant, since these are single antigens and not whole parasite lysates, as other studies have used. It could also be that these antigens don’t stimulate CD40L or IFNg in all our participants. We brought up this limitation as follows in the discussion, line 473: “Although the heterogeneity in the response of CD40L and IFNγ suggests that our tested malaria antigens did not induce significant differences in the expression of these markers in all our participants, our panel did not include other activated induced markers, such as OX40, 4-1BB, and CD69”.

      (3) It is not clear what is the proportion of cTfh over the total CD4 T cell compartment among the different groups. Does this vary among different groups? It would be valuable to display this as an old-fashioned combination of contour plots with outliers for illustrating flow cytometry and bar graphs for the cumulative data.

      The proportion of CD3<sup>pos</sup>CD4<sup>pos</sup>CD25<sup>neg</sup>CXCR5<sup>pos</sup> cTfh cells did not differ within the total number of CD4 T cells between groups (figure 2).

      (4) The gating strategy could be refined and become more robust if adding additional markers in combination with CXCR5 for identifying cTfh (e.g. CXCR5+Bcl6+).

      Thank you for this suggestion. An overlay of Bcl6 expression can be found in supplemental figure 14 where we confirm that our CXCR5+ cT<sub>FH</sub>-like subsets express cMAF and Bcl6.

      (5) The protocols for intracellular and intranuclear staining seem to be incomplete in Materials and Methods. In particular, cell permeabilization strategies seem to be missing.

      Our apologies for this oversight, we added the following sentences in the methods line 545: “Cells were fixed and permeabilized for 45 mins using the transcription factor buffer set (BD Pharmingen) followed by a wash with the perm-wash buffer. Intracellular staining was performed at 4 °C for 45 more mins followed by two washes using the kit’s perm-wash buffer”.

      (6) In Materials and Methods, the authors mention they have used fluorescence minus one control to set their gating strategy. It would be valuable to show these, either on the main body or as part of supplementary figures.

      We added the cytoplots of the FMOs and/or negative controls as appropriate in the supplemental figures 14 (cMAF and Bcl6), 15 (IFNg and IL21) and 16 (IL4 and IL21).

      (7) Line 194 and Figure 3, it is not clear the criteria that the authors used for down-sampling events before FlowSOM analysis. Was this random? Was this done with unstimulated or stimulated samples?

      We chose to down-sample on CD3<sup>pos</sup>CD4<sup>pos</sup>CD25<sup>neg</sup>CD45RA<sup>neg</sup>CXCR5<sup>pos</sup> cells prior to our FlowSOM so that the cluster analysis would focus only on the differences among those cells. The down-sampling used 1,000 CD3<sup>pos</sup>CD4<sup>pos</sup>CD25<sup>neg</sup>CD45RA<sup>neg</sup>CXCR5<sup>pos</sup> cells from each fcs file (unstimulated and stimulated samples). If the fcs file had more than 1,000 CXCR5<sup>pos</sup> cells, the down-sampling was done randomly by the OMIQ platform algorithm to select only 1,000 CXCR5<sup>pos</sup> cells within this specific fcs file. The last sentence was added to the methods, line 593.
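
      The per-file down-sampling described above can be illustrated with a minimal sketch. It assumes uniform random sampling without replacement; the OMIQ algorithm itself is not described beyond being random, and the function name, file names and event counts below are hypothetical.

```python
import random


def downsample_events(event_ids, max_events=1000, seed=0):
    """Return at most max_events ids, drawn uniformly without replacement.

    Files that already hold max_events or fewer gated events are kept whole.
    """
    ids = list(event_ids)
    if len(ids) <= max_events:
        return ids
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    return rng.sample(ids, max_events)


# Hypothetical gated CXCR5-positive event indices for two fcs files.
files = {"unstimulated.fcs": range(4200), "PfGARP.fcs": range(650)}
sampled = {name: downsample_events(ids) for name, ids in files.items()}
```

      Fixing the seed per run keeps the selected events reproducible, which matters when the same down-sampled events feed both clustering and later manual gating.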

      (8) Lanes 201, 202, As it stands, the take of the authors on the role of different cTfh subsets during infection remains highly speculative. Are these differences in cTfh phenotypes actually reflected in their in vitro capacity to provide B cell help (e.g. as in the Obeng-Adjei 2015 paper) or to produce IL-21, express co-stimulatory molecules, or any other characteristic that would allow them to better infer their functional roles during infection? Any additional in vitro analysis of the functional capacity of isolated cTfh subsets identified in this research would greatly increase its value.

      We agree with the reviewer that this sentence is speculative, and we rephrased it as follows: “First, we found different CXCR5 expression levels between meta-clusters (Figure 3b); CXCR5 is essential for cT<sub>FH</sub> cells to migrate to the lymph nodes and interact with B-cells”. We would have liked to perform in vitro functional assays. However, as explained above, we did not have sufficient cells collected from children to do so.

      (9) It is not clear why authors omitted IL-17 and did not use IFNg and IL-4 to refine their definition of Th1, Th2 and Th17 cTfh.

      We would have liked to include IL-17; however, we were constrained by only having access to a 4-laser cytometer at the time we ran our assay. We needed to prioritize markers, and when we were designing our flow panel, cTfh1 cells had been shown to be preferentially activated during episodes of acute febrile malaria in children (Obeng-Adjei). Therefore, we chose to focus on IFNg and IL4 to differentiate Tfh1 from Tfh2, in addition to other markers as surrogates of functional potential. We did not use IFNg and IL4 to refine our definition of Tfh1, Tfh2 and Tfh17 because recent publications have shown that IL4 is not only expressed in Tfh2 but also in the other Tfh subsets, at lower intensity (Gowthaman among others). Therefore, IFNg and IL4 by themselves were not sufficient to properly define the different Tfh subsets. In future studies, we plan to include transcription factor profiles (T-bet, BATF, GATA3) to further refine definitions of Tfh subsets.

      (10) Lines, 226, 228, based on the combination of markers that the MC03 subset expresses, it is tempting to think that this is the only "truly" committed Tfh subset from the entire analysis. Please, discuss.

      If the reviewer is referring to changes in marker expression levels that indicate they have not reached a level of differentiation that would make them reliable (i.e. “true”) Tfh cells, we agree that this is an important question now that we have technology that can measure and analyse so many phenotypic markers at once. This brings forward the need for the scientific method - to replicate study findings to determine whether they are consistent given the same study design and experimental conditions.

      (11) Lines 243 244, Again, is this reflected in functional capacity?

      The study described in this manuscript did not include functional assays. However, this did not change the key finding that different malaria antigens behaved differently, demonstrating heterogeneity in Tfh recognition of malaria antigens. Regarding CD40L expression, we did not observe differences between groups; however, some individuals had an increase in their CD40L (supplemental figure 7). It is possible that some individuals responded through other activation-induced markers (CD69, ICOS, OX40, 4-1BB among others) and that our stimulation was not long enough to assess CD40L expression upon malaria antigen stimulation. This limitation has been addressed by editing lines 243-244 as follows: “we were unable to find statistical differences in the CD40L expression between groups as only few individuals responded through it (supplemental figure 7).”

      (12) Lines 243, 244, Are these cTfh subsets exclusively detected in malaria-exposed individuals? This is confounded by the lack of a malaria unexposed control group in this study, which would have been highly valuable.

      We agree with the reviewer that having malaria-naive children would have been valuable as a negative control group. However, this study was conducted in Kenya, where all children are suspected to have had at least one malaria infection. We also did not have ethical approval or the means to enroll children in the USA who would not have been exposed to malaria as a negative control group. Since we were also evaluating differences by age group, comparing US adults would not have helped to address this point. Therefore, this remains an open question that might be addressed by another study recruiting children in non-malaria-endemic areas.

      (13) Line 267, as the authors have not gated on T cells de-novo expressing activation markers in response to antigen restimulation, how do they know these are indeed antigen-specific cTfh?

      OMIQ analysis accounts for marker expression levels in the resting cells (unstimulated well) for each individual compared to each experimental/stimulated well. The algorithm computationally determines whether that expression level changed without an arbitrary positive threshold, keeping the expression levels as a continuous variable, not dichotomous - which is the power of unbiased cluster analyses. Therefore, we know that these cells are antigen-specific based on the statistical difference in expression intensity between the resting cells and the stimulated ones. Nevertheless, manual gating to show “de-novo” responding cells produced the same results as assessing the MFI of each meta-cluster (supplemental figures 14, 15 and 16).

      (14) Lines, 292-295, it is very surprising that Tfh cells would not produce IL-21 upon restimulation. Have the authors observed upregulation of IL-21 following SEB restimulation?

      Yes, we observed IL21 positive cells upon SEB stimulation (supplemental figure 15, panel J and K). However we found unexpectedly high background levels of IL21, specifically within the adult group (supplemental figure 15, panel K and M) making it challenging to find antigen-specific increases above background. Interestingly, an increase in IL21 using manual gating was observed upon PfSEA-1A or PfGARP stimulation in children (supplemental figure 15, panel J and L).

      (15) In Figures 3 and 4, it is not clear if there are any significant differences in expression of different markers between different cTfh subsets and/or different conditions. Moreover, the lack of differences in response to antigen stimulation seems to suggest that it did not work adequately.

      We intentionally chose a 6-hour stimulation to better assess changes in cytokines, which we did. However, because it is a short stimulation, we did not expect dramatic changes in the extracellular markers presented in figures 3 and 4. A longer stimulation, such as 24h, would better highlight these changes.

      (16) Figure 5b would benefit from bar graphs.

      Please find below the bar-graphs for the highlighted meta-clusters in figure 5b. We did not include these bar-graphs to our figure 5 as they do not bring new information. They repeat the information already presented through the EdgeR plot.

      Author response image 3.

      (17) Figures 6 and 7 would greatly benefit from showing individual examples of old-fashioned contour with outliers flow plots to illustrate the different cTfh subsets identified in the study.

      The different cT<sub>FH</sub> subsets can be found with a contour plot with outliers in the supplemental figure 4.

      (18) Figures 3,4, 6, and 7, the authors exclusively focused on the study of MFI to measure the expression of cytokine and transcription factors among different groups/stimulations. Have the authors observed any differences in the percentage or absolute counts of cytokine+ and/or TF+ between different subsets of cTfh and/or different conditions?

      Yes. We added the supplemental figures 14 (transcription factors) and 15/16 (cytokines) where cytokines and transcription factors were assessed using manual gating. We found that total CD4<sup>pos</sup>CXCR5<sup>pos</sup> IL4 was significantly increased upon stimulation in both adults and children while IFNg was not. However, we found significantly higher IFNg on total CD8<sup>pos</sup> cells showing that the stimulation worked, but the total CD4<sup>pos</sup>CXCR5<sup>pos</sup> did not express IFNg. Finally, we observed a trend of higher IL21<sup>pos</sup>CD4<sup>pos</sup>CXCR5<sup>pos</sup> in adults, not significant due to high background whereas IL21 was significantly increased upon stimulation in children. Regarding cMAF and Bcl6, both transcription factors were significantly increased upon stimulation within children only.

      (19) Figure 8, the definition for high and low PfGARP antibody titers seems rather arbitrary. Are these associations still significant when attempting a regular correlation analysis between Ab values (i.e. Net MFI) and different cTfh subsets?

      Yes, the definition for high and low PfGARP antibody levels is arbitrary, but when looking at the antibody data (figure 1b), it was naturally bimodal. Therefore, as a sub-analysis, we assessed the association between PfGARP antibody levels and cT<sub>FH</sub> subsets, see Author response image 4. We checked the correlation between the abundance of the meta-clusters and the level of IgG anti-PfGARP and anti-PfSEA after PfGARP and PfSEA stimulation. We also checked the correlation between the MFI expression of Bcl6 and cMAF after stimulation (PfGARP or PfSEA-1A minus the unstimulated) by the meta-clusters and the level of IgG anti-PfGARP and anti-PfSEA. However, we believe that because of our small sample size, our results are not robust enough and that we risk over-interpreting the data. Therefore, we chose not to include this analysis in the manuscript.

      Author response image 4.

      (20) The comprehensive 21-plex panel that authors used in this study could generate insights on additional immune cells beyond cTfh (e.g. additional CD4 T cell subsets, CD8 T cells, CD19 B cells). It is not clear why the authors limited their analysis to cTfh only.

      The primary goal of the study was to assess the cT<sub>FH</sub> response to malaria vaccine candidates. However, we were able to assess the IFNg expression for CD8 T cells upon stimulation using the manual gating as indicated in the supplemental figure 15. Without additional markers to more clearly define other CD4 T cell or B cell subsets, we do not believe this dataset would go deep enough into characterizing antigen-specific responses to malaria antigens that would yield new insight.

      (21) Minor point, the punctuation should be revised throughout the manuscript.

      Punctuation was revised throughout the manuscript by our departmental scientific writer Dr. Trombly, as per reviewer request.

    1. Reviewer #2 (Public Review):

      Assessment

      This study develops a potentially useful metric for quantifying codon usage adaptation, the Codon Adaptation Index of Species (CAIS), that is intended to allow for more direct comparisons of the strength of selection at the molecular level across species by controlling for interspecies variation in amino acid usage and GC content. As evidence to support their claim that CAIS better controls for GC content and amino acid usage across species, they note that CAIS has only a weak positive correlation with GC% (that does not stand up to multiple hypothesis testing correction) while CAI has a clear negative correlation with GC%. Using CAIS, they find better adapted species have more disordered protein domains; however, excitement about these findings is dampened because (1) this result is also observed using the effective number of codons (ENC) and (2) there are concerns over the interpretation of CAIS as a proxy for the effectiveness of selection.

      Public Review

      Summary

      The goal of the authors in this study is to develop a more reliable approach for quantifying codon usage such that it is more comparable across species. Specifically, the authors wish to estimate the degree of adaptive codon usage, which is potentially a general proxy for the strength of selection at the molecular level. To this end, the authors created the Codon Adaptation Index for Species (CAIS) that attempts to control for differences in amino acid usage and GC% across species. Using their new metric, the authors observe a positive relationship between CAIS and the overall “disorderedness” of a species’ protein domains. I think CAIS has the potential to be a valuable tool for those interested in comparing codon adaptation across species in certain situations. However, I have certain theoretical concerns about CAIS as a direct proxy for the efficiency of selection sNe when mutation bias changes across species.

      Strengths

      (1) I appreciate that the authors recognize the potential issues of comparing CAI when amino acid usage varies and correct for this in CAIS. I think this is sometimes an under-appreciated point in the codon usage literature, as CAI is a relative measure of codon usage bias (i.e. only considers synonyms). However, the strength of natural selection on codon usage can potentially vary across amino acids, such that comparing mean CAI between protein regions with different amino acid biases may result in spurious signals of statistical significance.

      (2) The CAIS metric presented here is generally applicable to any species that has an annotated genome with protein-coding sequences. A significant improvement over the previous version is the implementation of a software tool for applying this method.

      (3) The authors do a better job of putting their results in the context of the underlying theory of CAIS compared to the previous version.

      (4) The paper is generally well-written.

      Weaknesses

      (1) The previously observed correlation between CAIS and body size was due to a bug when calculating phylogenetic independent contrasts. I commend the authors for acknowledging this mistake and updating the manuscript accordingly. I feel that the absence of a correlation between CAIS and body size should remain in the final version of the manuscript. Although it is disappointing that it is not statistically significant, the corrected results are consistent with previous findings (Kessler and Dean 2014).

      (2) I appreciate the authors providing a more detailed explanation of the theoretical basis of the model. However, I remain skeptical that shifts in CAIS across species indicate shifts in the strength of selection. I am leaving the math from my previous review here for completeness.

      As in my previous review, let’s take a closer look at the ratio of observed codon frequencies vs. expected codon frequencies under mutation alone, which was previously notated as RSCUS in the original formulation. In this review, I will keep using the RSCUS notation, even though it has been dropped from the updated version. The key point is this is the ratio of observed and expected codon frequencies. If this ratio is 1 for all codons, then CAIS would be 0 based on equation 7 in the manuscript – consistent with the complete absence of selection on codon usage. From here on out, subscripts will only be used to denote the codon and it will be assumed that we are only considering the case of r = genome for some species s.

      I think what the authors are attempting to do is “divide out” the effects of mutation bias (as given by Ei), such that only the effects of natural selection remain, i.e. deviations from the expected frequency based on mutation bias alone represent adaptive codon usage. Consider Gilchrist et al. GBE 2015, which says that the expected frequency of codon i at selection-mutation-drift equilibrium in gene g for an amino acid with Na synonymous codons is

      where ∆M is the mutation bias, ∆η is the strength of selection scaled by the strength of drift, and φg is the gene expression level of gene g. In this case, ∆M and ∆η reflect the strength and direction of mutation bias and natural selection relative to a reference codon, for which ∆M,∆η = 0. Assuming the selection-mutation-drift equilibrium model is generally adequate to model the true codon usage patterns in a genome (as I do and I think the authors do, too), the Ei,g could be considered the expected observed frequency of codon i in gene g, E[Oi,g].

      Let’s re-write the Ei in the form of Gilchrist et al., such that it is a function of mutation bias ∆M. For simplicity we will consider just the two codon case and assume the amino acid sequence is fixed. Assuming GC% is at equilibrium, the term gr and 1 − gr can be written as

      where µx→y is the mutation rate from nucleotides x to y. As described in Gilchrist et al. GBE 2015 and Shah and Gilchrist PNAS 2011, the mutation bias . This can be expressed in terms of the equilibrium GC content by recognizing that

      As we are assuming the amino acid sequence is fixed, the probability of observing a synonymous codon i at an amino acid becomes just a Bernoulli process.

      If we do this, then

      Recall that in the Gilchrist et al. framework, the reference codon has $\Delta M_{NNG,NNG} = 0 \implies e^{-\Delta M_{NNG,NNG}} = 1$. Thus, we have recovered the Gilchrist et al. model from the formulation of Ei under the assumption that natural selection has no impact on codon usage and codon NNG is the pre-defined reference codon. To see this, plug in 0 for ∆η in equation (1).

      We can then calculate the expected RSCUS using equation (1) (using notation E[Oi]) and equation (6) for the two codon case. For simplicity, assume we are only considering a gene of average expression (defined as ). Assume in this case that NNG is the reference codon (∆MNNG,∆ηNNG = 0).

      This shows that the expected value of RSCUS for a two codon amino acid is expected to increase as the strength of selection ∆η increases, which is desired. Note that ∆η in Gilchrist et al. is formulated in terms of selection against a codon relative to the reference, such that a negative value represents that a codon is favored relative to the reference. If ∆η = 0 (i.e. selection does not favor either codon), then E[RSCUS] = 1. Also note that the expected RSCUS does not remain independent of the mutation bias. This means that even if sNe (i.e. the strength of natural selection) does not change between species, changes to the strength and direction of mutation bias across species could impact RSCUS. Assuming my math is right, I think one needs to be cautious when interpreting CAIS as representative of the differences in the efficiency of selection across species except under very particular circumstances.
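
      The display equations of this derivation are not reproduced in this copy. As a hedged reconstruction only, the expressions below are one set consistent with the surrounding text. They assume the multinomial-logit form of the selection-mutation-drift model, a two-codon amino acid with codon NNA and reference codon NNG, and average expression taken as φ = 1; the symbol names and the choice φ = 1 are assumptions, not the reviewer's original notation.

```latex
% Assumed general form over the N_a synonymous codons of an amino acid:
P(c_i \mid g) = \frac{e^{-\Delta M_i - \Delta\eta_i \phi_g}}
                     {\sum_{j=1}^{N_a} e^{-\Delta M_j - \Delta\eta_j \phi_g}} \tag{1}

% Two codons (NNA versus reference NNG) with equilibrium GC content g_r:
g_r = \frac{\mu_{AT \to GC}}{\mu_{AT \to GC} + \mu_{GC \to AT}}, \qquad
\Delta M_{NNA} = \ln\frac{g_r}{1 - g_r}

% Mutation-only expectation, i.e. (1) with \Delta\eta = 0:
E_{NNA} = \frac{e^{-\Delta M_{NNA}}}{1 + e^{-\Delta M_{NNA}}} = 1 - g_r

% Ratio of equilibrium to mutation-only frequency at \phi = 1:
\mathrm{E}[RSCUS_{NNA}] =
  \frac{e^{-\Delta\eta_{NNA}} \left(1 + e^{-\Delta M_{NNA}}\right)}
       {1 + e^{-\Delta M_{NNA} - \Delta\eta_{NNA}}}
```

      Under these assumptions the ratio equals 1 when ∆η = 0 and grows as ∆η becomes more negative, while still depending on ∆M, which matches the two properties stated in the paragraph above.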

      Consider our 2-codon amino acid scenario. You can see how changing GC content without changing selection can alter the CAIS values calculated from these two codons. Particularly problematic appears to be cases of extreme mutation biases, where CAIS tends toward 0 even for higher absolute values of the selection parameter. Codon usage for the majority of the genome will be primarily determined by mutation biases, with selection being generally strongest in a relatively few highly-expressed genes. Strong enough mutation biases ultimately can overwhelm selection, even in highly-expressed genes, reducing the fraction of sites subject to codon adaptation.
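
      The two-codon scenario can be explored numerically with a minimal sketch. It is not the reviewer's simulation code: it assumes a logit model for codon NNA against reference NNG, and it stands in for CAIS with a Kullback-Leibler-style sum of observed × ln(observed/expected), which is 0 when every observed/expected ratio is 1. The manuscript's equation 7 is not reproduced here, and all function names are hypothetical.

```python
import math


def codon_freqs(gc, d_eta=0.0, phi=1.0):
    """Frequencies (NNA, NNG) for a two-codon amino acid.

    gc    : equilibrium GC content, strictly between 0 and 1
    d_eta : selection on NNA relative to NNG (negative favors NNA)
    phi   : expression level of the gene (1.0 taken as average)
    """
    d_m = math.log(gc / (1.0 - gc))   # mutation bias of NNA relative to NNG
    w = math.exp(-d_m - d_eta * phi)  # weight of NNA; the reference weight is 1
    p_nna = w / (1.0 + w)
    return p_nna, 1.0 - p_nna


def kl_index(gc, d_eta, phi=1.0):
    """KL-style index of equilibrium frequencies against mutation-only ones."""
    observed = codon_freqs(gc, d_eta, phi)
    expected = codon_freqs(gc, 0.0, phi)  # selection switched off
    return sum(o * math.log(o / e) for o, e in zip(observed, expected))


# Same selection strength, different GC content.
for gc in (0.2, 0.41, 0.5, 0.8, 0.999):
    print(gc, round(kl_index(gc, d_eta=-1.0), 5))
```

      Holding `d_eta` fixed and varying only `gc` isolates the effect of mutation bias on the index, which is the comparison the paragraph above describes.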

      Peer review image 1.

      Peer review image 2.

      CAIS (Low Expression)

      Peer review image 3.

      CAIS (Average Expression)

      Peer review image 4.

      CAIS (High Expression)

      If we treat the expected codon frequencies as genome-wide frequencies, then we are basically assuming this genome is made up entirely of a single 2-codon amino acid with selection on codon usage being uniform across all genes. This is obviously not true, but I think it shows some of the potential limitations of the CAIS approach. Based on these simulations, CAIS seems best employed under specific scenarios. One such case could be when it is known that mutation bias varies little across the species of interest. Looking at the species used in this manuscript, most of them have a GC content around 0.41, so I suspect their results are okay (assuming things like GC-biased gene conversion are not an issue). Outliers in GC content probably are best excluded from the analysis.

      Although I have not done so, I am sure this could be extended to the 4 and 6 codon amino acids. One potential challenge to CAIS is the non-monotonic changes in codon frequencies observed in some species (again, see Shah and Gilchrist 2011 and Gilchrist et al. 2015).

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      This important study reveals the RelA/Stat3-dependent gene program in the liver influences intestinal homeostasis. The evidence supporting the conclusions is compelling, although some additional experiments will strengthen the study. The work will be of interest to scientists in gastrointestinal research fields.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this study, the authors showed that activation of RelA and Stat3 in hepatocytes of DSS-treated mice induced CYPs and thereby produced primary bile acids, particularly CDCA, which exacerbated intestinal inflammation.

      Strengths:

      This study reveals the RelA/Stat3-dependent gene program in the liver influences intestinal homeostasis.

      Our reply: We thank the reviewer for the positive feedback and for appreciating the strength of our study.

      Weaknesses:

      Additional evidence will strengthen the conclusion.

      (1) In Fig. 1C, photos show that phosphorylation of RelA and Stat3 was induced in only a few hepatocytes. The authors conclude that activation of both RelA and Stat3 induces inflammatory pathways. Therefore, the authors should show that phosphorylation of RelA and Stat3 is induced in the same hepatocytes during DSS treatment.

      Our reply: The reviewers have raised a pertinent issue in Figure 1, as later on in our study we suggest that the combined activation of Rela and Stat3 is critical for aggravating the colitogenic phenotype in the murine model.

      To address this issue, we have co-stained the fixed liver tissue of untreated and DSS-treated wild type mice with p-RelA (Ser536) and p-Stat3 (Ser727) antibodies. Author response image 1 below shows the single staining for p-RelA (Ser536), p-Stat3 (Ser727), DAPI (to demarcate the nuclei) and the merged image (p-RelA + p-Stat3).

      Author response image 1.

      Further, the signal intensity of p-RelA (Ser536) and p-Stat3 (Ser727) per nucleus was calculated and plotted as a box plot. It is evident that the median p-RelA and p-Stat3 signal intensity in DSS-treated samples is higher than that of the control samples, suggesting that the majority of the treated hepatocytes have both p-RelA and p-Stat3 in the nuclei.

      Author response image 2.

      Further, we calculated the number of nuclei in the DSS-treated samples that lie above the 90th percentile of the control samples (data are provided in Author response table 1 below). We also calculated the percentage overlap of p-RelA with p-Stat3 and vice versa (Author response table 1).

      Author response table 1.

      Together, our analysis indicates that RelA and Stat3 are indeed activated in the same hepatocytes to generate the downstream effect that we observe post-DSS treatment.
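
      The thresholding and overlap calculation described above can be sketched as follows. This is a minimal illustration, not the analysis script used for Author response table 1: the input layout (one pair of nuclear intensities per nucleus), the function names, and the linear-interpolation percentile are all assumptions made for the example.

```python
def percentile(values, q):
    """Linear-interpolation percentile (q in [0, 100]) of a non-empty list."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def coactivation_summary(control, treated, q=90):
    """Count treated nuclei above the control q-th percentile for each marker.

    control, treated: lists of (p_rela_intensity, p_stat3_intensity) tuples,
    one tuple per nucleus (hypothetical layout).
    """
    rela_cut = percentile([r for r, _ in control], q)
    stat3_cut = percentile([s for _, s in control], q)

    # Indices of treated nuclei exceeding each control-derived cutoff.
    rela_pos = {i for i, (r, _) in enumerate(treated) if r > rela_cut}
    stat3_pos = {i for i, (_, s) in enumerate(treated) if s > stat3_cut}
    both = rela_pos & stat3_pos

    return {
        "rela_positive": len(rela_pos),
        "stat3_positive": len(stat3_pos),
        "both_positive": len(both),
        # Overlap expressed in both directions, as in the reply above.
        "pct_rela_also_stat3": 100 * len(both) / len(rela_pos) if rela_pos else 0.0,
        "pct_stat3_also_rela": 100 * len(both) / len(stat3_pos) if stat3_pos else 0.0,
    }
```

      Reporting the overlap in both directions matters because the two percentages differ whenever one marker is positive in more nuclei than the other.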

      (2) In Fig. 5, the authors treated mice with CDCA intraperitoneally. In this experiment, the concentration of CDCA in the colon of CDCA-treated mice should be shown.

      Our reply: We have experimentally examined whether CDCA supplemented intraperitoneally, at the dose used in our study, reaches the colon. To quantify colonic CDCA, we performed targeted mass spectrometry; the data are provided as a bar plot below.

      Author response image 3.

      It is evident from the plot that the CDCA levels are significantly higher in mice supplemented with CDCA as compared to their corresponding control (where only the vehicle was supplemented). The data has been added to the supplementary section S5b and the main text has been modified accordingly.

      Reviewer #2 (Public Review):

      Singh and colleagues employ a methodical approach to reveal the function of the transcription factors Rela and Stat3 in the regulation of the inflammatory response in the intestine.

      Strengths of the manuscript include the focus on the function of these transcription factors in hepatocytes and the discovery of their role in the systemic response to experimental colitis. While the systemic response to induce colitis is appreciated, the cellular and molecular mechanisms that drive such systemic response, especially those involving other organs beyond the intestine are an active area of research. As such, this study contributes to this conceptual advance. Additional strengths are the complementary biochemical and metabolomics approaches to describe the activation of these transcription factors in the liver and their requirement - specifically in hepatocytes - for the production of bile acids in response to colitis.

      Our reply: We express our gratitude to the reviewer for recognizing and appreciating the mechanistic insight provided by our work, and for considering it valuable in advancing conceptual understanding in the relevant field.

      Some weaknesses are noted in the presentation of the data, including a comprehensive representation of findings in all conditions and genotypes tested.

      Our reply: We thank the reviewer for the query and we have suitably modified the figures for a comprehensive representation of the findings, as described below:

      ● In Figure 2C, we have added the control alcian blue stained samples to clarify that there were no qualitative differences in the mucin levels observed in the relaΔhepstat3Δhep as compared to the wild type mice.

      ● We have also modified Figure 2D for a better presentation of the data.

      ● We have included histopathological analysis for the relaΔhepstat3Δhep mice in Figures S3a and S3b, following a format similar to the wild-type data previously provided as Figure S1a and S1b.

      ● For Figure 5C, the corresponding untreated samples with and without CDCA supplementation have been provided in the supplementary section Figure S5e.

      ● For Figure 2E, 3E, and 4C - the RT-qPCR data of the DSS-treated samples is plotted relative to their corresponding control samples, hence we only display two conditions in the bar plot. We have accordingly modified the figure legend for better clarity.

      Reviewer #3 (Public Review):

      Summary:

      The authors try to elucidate the molecular mechanisms underlying the inter-organ crosstalk that perpetuates intestinal permeability and inflammation.

      Strengths:

      This study identifies a hepatocyte-specific rela/stat3 network as a potential therapeutic target for intestinal diseases via the gut-liver axis using both murine models and human samples.

      Our reply: We thank the reviewer for appreciating the therapeutic potential of our work.

      Weaknesses:

      (1) The mechanism by which DSS administration induces the activation of the Rela and Stat3 pathways and subsequent modification of the bile acid pathway remains unclear. As the authors state, intestinal bacteria are one candidate, and this needs to be clarified. I recommend the authors investigate whether gut sterilization by administration of antibiotics or germ-free condition affects 1. the activation of the Rela and Stat3 pathway in the liver by DSS-treated WT mice and 2. the reduction of colitis in DSS-treated relaΔhepstat3Δhep mice.

      Our reply: We thank the reviewer for raising the role of the gut microbiota in driving colitis in our mouse model. In accordance with the reviewer's recommendation, we sterilized the gut by administering antibiotics to evaluate whether intestinal bacteria are an important component leading to the activation of the Rela and Stat3 pathway in the liver of DSS-treated WT mice.

      (a) A brief schematic representation of the experimental design is provided below, and a detailed description of the methods is given in the supplementary methods.

      Author response image 4.

      Extracts of liver tissue from mice treated with DSS for 6 days, with or without prior antibiotic treatment, were probed with p-Stat3 (Ser727) to examine the activation status of the hepatic Stat3 pathway. The p-Stat3 (Ser727) signal is comparatively reduced after antibiotic treatment, as is evident from the blot below. p-Stat3 (Ser727) was a prominent activation signal at Day 6 of DSS treatment in Figure 1D,E.

      Author response image 5.

      These studies suggest that Stat3 activation is hampered by antibiotic treatment. Considering that Rela and Stat3 have to act in a coordinated manner, downstream activation will presumably be modulated upon gut sterilization. However, it should be appreciated that a sterilized gut is not likely to be physiologically relevant, and intestinal bacteria along with bile acid levels would modulate the Rela/Stat3 pathways.

      (b) It is likely that the hepatic deficiency of Rela and Stat3 modified the gut microbiome in relaΔhepstat3Δhep mice because of the altered bile composition. Moreover, the gut microbiota is a key component that guides the outcome of colitis. Hence, future studies are needed to examine the role of the gut microbiome in imparting resistance to colitogenic insults in relaΔhepstat3Δhep mice.

      (2) It has not been shown whether DSS administration causes an increase in primary bile acids, represented by CDCA, in the colon of WT mice following activation of the Rela and Stat3 pathways, as demonstrated in Figure 6.

      Our reply: To address this query, we would like to direct the reviewer to Figure 4B, where we show an increase in CDCA levels in the colonic tissue. This increase corresponds to the CDCA levels in the liver tissue (Figure 4A), indicating that it may be driven by the hepatic Rela and Stat3 pathways.

      (3) The implications of these results for IBD treatment, especially in what ways they may lead to therapeutic intervention, need to be discussed.

      Our reply: We are grateful to the reviewer for raising this topic for discussion.

      Until now, only immunosuppressive agents and immunomodulators have been conventionally considered as therapeutic measures to manage IBD. However, with increasing research on the role of hepatic bile acid metabolism during experimental colitis, its potential in the clinical setting cannot be underestimated. The potential of bile acids as a therapeutic target has been harnessed in the past; bile acid sequestrants have been utilized as a treatment for hyperlipidemia 46. Remedies like fecal microbial transplantation, which serve to normalize the bile acid ratios in the gut, have emerged over the last decade as potential therapeutics for IBD 47, 40. However, the potential of altering hepatic bile metabolism has remained unexplored for IBD, possibly due to a lack of mechanistic insight. Towards this, our work demonstrates the pro-inflammatory potential of CDCA during colitis following the activation of the Rela/Stat3 pathway. The suppression of Rela/Stat3-induced CDCA could provide beneficial effects in IBD patients while protecting the basal bile acid levels (through FXR signaling). Thus, our studies identify a hepatocyte-specific rela/stat3 network as a potential therapeutic target for intestinal diseases. Another approach could be to use bile acid sequestrants as a combinatorial therapy alongside existing treatments, temporarily decreasing the levels of primary bile acids in the colon until the proinflammatory pathways are dampened.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Minor:

      Fig. 4C should be Fig. 4D and vice versa.

      Our reply: We have swapped Fig. 4C and Fig. 4D and corresponding changes have been incorporated in the main text.

      Reviewer #2 (Recommendations For The Authors):

      Please make note of the following specific comments

      The immunostainings for phosphorylated p-Rela and STAT3 are unclear. Is there nuclear translocation of these phosphorylated transcription factors? Can the authors enumerate the percentage of cells in which nuclear translocation (presumably in hepatocytes) is detected?

      Our reply: We apologize that the immunostainings for p-Rela and p-Stat3 were unclear to the reviewer. We have tried to make the data clearer by quantifying the stained sections and plotting the results.

      To start with, we co-stained fixed liver tissue from untreated and DSS-treated wild-type mice with p-RelA (Ser536) and p-Stat3 (Ser727) antibodies; a representative image used for the analysis is provided below. DAPI was used to demarcate the nuclear boundary of the hepatocytes, and the signal intensity for p-RelA (Ser536) and p-Stat3 (Ser727) was quantified using ZenBlue software.

      Author response image 6.

      Below we provide the box plot of the calculated nuclear intensities of p-Rela and p-Stat3 in the control (untreated) and DSS-treated samples. The median p-Rela and p-Stat3 signal intensities in DSS-treated samples are higher than those in the control samples, suggesting that p-Rela and p-Stat3 have translocated to the nucleus in the majority of the treated hepatocytes.

      Author response image 7.

      The figure legends for Figures 2C and D are flipped. Please correct.

      Our reply: Thank you for pointing this out. We apologize for the error and have corrected Figure 2 accordingly.

      For all H&E stainings, the authors should include histological scoring of disease severity.

      Our reply: Thank you for raising this point. Histological scoring to quantify the qualitative data obtained through microscopy is given below as a dot plot of the H&E scores for untreated and DSS-treated colon samples. To score the sections, we used the scale described by Ren Y et al. 2019 (doi: 10.1038/s41598-019-53305-z).

      Author response image 8.

      We have added the dot plot to supplementary figure 2d, and the method applied for the above analysis is described in the supplementary methods section.

      Please include Alcian Blue Staining in non-DSS treated WT and rel/stat3 double cKO mice.

      Our reply: Thank you for pointing this out. We have added the Alcian Blue staining of non-DSS-treated WT and rel/stat3 double KO mice to Figure 2C.

      For Figure 3C, can the authors indicate in the figure itself which bile acid is being represented (not only in the Figure legend)?

      Our reply: Thank you for the suggestion. We have indicated the respective bile acid in Figure 3C for better understanding.

      As these data are from untargeted metabolomics, were other bile acids detected?

      Our reply: This is part of a separate study conducted by our collaborator and will form part of a new manuscript focused on human studies.

      Can the authors validate the downregulation of key enzymes shown in Figure 3D, E at the protein level?

      Our reply: We agree with the reviewer’s comment that mRNA levels are not critical determinants of pathway activation but rather an indicator of probable activation, and that protein levels are more determinative in that scenario. However, the metabolomic data in subsequent figures (Figure 4A, B) support our findings in Figure 3D, E, which makes the RT-qPCR data a more robust indicator of an activated hepatic bile acid biosynthesis machinery.

      The figure legends for Figures 4C and D are flipped. Please correct.

      Our reply: Thank you for pointing this out. Taking into consideration the suggestion by Reviewer 1, we have swapped Fig. 4C and Fig. 4D and corrected the legend placement accordingly.

      Also, please include representative images for the data represented in 4C.

      Our reply: Thank you for the query. We have added the representative confocal microscopy images as Figure S4.

      Figure 5B should indicate that the data presented is from double cKO mice.

      Our reply: Thank you for raising this. We have indicated in the figure that the colon length data are from double KO animals, to make the visual representation clear for readers.

      Please correct typos: "entrocytic" and "Untread" in Figure Legend 5.

      Our reply: Thank you for pointing out the errors in the legend. We apologize for them and have corrected the legend of Figure 5.

      Figure S4 includes a dataset (qPCR for Mmp3) that is not described. Neither Figure S4 nor S5 are described in the text.

      Our reply: Thank you for the query. Firstly, Figures S4 and S5 are already cited in the text; we apologize that this was not properly highlighted.

      Secondly, the RT-qPCR data for Mmp3 have been removed from the supplementary figures, as they may not be very relevant to the study.

      Overall, the manuscript should be edited to ensure the correct use of English. Please also note that the last name of the first author seems to be missing in the main text.

      Our reply: Thank you for the suggestion. We have re-checked the manuscript for errors and rectified them. The first author has a single name (with no surname), and we would like to correct this in the final version of the manuscript.

      Reviewer #3 (Recommendations For The Authors):

      (1) The authors need to show if DSS treatment affects the serological or histological changes in the liver of relaΔhepstat3Δhep mice.

      Our reply: To address that, we have analyzed key serological markers of liver damage as well as looked into tissue histology.

      The pathophysiological parameters of the liver of DSS-treated relaΔhepstat3Δhep mice have been added to the revised manuscript as Figures S3a and S3b. Here we show that the serological parameters are within the physiological range upon DSS treatment (Author response image 9a). In addition, the histological parameters remain unaltered compared to the control tissue (Author response image 9b).

      Cumulatively, at both the tissue level and the functional level, DSS treatment has little effect on the liver of relaΔhepstat3Δhep mice.

      Author response image 9.

      (2) It is recommended to use a second model to verify if this phenomenon is applicable to colitic status in general.

      Our reply: We appreciate the query. This is an ongoing study, and we hope to further examine the role of hepatic RelA and Stat3 in the TNBS-induced colitis model and in the T cell transfer model of colitis.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this manuscript, Liu et al. present CROWN-seq, a technique that simultaneously identifies transcription-start nucleotides and quantifies N6,2'-O-dimethyladenosine (m6Am) stoichiometry. This method is derived from ReCappable-seq and GLORI, a chemical deamination approach that differentiates A and N6-methylated A. Using ReCappable-seq and CROWN-seq, the authors found that genes frequently utilize multiple transcription start sites, and isoforms beginning with an Am are almost always N6-methylated. These findings are consistently observed across nine cell lines. Unlike prior reports that associated m6Am with mRNA stability and expression, the authors suggest here that m6Am may increase transcription when combined with specific promoter sequences and initiation mechanisms. Additionally, they report intriguing insights on m6Am in snRNA and snoRNA and its regulation by FTO. Overall, the manuscript presents a strong body of work that will significantly advance m6Am research.

      Strengths:

      The technology development part of the work is exceptionally strong, with thoughtful controls and well-supported conclusions.

      We appreciate the reviewer for the very positive assessment of the study. We have addressed the concerns below.

      Weaknesses:

      Given the high stoichiometry of m6Am, further association with upstream and downstream sequences (or promoter sequences) does not appear to yield strong signals. As such, transcription initiation regulation by m6Am, suggested by the current work, warrants further investigation.

      We thank the reviewer for the insightful comments. We have softened the language related to m6Am and transcription regulation. We fully agree with the reviewer that future investigation is required to determine the molecular mechanism linking m6Am and transcription regulation.

      Reviewer #2 (Public review):

      Summary:

      In the manuscript "Decoding m6Am by simultaneous transcription-start mapping and methylation quantification" Liu and co-workers describe the development and application of CROWN-Seq, a new specialized library preparation and sequencing technique designed to detect the presence of cap-adjacent N6,2'-O-dimethyladenosine (m6Am) with single nucleotide resolution. Such a technique was a key need in the field since prior attempts to get accurate positional or quantitative measurements of m6Am positioning yielded starkly different results and failed to generate a consistent set of targets. As noted in the strengths section below the authors have developed a robust assay that moves the field forward.

      Furthermore, their results show that most mRNAs whose transcription start nucleotide (TSN) is an 'A' are in fact m6Am (85%+ for most cell lines). They also show that snRNAs and snoRNAs have a substantially lower prevalence of m6Am TSNs.

      Strengths:

      Critically, the authors spent substantial time and effort to validate and benchmark the new technique with spike-in standards during development, cross-comparison with prior techniques, and validation of the technique's performance using a genetic PCIF1 knockout. Finally, they assayed nine different cell lines to cross-validate their results. The outcome of their work (a reliable and accurate method to catalog cap-adjacent m6Am) is a particularly notable achievement and is a needed advance for the field.

      Weaknesses:

      No major concerns were identified by this reviewer.

      We thank the reviewer for the positive assessment of the method and dataset. We have addressed the concerns below.

      Mid-level Concerns:

      (1) In Lines 625 and 626, the authors state that “our data suggest that mRNAs initate (mis-spelled by authors) with either Gm, Cm, Um, or m6Am.” This reviewer took those words to mean that for A-initiated mRNAs, m6Am was the ‘default’ TSN. This contradicts their later premise that promoter sequences play a role in whether m6Am is deposited.

      We thank the reviewer for the comment. We have changed this sentence to “Instead, our data suggest that mRNAs initiate with either Gm, Cm, Um, or Am, where Am are mostly m6Am modified.” The revised sentence separates the processes of transcription initiation and m6Am deposition, which should avoid confusing the reader.

      (2) Further, the following paragraph (lines 633-641) uses fairly definitive language that is unsupported by their data. For example in lines 637 and 638 they state “We found that these differences are often due to the specific TSS motif.” Simply, using ‘due to’ implies a causative relationship between the promoter sequences and m6Am has been demonstrated. The authors do not show causation, rather they demonstrate a correlation between the promoter sequences and an m6Am TSN. Finally, despite claiming a causal relationship, the authors do not put forth any conceptual framework or possible mechanism to explain the link between the promoter sequences and transcripts initiating with an m6Am.

      (3) The authors need to soften the language concerning these data and their interpretation to reflect the correlative nature of the data presented to link m6Am and transcription initiation.

      For (2) and (3): We have softened the language in the revised manuscript. Specifically, for lines 633-641 of the original manuscript, we have changed “are often due to” to “are often related to”, which claims a correlation rather than causation.

      Reviewer #3 (Public review):

      Summary:

      m6Am is an abundant mRNA modification present on the TSN. Unlike the structurally similar and abundant internal mRNA modification m6A, m6Am’s function has been controversial. One way to resolve controversies surrounding mRNA modification functions has been to develop new ways to better profile said mRNA modification. Here, Liu et al. developed a new method (based on GLORI-seq for m6A-sequencing), for antibody-independent sequencing of m6Am (CROWN-seq). Using appropriate spike-in controls and knockout cell lines, Liu et al. clearly demonstrated CROWN-seq’s precision and quantitative accuracy for profiling transcriptome-wide m6Am. Subsequently, the authors used CROWN-seq to greatly expand the number of known m6Am sites in various cell lines and also determine m6Am stoichiometry to generally be high for most genes. CROWN-seq identified gene promoter motifs that correlate best with high stoichiometry m6Am sites, thereby identifying new determinants of m6Am stoichiometry. CROWN-seq also helped reveal that m6Am does not regulate mRNA stability or translation (as opposed to past reported functions). Rather, m6Am stoichiometry correlates well with transcription levels. Finally, Liu et al. reaffirmed that FTO mainly demethylates m6Am, not of mRNA but of snRNAs and snoRNAs.

      Strengths:

      This is a well-written manuscript that describes and validates a new m6Am-sequencing method: CROWN-seq, the first m6Am-sequencing method that can both quantify m6Am stoichiometry and profile m6Am at single-base resolution. These advantages enabled Liu et al. to uncover new potential findings related to m6Am regulation and function. I am confident that CROWN-seq will likely be the gold standard for m6Am-sequencing henceforth.

      Weaknesses:

      Though the authors have uncovered a potentially new function for m6Am, they need to be clear that without identifying a mechanism, their data might only be demonstrating a correlation between the presence of m6Am and transcriptional regulation rather than causality.

      We thank the reviewer for the very positive assessment of the CROWN-seq method. We have softened the language describing the correlation between m6Am and transcription regulation.

      Reviewer recommendations:

      We thank the reviewers for their constructive suggestions. In the revised manuscript, we have corrected the errors and updated the requested discussions and figures.

      Reviewer #1 (Recommendations for the authors):

      (1) The prior work from the research group, "Reversible methylation of m6Am in the 5′ cap controls mRNA stability" (PMID: 28002401), should be cited, even if the current findings differ from earlier conclusions, particularly in line 58 and the section titled "m6Am does not substantially influence mRNA stability or translation".

      We thank the reviewer for this comment. We have added the citation.

      (2) I wonder why the authors chose to convert A to I before capping and recapping, as RNA fragmentation caused by chemical treatment may introduce noise into these processes.

      We thank the reviewer for this comment. This is a very good point. We have indeed considered this alternative protocol. There are two concerns in performing decapping-and-recapping before A-to-I conversion: (1) it is unclear whether the 3’-desthiobiotin, which is essential for the 5’ end enrichment, is stable or not during the harsh A-to-I conversion; (2) performing decapping-and-recapping first requires more enzyme and 3’-desthiobiotin-GTP, which are the major cost of the library preparation. This is because the input of CROWN-seq (~1 μg mRNA) is much higher than that in ReCappable-seq (~5 μg total RNA or ~250 ng mRNA). In the current protocol, many 5’ ends are highly fragmented and therefore are lost during the A-to-I conversion. As a result, less enzyme and 3’-desthiobiotin-GTP are needed.

      (3) During CROWN-seq benchmarking, the authors found that 93% of reads mapped to transcription start sites, implying a 7% noise level with a spike-in probe. This noise could lead to false positives in TSN assignments in real samples. It appears that additional filters (e.g., a known TSS within 100 nt) were applied to mitigate false positives. If so, I recommend that the authors clarify these filters in the main text.

      We thank the reviewer for this comment. We think that the spike-in probes might lead to an underestimation of the accuracy of TSN mapping. The spike-in probes are made by in vitro transcription with m7Gpppm6AmG or m7GpppAmG analogs. We found that the in vitro transcription exhibits a small amount of non-specific initiation, which leads to spike-in probes with 5’ ends that are not precisely aligned with the desired TSS. To better illustrate the mapping accuracy of CROWN-seq, we provided Figure 2H, which compares the non-conversion rates of newly found A-TSNs between wild-type and PCIF1 knockout cells. If the newly found A-TSNs are real, they should show high non-conversion rates in wild-type cells (i.e., high m6Am) and almost zero non-conversion rates (i.e., Am) in PCIF1 knockout cells. As expected, most of the newly found A-TSNs are true A-TSNs, since they are m6Am in wild-type and Am in PCIF1 knockout cells. Thus, we think that CROWN-seq is very precise in TSS mapping. We have clarified this in the Discussion.
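
      The wild-type versus knockout check described above can be sketched as follows. This is a hedged illustration rather than the CROWN-seq pipeline: it assumes that, after A-to-I conversion, unmethylated A is read as G while m6Am is still read as A, so that the non-conversion rate is A / (A + G) at the TSN. The function names and all thresholds (`min_reads`, `wt_min`, `ko_max`) are hypothetical values chosen for the example.

```python
def non_conversion_rate(a_reads, g_reads):
    """Fraction of reads at an A-TSN still read as A after conversion.

    Returns None when the site has no coverage.
    """
    total = a_reads + g_reads
    return a_reads / total if total else None


def supports_true_a_tsn(wt, ko, min_reads=20, wt_min=0.5, ko_max=0.1):
    """Check whether a site behaves like a genuine m6Am-modified A-TSN.

    wt, ko: (a_reads, g_reads) tuples for wild-type and PCIF1 knockout.
    All thresholds are illustrative assumptions, not values from the study.
    Returns None when coverage is too low to judge.
    """
    if sum(wt) < min_reads or sum(ko) < min_reads:
        return None
    # Expect high non-conversion in wild-type and near-zero in the knockout.
    return non_conversion_rate(*wt) >= wt_min and non_conversion_rate(*ko) <= ko_max
```

      A site that stays highly non-converted in the knockout would fail this check, which is the pattern expected for a signal that does not depend on PCIF1.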

      (4) I wonder if PCIF1 knockout affects TSN choice and abundance. If not, this data should be presented. If so, how are these changes accounted for in Figure 2H and Figure S5?

      We thank the reviewer for this comment. PCIF1 KO does not substantially affect TSN choice. We calculated the correlation (Pearson’s r) of relative TSN expression within each gene between wild-type and PCIF1 KO cells. Most genes have similar TSN choices (high Pearson’s r) in wild-type and PCIF1 KO cells. Thus, PCIF1 KO does not alter TSN expression globally.

      Author response image 1.
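
      The per-gene comparison of TSN choice described in this reply can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' code: the input layout (a mapping from gene to read counts per TSN, with TSNs in the same order in both conditions) and the function names are hypothetical.

```python
from math import sqrt


def relative_usage(counts):
    """Convert per-TSN read counts into fractions of the gene total."""
    total = sum(counts)
    return [c / total for c in counts]


def pearson_r(x, y):
    """Pearson correlation of two equal-length lists; None if undefined."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None
    return sxy / sqrt(sxx * syy)


def tsn_choice_correlation(wt_counts, ko_counts):
    """Per-gene Pearson's r of relative TSN usage, wild-type vs knockout.

    wt_counts, ko_counts: dict mapping gene -> list of read counts per TSN.
    Genes with fewer than two TSNs or no reads in a condition are skipped.
    """
    result = {}
    for gene in wt_counts.keys() & ko_counts.keys():
        wt, ko = wt_counts[gene], ko_counts[gene]
        if len(wt) < 2 or len(wt) != len(ko) or sum(wt) == 0 or sum(ko) == 0:
            continue
        r = pearson_r(relative_usage(wt), relative_usage(ko))
        if r is not None:
            result[gene] = r
    return result
```

      Normalizing to relative usage within each gene removes differences in sequencing depth and overall gene expression, so the correlation reflects only how reads are distributed among TSNs.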

      (5) The manuscript refers to Am as a rare modification in mRNA (e.g., introduction lines 101-102; discussion lines 574, 608; and possibly other locations) without specifying this only applies to transcription start sites. As this study does not cover entire mRNA sequences, these statements may be misleading.

      We thank the reviewer for this comment. We have clarified it.

      Reviewer #2 (Recommendations for the authors):

      (1) On line 122, the authors state that: "On average, a gene uses 9.5 ± 9 (mean and s.d., hereafter) TSNs (Figure 1A)." However, they do not discuss the dispersion apparent in the TSNs they observed. Figure panels 1A, B, and S1A, B show a range of 120 bases or less. What is the predominant range of distances between annotated TSNs and the newly identified ones?

      1a) For example, what percentage of new TSNs fall within 20? 50? 75? bases of the annotated sites? Additional text describing the distribution of these TSNs would help readers better understand the diversity inherent in these novel 5' RNA ends. Notably, this additional text likely is best placed in the CROWN-Seq section related to Figure 2 or S2.

      We thank the reviewer for this comment. We have updated Figure S2 to describe the newly found TSSs. Depending on the coverage in CROWN-seq, the TSSs with higher coverage tend to overlap with or locate proximally to known TSSs. In contrast, the TSSs with low coverage tend to be located further away from annotated TSSs.
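
      The distance summary requested by the reviewer (the share of new TSNs within 20, 50, or 75 bases of an annotated site) can be sketched as follows. This is an illustrative sketch only: it assumes all positions lie on one chromosome and strand, that at least one annotated site exists, and the function names are hypothetical.

```python
from bisect import bisect_left


def nearest_distance(pos, annotated_sorted):
    """Distance from pos to the closest value in a sorted, non-empty list."""
    i = bisect_left(annotated_sorted, pos)
    # Only the neighbours on either side of the insertion point can be nearest.
    candidates = annotated_sorted[max(i - 1, 0):i + 1]
    return min(abs(pos - a) for a in candidates)


def fraction_within(new_tsns, annotated, windows=(20, 50, 75)):
    """Fraction of new TSNs lying within each window of an annotated TSS."""
    annotated_sorted = sorted(annotated)
    dists = [nearest_distance(p, annotated_sorted) for p in new_tsns]
    return {w: sum(d <= w for d in dists) / len(dists) for w in windows}
```

      Computing the same fractions separately for high- and low-coverage sites would show the coverage dependence described in the reply.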

      1b) The alternate TSNs can have effects on splicing patterns and isoform identity. Providing a few sentences to explain how regularly this occurs would be helpful.

      We thank the reviewer for this comment. It is a very interesting point. Different TSNs can indeed have different splicing patterns. Although the discovery of splicing patterns regulated by TSNs is out of the scope of this study, we have discussed this possibility in the revised Discussion section.

      (2) On Lines 241 and 242, the authors mentioned that 1284 sites were excluded from the analysis based on low (under 20-explained in the figure legend) read count, distance from TSS, or false negatives (which are not explained). Although I agree that the authors are justified in setting these reads aside, the information could be useful to readers willing to perform follow-up work if their mRNAs of interest were included in these 1284 sites.

      2a) An annotation of all of these sites (broken down by category, i.e. the 811, the 343, and the 130) as a supplementary table should be provided.

      We thank the reviewer for this comment. We have added the categories to the revised Table S1.

      (3) Although I have marked several typos/grammar mistakes in several parts of this review, others exist elsewhere in the text and should be corrected.

      We thank the reviewer for this comment. We have corrected them.

      (4) In lines 122 and 123 the authors say "Only ~9% of genes contain a single TSN (Figure 1A)." However, their figure shows 81% with a single TSN. Why is there a 10% discrepancy?

      We thank the reviewer for this comment. We have corrected the plot in Figure 1A, to match the description.

      (5) The first Tab of Table S2 is labeled 'Legend', but is blank. Is this intentional?

      We thank the reviewer for this comment. We have updated the table legends.

      (6) On lines 70 and 76 of the supplementary figure file pertaining to Figure S2, the legend labels for Figure S2E and S2F are not accurate, they need to be changed to G and H.

      (7) In Figure 4A 'percentile' is misspelled.

      (8) The color-coding legend for the 4 bases is missing from (and should be added to) Figure S4A.

      (9) On Lines 984, 1163, and 1194 the '2s' should be properly sub-scripted where appropriate.

      For (6) to (9): We thank the reviewer for finding these issues. We have now corrected them.

      Reviewer #3 (Recommendations for the authors):

      (1) The authors should discuss if their results can definitively distinguish between the SSCA+1GC motif promoting m6Am that, in turn, promotes transcription, versus the SCA+1GC motif promoting m6Am but also separately promoting transcription in a m6Am-independent manner. The authors should also discuss this in light of recent findings by An et al. (2024 Mol. Cell), which support the former conclusion.

      We thank the reviewer for the suggestion. We have now updated the Discussion to note that our findings and those of An et al. support each other.

      (2) Given that the authors showed m6Am promotes gene expression (Figure 5) but does not affect mRNA stability (Fig. S5), logic dictates that m6Am must regulate mRNA transcription. However, the authors should explain why this regulation focuses on the initiation aspect of transcription rather than other aspects of transcription, e.g. premature termination, pause release, and elongation.

      We thank the reviewer for this comment. In this study, we did not profile the 3’ ends of nascent RNAs and thus we can only make conclusions about the overall transcription process but not a specific aspect. We have updated the revised Discussion section to mention that An et al. discovered that m6Am can sequester PCF11 and thus promote transcription, and therefore some of the effects we see could be related to differential premature termination.

      (3) Authors should add alternative versions of Figure 1D but with 3 colours corresponding to Am vs. m6Am vs. Cm/Gm/Um for all the cells they performed CROWN-seq on.

      We thank the reviewer for this comment. We have updated Figure S5 as the corresponding figure showing the fraction of Am vs. m6Am vs. Cm/Gm/Um.

      (4) Figure 2H (left): Please comment on the few outliers that still show high non-conversion even in PCIF1-KO cells.

      We thank the reviewer for this comment. We have discussed the outliers in the main text. These outliers can be found in the revised Table S3.

      (5) Line 254: "Second, if these sites were RNA fragments they would not contain m6Am." is missing a comma.

      (6) S2G and S2H labelling in Figure S2 legends is wrong.

      For (5) and (6): We thank the reviewer for these comments. We have corrected them.

      (7) Figure 3D: Many gene names are printed multiple times (e.g. ACTB is printed 5 times). Is this correct; is each dot representing 1 cell line?

      We thank the reviewer for this comment. These gene names represent different transcription-start nucleotides. We now clarify that each instance refers to a different start site.

      (8) S5A-C: Even if there's no substantial difference, authors should still display the Student's T-test P-values as they did for S5D-G.

      We thank the reviewer for this comment. We have updated the P-values.

      (9) Figure 5C and S5E: Why are the authors not showing the respective analysis for C-TSN and U-TSN genes?

      We thank the reviewer for this comment. Most mRNAs start with A or G. We therefore selected G-TSN as the control. Unlike G-TSNs, which occur in diverse sequence and promoter contexts, C-TSNs and U-TSNs are unusual. Genes that mainly use C-TSNs and U-TSNs are the so-called “5’ TOP (Terminal OligoPyrimidine)” genes. The 5’ TOP genes are mostly related to translation and metabolism, and thus their expression reflects the homeostasis of cell metabolism. We were therefore concerned that any differential expression of the C-TSN and U-TSN genes between wild-type and PCIF1 knockout cells might reflect specific effects on TOP transcriptional regulation rather than the general effects of PCIF1 on transcription.

      (10) Lines 82, 470, 506, 676: The authors should also cite Koh et al (2019 Nat. Comm.) in these lines that describe how snRNAs can also be m6Am-methylated and how FTO targets these same snRNAs for demethylation.

      We thank the reviewer for this comment. We have updated the citation.

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public Review): 

      Summary: 

      This manuscript presents a method to infer causality between two genes (and potentially proteins or other molecules) based on the non-genetic fluctuations among cells using a version of the dual-reporter assay as a causal control, where one half of the dual-reporter pair is causally decoupled, as it is inactive. The authors propose a statistical invariant identity to formalize this idea. 

      We thank the referee for this summary of our work. 

      Strengths: 

      The paper outlines a theoretical formalism, which, if experimentally used, can be useful in causal network inference, which is a great need in the study of biological systems. 

      We thank the referee for highlighting the potential value of our proposed method.

      Weaknesses: 

      The practical utility of this method may not be straightforward and potentially be quite difficult to execute. Additionally, further investigations are needed to provide evidence of the broad applicability of the method to naturally occurring systems and its scalability beyond the simple circuit in which it is experimentally demonstrated. 

      We agree with these two points and have rewritten the manuscript, in particular highlighting the considerable future work that remains to be done to establish the broad applicability and scalability of our method.

      In the rewritten manuscript we explicitly spell out potential practical issues and we explicitly state that our presented proof-of-principle feasibility study does not guarantee that our method will successfully work in systems beyond the narrowly sampled test circuits. This helps readers to clearly distinguish what we claim to have done from what remains to be done. The rewritten parts and additional clarifications are:

      Abstract (p. 1), Introduction (p. 1-2), Sec. “Proposed additional tests” (p. 8), and “Limitations of this study” (p. 10).

      Reviewer #2 (Public Review): 

      Summary: 

      This paper describes a new approach to detecting directed causal interactions between two genes without directly perturbing either gene. To check whether gene X influences gene Z, a reporter gene (Y) is engineered into the cell in such a way that (1) Y is under the same transcriptional control as X, and (2) Y does not influence Z. Then, under the null hypothesis that X does not affect Z, the authors derive an equation that describes the relationship between the covariance of X and Z and the covariance of Y and Z. Violation of this relationship can then be used to detect causality. 

      The authors benchmark their approach experimentally in several synthetic circuits. In four positive control circuits, X is a TetR-YFP fusion protein that represses Z, which is an RFP reporter. The proposed approach detected the repression interaction in two or three of the positive control circuits. The authors constructed sixteen negative control circuit designs in which X was again TetR-YFP, but where Z was either a constitutively expressed reporter or simply the cellular growth rate. The proposed method detected a causal effect in one of the eight negative controls, which the authors argue is not a false positive, but due to an unexpected causal effect. Overall, the data support the practical usefulness of the proposed approach. 

      We thank the referee for their summary of our work.

      Strengths: 

      The idea of a "no-causality control" in the context of detected directed gene interactions is a valuable conceptual advance that could potentially see play in a variety of settings where perturbation-based causality detection experiments are made difficult by practical considerations. 

      By proving their mathematical result in the context of a continuous-time Markov chain, the authors use a more realistic model of the cell than, for instance, a set of deterministic ordinary differential equations. 

      We thank the referee for summarizing the value of our work. 

      Caveats: 

      The term "causally" is used in the main-text statement of the central theorem (Eq 2) without a definition of this term. This makes it difficult to fully understand the statement of the paper's central theorem without diving into the supplement.  

      We thank the referee for this suggestion. In the revised manuscript we now define causal effects right before the statement of the main theorem of the main text (p. 2). We have also added a definition of the causal network arrows in the caption of Fig. 1 to help readers better understand our central claim.

      The basic argument of theorem 1 appears to rely on establishing that x(t) and y(t) are independent of their initial conditions. Yet, there appear to be some scenarios where this property breaks down: 

      (1) Theorem 1 does not seem to hold in the edge case where R=beta=W=0, meaning that the components of interest do not vary with time, or perhaps vary in time only due to measurement noise. In this case x(t), y(t), and z(t) depend on x(0), y(0), and z(0). Since the distributions of x(0), y(0), and z(0) are unspecified, a counterexample to the theorem may be readily constructed by manipulating the covariance matrix of x(0), y(0), and z(0). 

      (2) A similar problem may occur when transition probabilities decay with time. For example, suppose that again R=0 and X is degraded by a protease (B), but this protease is subject to its own first-order degradation. The deterministic version of this situation can be written, for example, dx/dt=-bx and db/dt=-b. In this system, x(t) approaches x(0)exp(-b(0)) for large t. Thus, as above, x(t) depends on x(0). If similar dynamics apply to the Y and Z genes, we can make all genes depend on their initial conditions, thus producing a pathology analogous to the above example.

      The reviewer does not know when such examples may occur in (bio)physical systems. Nevertheless, since one of the advantages of mathematics is the ability to correctly identify the domain of validity for a claim, the present work would be strengthened by "building a fence" around these edge cases, either by identifying the comprehensive set of such edge cases and explicitly prohibiting them in a stated assumption set, or by pointing out how the existing assumptions already exclude them.  

      We thank the referee for bringing to our attention these edge cases that indeed violate our theorem as stated. In the revised manuscript we have “built a fence” around these edge cases by adding two requirements to the premise of our theorem: First, we have added the requirement that the degradation rate does not decay to zero for any possible realization. That is, if beta(t) is the degradation rate of X and Y for a particular cell over time, then taking the time average of beta(t) over all time must be non-zero. Second, we have added the requirement that the system has evolved for enough time such that the dual reporter averages <x> and <y>, along with the covariances Cov(x, z_{k}) and Cov(y, z_{k}) have reached a time-independent stationary state.  

      With these requirements, no assumptions need to be made about the initial conditions of the system, because any differences in the initial conditions will decay away as the system reaches stationarity. For instance, the referee’s example (1) is not possible with these requirements because beta(t) can no longer remain zero. Additionally, example (2) is excluded because there the time average of the degradation rate is zero, which is no longer allowed (i.e., (1/T) times the integral from 0 to T of b(0)exp(-t) dt goes to 0 as T goes to infinity).
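
      As a rough numerical illustration of the referee's example (2), the sketch below integrates dx/dt=-bx and db/dt=-b with a simple forward-Euler scheme and compares the result with the closed-form limit x(0)exp(-b(0)). The function name, step size, and initial values are assumptions chosen for illustration; they are not taken from the manuscript or its supplement.

```python
import math


def simulate_decaying_protease(x0, b0, t_end, dt=1e-3):
    """Forward-Euler integration of dx/dt = -b*x and db/dt = -b.

    Returns x(t_end), b(t_end) and the time average of b over [0, t_end].
    """
    x, b = x0, b0
    b_integral = 0.0
    for _ in range(int(round(t_end / dt))):
        b_integral += b * dt
        x, b = x - dt * b * x, b - dt * b
    return x, b, b_integral / t_end


# Illustrative initial values (assumptions, not taken from the manuscript).
X0, B0 = 10.0, 2.0
x_short, b_short, b_avg_short = simulate_decaying_protease(X0, B0, t_end=50.0)
x_long, b_long, b_avg_long = simulate_decaying_protease(X0, B0, t_end=500.0)

# Closed-form results for comparison: b(t) = b(0) exp(-t), so
# x(t) = x(0) exp(-b(0) (1 - exp(-t))) -> x(0) exp(-b(0)) for large t, and
# (1/T) * integral_0^T b(t) dt = b(0) (1 - exp(-T)) / T -> 0 as T grows.
x_limit = X0 * math.exp(-B0)
print(f"x(50) = {x_short:.4f}, analytic limit = {x_limit:.4f}")
print(f"time-averaged b: T=50 -> {b_avg_short:.4f}, T=500 -> {b_avg_long:.4f}")
```

      In this toy system x(t) settles at a value that still depends on x(0), while the time-averaged degradation rate shrinks roughly as b(0)/T. This is the behaviour that the added non-decay requirement rules out.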

      Note that adding the condition that degradation cannot decay to exactly zero does not reduce the biological applicability of the theorem. But, as the referee correctly points out, any mathematical theorem needs to be accurately stated and stand on its own regardless of whether biological systems could realize particular edge cases. Also note that the requirement that the cellular ensemble has reached a time-independent distribution of cell-to-cell variability can be (approximately) experimentally verified by taking snapshots of ensemble variability at two sufficiently separated moments in time.

      In response to the referee’s comment, we have added the above requirements when stating the theorem in the main text. We have also added the requirement of non-decay of the degradation rate to the definition of the system in SI Sec. 4, along with the stationarity requirement in theorem 1 in SI Sec 5. We have also added mathematical details to the proof of the invariant in SI Sec 5.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors): 

      This manuscript presents a method to infer causality between two genes (and potentially proteins or other molecules) based on the non-genetic fluctuations among cells using a version of the dual-reporter assay as a causal control, where one half of the dual-reporter pair is causally decoupled, as it is inactive. The authors propose a statistical invariant identity to formalize this idea. They propose and experimentally demonstrate the utility of this idea with a synthetic reporter system in bacteria. 

      The paper is well written and clearly outlines the principle, the mathematical invariant relationship both to give the reader an intuitive understanding of why the relationship must be true and in their mathematical derivation of the proof of Theorem 1. 

      The paper outlines a theoretical formalism, which, if experimentally used, can be useful in causal network inference, which is a great need in the study of biological systems. However, the practical utility of this method may not be straightforward and potentially be quite difficult to execute. We think this work could offer a platform to advance the field of network inference, but would encourage the authors to address the following comments. 

      We thank the reviewer for the positive comments on readability, summarizing the value of our work, as well as the critical comments below that helped us improve the manuscript.

      Major comments: 

      (1) Although the invariant identity seems theoretically sound, the data from synthetic engineered circuits in this manuscript do not support that the invariant holds for natural causal relations between genes in wild-type cells. In all the positive control synthetic circuits (numbers 1 to 4) the target gene Z i.e. RFP was always on the plasmid, and in circuit #4 there was an additional endogenous copy. The authors recapitulate the X-to-Z causality in circuits 1, 2, and 3 but not 4. Ultimately, the utility of this method lies in the ability to capture causality from endogenous correlations, and this observation suggests that the method might not be useful for that task.

      We thank the referee for their careful reading of our synthetic circuits and sincerely apologize for an error in our description of circuit #4 in the schematic of Table S2 of the supplement. We incorrectly stated that this circuit contained a chromosomally expressed RFP. In fact, in circuit #4 RFP was only on the plasmid just like in the circuits #1-3. We have corrected the schematic in the revised manuscript and have verified that the other circuits are correctly depicted.

      In the revised manuscript, we now explicitly spell out that all our “positive control” test cases had the genes of interest expressed on plasmids, and that we have not shown that our method successfully detected causal interactions in a chromosomally encoded gene regulatory circuit, see additional statements in Sec. “Causally connected genes that break the invariant” on p. 6. 

      In the absence of any explicit experimental evidence, it is then important to consider whether chromosomally encoded circuits are expected to cause problems for our method which is based on a fluctuation test. Due to plasmid copy number fluctuations, X and Z will fluctuate significantly more when expressed on plasmids than when expressed chromosomally. However, because this additional variability is shared between X and Z it does not help our analysis which relies on stochastic differences in X and Z expression due to “intrinsic noise” effects downstream of copy number fluctuations. The additional “extrinsic noise” fluctuations due to plasmid copy number variability would wash out violations of Eq. (2) rather than amplify them. If anything, we thus expect our test cases to have been harder to analyze than endogenous fluctuations. This theoretical expectation is indeed borne out by numerical test cases presented in the revised supplement where plasmid copy fluctuations severely reduced the violations of Eq. 2, see new additional SI Sec. 15. 

      Additionally, the case of the outlier circuit (number 12) suggests that exogenous expression of certain genes may lead to an imbalance of natural stoichiometry and lead to indirect effects on target genes which can be misinterpreted as causal relations. Knocking out the endogenous copy may potentially ameliorate this issue but that remains to be tested. 

      We agree with the referee that the expression of exogenous genetic reporters can potentially affect cellular physiology and lead to undesired effects. In the revised manuscript we now explicitly spell out that the metabolic burden or the phototoxicity of introducing fluorescent proteins could in principle cause artificial interactions that do not correspond to the natural gene regulatory network, see Sec. “Proposed additional tests” on p. 8.

      However, it is also important to consider that the test circuit #12 represents a synthetic circuit with genes that were expressed at extremely high levels (discussed in 3rd paragraph of Sec. “Evidence that RpoS mediated stress response affected cellular growth in the outlier circuit”, p. 8), which led to the presumed cellular burden. Arguably, natural systems would not typically exhibit such high expression levels, but importantly even if they did, our method does not necessarily rely on fluorescently tagged proteins but can, in principle, also be applied to other methods such as transcript counting through sequencing or in-situ hybridization of fluorescent probes.  

      Ultimately, the value of this manuscript will be greatly elevated if the authors successfully demonstrate the recapitulation of some known naturally existing causal and non-causal relations. For this, the authors can choose any endogenous gene Z that is causally controlled by gene X. The gene X can be on the exogenous plasmid along with the reporter and the shared promoter. Same for another gene Z' which is not causally controlled by gene X. Potentially a knockout of endogenous X may be required but it might depend  on what genes are chosen. 

      If the authors think the above experiments are outside the scope of this manuscript, they should at least address these issues and comment on how this method could be effectively used by other labs to deduce causal relations between their favorite genes. 

      Because a full analysis of naturally occurring gene interactions was beyond the scope of our work, we agree with the referee’s suggestion to add a section to discuss the limitations of our experimental results. In the revised manuscript we reiterate that additional investigations are needed to show that the method works to detect causal interactions between endogenous genes, see Abstract (p. 1), Introduction (p. 1-2), Sec. “Proposed additional tests” (p. 8), and “Limitations of this study” (p. 9). In the original manuscript we explicitly spelled out how other researchers can potentially carry out this further work in the subsections titled “Transcriptional dual reporters” (p. 3) and “Translational dual reporters” (p. 3). In the revised manuscript, we have added a section “Proposed additional tests” (p. 8) in which we propose an experiment analogous to the one proposed by the referee above, involving an endogenous gene circuit found in E. coli, as an example to test our invariant.

      (2) For a theoretical exposition that is convincing, we suggest the authors simulate a larger network (for instance, a network with >10 nodes), like the one shown schematically in Figure 1, and demonstrate that the invariant relationship holds for the causally disconnected entities, but is violated for the causally related entities. It would also be interesting to see if any quantification for the causal distance between "X" and the different causally related entities could be inferred.

      We thank the referee for this suggestion. We have added SI Sec. 14 where we present simulation results of a larger network with 10 nodes. We find that all of the components not affected by X satisfy Eq. (2) as they must. However, it is important to consider that we have analytically proven the invariant of Eq. (2) for all possible systems. It provably applies equally to networks with 5, 100, or 10,000 components. The main purpose of the simulations presented in Fig. (2) is to illustrate our results and to show that correlation coefficients do not satisfy such an invariant. However, they are not used as a proof of our mathematical statements.

      We thank the referee for the interesting suggestion of quantifying a “causal distance”. Unfortunately, the degree to which Eq. (2) is violated cannot directly equate to an absolute measure for the “causal distance” of an interaction. This is because both the strength of the interaction and the size of the stochastic fluctuations in X affect the degree to which Eq. (2) is violated. The distance from the line should thus be interpreted as a lower bound on the causal effect from X to Z because we do not know the magnitude of stochastic effects inherent to the expression of the dual reporters X and Y. While the dual reporters X and Y are identically regulated, they will differ due to stochastic fluctuations. Propagation of these fluctuations from X to Z is what creates an asymmetry between the normalized covariances. In the most extreme example, if X and Y do not exhibit any stochastic fluctuations we have x(t)=y(t) for all times and Eq. (2) will not be violated even in the presence of a strong causal link from X to Z.

      However, it might be possible to infer a relative causal distance to compare causal interactions within cells.

      That is, in a given network, the normalized covariances between X, Y and two other components of interest Z1, Z2 that are affected by X can be compared. If the asymmetry between (η_{xz1}, η_{yz1}) is larger than the asymmetry between (η_{xz2}, η_{yz2}), then we might be able to conclude that X affects Z1 with a stronger interaction than the interaction from X to Z2, because here the intrinsic fluctuations in X are the same in both cases.
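
      The comparison described above can be sketched in a few lines of Python. The sketch assumes that the normalized covariance is defined as Cov(a, b)/(<a><b>), which is not spelled out in this response, and it uses hand-picked toy numbers rather than experimental or simulated data from the manuscript. The function names are hypothetical.

```python
from statistics import fmean


def normalized_covariance(a, b):
    """Return Cov(a, b) / (<a><b>) for paired single-cell measurements."""
    mean_a, mean_b = fmean(a), fmean(b)
    cov = fmean([(ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b)])
    return cov / (mean_a * mean_b)


def asymmetry(x, y, z):
    """Difference eta_xz - eta_yz; zero (up to sampling error) if Eq. (2) holds."""
    return normalized_covariance(x, z) - normalized_covariance(y, z)


# Toy numbers chosen by hand for illustration only (not experimental data).
y = [1.0, 2.0, 3.0, 4.0]   # passive reporter Y
x = [2.0, 1.0, 4.0, 3.0]   # gene of interest X, same mean as Y
z1 = [2.0, 1.0, 4.0, 3.0]  # component that follows X closely
z2 = [3.0, 3.0, 4.0, 4.0]  # component that covaries equally with X and Y

asym_z1 = asymmetry(x, y, z1)
asym_z2 = asymmetry(x, y, z2)

# Rank the components by the size of the violation of Eq. (2).
ranked = sorted({"Z1": asym_z1, "Z2": asym_z2}.items(),
                key=lambda item: abs(item[1]), reverse=True)
print(ranked)
```

      Because the covariance is divided by both means, rescaling X by a constant factor leaves the normalized covariance unchanged. The ranking is only meaningful within one network, where the intrinsic fluctuations in X are shared by all compared components.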

      In response to the referee’s comment and to test the idea of a relative causal distance, we have simulated a larger network made of 10 components. In this network, X affects a cascade of components called Z8, Z9, and Z10, see the additional SI Sec. 14. Here the idea of a causal distance can be defined as the distance down the cascade: Z8 is closest to X and so has the largest causal strength, whereas Z10 has the weakest. Indeed, simulating this system we find that the asymmetry between η_{xz8} and η_{yz8} is the largest whereas that between η_{xz10} and η_{yz10} is the smallest. We also find that all of the components not affected by X have normalized covariances that satisfy Eq. (2). This result suggests that the relative causal distance or strength in a network could potentially be estimated from the degree of the violations of Eq. (2).

      However, we note that these are preliminary results. In the case of the specific regulatory cascade now considered in SI Sec. 14, the idea of a causal distance can be well defined. Once feedback is introduced into the system, this definition may no longer make sense. For instance, consider the same network that we simulate in SI Sec. 14, but where the most downstream component in the cascade, Z10, feeds back and affects X and Y. In such a circuit it is unclear whether Z8 or Z10 is “causally closer” to X. A more thorough theoretical analysis, equipped with a more universal quantitative definition for causal distance or strength, would be needed to deduce what information can be inferred from the relative distances in the violations of Eq. (2). While this defines an interesting research question, answering it goes beyond the scope of the current manuscript. 

      Minor comments: 

      - The method relies on the gene X and the reporter Y having the same control which would result in similar dynamics. The authors do not quantitatively compare the YFP and CFP expression to show whether this indeed holds for the synthetic circuits. It would be useful to know how much deviation between the two can be tolerated while not affecting the outcome.

      We thank the referee for their comment. The invariant of Eq. (2) is indeed guaranteed to hold only when the transcription rate of Y is proportional to that of X. How much levels of X and Y covary depends on the stochastic effects intrinsic to the expression of the dual reporters as well as how similar the transcriptional control of X and Y is. The stochastic difference between X and Y is exactly what we exploit.

      However, in the limit of high YFP and CFP levels, intrinsic fluctuations that cause stochastic expression differences between X and Y become negligible and we can directly infer whether they are indeed tightly co-regulated from time-traces: Below, we show two single-cell traces taken with our experimental setup in which the YFP and CFP fluorescence trajectories are almost exactly proportional. Both of these traces are from circuit #10 as defined in Table S4.

      Author response image 1.

      We chose the above traces because they showed the highest correlation between YFP and CFP levels. Other traces for lower expression levels have lower correlations due to effects of intrinsic noise (see Tables S2-S4). However, the existence of one trace in which YFP is almost perfectly proportional to CFP throughout can only occur if the YFP and CFP genes are under the same control. And, since the control of YFP and CFP genes in all of our synthetic circuits is identical (with the same promoters and plasmid positions), these data strongly suggest that our dual reporters are tightly co-regulated in all the synthetic circuits. Moreover, the negative control experiments presented in Fig. 3E provide a natural consistency check that the YFP and CFP are under the same control and satisfy Eq. (1).

      We agree that it would be useful to know how much the X and Y production rates can differ for Eq. (2) to hold. Importantly, our proven theorem already allows for the rates to differ by an unspecified proportionality constant. In response to the referee’s comment we have derived a more general condition under which our approach holds. In the newly added SI Sec. 7 we prove that Eq. (2) holds also when rates differ as long as the difference is stochastic in nature with an average of zero. We also prove that Eq. (2) holds in the face of multiplicative noise that is independent of the X and Y production rates.

      However, the production rates of X and Y cannot differ in all ways. Some types of differences between the X and Y production rates can lead to deviations from Eq. (2) even when there is no causal interaction. To highlight this, we added the results of simulations of a toy model in which the X and Y production rates differ by an additive noise term that does not average to zero, see Fig. S19B of the newly added SI Sec. 7.

      - The invariant should potentially hold true for any biological species that are causally related e.g. protein-protein interactions. Also, this method could potentially find many applications in eukaryotic cells. Although it's outside the scope of current work to experimentally demonstrate such applications, the authors should comment on experimental strategies to apply this method to overcome potential pitfalls (e.g. presence of enhancers in eukaryotic cells). 

      We thank the referee for this suggestion. We agree that there are potential pitfalls that could come into effect when our proposed approach is applied to more complex systems such as eukaryotic gene expression. In response to the referee’s comment, we have added an explicit discussion of these potential pitfalls in the discussion section “Limitations of this study” (see p. 10).

      In particular, in eukaryotes there are many genes in which promoter sequences may not be the sole factor determining transcription rates. Other factors that can be involved in gene regulation include the presence of enhancers, epigenetic modifications, and bursts in gene expression, to name a few. We thus propose a few strategies, which include positioning the passive reporter at a similar gene locus as the gene of interest, measuring the gene regulation activities of the gene of interest and its passive reporter using a separate method, and exploiting the invariant with a third gene, where it is known there is no causal interaction, as a consistency check. In addition, we include in the SI a new section SI Sec. 8 which shows that the invariant holds in the face of many types of bursty gene expression dynamics.

      However, the above is not a comprehensive list. Some of the issues the referee mentions are serious and may not be straightforward to overcome. We now spell this out explicitly in the revised manuscript (p. 10). 

      - In the legend of Fig. 1, the sentence "Data points here are for..." is missing a few words, or needs to be rephrased. 

      We thank the referee for this comment. We have rewritten the figure caption, which now reads “Data points are numerical simulations of specific example networks (see SI for details) to illustrate the analytically proven theorem of Eq. 2.”

      - Fig. 2 talks about the uncertainties associated with each point on the scatter plots. However, it is difficult to understand the quantification in such a plot. It would be great to have a plot quantifying the uncertainties in the invariant relation for the different topologies studied, specifically in order to understand if one topology is consistently deviating more from the x=y line than the other topologies studied here.  

      We thank the referee for this suggestion. In the supplement of the revised manuscript we have added supplemental Figs. S3, S4, and S5 to separately quantify the uncertainty of the different processes plotted in Fig. 2 and have added a new section (SI Sec. 11) to discuss the processes simulated in Fig. 2 in more detail. In short, each simulated process generated less than ~5% of outliers when considering 95% confidence intervals (with the max percentage deviation being 5.01% for process 5, see Fig. S5). These outliers were then simulated over a larger number of simulations to reduce the sampling error, which resulted in 0% of outliers (see Sec. “Confidence intervals for finite sampling error” in Materials and Methods on p. 11). Some simulated processes generated larger percentage errors in the normalized covariances than others, but this is expected as different processes have different dynamics which will result in different degrees of sampling of the underlying distributions.

      Note that the invariant of Eq. 2 is analytically proven for all tested topologies as none of the topologies include a causal effect from X to Z. Any deviation of the numerical data from the straight line prediction of Eq. 2 (right column in Fig. 2C) is due to the finite sampling of a stochastic process to estimate the true covariance from the sample covariance. Any given parameter set was simulated several times which allowed us to estimate the sampling error from differences between repeated samples. In the additional SI figures we now quantify this error for the different topologies.
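
      One simple way to turn repeated simulations into a sampling-error estimate is sketched below. It uses a normal-approximation 95% interval around the mean of the replicate estimates. This is an assumption made for illustration and may differ from the procedure in the Materials and Methods section “Confidence intervals for finite sampling error”. The replicate values are made up, and the function name is hypothetical.

```python
from math import sqrt
from statistics import fmean, stdev


def mean_with_ci(replicates, z_score=1.96):
    """Mean of replicate estimates with a normal-approximation 95% interval."""
    centre = fmean(replicates)
    half_width = z_score * stdev(replicates) / sqrt(len(replicates))
    return centre, centre - half_width, centre + half_width


# Hypothetical differences eta_xz - eta_yz from five repeated simulations of
# one parameter set (made-up numbers, not results from the manuscript).
differences = [0.010, -0.020, 0.000, 0.015, -0.005]
centre, lower, upper = mean_with_ci(differences)

# If the interval contains zero, the replicates are consistent with Eq. (2).
consistent_with_invariant = lower <= 0.0 <= upper
print(f"mean = {centre:.4f}, 95% interval = ({lower:.4f}, {upper:.4f})")
```

      With only a handful of replicates, a Student's t quantile in place of 1.96 would give a wider and more cautious interval. The normal approximation is used here only to keep the sketch short.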

      In addition to the above changes we want to highlight that the purpose of the simulations presented in Fig. (2) is not to prove our statements or explore the behavior of different topologies. The purpose of the data presented in the right column of Fig. 2C is to illustrate the theoretical invariant and act as a numerical sanity check of our analytically proven result. In contrast, the data in the left column of Fig 2C illustrates that the correlations do not satisfy an invariant like Eq. 2 which applies to covariances but not correlations.  

      - The legend for Fig. 3 seems to end abruptly. There likely needs to be more.  

      We thank the referee for catching this mistake. We have corrected the accidentally truncated figure caption of Fig. 3.

      - There is a typo in equation (5.3) on page 23 of supplementary material, there should be x instead of y in the degradation equation of x. 

      We thank the referee for catching this mistake which has been corrected in the revised manuscript.

      - In the supplemental material, to understand the unexpected novel discovery of causality, Figure S5 is presented. However, this doesn't give the context for other negative controls designed, and the effect of rfp dynamics (which can be seen in the plots both in the main paper and the supplement) in the growth rate of cells in those constructs. As a baseline, it would be nice to have those figures.  

      We thank the referee for this suggestion. We have now included representative RFP traces with the growth rates for other negative control circuits, see Fig. S10. In addition, we have now included the cross correlation functions between RFP and growth rate in these negative control circuits, see Fig. S10A. While in all cases, RFP and growth rate are negatively correlated, the outlier circuit exhibits the largest negative correlation.

      The suggested comparison of the referee thus highlights that – in isolation – a negative correlation between RFP and growth rate is only weak evidence for our hypothesized causal interaction, because negative correlations can result from growth rate affecting volume dilution and thus RFP concentration. Crucially, we thus additionally considered the overall variability of growth rate and found that the outlier circuit has the largest growth rate variability, which is indicative of something affecting the growth rate of those cells, see Fig. S10B. Comparing the magnitude of RFP variability against other strains requires constraining the comparison group to other synthetic circuits that have RFP located on the chromosome rather than on a plasmid. This is why we compare the CV of the outlier with the CV of circuit #5, which corresponds to the “regular” repressilator (i.e., the outlier circuit without the endogenous lacI gene). As an additional comparison, we computed the CV for a strain of E. coli that does not contain a synthetic plasmid at all, but still contains the RFP gene on the chromosome. We find the CVs in the outlier circuit to be larger than in these two additional circuits, suggesting that the outlier circuit causes additional fluctuations in the RFP and growth rate. We now spell this out explicitly in the revised manuscript (see Sec. “Evidence that RpoS mediated stress response affected cellular growth in the outlier circuit”, p. 8).

      The referee is correct that the above arguments are only circumstantial evidence, but they do show that the data is consistent with a plausible explanation of the hypothesized causal interaction. Our main evidence for an RpoS mediated stress response that explains the deviations from Eq. 2 in the outlier circuit is the perturbation experiment in which the deviation disappears for the RpoS knockout strain. We now spell out this argument explicitly in the revised manuscript (see Sec. “Evidence that RpoS mediated stress response affected cellular growth in the outlier circuit”, p. 8).

      Reviewer #2 (Recommendations For The Authors): 

      The proof of theorem 1 relies on an earlier result, lemma 1. Lemma 1 only guarantees the existence of a "dummy" system that satisfies the separation requirement and preserves the dynamics of X and Y. However, in principle, it may be possible to maintain the dynamics of X and Y while still changing the relationship between Cov(X,Zk) and Cov(Y,Zk). This could occur if the dynamics of Zk differ in a particular way between the original system and the dummy system. So lemma 1 needs to be a little stronger- it needs  to mention that the dynamics of Zk are preserved, or something along these lines. The proof of lemma 1 appears to contain the necessary ingredients for what is actually needed, but this should be clarified. 

      We agree with the referee that this is an important distinction. Lemma 1 does in fact guarantee that any component Zk that is not affected by X and Y will have the same dynamics in the “dummy” system. However, as the referee points out, this is not stated in the lemma statement nor in the proof of the lemma. In response to the referee’s comment, we have made it clear in the lemma statement that the Zk dynamics are preserved in the “dummy” system, and we have also added details to the proof to show that this is the case, see Lemma 1 on p. 27 of the SI. 

      Readers who are familiar with chemical reaction diagrams, but not birth-death process diagrams may waste some time trying to interpret Equation 1 as a chemical reaction diagram with some sort of rate constant as a label on each arrow (I did this). It may be helpful to either provide a self-contained definition of the notation used, or mention a source where the necessary definitions can be found. 

      We agree with the referee. In the revised manuscript we have added a description of the notation used below Equation 1 of the main text, see p. 2. The notational overloading of the “arrow notation” is a perennial problem in the field and we thank the referee for reminding us of the need to clarify what the arrows mean in our diagrams.

      It would be helpful if the authors could propose a rule for deciding whether dependence is detected or not. As it stands presently, the output of the approach seems to be a chart like that in Figure 3D where you show eta_xz and eta_yz with confidence interval bars and the reader must visually assess whether the points more-or-less fall on the line of unity. It would be better to have some systematic procedure for making a "yes or no" call as to whether a causal link was detected or not. Having a systematic detection rule would allow you to make a call as to whether dependence in circuit 3 was detected or not. It would also allow you or a future effort to evaluate the true positive rate of the approach in simulated settings. 

      We thank the referee for this suggestion. In the revised manuscript we have added an explicit rule for detecting causality using the invariant of Eq. (2). Specifically, Eq. (2) can be re-written as r = 1 where r is the covariability ratio r = etaXZ/etaYZ. In that case, given 95% confidence intervals for the experimentally determined covariability ratio r, we say that there is a causal interaction if the confidence interval does not overlap with the value of r = 1.

      This corresponds to a null hypothesis test at the 2.5% significance level. The reason that it is at 2.5% significance and not 5% significance is as follows. Let’s say we measure a covariability ratio of r_m, and the 95% confidence interval is [r_m - e_m, r_m + e_m] for some error e_m. Without loss of generality, let’s say that r_m > 1 (the same applies if r_m < 1). This means that Prob(r < r_m - e_m) = 2.5% and Prob(r > r_m + e_m) = 2.5%, where r is the actual value of the covariability ratio. Under the null hypothesis that there is no causal interaction, we set r = 1. However, we now have Prob(1 > r_m + e_m) = 0, because we know that r_m > 1 and so we must have r_m + e_m > 1. The probability that the value of 1 falls outside the error bars is therefore 2.5% under the null hypothesis.

      This proposed rule is the same rule that we used to detect statistical outliers in our simulations, where we found a “false positive” rate of 2.3% over 6522 simulated systems due to statistical sampling error (as discussed in the Materials and Methods section). In response to the referee’s suggestion, we have added the section “A rule for detecting causality in the face of measurement uncertainty” (p. 4). We also apply the rule to the experimental data and find that the rule detects 2/4 causal interactions in Fig. 3D. We have clarified this in the Fig. 3D caption, in the main text, and we have added a figure in the SI (Fig. S2) where we apply the null hypothesis test on the measured covariability ratios. 

      Note that whether the third interaction is “detected” or not depends on the cut-off value used. We picked the most common 95% rule to be consistent with traditional statistical approaches. With this rule one of the data points lies right at the cusp of detection, but ultimately falls into the “undetected” category if a strictly binary answer is sought under the above rule.
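
      For concreteness, the binary rule described above can be sketched in a few lines. This is an illustrative sketch rather than the analysis code of the manuscript; the function name and the example values are hypothetical, and the confidence interval is assumed to be symmetric around the measured ratio.

```python
def detects_causal_interaction(r_measured, half_width, null_value=1.0):
    """Return True when the interval [r - e, r + e] excludes null_value.

    r_measured : measured covariability ratio r = eta_XZ / eta_YZ
    half_width : half-width e of its (assumed symmetric) 95% interval
    """
    lower = r_measured - half_width
    upper = r_measured + half_width
    return not (lower <= null_value <= upper)


# Hypothetical measurements: (covariability ratio, 95% CI half-width).
examples = {
    "consistent_with_invariant": (1.05, 0.10),
    "clear_deviation": (1.40, 0.15),
    "cusp_of_detection": (0.70, 0.31),
}

for name, (ratio, err) in examples.items():
    print(name, detects_causal_interaction(ratio, err))
```

      The third hypothetical entry mimics a data point whose interval only just reaches r = 1, so the strictly binary rule reports it as undetected even though a slightly narrower interval would flip the call.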

      It would be helpful to mention what happens when the abundance of a species hits zero. Specifically, there are two ways to interpret the arrow from X to X+d with a W on top: 

      Interpretation (1): 

      P(X+d | X) = W if X+d ≥ 0
      P(X+d | X) = 0 if X_i+d_i < 0 for at least one i

      Interpretation (2): 

      P(X+d | X) = W regardless of whether X+d < 0
      W = 0 whenever X_i < d_i for at least one i

      Interpretation (1) corresponds to a graph where the states are indexed on the non-negative integers. Interpretation (2) corresponds to a graph where the states are indexed on the integers (positive or negative), and W is responsible for enforcing the non-negativity of mass. I believe you need the second interpretation because the first interpretation leads to problems with your definition of causality. For example, consider the reaction: 

      (Na, K) -- 0.1 --> (Na-1, K+1) 

      This could occur if Na and K are the intracellular concentrations of sodium and potassium ions in a cell that has an ATP-driven sodium-potassium exchanger whose rate is limited by the frequency with which extracellular potassium ions happen to flow by. Per the definition of causality found in the appendix, Na has no causal effect on K since Na does not show up in the reaction rate term. However, under interpretation (1), Na clearly has a causal effect on K according to a reasonable definition of causality because if Na=0, then the reaction cannot proceed, whereas if Na>0 then it can. However, under interpretation (2), the reaction above cannot exist and so this scenario is excluded. 

      We thank the referee for this comment that helped us clarify the meaning of arrows with propensities. In short, interpretation (2) corresponds to the definition of our stochastic systems. This is consistent with the standard notation used for the chemical master equation. As the referee points out, because molecular abundances cannot be negative, any biochemical system must then have the property that the propensity of a reaction must be equal to zero when the system is in a state in which an occurrence of that reaction would take one of the abundances to negative numbers. Stochastic networks that do not have this property cannot correspond to biochemical reaction networks.

      In the revised manuscript, we now spell this out explicitly to avoid any confusion, see SI page 25.

      Furthermore, we additionally discuss the referee’s example in which the rate of exchanging Na for K through an ion exchanger is approximately independent of the intracellular Na concentration. Because molecular abundances cannot become negative, the rate cannot be truly constant: at sufficiently low concentrations it must decrease until it becomes exactly zero for zero molecules.

      Importantly, agreement with Eq. (2) does not imply that there is no causal effect from X to Zk. It is the deviation from Eq. (2) that implies the existence of a causal effect from X to Zk. Therefore, although the referee’s example above would constitute a causal interaction in our framework, it would not lead to a deviation from Eq. (2) because the fluctuations in Na (which we exploit) do not propagate to K. From a practical point of view, our method thus detects whether changing X over the observed range affects the production and degradation rates of Zk.
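
      The non-negativity requirement on propensities discussed above can be checked mechanically on a small grid of states. The sketch below is illustrative only: the function name, the grid size, and the saturating rate law are hypothetical choices, and only the referee’s (Na, K) example with rate 0.1 comes from the text.

```python
from itertools import product


def violates_nonnegativity(propensity, step, max_count=5):
    """List states where a reaction can fire although it would push
    at least one abundance below zero."""
    bad_states = []
    for state in product(range(max_count + 1), repeat=len(step)):
        would_go_negative = any(x + d < 0 for x, d in zip(state, step))
        if would_go_negative and propensity(state) > 0:
            bad_states.append(state)
    return bad_states


# Referee's example: (Na, K) -> (Na - 1, K + 1).
step = (-1, +1)


def constant_rate(state):
    # Rate 0.1 regardless of the state, as in the referee's example.
    return 0.1


def saturating_rate(state):
    # Hypothetical rate law: close to 0.1 for Na >= 1, exactly 0 at Na = 0.
    na = state[0]
    return 0.1 * na / (na + 0.01)


print(violates_nonnegativity(constant_rate, step))
print(violates_nonnegativity(saturating_rate, step))
```

      With the constant rate, every state with Na = 0 is flagged, so that network cannot correspond to a biochemical system under interpretation (2). The saturating rate is nearly constant over the populated range yet passes the check.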

      In the course of setting up the negative control benchmark circuits, a perturbation-based causal validation would be nice. For instance, first, verify that X does not affect Z by intervening on X (e.g. changing its copy number or putting it under the control of an inducible promoter), and ensuring that Z's activity is not affected by such interventions upon X. This approach would help to adjudicate questions of whether the negative control circuits actually have an unknown causal link. The existing benchmark is already reasonably solid in my view, and I do not know how feasible this would be with the authors' setup, but I think that a perturbation-based validation could in principle be the gold standard benchmark.  

      We agree that additional perturbation-based validation tests on all of the negative control circuits would improve the evidence that our method worked as advertised. While such experiments are beyond the scope of our current work, we now explicitly point out the benefits of such additional controls in the revised Discussion.

      Below is a series of comments about typography, mostly about section 4 of the supplement. 

      We thank the referee for their careful reading and highlighting those mistakes.

      At the bottom of page 21, Z_aff is defined as the set of components that are affected by X. However, later Z_aff seems to refer to components affected by X or Y. For instance, in the proof of lemma 1, it is written "However, because a is part of z_aff, the {ak} variables must be affected by X and/or Y." 

      We thank the referee for catching this mistake. We have changed the definition of Z_aff throughout the supplement to refer to components affected by X or Y. If it can be experimentally ensured that Y is a passive reporter (i.e., it does not affect other components in the cell), then the theorem can only be violated if X affects Z. 

      In the equation following Eq 5.2, W_k and d_k should be W_i and d_i ?  

      Yes, the referee is correct. In the revised manuscript we have corrected W_k and d_k to W_i and d_i. 

      In Eq 5.3 in the lower-left transition diagram, I think a "y" should be an "x". 

      Yes, the referee is correct. In the revised manuscript we have fixed this typo.

      In the master equation above Eq 5.5, the "R" terms for the y reactions are missing the alpha term, and I think two of the beta terms need to be multiplied by x and y respectively.  

      The referee is correct. In the revised manuscript we have fixed this typo.

      The notation of Eq 5.8, where z_k(t) is the conditional expectation of z_kt, is strange and difficult to follow. Why does z_k(t) not get a bar over it like its counterparts for x, y, R, and beta? The bars, although not a perfect solution, do help.  

      We agree with the referee’s comment and have added further explanations to define the averages in question, see SI p. 28. In short, when we condition on the history of the components not affected by X or Y, we in effect condition on the time trajectories of z_{k} (when it is part of the components not affected by X and/or Y) and beta (since it only depends on the components not affected by X or Y). We thus previously did not include the bars when taking the averages of these components in the conditional space because the conditioning in effect sets their time-trajectories (so they become deterministic functions of time). In the revised manuscript we now also denote these conditional expectations with bars and we have added comments to the proof to clarify their definition.

      I think it would be helpful to show how the relationship <x>=<y>/alpha is obtained from Eq 5.5.  

      We agree with this suggestion and have added the derivations, see Eqs. (5.9) - (5.13) in the revised SI. 

      In the main text, the legend of Fig 3 cuts off mid-sentence.  

      We thank the referee for catching this mistake which has been fixed in the revised manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Oor et al. report the potentially independent effects of the spatial and feature-based selection history on visuomotor choices. They outline compelling evidence, tracking the dynamic history effects based on their clever experimental design (urgent version of the search task). Their finding broadens the framework to identify variables contributing to choice behavior and their neural correlates in future studies.

      Strengths:

      In their urgent search task, the variable processing time of the visual cue leads to a dichotomy in choice performance - uninformed guesses vs. informed choices. Oor et al. did rigorous analyses to find a stronger influence of the location-based selection history on the uninformed guesses and a stronger influence of the feature-based selection history on the informed choices. It is a fundamental finding that contributes to understanding the drivers of behavioral variance. The results are clear.

      Weaknesses:

      (1) In this urgent search task, as the authors stated in line 724, the variability in performance was mainly driven by the amount of time available for processing the visual cue. The authors used processing time (PT) as the proxy for this "time available for processing the visual cue." But PT itself is already a measure of behavioral variance since it is also determined by the subject's reaction time (i.e., PT = Reaction time (RT) - Gap). In that sense, it seems circular to explain the variability in performance using the variability in PT. I understand the Gap time and PT are correlated (hinted by the RT vs. Gap in Figure 1C), but Gap time seems to be more adequate to use as a proxy for the (imposed) time available for processing the visual cue, which drives the behavioral variance. Can the Gap time better explain some of the results? It would be important to describe how the results are different (or the same) if Gap time was used instead of PT and also discuss why the authors would prefer PT over Gap time (if that's the case).

      Thanks to Rev 1 for requesting clarification of this important point. As Rev 1 notes, PT is a derived variable, computed for each trial by subtracting the Gap interval from RT (PT=RT‒Gap). While it is true that Gap and PT are correlated (inversely), it is precisely because of the variance in RT that Gap alone is not an adequate (or certainly not the best) predictor of choice outcome. First, note that, if the Gap were fixed, there would still be variance in RT and in outcome, and any dependence of outcome on time would be explained necessarily by the PT. This is true at any Gap. So, clearly, the PT predicts outcome in a way that the Gap cannot. It is easy to see why: the Gap is the part of the RT interval during which no cue information is present, whereas the PT is the part of the same interval during which it is. Therefore, if one accepts the logical premise that the likelihood of a correct choice depends on the amount of time available to view the Cue before making that choice (i.e., the definition of PT), it follows that the relationship between PT and performance should be tighter than that between performance and Gap. And, indeed, this is the case. Mean accuracy declines systematically as a function of Gap, as expected, but its correlation with performance is much weaker than for PT.

      The comparison that Rev 1 requests, of how accuracy varies as a function of PT versus how it varies with Gap, has appeared in earlier publications (Stanford et al., 2010; Shankar et al., 2011; Salinas et al., 2014), and we now include it here for the current dataset by adding plots of accuracy versus Gap as a new panel in Fig. 1 (Fig. 1c). That PT (not Gap) better predicts the likelihood of success on a given trial is evident in comparing the tachometric (Fig. 1b) and psychometric curves (Fig. 1c). The tachometric curves vary from chance to asymptotic performance and do so over a short range of PT (~75 ms) with well-defined inflection points identifying key transitions in performance (e.g., from guesses to increasingly informed choices). In contrast, the psychometric function plotting average accuracy versus Gap (Fig. 1c) varies much more gradually, a reduction in temporal definition attributable to the failure to account for the RT’s contribution to determining PT for each trial at a given Gap.
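
      To make the distinction concrete, the sketch below computes PT = RT ‒ Gap per trial and then bins accuracy either by PT (a tachometric curve) or by Gap (a psychometric curve). It is an illustration only: the trial values, bin width, and function names are hypothetical and are not taken from the dataset.

```python
from collections import defaultdict


def processing_time(rt_ms, gap_ms):
    """PT is the part of the RT interval during which the cue is visible."""
    return rt_ms - gap_ms


def binned_accuracy(values, correct, bin_width):
    """Fraction of correct trials per bin, keyed by the left bin edge."""
    counts = defaultdict(lambda: [0, 0])  # edge -> [n_correct, n_trials]
    for value, is_correct in zip(values, correct):
        edge = (value // bin_width) * bin_width
        counts[edge][0] += int(is_correct)
        counts[edge][1] += 1
    return {edge: hits / n for edge, (hits, n) in sorted(counts.items())}


# Hypothetical trials: (RT in ms, Gap in ms, correct choice?).
trials = [
    (300, 250, False), (320, 250, True), (400, 250, True),
    (280, 150, True), (260, 150, True),
    (330, 300, False), (345, 300, False), (450, 300, True),
]

rts, gaps, outcomes = zip(*trials)
pts = [processing_time(rt, gap) for rt, gap in zip(rts, gaps)]

tachometric = binned_accuracy(pts, outcomes, bin_width=50)
psychometric = binned_accuracy(gaps, outcomes, bin_width=50)
print("accuracy vs PT: ", tachometric)
print("accuracy vs Gap:", psychometric)
```

      In this toy example, accuracy rises steeply across PT bins, whereas each Gap bin mixes short-PT and long-PT trials and therefore shows a more gradual trend.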

      (2) The authors provide a compelling account of how the urgent search task affords (i) more pronounced selection history effects on choice and (ii) dissociating the spatial and feature-based history effects by comparing their different effects on the tachometric curves. However, the authors didn't discuss the limits of their task design enough. It is a contrived task (one of the "laboratory tasks"), but the behavioral variability in this simple task is certainly remarkable. Yet, is there any conclusion we should avoid from this study? For instance, can we generalize the finding in more natural settings and say, the spatial selection history influences the choice under time pressure? I wonder whether the task is simple yet general enough to make such a conclusion.

      As Rev. 1 notes, the CO task is a laboratory task that produces large history effects. But importantly, we don't think urgency is causal or essential to the existence of such effects (this is now more explicitly stated in the first section of the Results); it is simply a powerful tool for revealing and characterizing them. As noted in the Discussion, our results are consistent with studies that, based on simpler, non-urgent tasks, demonstrated either reward-driven spatial biases or color priming effects. The CO task uses urgency to generate a psychometric function that time resolves perceptually informed from perceptually uninformed choices, and thereby provides the logical key to disambiguating the simultaneous contributions of perceptual and non-perceptual biases to performance. Such was essential to our demonstration that distinct biases act independently on the same saccade choices.

      In a natural setting, we would certainly expect the respective magnitudes of such non-volitional history-based biases to be highly context dependent, but it would be difficult, if not impossible, to discern their relative impact on natural behavior. That said, we think that the biases revealed by the CO task are exemplary of those that would manifest in natural behaviors depending on the real-world context to which such behaviors correspond. Here, it is important to emphasize that the spatial- and feature-based biases we observed were not strategic, on average neither helping nor hindering overall performance. Thus, in the real-world we might expect the expression of similar biases to be an important source of behavioral variance. These observations are now summarized in the penultimate paragraph of the Discussion.

      (3) Although the authors aimed to look at both inter- and intra-trial temporal dynamics, I'm not sure if the results reflect the true within-trial dynamics. I expected to learn more about how the spatial selection history bias develops as the Gap period progresses (as the authors mentioned in line 386, the spatial history bias must develop during the Gap interval). Does Figure 3 provide some hints in this within-trial temporal dynamics?

      Because it is based on the location of the saccadic choice(s) on previous trial(s), we might expect a signal of spatial bias to be present before and during the Gap period and perhaps even before a trial begins (i.e., intertrial interval). However, because behavioral bias is a probabilistic measure of saccade tendency, we have no way of knowing if such a signal is present during periods devoid of saccadic choices. Note that, for both monkey subjects, average RT exceeded the duration of the longest Gap employed (Fig. 1), and this means that relatively few saccades occurred prior to Cue onset. That said, it's clear in Figs. 2, 3, and 6 that location bias is evident for saccades initiated at the transition between Gap and Cue intervals (PT=0). Anecdotally, we can report that spatial bias is evident when we extend our analysis back further into the range of negative PTs (i.e., Gap interval), but the statistics are weak given the paucity of trials at that point. Nevertheless, this is consistent with a bias that exists from the beginning of the trial, as would be expected based on neurophysiological studies from Hikosaka's lab in a simpler but comparable spatial bias task.

      Although our data do not unequivocally identify the temporal origin of the spatial bias, they clearly show that the bias is present early (at short PTs) and diminishes rapidly as the perceptual information accrues (at long PTs). Thus, the PT-dependent temporal dynamics that are revealed clearly suggest that spatial and perceptual biases operate over different intra-trial time frames, one decreasing and the other increasing. As mentioned by Rev. 1, Fig. 3 emphasizes this dichotomy.

      (4) The monkeys show significant lapse rates (enough error trials for further analyses). Do the choices in the error trials reflect the history bias? For example, if errors are divided in terms of PTs, do the errors with short PT reflect more pronounced spatial history bias (choosing the previously selected location) compared to the errors with long PT?

      The short answer is “yes”. Errors generally show a PT-dependent influence of history bias. However, correct and error trials are the result of the same biased dynamics, and analyzing them separately post-hoc does not provide much additional insight about the history effects beyond that provided by the tachometric curves themselves.

      To see this, first consider the figure below (Author response image 1). Two tachometric curves conditioned on color history are shown (left). These are the two extreme curves plotted in Fig. 2a, which correspond to the 4S (i.e., 4 repeats of the current target color) and 4D (4 color repeats and then a switch) conditions. Each of these curves already shows the probability of making an error at each PT but, indeed, we can compare the proportions of correct and error trials at short PTs (guesses) and long PTs (informed choices). These are indicated by the bar graphs on the right. Now, the effect of a bias would be to create a difference in success rate between repetitions (4S, blue) and switches (4D, red) relative to the overall, unbiased expectation (indicated by dotted lines). For color-based history, there is no bias at short PT: the proportions of correct choices are almost exactly at the expected chance level (filled bars coincide with dotted line). In contrast, at long PTs, there is a differential effect, but it is due both to a proportion of correct trials that is higher than expected in the 4S case (filled blue bar above dotted line) and to a proportion of correct trials that is lower than expected in the 4D case (filled orange bar below dotted line). This is exactly as one would expect if the current choice was biased by target color history.

      Author response image 1.

      A similar analysis can be done for location history (Author response image 2, which shows the two extreme curves from Fig. 2e). In this case the bias is much stronger at short PTs, and the difference between repeats (4S, blue) and switches (4D, red) is largely explained by a proportion of correct choices that is much higher than expected by chance in the 4S condition (filled blue bar well above dotted line). This makes sense, because a rewarded location is likely to become the next guess, so if the target happens to appear again at that same location, the subsequent guess is more likely than chance to be correct. At longer PTs, the differential effect is smaller, as would be expected for more informed choices, but it is again driven by the 4S condition. Importantly, in the case of location the total number of S trials is much smaller than the total number of D trials (because a target-location repetition has a probability of 0.25 only), so it only makes sense to compare the proportions of correct (or error) trials, not the absolute numbers, between those conditions.

      Author response image 2.

      In summary, although it is possible to examine the separate dependencies of correct and error trials on history and PT, the distinction is not very useful. Only the frequency of errors relative to that of correct choices makes complete sense, not so much, say, the frequency of short PT errors relative to that of long PT errors.  

      Reviewer #2 (Public review):

      Summary:

      This is a clear and systematic study of trial history influences on the performance of monkeys in a target selection paradigm. The primary contribution of the paper is to add a twist in which the target information is revealed after, rather than before, the cue to make a foveating eye movement. This twist results in a kind of countermanding of an earlier "uninformed" saccade plan by a new one occurring right after the visual information is provided. As with countermanding tasks in general, time now plays a key factor in the success of this task, and it is time that allows the authors to quantitatively assess the parametric influences of things like previous target location, previous target identity, and previous correctness rate on choice performance. The results are logical and consistent with the prior literature, but the authors also highlight novelties in the interpretation of prior-trial effects that they argue are enabled by the use of their paradigm.

      Strengths:

      Careful analysis of a multitude of variables influencing behavior

      Weaknesses:

      Results appear largely confirmatory.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) The authors provide comprehensive accounts of the urgent search task in multiple places in the manuscript. But the description can be simpler and more consistent throughout. I found it confusing when the authors compared their task with previous search tasks used by Bichot and Schall, McPeek et al. I believe the authors wanted to explain that it is not just the urgency but the fact that the target color being randomly interleaved also contributes to the pronounced history bias in their task. I appreciate their thorough comparison with previous studies but it can be distracting or lose focus. It might read better if this statement can be expanded in the Discussion, not in the Results (lines 366-376).

      We thank the reviewer for pointing this out. We agree that the paragraph in question was ambiguous and appeared to elaborate a Discussion point, which was not our intent. Indeed, as the reviewer noted, the main point was that the randomization of the target colors (and not urgency) is the critical aspect of the task that makes it surprisingly difficult for the monkeys. We have revised the paragraph to emphasize this conclusion and the two empirical results from our own data that support it. The agreement with prior studies, which is somewhat tangential, is now briefly mentioned at the end of the paragraph. It should now be clear that the text mainly describes current data that are relevant to the interpretation of the main results.

      (2) It's important to state that feature-based selection history bias is not merely due to the monkey's intrinsic bias to one color over the other (red vs green). The authors did a nice job controlling that, as mentioned in Methods (lines 194-196) and supplementary figure (Figure 1 - Figure Supplement 2). It would be helpful for readers to read in Results as well.

      Thank you for the suggestion. We now mention this in the second section of the Results.

      (3) D trial examples for the location history in Results can be confusing to readers (lines 407-409; left-left-right, up-up-left). The examples in Methods (lines 224-229; left-up-right, up-down-left) are better to convey the preceding (different) trials can be of any kind.

      Indeed. Both types of example are now mentioned in the Results.

      Reviewer #2 (Recommendations for the authors):

      I have only minor comments:

      (1) In the abstract, I'm not sure what "when combined" means in the last sentence. What is combined? Selection history and stimulus salience? If so, this is not very clear. Also, it might be nice to end the abstract on how the study addresses the three components of attention that the abstract started with in the first place (salience, task, and history). Otherwise, I spent multiple abstract reads (before even reading the rest of the paper) trying to see whether indeed the paper addresses the three components of attention that were so prominently described at the beginning of the abstract or not. And, I still could not convince myself of whether all three were addressed by the study or not (I then resorted to proceeding with a reading of the rest of the paper).

      Thanks for pointing this out. We have reworded the abstract to clarify that we are focusing on selection history, not salience or top-down attention.

      (2) Line 72: isn't stimulus location still a feature????

      Our nomenclature here is intended to be consistent with the commonly applied distinction between “spatial” and “feature”-based attention that underscores the distinct mechanistic underpinnings of “where” and “what”.

      (3) Lines 76-79: I'm very confused here. The part about "guesses can be strongly biased toward an arbitrary location early on". However, I expected the later part of the sentence to still stick to location and mention what the temporal dynamic is. Instead, it discusses perceptual bias, which I presume is the color thing. So, the net result is that I'm a bit confused about how *both* location and color behave in *both* early and late times.

      We have rewritten the end of this paragraph to clarify when and how location and feature biases manifest in behavior. It may be useful to note the following. The tachometric curve describes different types of choices distinguished by their timing, guesses at short PTs vs informed decisions at long PTs. However, this also corresponds to the degree to which perceptual information becomes available over time within a single trial. Namely, perceptual information is initially absent but arrives later on. The revised text now reflects this distinction, making the logic for the expected results clearer.

      (4) Last paragraph of the introduction (lines 80-82): it would be helpful to justify here why the psychophysics were done in monkeys in this study, instead of humans.

      We now allude to the reason these studies were done in monkeys but feel that more elaboration of this point is better left to Discussion. The Discussion now more explicitly states that the current data are closely related to neurophysiological studies of spatial attention and color priming in monkeys (beginning of 4th paragraph).

      - Line 389: this kind of formulation is much clearer to me than lines 76-79 mentioned above.

      As noted, the above-mentioned section has been revised.

      - I'm a bit confused by Figure 4 in the sense that some of the effect sizes are not too different from Figure 2, even when there are some intermediate inconsistent trials. I guess the problem is aggravated by the different axis ranges in Figures 2 and 4.

      All the 1S and 1D data points are the same in both figures, as they should be, but the problem is that, otherwise, the two figures are just not comparable. Apples and oranges. To see this, note that the trends for the difference between S and D conditions should go in opposite directions as trials go further into the past, and indeed they do. In Figures 2c, f, the differences between 1S and 1D results are small, and those between 4S and 4D results are the largest because both S and D effects grow away from the average with more repetitions. In contrast, in Figure 4b-d, the differences between S and D shrink as the effect of a single trial becomes more distant (differences are largest between 1S and 1D results, smallest between 1S9x and 1D9x results). The only slightly ambiguous trend is that of Figure 2g, because the S data are noisier. We have expanded the text surrounding Figure 4 to highlight the different expected trends for this analysis in contrast to that presented in Figure 2. This should clarify the qualitative difference between the two.

      - On a related note, it is odd that the summary figures (e.g. Figures 2, 4, etc.) are vertically aligned such that the dependent measure is on the x-axis rather than the y-axis. For example, looking at Figure 2, it would make much more sense if panels b-d and f-h were rotated by 90 deg, such that the vertical axis is indeed the low asymptote or high asymptote or RT. This would directly correlate with the same data in panels a and e in the same figure and would be much easier to follow. Then, later in the paper, Fig. 8 suddenly puts the dependent measure on the y-axis, as I said. I think it can help to use similarly consistent plotting approaches across all (or most) analyses.

      We tried other formats but settled on the current one because we felt it made it (slightly) easier to compare the patterns across history conditions between any two of the 6 bar graphs in each figure (in Figs 2, 5, 6), in part because it prevents any confusion with the PT axes. As this does not make a substantial difference either way, we prefer to maintain the present arrangement. Additional labels are now included, which should make the figures a bit more friendly.

      - At the beginning of the paper, I was under the impression that this will really be a free viewing search task (e.g. Wolfe search arrays or old Nakayama search arrays), but then it became clear later that it was still an instructed task, with the only difference being that the target onset is now 4 targets. I think this distinction should be clarified very early on, in order to avoid confusion by the readers. The reason I say this is that with enforced fixation, there are other factors in this task that come into play, like the monkey's individual microsaccade rates etc, which can modulate performance since they also have a form of countermanding that is like the one imposed by the compelled saccade task. So, better alert the readers to the context of the task early on.

      Thanks. We have provided additional detail when introducing the task for the first time in the Introduction, along with a citation to an earlier publication in which the specific task is described. There should be no ambiguity now.

      Reviewing Editor Comments:

      Short Assessment:

      This important study makes compelling use of the monkey animal model to capture the long-time course over which trial history affects decision-making under time pressure, showing decisions are affected by the stimulus sequence extending back as many as four trials previously.

      Summary:

      Decision-making is variable, but how much of this variability can be accounted for by the immediate previous history is not well known. Using an "urgent" saccade, Oor et al manipulated how much time monkeys had to process evidence, and evaluated what they did when there was too little time to make an evidence-based decision. They report that the history affected performance as far back as 4 previous trials and that different aspects of the stimulus history (color and location) affected performance differently.

      Strengths:

      The key strengths of this paper are that the monkey paradigm permitted a study under highly controlled conditions with stable performance across sessions and enough trials to conduct the history analysis farther back in time than is possible with smaller data sets. While the fact that prior history affects decisions was previously known, this study provides a careful quantification of the effect, which proves to be quite large, as well as an assessment of both location and feature histories in combination with each other. The manuscript is well-written and easy to follow.

      Weaknesses and recommendations for the authors:

      (1) The figures are lovely but could use some more text/design elements to clarify, and there is space to do so. e.g., in Figure 2, there could be titles to indicate that the top row involves the color history and the bottom row involves location history. The information is there, in the y labels of panels B and F, but it takes a while to see that.

      Done. Titles have been added to Figure 2 and several others.

      (2) Furthermore, the abbreviations 1D, 4S, etc are explained in the legend but it seems there is room to spell them out or include a graphic to indicate what they mean.

      The labels 1D, 4S, etc are difficult to spell out because each one represents multiple conditions; for instance, 2S may correspond to green-green or red-red target colors, and so on. Figure legends have been edited to more clearly indicate that S and D labels correspond to repeat and switch trials, respectively, and that the associated number indicates how far back the history goes.

      (3) The terms "low asymptote" and "high asymptote" could be indicated in a graphic of a tachometric function, smoothing the transition to the rightmost panels. (Consider also alternative terms - perhaps "floor" and "ceiling" might be more readily understandable than asymptote to the student reader?)

      Thanks for the suggested terms, “floor” and “ceiling”, which we’ve adopted. They are indeed more natural. Figure 2a now indicates that floor and ceiling accuracies correspond to opposite ends of the PT axis.

      (4) The units for the asymptotes are not indicated - I assume these are "% correct" but that would be helpful to clarify.

      Yes. Units for floor and ceiling (and RT) are now indicated in all figures.

      (5) Figure 3 - "PT" and "1S-1D" could be spelled out, and the meaning of the two colored traces could be in the figure itself rather than only in the legend. Similar suggestions about labeling and abbreviations apply to subsequent figures.

      PT is now spelled out in all figures other than Figure 1, and labels for the two traces were added to Figure 3. Thanks for all the detailed suggestions.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      For the colony analysis, it is unclear from the methods and main text whether the initial individual sorted colonies were split and subject to different conditions to support the claim of bi-potency. The finding that 40% of colonies displayed tenogenic differentiation may instead suggest heterogeneity of the sorted progenitor population. The methods, as currently described, suggest that two different plates were subject to different induction conditions. It is therefore difficult to assess the strength of the claim of bi-potency.

      Thanks for your valuable comment. We are sorry for the confusing description of the colony assay. We first obtained CD29+/CD56+ myogenic progenitors by FACS. These freshly isolated cells were then randomly seeded into 96-well plates at a density of 1 cell/well. Subsequently, the single cell in each well was cultured in growth medium for ten days to form a colony. Myogenic induction was then performed in three 96-well plates and tenogenic induction in another three 96-well plates for subsequent analyses. We agree with your point that the sorted cell population could consist of heterogeneous myogenic progenitors. The results showed that over 95% of colonies successfully differentiated into myotubes, while 40% of colonies displayed tenogenic differentiation (Fig. 2g). Since the freshly obtained CD29+/CD56+ myogenic progenitors were randomly seeded for tenogenic or myogenic induction, the undifferentiated cells in each group were considered to be the same sample. Furthermore, the optimal tenogenic differentiation conditions for these cells remain to be determined. Thus, we believe that the colony analysis, combined with the data in Figure 1 and Figure 2, could indicate the bi-potency of human CD29+/CD56+ myogenic progenitors.

      This group uses the well-established CD56+/CD29+ sorting strategy to isolate muscle progenitor cells, however recent work has identified transcriptional heterogeneity within these human satellite cells (ie Barruet et al, eLife 2020). Given that they identify a tenocyte population in their human muscle biopsy in Figure 1a, it is critical to understand the heterogeneity contained within the population of human progenitors captured by the authors' FACS strategy and whether tenocytes contained within the muscle biopsy are also CD56+/CD29+.

      Thanks for your constructive suggestion. We have included more samples for scRNA-seq and reanalyzed the data. The scRNA-seq data revealed that all the CD29+/CD56+ cells were myogenic progenitors, accounting for 19.3% of all myogenic progenitors (Fig. 1e). In contrast, no tenocytes were CD29+/CD56+ (Fig. 1d), and tenocytes made up only a small percentage (0.06%) of all the mononuclear cells. Thus, human CD29+/CD56+ cells are myogenic progenitors, and the tenocytes contained within the muscle biopsy are not CD56+/CD29+. In addition, both published research and our results indicate heterogeneity among CD29+/CD56+ myogenic progenitors. Since the main purpose of the current study was to investigate the tenogenic differentiation potential of CD29+/CD56+ myogenic progenitors, this heterogeneity should be investigated in a future study.

      The bulk RNA sequencing data presented in Figure 3 to contrast the expression of progenitor cells under different differentiation conditions are not sufficiently convincing. In particular, it is unclear whether more than one sample was used for the RNAseq analyses shown in Figure 3. The volcano plots have many genes aligned on distinct curves suggesting that there are few replicates or low expression. There is also a concern that the sorted cells may contain tenocytes as tendon genes SCX, MKX, and THBS4 were among the genes upregulated in the myogenic differentiation conditions (shown in Figure 3b).

      Thanks for your comment. Each group consisted of three samples for the RNAseq analyses. We are sorry that there was a minor analysis mistake in Fig. 3b and Fig. 3c, which have been reanalyzed in the revised version. There was no significant difference in tendon-related marker genes after myogenic differentiation (Fig. 3b), while these tenogenic genes were significantly upregulated after tenogenic induction (Fig. 3c). As for contamination by tenocytes, the scRNA-seq data showed that no tenocytes were positive for both CD29 and CD56 (please see the response to Comment 2). Almost all the obtained cells highly expressed the myogenic progenitor markers PAX7/MYOD1/MYF5 (Fig. 1f-g), and only low expression levels of tendon markers were identified in these cells (Fig. 2a-c). Furthermore, although tendon genes were slightly upregulated under myogenic differentiation conditions, these markers were dramatically upregulated under tenogenic differentiation conditions (Fig. 2c). Thus, we believe that the bulk RNA sequencing data could add to the evidence for the tenogenic differentiation ability of human CD29+/CD56+ myogenic progenitors.

      Reviewer #2 (Public Review):

      scRNAseq assay using total mononuclear cell population did not provide meaningful insight that enriched knowledge on CD56+/CD29+ cell population. CD56+/CD29+ cells information may have been lost due to the minority identity of these cells in the total skeletal muscle mononuclear population, especially given the total cell number used for scRNAseq was very low and no information on participant number and repeat sample number used for this assay. Using this data to claim a stem cell lineage relationship for MuSCs and tenocytes may not be convincing, as seeing both cell types in the total muscle mononuclear population does not establish a lineage connection between them.

      Thanks for your constructive suggestion. We have included more samples for scRNA-seq and reanalyzed the data. Three samples with a total of 57,193 cells were included in the analysis. As you can see in Fig. 1d and 1e, the joint expression analysis revealed that all the CD29+/CD56+ cells were myogenic progenitors, accounting for 19.3% of all myogenic progenitors. In addition, we agree with your comment that pseudotime analysis can be somewhat misleading, given the computational nature of pseudotime plots, so we deleted this assay.

      The TGF-b pathway assay uses a small molecular inhibitor of TGF-b to probe Smad2/3. The assay conclusion regarding Smad2/3 pathway responsible for tenocyte differentiation may be overinterpretation without Smad2/3 specific inhibitors being applied in the experiments.

      Thanks for your comment. We agree and have revised this in the revised version (Figure 7, Line 306-326).

      Reviewer #3 (Public Review):

      This dual differentiation capability was not observed in mouse muscle stem cells.

      Thanks for your comment. We have explored the tenogenic differentiation potential of mouse MuSCs both in vivo and in vitro. However, only low tenogenic differentiation ability was observed (Figure 4), which might be due to species differences. It may be more demanding for humans to maintain the homeostasis of the locomotor system and whole-organism locomotion over a much longer life span and with a larger body size. Thus, the current study also indicates that animal studies may not be clinically relevant when investigating human diseases.

      Recommendations For The Authors:

      Reviewer #1 (Recommendations For The Authors):

      The methods section contained insufficient details for sample tissue for many methods, including the single cell analysis, RNA FISH, and for in vivo cardiotoxin treatment. ie. how were the samples subclustered for the monocle pseudotime analysis; how many cells were counted in the FISH shown in Fig 1e/f, does the n=5 refer to tissue sections or biological replicates?; for the double injury, what was the cardiotoxin dose?

      Thanks for your comment. Three samples and a total of 57,193 cells were analyzed in the single-cell analysis (Line 464). We deleted the RNA FISH data because they provided limited information on the bipotential ability of human CD29+/CD56+ myogenic progenitors. In addition, since pseudotime analysis can be somewhat misleading, given the computational nature of pseudotime plots, we also deleted this assay. For the double injury, 15 μl of 10 μM cardiotoxin was used for lineage tracing (Line 533).

      Additionally, the RNA sequencing datasets are not currently publicly available under the accession numbers provided.

      The raw RNA sequencing data have been uploaded to NCBI (accession numbers: PRJNA1178160, PRJNA1012476 and PRJNA1012828) and will be released immediately after publication.

      The poor resolution of 1d makes it impossible to read any of the gene names or interpret the expression profiles of their proposed trajectories.

      Since pseudotime analysis can be somewhat misleading, given the computational nature of pseudotime plots, we deleted this assay.

      What does the color key for 3a refer to? It is not indicated in the figure or legend.

      Thanks for your comment. The color key for 3a refers to “Scaled expression values”, which has been added in the revised version.

      scRNAseq of the sorted CD29/56+ population could help uncover possible cell heterogeneity within these muscle progenitors and which sub-populations of myogenic progenitor cells have tenogenic potential.

      Thanks for your valuable suggestion. We included more cells from three biological replicates for scRNA-seq and found that CD29+/CD56+ cells were exclusively myogenic progenitors (Fig. 1d and 1e). We agree that additional scRNAseq would help clarify the possible cell heterogeneity within these muscle progenitors. Since the main scope of the current study is to investigate the bipotential of CD29+/CD56+ myogenic progenitors, scRNAseq of the sorted CD29+/CD56+ population will be performed in a future study.

      Typos: Line 459 sored cells... preparasion with Chromium Single Cell 3' Reagent Kits (10X genomics, cat# 1000121-1000157). Figure 4E - typo in the word tamoxifen.

      Thanks for your valuable suggestion. We are sorry for the typos and have corrected them (Line 459 and Fig. 4e).

      Reviewer #2 (Recommendations For The Authors):

      (1) scRNAseq is performed in total mononuclear cells isolated from human skeletal muscle. The cell number (around 15000 cells) seems very low for this assay, given the CD56+/CD29+ cells are a minority population in this sequencing, the data does not seem to provide meaningful insight into the MuSC cell identities. No information on sample numbers and number of patient participants can be found in the paper.

      Thanks for your comment. We added more cells and reanalyzed the data in the revised manuscript. Three samples with a total of 57,193 cells were analyzed (Line 464). The joint expression analysis revealed that all the CD29+/CD56+ cells were myogenic progenitors, accounting for 19.3% of all myogenic progenitors (Fig. 1d and 1e). These scRNA-seq data, combined with functional experiments, confirmed the MuSC identity of CD29+/CD56+ cells among the mononuclear cells.

      In this regard, the paragraph that starts with "To confirm the single cell analysis results, we first isolated myogenic progenitor cells from human muscle biopsy using FACS as described previously" is misleading, as the scRNAseq is not the result of the sorted cells. Please reword this paragraph to clarify.

      The related paragraph has been reworded (Line 84-95).

      Similarly, the existence of myocytes and tenocytes in scRNAseq does not necessarily prove a stem cell and mature cell lineage relationship. Please edit the wording to avoid overinterpretation.

      Thanks for the reminder. Since pseudotime analysis can be somewhat misleading, given the computational nature of pseudotime plots, we deleted this assay.

      (2) The in vitro differentiation assays are well performed, which included bulk culture and clonal culture. The efficiencies of those two assays seem to have discrepancies which may need clarification. Again, no sample numbers and repeats have been reported.

      Since the tendon differentiation period for bulk culture was 12 days, myotubes formed by fusion of CD29+/CD56+ myogenic progenitors with only myogenic differentiation potential would no longer be alive at the end of this period. Thus, the efficiency in bulk culture appeared higher than that in clonal culture. As stated in the statistical analysis section, at least three biological replicates and technical repeats were performed in each experimental group (Line 577).

      In these paragraphs, terminologies including MuSCs, myogenic progenitors, CD56+/CD29+, and Pax7+ are interchangeably used, which generates confusion while reading. It is probably best to consistently use the cell sorting markers to address this cell population, throughout the paper.

      Thanks for your constructive suggestion. The cell population is now consistently referred to as CD29+/CD56+ myogenic progenitors throughout the paper.

      Information on the proliferation rate and expansion of the MuSCs would be useful but not provided.

      Thanks for your comment. An analysis of cell proliferation has been added to Figure 1 (Fig. 1h).

      The murine cell differentiation assays are not as convincing as the human study. The assay regarding "mouse muscle CD29+/CD56+ cells were isolated for tenogenic induction. However, very few mouse muscle CD29+/CD56+ cells expressed myogenic progenitor cell marker Pax7, MyoD1 and Vcam1" does not add any value to the work as those markers are not mouse MuSC markers to start with.

      Thanks for your comment. The experiments concerning mouse muscle CD29+/CD56+ cells have been deleted to avoid confusion.

      The Pax7-cre-TdTomato assay was also not convincing, as a negative finding may not be the best proof of absence.

      Thanks for your comment. Pax7-positive cells consistently express TdTomato for lineage tracing. In the current study, a large number of TdTomato+ myofibers were observed after muscle injury (SFig. 2c-d), suggesting that the tracing system works well. However, fewer than 0.2% of tendon cells were derived from TdTomato+ MuSCs, even four months after tendon removal (Fig. 4f-g). Comparing the in vivo data from murine MuSCs and human CD29+/CD56+ myogenic progenitors, we believe these data could indicate the poor tendon differentiation ability of murine MuSCs.

      (5) TGFb as a pathway of smad2/3 mediated tenocyte differentiation assays were well done albeit not novel. Using TGFb universal inhibitor may not accurately state the pathways were due to SMAD2/3 inhibition either.

      We agree with your comment and the conclusion concerning SMAD2/3 has been deleted throughout the manuscript.

      The paper also needs thorough proofreading. Currently, typographic, grammatical, and logical sequences of writing do not lend the paper to easy reading.

      (1) Figure 1K and 1I have similar legends but presumably K is referring to MuSC and I is referring to differentiated cells.

      (2) Tenogenic and myogenic induction should be changed to tenogenic/myogenic differentiation as they are the cells at the end of differentiation.

      (3) Figure 6, it is not clear how the "human cells" are calculated in this assay.

      Thanks for your constructive comment. (1) The figure legends in Figure 1 have been revised (Line 797-804). (2) Tenogenic and myogenic induction have been changed to tenogenic/myogenic differentiation in the manuscript where they refer to cells at the end of differentiation (Fig. 1, Fig. 2, Fig. 3, Fig. 4, Fig. 7 and SFig. 1). (3) In Figure 6, “human cells” refers to injured tendons transplanted with human CD29+/CD56+ myogenic progenitors. To evaluate the function of human CD29+/CD56+ myogenic progenitors, the PBS group was set as the negative control and the uninjured group as the normal control.

      Reviewer #3 (Recommendations For The Authors):

      (1) The full extent of the differentiation potential of CD29+/CD56+ stem/progenitor cells has not been thoroughly evaluated. There can also exist heterotopic ossification in injured tendon sites. Thus, it remains unclear whether these cells are truly bipotent as the authors claim, or can they differentiate into chondrocytes and osteoblasts.

      Thanks for your comment. The current study focused on the tenogenic differentiation potential of CD29+/CD56+ myogenic progenitors, so the research priority was their bipotential ability. We agree that the chondrogenic and osteogenic abilities of CD29+/CD56+ myogenic progenitors are also important, and we will investigate them in a future study.

      (2) In Figure 3, the GO analysis also shows increased enrichment of muscle-related terms including muscle contraction and filament. Please clarify it.

      The tenogenic differentiation efficiency of CD29+/CD56+ myogenic progenitors was about 40% in the clonal assay, so some cells would differentiate myogenically under this tenogenic induction system. Thus, the GO analysis could also show enrichment of muscle-related terms, including muscle contraction and filament.

      (3) The authors use TNC staining to evaluate cell transplantation. My concern is whether the TNC expression is specific to the tendon site, or do engrafted human cells also express TNC in other sites such as muscle?

      TNC is a well-known tendon-related marker. As you can see in Figure 6b and Figure 6c, although some human cells (labeled by Lamin A/C) were engrafted in the muscle tissue area (labeled by MyHC), these engrafted human cells did not express TNC in muscle. In addition, we also used the tendon-related markers SCX and TNMD to confirm the tenogenic differentiation ability of engrafted human cells in vivo (SFig. 3a and 3b).

      (4) The authors demonstrate that CD29+/CD56+ human stem/progenitor cells could efficiently transplant and contribute to myofiber regeneration in vivo. However, why were only a few transplanted human cells differentiating into myofibers (labeled by MyHC) in the tendon injury model even with CTX injection?

      Thanks for your comment. Since skeletal muscle is able to regenerate from in situ muscle progenitor cells, regeneration of muscle injured by CTX injection depended not only on the CD29+/CD56+ myogenic progenitors but also on native murine MuSCs. Thus, it is reasonable that only a few transplanted human cells differentiated into myofibers (labeled by MyHC) in the tendon injury model, even with CTX injection.

      (5) Figure 7 shows the crucial role of TGFB/SMAD signaling for the tenogenesis of human CD29+/CD56+ stem/progenitor cells. However, can TGFB/SMAD signaling activation facilitate the tenogenic differentiation of mouse MuSCs? This point is crucial to clarify the difference of MuSCs between different species.

      Thanks for your valuable suggestion. We performed a series of pilot assays to investigate whether activation of TGFβ signaling facilitates tenogenic differentiation of mouse MuSCs (Author response image 1). As you can see, activating TGFβ signaling with SRI-011381 slightly increased the expression of tenogenic markers in murine MuSCs. This is an interesting topic, and we will investigate it in a future study.

      Author response image 1.

      Activation of the TGFβ signaling pathway slightly elevated the tenogenic differentiation ability of murine MuSCs. (a) Immunofluorescence staining of the tendon markers Scx and Tnc in murine MuSCs induced for tenogenic differentiation with or without the TGFβ signaling pathway agonist SRI-011381. Scale bars, 50 µm. (b) Quantification of Scx and Tnc fluorescence intensity in murine MuSCs that had undergone tenogenic induction with or without the TGFβ signaling pathway agonist SRI-011381. Error bars indicate standard deviation (n=5). (c) Protein levels of Tnc and Scx. Murine MuSCs were induced towards tenogenic differentiation with or without the TGFβ signaling pathway agonist SRI-011381. Total protein was extracted from cells before and after differentiation and subjected to Tnc and Scx immunoblotting. GAPDH served as the loading control.

      (6) Please quantify the WB blot data throughout the manuscript.

      Thanks for your comment. The western blot data have been quantified throughout the manuscript.

      (7) The RT-qPCR data should indicate what the fold changes are relative to throughout the manuscript.

      Thanks for your comment. The sentence “GAPDH was served as reference gene” was added in the figure legends to illustrate RT-qPCR results.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      This study provides a thorough analysis of Nup107's role in Drosophila metamorphosis, demonstrating that its depletion leads to developmental arrest at the third larval instar stage due to disruptions in ecdysone biosynthesis and EcR signaling. Importantly, the authors establish a novel connection between Nup107 and Torso receptor expression, linking it to the hormonal cascade regulating pupariation.

      However, some contradictory results weaken the conclusions of the study. The authors claim that Nup107 is involved in the translocation of EcR from the cytoplasm to the nucleus. However, the evidence provided in the paper suggests it more likely regulates EcR expression positively, as EcR is undetectable in Nup107-depleted animals, even below background levels.

      We appreciate the concern raised in this public review. However, we must clarify that we do not claim that Nup107 directly regulates the translocation of EcR from the cytoplasm to the nucleus. Rather, Nup107 regulates the synthesis of the ecdysone hormone (20E), which in turn affects EcR translocation. In the manuscript, we posed this as a hypothesis, asking whether Nup107 regulates EcR nuclear translocation (9th line of 2nd paragraph on page 6). We have spelled this out more clearly in the title of the 3rd subsection of the Results section and in the discussion (8th line of 2nd paragraph on page 11).

      20E acts through EcR to induce the transcription of EcR-responsive genes, including EcR itself. This creates a positive autoregulatory loop that enhances EcR levels through ecdysone signaling (1). Since Nup107 depletion reduces ecdysone levels, it disrupts this transcriptional autoregulatory loop of EcR expression. This can contribute to the reduced EcR levels seen in Nup107-depleted animals.

      Additionally, the link between Nup107 and Torso is not fully substantiated. While overexpression of Torso appears to rescue the lack of 20E production in the prothoracic gland, the distinct phenotypes of Torso and Nup107 depletion (developmental delay in the former versus complete larval arrest in the latter) complicate understanding of Nup107's precise role.

      We understand that the developmental delay differs between Torso depletion and Nup107 depletion. However, the two molecules being compared here are very different, and variability in their depletion could contribute to the observed phenotypic differences (2). Even if Torso and Nup107 were depleted to the same extent, we believe that Nup107, being more widely expressed and involved in the regulation of various cellular processes, would induce stronger defects.

      Further, we think that RNAi-mediated depletion of Nup107 in prothoracic glands (PG) causes significant reduction in the PG size, which may exert a pronounced defect in 20E biosynthesis through the Halloween genes, inducing a stronger developmental arrest.

      To clarify these discrepancies, further investigation into whether Nup107 interacts with other critical signaling pathways related to the regulation of ecdysone biosynthesis, such as EGFR or TGF-β, would be beneficial and could strengthen the findings.

      In summary, although the study presents some intriguing observations, several conclusions are not well-supported by the experimental data.

      We agree with the reviewer’s suggestion. As noted in the literature, five RTKs (torso, InR, EGFR, Alk, and Pvr) stimulate the PI3K/Akt pathway, which plays a crucial role in PG function and in controlling pupariation and body size (3). We have examined torso and EGFR signaling. Torso overexpression rescued the Nup107 defects, whereas constitutively active EGFR (BL-59843) did not rescue the phenotype (data not shown). Nonetheless, we plan to examine EGFR pathway activation by measuring pERK levels in Nup107-depleted PGs.

      Reviewer #2 (Public review):

      Summary:

      The manuscript by Kawadkar et al. investigates the role of Nup107 in developmental progression via the regulation of ecdysone signaling. The authors identify an interesting phenotype of Nup107 whole-body RNAi depletion in Drosophila development - developmental arrest at the late larval stage. Nup107-depleted larvae exhibit mis-localization of the Ecdysone receptor (EcR) from the nucleus to the cytoplasm and reduced expression of EcR target genes in salivary glands, indicative of compromised ecdysone signaling. This mis-localization of EcR in salivary glands was phenocopied when Nup107 was depleted only in the prothoracic gland (PG), suggesting that it is not nuclear transport of EcR but the presence of ecdysone (normally secreted from PG) that is affected. Consistently, whole-body levels of ecdysone were shown to be reduced in Nup107 KD, particularly at the late third instar stage when a spike in ecdysone normally occurs. Importantly, the authors could rescue the developmental arrest and EcR mislocalization phenotypes of Nup107 KD by adding exogenous ecdysone, supporting the notion that Nup107 depletion disrupts biosynthesis of ecdysone, which arrests normal development. Additionally, they found that rescue of the Nup107 KD phenotype can also be achieved by over-expression of the receptor tyrosine kinase torso, which is thought to be the upstream regulator of ecdysone synthesis in the PG. Transcript levels of the torso are also shown to be downregulated in the Nup107 KD, as are transcript levels of multiple ecdysone biosynthesis genes. Together, these experiments reveal a new role of Nup107 or nuclear pore levels in hormone-driven developmental progression, likely via regulation of levels of torso and torso-stimulated ecdysone biosynthesis.

      Strengths:

      The developmental phenotypes of an NPC component presented in the manuscript are striking and novel, and the data appears to be of high quality. The rescue experiments are particularly significant, providing strong evidence that Nup107 functions upstream of torso and ecdysone levels in the regulation of developmental timing and progression.

      Weaknesses:

      The underlying mechanism is however not clear, and any insight into how Nup107 may regulate these pathways would greatly strengthen the manuscript. Some suggestions to address this are detailed below.

      Major questions:

      (1) Determining how specific this phenotype is to Nup107 vs. to reduced NPC levels overall would give some mechanistic insight. Does knocking down other components of the Nup107 subcomplex (the Y-complex) lead to similar phenotypes? Given the published gene regulatory function of Nup107, do other gene regulatory Nups such as Nup98 or Nup153 produce these phenotypes?

      We thank the reviewer for raising this concern. When working with a Nup complex such as the Nup107 complex, this concern is anticipated but difficult to address, as many Nups function beyond their complex identity. Our observations with all other members of the Nup107 complex, including dELYS, suggest that, apart from Nup107, none of the tested Nup107-complex members induces larval developmental arrest.

      In this study, we primarily focused on the Nup107 complex (outer ring complex) of the NPC. However, previous studies conducted in Drosophila S2 cells have reported that Nup98 and Nup153 interact with chromatin (4, 5, 6). We have therefore now examined a nucleoporin outside of this complex, Nup153.

      We ubiquitously depleted Nup153 using the Actin5C-Gal4 driver and compared the pupariation profile of the knockdown larvae with that of control larvae. In contrast to the Nup107 knockdown, depletion of Nup153 to less than 50% of control levels had no impact on pupariation (Author response image 1).

      Author response image 1.

      Nup153 depletion does not affect Drosophila metamorphosis. Actin5C-Gal4 was used as a ubiquitous driver. (A) Comparison of pupariation profiles of control and Nup153 knockdown organisms. (B) Quantification of Nup153 knockdown efficiency. Data are from at least three independent experiments. Statistical significance was derived from Student’s t-test. Error bars represent SEM. ***p < 0.001.

      (2) In a related issue, does this level of Nup107 KD produce lower NPC levels? It is expected to, but actual quantification of nuclear pores in Nup107-depleted tissues should be added. These and the above experiments would help address a key mechanistic question - is this phenotype the result of lower numbers of nuclear pores or specifically of Nup107?

      We agree with the concern raised here. To address it, we stained control and Nup107-depleted salivary glands with the mAb414 antibody (which exclusively recognizes FG-repeat Nups). While Nup107 intensities are significantly reduced at the nuclear envelope in Nup107-depleted salivary glands, mAb414 staining appears unperturbed (Author response image 2).

      Author response image 2.

      Nup107 depletion does not perturb overall NPC composition. Comparison of control and Nup107 knockdown salivary gland nuclei. Nup107 is shown in green, and mAb414, which stains other FG-repeat-containing nucleoporins, is shown in red. Scale bars, 5 µm.

      (3) Additional experiments on how Nup107 regulates the torso would provide further insight. Does Nup107 regulate transcription of the torso or perhaps its mRNA export? Looking at nascent levels of the torso transcript and the localization of its mRNA can help answer this question. Or alternatively, does Nup107 physically bind the torso?

      While the concern regarding torso transcript level is genuine, we have already reported in the manuscript that Nup107 directly regulates torso expression. When Nup107 is depleted, torso levels go down, which in turn controls ecdysone production and subsequent EcR signaling (Figure 6B of the manuscript).

      However, the exact nature of Nup107 regulation of torso expression is still unclear. Since Nup107 is known to interact with chromatin (7), it may affect torso transcription. A stable and physiologically relevant interaction between Nup107 and torso in a cellular context is unlikely, largely because of their distinct subcellular localizations. Investigating this further would require considerable time to obtain reagents and perform experiments, and it is currently beyond the scope of this manuscript.

      (4) The depletion level of Nup107 RNAi specifically in the salivary gland vs. the prothoracic gland should be compared by RT-qPCR or western blotting.

      Although we know that the Nup107 protein signal is reduced in SG upon knockdown (Figure 3B), we have not compared the Nup107 transcript level in these two tissues (SG and PG) upon RNAi. As suggested here, we evaluated the knockdown efficiency of Nup107 using the salivary gland-specific driver AB1-Gal4 and the prothoracic gland-specific driver Phm-Gal4. Our results indicate a significant reduction in Nup107 transcript levels upon Nup107 RNAi in both SG and PG compared to their respective controls (Author response image 3).

      Author response image 3.

      Nup107 levels are significantly reduced upon Nup107<sup>KK</sup> RNAi. Quantification of Nup107 transcript levels from control and Nup107-depleted larvae [tissue-specific depletion using AB1-Gal4 (A) and Phm-Gal4 (B)]. Data are from at least three independent experiments. Statistical significance was derived from Student’s t-test. Error bars represent SEM. **p < 0.004.

      (5) The UAS-torso rescue experiment should also include the control of an additional UAS construct - so Nup107; UAS-control vs Nup107; UAS-torso should be compared in the context of rescue to make sure the Gal4 driver is functioning at similar levels in the rescue experiment.

      This is a very valid point, and we took it into account while planning the experiment. In such cases, GAL4 dilution can often be critical. We have demonstrated in Figure S7 that GAL4 dilution does not confound our observations. We used Nup107<sup>KK</sup>; UAS-GFP as a control alongside Nup107<sup>KK</sup>; UAS-torso. We conclude that the presence of GFP signal in the prothoracic glands, together with their reduced size, indicates that genes downstream of both UAS sequences are transcribed and that GAL4 dilution does not play a role here.

      Minor:

      (6) Figures and figure legends can stand to be more explicit and detailed, respectively.

      We have revisited all figures and their corresponding legends to ensure appropriate and explicit details are provided.

      Reviewer #3 (Public review):

      Summary:

      In this study by Kawadkar et al., the authors investigate the developmental role of Nup107, a nucleoporin, in regulating the larval-to-pupal transition in Drosophila through RNAi knockdown and CRISPR-Cas9-mediated gene editing. They demonstrate that Nup107, an essential component of the nuclear pore complex (NPC), is crucial for regulating ecdysone signaling during developmental transitions. The authors show that the depletion of Nup107 disrupts these processes, offering valuable insights into its role in development.

      Specifically, they find that:

      (1) Nup107 depletion impairs pupariation during the larval-to-pupal transition.

      (2) RNAi knockdown of Nup107 results in defects in EcR nuclear translocation, a key regulator of ecdysone signaling.

      (3) Exogenous 20-hydroxyecdysone (20E) rescues pupariation blocks, but rescued pupae fail to close.

      (4) Nup107 RNAi-induced defects can be rescued by activation of the MAP kinase pathway.

      Strengths:

      The manuscript provides strong evidence that Nup107, a component of the nuclear pore complex (NPC), plays a crucial role in regulating the larval-to-pupal transition in Drosophila, particularly in ecdysone signaling.

      The authors employ a combination of RNAi knockdown, CRISPR-Cas9 gene editing, and rescue experiments, offering a comprehensive approach to studying Nup107's developmental function.

      The study effectively connects Nup107 to ecdysone signaling, a key regulator of developmental transitions, offering novel insights into the molecular mechanisms controlling metamorphosis.

      The use of exogenous 20-hydroxyecdysone (20E) and activation of the MAP kinase pathway provides a strong mechanistic perspective, suggesting that Nup107 may influence EcR signaling and ecdysone biosynthesis.

      Weaknesses:

      The authors do not sufficiently address the potential off-target effects of RNAi, which could impact the validity of their findings. Alternative approaches, such as heterozygous or clonal studies, could help confirm the specificity of the observed phenotypes.

      This is a very valid point, and we are aware of the consequences of off-target effects of RNAi. To confirm that the observed effects are genuine RNAi effects and to minimize off-target effects, we used two RNAi lines (Nup107<sup>GD</sup> and Nup107<sup>KK</sup>) against Nup107. Both RNAi lines induced comparable levels of Nup107 reduction, and with these lines, ubiquitous and PG-specific knockdown produced similar phenotypes. Although the Nup107<sup>GD</sup> line exhibited a relatively stronger knockdown than the Nup107<sup>KK</sup> line, we preferentially used the Nup107<sup>KK</sup> line because the Nup107<sup>GD</sup> line is based on a P-element insertion whose exact landing site is unknown. Furthermore, an off-target is predicted for the Nup107<sup>GD</sup> line, in which a 19 bp sequence aligns with the bifocal (bif) sequence. The bif-encoded protein is involved in axon guidance and regulation of axon extension. In contrast, the Nup107<sup>KK</sup> line has no predicted off-target, and we know its precise landing site on the second chromosome. Thus, the Nup107<sup>KK</sup> line was ultimately used in experiments for its clearer and more reliable genetic background.

      We are also investigating Nup107 knockdown in the prothoracic gland, which exhibits polyteny. Additionally, the number of cells in the prothoracic gland is quite limited, approximately 50-60 cells (8). Given this, there is a possibility that a clonal study may not yield the phenotype.

      NPC Complex Specificity: While the authors focus on Nup107, it remains unclear whether the observed defects are specific to this nucleoporin or if other NPC components also contribute to similar defects. Demonstrating similar results with other NPC components would strengthen their claims.

      We thank the reviewer for raising this concern. When working with a Nup complex such as the Nup107 complex, this concern is anticipated but difficult to address, as many Nups function beyond their complex identity. Our observations with all other members of the Nup107 complex, including dELYS, suggest that, apart from Nup107, none of the other Nup107-complex members induces larval developmental arrest. Since the study is primarily focused on the Nup107 complex (outer ring complex) of the NPC, we have not examined many nucleoporins outside of this complex. However, knockdown of Nup153, a nuclear basket nucleoporin, gives results comparable to the control, with no delay in development (Author response image 1).

      Although the authors show that Nup107 depletion disrupts EcR signaling, the precise molecular mechanism by which Nup107 influences this process is not fully explored. Further investigation into how Nup107 regulates EcR nuclear translocation or ecdysone biosynthesis would improve the clarity of the findings.

      We appreciate the concern raised. Based on our observations, we have proposed that Nup107 acts upstream of the PTTH-torso-20E-EcR axis that regulates developmental transitions. We know that Nup107 regulates torso levels, but we do not know whether Nup107 directly interacts with torso. We would also like to address whether Nup107 exerts control over PTTH levels.

      However, we must emphasize that Nup107 does not directly regulate the translocation of EcR. On the contrary, we have demonstrated that when Nup107 is depleted only in the salivary gland, EcR translocates into the nucleus. Thus, we conclude that EcR translocation is 20E dependent and Nup107 independent. Further, we have argued that Nup107 regulates the expression of the Halloween genes required for ecdysone biosynthesis. We are interested in identifying whether Nup107 associates with chromatin, directly or through another protein, to bring about the changes in gene expression required for normal development.

      There are some typographical errors and overly strong phrases, such as "unequivocally demonstrate," which could be softened. Additionally, the presentation of redundant data in different tissues could be streamlined to enhance clarity and flow.

      Response: We thank the reviewer for this observation. We have done our best to remove all typographical errors and have toned down our statements so that they reflect our conclusions.

      Recommendations for the Authors:

      Reviewer #1 (Recommendations for the authors):

      The manuscript presents compelling evidence that Nup107 plays a role in regulating ecdysone production. However, significant concerns remain regarding the effects on EcR localization and expression, as well as the claimed link between PTTH/Torso signaling and Nup107's function, as the evidence provided is not conclusive.

      The hypothesis that Nup107 mediates EcR translocation from the cytoplasm to the nucleus appears misinterpreted by the authors. Based on the presented images, particularly for the prothoracic gland (PG) Figure 3C, Nup107 depletion seems to impact EcR protein levels rather than its localization. This conclusion is supported by data showing that EcR transcripts are autonomously downregulated in the absence of Nup107. Furthermore, the restoration of nuclear EcR levels upon exogenous 20E supplementation suggests that (1) Nup107 is dispensable for EcR activation and function, and (2) its primary role lies in regulating ecdysone production.

      We appreciate the concern raised by the reviewer. However, we must clarify that we do not claim that Nup107 directly regulates the translocation of EcR from the cytoplasm. Rather, Nup107 regulates the synthesis of the ecdysone hormone (20E), which in turn affects EcR translocation. In the manuscript, we posed this as a hypothesis, asking whether Nup107 regulates EcR nuclear translocation (9th line of 2nd paragraph on page 6). We have spelled this out more clearly in the title of the 3rd subsection of the Results section and in the discussion (8th line of 2nd paragraph on page 11).

      20E acts through EcR to induce the transcription of EcR-responsive genes, including EcR itself. This creates a positive autoregulatory loop that enhances EcR levels through ecdysone signaling (1). Since Nup107 depletion reduces ecdysone levels, it disrupts this transcriptional autoregulatory loop of EcR expression. This can contribute to the reduced EcR levels seen in Nup107-depleted animals.

      Given that nucleoporins are known to influence mRNA transport (for instance, Nup107 has been shown to control Scn5a mRNA transport; Guan et al., 2019), the observed effects on Halloween gene and EcR expression may stem from disruptions in mRNA transport to the cytoplasm. The downregulation of Shade further supports this hypothesis, as restricted ecdysone biosynthesis typically induces Shade upregulation in peripheral tissues. Quantifying potential mRNA accumulation in the nuclei of PG cells in Nup107-depleted animals would clarify this.

      The reviewer raises a valid point, and we acknowledge that Nup107 has been shown to control Scn5a mRNA transport (Guan et al., 2019). The observed effects on Halloween gene and EcR expression could indeed stem from disruptions in efficient mRNA export to the cytoplasm. However, if Nup107 were regulating the mRNA export of the Halloween genes and EcR, we would not expect torso overexpression to rescue the developmental delay phenotype of Nup107 depletion. Instead, by overexpressing torso in the Nup107 depletion background, we activate torso pathway-dependent Halloween gene expression and rescue the developmental delay phenotype of Nup107 depletion.

      With the current data, it is difficult to conclusively claim a role for Nup107 in EcR translocation or expression. Additional experiments, such as EcR overexpression in Nup107-depleted animals or Nup107 overexpression, would help determine its precise role.

      We appreciate the concern raised by the reviewer. We did attempt to rescue the Nup107 depletion phenotype by overexpressing EcR (BL-6868) in the Nup107-RNAi background. However, this approach did not rescue the developmental delay phenotype caused by Nup107 depletion. This further suggests that the phenotype is not merely due to low levels of EcR, but rather to the low availability of the ecdysone hormone and the resulting reduction in EcR signaling.

      The second major issue is the proposed link between Nup107 and PTTH/Torso signaling. The authors suggest that Nup107 regulates ecdysone production through Torso expression based on rescue experiments. However, this is inconsistent with the distinct phenotypes observed when Nup107 or Torso signaling is disrupted. While PTTH/Torso signaling causes only a modest developmental delay (12 hours to 2 days, depending on the mutant), Nup107 depletion results in a complete developmental arrest at the larval stage. This discrepancy raises doubts about the assertion that Torso overexpression alone rescues such a severe phenotype. One possibility is that PTTH levels are upregulated in Nup107-depleted animals, leading to overactivation of the pathway when Torso is overexpressed. Quantifying PTTH levels in Nup107-depleted animals could address this.

      The reviewer raises a valid point, and we fully acknowledge this concern. While we do not completely agree with the idea that PTTH is upregulated in Nup107-depleted larvae, as suggested here, we believe that quantifying PTTH levels upon Nup107 depletion can provide useful insight. To address this, we quantified PTTH levels in Nup107-depleted larvae and found no significant change in PTTH expression compared to controls (Author response image 4).

      Author response image 4.

      Nup107 knockdown does not affect the PTTH level. Quantitation of PTTH transcript levels from control and Nup107-depleted larvae (prothoracic gland-specific depletion using Phm-Gal4). Data are from at least three independent experiments. Statistical significance was derived from Student's t-test. ns, non-significant.

      Another possibility is that the stock used for Torso overexpression, which includes a trk mutant, may introduce genetic interactions that overactivate the pathway. Using a clean UAS-Torso stock would resolve this issue.

      We appreciate the reviewer’s observation regarding the use of the Torso overexpression line (BL-92604), which carries the trk null allele on the second chromosome. The cleaved form of trk serves as a ligand for the torso receptor. Since trk serves as a ligand for torso, it is unclear to us how a line bearing the trk null allele, when used for torso overexpression studies, would overactivate the pathway.

      We were aware of this concern, and the fly line used in this study and reported in the manuscript was generated from the BL-92604 line through the following genetic strategy. First, a double balancer stock (Sco/CyO; MKRS/TM6.Tb) was used to generate the Sco/CyO; UAS-torso/UAS-torso genotype. This line was subsequently combined with the Nup107<sup>KK</sup> line. Through the double balancer strategy, we effectively replaced the second chromosome with the one carrying the Nup107 RNAi transgene, thereby ensuring that our final experimental setup is free from trk mutant contamination, if any.

      Moreover, the rescue of Nup107 depletion phenotypes by RasV12 overexpression suggests that multiple RTKs, not just Torso, are affected. EGFR signaling, the primary regulator of ecdysone biosynthesis in the PG during the last larval stage, is notably absent from the authors' analysis. EGFR inactivation is known to arrest development, and previous studies indicate that Nup107 can reduce EGFR pathway activity (Kim et al, 2010). The authors should analyze EGFR pathway activity in the absence of Nup107. Overexpressing EGF ligands like Vein or Spitz in the PG (rather than the receptor) in a Nup107-depleted background would provide more relevant insights.

      RasGTPase is one of the common effector molecules downstream of an activated receptor kinase. Rescue with a constitutively activated form of RasGTPase (RasV12) points to one of the routes activated downstream of the torso receptor. It does not directly suggest that all the different RTKs are affected or involved. Our purpose in performing this rescue experiment was to see whether the pathway activated downstream of torso involves RasGTPase.

      As noted in the literature, five RTKs (torso, InR, EGFR, Alk, and Pvr) stimulate the PI3K/Akt pathway, which plays a crucial role in the PG in controlling pupariation and body size (3). Although EGFR signaling is important, PTTH/Torso signaling is considered the primary mediator of metamorphic timing. In response to the suggestion to analyze EGFR pathway activity in the absence of Nup107, we attempted to rescue the phenotype by overexpressing constitutively active EGFR (BL-59843) in the Nup107-depleted background (data not shown). We used constitutively active EGFR to bypass the requirement for its ligands (vein and spitz). Unfortunately, this approach did not rescue the phenotype, which further suggests that EGFR is not the targeted RTK pathway in this context. By rescuing with torso, we found that Nup107 regulates torso-mediated Ras/Erk signaling to control metamorphosis.

      Additional issues require clarification:

      (1) RNAi Efficiency: In Figure 1C, the Nup107GD line shows a stronger knockdown effect than Nup107KK, yet most experiments were conducted with the weaker line. This might explain the residual Nup107 protein observed in Figure 2. Could the authors justify this choice?

      This is a very valid point, and we are aware of the consequences of off-target effects of RNAi. To confirm that the observed effects are genuine RNAi effects and to minimize off-target effects, we used two RNAi lines (Nup107<sup>GD</sup> and Nup107<sup>KK</sup>) against Nup107. Both RNAi lines induced comparable levels of Nup107 reduction, and with these lines, ubiquitous and PG-specific knockdown produced similar phenotypes. Although the Nup107<sup>GD</sup> line exhibited a relatively stronger knockdown than the Nup107<sup>KK</sup> line, we preferentially used the Nup107<sup>KK</sup> line because the Nup107<sup>GD</sup> line is based on a P-element insertion whose exact landing site is unknown. Furthermore, an off-target is predicted for the Nup107<sup>GD</sup> line, in which a 19 bp sequence aligns with the bifocal (bif) sequence. The bif-encoded protein is involved in axon guidance and regulation of axon extension. In contrast, the Nup107<sup>KK</sup> line has no predicted off-target, and we know its precise landing site on the second chromosome. Thus, the Nup107<sup>KK</sup> line was ultimately used in experiments for its clearer and more reliable genetic background.

      (2) Control Comparisons: In Figure 3, the effects of Nup107 depletion on EcR expression in salivary glands (SG) and PG are shown, but only SG controls are provided. Including PG controls would enable proper comparisons. These controls should also be added to Figures 5, 6, and S5.

      As suggested by the reviewer, we have also checked EcR localization in the prothoracic gland (Author response image 5). As shown in Author response image 5, when PGs isolated from control, Nup107-RNAi, and torso-overexpressing Nup107-RNAi larvae were stained for EcR, the observations were indistinguishable from those made in SGs of the same genetic combinations. This indicates that Nup107 regulates EcR signaling by regulating 20E biosynthesis.

      Author response image 5.

      Prothoracic gland-specific torso expression rescues EcR nuclear translocation defects. Immunofluorescence-based detection of the nucleocytoplasmic distribution of EcR (EcR antibody, red) in third instar larval prothoracic gland nuclei from control, prothoracic gland-specific Nup107 knockdown (Phm-Gal4>Nup107<sup>KK</sup>), and torso-overexpressing PG-specific Nup107 knockdown (Phm-Gal4>Nup107<sup>KK</sup>; UAS-torso) larvae. DNA is stained with DAPI. Scale bars, 20 μm.

      (3) Clarify the function of Torso in the text: The authors must revise their description of Torso signaling as the primary regulator of ecdysone production in both the results and discussion sections. Specifically, in the results section, the claim that Torso depletion induces developmental arrest is inaccurate. Instead, available evidence, including Rewitz et al. 2009, demonstrates that Torso depletion causes a delay of approximately five days rather than a complete developmental arrest. This discrepancy should be corrected to avoid overstating the role of Torso signaling in ecdysone regulation and to align the manuscript with established findings.

      We agree with the reviewer. We have incorporated the suggestion at the relevant place in the main manuscript.

      Reviewer #3 (Recommendations for the authors):

      These findings suggest that Nup107 is involved in regulating ecdysone signaling during developmental transitions, with depletion of Nup107 disrupting hormone-regulated processes. Moreover, the rescue experiments hint that Nup107 might directly influence EcR signaling and ecdysone biosynthesis, though the precise molecular mechanism remains unclear.

      Overall, the manuscript presents compelling data supporting Nup107's role in regulating developmental transitions. However, I have a few comments for consideration:

      Major Comments:

      RNAi Specificity: While RNAi is a powerful tool, the authors do not sufficiently address potential off-target effects, which could undermine the conclusions. Although a mutant Nup107 is described, it is lethal-are heterozygous or clonal studies possible to validate the findings more robustly?

      This is a very valid point, and we are aware of the consequences of RNAi off-target effects. To confirm that the phenotypes reflect authentic Nup107 knockdown and to minimize off-target effects, we used two RNAi lines (Nup107<sup>GD</sup> and Nup107<sup>KK</sup>) against Nup107. Both RNAi lines induced comparable levels of Nup107 reduction, and with both lines, ubiquitous and PG-specific knockdown produced similar phenotypes. Although the Nup107<sup>GD</sup> line exhibited a relatively stronger knockdown than the Nup107<sup>KK</sup> line, we preferentially used the Nup107<sup>KK</sup> line because the Nup107<sup>GD</sup> line is based on P-element insertion and its exact landing site is unknown. Furthermore, there is a predicted off-target for the Nup107<sup>GD</sup> line, where a 19 bp sequence aligns with the bifocal (bif) sequence. The bif-encoded protein is involved in axon guidance and regulation of axon extension. In contrast, the Nup107<sup>KK</sup> line has no predicted off-target, and we know its precise landing site on the second chromosome. Thus, the Nup107<sup>KK</sup> line was ultimately used in experimentation for its clearer and more reliable genetic background.

      Following the suggestion from the reviewer, we considered conducting heterozygous and clonal analyses using the Nup107 mutant. We have carried out Nup107 knockdown studies in the prothoracic gland, which has a limited number of cells (50-60 cells) and is known to exhibit polyteny (8). With these features of the prothoracic gland in mind, a clonal study is unlikely to yield a detectable phenotype. However, we will consider pursuing this approach as well.

      (2) NPC Complex Specificity: It remains unclear whether the observed defects are specific to Nup107 or if other NPC components also cause similar defects. If the authors are unable to use Nup107 mutants, they could demonstrate similar defects with other critical NPC members to bolster their claim.

      We thank the reviewer for raising this concern. For a Nup complex such as the Nup107 complex, this concern is expected but difficult to address because many Nups function beyond their complex identity. Our analysis of Nup153-depleted organisms indicates no developmental delay or defect. We have also assessed the effects of knocking down all other members of the Nup107 complex, including dELYS, but no member other than Nup107 induced developmental arrest at the third instar stage with a failure to pupariate. However, the null mutant of Nup133, the direct interactor of Nup107 in the Nup107 complex, induces a delay in pupariation (unpublished data).

      (3) Molecular Mechanism of EcR Signaling: The manuscript shows that Nup107 depletion affects EcR signaling and ecdysone biosynthesis, but the molecular basis of this regulation is not fully explored. Does phosphorylated ERK (p-ERK) fail to enter the nucleus? Clarifying this mechanism would strengthen the study's impact.

      We appreciate the reviewer’s insightful comment and fully agree with the concern. To address this, we examined the subcellular localization of phosphorylated ERK (p-ERK) in the prothoracic gland of control larvae, Nup107-depleted larvae, and Nup107-depleted larvae with torso overexpression. In control larvae, p-ERK was predominantly localized in the nucleus. However, in Nup107-depleted larvae, p-ERK was largely retained in the cytoplasm, indicating impaired pathway activation and nuclear translocation. Notably, overexpression of torso in the Nup107-depleted background restored nuclear localization of p-ERK in the prothoracic gland (Author response image 6). These findings suggest that Nup107 regulates Drosophila metamorphosis, in part, through modulation of torso-mediated MAPK signaling.

      Author response image 6.

      Nup107 regulates torso activation-dependent p-ERK localization. Detection of the nucleocytoplasmic distribution of p-ERK (anti-p-ERK antibody, green) in the third instar larval prothoracic glands of control, PG-specific Nup107 knockdown (Phm-Gal4>Nup107<sup>KK</sup>), and PG-specific torso overexpression in the Nup107 knockdown background (Phm-Gal4>Nup107<sup>KK</sup>; UAS-torso). DNA is stained with DAPI. Scale bars, 20 µm.

      Minor Comments:

      (1) The manuscript contains typographical errors that may hinder readability. Additionally, some phrases (e.g., "unequivocally demonstrate") may be overly strong. Consider adjusting language to reflect the nature of the data more accurately.

      We agree with the reviewer. We have edited the manuscript accordingly to iron out such typographical errors at the relevant places in the main manuscript.

      (2) The data presentation could be improved by eliminating redundancy. Some sections repeat similar findings in different tissues, which could be consolidated to improve clarity and flow.

      While we agree with the comment, we could not avoid tissue redundancy in presenting the EcR translocation data, and we would have preferred to use another tissue. However, we have included EcR localization and p-ERK translocation data in the responses to provide another, non-redundant tissue perspective (Figures R5 and R6).

      References:

      (1) Varghese, Jishy, and Stephen M Cohen. “microRNA miR-14 acts to modulate a positive autoregulatory loop controlling steroid hormone signaling in Drosophila.” Genes & development vol. 21,18 (2007): 2277-82. doi:10.1101/gad.439807

      (2) Rewitz, Kim F et al. “The insect neuropeptide PTTH activates receptor tyrosine kinase torso to initiate metamorphosis.” Science (New York, N.Y.) vol. 326,5958 (2009): 1403-5. doi:10.1126/science.1176450

      (3) Pan, Xueyang, and Michael B O'Connor. “Coordination among multiple receptor tyrosine kinase signals controls Drosophila developmental timing and body size.” Cell reports vol. 36,9 (2021): 109644. doi:10.1016/j.celrep.2021.109644

      (4) Pascual-Garcia, Pau et al. “Metazoan Nuclear Pores Provide a Scaffold for Poised Genes and Mediate Induced Enhancer-Promoter Contacts.” Molecular cell vol. 66,1 (2017): 63-76.e6. doi:10.1016/j.molcel.2017.02.020

      (5) Pascual-Garcia, Pau et al. “Nup98-dependent transcriptional memory is established independently of transcription.” eLife vol. 11 e63404. 15 Mar. 2022, doi:10.7554/eLife.63404

      (6) Kadota, Shinichi et al. “Nucleoporin 153 links nuclear pore complex to chromatin architecture by mediating CTCF and cohesin binding.” Nature communications vol. 11,1 2606. 25 May. 2020, doi:10.1038/s41467-020-16394-3

      (7) Gozalo, Alejandro et al. “Core Components of the Nuclear Pore Bind Distinct States of Chromatin and Contribute to Polycomb Repression.” Molecular cell vol. 77,1 (2020): 67-81.e7. doi:10.1016/j.molcel.2019.10.017

      (8) Shimell, MaryJane, and Michael B O'Connor. “Endoreplication in the Drosophila melanogaster prothoracic gland is dispensable for the critical weight checkpoint.” microPublication biology vol. 2023. 21 Feb. 2023, doi:10.17912/micropub.biology.000741

    1. Author response:

      The following is the authors’ response to the original reviews.

      We have responded to these criticisms below and have revised the main text and figures. Here, we outline the major points of our responses:

      (1) The reviewers asked for more clarification regarding cell type annotation in the lung mesenchyme as shown in Figure 3C. We have included a new supplementary figure (Supplementary Figure 2) which shows differentially expressed genes amongst these mesenchymal cell subsets using a variety of visualization tools including a heatmap, UMAP plots, and the dotplot which was originally shown in Supplementary Figure 1D. The other supplemental figures have been re-numbered.

      (2) We acknowledge the lack of consensus in the field regarding the nomenclature of fibroblast subsets in the developing mouse lung. We are not attempting to define new subsets, but rather we adopted annotations based on previously published work. Specifically, we used Seurat to define mesenchymal cell clusters and then compared the gene expression patterns of these clusters to published work by Hurskainen et al. (Bernard Thebaud’s group) and Narvaez Del Pilar et al. (Jichao Chen’s group). We acknowledge these annotations might conflict with other published data, but any approach to choosing a cell label would be subject to scrutiny. For example, Col13a1 fibroblasts share markers with cells which have been defined by others as lipofibroblasts or alveolar fibroblasts. Similarly, Col14a1 fibroblasts appear to share markers with matrix fibroblasts. Further work is clearly needed to address these discrepancies, and we hope that making our data publicly available will help that effort.

      (3) The reviewers asked us to interrogate changes in canonical markers of fibroblast subsets (i.e. lipofibroblasts, matrix fibroblasts) to address whether the apparent loss of myofibroblasts could be explained by a change in myofibroblast specification/differentiation. We have included these data in the responses, but because we are unable to draw any clear conclusions from these results, we do not feel these data warrant inclusion in the manuscript/figures.

      (4) As highlighted in the eLife assessment, our study does not include tissue validation (i.e. immunohistochemistry) of myofibroblast markers to distinguish whether the loss of myofibroblasts is attributable to lack of proliferation and/or changes in differentiation/specification. We spent considerable time over the past few months attempting to address these questions; however, we were unable to produce convincing PDGFRa staining on tissues that we had collected during our original studies. Without PDGFRa staining, we regretfully could not co-stain for other useful markers to assess proliferation (EdU), apoptosis (TUNEL or caspase), or fibroblast function/specification (ACTA2, SM22a/TAGLN, ADRP, etc). We suspect that these experiments would require optimization of tissue fixation/processing at the time of harvest or the inclusion of a Pdgfra lineage tool for better identification of these cells by immunohistochemistry. Because the majority of Pdgfra lineage tools require a knock-in/knock-out approach, data generated using these tools should be interpreted with caution, as our results show that Pdgfra haploinsufficiency alone worsens disease outcomes after hyperoxia exposure.

      In summary, we have addressed several concerns raised by the reviewers and have attempted to perform some of the additional experiments suggested.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this study, the authors used both the commonly used neonatal hyperoxia model as well as cell-type-specific genetic inactivation of Tgfbr2 models to study the basis of BPD. The bulk of the analyses focus on the mesenchymal cells. Results indicate impaired myofibroblast proliferation, resulting in decreased cell number. Inactivation of Ect2 in Pdgfra-lineaged cells, preventing cytokinesis of myofibroblasts, led to alveolar simplification. Together, the findings demonstrate that disrupted myofibroblast proliferation is a key contributor to BPD pathogenesis.

      Strengths:

      Overall, this comprehensive study of BPD models advances our understanding of the disease. The data are of high quality.

      Weaknesses:

      The critiques are mostly minor and can be addressed without extensive experimentation.

      Reviewer #2 (Public Review):

      Summary:

      In this study, the authors systematically explore the mechanism(s) of impaired postnatal lung development with relevance to BPD (bronchopulmonary dysplasia) in two murine models of 'alveolar simplification', namely hyperoxia and epithelial loss of TGFb signaling. The work presented here is of great importance, given the limited treatment options for a clinical entity frequently encountered in newborns with high morbidity and mortality that is still poorly understood, and the unclear role of TGFb signaling, its signaling levels, and its cellular effects during secondary alveolar septum formation, a lung structure generating event heavily impacted by BPD. The authors show that hyperoxia and epithelial TGFb signaling loss have similar detrimental effects on lung structure and mechanical properties (emphysema-like phenotype) and are associated with significantly decreased numbers of PDGFRa-expressing cells, the major cell pool responsible for generation of postnatal myofibroblasts. They then use a single-cell transcriptomic approach combined with pathway enrichment analysis for both models to elucidate common factors that affect alveologenesis. Using cell communication analysis (NicheNet) between epithelial cells and myofibroblasts, they confirm increased projected TGFb-TGFbR interactions and decreased projected interactions for PDGFA-PDGFRA, and other key pathways, such as SHH and WNT. Based on these results they go on to uncover in a series of experiments that, surprisingly, increased TGFb appears reactive to postnatal lung injury and rather protective/homeostatic in nature, and the authors establish the requirement for alpha V integrins, but not the subtype alphaVbeta6, a known activator of TGFb signaling and implicated in adult lung fibrosis.

      The authors then go beyond the TGFb axis evaluation to show that mere inhibition of proliferation by conditional KO of Ect2 in Pdgfra lineage results in alveolar simplification, pointing out the pivotal role of PDGFRa-expressing myofibroblasts for normal postnatal lung development.

      Strengths:

      (1) The approach, which includes both pharmacologic and mechanistically relevant transgenic interventions that produced consistent results, lends robustness to the results presented here.

      (2) Further adding to this robustness is the use of moderate levels of hyperoxia at 75% FiO2, which is less extreme than 100% FiO2 frequently used by others in the field, and therefore favors the null hypothesis.

      (3) The prudent use of advanced single-cell analysis tools, such as NicheNet to establish cell interactions through the pathways they tested and the validation of their scRNA-seq results by analysis of two external datasets. Delineation of the complexity of signals between different cell types during normal and perturbed lung development, such as attempted successfully in this study, will yield further insights into the underlying mechanism(s).

      (4) The combined readout of lung morphometric (MLI) and lung physiologic parameters generates a clinically meaningful readout of lung structure and function.

      (5) The systematic evaluation of TGFb signaling better determines the role in normal and postnatally-injured lungs.

      Weaknesses:

      (1) While the study convincingly establishes the effect of lung injury on the proliferation of PDGFRa-expressing cells, differentiation is equally important. Characterization of PDGFRa expressing cells and tracking the changes in the injury models in the scRNA analysis, a key feature of this study, would benefit from expansion in this regard. PDGFRa lineage gives rise to several key fibroblast populations, including myofibroblasts, lipofibroblasts, and matrix-type fibroblasts (Collagen13a1, Collagen14a1). Lipofibroblasts constitute a significant fraction of PDGFRa+ cells, and expand in response to hyperoxic injury, as shown by others. Collagen13a1-expressing fibroblasts expand significantly under both conditions (Figure 3), and appear to contain a significant number of PDGFRa-expressing cells (Suppl Fig.1). Effects of the applied injuries on known differentiation markers for these populations should be documented. Another important aspect would be to evaluate whether the protective/homeostatic effect of TGFb signaling is supporting the differentiation of myofibroblasts. Postnatal Gli1 lineage gains expression of PDGFRa and differentiation markers, such as Acta2 (SMA) and Eln (Tropoelastin). Loss of PDGFRa expression was shown to alter Elastin and TGFb pathway-related genes. TGFb signaling is tightly linked to the ECM via LTBPs, Fibrillins, and Fibulins. An additional analysis in the aforementioned regard has great potential to more specifically identify the cell type(s) affected by the loss of TGFb signaling and allow analysis of their specific transcriptomic changes in response and underlying mechanism(s) to postnatal injury.

      We attempted to conduct additional analyses on our sequencing data to evaluate the impact of lung injury on the differentiation of Pdgfra-expressing cells towards other fibroblast lineages. To specifically address the impact of hyperoxia on fibroblast differentiation, we subsetted wildtype cells collected at the P7 timepoint (while pups were still undergoing hyperoxia treatment) from the larger data set. Shown below are several Violin Plots comparing gene expression between RA and O2 conditions across the mesenchymal populations.

      Although there are some interesting observations in this analysis, we could not identify a consistent theme from these data which could clearly answer the reviewers’ questions. We see a clear reduction of Pdgfra and Eln in both myofibroblast subsets with hyperoxia, which supports our findings of reductions in the myofibroblast subsets. Acta2 and Tagln appear slightly lower in alveolar myofibroblasts, but both are higher in ductal myofibroblasts. Interestingly, both Acta2 and Tagln are higher in Col14a1 fibroblasts with hyperoxia. The functional relevance of these data is unclear because there appears to be higher per-cell expression of Acta2 in ductal myofibroblasts while the relative contribution of these cells is reduced (Figure 3D-E). Col14a1 fibroblasts show increased Acta2 and Tagln expression and are slightly increased in proportion at P7 with hyperoxia treatment (Figure 3D), albeit to a much lesser degree compared to Col13a1 fibroblasts.

      Author response image 1.

      Markers of ductal myofibroblasts including Hhip, Cdh4, and Aspn all appear lower with hyperoxia. Interestingly, Plin2 expression is only slightly increased in Col13a1 fibroblasts with hyperoxia treatment, and there is also increased expression in alveolar myofibroblasts. Tcf21 is another marker commonly used to identify lipofibroblasts and its expression is similarly increased in myofibroblasts during hyperoxia, although its expression is conversely lower in Col13a1 and Col14a1 fibroblasts in our data. Overall, these data appear consistent with recently published data by Ricetti et al. in which the authors observed an increase in lipofibroblast gene signatures and reduced myofibroblast gene signatures with hyperoxia treatment.

      Author response image 2.

      Author response image 3.

      The ability of our data to clearly identify changes in cell fate differentiation is limited by our use of Seurat to define cell clusters because these methods are likely to mask subtle gene expression changes in a small number of cells nested within a parent cluster. In the example above with Plin2, the change in Plin2 expression within myofibroblasts is not significant enough for Seurat to pull these cells out from their parent clusters to define a different lineage, nor are these cells similar enough in their current moment in time to be considered Col13a1 fibroblasts or lipofibroblasts. Increasing the dimensions used to define Seurat clusters might be sufficient to identify this subset of cells as a distinct cluster, however this approach would come at the expense of creating several more cell subsets with increasingly small populations which would be difficult to further analyze.
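
      The masking effect described above can be illustrated with simple arithmetic. The following is a minimal sketch in Python, offered only as an illustration: it is not the analysis performed in this study and it does not reproduce Seurat's clustering algorithm. The function name and all cell counts and expression values are hypothetical. It shows how a change confined to a small subset of cells barely moves the mean expression of the parent cluster.

```python
from statistics import mean


def cluster_mean_shift(n_cells, n_shifted, baseline, shifted):
    """Change in a cluster's mean expression when only n_shifted of its
    n_cells express a gene at `shifted` instead of `baseline`."""
    values = [shifted] * n_shifted + [baseline] * (n_cells - n_shifted)
    return mean(values) - baseline


# Hypothetical numbers: 50 of 1,000 myofibroblasts double their expression
# of a marker such as Plin2 (arbitrary units); none are measured values.
shift = cluster_mean_shift(n_cells=1000, n_shifted=50, baseline=1.0, shifted=2.0)
relative_shift = shift / 1.0  # expressed as a fraction of the baseline

print(f"cluster-mean shift: {relative_shift:.1%}")
```

      Under these assumed numbers, a two-fold change in 5% of the cells shifts the cluster mean by only about 5%, which is the kind of signal that a cluster-level summary can easily hide.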

      One alternative approach to address these questions regarding differentiation might include using pseudo-time analysis of our sequencing data to predict cell lineage. Unfortunately, these analyses are beyond the scope of our current study, but we hope that our public data set can be used by investigators hoping to utilize this approach. Another method to address these questions could utilize a pulse-chase lineage experiment where one could label Pdgfra-expressing cells at the onset of injury and compare the differentiation of these labeled cells following injury. Li et al. conducted a similar experiment with hyperoxia in which Pdgfra-expressing cells were labeled during embryonic development and then postnatally following hyperoxia exposure. The authors noted a decrease in both lineaged myofibroblasts and lineaged lipofibroblasts and concluded that Pdgfra-lineaged cells were lost with hyperoxia treatment rather than undergoing aberrant differentiation. While these experiments likely have their own caveats related to the timing and efficiency of labeling, they represent a more conclusive approach to addressing differences in cell specification as compared to our sequencing- and flow cytometry-based approaches.

      Author response image 4.

      Author response image 5.

      (2) Of the three major lung abnormalities encountered in BPD, the authors focus on alveolarization impairment in great detail, to a very limited extent on inflammation, and not on vascularization impairment. However, this would be important not only to better capture the established pathohistologic abnormalities of BPD, but also it is needed since the authors alter TGFb signaling, and inflammatory and vascular phenotypes with developmental loss of TGFb signaling and its activators have been described. Since the authors make the point about the absence of inflammation in their BPD model, it will be important to show the evidence.

      We acknowledge that vascular changes significantly contribute to BPD pathogenesis; however, our study was not designed to adequately characterize changes in vascular/endothelial cells. We were motivated to focus on the lung mesenchyme after observing a dramatic loss of PDGFRa+ cells with our initial characterization of the hyperoxia injury model (Figure 2). At the onset of our study, the existing publicly available data did not contain enough mesenchymal cells for in-depth analysis. To generate new observations and hypotheses within the lung mesenchyme, we enriched our single-cell prep for mesenchymal cells at the time of FACS sorting to ensure we would have sufficient cell numbers for downstream analysis.

      (3) Conceptually it would be important that in the discussion the authors reconcile their findings in the experimental BPD models in light of human BPD and the potential implications it might have on new ways to target key pathways and cell types for treatment. This allows the scientific community to formulate the next set of questions in a disease-relevant manner.

      We have edited text in the discussion to address this point.

      Reviewer #3 (Public Review):

      Summary:

      This paper seeks to understand the role of alveolar myofibroblasts in abnormal lung development after saccular stage injury.

      Strengths:

      Multiple models of neonatal injury are used, including hyperoxia and transgenic models that target alveolar myofibroblasts.

      Weaknesses:

      There are several weaknesses that leave the conclusions significantly undersupported by the data as presented:

      (1) There is no validation of the decreased number of myofibroblasts suggested by flow cytometry/scRNAseq at the level of the tissue. Given that multiple groups have reported increased myofibroblasts (aSMA+ fibroblasts) in humans with BPD and in mouse models, demonstrating a departure from prior findings with tissue validation in the mouse models is essential. There are many reasons for decreased numbers of a subpopulation by flow cytometry, most notably that injured cells may be less likely to survive the cell sorting process.

      Unfortunately, we were unable to produce convincing PDGFRa staining on tissues that we had collected during our original studies. Without PDGFRa staining, we regretfully could not co-stain for other useful markers to assess proliferation (EdU), apoptosis (TUNEL or caspase), or fibroblast function/specification (aSMA/ACTA2, SM22a/TAGLN, ADRP, etc). We suspect that these experiments would require optimization of tissue fixation/processing at the time of harvest or the inclusion of a Pdgfra lineage tool for better identification of these cells by immunohistochemistry. Because the majority of Pdgfra lineage tools require a knock-in/knock-out approach, data generated using these tools should be interpreted with caution, as our results show that Pdgfra haploinsufficiency alone worsens disease outcomes after hyperoxia exposure.

      Our single-cell data show increased expression of Acta2 and Tagln in several populations (see the violin plots above), which might be consistent with the increased aSMA staining that others have observed in these settings. Interestingly, the transcripts of both genes are reduced in alveolar myofibroblasts while increased in ductal myofibroblasts, Col13a1 fibroblasts, Col14a1 fibroblasts, and vascular smooth muscle. We did not include aSMA antibody staining in our flow cytometry experiments, but this would certainly add value to future attempts to characterize the phenotypic changes occurring during these injury models.

      (2) The hallmark genes used to define the subpopulations are not given in single-cell data. As the definition of fibroblast subtypes remains an area of unsettled discussion in the field, it is possible that the decreased number is due to classification and not a true difference. Tissue validation and more transparency in the methods used for single-cell sequencing would be critical here.

      See response above and new Supplemental Figure 2.

      (3) There is an oversimplification of neonatal hyperoxia as a "BPD model" used here without a reference to detailed prior work demonstrating that the degree and duration of hyperoxia dramatically change the phenotype. For example, Morty et al have shown that hyperoxia of 85% or more x 14 days is required to demonstrate the septal thickening observed in severe human BPD. Other than one metric of lung morphometry (MLI), which is missing units on the y-axis and flexivent data, the authors have not fully characterized this model. Prior work comparing 75% O2 exposure for 5, 8, or 14 days shows that in the 8-day exposed group (similar to the model used here), much of the injury was reversible. What evidence do the authors have that hyperoxia alone is an accurate model of the permanent structural injury seen in human BPD?

      At the onset of our studies, we noted that several groups were using widely variable protocols ranging from 60-100% O2 exposure. Morty et al. have indeed conducted thorough experiments to characterize several hyperoxia exposure protocols. In their 2017 study (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5312005/) they showed that 85% O2 from P1-P7 was sufficient to produce increased septal thickness compared to control mice, and this change was comparable to P1-P14 exposure with 85% O2. Interestingly, they also noted that some therapeutic interventions could rescue disease caused by 60% O2 but not 85% O2 exposure. Our criteria in choosing a treatment protocol were: (1) nursing dams and pups survived hyperoxia exposure, (2) injury was reproducible across cohorts, and (3) injury was not reversible simply by recovering in room air. We found that 75% O2 exposure, as used in recent work, was sufficient to cause the alveolar simplification phenotype which we sought to investigate. In our hands, we did not observe mortality of nursing dams or pups except for litters lost to cannibalism/failure of cross-fostering.

      We are confident that the injury caused by our hyperoxia protocol is not reversible simply by recovering mice in room air. Several groups have phenotyped mice at P4, P10, or P14 immediately following the conclusion of hyperoxia treatment. To ensure that we were studying a lasting, irreversible phenotype, we conducted our endpoint studies (morphometry and lung physiology) at P40. Because mice continue to undergo alveolarization until ~P36-P39, we reasoned that this additional recovery time following cessation of hyperoxia would allow for spontaneous recovery if this injury were transient. Additionally, shown below are unpublished flexiVent data in which mice were treated for 10 days with 75% O2 and recovered until analysis at 10 weeks of age. These results are entirely consistent with the flexiVent data we have included in the manuscript, and the persistence of lung physiologic changes in adult mice suggests the presence of permanent underlying structural changes. We did not conduct morphometry/MLI studies at later timepoints, but we have no reason to suspect a different outcome given the clear results from lung physiology.

      Author response image 6.

      (4) Thibeault et al published a single-cell analysis of neonatal hyperoxia in 2021, with seemingly contrasting findings. How does this dataset compare in context?

      Our data are complementary to the single-cell analysis published by Thebaud et al. We included a re-analysis of their mesenchymal data in Supplementary Figure 2, which shows that they also observed a relative decrease in myofibroblast clusters at the P7 and P14 timepoints following hyperoxia treatment. Figure 4 of their paper highlights the top differentially expressed genes between RA and O2 in Col13a1 FB and myofibroblasts, and we observe nearly identical findings in our data set within each of these clusters. Below we have created dotplots of P7 wildtype samples for the same selected genes shown in Figure 4G of the Thebaud et al. paper. It is important to note that their clustering pooled all myofibroblasts into one cluster, while our data are divided into alveolar myofibroblasts and ductal myofibroblasts. The other difference is that their data set includes all timepoints (P3, P7, and P14) pooled for display, while the plot we selected for simplicity here shows only P7 cells. From these data we can see that the general trends are identical to those observed by Thebaud et al., and the differences in genes such as Acta2 can be accounted for by the different changes observed in the different myofibroblast clusters. As shown in the violin plots above, Acta2 is reduced with hyperoxia in alveolar myofibroblasts while increased in ductal myofibroblasts.

      Author response image 7.

      Alveolar myoFB

      Author response image 8.

      Ductal myoFB
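
      The point about pooled versus split myofibroblast clusters can be made concrete with a small worked example. The sketch below is a Python illustration under stated assumptions, not a re-analysis of either data set: the function name, the cell counts, and the expression values are all hypothetical. It shows how a cell-weighted mean over a pooled cluster can differ in direction from the change seen in one of its subsets.

```python
def pooled_mean(groups):
    """Cell-weighted mean expression across (n_cells, mean_expression) pairs."""
    total_cells = sum(n for n, _ in groups)
    return sum(n * m for n, m in groups) / total_cells


# Hypothetical (n_cells, mean Acta2 expression) per subset; not measured values.
room_air = {"alveolar_myoFB": (600, 2.0), "ductal_myoFB": (400, 2.0)}
hyperoxia = {"alveolar_myoFB": (300, 1.5), "ductal_myoFB": (200, 3.0)}

pooled_ra = pooled_mean(list(room_air.values()))
pooled_o2 = pooled_mean(list(hyperoxia.values()))

# Per-subset change, then the change seen when both subsets are pooled.
for subset in room_air:
    print(subset, room_air[subset][1], "->", hyperoxia[subset][1])
print("pooled myoFB", round(pooled_ra, 2), "->", round(pooled_o2, 2))
```

      With these assumed values, the pooled cluster shows a slight increase even though the alveolar subset decreases, so a pooled display and a split display can both be correct while appearing to disagree.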

      One difference between our two datasets is the relative contribution of myofibroblasts and Col13a1 fibroblasts to the entire mesenchymal population of cells. Over 50% of all mesenchymal cells in our preps consist of myofibroblasts, while most of their mesenchymal cells are Col13a1 fibroblasts. These differences are likely accounted for by differences in tissue digestion and cell preparation protocols. However, despite these differences, their data show the same trends of decreased myofibroblasts and a relative expansion in Col13a1 fibroblasts.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) Figure 1, for the hyperoxia model, it is informative to have the analysis done at P40, while most of the previous studies using this model focus on outcomes shortly after the end of the hyperoxia regimen. The authors state "we did not see evidence of fibrosis, scarring, or inflammation." It will be helpful to include data supporting this conclusion, especially ACTA2, CTHRC1, and CD45 staining.

      We did not conduct trichrome staining or hydroxyproline assays to quantify the absence of fibrotic changes because there were no gross histologic changes consistent with scarring or fibrosis by H&E staining. We have amended the text to say “we did not see evidence of fibrosis or scarring” since we do not present any data characterizing the immune cell compartment.

      (2) Figure 3, single cell analysis, naming of the clusters is confusing. Is "alveolar myofibroblasts" the same as "secondary crest myofibroblasts"? Is "Col13a1 FB" the same as "alveolar fibroblasts" and "Col14a1 FB" the same as "adventitial fibroblasts"? The loss of myofibroblasts is intriguing because, by staining, there is an increase of ACTA2+ cells. Are ACTA2+ cells not myofibroblasts in scRNAseq data?

      As mentioned in responses above, we used Jichou Chen’s nomenclature of “alveolar myofibroblasts” and “ductal myofibroblasts”, but we agree that the former cluster is most consistent with “secondary crest myofibroblasts”. To distinguish the two remaining clusters of fibroblasts we used the same nomenclature as found in Thebaud et al.’s single-cell data set: “Col13a1 FB” and “Col14a1 FB”. The Col13a1 FB cluster is most consistent with “alveolar fibroblasts” and shows high expression of several genes used to define “lipofibroblasts”, though it is unclear whether the latter may represent a subcluster within the Col13a1 FB cluster.

      As shown above, Acta2 is expressed broadly within the lung mesenchyme with highest levels found in myofibroblasts and smooth muscle cells.

      (3) Phosphorylated SMAD2/3 staining (e.g. Cell Signaling antibody) in the two models will be informative to show where TGF signaling activity is altered.

      We have not been successful in using SMAD2/3 staining to infer changes in TGFb signaling at the resolution needed to address this question. Other groups have shown qPCR and western blot data for SMAD2/3 signaling from whole lung extracts, but these approaches lack cell-type specificity and do not address spatial changes. We attempted to incorporate pSMAD2/3 staining into our flow cytometry experiments, but the staining protocol did not work in our hands.

      (4) Is cell death increased in the multiple models that showed simplification?

      While our EdU experiments address proliferation, we were unable to perform PDGFRa and TUNEL/caspase co-staining by histology to address apoptosis/cell death in our different models. Shown here is data from P7 wildtype mice in which Cdkn1a (promoting arrest of cell cycle), and pro-apoptotic genes Bax, Bak1, and Fas are all upregulated in hyperoxia in several mesenchymal cell populations including myofibroblasts.

      Author response image 9.

      (5) Wording: "These data suggest that avb6 does not play a role in TGFb activation during normal development or neonatal hyperoxia, while av-integrins in the lung mesenchyme are required for normal development and play a protective role in response to hyperoxia." The first half of the sentence is missing a reference to the epithelium.

      Text now reads "These data suggest that epithelial avb6 does not play a role…”

      Reviewer #2 (Recommendations For The Authors):

      The reviewer greatly appreciates the work presented here, especially the hard task of addressing combined signaling pathway input into key mesenchymal cell types during an essential expansion of alveolar surface area in postnatal lung and its effect upon disturbance.

      The issues of concern are mentioned in the public review and are expanded upon below:

      (1) Expanded characterization of PDGFRa+ expressing cells in the scRNA dataset is needed (see public review). Also included should be some of the key myofibroblast genes (elastin, Acta2, etc.) and their changes in the relevant cell populations. It would be important to show (at least at the transcriptional level) that myofibroblast differentiation is impaired if the author claims that the alveolarization defect is due to functional myofibroblast impairment. Furthermore, Ect2 expression and changes with treatments should be shown for the different cell populations (relevant to Figure 9).

      See responses above

      (2) The authors stated that they did not find evidence of fibrosis, scarring, and inflammation, but did not provide data to support this statement. Given the importance of at least the inflammation component in BPD, the absence of inflammation needs to be shown, especially in the model using the TGFBR2-cKO mouse, where at least their data show a trend to increased CD45 cell numbers (Figure 2), and upregulated inflammatory upstream regulators (IL10, IFNa, IKBKB, CEBPB upregulated) in the IPA (Figure 3). BAL and/or tissue by flow or IHC have been used to assess different immune cell populations. In terms of evaluation of vascular impairment, the single-cell data set contains endothelial cells, vascular smooth muscle, and pericytes, which allows interrogation following the two different types of injury (hyperoxia, cKO TGFbR2) used for the scRNA-seq experiments.

      A full characterization of the immune cell or vascular/endothelial cell compartment within our models is beyond the scope of this current study as we were focusing on the shared changes observed within the lung mesenchyme. None of these compartments exist in isolation, so of course there are likely to be correlative and/or causative changes observed in each of the different models which we studied. We did consider further phenotypic analysis of the immune cells by flow cytometry within our different models, but deferred these experiments for future studies. As mentioned earlier we have omitted the reference to “no inflammation”.

      (3) The authors should report several litters per experiment and experimental group, mortality in the groups, and if present, visualize using e.g. Kaplan-Meier curves. The switch of the mothers during treatment, the early postnatal injections and treatments, and variability in outcome measures between different litters have to be anticipated. Therefore at least 2 litters, but preferably 3 litters per experiment should be examined, to show reproducibility.

      All experiments were conducted with at least 2-3 contemporaneous litters in each treatment group as this was necessary to have enough animals per treatment condition/group to achieve statistical significance. This was essential as all experiments were conducted on the C57BL/6 background where litter sizes are typically 6-8 pups in our colony. We did not encounter any maternal mortality related to hyperoxia exposure while rotating between hyperoxia and normoxia every 48 hrs. Loss of pups in our experiments was mostly due to cannibalism either immediately after birth or from neglect due to failure of cross-fostering.

      (4) The reviewer is concerned about using PBS as a control for experiments involving antibody treatment, in this case, 1D11. The use of an isotype IgG would be the most appropriate and convincing control. In this case, an isotype-matched murine IgG1 control (13C4) has already been generated and is commercially available. While the reviewer does not suggest repeating all experiments, at least one small experiment showing that control IgG does not alter the lung phenotype with hyperoxia when compared with 1D11 would be important.

      We appreciate the reviewer’s suggestion and will consider an isotype antibody comparison in future studies. While not directly comparing 1D11 to isotype, we can share data in which we compared PBS to a different antibody. In this experiment, we attempted to use antibody blockade during the first 10 days of life while mice were undergoing hyperoxia treatment to target a specific component of the TGFb pathway. We observed no difference in outcomes either in RA or O2 when comparing PBS to xxx antibody. We cannot share the antibody identity due to intellectual property reasons, however additional studies confirmed that this antibody likely had no impact due to poor in vivo blocking activity.

      Author response image 10.

      (5) While inhibited proliferation is one possible explanation for the decrease of PDGFRa expression in the injured mice, there should be consideration of increased and/or premature apoptosis (before the physiologically observed wave P14-P20) as another reason. Also, do the authors propose that only proliferation results in alveolarization impairment, but differentiation plays no significant role here? If that is the case that would mean that there are some fully-differentiated myofibroblasts in the alveolar septa, but not enough to create the multitude of alveolar septal walls. Have the authors evaluated the decrease in secondary alveolar septa formed per alveolar airspace? This measure would give some sense of whether septum initiation was prevented or whether septa were formed, but are structurally abnormal, e.g. due to altered ECM (suspected decrease in Elastin and SMA expression, if myofibroblast differentiation was impaired) or cell content (suspected decrease in myofibroblasts and increase of other cell types, such as lipofibroblasts).

      Apoptosis/cell death are likely to play a role in addition to inhibited proliferation. See violin plots shown above with cell cycle arrest and pro-apoptotic genes upregulated within the mesenchyme. Because we were unable to optimize tissue sections/staining with the samples collected during the early time points of our experiments (ie P4, P7, P10, P14), we are unable to co-stain for markers of apoptosis and answer this question in a direct manner. Future experiments will focus on additional characterization of these early changes with particular attention to altered fibroblast phenotypes within the alveolar septae.

      (6) An illustration depicting key cells and the pathways involved in cartoon format would be a useful addition and visualize the important conclusions of this paper for the reader.

      We appreciate this suggestion but think the results are sufficiently straightforward that a summary cartoon would not add much.

      Figure 4A: the legend appears to be switched. The gray square seems to align with the epithelial ligands, while the blue square aligns with receptors.

      Thank you for identifying this mistake – fixed.

      Names of transgenic lines used through manuscript:

      Please use the correct name, as per JAX would be either Gli1tm3(cre/ERT2)Alj/J or Gli1-CreERT2.

      Please use the correct name, as per JAX would be either Pdgfratm1.1(cre/ERT2)Blh/J or Pdgfrα-CreERT2.

      PDGFRa-CRE would be JAX# 013148.

      The transgenic lines have been noted in the methods, and we have edited the text of the manuscript to reflect the correct names of these lines. For the supplementary figure 4 which compares Gli1-CreERT2 to Pdgfrα-CreERT2, we left our prior nomenclature intact because it better reflects that each of these lines are haploinsufficient at their targeted loci, and that the controls are cre-negative littermates.

      We did not use the PDGFRa-CRE line (JAX# 013148).

      Reviewer #3 (Recommendations For The Authors):

      - More transparency about the single-cell analysis is required: 1) how are cell types and clusters defined? 2) what strategy was used for ambient RNA? 3) how do the controls compare with recently published mouse developmental datasets? 4) how does this model compare with the single-cell dataset published by Thibeault et al in 2021 (neonatal hyperoxia x 14 days with multiple time points used)?

      See responses above.

      - Tissue level validation of these findings is essential by RNA ISH or IF. While validation that the same process is at play in human tissue would be ideal, if this is not available, the conclusions must be tempered in the discussion.

      See responses above.

      - Is this more mild neonatal injury reversible in mice? As noted above, more characterization of this model (and placing it in the context of other more widely published models would be helpful).

      See responses above.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Weaknesses:

      (1) Important details about the nature of DEG comparisons between the wild type and the Lrrk2 G2019S model are missing.

      Please see the recommendations section below for specific responses to individual comments from Reviewer #1.

      (2) Some aspects of the integration between snRNA-seq and MERFISH data are not clear, and many MERFISH-identified cells do not appear to have a high-confidence cluster transfer into the snRNA-seq data space. Imputation is used to overcome some issues with the MERFISH dataset, but it is not clear that this is appropriate.

      Please see the recommendations section below for specific responses to individual comments from Reviewer #1.

      Reviewer #2 (Public review):

      (1) In the GO pathway analyses (both GSEA and DEG GO), I did not see a correction applied to the gene background considered. The study focusses on dopaminergic neurons and thus the gene background should be restricted to genes expressed in dopaminergic neurons, rather than all genes in the mouse genome. The problem arises that if we randomly sample genes from dopaminergic neurons instead of the whole genome, we are predisposed to sampling genes enriched in relevant cell-type-specific roles (and their relevant GO terms) and correspondingly depleted in genes enriched in functions not associated with this cell type. Thus, I am unsure whether the results presented in Figures 8 and 9 may be more likely to be obtained just by randomly sampling genes from a dopaminergic neuron. The background should be limited and these functional analyses rerun.

      Thank you for pointing out this important concern. We agree that overrepresentation analyses (ORAs) are vulnerable to selecting cell-type specific markers as significantly differentially expressed and thus inflating detection of cell-type associated gene sets rather than those truly altered as a function of experimental condition. We have thus re-run the GO analyses in our study with the genetic background being adjusted for each individual comparison. For dataset-level GO in Fig 8, genetic background was defined as genes with expression detected in at least 5% of all cells (to approximate the inclusion of cluster-specific genes). For comparisons of subsets within the dataset (i.e. a family or cluster) across conditions, a minimum detection level of 10% of cells was used to define the genetic background. These same thresholds were applied to filter the DEG lists used as input for GO. Interestingly, this correction appears to have filtered out or lowered the significance of some of the more generic brain-associated pathways that we initially presented, such as axonogenesis or learning and memory, and we feel even more confident in our original interpretation.
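
      The background correction described above can be illustrated with a minimal sketch. It is not the authors' Seurat-based pipeline: the function names, the toy count table, and the 50% threshold used in the example are assumptions made for illustration only (the response itself uses 5% and 10%). The sketch restricts both the DEG list and the gene set to genes detected in at least a given fraction of cells, then computes a one-sided hypergeometric overrepresentation p value against that restricted background.

```python
from math import comb


def detected_background(counts, min_fraction):
    """Return the genes detected (count > 0) in at least min_fraction of cells.

    counts maps gene name -> list of per-cell counts (hypothetical toy input).
    """
    background = set()
    for gene, per_cell in counts.items():
        fraction = sum(1 for c in per_cell if c > 0) / len(per_cell)
        if fraction >= min_fraction:
            background.add(gene)
    return background


def ora_p_value(degs, gene_set, background):
    """One-sided hypergeometric P(overlap >= observed) within the background."""
    degs = set(degs) & background          # DEGs outside the background are dropped
    gene_set = set(gene_set) & background  # so are gene-set members
    n_bg, n_set, n_deg = len(background), len(gene_set), len(degs)
    observed = len(degs & gene_set)
    total = comb(n_bg, n_deg)
    upper = min(n_set, n_deg)
    tail = sum(
        comb(n_set, i) * comb(n_bg - n_set, n_deg - i)
        for i in range(observed, upper + 1)
    )
    return tail / total


# Toy example: six genes measured in four cells.
toy_counts = {
    "Th": [3, 2, 5, 1],
    "Slc6a3": [1, 0, 2, 4],
    "Gfap": [0, 0, 0, 1],
    "Calb1": [0, 2, 0, 3],
    "Sox6": [1, 1, 0, 0],
    "Rare": [0, 0, 0, 0],
}
bg = detected_background(toy_counts, min_fraction=0.5)
p = ora_p_value(["Th", "Slc6a3", "Gfap"], {"Th", "Slc6a3"}, bg)
```

      Because the background shrinks to expressed genes, a gene set made up mostly of cell-type markers no longer looks enriched simply for being expressed.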

      Functional class scoring methods like GSEA, however, are unlike ORAs in that they do not utilize a hypergeometric test to calculate overrepresentation, as no distinction is made between significant and non-significant differential gene expression (nor is a genetic background provided as input to this tool). GSEA takes as input the full DE results, ranking genes according to their association with either group. Thus, genes simply enriched in DA neurons should be present towards both extremes of the rank list, rather than uniformly skewed toward one extreme. Per the GSEA authors’ user manual and original source paper, the entirety of DE testing should be provided as input for GSEA (barring genes with detection levels so low that their differential expression and/or ranking is likely to be artifactual):

      “The GSEA algorithm does not filter the expression dataset and generally does not benefit from your filtering of the expression dataset. During the analysis, genes that are poorly expressed or that have low variance across the dataset populate the middle of the ranked gene list and the use of a weighted statistic ensures that they do not contribute to a positive enrichment score. By removing such genes from your dataset, you may actually reduce the power of the statistic and processing time is rarely a factor as GSEA can easily analyze 22,000 genes with even modest processing power. However, an exception exists for RNA-seq datasets where GSEA may benefit from the removal of extremely low count genes (i.e., genes with artifactual levels of expression such that they are likely not actually expressed in any of the samples in the dataset).” [https://www.gsea-msigdb.org/gsea/doc/GSEAUserGuideFrame.html]

      In our study, this filtering of very low expression genes (to account for artifactually inflated fold changes or a large number of ties in the rank list that are subsequently ordered at random) occurred at the level of DE testing using the Seurat FindMarkers command, in which differential expression calculations were only performed for genes that were detected in a minimum of 10% of cells in the dataset.
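
      To make the contrast with ORA concrete, the following is a simplified sketch of a GSEA-style weighted running-sum enrichment score computed over a full ranked list. It is an illustration under stated assumptions, not the GSEA software itself: the function name and toy inputs are hypothetical, and permutation-based significance testing and normalization are omitted.

```python
def enrichment_score(ranked_genes, scores, gene_set, p=1.0):
    """Return the maximum signed deviation of the running sum from zero.

    ranked_genes: genes ordered from most up- to most down-regulated.
    scores: ranking metric for each gene, in the same order.
    gene_set: set of genes being tested.
    p: weight exponent applied to the ranking metric for hits.
    """
    hits = [g in gene_set for g in ranked_genes]
    hit_weight = sum(abs(s) ** p for s, h in zip(scores, hits) if h)
    n_miss = len(ranked_genes) - sum(hits)
    if hit_weight == 0 or n_miss == 0:
        raise ValueError("gene set must contain some, but not all, ranked genes")

    running, best = 0.0, 0.0
    for s, h in zip(scores, hits):
        if h:
            running += abs(s) ** p / hit_weight  # step up at a gene-set member
        else:
            running -= 1.0 / n_miss              # step down otherwise
        if abs(running) > abs(best):
            best = running
    return best


genes = ["a", "b", "c", "d"]
metric = [2.0, 1.0, -1.0, -2.0]
es_top = enrichment_score(genes, metric, {"a", "b"})     # set at the top of the list
es_bottom = enrichment_score(genes, metric, {"c", "d"})  # set at the bottom
es_split = enrichment_score(genes, metric, {"a", "d"})   # set split across both ends
```

      A set split across both extremes of the ranking (as expected for genes that are simply cell-type enriched) yields a smaller absolute score than a set concentrated at one end.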

      (2) In the scDRS results, I am unsure what is significant and what isn't. The authors refer to relative measures in the text ("highest") but I do not know whether these differences are significant nor whether any associations are significantly unexpected. Can the x-axis of scDRS results presented in Figure 9 H and I be replaced with a corrected p-value instead of the scDRS score?

      An important distinction should be made here between scDRS and similar approaches that utilize overrepresentation analyses to assess for associations of DEGs with putative risk genes, similar to the GO analyses performed in our paper. The scDRS score represents the relative association for each individual cell’s expression profile (among all other cells in the dataset) with PD risk loci by utilizing the underlying SNPs and associations described in GWAS summary statistics (see Methods or Zhang et al., Nat Genetics 2022 for more details). While scDRS can be used to generate a p value for each individual cell in the dataset, scDRS does not have a native method for defining group-level p values, nor have we attempted to calculate group-level p values here. In order to compare cluster-level mean scDRS scores and determine their significance, we created bootstrapped 95% confidence intervals for the mean scDRS score of each cluster or family (shown by the error bars in forest plots 9G, 9H). A score of 0 represents the null hypothesis of no association between gene expression and PD risk loci, and thus if the 95% confidence interval does not overlap 0, the mean scDRS score for a given group can be regarded as significant at the 5% level, since the interval excludes the null value. Similarly, groups can be compared to each other in the same way to determine if the group-level mean scDRS score is significantly different across a given pair. However, this overlap of confidence intervals should be interpreted cautiously, as there are a large number of potential comparisons that can be made, creating the potential for Type I error. We have added language to clarify what the scDRS score represents, and to ensure it is not conflated with approaches such as GO or GSEA.
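
      The bootstrapped interval described above can be sketched as follows. This is a minimal percentile-bootstrap illustration, not the authors' code: the function name, the number of resamples, the fixed seed, and the toy scores are all assumptions.

```python
import random


def bootstrap_mean_ci(values, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of values."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(values)
    means = []
    for _ in range(n_boot):
        # Resample cells with replacement and record the resampled mean.
        resample = [values[rng.randrange(n)] for _ in range(n)]
        means.append(sum(resample) / n)
    means.sort()
    lower = means[int((alpha / 2) * n_boot)]
    upper = means[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Hypothetical per-cell scores for one cluster (not real scDRS output).
cluster_scores = [0.8, 1.1, 0.9, 1.3, 1.0, 1.2, 0.7, 1.4]
ci_low, ci_high = bootstrap_mean_ci(cluster_scores)
excludes_null = ci_low > 0 or ci_high < 0  # interval does not overlap 0
```

      When many clusters are compared this way, each interval is a separate test, which is the multiple-comparison caveat raised in the response.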

      (3) The results discussed at the bottom of page 13 [page 14 of new version] state that 48.82% of the proteins encoded by the Calb1 DEGs have pre-synaptic localisations as opposed to 45.83% of the SOX6 DEGs, which does not support the statement that "greater proportions of DEGs are associated with presynaptic locations in cells from vulnerable DA neurons (Sox6 family, [and in particular, Sox6^tafa1]), compared to less vulnerable ones (Calb1 family)".

      Thank you for pointing this out; the error here lies in the wording of the results. The percentages mentioned above describe the percentages within the synaptic localized genes rather than the total DEG lists. We have rephrased this section for clarity to include both the percentages within this category as well as the total (the results of which are in line with our original statement).

      (4) While an interest in the Sox6^tafa1 subtype is explained through their expression of Anxa1 denoting a previously identified subtype associated with locomotory behaviours, it was unclear to me how to interpret the functional associations made to DEGs in this subtype taken out of context of other subtypes. Given all the other subtypes, it is not possible to ascertain how specific and thus how interesting these results are unless other subtypes are analysed in the same way and this Sox6^tafa1 subtype is demonstrated as unusual given results from other subtypes.

      In our study, we chose to specifically focus on this population given its unique acceleration-locked functional activity pattern observed in Azcorra & Gaertner et al, Nat Neuro 2023, as there are technical limitations that warrant cautious application of the above approach. We agree that the associations of this population to the described DEGs cannot be interpreted as unique to this population given the data presented and have added language to this effect within the text. There are two major challenges to analyzing all other subtypes to provide a comparison. Firstly, given the number of subtypes involved and number of downstream analyses, it is computationally intensive to carry out this analysis. More importantly, however, the results cannot be easily compared across different populations due to the variability in both cluster size and internal heterogeneity of each cluster, as the statistical power in calculating DEGs will be inherently different across these populations (i.e. smaller or more heterogeneous clusters would be expected to show a lower number of DEGs reaching significance). While pseudobulk testing is effective for mitigating these factors, our limited sample number (n=2 independently generated datasets per group) dramatically underpowers differential expression testing using pseudobulk analysis. One solution is to uniformly limit each cluster size to the minimally observed cluster size through random down-sampling. While this allows the ‘n’ in DE calculations to be uniform, this potentially worsens the problem of internal heterogeneity, which would remain roughly constant but in the setting of a lower ‘n’, increasing the variability in results for larger clusters.

      To provide a comparator for the population of interest we focused on, we have performed this down-sampling approach in order to compare Sox6^Tafa1 to another cluster within the VTA, Calb1^Stac, that also expresses high levels of Anxa1 and Aldh1a1 given the broad interest in these markers as proxies for vulnerability. The results of this comparison are now shown in Figure S10.
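
      The random down-sampling step can be sketched as below. This is an illustrative assumption about how such a step might look, not the authors' implementation; the function name, seed, and cell identifiers are hypothetical.

```python
import random


def downsample_clusters(cells_by_cluster, seed=0):
    """Randomly subsample every cluster to the size of the smallest cluster.

    cells_by_cluster maps cluster name -> list of cell identifiers.
    """
    rng = random.Random(seed)
    target = min(len(cells) for cells in cells_by_cluster.values())
    return {
        cluster: rng.sample(cells, target)  # sampling without replacement
        for cluster, cells in cells_by_cluster.items()
    }


# Hypothetical cluster memberships (cell barcodes are placeholders).
clusters = {
    "Sox6^Tafa1": [f"cellA{i}" for i in range(50)],
    "Calb1^Stac": [f"cellB{i}" for i in range(20)],
}
balanced = downsample_clusters(clusters)
```

      Equalizing cluster size makes the 'n' in each differential expression test the same, but, as noted in the response, it does not equalize heterogeneity within clusters.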

      (5) On p12, the authors highlight Mir124a-1hg that encodes miR-124. This is upregulated in Figure 8D but the authors note this has been shown to be downregulated in PD patients and some PD mouse models. Can the authors comment on the directional difference?

      We have adjusted the text to reflect this discrepancy and speculate on why this may be observed. In short, one hypothesis is that miR-124, given its proposed neuroprotective effects, is increased in DA neurons facing toxic metabolic insults as a compensatory response. In our prodromal model without observable degeneration, this could represent an early sign of cell stress. While speculative, in PD patients or overtly degenerative models, lack of compensatory miR-124 or fulminant cell death among vulnerable cells could result in an observed decrease in miR-124 expression.

      (6) Lastly, can the authors comment on the selection of a LogFC cut-off of 0.15 for their DEG selection? I couldn't see this explained (apologies if I missed it).

      The 0.15 cutoff was selected arbitrarily based on the observed range of fold changes seen among our differentially expressed genes. However, importantly, this cutoff was not used for defining DEGs for downstream analyses such as GSEA or GO, nor for defining significance of differential expression, which was done purely based on FDR-adjusted p values <0.05. The selection of 0.15 affects only the coloring seen in the volcano plot, which we have decided to move to supplemental figures given the uniformly small effect size seen in individual genes and a separate reviewer comment regarding concern in the field over differential expression testing methods in single-cell datasets. Instead, this figure now focuses on highlighting pathway- and gene-set level comparisons that can provide easier interpretation of small, but concordant changes across swaths of genes.
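
      For readers unfamiliar with FDR adjustment, the sketch below implements the Benjamini-Hochberg procedure and applies the p < 0.05 criterion mentioned above. It is an assumption that Benjamini-Hochberg is the adjustment intended by "FDR-adjusted"; the response does not name the method, and the p values shown are invented for illustration.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


raw_p = [0.01, 0.04, 0.03, 0.5]  # invented example values
adj_p = benjamini_hochberg(raw_p)
significant = [q < 0.05 for q in adj_p]
```

      Here only the first gene remains significant after adjustment, and no fold-change cutoff enters the decision, matching the point that the 0.15 threshold affected plot coloring only.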

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) In the MERFISH dataset, only around half of the DAergic cells (2,297 of 4,532) were successfully projected into the snRNA-seq UMAP space, based on a similarity score > 0.5. Additionally, key transcripts that were used to define the snRNA-seq clusters (such as Sox6) were not identified at all in the MERFISH dataset. This raises some questions about the ability to integrate and compare these datasets directly, which are not fully considered in the manuscript. These discrepancies are smoothed over using imputation, which allows specific class-defining genes such as Sox6 to be plotted on spatial coordinates in Figure 4D. However, imputation is not without caveats, and the appropriateness of the imputation is not well considered in the text.

      We fully agree with the reviewer that the use of an imputation approach needs to be clarified and justified thoroughly. We added a sentence to better clarify the process of imputation on Page 9 “The imputed gene expression is extrapolated from anchors established from pairwise correspondences of cell expression levels between MERFISH and snRNA-Seq datasets.” This pair-wise cell correspondence as defined by anchors can be assessed using Seurat confidence score. We acknowledge the fact that only about 50% of cells could confidently be transferred onto the snRNA-Seq data. This is the result of using a stringent confidence level of 0.5 (similar to previous publications, PMID: 38092916 & 38092912). We preferred mapping fewer high-confidence cells than potentially misrepresenting the spatial location of some of these clusters.

      It is also important to demonstrate the reliability of gene imputation. Indeed, as pointed out by the reviewer, some probes such as Sox6 were not detected in the MERFISH dataset. To strengthen our data integration and as already mentioned in the manuscript, we excluded 219 genes based on the deviation of average counts per cell between the datasets. The fact that the imputed expression of Sox6 perfectly reflects its well-characterized distribution (PMIDs: 25127144, 30104732, 25437550, 34758317) strengthened our confidence in our imputation pipeline. We also looked at the correlation of imputed gene expression with the detected transcripts in our MERFISH experiments. We added a new supplemental figure (S7) highlighting the correlations between MERFISH and imputed gene expression of 8 genes (4 each for the Sox6 and Calb1 families). Together, Fig S6 and S7 show the range of correlations between imputed and actual MERFISH transcripts. Altogether, we observe a relatively high correlation between the number of detected transcripts per gene in snRNA-Seq and MERFISH datasets.
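
      A per-gene comparison of imputed and measured expression can be sketched with a plain Pearson correlation. The response does not state which correlation measure was used, so Pearson is an assumption here, and the function name and values are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Invented per-cell values for one gene: measured counts vs imputed expression.
measured = [0.0, 2.0, 5.0, 1.0, 7.0]
imputed = [0.3, 1.8, 4.1, 1.2, 6.5]
r = pearson_r(measured, imputed)
```

      A gene whose imputed values track the measured counts gives r close to 1; genes absent from the probe panel cannot be checked this way, so their imputation has to be judged against independent evidence.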

      In addition, we added a paragraph discussing limitations of gene expression imputation on page 17: “A strength of our study is that it utilizes advantages of each transcriptomic approach, the deep molecular profiling of individual cells using snRNA-Seq and the spatial resolution of MERFISH. For instance, we relied on gene expression imputation to ascribe expression level to genes not covered/detected in our MERFISH probe panel. Gene imputation as described by Stuart et al.(92) has been used in several recent studies integrating spatial and transcriptomic data(46, 47). It relies on identifying anchors that enable projection of MERFISH data onto the UMAP space of a snRNA-Seq dataset and then uses neighboring cells to extrapolate the expression of genes not included in our probe panel. This approach was used to impute Sox6 expression, which accurately reflects what has been reported in prior immunofluorescence and in situ hybridization studies(11, 27, 38, 43, 55). Moreover, imputed gene expression levels correlated strongly with MERFISH detected transcript for most genes further supporting our approach (Fig S6 and S7). Nevertheless, dataset integration has limitations that should be considered. First, imputed gene expression relies on the ability to identify reliable anchors linking the snRNA-Seq and MERFISH datasets. These anchors are determined in part by the choice of genes included on probe panels and thus could indirectly influence the reliability of imputed gene expression. Secondly, gene counts per cell in MERFISH are determined via segmentation of images, which is susceptible to artifacts and bias from centrally versus peripherally localized gene transcripts. In summary, although limitations are present in multi-modal transcriptomic analyses, merging these two approaches provided a molecular and spatial map of the DA system that could not have been resolved by either method alone.”

      (2) In the discussion, the authors argue that the cellular classifications identified here for DA neurons are more likely to reflect discrete cell types than cell states. The rationale for this conclusion is largely based on the absence of subtype differences between wild-type and LRRK2 G2019S transgenic mice. I do not find this argument to be convincing, because it is still possible that certain subdivisions simply reflect dynamic cell states that are also not grossly altered in the mutant mouse. A stronger argument for this claim would be to include trajectory-based analyses that do not show predicted transition points between nearby or related clusters.

      We thank the reviewer for pointing out this particular limitation, as differentiating “cell types” from “cell states” has been debated in the field for years, with no consensus emerging on how to address the issue. As suggested, we performed a trajectory analysis using Monocle3 on both control and Lrrk2 samples. We built the trajectory map, taking cluster 20 as the starting node. To avoid potential biased trajectories induced by different cell coverage, we down-sampled the Lrrk2 condition to match the number of cells in the wildtype condition. As expected, since most of the DA clusters are not segregated in the UMAP space, the trajectory analysis showed predicted transitions between clusters (see Author response image 1A and 1B). Even though some clusters’ pseudotime scores were statistically different between the wildtype and Lrrk2 samples, they overall remained similar (Author response image 1C). This analysis suggests that the LRRK2G2019S mutation induces a mild transcriptional perturbation but does not result in a major cell state drift. Indeed, we believe changes in the observed trajectory path would disappear as the number of cells analyzed increases. Because of this bias introduced by cell coverage, we prefer not to include this trajectory analysis in the manuscript to avoid misleading readers. Thus, as suggested by the reviewer, we softened our claim to “This suggests that our taxonomic scheme is agnostic to a mild perturbation such as LRRK2G2019S, suggesting that our clusters are reflective of cell types, rather than cell states. It is possible that with more severe perturbations, such as a toxin lesion, more substantial alterations of taxonomic schemes are observed(86, 93). However, we expect that for mild insults, day to day behavioral changes, or pharmacological paradigms, our clusters will be resistant to changes, although individual gene levels may vary. Nonetheless, we cannot definitively confirm that a given DA neuron cannot convert from one subtype to another. Ultimately, alternative approaches such as detailed fate mapping of clusters or RNAseq-based trajectory analyses with greater numbers of sampled cells could be used to resolve this question.”

      Author response image 1.

      A) Trajectory analysis of wildtype and B) LRRK2<sup>G2019S</sup> samples. C) Pseudotime scores for each cluster across wildtype and Lrrk2 conditions. Error bars represent the confidence of error for a false-positive discovery rate of 5%.

      (3) The relationship between individual samples, GEMwell, and sequenced library should be clarified. If independent samples were combined into one GEMwell, this should be explicitly stated for clarity.

      We have revised the text to better clarify the methodology. In brief, each of our 4 independent samples (2 control, 2 mutant; equal sexes per sample) was isolated from n=2 pooled mice (for a total of n=8 mice across the 4 samples). Each sample was processed in its own GEM well to produce 4 distinct libraries that were subsequently sequenced and analyzed as described.

      (4) Please include more details on DEG testing in the manuscript, this is key for interpreting the robustness of certain findings. Ideally, pseudobulked comparisons would be used here (given concerns in the field that DEG testing where N = number of cells artificially inflates the statistical power, violates assumptions of independence, and results in false positive DEGs).

      While we agree that pseudobulk analysis would be ideal for reducing false positives, our study, although exceptionally large in the total number of DA cells profiled, was generated from 4 total 10X libraries as described above, without any mechanism to definitively demultiplex to the original n=8 source mice. Thus, pseudobulk comparisons would be performed using only n=2 per group, which is below the recommended sample size for these methods. Given this concern, we have moved the volcano plot from Figure 8D to the supplementary material and added language to the methods and relevant figure legend acknowledging the limitation in Seurat’s default differential expression analysis methodology.
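
      To make the sample-size concern concrete, the following is a minimal sketch of pseudobulk aggregation, assuming per-cell counts are summed within each library. It is an illustration only, not the pipeline used in the study; the function name, gene labels, and count values are placeholders invented for the example.

```python
from collections import defaultdict

def pseudobulk(cell_counts, cell_to_sample):
    """Sum per-cell gene counts into one count profile per sample (library)."""
    totals = defaultdict(lambda: defaultdict(int))
    for cell, counts in cell_counts.items():
        sample = cell_to_sample[cell]
        for gene, n in counts.items():
            totals[sample][gene] += n
    return {sample: dict(genes) for sample, genes in totals.items()}

# Toy data with invented counts: four libraries (2 control, 2 mutant)
cell_to_sample = {
    "cell1": "control_1", "cell2": "control_1",
    "cell3": "control_2",
    "cell4": "mutant_1",
    "cell5": "mutant_2", "cell6": "mutant_2",
}
cell_counts = {
    "cell1": {"Th": 5, "Slc6a3": 2},
    "cell2": {"Th": 3, "Slc6a3": 1},
    "cell3": {"Th": 4, "Slc6a3": 0},
    "cell4": {"Th": 6, "Slc6a3": 3},
    "cell5": {"Th": 2, "Slc6a3": 2},
    "cell6": {"Th": 1, "Slc6a3": 4},
}

bulk = pseudobulk(cell_counts, cell_to_sample)

# After aggregation the unit of replication is the library, not the cell
group_sizes = {"control": 0, "mutant": 0}
for sample in bulk:
    group_sizes[sample.split("_")[0]] += 1
print(bulk["control_1"])
print(group_sizes)
```

      However many cells are profiled, the aggregated table has one column per library, so a two-group comparison here rests on two replicates per group.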

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review): 

      Summary: 

      The innate immune system serves as the first line of defense against invading pathogens. Four major immune-specific modules - the Toll pathway, the Imd pathway, melanization, and phagocytosis- play critical roles in orchestrating the immune response. Traditionally, most studies have focused on the function of individual modules in isolation. However, in recent years, it has become increasingly evident that effective immune defense requires intricate interactions among these pathways. 

      Despite this growing recognition, the precise roles, timing, and interconnections of these immune modules remain poorly understood. Moreover, addressing these questions represents a major scientific undertaking. 

      Strengths: 

      In this manuscript, Ryckebusch et al. systematically evaluate both the individual and combined contributions of these four immune modules to host defense against a range of pathogens. Their findings significantly enhance our understanding of the layered architecture of innate immunity. 

      We thank the reviewer for their kind assessment.

      Weaknesses: 

      While I have no critical concerns regarding the study, I do have several suggestions to offer that may help further strengthen the manuscript. These include: 

      (1) Have the authors validated the efficiency of the mutants used in this study? It would be helpful to include supporting data or references confirming that the mutations effectively disrupted the intended immune pathways. 

      We have done so in Figure 1.

      (2) Given the extensive use of double, triple, and quadruple mutants, a more detailed description of the mutant construction process is warranted. 

      We now provide a supplement (File S1) that details the successive genetic crosses and recombinations that were required to generate these compound fly stocks carrying multiple mutations. We also provide some information regarding rapid screening of stocks for phenotypes. Of note, some of these fly stocks have been deposited at VDRC, as they will be useful to the fly community to assess immune modules in a controlled background, and complete stock information will be tied to these stocks there.

      Reviewer #2 (Public review): 

      Summary: 

      In this work, the authors take a holistic view of Drosophila immunity by selecting four major components of fly immunity often studied separately (Toll signaling, Imd signaling, phagocytosis, and melanization), and studying their combinatory effects on the efficiency of the immune response. They achieve this by using fly lines mutant for one of these components, or modules, as well as for a combination of them, and testing the survival of these flies upon infection with a plethora of pathogens (bacterial, viral, and fungal). 

      Strengths: 

      It is clear that this manuscript has required a large amount of hands-on work, considering the number of pathogens, mutations, and timepoints tested. In my opinion, this work is a very welcome addition to the literature on fly immune responses, which obviously do not occur in one type of response at a time, but in parallel, subsequently, and/or are interconnected. I find that the major strength of this work is the overall concept, which is made possible by the mutations designed to target the specific immune function of each module (at least seemingly) without major effects on other functions. I believe that the combinatory mutants will be of use for the fly community and enable further studies of the interplay of these components of immune response in various settings. 

      To control for the effects arising from the genetic variation other than the intended mutations, the mutants have been backcrossed into a widely used, isogenized Drosophila strain called w1118. Therefore, the differences accounted for by the genotype are controlled. 

      I also appreciate that the authors have investigated the two possible ways of dealing with an infection: tolerance and resistance, and how the modules play into those. 

      We thank the reviewer for their kind assessment. 

      Weaknesses: 

      While controlling for the background effects is vital, the w1118 background is problematic (an issue not limited to this manuscript) because of the wide effects of the white mutation on several phenotypes (also other than eye color/eyesight). It is a possibility that the mutation influences the functionality of the immune response components, for example, via effects of the faulty tryptophan handling on the metabolism of the animal. 

      I acknowledge that it is not reasonable to ask for data in different backgrounds better representing a "wild type" fly (however, that is defined is another question), but I think this matter should be brought up and discussed. 

      We agree with the reviewer and have included caveats on the different genetic effects brought about by the combinatory mutant approach, including differences in white gene status, insertion of GFP or DsRed markers, and nature of genetic mutations (Line 142-on).

      “Of note, the strains used in this study differ in their presence/absence of the white<sup>+</sup> gene, present in the PPO1<sup>∆</sup>, NimC1<sup>1</sup> and eater<sup>1</sup> mutations. In addition to its well-established function in eye pigmentation, the white gene can also impact host neurology and intestinal stem cell proliferation (Ferreiro et al., 2017; Sasaki et al., 2021). We did not observe any obvious correlations between white<sup>+</sup> gene status and susceptibilities in this study. Moreover, in a previous study looking at the cumulative effects of AMP mutations on lifespan, white gene status and fluorescent markers did not readily explain differences in longevity (Hanson and Lemaitre, 2023). We therefore believe that the extreme immune susceptibility we have created through deficiencies for pathways regulating hundreds of genes, or major immune modules, overwhelms the potential effects of white<sup>+</sup> and other transgenic markers. For additional information on which stocks bear which markers, see discussion in Supplementary file 1.”

      Of interest, we were highly conscious of this concern in working with combinatory AMP mutants which differed in white, GFP, and DsRed copies. However, even over the many weeks of snowballing effects on microbiota community composition and structure, we found no trends tied strictly to white+ or to other genetic insertions on lifespan (Hanson and Lemaitre, 2023; DMM).

      The whole study has been conducted on male flies. Immune responses show quite extensive sex-specific variation across a variety of species studied, also in the fly. But the reasons for this variation are not fully understood. Therefore, I suggest that the authors conduct a subset of experiments on female flies to see if the findings apply to both sexes, especially the infection-specificity of the module combinations.  

      We thank the reviewer for this suggestion. We have performed the requested experiments, and include female survival trends in Figure 4supp1. We have added the following text to the main manuscript (Line 554):

      “All survival experiments to this point were done with males. We therefore assessed key survival trends for these infections in females to learn whether the dynamics we observed were consistent across sexes (Figure 4supp1). For all three pathogens (Pr. rettgeri, Sa. aureus, C. albicans) the rank order of susceptibility was broadly similar between males and females, with higher rates of mortality in females overall. Thus, we found no marked sex-by-genotype interaction. Interestingly, the greater susceptibility of females in our hands is true even for ∆ITPM flies, although there are only a few surviving flies on which we can base these conclusions. However, these data may suggest the sexual dimorphism in defense against infection that we see against these pathogens is due to factors independent of the immune modules we disrupted.”

      It is worth noting that male-female sex dichotomies in infection are inconsistent across the literature, with strong lab-specific effects (Belmonte et al., 2020 and personal observation). In our lab setting, we consistently see female mortality higher than males when compared, independent of pathogen and mutant background. We have not seen notable interaction terms of sex and genotype for most immune deficient mutants. It is quite interesting to have done these experiments with ITPM, however, which reveals that there is at least a trend suggesting this dichotomy is independent of the four immune modules we deleted. Still, our infection conditions kill most males, and so it would be good to replicate this sex-specific ∆ITPM result in a dedicated study with doses chosen to improve the resolution of male-female differences. For now, we prefer to use conservative language and avoid overinterpreting this trend, but do feel it merits mentioning.  

      Recommendations for the authors:

      Comment on statistical requests

      Both reviewers requested further clarity on the statistical analyses supplemental to Figure 3. We have addressed these comments as follows.

      First, we now provide an additional supplementary .zip file containing summary statistics for all survival data in Figure 3 (Supplementary File 3). We have additionally added this text to line 226 to make this data treatment more clear:

      “…we chose to focus on major differences apparent in summary statistics, highlighting…”

      And we highlight that all survival data are also provided as Kaplan-Meier survival curves in the main or supplementary figures in Line 233:

      “Kaplan-Meier survival curves for all experiments are provided in the main text or supplementary information”.

      Second, as outlined in the main text, we were unable to sample across all pathogen-by-genotype interactions systematically, and this unfortunately hampers robust statistical modelling. We addressed the challenge of finding meaningful statistical differences by focusing on trends only if they were i) consistent across experimental replicates, ii) of a consistent logic across comparable genotypes, ensuring random inter-experimental noise was not unduly shaping interpretations, and iii) of a mean lifespan difference ≥1.0 days compared to wild-type, and compared to relevant unchallenged or clean-injury controls. This last choice was especially important because not all experimental replicates included all genotypes due to challenges of animal husbandry and coordination among multiple researchers over five years of data collection. As a result, our initial analyses using a Cox mixed-effects model proved uninformative: the model was insensitive to important experiment batch effects visible to the eye because statistically-affected genotypes were not present in all experiments.

      We therefore ensured that behaviour relative to controls within experiments was consistent, rather than comparing genotypes to controls across the sum of experiments with a post-hoc treatment attempting to apportion variance to experiment batch (but unable to do so for some genotypes and some batches). Due to differences in baseline health and the dynamics explained by studies like Duneau et al. (2017; eLife), there is an expected unequal variance of genotype*pathogen interactions across experiment batches. Unfortunately, this unequal variance, coupled with incomplete sampling across experiment batches, means “highly significant” differences can emerge that don’t hold up to scrutiny of comparisons to controls taken only from within an experiment batch. Thus, we chose to forego a Cox mixed-effects model approach entirely. Instead, our highly conservative approach, focusing on only very large effects with a mean lifespan difference ≥1.0 days, mitigates these issues. We have taken great care to ensure that any results we highlight stand up to inter-experiment batch effects. We would further draw the reviewers’ attention to our response to Reviewer 2 relating to Figure 3, which emphasizes the level of conservatism that we are applying.
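
      The within-experiment logic described above can be sketched in a few lines. This is one possible reading, not the authors' actual procedure: it assumes flies alive at the end of the observation window are counted as living the full window, that the difference must point in the same direction in every experiment batch, and that the average within-batch difference must reach the 1.0-day threshold. All function names and numbers are hypothetical.

```python
def mean_lifespan(death_days, n_flies, window=7):
    """Mean lifespan over the window; survivors are counted as `window` days."""
    survivors = n_flies - len(death_days)
    return (sum(death_days) + survivors * window) / n_flies

def is_highlighted(per_experiment, threshold=1.0):
    """per_experiment: (control mean, mutant mean) pairs, one pair per batch.

    Each mutant is compared only to the control from the same batch.
    """
    diffs = [control - mutant for control, mutant in per_experiment]
    same_direction = all(d > 0 for d in diffs) or all(d < 0 for d in diffs)
    average_diff = sum(diffs) / len(diffs)
    return same_direction and abs(average_diff) >= threshold

# Invented example: 20 flies, 5 deaths within a 7-day window
example_mean = mean_lifespan([3, 4, 5, 6, 6], n_flies=20)

strong = is_highlighted([(6.8, 4.9), (6.6, 5.2)])        # large, consistent effect
weak = is_highlighted([(6.74, 6.37)])                    # below the threshold
inconsistent = is_highlighted([(6.8, 5.0), (6.5, 6.9)])  # direction flips between batches
print(example_mean, strong, weak, inconsistent)
```

      Because every comparison stays inside a batch, a genotype that is missing from some batches does not distort the comparison for the batches where it is present.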

      At the end of the Discussion, we have added the following sentence to emphasize these limitations:

      “…a combinatorial mutation approach to deciphering immune function can be extended even to the broad level of whole immune modules. Of note, we were unable to systematically sample all genotype-by-pathogen interactions equally. We have therefore been highly conservative in our reporting of major effects. There are likely many important interactions not discussed in our study. Future investigations may highlight important biology that is apparent in our data, but which we may not have mentioned here. To this end, we have deposited our isogenic immunity fly stocks in the Vienna Drosophila Resource Centre to facilitate their use. Beyond immunity, our tools can also be of use to study various questions at the cutting edge of aging, memory, neurodegeneration, cancer, and more, where immune genes are repeatedly implicated. We hope that this set of lines will be useful to the community to better characterize the Drosophila host defense.”

      We recognise this response may not fully satisfy the reviewers’ requests. While use of summary statistics is simple, our rules for highlighting interactions of importance are defined, readily understood and interpreted, and draw attention to key trends that are backed by a solid understanding of the data and its limitations. We have taken this approach out of a responsibility to avoid making spurious assertions that stem from underpowered statistical models rather than from the biology itself.

      Reviewer #1 (Recommendations for the authors): 

      (1) Lines 1092-1093 - Please double-check the labeling of the panels in Figure 2. It appears that panels A and C correspond to single-module mutants, whereas panels B and D refer to compound-module mutants. 

      We have modified Figure 2 and Figure 2supp1 labelling. We also realise there was an error in the column titling that contributed to the confusion. We hope the new layout is clear, and thank the reviewers for noting this issue.

      (2) Lines 347-377 - Figure 2D is not cited in the text. 

      We now cite Fig2D in Line 356.

      (3) P values should be indicated in Figure 2 and Figure 3 for all relevant comparisons. Additionally, "ns" (not significant) should be added in Figure 5A-B. 

      We make the effort to show key uninfected survival trends in Figure 2, and list the total flies (n_flies) in Fig3 to provide the reader with the underlying confidence in the trends observed. We focus on differences of mean lifespan of at least 1 day, and which are consistent in direction across combinatory mutations. We have avoided the multiple comparisons of Cox proportional hazard survival analyses throughout this study because they are overly sensitive for our purposes, as we have done previously when systematically comparing many genotypes to each other (see Hanson and Lemaitre, 2023; DMM).

      (4) Minor points: Hml-Gal4, UAS-GFP should be italic; Line 192-- "uL" and "uM"; Line 596: P>.05.

      We have made these changes. We’re unsure what the comment regarding P>.05 referred to, but have removed spaces and made it non-italics. 

      Reviewer #2 (Recommendations for the authors): 

      Statistical analyses and their outcomes are clearly indicated only for the data in Figure 1 and Figure 5 and in the supplement for Figure 1, while they are not reported/not easily accessible for other data. For the main figures, statistics should be indicated in the figure for an easier assessment of the data. In case of multiple comparisons potentially crowding the plots too much, statistics may be in a supplementary file/table. 

      See response above.

      In case of the hemocytes, besides phagocytosis, I would think that ROS generation via the DUOX/NOX system is also an integral part of the immune response against pathogens, and that has not been included here. That might be an interesting addition for future experiments. As the NimC1, eater double mutant flies are said to have fewer hemocytes, it is possible that this function of the hemocytes is affected as well. This could be commented on in the text. 

      The reviewer raises a good point. The role of DUOX and NOX in ROS responses is not assessed in our study. To our knowledge, DUOX and NOX participate primarily in the wound repair response, or in epithelial renewal at damage sites or in the gut. In our study on systemic immunity, we did not assess the role of clotting or the precise function of ROS, and we have missed other host defense or stress response mechanisms as well (e.g. constitutively-expressed AMP-like genes, TEPs, JAK-STAT) that likely play a role in the systemic immune defense. Considering the lethality caused by Nox and Duox mutation, there would be inherent genetic difficulties in recombining these as multiple mutations. Unfortunately, this makes it difficult to include these processes in our analysis in a systematic manner. We are already happy to have generated fly lines lacking four immune modules simultaneously, even if they are not fully immune deficient. We have mentioned this point in the discussion (Line 613-on).

      Of note, the NimC1, eater double mutants actually have decreased hemocyte counts at the adult stage (Melcarne et al., 2019). Thus, NimC1, eater double mutants are impaired not only in phagocytosis but in the overall cellular response. We make a point to outline this in Line 225-257, and 607.

      I think it could be mentioned that the melanization response at larval stage (against parasitoids) functions differently from the melanization described here (requiring hemocyte differentiation and PPO3).

      A good point. We have added this mention in Line 97:

      “In addition, a third PPO gene (PPO3) is specifically expressed by lamellocytes, specialized hemocytes that differentiate in larvae responding to and enveloping invading parasites (Dudzic et al., 2015)”.

      Overall, the clarity of the figures and figure legends could be worked on to make them a bit easier to follow. Below are some of my suggestions: 

      (1) In Figure 2, adding headings to parts C & D (similarly to A & B) would make it easier to follow what is happening in the figure at a glance. Also, it is rather difficult to visually follow which strain is which in the plots. I'd suggest adding the key/legend for single mutants below 2A & B, and the key for the double mutants below C & D. If a mutant is present in A & B and in C & D, it could be included in both keys. I also think that it would be intuitive to present the single mutants by dashed lines and double mutants by continuous lines (or vice versa), so that one would easily distinguish between them. Of note, the figure legend says that A & B are single mutants, but for example in B there are also some double mutants (?). 

      We have modified Figure 2 and Figure 2supp1 labelling. We also realise there was an error in the column titling that contributed to the confusion. We hope the new layout is clear, and thank the reviewers for noting this issue.

      (2) In Figure 3, it looks like ΔMel is almost identical to controls in the clean injury survival, but in Figure 2C, it is clearly doing worse. I might be missing something here, but would like the authors to clarify the matter. Also, the meaning of the numbers in the heat map could be explained in the figure legend and/or added to the figure (color key). 

      The reviewer is correct. We thank the reviewer for this astute observation. Inadvertently, we used an old version of the Figure 2 preparation where only a subset of experiments was entered in the Prism data file rather than the total data used to inform Figure 3. This issue affected all genotypes.

      We have reviewed the data in Figure 2, Figure 2supp1, and Figure 3, and updated these figures accordingly to ensure they represent the full survival data. We have also incorporated new experiments into the sum data related to male-female differences and to fill gaps in the data from the 1<sup>st</sup> submission. We also note that, due to rounding to the first decimal place, the difference between WT and ΔMel appears slightly underrepresented: the true difference (over the 7-day lifespan) is 0.37. We’ve provided a version of this figure rounded to 2 decimal places below, but prefer the simpler 1 decimal place in the main text for readability. The updated Figure 2 shows the full data in Figure 3 accurately.

      We will also take this opportunity to highlight how conservative our ≥1.0 days difference approach is. Breaking down survival curve patterns in Figure 2 relative to mean differences in Figure 3, for clean injury, approximately 75% of ΔMel flies survive to day 7, with mortality mostly taking place between days 3-7. The result is a mean lifespan of 6.37 days. On a survival curve, this difference appears quite strong, but in our mean lifespan table the difference is rather muted (WT vs. ΔMel difference = 0.37 days). Thus, differences of ≥1.0 days reflect very strong trends in survival data that are near-guaranteed to be independent of experimental noise. While we note issues that prevented us from a fully systematic sampling for all experiments, we are confident that the ≥1.0 day differences we highlight, using the rules explained in the main text, are robust. While this approach could be seen as overly conservative, it is our preference in this initial study, containing combinations of 25 treatments and 14 genotypes, to be highly conservative. Future studies may investigate other strong differences we have not highlighted, and the data we provide here can help generate expectations and guide those studies.

      Author response image 1.

      Figure 3 with 2 decimal places of rounding for mean lifespans. The 7-day clean injury mean lifespan of WT is 6.74 days, and of ΔMel is 6.37 days. Due to rounding, in the 1 decimal Figure 3 this difference appears as if it is only 0.3 days, but it is closer to 0.4 days. Regardless, this level of difference, which appears rather clearly in a survival curve, is well below the level of difference we have chosen to highlight in our study.
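
      The rounding effect described in this legend can be checked with a short worked example. The two mean lifespans (6.74 and 6.37 days) come from the text above; the helper name and the use of half-up rounding are assumptions made for the illustration.

```python
from decimal import Decimal, ROUND_HALF_UP

def round_days(value, places):
    """Round a decimal string to `places` decimal places, rounding halves up."""
    quantum = Decimal(1).scaleb(-places)  # 0.1 when places=1
    return Decimal(value).quantize(quantum, rounding=ROUND_HALF_UP)

wt_mean, mel_mean = "6.74", "6.37"  # values quoted in the legend

true_diff = Decimal(wt_mean) - Decimal(mel_mean)
shown_diff = round_days(wt_mean, 1) - round_days(mel_mean, 1)  # 6.7 - 6.4

print(true_diff, shown_diff)
```

      Rounding each mean before subtracting moves the two values toward each other (6.74 down to 6.7, 6.37 up to 6.4), which is why the 1-decimal table shows a smaller gap than the underlying 0.37 days.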

      (1) Figure 4: I find it very tedious to compare CFUs among different mutants from the plots. As the idea is to compare bacterial loads among the mutants at different timepoints, it would be easier to compare them if the data were shown within a timepoint (CFUs of each mutant at 2h, at 6h, and so on). This is also how the results are written in the text (within a time point). Would it also be clearer if the CFU plots were named, for example: " A', B', and C'"? 

      We appreciate this note. We feel both representations have merits and pitfalls, but prefer our original design showing the progression of bacterial growth within genotype first. However, we have added dotted lines representing the wild-type bacterial loads at 2hpi, 12hpi, and 24hpi to assist the reader in making across-genotype comparisons at key time points. In this way, the reader can see whether the error bars (StDev) overlap the mean of the wild-type, and so make more intuitive judgements about whether these differences are meaningful.

      (2) Figure 2D is not referred to in the text. 

      We now cite Fig2D in Line 356.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      The modeling approaches are very sophisticated, and clearly demonstrate the selective nature of acute ketamine to reduce the impact of trial losses on subsequent performance, relative to neutral or gain outcomes. The authors then, not unreasonably, suggest that this effect is important in the context of the negative bias in interpreting events that is prominent in depression, in that if ketamine reduces the ability of negative outcomes to alter behavior, this may be a mechanism for its rapid acting antidepressant effects.

      However, there is a very strong assumption in this regard, as shown by the first sentence of the discussion which implies this is a systematic study of ketamine's acute antidepressant effects. In actuality, this is a study of the acute effects of ketamine on reinforcement learning (RL) modeled parameters. A primary concern here is that an effect presented as a "robust antidepressant-like behavioral effect" should be more enduring than just an alteration during the acute administration. As it is, the link to an "anti-depressant effect" is based solely on the selective effects on losses. This is not to say this is not an interesting observation, worthy of exploration. It is noted that a similar lack of enduring effects on outcome evaluation is observed in humans, as shown in supplemental fig. S4, but there is not accompanying citation for the human work.

      We agree with the reviewer that the way we linked the study results to ketamine’s antidepressant action can be misleading and based on a rather strong assumption which was not systematically tested in the study. We made the following changes to the manuscript:

      (1) These results constitute a rare report of a robust antidepressant-like behavioral effect produced by therapeutic doses of ketamine during acute phase (<1 hour) after injection (Introduction, 3rd paragraph, line 8-9 in the original manuscript).

      Changed to: These results constitute a rare report of an acute effect of therapeutic dose of ketamine on the processing of affectively negative events during dynamic decision-making.

      (2) We clarified in the Discussion that our study is to gain insights into, but not a systematic investigation of ketamine’s antidepressant action as follows:

      (2.1) A sentence was added (1st paragraph of Discussion): Using a token-based decision task and extensive computational modeling, we examined the behavioral modulation induced by therapeutic doses of ketamine to gain insights into possible early signs of ketamine’s antidepressant activity.

      (2.2) Consistent with the findings from humans, ketamine’s effect on outcome evaluation was acute and did not last over subsequent days (Supplemental Figure S4) (Discussion, 2nd paragraph, line 6-7 in the original manuscript).

      Changed to: While ketamine’s antidepressant effect is reported to be sustained over a week of period (5), ketamine’s effect on outcome evaluation was acute and did not last over subsequent days (Supplemental Figure S4). This discrepancy might be attributable to the possible differences in the state of brain network between healthy subjects and those with depression as well as the type of measures taken to assess ketamine’s effect.

      (2.3) A sentence was added (Discussion, last sentence of the 2nd paragraph): Nevertheless, systematic studies are required to understand whether the reduced aversiveness to loss in our task might share the same mechanisms that underlie ketamine’s antidepressant action.

      One question that comes to mind in terms of the selectivity observed is whether similar work has been done to examine the acute effects of any other drugs. If ketamine is unique in this regard, that would be quite interesting.

      We think this is an interesting idea. However, comparing ketamine’s effect to that of other drugs is beyond the scope of the current study. We hope that we will be able to answer this question with future studies.

      Reviewer #2 (Public Review):

      Oemisch and Seo set out to examine the effects of low-dose ketamine on reinforcement learning, with the idea that alterations in reinforcement learning and/or motivation might inform our understanding of what alterations co-occur with potential antidepressant effects. Macaques performed a reinforced/punished matching pennies task while under effects of saline or ketamine administration and the data were fit to a series of reinforcement learning models to determine which model described behavior under saline most closely and then what parameters of this best-fitting model were altered by ketamine. They found a mixed effect, with two out of three macaques primarily exhibiting an effect of ketamine on processing of losses and one out of three macaques exhibiting an effect of ketamine on processing of losses and perseveration. They found that these effects of ketamine appeared to be dissociable from the nystagmus effects of the ketamine.

      The findings are novel and the data suggesting that ketamine is primarily having its effects on processing of losses (under the procedures used) are solid. However, it is unclear whether the connection between processing of losses and the antidepressant effects of ketamine is justified and the current findings may be more useful for those studying reinforcement learning than those studying depression and antidepressant effects. In addition, the co-occurrence of different behavioral procedures with different patterns of ketamine effects, with one macaque tested with different parameters than the other two exhibiting effects of ketamine that were best fit with a different model than the other two macaques, suggests that there may be difficulty in generalizing these findings to reinforcement learning more generally.

      (1) First, the authors should be more explicit and careful in the connection they are trying to make about the link between loss processing and depression. The authors call their effect a "robust antidepressant-like behavioral effect" but there are no references to support this or discussion of how the altered loss processing would relate directly to the antidepressant effects.

      We agree with the reviewer’s point about the way we connected the study results to ketamine’s antidepressant action. This concern overlaps with Reviewer #1’s concern. Please refer to our responses (2), (2.1), (2.2) and (2.3) above.

      (2) It appears that the monkey P was given smaller rewards and punishers than the other two monkeys and this monkey had an effect of ketamine on perseveration that was not observed in the other two monkeys. Is this believed to be due to the different task, or was this animal given a different task because of some behavioral differences that preceded the experiment? The authors should also discuss what these differences may mean for the generality of their findings. For example, might there be some set of parameters where ketamine would only alter perseveration and not processing of losses?

      Although the best-fitting ketamine model for monkey P includes an additional element – perseveration, we believe that monkey P’s baseline behavior and ketamine’s effect are not significantly different from the other two monkeys for the following reasons.

      First, monkey P was the first animal in which we tested ketamine’s effect. We therefore aimed to match the baseline behavior of the other two monkeys to that of monkey P, in order to reduce variability in ketamine’s effect potentially attributable to differences in baseline behavior before pharmacological manipulation. We had to adjust the payoff matrix for the subsequent animals (Y and B) because these monkeys were more sensitive to loss and seldom chose the “risky” target (yielding loss). To make their behavior similar to that of monkey P, we adjusted the asymmetry between the risky and the safe targets so that a loss (neutral) outcome could also occur from the safe (risky) target. Eventually, this adjustment made the baseline behavior similar across all three monkeys. The goal of the study was to reliably measure ketamine’s effect, not to study individual differences that can naturally occur with the same task parameters. Therefore, we believe that the adjustment of the payoff matrix helped us to reliably detect ketamine’s effect starting from a common baseline behavior.

      Second, the best-fitting model for monkey P (K-model 7) and that for the other two monkeys (K-model 4) make very similar predictions, both qualitatively and quantitatively, as seen in the revised Figure 4. The parameters for outcome values estimated from these two models in monkey P are very similar, as seen in the revised Table 3. In addition, the difference in BIC between the model that includes only perseveration modulation (K-model 6) and the model that also incorporates outcome value modulation (K-model 7) is 441, whereas the difference in BIC between K-model 7 and the model that includes only outcome value modulation (K-model 4) is as small as 4. These BIC results indicate that the variability explained by ketamine’s modulation of outcome evaluation is much larger than that explained by its modulation of perseveration in monkey P.
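
      The BIC argument above can be sketched numerically. The following is a minimal illustration only, assuming the standard definition BIC = k·ln(n) − 2·ln(L); the function name `bic` and the variable names are hypothetical, and the only numbers taken from the response are the two reported BIC differences for monkey P (441 and 3.99).

```python
import math


def bic(log_likelihood, n_params, n_obs):
    """Standard Bayesian information criterion (lower values are preferred)."""
    return n_params * math.log(n_obs) - 2.0 * log_likelihood


# BIC differences reported in the response for monkey P, both relative to K-model 7.
# K-model 6 lacks outcome value modulation; K-model 4 lacks perseveration modulation.
cost_of_dropping_outcome_value = 441.0   # BIC(K-model 6) - BIC(K-model 7)
cost_of_dropping_perseveration = 3.99    # BIC(K-model 4) - BIC(K-model 7)

# Ratio of the two penalties: how much more the fit degrades when the
# outcome value term is removed than when the perseveration term is removed.
ratio = cost_of_dropping_outcome_value / cost_of_dropping_perseveration
print(f"Dropping outcome value modulation costs {ratio:.0f}x more BIC "
      f"than dropping perseveration modulation")
```

      Under these assumptions, removing the outcome value term worsens BIC by roughly two orders of magnitude more than removing the perseveration term, which is the sense in which the two best models for monkey P are nearly equivalent.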

      Therefore, we conclude that ketamine’s effect was not significantly different between monkey P and the other two monkeys. We clarified this in the revised manuscript by adding the following paragraph in the Result section:

      “Unlike monkey Y and B, the best-fitting model for monkey P indicated that ketamine increased overall tendency to switch choice in addition to outcome-dependent modulation of outcome evaluation. However, BIC differed only slightly (dBIC = 3.99) between the best-fitting (K-model 7) and the second-best model (K-model 4) and the model predictions for choice behavior were very similar both qualitatively and quantitatively (Table 3, Figure 4). We conclude that the behavioral effects of ketamine were consistent across all three monkeys.”

      (3) The authors should discuss whether the plasma ketamine levels they observed are similar to those seen with rapid antidepressant ketamine or are higher or lower.

      We added a sentence in the first paragraph of the Result section as follows with a reference.

      “Plasma concentration and its time course over 60 minutes were also comparable to those measured after 0.5mg/kg in human subjects (35).”

      (35) Zarate CA, Brutsche N, Laje G, Luckenbaugh DA, Venkata SLV, Ramamoorthy A, et al (2012): Relationship of ketamine’s plasma metabolites with response, diagnosis, and side effects in major depression. Biol Psychiatry, 72: 331-338.

      (4) For Figure 4 or S3, the authors should show the data fitted to model 7, which was the best for one of the animals.

      We added the parameters and model predictions from both K-model 7 and K-model 4 for monkey P to Table 3 and Figure 4 to facilitate comparison between the two models. The revised Table 3 and Figure 4 are as follows:

      Author response table 1.

      Maximum likelihood parameter estimates of the best models for saline and ketamine sessions.

      In all three animals, the model incorporating valence-dependent change in outcome evaluation best fit the choice data from ketamine sessions with (K-model 7 in the parenthesis, P) or without (K-model 4, P and Y/B) additional change in the tendency of choice perseveration (Figure 3, Table 3).

      Author response image 1.

      Ketamine-induced behavioral modulation simulated with the differential forgetting model (for saline sessions) and the best-fitting K-model (for ketamine sessions).

    1. Author response:

      The following is the authors’ response to the original reviews.

      Response to Public Comments

      (1) BioRxiv version history.

      Reviewer 1 correctly noted that we have posted different versions of the paper on bioRxiv and that there were significant changes between the initial version and the one posted as part of the eLife preprint process. Here we provide a summary of that history.

      We initially posted a bioRxiv preprint in November 2021 (Version 1) that included the results of two experiments. In Experiment 1, we compared conditions in which the stimulation frequency was 2 kHz, 3.5 kHz, or 5 kHz. In Experiment 2, we replicated the 3.5 kHz condition of Experiment 1 and included two amplitude-modulated (AM) conditions, with a 3.5 kHz carrier signal modulated at 20 Hz or 140 Hz. Relative to sham stimulation, non-modulated kTMP at 2 kHz and 3.5 kHz resulted in an increase in cortical excitability in Experiment 1. This effect was replicated in Experiment 2.

      In the original posting, we reported that there was an additional boost in excitability in the 20 Hz AM condition above that of the non-modulated condition. However, in re-examining the results, we recognized that the 20 Hz AM condition included an outlier that was pulling the group mean higher. We should have caught this outlier in the initial submission given that the resultant percent change for this individual is 3 standard deviations above the mean. Given the skew in the distribution, we also performed a log transform on the MEPs (which improves the normality and homoscedasticity of MEP distributions) and repeated the analysis. However, even here the participant’s results remained well outside the distribution. As such, we removed this participant and repeated all analyses. In this new analysis, there was no longer a significant difference between the 20 Hz AM and non-modulated conditions in Experiment 2. Indeed, all three true stimulation conditions (non-modulated, AM 20 Hz, AM 140 Hz) produced a similar boost in cortical excitability compared to sham. Thus, the results of Experiment 2 are consistent with those of Experiment 1, showing, in three new conditions, the efficacy of kHz stimulation on cortical excitability. But the results fail to provide evidence of an additional boost from amplitude modulation. 
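
      The outlier screen and log transform described above can be sketched as follows. This is an illustration under stated assumptions rather than the analysis code used in the study: the 3-standard-deviation criterion and the log transform come from the response, while the function names, the use of the sample standard deviation, and the example values are hypothetical.

```python
import math
import statistics


def flag_outliers(values, threshold=3.0):
    """Return indices of values more than `threshold` sample SDs from the mean."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    if sd == 0:
        return []
    return [i for i, v in enumerate(values) if abs(v - mean) / sd > threshold]


def log_transform(amplitudes):
    """Natural-log transform of strictly positive MEP amplitudes."""
    return [math.log(a) for a in amplitudes]


# Hypothetical percent-change scores: 19 typical participants and one extreme value.
percent_change = [10.0] * 19 + [100.0]
print(flag_outliers(percent_change))  # index of the extreme participant
```

      Because the extreme value itself inflates both the mean and the standard deviation, a screen of this kind is conservative; repeating it on log-transformed amplitudes, as the response describes, checks whether the participant remains outside the distribution once skew is reduced.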

      We posted a second bioRxiv preprint in May, 2023 (Version 2) with the corrected results for Experiment 2, along with changes throughout the manuscript given the new analyses.

      Given the null results for the AM conditions, we decided to run a third experiment prior to submitting the work for publication. Here we used an alternative form of amplitude modulation (see Kasten et al., NeuroImage 2018). In brief, we again observed a boost in cortical excitability from non-modulated kTMP at 3.5 kHz, but no additional effect of amplitude modulation. This work is included in the third bioRxiv preprint (Version 3), the paper that was submitted and reviewed at eLife.

      (2) Statistical analysis.

      Reviewer 1 raised a concern with the statistical analyses performed on aggregate data across experiments.  We recognize that this is atypical and was certainly not part of an a priori plan. Here we describe our goal with the analyses and the thought process that led us to combine the data across the experiments.

      Our overarching aim is to examine the effect on corticospinal excitability of different kTMP waveforms (carrier frequency and amplitude-modulated frequency) matched at the same estimated cortical E-field (2 V/m). Our core comparison was of the active conditions relative to a sham condition (E-field = 0.01 V/m). We included the non-modulated 3.5 kHz condition in Experiments 2 and 3 to provide a baseline from which we could assess whether amplitude modulation produced a measurable difference from that observed with non-modulated stimulation. Thus, this non-modulated condition, as well as the sham condition, was repeated in all three experiments. This provided an opportunity to examine the effect of kTMP with a relatively large sample and to assess how well the effects replicate, and it led to the strategy we have taken in reporting the results.

      As a first step, we present the data from the 3.5 kHz non-modulated and sham conditions (including the individual participant data) for all three experiments in Figure 4. We used a linear mixed effect model to examine whether there was an effect of Experiment (Exps 1, 2, 3) and observed no significant difference within each condition. Given this, we opted to pool the data for the sham and 3.5 kHz non-modulated conditions across the three experiments. Once the data were pooled, we examined the effect of the carrier frequency and amplitude-modulated frequency of the kTMP waveform.

      (3) Carry-over effects

      As suggested by Reviewer 1, we will examine in the revision if there is a carry-over effect across sessions (for the most part, 2-day intervals between sessions). For this, we will compare MEP amplitude in baseline blocks (pre-kTMP) across the four experimental sessions.

      Reviewer 1 also commented that mixing the single- and paired-pulse protocols might have impacted the results. While our a priori focus was on the single-pulse results, we wanted to include multiple probes given the novelty of our stimulation method. Mixing single- and different paired-pulse protocols has been relatively common in the non-invasive brain stimulation literature (e.g., Nitsche 2005, Huang et al, 2005, López-Alonso 2014, Batsikadze et al 2013) and we are unaware of any reports suggesting that mixed designs (single and paired) distort the picture compared to pure designs (single only).

      (4) Sensation and Blinding

      Reviewer 2 brought up concerns about the sham condition and blinding of kTMP stimulation. We do think that kTMP is nearly ideal for blinding. The amplifier does emit an audible tone (at least for individuals with normal hearing) when set to an intensity that produces a 2 V/m E-field. For this reason, the participants and the experimenter wore ear plugs. Moreover, we played a 3.5 kHz tone in all conditions, including the sham condition, which effectively masked the amplifier sound. We measured the participants’ subjective ratings of annoyance, pain, and muscle twitches after each kTMP session (active and sham). Using a linear mixed effect model, we found no difference between active and sham for each of these ratings, suggesting that sensation was similar for active and sham (Fig 8). This matches our experience that kHz stimulation in the range used here has no perceptible sensation induced by the coil. To blind the experimenters (and participants), we used a coding system in which the experimenter typed in a number that had been randomly paired with a stimulation condition; the pairing varied across participants in a manner unknown to the experimenter.

      Reviewer 1 asked why we did not explicitly ask participants if they thought they were in an active or sham condition. This would certainly be a useful question. However, we did not want to alert them of the presence of a sham condition, preferring to simply describe the study as one testing a new method of non-invasive brain stimulation. Thus, we opted to focus on their subjective ratings of annoyance, pain, and finger twitches after kTMP stimulation for each experimental session.

      Response to Recommendations for the Authors

      Reviewer #1: 

      Reviewer #1 in the public review noted the possibility of carry-over effects and suggested that we compare the amplitude of the MEPs in the pre blocks across the four sessions.

      Although we did not anticipate carry-over effects lasting 2 or more days, we have now conducted an analysis in which we use a linear mixed effect model with a fixed factor of Session and a random factor of Participant. The results show that there is no effect of Session [χ2(3) = 4.51, p = 0.211].

      Author response table 1.

      Detailed comments and some suggestions to maybe improve the writing and figures: 

      Abstract: 

      BioRxiv Version 1: "We replicated this effect in Experiment 2 and found that amplitude-modulation at 20 Hz produced an additional boost in cortical excitability. " 

      BioRxiv Version 2, 3 and current manuscript: "Although amplitude-modulated kTMP increased MEP amplitude compared to sham, no enhancement was found compared to non-modulated kTMP." 

      I am a little concerned about this history because the conclusions seem to have changed. It looks like the new data has a larger number of subjects, which could explain the divergence. Although it is generally not good practice to analyze the data at interim time points, without accounting for alpha spending. It appears that data analysis methods may have also changed, as some of the extreme points in version 1 seem to be no longer in the new manuscript (Figure 4 Sham Experiment 1). 

      In the public review above we explain in detail the different versions of the bioRxiv preprint and how the results changed from the first version to the current manuscript.

      Introduction:

      "Second, the E-fields for the two methods exist in orthogonal subspaces" Can you explain what this means? 

      Thank you for this suggestion, we have updated the paper (pg. 4, line 78-81) by adding two sentences to explain what we mean by orthogonal subspaces and describe the consequences of this with respect to the E-fields resulting from tES and TMS. Specifically, we now comment that even if the E-fields of tES and TMS are similar in focality, they may target different populations of neurons.  

      "In addition, the kTMP waveform can be amplitude modulated to potentially mimic E-fields at frequencies matching endogenous neural rhythms [15]." That may be so, but reference [15] makes the exact opposite point, namely, that kHz stimulation has little effect on neuronal firing until you get to very strong fields. The paper that makes that claim is by Nir Grossman, but in my view, it is flawed as responses are most likely due to peripheral nerve (axon) stimulation there given the excessive currents used in that study. The reference to Wang and Peterchev [17] is in agreement with that by showing that you need 2 orders of magnitude stronger fields to activate neurons. 

      The reviewer is correct that Ref 15 (Esmaeilpour et al., 2021), as well as Wang et al. (2023), use much higher E-fields than we target in our present study. However, our point here is that, while we cannot use our approach to apply E-fields at endogenous frequencies, we can amplitude modulate the kHz carrier frequency at these lower frequencies. We cited Esmaeilpour et al. (2021) because they show that high-frequency stimulation with amplitude-modulated waveforms resulted in dynamic modulation at the “beating” frequency. Given that we are well within subthreshold space in this paper, and well below the E-field levels in Esmaeilpour et al. (2021), the open question is whether amplitude modulation at this level will be able to perturb neural activity (e.g., increase the power of endogenous oscillations at the targeted frequency).

      To address this concern, we modified the sentence (pg.6, lines 120-121) to now read "In addition, the kTMP waveform can be amplitude modulated at frequencies matching endogenous neural rhythms." In this way, we are describing a general property of kTMP (as well as other methods that can use high frequency signals).

      I am not aware of any in-vitro study showing the effects of kHz stimulation at 2V/m. The review paper by Neudorfer et al is very good. But if I got it correctly in a quick read it is not clear that there is experimental evidence for subthreshold effects. They do talk about facilitation, but the two experimental papers cited there on the auditory nerve don't quantify field magnitudes. I would really love it if you could point me to a relevant empirical study showing the effects of kHz stimulation at 2 V/m. 

      Perhaps all this is a moot point as you are interested in lasting (plastic) effects on MEP. For this, you cite one study with 11 subjects showing the effects of kHz tACS on MEPs [20]. I guess that is a start. The reference [21] is only a safety study, so it is probably not a good reference for that. Reference [22] also seems out of place as it is a modeling study. The effects on depression of low-intensity magnetic stimulation in references [23-26] are intriguing. 

      We agree with the reviewer that Ref 20 (now Ref 18: Chaieb, Antal & Paulus; 2011) is the most relevant one to cite here since it provides empirical evidence for changes in neural excitability from kHz stimulation, and in fact, serves as the model for the current study. We have retained Refs 23-26 (now Ref 19-22: Rohan et al., 2014; Carlezon et al., 2005; Rohan et al., 2004 & Dublin et al., 2019) since they also do show kHz effects on mood and removed Refs 21 (Chaieb et al., 2014) and 22 (Wang et al., 2018) for the reasons cited by the Reviewer.

      Figure 1: "The gray dashed function depicts the dependence of scalp stimulation threshold upon frequency [14]." It's hard to tell from that reference what the exact shape is, but the frequency dependence is likely steeper than what is shown here, i.e. 2 mA at 10 Hz can be really quite unpleasant. 

      We have removed the gray dashed line given that this might be taken to suggest a discrete transition. We now just have a graded transition to reflect that the tolerance of tES is subjective. We start the shading at 2 mA for the lowest frequencies given that there is general agreement that 2 mA is well-tolerated and decrease the shading intensity as frequency increases. The general aim of the figure is not to make strong claims about the threshold of scalp discomfort for tES, but to show that kTMP can target much higher cortical E-fields within the tolerable range.

      Methods:

      Procedures:

      It does not seem like double-blinding has been directly assessed. 

      We did not assess double blinding by directly assessing whether the participant was in a sham or active condition. We did not want to alert the participants of the presence of a sham condition after the first session of the 4-session study, preferring to simply describe the study as a test of a new method of non-invasive brain stimulation. For this reason, we opted to focus on their subjective ratings of annoyance, pain, and finger twitches after kTMP stimulation for each experimental session. These ratings did not differ between active and sham kTMP, which suggests kTMP has good potential for double blinding.

      MEP data analysis: Taking the mean of log power is unusual, but I suppose the reference provided gives a good justification. Does this explain the deviation from the biorxiv v1 results? 

      We opted to perform a logarithmic transformation of MEP amplitudes to improve the normality and homoscedasticity of the MEP distribution. We cite three papers (Refs 50-52: Peterchev et al., 2013, Nielsen 1996a, & Nielsen 1996b) that have applied a similar approach in handling MEP data. We had not done the transformation in the first bioRxiv but opted to do so in the eLife submission based on further review of the literature. We note that the two analyses produce similar statistical outcomes once we removed the outlier discussed in the Public Review.

      "Interactions were tested by comparing a model in which the fixed effects were restricted to be additive against a second model that could have multiplicative and additive effects." Not sure what this means. Why not run a full model with interactions included and read off the stats from that single model for the various factors? Should one not avoid running multiple models as one would have to correct p-values for multiple comparisons for every new test? 

      We used the lme4 package in R to fit our linear mixed effect models (Ref 54: Bates, Mächler, Bolker & Walker, 2015). In this package they intentionally leave out p-values for individual models or factors because they note there is a lack of convergence in the field about how to calculate parameter estimates in complex situations for linear mixed effect models (e.g., unbalanced designs). They suggest model comparison using the likelihood-ratio test to obtain and report p-values, which is what we report in the current manuscript.

      We revised the text in the section Linear Mixed Effects Models to state that likelihood ratio tests were used to obtain p-values to remove any confusion.
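
      How a likelihood-ratio statistic from two nested models maps to a p-value can be sketched as follows. The models in the manuscript were fit with lme4 in R; this Python fragment only illustrates the final step (statistic to p-value) using the chi-square survival function, and the function names and the log-likelihood values in the comments and tests are hypothetical. The statistics passed in below are those reported in this response.

```python
import math


def lrt_statistic(loglik_reduced, loglik_full):
    """Likelihood-ratio statistic for two nested models fit by maximum likelihood."""
    return 2.0 * (loglik_full - loglik_reduced)


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1.

    Uses the closed forms for df = 1 and df = 2 and the recurrence
    Q(df + 2, x) = Q(df, x) + (x/2)**(df/2) * exp(-x/2) / gamma(df/2 + 1).
    """
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 1:
        q, k = math.erfc(math.sqrt(half)), 1
    else:
        q, k = math.exp(-half), 2
    while k < df:
        q += half ** (k / 2.0) * math.exp(-half) / math.gamma(k / 2.0 + 1.0)
        k += 2
    return q


# Statistics reported in the response: (chi-square value, degrees of freedom).
for stat, df in [(4.51, 3), (6.90, 1), (1.43, 1)]:
    print(f"chi2({df}) = {stat:.2f}, p = {chi2_sf(stat, df):.3f}")
```

      For the Session effect, χ2(3) = 4.51 gives p of about 0.211 under this calculation, in agreement with the value reported above. The degrees of freedom equal the number of parameters by which the full model exceeds the reduced model.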

      Procedures:

      kTMP: Nice that fields were measured. Would be nice to see the data that established the empirical constant k. 

      We have expanded our discussion of how we established k in the Methods section. We first derived k using the equation E0 = k·fc·I, based on previously published reports of the current (I) and frequency (fc) of the MagVenture Cool-B65 coil (now Refs 29-30: Deng, Lisanby & Peterchev, 2013; Drakaki, Mathiesen, Siebner, Madsen & Thielscher, 2022). We then verified this value using the triangular E-field probe to within 5% error.

      Figure 3, spectrum. The placement of the fm label on the left panel is confusing. It suggests that fm was at the edge of the spectrum shown, which would not be the best way to show that there is nothing there - obviously, there isn't, but the figure could be more didactic. 

      Thanks for pointing this out. We modified the figure, moving the ‘fm’ label to the center of the first panel. This change makes it clear that there is no peak at the amplitude modulated frequency.
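
      Why an amplitude-modulated waveform shows no spectral peak at the modulation frequency itself can be illustrated with a short numerical sketch. It assumes conventional sinusoidal amplitude modulation, s(t) = (1 + m·cos(2π·fm·t))·sin(2π·fc·t), which may differ from the exact waveforms used in the experiments; the function names, sampling rate, and modulation depth are hypothetical, while the 3.5 kHz carrier and 20 Hz modulation frequency come from the response.

```python
import cmath
import math


def am_waveform(fc, fm, fs, duration, depth=1.0):
    """Sinusoidal carrier at fc (Hz), amplitude modulated at fm (Hz), sampled at fs (Hz)."""
    n = int(round(fs * duration))
    return [
        (1.0 + depth * math.cos(2 * math.pi * fm * i / fs))
        * math.sin(2 * math.pi * fc * i / fs)
        for i in range(n)
    ]


def amplitude_at(signal, freq, fs):
    """Amplitude of the sinusoidal component at `freq`, from a single-frequency DFT."""
    n = len(signal)
    acc = sum(s * cmath.exp(-2j * math.pi * freq * i / fs) for i, s in enumerate(signal))
    return 2.0 * abs(acc) / n


fs, fc, fm = 20000.0, 3500.0, 20.0  # sampling rate is an assumption
signal = am_waveform(fc, fm, fs, duration=1.0)

for f in (fm, fc - fm, fc, fc + fm):
    print(f"{f:7.0f} Hz: amplitude {amplitude_at(signal, f, fs):.3f}")
```

      With full modulation depth, the energy sits at the carrier (amplitude 1) and at the two sidebands fc ± fm (amplitude 0.5 each), and the component at fm is zero to numerical precision. This follows from the product-to-sum identity and is the property that the relabelled spectrum panel is meant to convey.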

      "a trio of TMS assays of cortical excitability" Can you clarify what this means? 

      Sorry for the confusion. The trio of TMS assays refers to the single pulse and two paired-pulse protocols (SICI - ICF). We edited the Procedure section to clarify this (pg 9, line 195-197).

      Figure 2A: it would be nice to indicate which TMS blocks were single pulse and which were the two paired-pulse protocols. It is hard to keep track of it all for the three different experiments. 

      We have now clarified in the text (see above) that all three probes were used in each block for Experiments 1 and 2, and only the single-pulse probe in Experiment 3. We have modified the legend for Figure 2 to also provide this information.

      Results:

      "Based on these results, we combined the data across the three experiments for these two conditions in subsequent analyses." This strikes me as inappropriate. Should not a single model have been used with a fixed effect of experiment and fixed effect of stimulation condition? 

      We recognize that pooling data across experiments may be atypical. Indeed, our initial plan was to simply analyze each experiment on its own (completely within-subject analysis). However, after completing the three experiments, we realized that since the sham and non-modulated 3.5 kHz conditions were included in each experiment, we had an opportunity to examine the effect of kTMP in a relatively large N study (for NIBS research). Before pooling the data, we wanted to make sure that the factor of experiment did not impact the results and our analysis showed there was no effect of experiment. Note that we did not include the factor of stimulation condition in this model because we did not want to do multiple comparisons of the same contrast (3.5 kHz compared to sham). By pooling the data before analysis of the stimulation conditions we could then focus on our two key independent variables: 1) kTMP carrier frequency and 2) kTMP amplitude modulated frequency, doing fewer significance tests to minimize multiple comparisons. The linear mixed effect (LME) model allows us to include a random effect of participant. In this way, we account for the fact that some comparisons are within subjects and some comparisons are between subjects.

      The reviewer is correct that after pooling the data, we could have continued to include the factor of experiment in the LME models. This factor could still account for variance even though it was not significant in the initial test. Given this, we have now reanalyzed the data including the fixed factor of experiment in all the comparisons that contain data from multiple experiments. This has led us to modify the text in the Methods section under Linear Mixed Effects Models and in the Results section under Repeated kTMP Conditions (3.5 kHz and Sham) across Experiments. In addition, the results of the LME models have been updated throughout the Results section. We note that the pattern of results was unchanged with this modification of our analyses.

      "Pairwise comparisons of each active condition to sham showed that an increase was observed following both 2 kHz ..." I suppose this is all for Experiment 1? It is a little confusing to go back and forth between combining experiments and then separate analyses per experiment without some guiding text, aside from being a bit messy from the statistical point of view. 

      We did not go back to performing separate analyses of the experiments after pooling the data. Once we ran the test to justify pooling the data, subsequent tests were done with the pooled data to evaluate the effects of carrier frequency and amplitude modulation.

      Figure 5 is confusing because the horizontal lines with ** on top seem to refer to the same set of sham subjects, but the subjects of Experiments 2 and 3 are different from Experiment 1, so in these pairwise comparisons there is a mix of between-subject and within subject-comparison going on here. Did I get that right? 

      Yes – that is correct. As noted above, we pooled the data after showing that there was no effect of experiment. Thus, the data for the sham and 3.5 kHz non-modulated conditions are from three different experiments. There was some overlap of participants between Experiments 1 and 2 (Experiment 3 was all new participants). We used a linear mixed effect model so that we could account for this mixed design. Participant was always included as a random factor, which allows us to account for the fact that some comparisons are within subjects and some are between subjects. Based on a previous comment, we now include Experiment as a fixed factor (see above), which provides a way to evaluate variance across the different experiments.

      "We next compared sham vs. active non-modulated kTMP and found that active kTMP produced a significant increase in corticospinal excitability [χ2(1) = 23.46 p < 0.001" Is this for the 3.5Hz condition? 

      No, that is for an omnibus comparison of non-modulated kTMP (including 2 kHz, 3.5 kHz and 5 kHz conditions) vs. sham. We have edited the paper to include the three conditions that are included as the active non-modulated kTMP conditions for clarity (pg. 22, line 463). Having observed a significant omnibus result, we continued with paired comparisons: “Pairwise comparisons of each active condition to sham showed that an increase was observed following both 2 kHz [χ2(1) = 6.90, p = 0.009; d = 0.49] and 3.5 kHz kTMP [χ2(1) = 37.75, p < 0.001; d = 0.70; Fig 5: Non-Modulated conditions]. The 5 kHz condition failed to reach significance [χ2(1) = 1.43, p = 0.232; d = 0.21].”

      Paired-Pulse Assays: There are a number of results here without pointing to a figure, and at one point there is a reference to Figure 6, which may be in error. It would help to point the reader to some visual corresponding to the stats. 

      Thank you. This was an error on line 542. It should have read Figure 7. We have added two other pointers to Figure 7 where we discuss the absence of an effect of kTMP on SICI.

      Reviewer #2 (Recommendations For The Authors):

      I would recommend a couple of changes to the background.

      "Orthogonal subspaces" line 78. This is a fairly formal term that has little relevance here, although the difference between scalar and vector potential-based fields is interesting to think about. If it stays, it should be mathematically supported, but it's easily rewritten to deliver the gist of it. 

      We have updated the paper by adding text that we hope will clarify what we mean by orthogonal subspaces (pg. 4, line 78-81). We note that we developed the math behind this statement in a previous paper (Ref # 10: Sheltraw et al., 2021). We have changed the location of the citation so that it directly follows these sentences and will provide a pointer to readers interested in the physics and math concerning orthogonal subspaces. 

      The statement that the scalp e-field for TES is greater than the e-field for TMS for similar cortical fields needs a little more clarification, since historically they have operated orders of magnitude apart, and it is easy to misread and trip over this statement (although it is factually true). Presenting a couple of numbers at cortical and scalp positions would help illustrate the point. That you are not considering applying TES at traditional TMS levels but rather TMS at TES values is what is initially easy to miss. 

      We appreciate the feedback and have updated this section to provide the reader with a better intuition of this point. We now specify that the scalp to cortical E-field ratio is approximately 18 times larger for tES compared to TMS and cite our previous paper which has much more detail about how this was calculated.

      A note that the figures show scalp sensation around 1.0 V/m while the text states 0.5; cortical depths are an important thing for the reader to keep in mind. 

      This comment, when considered in tandem with one of the comments of Reviewer 1 led us to revise Figure 1. We removed the dashed gray line which might be taken to suggest a strict cutoff in terms of tolerability (which we did not intend). We now use shading that fades away to make the point of continuity. We have extended this down to a cortical E-field of 0.5 V/m to correspond with the text.  

      This is a nicely done and carefully reported experiment and I look forward to seeing more. 

      Thank you for your kind note!

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary: As TDP-43 mislocalization is a hallmark of multiple neurodegenerative diseases, the authors seek to identify pathways that modulate TDP-43 levels. To do this, they use a FACS based genome wide CRISPR KD screen in a Halo tagged TDP-43 KI iPSC line. Their screen identifies a number of genetic modulators of TDP-43 expression including BORC which plays a role in lysosome transport.

      Strengths:

      Genome wide CRISPR based screen identifies a number of modulators of TDP-43 expression to generate hypotheses regarding RNA BP regulation and perhaps insights into disease.

      Weaknesses:

      It is unclear how altering TDP-43 levels may relate to disease where TDP-43 is not altered in expression but mislocalized. This is a solid cell biology study, but the relation to disease is not clear without providing evidence of BORC alterations in disease or manipulation of BORC reversing TDP-43 pathology in disease.

      We thank the reviewer for this comment and have expanded the discussion to address the role TDP-43 may play in the BORCS8-associated neurodegenerative disorder and how understanding the effect of lysosome localization on TDP-43 levels may help patients (lines 313-321).

      The mechanisms by which BORC and lysosome transport modulate TDP-43 expression are unclear. Presumably, this may be through altered degradation of TDP protein but this is not addressed.

      We agree with the reviewer that understanding the mechanism by which lysosome transport regulates TDP-43 levels is important and plan to examine this in future studies.

      Previous studies have demonstrated that TDP-43 levels can be modulated by altering lysosomal degradation so the identification of lysosomal pathways is not particularly novel.

      We thank the reviewer for this comment and have updated the text to make this clearer (lines 310-313). What hasn’t been observed previously is a change in lysosome localization affecting TDP-43 levels.

      It is unclear whether this finding is specific to TDP-43 levels or whether lysosome localization may more broadly impact proteostasis in particular of other RNA BPs linked to disease.

      We agree that this is an interesting question and something that should be investigated in future studies.

      Unclear whether BORC depletion alters lysosome function or simply localization.

      We thank the reviewer for this comment. Lysosome function related to protein turnover has not yet been examined in the literature after loss of BORC, but other aspects of lysosome function (including lipid metabolism and autophagic flux) have been shown to be disrupted upon loss of BORC. We have updated the discussion to address this (lines 292-296).

      Reviewer #2 (Public review):

      Summary: The authors employ a novel CRISPRi FACS screen and uncover the lysosomal transport complex BORC as a regulator of TDP-43 protein levels in iNeurons. They also find that BORC subunit knockouts impair lysosomal function, leading to slower protein turnover and implicating lysosomal activity in the regulation of TDP-43 levels. This is highly significant for the field given that a) other proteins could also be regulated in this way, b) understanding mechanisms that influence TDP-43 levels are significant given that its dysregulation is considered a major driver of several neurodegenerative diseases and c) the novelty of the proposed mechanism.

      Strengths:

      The novelty and information provided by the CRISPRi screen. The authors provide evidence indicating that BORC subunit knockouts impair lysosomal function, leading to slower protein turnover and implicating lysosomal activity in the regulation of TDP-43 levels and show a mechanistic link between lysosome mislocalization and TDP-43 dysregulation. The study highlights the importance of localized lysosome activity in axons and suggests that lysosomal dysfunction could drive TDP-43 pathologies associated with neurodegenerative diseases like FTD/ALS. Further, the methods and concepts will have an impact on the larger community as well. The work also sets up further work to understand the somewhat paradoxical findings that even though the tagged TDP-43 protein is reduced in the screen, it does not alter cryptic exon splicing and there is a longer TDP-43 half-life with BORC KD.

      Weaknesses:

      While the data is very strong, the work requires some additional clarification.

      We thank the reviewer for these comments. Our detailed responses are included below in the “recommendations for authors” section.

      Reviewer #3 (Public review):

      Summary: In this work, Ryan et al. have performed a state-of-the-art full genome CRISPR-based screen of iNeurons expressing a tagged version of TDP-43 in order to determine expression modifiers of this protein. Unexpectedly, using this approach the authors have uncovered a previously undescribed role of the BORC complex in affecting the levels of TDP-43 protein, but not mRNA expression. Taken together, these findings represent a very solid piece of work that will certainly be important for the field.

      Strengths:

      BORC is a novel TDP-43 expression modifier that has never been described before and it seemingly acts on regulating protein half-life rather than transcriptome level. It has been long known that different labs have reported different half-lives for TDP-43 depending on the experimental system but no work has ever explained these discrepancies. Now, the work of Ryan et al. has for the first time identified one of these factors which could account for these differences and play an important role in disease (although this is left to be determined in future studies).

      The genome wide CRISPR screening has been shown to yield novel results with high reproducibility and could eventually be used to search for expression modifiers of many other proteins involved in neurodegeneration or other diseases.

      Weaknesses:

      The fact that TDP-43 mRNA does not change following BORCS6 KD is based on a single qRT-PCR that does not really cover all possibilities. For example, the mRNA total levels may not change but the polyA sites may have switched from the highly efficient pA1 to the less efficient and nuclear retained pA4. There are therefore a few other experiments that could have been performed to make this conclusion more compelling, maybe also performing RNAscope experiments to make sure that no change occurred in TDP-43 mRNA localisation in cells.

      We thank the reviewer for this comment. To address this point, we performed an analysis of polyA sites on our RNA sequencing data using REPAC and did not find a change in TDP-43 polyadenylation after BORC KD (Figure S6C). Other transcripts do have altered polyA sites, which are summarized in Figure S6C. We also performed HCR FISH for TARDBP mRNA in TDP-43 and BORC KD neurons. While we did not see a difference in RNA localization (see A below, numbers on brackets indicate p-values), we also were not able to detect a significant difference in total TARDBP mRNA levels upon TDP-43 KD (see B below, numbers on brackets indicate p-values), suggesting that some of the signal detected is non-specific to TARDBP. Because of this, we cannot conclusively say that BORC KD does not alter TARDBP mRNA localization using the available tools.

      Author response image 1.

      Even assuming that the mRNA does not change, no explanation for the change in TDP-43 protein half-life has been proposed by the authors. This will presumably be addressed in future studies: for example, are mutants that lack different domains of TDP-43 equally affected in their half-lives by BORC KD? Alternatively, can a mass-spec be attempted to see whether TDP-43 PTMs change following BORCS6 KD?

      We agree with the reviewer that these are important experiments that could be done in the future to further examine the mechanism by which loss of BORC alters TDP-43 half-life. We examined our proteomics data for differential phosphorylation and ubiquitination in NT vs BORC KD (Figure S7G-H). We were unable to detect PTMs on TDP-43, so we cannot say if they contribute to the change in TDP-43 half-life we observed.

      Reviewer #1 (Recommendations for the authors):

      Recommendations are detailed in the public review.

      Reviewer #2 (Recommendations for the authors):

      Ryan et al, employ a CRISPRi FACS screen and uncover the lysosomal transport complex BORC as a regulator of TDP-43 protein levels in iNeurons. The authors provide strong evidence indicating that BORC subunit knockouts impair lysosomal function, leading to slower protein turnover and implicating lysosomal activity in the regulation of TDP-43 levels. The authors then provided additional evidence of TDP-43 perturbations under lysosome-inhibiting drug conditions, underscoring a mechanistic link between lysosome mislocalization and TDP-43 dysregulation. The study highlights the importance of localized lysosome activity in axons and suggests that lysosomal dysfunction could drive TDP-43 pathologies associated with neurodegenerative diseases like FTD/ALS. The work is exciting and could be highly informative for the field.

      Concerns: There are some disconnects between the figures and the main text that can benefit from refining of the figures to align better with the main text. This does not require additional experiments other than perhaps Figure 4B. The impact of the work could be further discussed - it is an interesting disconnect between the fact BORC KD causes decreased IF of the Halo-tagged TDP-43 and lysosomal transport, however this reduction does not impact cryptic exon expression and also increases TDP-43 half life (and of other proteins). It is a very interesting and potentially informative part of the manuscript.

      We thank the reviewer for their detailed reading of our manuscript. We have endeavored to better match the figures and the text and have added more discussion of the impact of the work.

      Minor:

      (1) Suggestion: relating to the statement "Gene editing was efficient, with almost all selected clones correctly edited." - please provide values or %.

      We updated the text to remove the statement about the editing efficiency, instead saying we identified a clone that was correct for both sequence and karyotype (lines 83-85).

      (2) Relating to Figure 1A: Please provide clarification regarding tagging strategy with the halotag - e.g. why in front of exon2.

      We updated the figure legend to reflect that the start codon for TDP-43 is in exon 2, hence why we placed the HaloTag there.

      (3) Relating to Figure S1: A and B seems to have been swapped.

      We thank the reviewer for catching this mistake and have fixed the figure/text.

      (4) Relating to Figure 1B: figure legend does not indicate grayscale coloring of TDP-43 signal.

      We have added text in the figure legend to indicate that the Halo signal is shown in grayscale in the left-hand panels.

      (5) Relating to Figure 1C: can the authors clarify abbreviation for 'NT' in text and legend.

      We thank the reviewer for catching this and have indicated in the text and figure legend that NT refers to the non-targeting sgRNA that was used as a control for comparison to the TDP-43 KD sgRNA.

      (6) Relating to figure 2B and S2A: main text mentioned "Non-targeting Guides" however the figure does not show non-targeting guides to confirm.

      We thank the reviewer for catching this oversight. We updated the figure legends for these figures to indicate that the non-targeting (NT) guides are shown in gray on the rank plot. They cluster towards the middle, more horizontal portion of the graphs, showing that the more vertical sections of the graph are hits.

      (7) Suggestion: To make it easier on the reader, please provide overlap numbers for the following statement ..."In comparing the top GO terms associated with genes that increase or decrease Halo-TDP-43 levels in iNeurons, we found that almost none altered Halo-TDP-43 levels in iPSCs...".

      We thank the reviewer for this comment and have updated the text to indicate that only a single term is shared between the iPSC and iNeuron screens (lines 113-117).

      (8) Relating to the statement "We cloned single sgRNA plasmids for 59 genes that either increased or decreased Halo-TDP-43 in iNeurons but not in iPSCs." Can the authors provide a list of the 59 genes.

      We have included a new column in the supplemental table S1 indicating the result of the Halo microscopy validation to hopefully clarify which genes lead to a validated phenotype and which did not.

      (9) Relating to the statement "To rule out the possibility of neighboring gene or off-target effects of CRISPRi, as has been reported previously15, we examined the impact of BORC knockout (KO) on TDP-43 levels. Using the pLentiCRISPR system, which expresses the sgRNA of interest on the same plasmid as an active Cas916 we found that KO of BORCS7 using two different sgRNAs decreased TDP-43 levels by immunofluorescence (Figure 5C-D)." Please provide clarification as to why BORCS7 was chosen out of all the BORCS? From the data presentation thus far (Figure 4B & 5A), the reader might have anticipated testing BORCS6 for panels 5C-D.

      We thank the reviewer for this comment. We tried a couple of BORC subunits with the pLentiCRISPR system, but BORCS7 was the only one for which we were convinced we had achieved functional knockout, based on lysosome localization. We think that either the guides were not ideal for the other BORC components we tried, or we did not get efficient gene editing across the population of cells tested. Because we had previously been working with knockdown, and CRISPRi guides are not the same as CRISPR knockout guides, we could not use the existing guide sequences that we know work well for BORC. Since loss of one BORC gene causes functional loss of the complex and restricts lysosomes to the soma, we did not feel it necessary to assay all 8 genes.

      (10) Relating to the statement "We treated Halo-TDP-43 neurons with various drugs that disrupt distinct processes in the lysosome pathway and asked if Halo-TDP-43 levels changed. Chloroquine (decreases lysosomal acidity), CTSBI (inhibits cathepsin B protease), ammonium chloride (NH4Cl, inhibits lysosome-phagosome fusion), and GPN (ruptures lysosomal membranes) all consistently decreased Halo-TDP-43 levels (Figure 6A-B, S5A-C)" Please provide interpretations for Figures S5A and S5C in text.

      We thank the reviewer for catching this oversight and have updated the text accordingly (lines 183-191).

      (11) Relating to figure 6E: please provide in legend what the different colors used correlate with (i.e. green/brown for BORCS7 KD)?

      We thank the reviewer for pointing this out. These colors were mistakenly left in the figure from a version looking to see if the observed effects were driven by a single replicate rather than a consistent change (each replicate has a slightly different color). As the colors are intermingled and not separated, we concluded the effect was not driven by a single replicate. The colors have been removed from the updated figure for simplicity.

      (12) Relating to the statement "We observed a similar trend for many proteins in the proteome (Figure 8B)" This statement can benefit from stating which trend the authors are referring to, it is currently unclear from the volcano plot shown for Figure 8B.

      We thank the reviewer for catching this and have updated the text accordingly.

      (13) Relating to the statement "For almost every gene, we observed an increase or decrease in Halo-TDP-43 levels without a change in Halo-TDP-43 localization or compartment specific level changes (Figure 4B)." Please provide: (1) the number of genes examined, (2) additional clarification of "localization" and "compartment specific" level changes, (3) some quantification and or additional supporting data of the imaging results. Figures 5A-B presents with the same concern relating to the comment "To determine if results from Halo-TDP-43 expression assays also applied to endogenous, untagged TDP-43 levels, we selected 22 genes that passed Halo validation and performed immunofluorescence microscopy for endogenous (untagged) TDP-43 (Figure 4D-G,5A-B, S4E-F)." please clarify further.

      We thank the reviewer for requesting this clarification. This statement refers to all 59 genes tested by Halo imaging; only one (MFN2) showed any hints of aggregation or changes in localization, while every other gene (58) showed what appeared to be global changes in Halo-TDP-43 levels. We were initially intrigued by the MFN2 phenotype; however, we were unable to replicate it on endogenous TDP-43 and thus concluded that this might be an effect specific to the tagged protein. The images shown in Figure 4B are representative of the changes we observed across all 59 genes tested (if changes were present). From the 59 genes for which we observed a change in Halo-TDP-43 levels by microscopy, we selected a smaller number to move forward to immunofluorescence for TDP-43. We picked a subset of genes from each of the different categories we had identified (mitochondria, m6A, ubiquitination, and some miscellaneous) to validate by immunofluorescence, thinking that genes in the same pathway would act similarly. We have added a column to the supplemental table S1 indicating which genes were tested by immunofluorescence and what the result was. We have also attempted to clarify the results section to make the above clearer.

      (14) Relating to the statement "To determine if results from Halo-TDP-43 expression assays also applied to endogenous, untagged TDP-43 levels, we selected 22 genes that passed Halo validation and performed immunofluorescence microscopy for endogenous (untagged) TDP-43 (Figure 4D-G, 5A-B, S4E-F). Of these, 18 (82%) gene knockdowns showed changes in endogenous TDP-43 levels (Figure 4D-G, S4E-F)." It is difficult to identify the 18 or 22 genes in the figures as described in the main text.

      We added columns to the supplemental table S1 listing the genes and the result in each assay.

      (15) Relating to figures S7A and 8A and the first part of the section "TDP-43, like the proteome, shows longer turnover time in BORC KD neurons" Can the authors provide clarification why the SunTag assay was performed with BORCS6 KD (S7A) but the follow-up experiment (8A) was performed with BORCS7 KD. Does BORCS6 KD show similar results as BORCS7 with the SunTag assay, and does TDP-43 protein abundance with BORCS7 KD show similar results as BORCS6?

      Because loss of any of the 8 BORC genes causes functional loss of BORC and restricts lysosomes to the perinuclear space, we used BORC KDs interchangeably. Additionally, all BORC KDs had similar effects on Halo-TDP-43 levels.

      Reviewer #3 (Recommendations for the authors):

      Adding more control experiments that TDP-43 mRNA is really not affected following BORC KD

      We performed a FISH experiment to examine TARDBP mRNA localization upon BORC KD but were unable to conclusively say whether BORC KD changes TARDBP mRNA localization (see above). We also analyzed our RNA sequencing experiment for alternative polyadenylation sites upon BORC KD. Results are in Figure S6C.

      Although this could be part of a future study, the authors should try and determine what are the changes to TDP-43 that drive a change in the half-life.

      We agree with the reviewer that these are important experiments and hope to figure this out in the future.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer 1

      Summary:

      In the present study, the authors found the ternary complex formed by NCAN, TNC, and HA to be an important factor facilitating the multipolar to bipolar transition in the intermediate zone (IZ) of the developing cortex. NCAN binds HA via the N-terminal Link modules, while TNC cross-links NCAN through the CDL domain at the C-terminal. The expression and correct localization of these three factors facilitate the multipolar-bipolar transition necessary for immature neurons to migrate radially. TNC and NCAN are also involved in neuronal morphology. The authors used a wide range of techniques to study the interaction between these three molecules in the developing cortex. In addition, single and double KO mice for NCAN and TNC were analyzed to decipher the role of these molecules in neuronal migration and morphology.

      Strengths:

      The study of the formation of the cerebral cortex is crucial to understanding the pathophysiology of many neurodevelopmental disorders associated with malformation of the cerebral cortex. In this study, the authors showed, for the first time, that the ternary complex formed by NCAN, TNC, and HA promotes neuronal migration. The results regarding the interaction between the three factors forming the ternary complex are convincing.

      We appreciate the reviewers' positive assessment of our research.

      Weaknesses:

      However, regarding the in vivo experiments, the authors should consider some points for the interpretation of the results:

      • The authors did not use the proper controls in their experiments. For embryonic analysis, such as cortical migration, neuronal morphology, and protein distribution (Fig. 6, 7, and 9), mutant mice should be compared with control littermates, since differences in the results could be due to differences in embryonic stages. For example, in Fig. 6 the dKO is more developed than the WT embryo.

      It was challenging to compare double knockout mice with control littermates. When crossing Ncan and Tnc double heterozygous mice, the probability of obtaining double knockout mice is 1/16. Given an average litter size of around 8, acquiring a substantial number of double knockout mice would necessitate an impractical number of breeding pairs. Consequently, we were constrained to use non-littermate control mice. To address potential differences in developmental stages, we analyzed 19-20 embryos obtained from five individuals in each group, demonstrating that the observed differences between the two groups are more substantial than the inherent variability within each group.
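
      The breeding arithmetic behind this reply can be sketched as follows. This is a minimal illustration, assuming the two loci segregate independently with Mendelian ratios, no embryonic lethality, and a fixed litter size of 8; the function names are made up for this sketch and do not come from the manuscript.

```python
def p_double_knockout() -> float:
    """Probability that one pup from a double-het x double-het cross is a DKO.

    Assumption: the two loci are unlinked, so each is homozygous null
    with probability 1/4 independently of the other.
    """
    return 0.25 * 0.25


def expected_dko_per_litter(litter_size: int) -> float:
    """Expected number of DKO pups in a litter of the given size."""
    return litter_size * p_double_knockout()


def p_litter_has_dko(litter_size: int) -> float:
    """Probability that a litter contains at least one DKO pup."""
    return 1.0 - (1.0 - p_double_knockout()) ** litter_size


def p_litter_has_dko_and_wt(litter_size: int) -> float:
    """Probability that a litter contains at least one DKO and one WT pup.

    Double wild-type pups also occur with probability 1/16, so by
    inclusion-exclusion:
    P = 1 - P(no DKO) - P(no WT) + P(neither DKO nor WT).
    """
    p = p_double_knockout()
    n = litter_size
    return 1.0 - 2.0 * (1.0 - p) ** n + (1.0 - 2.0 * p) ** n


if __name__ == "__main__":
    n = 8  # average litter size quoted in the response
    print(f"P(DKO per pup)            = {p_double_knockout():.4f}")
    print(f"Expected DKO per litter   = {expected_dko_per_litter(n):.2f}")
    print(f"P(litter has >=1 DKO)     = {p_litter_has_dko(n):.3f}")
    print(f"P(litter has DKO and WT)  = {p_litter_has_dko_and_wt(n):.3f}")
```

      Under these assumptions, only about 40% of litters would contain any double knockout pup, and only about 15% would contain both a double knockout and a wild-type littermate, which is consistent with the practical difficulty described above.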

      • The authors claim that NCAN and TNC are involved in neuronal migration from experiments using single KO embryos. This is a strong statement considering the mild results, with no significant difference in the case of TNC KO embryos, and once again, using embryos from different litters.

      We agree with the reviewer's comment that a single deletion of TNC has a minimal impact on neuronal migration. We have revised the Results section to reflect the mild nature of the TNC KO phenotype more accurately.

      Page 8, line 225: "In NCAN KO mice, a significantly lower percentage of labeled cells resided in the upper layer (Bin2), and more cells remained in the lower layer (Bin5) than in WT mice (Figure 7a). In contrast, the impact of a single deletion of TNC on neuronal cell migration was minimal. Although TNC KO mice exhibited a tendency to have a higher proportion of labeled cells in the lower layer (Bin4) than in WT mice, this did not reach statistical significance (Figure 7a). The delay in neuronal migration observed in the single KO mice was milder when compared to that observed in DKO mice (Figure 6a-c), suggesting that simultaneous deletion of both NCAN and TNC is necessary for a more pronounced impairment in neuronal cell migration."
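
      The bin-based readout described in this passage can be sketched as below. This is a hypothetical illustration only: it assumes five equal-width bins numbered from the pial surface (Bin1) to the ventricular side (Bin5) and cell positions already normalized to the range 0 to 1; neither the bin count nor the normalization is stated in the response, and the function name is made up.

```python
from collections import Counter


def bin_percentages(depths, n_bins=5):
    """Return the percentage of labeled cells falling in each bin.

    depths: iterable of normalized cell positions, 0.0 at the pial
            surface and 1.0 at the ventricular surface (assumed).
    n_bins: number of equal-width bins (assumed to be 5 here).
    """
    depths = list(depths)
    if not depths:
        raise ValueError("at least one cell position is required")
    if any(d < 0.0 or d > 1.0 for d in depths):
        raise ValueError("positions must be normalized to [0, 1]")

    # A position of exactly 1.0 is assigned to the last bin.
    counts = Counter(min(int(d * n_bins), n_bins - 1) + 1 for d in depths)
    total = len(depths)
    return {b: 100.0 * counts.get(b, 0) / total for b in range(1, n_bins + 1)}


if __name__ == "__main__":
    # Made-up positions for four cells, for illustration only.
    example = [0.05, 0.25, 0.30, 0.95]
    for bin_index, pct in bin_percentages(example).items():
        print(f"Bin{bin_index}: {pct:.1f}%")
```

      Per-bin percentages computed this way for each embryo could then be compared between genotypes with whatever statistical test the manuscript specifies.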

      • The measurement of immunofluorescence intensity is not the right method to compare the relative amount of protein between control and mutant embryos unless there is a right normalization.

      We agree that measuring immunofluorescence intensity alone is insufficient for comparing the relative amount of protein. In Figure 8, we have employed Western blotting to compare the protein levels, revealing an approximately 50% reduction in NCAN and TNC following hyaluronidase digestion. In Figures 7b and 7c, we demonstrated alterations in the localization patterns of TNC and NCAN in Ncan KO and Tnc KO mice; however, we did not mention their quantity.

      • Page 7, line 206. "No significant abnormalities were observed in the laminar structure in 4-week-old DKO mice". The authors should be more careful with this statement since they did not check the lamination of the adult cortex. I would recommend staining control and mutant mice with markers of different cortical populations, such as Cux1, Ctip2, Tbr1, to assess this point.

      In response to the suggestion, we have conducted additional experiments to provide a more detailed examination of the laminar structure in the cerebral cortex. The results have been incorporated into the revised manuscript as follows:

      Page 7, line 209: "To investigate the laminar organization of the postnatal cerebral cortex, we analyzed the distribution of NeuN-positive postmitotic neurons in DKO mice at 2 weeks of age. No notable abnormalities were observed in the laminar structure of DKO mice (Figure 6-figure supplement 3a, b). Additionally, the laminar distribution of Ctip2-positive deep layer neurons showed no significant differences between WT and DKO mice (Figure 6-figure supplement 3a, c)."

      • The authors do not explain how they measured the intensity of TNC around the transfected Turbo-RFP-positive neurons.

      We added the following description to the Materials and Methods:

      Page 18, line 608: "Images were captured in the IZ region containing Turbo-RFP-positive neurons using a 100X magnification objective lens with 3.0X optical zoom on an AX R confocal microscope (Nikon). A total of 10 optical sections were acquired with a step size of 190 nm. Z-projection views were generated, and the staining intensity of TNC around Turbo-RFP-positive neurons was measured in a 59 × 59 µm area using ImageJ FIJI."
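
      The acquisition and measurement steps in this added Methods text can be sketched as follows. This is an illustration under stated assumptions: the projection type is not specified in the response (a maximum-intensity projection is assumed here), the step size is taken to be the spacing between adjacent optical sections, and the function names and toy data are hypothetical. The actual measurement was performed in ImageJ FIJI.

```python
def z_extent_um(n_sections: int, step_nm: float) -> float:
    """Axial distance between the first and last optical sections, in µm."""
    return (n_sections - 1) * step_nm / 1000.0


def max_projection(stack):
    """Maximum-intensity projection of a z-stack.

    stack: list of 2D images (lists of rows), indexed as stack[z][y][x].
    Returns a single 2D image with the per-pixel maximum over z.
    """
    return [[max(pixels) for pixels in zip(*rows)] for rows in zip(*stack)]


def mean_intensity(image, y0: int, x0: int, height: int, width: int) -> float:
    """Mean pixel value inside a rectangular region of interest."""
    values = [
        image[y][x]
        for y in range(y0, y0 + height)
        for x in range(x0, x0 + width)
    ]
    return sum(values) / len(values)


if __name__ == "__main__":
    # 10 sections at a 190 nm step, as stated in the Methods text.
    print(f"Axial extent: {z_extent_um(10, 190):.2f} µm")

    # Toy 2-section, 2x2-pixel stack with made-up intensities.
    toy_stack = [
        [[1, 2], [3, 4]],
        [[5, 0], [1, 9]],
    ]
    projection = max_projection(toy_stack)
    print("Projection:", projection)
    print("Mean ROI intensity:", mean_intensity(projection, 0, 0, 2, 2))
```

      Converting the 59 × 59 µm region into pixels would additionally require the image pixel size, which is not given in the response.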

      • The loading control of the western blots should be always included.

      In Figure 6-figure supplement 1, we have incorporated western blot data using a GAPDH antibody as a loading control. We have added an explanation in the figure legend of Figure 3c, stating that we analyzed the same samples as those used in Figure 1e.

      • For Fig. 3e, I think values are represented relative to E18 instead of P2.

      Thank you for pointing that out. As suggested, we have corrected the representation in Fig. 3e to be relative to E18 instead of P2.

      • I would recommend authors use the standard nomenclature for the embryonic stages. The detection of the vaginal plug is considered as E0.5 and therefore, half a day should be added to embryonic stages (E14.5...).

      We have revised our manuscript to designate the detection of the vaginal plug as E0.5, and subsequently, we have adjusted all embryonic stages by adding half a day, such as E14.5.

      • Fig 10K: I do not see the differences in the number of neurites in the graph.

      We have modified the presentation from a box-and-whisker plot to a bar graph to enhance the visibility of differences in the average number of neurites.

      • Line 37: Not all of the cerebral cortex is structured in 6 layers, only the neocortex.

      We have changed 'cerebral cortex' to 'cerebral neocortex.'

      Reviewer 2

      Summary:

      ECM components are prominent constituents of the pericellular environment of CNS cells and form complex and dynamic interactomes in the pericellular spaces. Based on bioinformatic analysis, more than 300 genes have been attributed to the so-called matrisome, many of which are detectable in the CNS. Yet, not much is known about their functions while increasing evidence suggests important contributions to developmental processes, neural plasticity, and inhibition of regeneration in the CNS. In this respect, the present work offers new insights and adds interesting aspects to the facets of ECM contributions to neural development. This is even more relevant in view of the fact that neurocan has recently been identified as a potential risk gene for neuropsychiatric diseases. Because ECM components occur in the interstitial space and are linked in interactomes, their study is very difficult. A strength of the manuscript is that the authors used several approaches to shed light on ECM function, including proteome studies, the generation of knockout mouse lines, and the analysis of in vivo labeled neural progenitors. This multi-perspective approach made it possible to reveal hitherto unknown properties of the ECM and highlighted its importance for the overall organization of the CNS.

      Strengths:

      Systematic analysis of the ternary complex between neurocan, TNC, and hyaluronic acid; establishment of KO mouse lines to study the function of the complex; use of in utero electroporation to investigate the impact on neuronal migration.

      We appreciate the reviewers' insightful comments.

      Weaknesses:

      The analysis is focused on neuronal progenitors, however, the potential impact of the molecules of interest, in particular, their removal on differentiation and /or survival of neural stem/progenitor cells is not addressed. The potential receptors involved are not considered. It also seems that rather the passage to the outer areas of the forming cortex is compromised, which is not the same as the migration process. The movement of the cells is not included in the analysis.

      In this study, we demonstrated that the ternary complex of NCAN, TNC, and HA is predominantly localized in the subplate/intermediate zone. This region lacks neural stem/progenitor cells but serves as the initiation site for the radial migration of postmitotic neurons. Consequently, our study focused on the role of the ternary complex in neuronal migration and polarity formation. We acknowledge that we did not investigate in-depth the potential effects of ECM perturbation on the differentiation and survival of neural stem/progenitor cells. However, as highlighted by the reviewer, it is important to explore the effects on neural stem/progenitor cells. To address this concern, we analyzed Pax6-positive radial glial cells and Tbr2-positive intermediate progenitor cells in the ventricular zone of wild-type and Ncan/Tnc double knockout (DKO) mice. Immunohistochemical analysis revealed no significant differences between WT and DKO mice (Figure 6-figure supplement 4a). Furthermore, the morphology of nestin-positive radial fibers exhibited no distinguishable variations between WT and DKO mice (Figure 6-figure supplement 4b, c).

      (1) In the description of the culture of cortical neurons the authors mentioned the use of 5% horse serum as a medium constituent. HS is a potent stimulus for astrocyte differentiation and astrocytes in vitro release neurocan. Therefore, the detection of neurocan in the supernatant of the cultures as shown in Figure 1h might as well reflect release by cultivated astrocytes.

      As pointed out by the reviewer, Figure 1h did not conclusively demonstrate that neurons are the sole source of NCAN production. Indeed, in situ hybridization analysis revealed the widespread distribution of Ncan mRNA throughout the cerebral cortex (Figure 2a). This result suggests that the production of NCAN involves not only neurons but also other cell populations, including radial glial cells and astrocytes. While we acknowledge the potential contribution of other cell types to NCAN production, Ncan expression by neurons during radial migration is a crucial aspect of our findings (Figure 1i, j). We have revised the manuscript as follows:

      Page 5, line 111: "This result suggested the secretion of NCAN by developing neurons; however, we cannot rule out the involvement of coexisting glial cells in the culture system. To investigate the expression of Ncan mRNA during radial migration in vivo, we labeled radial glial cells in the VZ with GFP through in utero electroporation at E14.5 (Figure 1i, Figure 1-figure supplement 1)."

      (2) It is known that neurocan in vivo is expressed by neurons, but may be upregulated in astrocytes after lesion, or in vitro, where the cells become reactive.

      We have incorporated the following description into the discussion:

      Page 11, line 359: "Previous studies have reported an upregulation of NCAN and TNC in reactive astrocytes, indicating the potential formation of the ternary complex of NCAN, TNC, and HA in the adult brain in response to injury (Deller et al., 1997; Haas et al., 1999)."

      (3) Do NCAN KO neurons show an increase in neurite growth on the TNC substrates? The response on POL was changed (Fig. 10h-k), but the ECM substrates were not tested with the KO neurons.

      The impact of ECM substrates on NCAN KO neurons has not been investigated, and this remains an avenue for further exploration in our ongoing research. Future studies aim to elucidate the NCAN-TNC connection by identifying TNC cell surface receptors and unraveling the subsequent intracellular signaling pathways.

      (4) Do the authors have an explanation for why the ternary complex is concentrated in the SP/IZ zone?

      In the mature brain, hyaluronan acts as a scaffold that facilitates the accumulation of ECM components, including proteoglycans and tenascins around neurons. Therefore, it is conceivable that the ECM components bind to hyaluronan in the embryonic brain, resulting in its accumulation in the subplate/intermediate zone. In support of this hypothesis, enzymatic digestion of hyaluronan in the subplate/intermediate zone led to the disappearance of TNC and NCAN accumulation (Figure 8a-c). This result may account for the disparity observed, where Tnc mRNA is expressed in the ventricular zone while the TNC protein localizes to the subplate/intermediate zone.

      (5) Are hyaluronic acid synthesizing complexes (HAS) concentrated in the SP/IZ?

      Following the reviewer's comment, we investigated the localization of Has2 and Has3 mRNA using in situ hybridization. However, because these enzymes are expressed at relatively low levels, we could not obtain clear signals (Author response image 1). Further research is needed to understand the mechanisms behind the localization of hyaluronan in the intermediate zone.

      Author response image 1.

      In situ hybridization analysis of Has2 and Has3 mRNA on the E16.5 cerebral cortex. Upper images show results of in situ hybridization using antisense probes against Has2 and Has3. Lower images show in situ hybridization using sense probes as negative controls.

      (6) CSPGs as well as TNC are part of the neural stem/progenitors cell niche environment. Does the removal of either of the ECM compounds affect the proliferation, differentiation, and/or survival of NSPCs, or their progeny?

      (7) This question relates to the fact that the migration process itself is not visualized in the present study, rather its outcome - the quantitative distribution of labeled neurons in the different bins of the analysis. This could also derive from modified cell numbers.

      As pointed out by the reviewer, previous studies have shown the role of CSPGs and TNC as components of the neural stem/progenitor cell niche (see reviews by Faissner et al., 2017; Faissner and Reinhard, 2015). However, as mentioned in Response #2, based on our analyses, we did not observe a reduction in neural stem/progenitor cells in NCAN/TNC double-knockout mice. While we cannot precisely explain this discrepancy, it is worth noting that many past studies evaluated the activities of the ECM molecules in in vitro systems such as neurospheres. The observed differences may stem from variations in experimental systems.

      (8) What is the role of the ECM in the SP/IZ area? Do the cells need the ECM to advance, the reduction would then leave the neuronal progenitors in the VZ area? This somehow contrasts with interpretations that the ECM acts as an obstacle for neurite growth or cell migration, or as a kind of barrier.

      The role of the ECM is multifaceted, with certain ECM molecules known to inhibit neurite outgrowth while others facilitate it. Additionally, the effects of ECM can vary depending on the cell type. It is established that after migrating neurons adhere to radial fibers, they utilize these fibers as a scaffold to migrate toward the cortical surface. However, in the subplate/intermediate zone, migrating neurons have not yet adhered to radial fibers. This study provides evidence that multipolar neurons undergo morphological changes into bipolar cells with the assistance of the NCAN, TNC, and HA complex. Subsequently, this facilitates their movement along radial fibers.

      (9) A direct visualization of the movement of neural progenitors in the tissue as has been for example performed by the Kriegstein laboratory might help resolve some of these issues.

      As suggested by the reviewer, utilizing live imaging techniques to directly observe the movement of neural progenitors within the tissue is indeed a powerful tool. We recognize the significance of addressing these points in future research.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Zhang et al. investigated the relationship between monocular and binocular responses of V1 superficial-layer neurons using two-photon calcium imaging. They found a strong relationship in their data: neurons that exhibited a greater preference for one eye or the other (high ocular dominance) were more likely to be suppressed under binocular stimulation, whereas neurons that are more equivalently driven by each eye (low ocular dominance) were more likely to be enhanced by binocular stimulation. This result chiefly demonstrates the relationship between ocular dominance and binocular responses in V1, corroborating what has been shown previously using electrophysiological techniques but now with greater spatial resolution (albeit less temporal resolution). The binocular responses were well-fitted by a model that institutes divisive normalization between the eyes that accounts for both the suppression and enhancement phenomena observed in the subpopulation of binocular neurons. In so doing, the authors reify the importance of incorporating ocular dominance in computational models of binocular combination.

      The conclusions of this paper are mostly well supported by the data, but there are some limitations of the methodology that need to be clarified, and an expansion of how the results relate to previous work would better contextualize these important findings in the literature.

      Strengths:

      The two-photon imaging technique used to resolve the activity of individual neurons within intact brain tissue grants a host of advantages. Foremost, two-photon imaging confers considerably high spatial resolution. As a result, the authors were able to sample and analyze the activity from thousands of verified superficial-layer V1 neurons. The animal model used, awake macaques, is also highly relevant for the study of binocular combination. Macaques, like humans, are binocular animals, meaning they have forward-facing eyes that confer overlapping visual fields. Importantly, macaque V1 is organized into cortical columns that process specific visual features from the separate eyes just like in humans. In combination with a powerful imaging technique, this allowed the authors to evaluate the monocular and binocular response profiles of V1 neurons that are situated within neighboring ocular dominance columns, a novel feat. To this aim, the approach was well-executed and should instill further confidence in the notion that V1 neurons combine monocular information in a manner that is dependent on the strength of their ocular dominance.

      Weaknesses:

      While two-photon imaging provides excellent spatial resolution, its temporal resolution is often lower compared to some other techniques, such as electrophysiology. This limits the ability to study the fast dynamics of neuronal activity, a well-understood trade-off of the method. The issue is more so that the authors draw comparisons to electrophysiological studies without explicit appreciation of the temporal difference between these techniques. In a similar vein, two-photon imaging is limited spatially in terms of cortical depth, preferentially sampling from neurons in layers 2/3. This limitation does not invalidate any of the interpretations but should be considered by readers, especially when making comparisons to previous electrophysiological reports using microelectrode linear arrays that sample from all cortical layers. Indeed, it is likely that a complete picture of early cortical binocular processing will require high spatial resolution (i.e., sampling from neurons in neighboring ocular dominance columns, from pia mater to white matter) at the biophysically relevant timescales (1ms resolution, capturing response dynamics over the full duration of the stimulus presentation, including the transient onset and steady-state periods).

      To address the same concern from all three reviewers, we discussed the technical limitations of two-photon calcium imaging at the end of the Discussion, including limited imaging depth, low temporal resolution, and nonlinearity. The relevant text is copied here:

      (Ln 304) “Limitations of the current study

      Although capable of sampling a large number of neurons at cellular resolution and with low sampling bias, two-photon calcium imaging has its known limitations that may better make it a complementary research tool to electrophysiological recordings.

      For example, two-photon imaging can only sample neurons from superficial-layers, while binocular neurons also exist in deeper layers, and even neurons in the input layer are affected by feedback from downstream binocular neurons to exhibit binocular response properties (Dougherty, Cox, Westerberg, & Maier, 2019). Furthermore, calcium signals are relatively slow and cannot reveal the fast dynamics of neuronal responses. Due to these spatial and temporal limitations, a more complete picture of the neuronal mechanisms underlying binocular combination of monocular responses may come from studies using both technologies.

      In addition, calcium signals may exaggerate the nonlinear properties of neurons. Although calcium signals indicated by GCaMP5, our favored choice of calcium indicator, displays a linear relationship to neuronal spike rates within a range of 10-150 Hz (Li, Liu, Jiang, Lee, & Tang, 2017), weak and strong signals out of this range are more nonlinear, and may appear poorer and stronger, respectively, than electrode-recorded effects. Consequently, the differences in population responses between monocular and binocular stimulations revealed by this study might be less pronounced.”

      (Recommendations For The Authors):

      Overall, my main suggestion for the authors to improve the paper is to revise some of the interpretations of their results in relation to previous research. The purpose of the present study was to illustrate a more complete picture of the binocular combination of monocular responses by taking into consideration the ocular dominance of V1 cells (lines 34-36). A study published earlier this year had an identical purpose (Mitchell et al., Current Biology, 2023) and arrived at a highly similar conclusion (and also applied divisive normalization to fit their data). I would ask that this paper be mentioned in the introduction and discussed.

      The Mitchell et al. (2023) paper has been added to the Introduction and Discussion:

      (Ln 50) “In addition (to the Dougherty et al 2019 paper from the same group), Mitchell, Carlson, Westerberg, Cox, and Maier (2023) reported that binocular combination of monocular stimuli with different contrasts is also affected by neurons’ eye preference.”

      (Ln 286) “The critical roles of ocular dominance have been largely overlooked by extant binocular vision models to our knowledge, except that Anderson and Movshon (1989) demonstrated that a model consisting of multiple ocular dominance channels can better explain their psychophysical adaptation data, and that Mitchell et al. (2023) revealed that binocular combination of different contrasts presented to different eyes are affected by neurons’ ocularity preference.”

      Nevertheless, the results of the present study are very valuable. They add substantial spatial resolution and sophisticated relational analysis of monocular and binocular responses that Mitchell et al., 2023 did not include. Therefore, my suggestion is to emphasize the advantages of two-photon imaging in the introduction, focusing on the ability to image neurons in neighboring ocular dominance columns. The rigorous modeling of the relationship between nearby neurons with a range of eye preferences, in tandem with the incredible yield of two-photon imaging, is what sets this paper apart from previous electrophysiological work.

      The finding that binocular responses were dependent on ocular dominance is largely consistent with previous electrophysiological results. However, there should be a paragraph in the discussion section that speaks to the limitations of comparing two-photon imaging data to electrophysiological data. Namely, there are two limitations:

      (1) These two techniques confer different temporal resolutions. It is conceivable that some of the electrophysiology relationships (for example, described by Dougherty et al., 2019) may be dependent on the temporal window over which the data was averaged, typically over 50-100ms around stimulus onset, or 100-250ms comprising the neurons' sustained response to the stimulus. This possible explanation of the difference in obtained results would be especially useful for the discussion paragraph starting at line 232. It would also be helpful to readers for there to be some mention of the advantage of having high temporal resolution (i.e., the benefits of electrophysiology) since (a) recent work has distinguished between sequential stages of binocular combination (Cox et al., 2019) and (b) modern models of V1 neurons emphasize recurrent feedback to explain V1 temporal dynamics (see Heeger et al., 2019; Rubin et al., 2015), which could prove to be relevant for combination of stimuli in the two eyes (Fleet et al., 1997).

      Our discussion of the technical limitations of 2-p calcium imaging is listed earlier. Specific to the Dougherty et al. (2019) paper, we added the following discussion to address the difference in temporal resolution between the two technologies.

      (Ln 266) “In addition, it is unclear whether the discrepancies are caused by different temporal resolutions of electrode recording and calcium imaging. The results of Dougherty et al. (2019) represent changes of neuronal spike activities over a period of approximately 50-200 ms after the stimulus onset, which may reflect the sustained neuronal responses to the stimulus and possible feedback signals. Calcium signals are much slower and indicative of the aggregated neuronal responses over a longer period (up to 1000 ms in the current study). They should have smeared, rather than exaggerated, the differences between monocular and binocular responses, although we cannot exclude the possibility that some neuronal response changes beyond 200 ms are responsible for the discrepancies.”

      (2) The sample of V1 neurons in this study is limited to cells in the most superficial layers of the cortex (layers 2/3). This limitation is, of course, well understood, but it should be mentioned at least in the context of studying the formative mechanisms of binocular combination in V1 (since we know that binocular neurons also exist in layers 5/6, and there is now substantial evidence that even layer 4 neurons are not as "monocular" as we previously thought (Dougherty et al., 2019)).

      See our discussion regarding the technical limitations of 2-p calcium imaging listed earlier.

      In short, I believe the paper would be improved by (1) adding the above citations in the appropriate places, (2) acknowledging in the introduction that this question has been investigated electrophysiologically but emphasizing the advantages of two-photon imaging, and (3) adding a paragraph to the discussion section that discusses the temporal and spatial limitations when using two-photon imaging to study binocular combination, particularly when comparing the results to electrophysiology.

      Reviewer #2 (Public Review):

      Summary:

      This study examines the pattern of responses produced by the combination of left-eye and right-eye signals in V1. For this, they used calcium imaging of neurons in V1 of awake, fixating monkeys. They take advantage of calcium imaging, which yields large populations of neurons in each field of view. With their data set, they observe how response magnitude relates to ocular dominance across the entire population. They analyze carefully how the relationship changed as the visual stimulus switched from contra-eye only, ipsi-eye only, and binocular. As expected, the contra-eye-dominated neurons responded strongly with a contra-eye-only stimulus. The ipsi-eye-dominated neurons responded strongly with an ipsi-eye-only stimulus. The surprise was responses to a binocular stimulus. The responses were similarly weak across the entire population, regardless of each neuron's ocular dominance. They conclude that this pattern of responses could be explained by interocular divisive normalization, followed by binocular summation.

      Strengths:

      A major strength of this work is that the model-fitting was done on a large population of simultaneously recorded neurons. This approach is an advancement over previous work, which did model-fitting on individual neurons. The fitted model in the manuscript represents the pattern observed across the large population in V1, and washes out any particular property of individual neurons. Given the large neuronal population from which the conclusion was drawn, the authors provide solid evidence supporting their conclusion. They also observed consistency across 5 fields of view.

      The experiments were designed and executed appropriately to test their hypothesis. Their data support their conclusion.

      Weaknesses:

      One weakness of their study is that calcium signals can exaggerate the nonlinear properties of neurons. Calcium imaging renders poor responses poorer and strong responses stronger, compared to single-unit recording. In particular, the dramatic change in the population response between monocular stimulation and binocular stimulation could actually be less pronounced when measured with single-unit recording methods. This means their choice of recording method could have accidentally exaggerated the evidence of their finding.

      We discussed the nonlinearity of calcium signals as part of the technical limitations of 2-p calcium imaging. The calcium indicator we use, GCaMP5, has a reasonable range of linear relationship with spike rates, but outside this range the nonlinearity is indeed a concern.

      (Ln 314) “In addition, calcium signals may exaggerate the nonlinear properties of neurons. Although signals indicated by GCaMP5, our favored choice of calcium indicator, displays a linear relationship to neuronal spike rate within a range of 10-150 Hz (Li et al., 2017), weak and strong signals out of this range are more nonlinear, and may appear poorer and stronger, respectively, than electrode-recorded effects. Consequently, the changes in population responses between monocular and binocular stimulations revealed by this study might be less pronounced.”

      The implication of their finding is that strong ocular dominance is the result of release from interocular suppression by a monocular stimulus, rather than the lack of binocular combination as many traditional studies have assumed. This could significantly advance our understanding of the binocular combination circuitry of V1. The entire population of neurons could be part of a binocular combination circuitry present in V1.

      This is a very good insight. We added the following sentences to the end of the first paragraph of Discussion:

      (Ln 242) “These findings implicate that at least for neurons in superficial layers of V1, significant ocular dominance may result from a release of interocular suppression during monocular stimulation, an unusual viewing condition as our vision is typically binocular, rather than a lack of binocular combination of inputs from upstream monocular neurons.”

      (Recommendations For The Authors):

      Line 150: "To model interocular response suppression, responses from each eye in Eq. 2 were further normalized by an interocular suppression factor wib or wcb," I recommend the authors improve their explanation of how they arrived at Eq. 3 from Eq. 2. As it stands, my impression is that they have one model for the responses to monocular stimulation, and another model for the responses to binocular stimulation. What I think is missing is that both equations are derived from the same model. Monocular stimulation is a situation in which the stimulus in one eye's contrast is zero. Could the authors clarify whether this situation produces an interocular suppression of zero, and how that leads to Eq. 2?

      We rewrote the modeling part to show that Equations 1-3 are sequential steps of development for the same model. We also added a brief paragraph to discuss how Eq. 3 could lead to Eq. 2 under monocular viewing:

      (Ln 166) “Although not shown in Eq. 3, we also assumed that the nonlinear exponent b also depends on the contrast of the stimulus presented to the other eye (i.e., Sc or Si). Consequently, when Sc or Si = 0 under monocular stimulation, Rc or Ri = 0 (Eq. 1), and interocular suppression wib or wcb = 1, so Eq. 3 changes back to Eq. 2. It is only when Sc and Si are equal and close to 1, as in the current study, that interocular suppression and binocular combination would be in the current Eq. 3 format.”

      Line 225: "However, individually, compared to monocular responses, responses of monocular neurons more preferring the stimulated eye are actually suppressed, and only responses of binocular neurons are increased by binocular stimulation." This sentence is difficult to follow. I recommend the authors improve clarity by breaking up the sentence into several sentences. If I understand correctly, they summarize the pattern in the data that is indicative of interocular divisive normalization, i.e., their final conclusion.

      This sentence no longer exists in the Discussion.

      Line 426: "Third, for those showing significant orientation difference, the trial-based orientation responses of each neuron were fitted with a Gaussian model with a MATLAB nonlinear least squares function:" The choice of using a Gaussian function to fit orientation tuning was probably suboptimal. A Gaussian function provides an adequate fit only for neurons whose tuning is very sharp. The responses outside of the peak fall down to the baseline and the two ends meet. Otherwise, the two ends do not meet. An adequate fit would be achieved with a function of a circular variable, which wraps around 180 deg. I recommend using a Von Mises function for fitting orientation tuning.

      We agree with the reviewer that the Von Mises function is more accurate than the Gaussian for fitting orientation tuning functions. Indeed, we are using it to fit the orientation tuning of V4 neurons, many of which have two peaks. For the current V1 data, the differences between Von Mises and Gaussian fits are very small, as shown in the orientation functional maps from three macaques below. Because we also use the same Gaussian fitting of orientation tuning in several published papers and papers currently under review, we prefer to keep the Gaussian fitting results in the manuscript.

      Author response image 1.
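
      The reviewer's point about circular tuning can be illustrated with a minimal sketch. This is not the fitting code used in the study (which relied on a MATLAB nonlinear least squares function); the function names, parameterization, and parameter values below are hypothetical. The sketch compares a von Mises-style tuning curve with a 180-degree period against a Gaussian evaluated on the unwrapped orientation axis, and shows that only the former returns the same response at 0 and 180 degrees when tuning is broad.

```python
import math

def von_mises_tuning(theta_deg, pref_deg, kappa, amplitude=1.0, baseline=0.0):
    """Orientation tuning with a 180-degree period (hypothetical parameterization)."""
    # Doubling the angle difference maps the 180-degree orientation cycle onto 360 degrees.
    delta = math.radians(2.0 * (theta_deg - pref_deg))
    return baseline + amplitude * math.exp(kappa * (math.cos(delta) - 1.0))

def gaussian_tuning(theta_deg, pref_deg, sigma_deg, amplitude=1.0, baseline=0.0):
    """Gaussian on the unwrapped orientation axis (no wrap-around; an assumption of this sketch)."""
    return baseline + amplitude * math.exp(-((theta_deg - pref_deg) ** 2) / (2.0 * sigma_deg ** 2))

# Broadly tuned hypothetical neuron preferring 30 degrees.
vm_ends = (von_mises_tuning(0.0, 30.0, kappa=0.5),
           von_mises_tuning(180.0, 30.0, kappa=0.5))
gauss_ends = (gaussian_tuning(0.0, 30.0, sigma_deg=60.0),
              gaussian_tuning(180.0, 30.0, sigma_deg=60.0))

print("von Mises at 0 and 180 deg:", vm_ends)    # the two ends meet
print("Gaussian  at 0 and 180 deg:", gauss_ends)  # the two ends differ
```

      For sharply tuned cells both curves fall to baseline at the ends and the two fits nearly coincide, which is consistent with the small differences the authors report for their V1 data.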

      Reviewer #3 (Public Review):

      The authors have made simultaneous recordings of the responses of large numbers of neurons from the primary visual cortex using optical two-photon imaging of calcium signals from the superficial layers of the cortex. Recordings were made to compare the responses of the cortical neurons under normal binocular viewing of a flat screen with both eyes open and monocular viewing of the same screen with one eye's view blocked by a translucent filter. The screen displayed visual stimuli comprising small contrast patches of Gabor function distributions of luminance, a stimulus that is known to excite cortical neurons.

      This is an important data set, given the large numbers of neurons recorded. The authors present a simple model to explain the binocular combination of neuronal signals from the right and left eyes.

      The limitations of the paper as written are as follows. These points can be addressed with some additional analysis and rewriting of sections of the paper. No new experimental data need to be collected.

      (1) The authors should acknowledge the fact that these recordings arise from neurons in the superficial layers of the cortex. This limitation arises from the usual constraints on optical imaging in the macaque cortex. This means that the sample of neurons forming this data set is not fully representative of the population of binocular neurons within the visual cortex. This limitation is important in comparing the outcome of these experiments with the results from other studies of binocular combination, which have used single-electrode recording. Electrode recording will result in a sample of neurons that is drawn from many layers of the cortex, rather than just the superficial layers.

      See our discussion regarding the technical limitations of 2-p calcium imaging listed earlier.

      (2) Single-neuron recording of binocular neurons in the primary visual cortex has shown that these neurons often have some spontaneous activity. Assessment of this spontaneous level of firing is important for accurate model fitting [1]. The paper here should discuss the level of spontaneous neuronal firing and its potential significance.

      We have noticed previously that at non-optimal spatial frequencies, calcium responses to a moving Gabor grating are close to zero (Guan et al., Prog Neurobiology, 2021, Fig. 1B), but we cannot tell whether this is due to calcium response nonlinearity or to a close-to-zero level of spontaneous neuronal activity. Prince et al. (2002) reported low spontaneous responses of V1 neurons with moving grating stimuli (e.g., about 3 spikes/sec in one exemplar neuron, their Fig. 1B), so spontaneous activity does not appear to have a large effect. In our data fitting, we do have an orientation-unspecific component in the Gaussian model, which represents the neuronal response at a non-preferred orientation, but not necessarily the spontaneous activity.

      (3) The arrangements for visual stimulation and comparison of binocular and monocular responses mean that the stereoscopic disparity of the binocular stimuli is always at zero or close to zero. The animal's fixation point is in the centre of a single display that is viewed binocularly. The fixation point is, by definition, at zero disparity. The other points on the flat display are also at zero disparity or very close to zero because they lie in the same depth plane. There will be some small deviations from exactly zero because the geometry of the viewing arrangements results in the extremities of the display being at a slightly different distance than the centre. Therefore, the visual stimulation used to test the binocular condition is always at zero disparity, with a slight deviation from zero at the edges of the display, and never changes. [There is a detail that can be ignored. The experimenters tested neurons with visual stimulation at different real distances from the eyes, but this is not relevant here. Provided the animals accurately converged their eyes on the provided binocular fixation point, then the disparity of the visual stimuli will always be at or close to zero, regardless of viewing distance in these circumstances.] However, we already know from earlier work that neurons in the visual cortex exhibit a range of selectivity for binocular disparity. Some neurons have their peak response at non-zero disparities, representing binocular depths nearer than the fixation depth or beyond it. The response of other neurons is maximally suppressed by disparities at the depth of the fixation point (so-called Tuned Inhibitory [TI] neurons). 
      The simple model and analysis presented in the paper for the summation of monocular responses to predict binocular responses will perform adequately for neurons that are tuned to zero disparity, so-called tuned excitatory neurons [TE], but is necessarily compromised when applied to neurons that have other, different tuning profiles. Specifically, when neurons are stimulated binocularly with a non-preferred disparity, the binocular response may be lower than the monocular response [2, 3]. This more realistic view of binocular responses needs to be considered by the authors and integrated into their modelling.

      We agree and have included the following text when discussing future work:

      (Ln 298) “In addition, in our experiments, binocular stimuli were presented with zero disparity, which best triggered the responses of neurons with zero-disparity tuning. A more realistic model of binocular combination also requires the consideration of neurons with other disparity-tuning profiles.”

      (4) The data in the paper show some features that have been reported before but are not captured by the model. Notably for neurons with extreme values of ocular dominance, the binocular response is typically less than the larger of the two monocular responses. This is apparent in the row of plots in Figure 2D from individual animals and in the pooled data in Figure 2E. Responses of this type are characteristic of tuned inhibitory [TI] neurons[2]. It is not immediately clear why this feature of the data does not appear in the summary and analysis in Figure 3.

      This difference is indeed captured by the model, which can be more easily appreciated in Fig. 4A where monocular and binocular model simulations are plotted in the same panel. In the text, we also wrote: (Ln 195) “It is apparent that binocular responses cannot be explained by the sum of monocular responses, as binocular responses are substantially lower than the summed monocular responses for both monocular and binocular neurons. Nor can binocular responses be explained by the responses to the preferred eye, as binocular responses are also lower than those to the preferred eye (the larger of the two monocular responses) for monocular neurons.”

      The paper text states that the responses were "first normalized by the median of the binocular responses". This will certainly get rid of this characteristic of the data, but this step needs better justification, or an amendment to the main analysis is needed.

      The relevant sentence has been rewritten as “Monocular and binocular data of each FOV/depth, as well as the pooled data, were first normalized by the respective median of the binocular responses of all neurons in the same FOV/depth.” This normalization brings the overall binocular responses to around unity, which facilitates comparisons among FOVs/depths, but it does not affect the overall characteristics of the data.
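
      The effect of this normalization can be sketched as follows. The function name and the response values are hypothetical and purely illustrative; the sketch only assumes, as stated above, that every response in one FOV/depth is divided by the median binocular response of that FOV/depth. Because the divisor is a single scalar, the ratio between any neuron's monocular and binocular responses is unchanged.

```python
from statistics import median

def normalize_by_binocular_median(contra, ipsi, bino):
    """Divide all responses from one FOV/depth by the median binocular response."""
    scale = median(bino)
    if scale == 0:
        raise ValueError("median binocular response is zero; cannot normalize")
    return ([r / scale for r in contra],
            [r / scale for r in ipsi],
            [r / scale for r in bino])

# Hypothetical responses of three neurons (arbitrary units).
contra = [8.0, 4.0, 1.0]
ipsi = [1.0, 3.0, 7.0]
bino = [2.0, 4.0, 6.0]

n_contra, n_ipsi, n_bino = normalize_by_binocular_median(contra, ipsi, bino)

# The median binocular response becomes 1 after normalization.
print(median(n_bino))
# The monocular-to-binocular ratio of the first neuron is the same before and after.
print(contra[0] / bino[0], n_contra[0] / n_bino[0])
```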

      In the present form, the model and analysis do not appear to fit the data in Figure 2 as accurately as needed.

      Thanks for pointing out the problem; the fits for FOV C_270 and the pooled data were especially inaccurate. The issue was mostly fixed by weighting each datum by its standard deviation (please see the updated Fig. 3).
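
      The response does not spell out how the standard deviation enters the fit. One common reading, assumed here and not confirmed by the text, is standard weighted least squares, in which each residual is divided by the datum's standard deviation so that noisier data points pull less on the fit. A minimal sketch of that objective, with hypothetical names and values:

```python
def weighted_sse(observed, predicted, sds):
    """Sum of squared residuals, each divided by the datum's standard deviation.

    Assumption of this sketch: inverse-SD weighting, as in standard weighted least squares.
    """
    if not (len(observed) == len(predicted) == len(sds)):
        raise ValueError("inputs must have equal length")
    return sum(((o - p) / s) ** 2 for o, p, s in zip(observed, predicted, sds))

# Hypothetical data: the third point is far from the model but also very noisy.
observed = [1.0, 2.0, 10.0]
predicted = [1.0, 2.0, 3.0]

unweighted = weighted_sse(observed, predicted, [1.0, 1.0, 1.0])
weighted = weighted_sse(observed, predicted, [1.0, 1.0, 7.0])
print(unweighted, weighted)
```

      Under this assumption, the noisy third point dominates the unweighted cost but contributes far less once its standard deviation is taken into account.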

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      The manuscript by Rios et al. investigates the potential of GSK3 inhibition to reprogram human macrophages, exploring its therapeutic implications in conditions like severe COVID-19. The authors present convincing evidence that GSK3 inhibition shifts macrophage phenotypes from pro-inflammatory to anti-inflammatory states, thus highlighting the GSK3-MAFB axis as a potential therapeutic target. Using both GM-CSF- and M-CSF-dependent monocyte-derived macrophages as model systems, the study provides extensive transcriptional, phenotypic, and functional characterizations of these reprogrammed cells. The authors further extend their findings to human alveolar macrophages derived from patient samples, demonstrating the clinical relevance of GSK3 inhibition in macrophage biology.

      The experimental design is sound, leveraging techniques such as RNA-seq, flow cytometry, and bioenergetic profiling to generate a comprehensive dataset. The study's integration of multiple model systems and human samples strengthens its impact and relevance. The findings not only offer insights into macrophage plasticity but also propose novel therapeutic strategies for macrophage reprogramming in inflammatory diseases.

      Strengths:

      (1) Robust Experimental Design: The use of both in vitro and ex vivo models adds depth to the findings, making the conclusions applicable to both experimental and clinical settings.

      (2) Thorough Data Analysis: The extensive use of RNA-seq and gene set enrichment analysis (GSEA) provides a clear transcriptional signature of the reprogrammed macrophages.

      (3) Relevance to Severe COVID-19: The study's focus on macrophage reprogramming in the context of severe COVID-19 adds clinical significance, especially given the relevance of macrophage-driven inflammation in this disease.

      Weaknesses:

      There are no significant weaknesses in the study, though some minor points could be addressed for clarity and completeness, as outlined in the recommendations below.

      Many thanks for these comments. Please find below the response to the specific recommendations.

      Recommendations for the authors:

      (1) In lines 263-266, the term "MoMac-VERSE" and its associated clusters are introduced without sufficient explanation. The authors should provide additional clarification on what these clusters represent and how they were derived.

      We have revised the text according to the reviewer's suggestion and followed the original nomenclature of the MoMac-VERSE monocyte/macrophage clusters, also recognizing the procedure for their identification. The newly modified text now states: "Thus, analysis of the MoMac-VERSE (a resource that identified conserved monocyte and macrophage states derived from healthy and pathologic human tissues) (GSE178209) (2), indicated that GSK3 inhibition augments the expression of the gene sets that define MoMac-VERSE subsets identified as long-term resident macrophages [Cluster HES1_Mac (#2)] and tumor-associated macrophages with an M2-like signature [Clusters HES1_Mac (#2), TREM2_Mac (#3), C1Q<sup>hi</sup>_Mac (#16) and FTL_Mac (#17)] (2) (Figure 1H)."

      (2) In line 283, the reference labeled "2227" appears incorrect. It seems to be a formatting issue, and it might refer to references 22-27. Please verify and correct.

      All wrongly formatted references throughout the manuscript have been checked and corrected.

      (3) In line 353, the reference is incorrect. Please review and ensure that all references are properly cited throughout the manuscript.

      All wrongly formatted references throughout the manuscript have been checked and corrected.

      (4) In line 368, one of the patient samples shows a decreased IL-10 response after CHIR treatment. The authors should acknowledge the heterogeneity in the primary cell responses and adjust the conclusion accordingly to reflect this variability.

      We have modified the text following the reviewer's comment, and acknowledge the heterogeneity in the production of IL-10 after GSK3 inhibition in the three analyzed samples. The modified text now states: "Consistent with these findings, CHIR-AMØ exhibited higher expression of MAFB (Figure 6F) whose increase correlated with an augmented secretion of Legumain, CCL2 and IL-10 (Figure 6G), although the latter was only seen in two samples, probably reflecting heterogeneity in primary cell responses."

      (5) Figure 7B: the UMAP shows 4 populations, but according to the visualization in the sup fig 3, there should be many more clusters. How do the authors explain this? Are these patient-specific clusters? Also, IMs can be separated into at least subpopulations. Can the authors plot also bona fide macrophage markers expressed by all subpopulations?

      To clarify this whole issue, and avoid misleading visualization of donor-specific clusters (see below), we have now replaced all UMAP plots shown in the previous version (in old Figure 7 and old Supplementary Figure 3) with new UMAP plots after running scVI reduction. In addition, we are including a new Supplementary Figure (new Supplementary Figure 3) that contains the information of the 21310 single-cell transcriptomes from human lungs reported in GSE128033 (ref. 47) after filtering and integration [nFeature > 200 and < 6000; Unique Molecular Identifiers (nCount) > 1000; and % of mitochondrial genes < 15 %]. Besides, old Supplementary Figure 3 has been replaced by the new Supplementary Figure 4, which includes the information of the single-cell transcriptomes from human lung macrophages selected from GSE128033 (ref. 47) based on their expression of the monocyte/macrophage-associated markers CD163, FABP4, LYVE1 or FCN1.
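
      The filtering thresholds quoted above can be written as a simple per-cell predicate. The sketch below is only an illustration in plain Python: the authors ran their analysis in Seurat v5, and the function name, field names and example cells here are hypothetical.

```python
def passes_qc(n_feature, n_count, percent_mito):
    """Return True if a cell satisfies the thresholds quoted in the response.

    n_feature    : number of detected genes (kept if > 200 and < 6000)
    n_count      : number of UMIs (kept if > 1000)
    percent_mito : percentage of mitochondrial reads (kept if < 15)
    """
    return 200 < n_feature < 6000 and n_count > 1000 and percent_mito < 15.0


# Hypothetical cells, not taken from GSE128033.
cells = [
    {"id": "cell_a", "n_feature": 2500, "n_count": 8000, "percent_mito": 4.2},
    {"id": "cell_b", "n_feature": 150, "n_count": 900, "percent_mito": 2.0},
    {"id": "cell_c", "n_feature": 3000, "n_count": 12000, "percent_mito": 22.5},
    {"id": "cell_d", "n_feature": 6500, "n_count": 40000, "percent_mito": 5.0},
]

kept = [
    c["id"]
    for c in cells
    if passes_qc(c["n_feature"], c["n_count"], c["percent_mito"])
]
```

      In this toy example only cell_a is retained: cell_b has too few genes and UMIs, cell_c exceeds the mitochondrial cutoff, and cell_d exceeds the upper gene-count bound.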

      Addressing the first question, UMAPs in old Figure 7B and old Supplementary Figure 3B had a different number of clusters because old Figure 7B was derived from old Supplementary Figure 3B after grouping macrophage clusters according to the expression of previously defined markers and to limit the weight of donor-specific clusters. Specifically, the macrophage clusters from old Figure 7B were re-grouped according to the differential expression of:

      - FCN1 (including cluster 4, 7 and 12 from Figure 7B): Infiltrating monocytes.

      - FABP4 and TYMS-negative (including clusters 0, 2, 5 and 13 from Figure 7B), or MARCO and INHBA (cluster 9 from Figure 7B) or PPARG (cluster 11 from Figure 7B): Alveolar macrophages (AMØ).

      - TYMS, MKI67, TOP2A and NUSAP1 (cluster 15 from Figure 7B): Proliferating AMØ.

      - LYVE1 or RNASE1 or LGMN (including clusters 1, 3, 6, 8, 10 and 14 from Figure 7B): Interstitial Macrophages (IMØ).
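
      The regrouping listed above amounts to a lookup from cluster number to population label. The following sketch encodes the cluster numbers exactly as given in the list; the dictionary and function names are hypothetical and the code is not taken from the authors' pipeline.

```python
# Cluster numbers as listed in the response, grouped by assigned population.
GROUP_TO_CLUSTERS = {
    "Infiltrating monocytes": [4, 7, 12],   # FCN1
    "AMØ": [0, 2, 5, 13, 9, 11],            # FABP4 (TYMS-negative), MARCO/INHBA, PPARG
    "Proliferating AMØ": [15],              # TYMS, MKI67, TOP2A, NUSAP1
    "IMØ": [1, 3, 6, 8, 10, 14],            # LYVE1, RNASE1 or LGMN
}

# Invert the mapping so each cluster number points to its population.
CLUSTER_TO_GROUP = {
    cluster: group
    for group, clusters in GROUP_TO_CLUSTERS.items()
    for cluster in clusters
}


def regroup(cluster_ids):
    """Translate per-cell cluster numbers into the four population labels."""
    return [CLUSTER_TO_GROUP[c] for c in cluster_ids]


labels = regroup([4, 15, 9, 1])
```

      Taken together, the four groups cover clusters 0 to 15 without overlap, which is why sixteen original clusters collapse into four populations.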

      As the reviewer suggested, this type of UMAP plot yielded a large number of donor-specific clusters. To avoid such a misleading representation, we have now plotted UMAPs after running scVI reduction in every case. The new plots are now shown in new Figure 7A, new Figure 7B, new Supplementary Figure 3 (containing the information of the 21310 single-cell transcriptomes from GSE128033) and the novel Supplementary Figure 4 (with the information of the single-cell transcriptomes from human lung macrophages from GSE128033).

      Finally, to address the last issue, we have now plotted the expression of genes used for macrophage definition (CD163, FABP4, LYVE1, FCN1), as well as proliferation-associated genes (TYMS, MKI67, TOP2A, NUSAP1) and other bona fide macrophage marker genes (SPI1, FOLR2) in Supplementary Figure 4C.

      (6) statistics should be indicated in every figure legend and for every subfigure where applicable.

      We have now included the specific statistical procedure applied for each Figure and panel.

      Reviewer 2 (Public review):

      The study by Rios and colleagues provides the scientific community with a compelling exploration of macrophage plasticity and its potential as a therapeutic target. By focusing on the GSK3-MAFB axis, the authors present a strong case for macrophage reprogramming as a strategy to combat inflammatory and fibrotic diseases, including severe COVID-19. Using a robust and comprehensive methodology, the study conducts broad transcriptomic and functional analyses and offers valuable mechanistic insights while highlighting its clinical relevance.

      Strengths:

      Well performed and analyzed

      Weaknesses:

      Additional analyses, including mechanistic studies, would increase the value of the study

      In an effort to address the comment of the reviewer, we have performed more detailed analysis of the kinetics and dose-response effects of GSK3 inhibition, which are now provided as new Supplementary Figure 3A.

      Regarding additional mechanistic studies, we decided to explore the relationship between inactive GSK3β and MAFB levels at the early stages of M-CSF- or GM-CSF-driven monocyte-to-macrophage differentiation. These experiments, performed in three independent monocyte preparations, indicated that, 48 hours into differentiation, M-CSF promoted a huge increase in MAFB expression and a slight (albeit significant) rise in inactive GSK3β (P-Ser9-GSK3β) (compared to either untreated or GM-CSF-treated monocytes), further supporting the macrophage re-programming effect of GSK3. However, since the M-CSF-promoted increase in MAFB levels was much more robust than the enhancement in inactive GSK3β, we hypothesize that proteasomal degradation of MAFB might also differ between M-CSF- (M-MØ) and GM-CSF-dependent (GM-MØ) monocyte-derived macrophages.

      Author response image 1.

      Total GSK3β, p-Ser9-GSK3β and MAFB levels in three preparations of freshly purified monocytes either unstimulated (-) or stimulated with M-CSF (10 ng/ml) or GM-CSF (1,000 U/ml) at different time points, as determined by Western blot (upper panel). Vinculin protein levels were determined as protein loading control. Mean ± SEM of the GSK3β/Vinculin, p-Ser9-GSK3β/Vinculin, and MAFB/Vinculin protein ratios from the three independent experiments are shown (lower panel) (paired Student’s t test: *, p<0.05; ****, p<0.001).
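
      The summary statistics named in this legend (mean ± SEM and a paired Student's t test) can be sketched with the Python standard library as below. The ratios are hypothetical and the function names are illustrative; obtaining the p value additionally requires the t distribution with n - 1 degrees of freedom, for example from a statistics package such as SciPy (scipy.stats.ttest_rel).

```python
from math import sqrt
from statistics import mean, stdev


def mean_sem(values):
    """Return the mean and the standard error of the mean (sample SD / sqrt(n))."""
    return mean(values), stdev(values) / sqrt(len(values))


def paired_t_statistic(control, treated):
    """Return the paired t statistic and its degrees of freedom.

    The test is computed on the within-preparation differences, so each
    preparation serves as its own control.
    """
    diffs = [t - c for c, t in zip(control, treated)]
    n = len(diffs)
    sd = stdev(diffs)
    if sd == 0:
        raise ValueError("all paired differences are identical; t is undefined")
    return mean(diffs) / (sd / sqrt(n)), n - 1


# Hypothetical protein/Vinculin ratios from three preparations (not real data).
unstimulated = [1.0, 1.2, 0.9]
stimulated = [2.1, 2.6, 1.9]

t_value, dof = paired_t_statistic(unstimulated, stimulated)
stim_mean, stim_sem = mean_sem(stimulated)
```

      With three preparations the test has only two degrees of freedom, so a large t statistic is needed to reach significance.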

      Based on this finding, we then determined proteasome activity in fully differentiated M-CSF- and GM-CSF-dependent monocyte-derived macrophages. Use of the Immunoproteasome Activity Fluorometric Assay Kit II (UBPBio) in M-MØ and GM-MØ, either untreated or exposed to the proteasome inhibitor MG132, revealed that immunoproteasomal and proteasomal activity is significantly stronger in GM-MØ than in M-MØ, as demonstrated in assays for chymotrypsin-like (ANW) and branched amino acid preferring (PAL) activity (immunoproteasome), and trypsin-like (KQL) activity (both proteasome and immunoproteasome). This result suggested that, indeed, immunoproteasomal activity might contribute to the differential expression of MAFB in M-MØ and GM-MØ.

      Author response image 2.

      Immunoproteasome activity in M-MØ and GM-MØ, either untreated or exposed to MG132, as determined using the Immunoproteasome Activity Fluorometric Assay Kit II (UBPBio) on the three indicated peptides (upper panel). Mean ± SEM of three independent experiments are shown (paired Student’s t test: *, p<0.05) (lower panel).

      Consequently, we next set up experiments to assess whether the proteasome inhibitor MG132 was capable of enhancing the expression of MAFB-dependent genes in GM-MØ. Preliminary results of GM-MØ exposure to MG132 for 6 hours indicated an increase in the expression of MAFB protein and the MAFB-dependent genes LGMN and IL10, as well as a reduction in the expression of the GM-MØ-specific gene CD1C.

      Author response image 3.

      A. Schematic representation of the exposure of MG132 to GM-MØ for 6 hours. B. MAFB protein levels in four independent preparations of GM-MØ exposed to either DMSO (DMSO-GM-MØ) or the proteasome inhibitor MG132 (MG132-GM-MØ) for 6 hours, as determined by Western blot (left panel). GAPDH protein levels were determined as protein loading control. Mean ± SEM of the MAFB/GAPDH protein ratios from the four independent experiments are shown (right panel) (paired Student’s t test: ***, p<0.005). C. Relative mRNA levels of the indicated genes in DMSO-GM-MØ and MG132-GM-MØ, as determined by RT-PCR on seven independent samples (paired Student’s t test: ***, p<0.005; ****, p<0.001).

      Unfortunately, this proteasome inhibitor (MG-132) caused a great reduction in cell viability after 6-8 hours. Since a similar decrease in cell viability was observed upon analysis with the ONX-0914 immunoproteasome inhibitor, we could not proceed any further with this approach.

      Given the reviewer's suggestion to include mechanistic insights to the manuscript, we are now providing these results (and the corresponding figures) only for the reviewer's information and to make clear our attempts to comply with his/her request.

      Recommendations for the authors:

      The results are of interest, and only some minor issues need to be addressed to strengthen the conclusions of the study.

      We gratefully thank the reviewer for his/her comments. 

      (1) This study employs a single dose of 10 μM of the GSK3 inhibitor CHIR-99021 for 48 hours, which is reasonable for in vitro studies. However, further investigation into the effect of different doses and exposure times could provide additional insight into optimal dosing and durability of reprogramming effects. In addition, would alternative GSK3 inhibitors have comparable effects?

      Following the reviewer's suggestion, we have performed a kinetics and dose-response analysis of the effects of CHIR-99021, using MAFB protein levels as a readout. This experiment is now shown in new Supplementary Figure 1A, which replaces the old Supplementary Figure 1A panel where a shorter kinetics was presented. Results of this new experiment indicate a maximal effect of 10µM CHIR-99021, and that the effect of the inhibitor becomes maximal 24-48 hours after treatment. The text has been modified accordingly, and it now states: "Kinetics and dose-response analysis of the effects of CHIR-99021 on MAFB expression showed that maximal protein levels were achieved after a 24-48 hour exposure to 10µM CHIR-99021 (Supplementary Figure 1A), conditions that were used hereafter."

      Regarding the use of alternative GSK3 inhibitors, we had already provided that information in Supplementary Figure 1B, where the effects of SB-216763 (10 µM) or LiCl (10 mM) were evaluated. The huge reversal of the Tyr<sup>216</sup>/Ser<sup>9</sup> GSK3β phosphorylation ratio observed with CHIR-99021 was not seen with other GSK3 inhibitors, as indicated in the text. In any event, we believe that the relevance of this result with SB-216763 or LiCl is minimized by the results generated after siRNA-mediated GSK3 knockdown (shown in Figure 4), that completely reproduced the effects seen with CHIR-99021.

      (2) Why in the "reanalysis of single cell RNAseq data" section, the authors use Seurat v5 (R) but then change to python, and the other way around?

      As indicated in the documentation for Integrative Analysis in Seurat v5 (https://satijalab.org/seurat/articles/seurat5_integration), scVIIntegration requires the reticulate package, which allows us to run a Python environment from within R.

      (3) When the authors refer to the clusters enriched in MoMacVERSE, they use the labels of the clusters (for example #2 or #3). I would suggest using the annotations described in the original paper, to link it to the bibliography published through the labels established in the paper.

      We have revised the text according to the reviewer's suggestion and followed the original nomenclature of the MoMac-VERSE monocyte/macrophage clusters, also recognizing the procedure for their identification. The newly modified text now states: "Thus, analysis of the MoMac-VERSE (a resource that identified conserved monocyte and macrophage states derived from healthy and pathologic human tissues) (GSE178209) (2), indicated that GSK3 inhibition augments the expression of the gene sets that define MoMac-VERSE subsets identified as long-term resident macrophages [Cluster HES1_Mac (#2)] and tumor-associated macrophages with an M2-like signature [Clusters HES1_Mac (#2), TREM2_Mac (#3), C1Q<sup>hi</sup>_Mac (#16) and FTL_Mac (#17)] (2) (Figure 1H)."

      (4) In line 309. Is there any significance on the "having a stronger effect"?

      We apologize for the misleading sentence. The phrase has been modified for better clarity, and the text now states: "Like CHIR-99021, silencing of both GSK3A and GSK3B augmented the expression of MAFB, with the simultaneous silencing of both GSK3A and GSK3B genes having a stronger effect (Figure 4B), and modulated the expression of 329 genes (Figure 4C,D)."

      (5) In line 337, "(22)(27)", are these references?

      All wrongly formatted references throughout the manuscript have been checked and corrected.

      (6) In the single-cell reanalysis, could you please provide integration Qc plots? It would be interesting to have it on the paper.

      To clarify this whole issue, and avoid misleading visualization of donor-specific clusters (see below), we have now replaced all UMAP plots shown in the previous version (in old Figure 7 and old Supplementary Figure 3) with new UMAP plots after running scVI reduction. In addition, we are including a new Supplementary Figure (new Supplementary Figure 3) that contains the information of the 21310 single-cell transcriptomes from human lungs reported in GSE128033 (ref. 47) after filtering and integration [nFeature > 200 and < 6000; Unique Molecular Identifiers (nCount) > 1000; and % of mitochondrial genes < 15 %]. Besides, old Supplementary Figure 3 has been replaced by the new Supplementary Figure 4, which includes the information of the single-cell transcriptomes from human lung macrophages selected from GSE128033 (ref. 47) based on their expression of the monocyte/macrophage-associated markers CD163, FABP4, LYVE1 or FCN1.

      As requested by the reviewer, we are now providing the Qc plots for the re-analysis in the new Supplementary Figures 3 and 4.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations For The Authors):

      In its current form, I would exclude the cryo-EM data from the manuscript. It does not add much and it is distracting from the excellent work that you did on the functional characterization of the variant. Alternatively, you could try to improve the resolution and see if you can get some more meaningful analysis out of the structures? I noticed that you only collected very small datasets. If you decide to pursue a higher resolution reconstruction, collecting more movies will give you a better chance to obtain a higher resolution.

      We express our gratitude to the reviewer for their invaluable feedback. While acknowledging that our structure currently maintains a low resolution, it still provides valuable insights into the splice's proximity to the N412 glycan density. This proximity and low-resolution map hindered the complete modeling of all the splice residues. Notably, this structure represents the first depiction of this particular splice variant. Consequently, it lays a foundation for subsequent studies in the field, and hence, we would want to keep it in the manuscript. As per reviewers’ suggestions, we have now included comparisons of our structure with the GluK1-2a receptor structure reported recently (Mayerson et al. 2022). We do plan to carry out higher-resolution structures in the future.

      I would probably also exclude the RNAseq analysis. I think that Figure 1 is fine, but the supplement 1 is not very successful in convincing me that the exon 9 is expressed mainly in early stages of brain development. In addition, the plot in Figure 1 indicates strong expression in the cerebellar cortex in 20s and 30s. If you decide to keep the data, I strongly encourage you to include more details on the analysis in the methods section.

      Thanks for this insightful comment. We have now modified this section extensively for better clarity. Indeed, the expression of this variant seems to be dynamic in different brain regions. This has now been specified in the revised manuscript. Figure 1 shows the expression of GRIK1 exon 9 across different regions of the human brain and donor ages. Supplementary Figure 1 is a zoom-in on one such region, the cerebral cortex, where we observe the maximum expression of GRIK1. In this region, we also observed higher expression of exon 9 in the early stages of development. The scales of Figure 1 (0-4 RPKM) and Supplementary Figure 1 (0-6 RPKM) are different due to the higher expression of other exons in Supplementary Figure 1 (for example, an expression of 4 RPKM appears in a shade of red in Figure 1, whereas similar values of 4 RPKM are orange-yellow in Supplementary Figure 1). Using Supplementary Figure 1, we wanted to show the expression of exon 9 relative to other exons during developmental stages, showing that GluK1-1 is highly expressed in the initial stages of life. More details on the analysis have now been added to the methods section.

      Additionally, there are a few minor issues in the data presentation:

      (1) In Fig. 2C there seems to be a mismatch between the green dose-response plot and the GluK1-2a trace shown. The plot reports an EC50 of 187.7 uM, whereas in the sample trace 0.25 mM agonist activates only to ~20%.

      We have verified the data and statistics, confirming their consistency with the values reported in the manuscript. For Figure 2C, we present representative traces from a single cell. However, the EC50 value was calculated using Hill's equation based on averaged data from 5 cells.
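
      For reference, the Hill equation mentioned above relates agonist concentration to the fraction of the maximal response. The sketch below uses the EC50 of 187.7 µM quoted by the reviewer; the Hill coefficient of 1 is an assumption made only for illustration, since the fitted coefficient is not stated here, and the function name is hypothetical.

```python
def hill_response(concentration, ec50, hill_coefficient=1.0):
    """Fraction of the maximal response predicted by the Hill equation.

    response = C^n / (C^n + EC50^n), with C and EC50 in the same units.
    """
    if concentration < 0 or ec50 <= 0:
        raise ValueError("concentration must be >= 0 and EC50 must be > 0")
    c_n = concentration ** hill_coefficient
    return c_n / (c_n + ec50 ** hill_coefficient)


EC50_UM = 187.7  # value quoted in the reviewer's comment (µM)

# Predicted fraction of the maximal response at 250 µM (0.25 mM),
# under the assumed Hill coefficient of 1.
fraction_at_250 = hill_response(250.0, EC50_UM)
```

      By definition the curve passes through 50% of the maximal response at the EC50, so a fit to data averaged over cells predicts more than half-maximal activation at any concentration above 187.7 µM; an individual representative cell may still deviate from that averaged curve.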

      (2) The axis label is misprinted in Figure 3C

      Thanks. Corrected.

      (3) In Fig 5 supplement 1, panel B - the 3 last labels above the western blot lanes are off so it is difficult to see which sample corresponds to which lane.

      Thanks. We have corrected the figure.

      Reviewer #2 (Recommendations For The Authors):

      Overall I congratulate the authors of this study nicely done. It represents a large body of work.

      We thank the reviewer for his/her time and positive comments.

      I have several minor corrections that the authors could consider for the revision of the manuscript.

      P7. The desensitization rate of GluK1-2a was "delayed"... replace by "increased".

      Corrected.

      P9. Last line 0.37; P.. Add the P value.

      P value has been added as suggested.

      P11 The authors indicate that the K368/375/379/382H376-E mutant exhibits a significant difference in desensitization properties in the presence of Neto1, but on the 1st line of p11, they provide a P value above 0.05.

      We thank the reviewer for pointing out this discrepancy and have fixed it. We have discussed two mutants that show slower desensitization when compared to GluK1-1a co-expressed with Neto1. The K to E mutant reaches significance, while the τdes value for the K368/375/379/382H376-E mutant shows the same pattern, though not significantly. We have now modified the text to explain this more clearly.

      P19 the calculation of mean weighted tau TDes is not clear and should be better explained.

      Thanks. We have added more details in the Methods sections. We analyzed the current decays in response to 1–2 ms or 1 s applications by employing an exponential function or the sum of two exponential functions. This analysis allowed us to derive a weighted mean τdes using the formula [(τ1 × amplitude1) + (τ2 × amplitude2)]/[amplitude1 + amplitude2]. The tau values represent the time constants obtained from the exponential fits, while the amplitudes correspond to the estimated contributions of each component to the total peak current amplitude.

      [(A1 * t1) + (A2 * t2)] / (A1 + A2)

      It represents the calculation of a weighted mean, where A1 and A2 are the amplitudes, and t1 and t2 are the corresponding time constants. The formula calculates the overall mean time constant by taking into account the contribution of each component to the total amplitude.
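
      The weighted mean above can be written as a short function. This is a minimal sketch with a hypothetical function name and made-up fit parameters, not the authors' analysis script.

```python
def weighted_tau(tau1, amplitude1, tau2, amplitude2):
    """Amplitude-weighted mean time constant of a two-exponential fit.

    Implements [(A1 * t1) + (A2 * t2)] / (A1 + A2).
    """
    total_amplitude = amplitude1 + amplitude2
    if total_amplitude == 0:
        raise ValueError("the amplitudes must not sum to zero")
    return (amplitude1 * tau1 + amplitude2 * tau2) / total_amplitude


# Hypothetical fit: a fast component (5 ms) carrying 80% of the current
# and a slow component (50 ms) carrying the remaining 20%.
tau_des = weighted_tau(tau1=5.0, amplitude1=0.8, tau2=50.0, amplitude2=0.2)
```

      When the decay is described by a single exponential, setting the second amplitude to zero returns the single time constant unchanged.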

      P19 the rate of recovery was obtained by fitting the one-phase association "with" exponential function. With is missing.

      We have corrected this error. Thanks.

      P21 which method has been used for site directed mutagenesis

      Overlapping PCR was carried out for mutagenesis using the primers listed in Figure 4-table supplement 1. A ligation-free cloning approach (Zhang et al., 2017) was used. It has now been elaborated in the methodology section under Site directed mutagenesis.

      P21 and 22. Provide complete reference of reagent including species of antibodies.

      Thanks. We have added all the details in the methods section now. 

      Anti-His: Rabbit mAb #12698 (Cell Signaling Technology)

      Anti-Neto1: Rabbit #SAB3500679 (Sigma Aldrich)

      Anti-GFP: Mouse mAb G1546 (Sigma Aldrich)

      Anti-actin: Mouse mAb A3853 (Sigma Aldrich)

      P22 How much anti His antibody was used with 40microliter of protein A?

      We have used 2µg/ 40uL of Protein A slurry. This has now been added to the methodology.

      P23 Authors seem to have used a virus to express protein but the protocol is not given. For example what is P2 virus?

      We have now modified the manuscript to include details of baculovirus generation as per the protocol described in Goehring et al. 2014. We followed the same protocol wherein the 2nd generation of virus (P2) generated in insect (SF9) cells was used for infecting suspension-adapted HEK293-T cells for large-scale GluK1-1aEM protein expression.

      Reviewer #3 (Recommendations For The Authors):

      Major concerns:

      (1) The effect of the splice insert on Gluk1 regulation by Neto proteins is not fully clear. For example, experiments in Fig. 3G indicate that the desensitization time for Gluk1-1a + Neto2 is ~32ms. This value is half compared with data obtained from whole-cell experiments shown in Fig. 3A (~70ms). What is the reason for this discrepancy? If variability is observed between experiments, I wonder how valid are the comparisons made in panel A between GluK1-1a+Neto2 vs GluK1-2a+Neto2 groups. In the case of recovery analysis, authors found significant differences comparing both groups in the presence of Neto (Fig. 3B) but recovery times are not identical for Gluk1-1a vs Gluk1-2a (without Neto). Thus, I wonder if the fold change related to the control group (without Neto) is different.

      We appreciate your detailed feedback, which has allowed us to clarify and reinforce the validity of our experimental findings. Different recording configurations (outside-out patch recordings in Fig. 3G versus whole-cell recordings in Fig. 3A) were used. Whole-cell recordings average responses over a larger membrane area and also have slower solution exchange times compared to outside-out patch recordings. This may have contributed to the variability in desensitization times. However, similar trends in our whole cell vs. outside-out patch recordings were observed. Further, all the data except those presented in Figs 3G and 3H are from whole-cell recordings. We have performed multiple independent experiments and utilized rigorous statistical analyses to validate our comparisons. We report mean values with standard deviations or confidence intervals to provide a more accurate representation of the data.

      Neto1 significantly speeds up the recovery from desensitization for both variants, with a more pronounced effect on GluK1-1a (GluK1-1a +Neto1: 0.68 s) compared to GluK1-2a (GluK1-2a +Neto1: 1.15 s). The recovery times are not identical for the two variants, likely due to the presence of splice insert in GluK1-1a. Neto2, on the other hand, slows recovery for both variants without significant differential effects. However, in the absence of Neto proteins, recovery from the desensitized state is faster for GluK1-1 than for GluK1-2a, although the difference is not significant.

      In the case of the glutamate concentration-response curve (Fig. 3C), EC50 values for Neto1 and Neto2 are relatively the same, but this approach on its own does not provide insights about the role of the splice insert. Previous experiments with the Gluk1 reveal differences between EC50 in the presence of Neto1 or 2 (Fisher, 2015), suggesting that the insert could regulate glutamate binding affinity, but still, this point is not directly demonstrated in this work.

      Thanks for this insightful comment. Indeed, we cannot conclude that splice residues directly affect glutamate sensitivity and have modified the text accordingly. The Fisher paper demonstrated that both Neto1 and Neto2 can influence glutamate sensitivity in GluK1-2a, which has an EC50 value of 124.6 ± 16.2 µM on its own. Specifically, in the presence of Neto1 and Neto2, the EC50 values are 4.4 ± 0.4 µM and 13.7 ± 4.2 µM, respectively, indicating a noticeable effect of both proteins, though the values for GluK1-2a coexpressed with Neto1 or Neto2 do not differ substantially from each other. Our observation for the GluK1-1a has been similar, with both Neto1 and Neto2 showing a leftward shift.

      (2) Similar to the previous point, a proper interpretation of mutant data is missing in the manuscript. From current data, it is difficult to visualize the role of the insert on Neto-dependent regulation, mainly because some mutations alone affect Gluk1-1 channel properties. The authors conclude their data by stating that "while the modulation of the receptor by Neto 1 is affected by mutations in splice insert, the modulation by Neto 2 remains largely unaffected" (Page 13). However, this statement is confusing since the co-expression of Gluk1-1a with Neto2 (Fig. 5) prevents the effect caused by mutation K368 alone (Fig. 4), indicating that modulations by Neto 2 are indeed potentially affected by the mutations. Please, clarify. Also, the effect of the K368/375/379/382H376-E mutant on Neto modulation (pink bar in Fig. 5) is impossible to interpret properly since the effect of the mutation alone is not shown in the manuscript.

      Thanks for seeking this important clarification. It is indeed true that splice residue mutations themselves affect the receptor functional properties in comparison to the wild-type receptors. For the sake of clarity, we have presented the effect of splice mutants on receptor properties separately from the effect of mutations on modulation by Neto proteins. Figure 4 demonstrates a comparison between wild-type and mutant receptors without the Neto proteins, showcasing different kinetic properties, while Figure 5 provides detailed information on the role of the insert in Neto-dependent regulation. 

      It’s true we could not record the effect of the K368/375/379/382H376-E mutant alone or when coexpressed with Neto 2 due to low peak amplitudes (mentioned in Table 1) that prevented reliable comparisons. However, robust currents were observed when the same mutant was coexpressed with Neto1, and hence comparisons were shown for this mutant with GluK1-1a wild-type + Neto1. 

      We have now modified the statement "while the modulation of the receptor by Neto 1 is affected by mutations in splice insert, the modulation by Neto 2 remains largely unaffected" and the last paragraph as follows:

      “Neto1 appears to have more pronounced effects on the mutant receptors compared to Neto2. Specifically, Neto1 significantly slowed desensitization for the K368-E mutant, accelerated recovery from desensitization for K368-E and K368/375/379/382H376-E mutants, increased agonist efficacy for K368-E and K375/379/382H376-E mutants, and altered rectification properties for K368E and K368/375/379/382H376-E mutants. In contrast, Neto2 had fewer significant effects on the mutant receptors, with the main impact being an increase in agonist efficacy for the K368-E mutant. Notably, Neto2 did not significantly affect desensitization, recovery from desensitization, or rectification properties of the mutant receptors when compared with wildtype GluK1-1a coexpressed with Neto2. These findings suggest that the splice residues in GluK1-1a differentially influence receptor modulation by Neto1 and Neto2, with Neto1 showing more extensive modulation of the mutant receptors' functional properties.”

      (3) An open question after reading this interesting work is if the proposed change in Neto regulation because of the splice insert is due to changes in Gluk1-Neto interactions or because the rearrangement after interaction with Neto proteins is different. Pull-down experiments (Fig 5 Sup.1) suggest that the splice insert and all the mutants tested do not prevent interaction with Neto proteins. I wonder if the authors could complement their data with a quantitative approach/analysis to demonstrate if the splice insert and the mutants affect Neto1/2 interactions (as expected for the rationale when creating the mutants).

      Thank you for this insightful suggestion. You raise an important point about distinguishing between changes in GluK1-Neto interactions and potential differences in receptor rearrangement after Neto binding. While our pull-down experiments suggest that the splice insert and mutants don't prevent Neto interactions (probably due to a larger interaction interface all along the receptor), a quantitative approach would indeed provide more nuanced information. In future studies, we plan to use a quantitative approach such as surface plasmon resonance to assess the changes in interactions upon mutations in the splice insert and/or Neto proteins in different states of the receptor. In addition, obtaining cryo-EM structures of GluK1 splice variants in complex with Neto1 and Neto2 would provide crucial insights into their interaction interfaces and any conformational changes induced by binding.

      (4) Related to the Gluk1-1a structure, the authors state that the overall structure is similar to the one without the insert (page 14); however, this is not properly shown in the manuscript. Even if the overall architecture of the channel is the same, authors should make a proper/adequate comparison between both structures/domains to support their claims. Also, one should expect that the insertion of 15 amino acids would affect in some way the closing neighboring domains. The differential effect of the splice insert on glutamate and kainate EC50 values (Fig. 2 and Fig. 2 sup.1), suggests that the insert could introduce a sort of rearrangement in the binding domain. Thus, I wonder if a more elaborated analysis of the current structural data could reveal some structural insights that would explain the specific functional differences due to the splice insert. If the low resolution and the missing residues avoid making some comparisons and establish differences between sidechain orientations, still, a proper comparison between the domain backbones would be helpful to validate the author's statement at least. Also, I wonder if the changes could be resolved better in a closed state or APO structure, instead of the desensitized structure. Finally, are the structures obtained in DDM and nanodiscs similar?

      As per the reviewer’s suggestion, we have now added a new figure in the supplementary information, “Figure 6-figure supplement 9,” where we show a superimposition of GluK1-1aEM (detergent-solubilized or reconstituted in nanodiscs) and GluK1-2a (PDB:7LVT; silver) showing overall conservation of the structures in the desensitized state.

      As evident from the figure and the RMSD values mentioned above, we do not observe significant movements in either the ATD or the LBD layer of GluK1-1a with respect to GluK1-2a. The DDM-solubilized and nanodisc-reconstituted GluK1-1a structures (Panel A) are also very similar, with an RMSD of ~2.19 Å across all 2664 Cα atom pairs. Due to the low resolution of our structures, we have refrained from carrying out detailed structural comparisons.
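
      For readers less familiar with the metric, the RMSD quoted here is the root-mean-square distance between matched Cα atoms of two superimposed models. The following is a minimal sketch only, assuming the two models are already superimposed and their atoms paired one-to-one; the function name `ca_rmsd` and the toy coordinates are hypothetical and are not data from the study.

```python
import math

def ca_rmsd(coords_a, coords_b):
    """RMSD between paired, already-superimposed atoms.

    coords_a, coords_b: equal-length lists of (x, y, z) tuples, e.g. in Å.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    sq_sum = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        # squared Euclidean distance for this atom pair
        sq_sum += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(sq_sum / len(coords_a))

# Hypothetical toy coordinates: model_b is model_a shifted by 2 Å along z.
model_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
model_b = [(0.0, 0.0, 2.0), (1.0, 0.0, 2.0), (0.0, 1.0, 2.0)]
toy_rmsd = ca_rmsd(model_a, model_b)
print(f"RMSD = {toy_rmsd:.2f} Å")
```

      With a single atom pair the same function reduces to a plain interatomic distance, which is how a Cα-to-Cα separation between two residues would be measured.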

      Our efforts to capture the closed state or apo state structures have failed due to either severe orientation bias (only top views) or increased heterogeneity. 

      (5) Methods section lacks relevant information for proper data interpretation as well as for replicating some experiments in the future. For example:

      A) The experimental design to determine the rectification index with a Ramp protocol is not clear: 1) Why the authors applied a ramp protocol if receptors desensitize along the time? Please clarify the protocol.

      Ramp protocols were used only for the wild-type receptors to compare their voltage-dependent behavior, as this was the first study to compare the two splice variants. All kainate receptors (GluK1-GluK5) desensitize over time. However, their rectification properties have been studied previously (both in the absence and in the presence of Neto proteins) using ramp protocols, as these are faster than step protocols.

      B) Are polyamines included in the solutions to perform the rectification assays?

      No, polyamines were not added to the intracellular solution, and the effect of the endogenous polyamine block was measured. This has now been specified in the results as well as the methods section.

      C) It is not clear if the experiments to calculate IK/IG ratios were performed in the same preparation (This is, the same cell was stimulated with glutamate and then kainate or vice versa).

      Indeed, the current responses for glutamate vs kainate are performed in the same cell (the same cell was stimulated by glutamate then kainate) so that the responses can be compared. It’s now been specified in the methods section.

      D) The experimental design for calculating recovery is not clear.

      We employed a double pulse protocol to measure receptor recovery. The protocol involved applying two consecutive pulses of agonist stimulation to the receptor. Initially, we applied a brief agonist pulse to activate the receptor, followed by a specific recovery period. After the recovery period, we administered a second agonist pulse to assess the receptor's recovery response. The receptor's recovery was determined by comparing the response amplitude of the second pulse to that of the first pulse, providing valuable insights into the receptor's recovery kinetics. Recovery rates were calculated with single exponential association fits in Prism. We have now modified the text for better clarity.
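
      The recovery analysis described above can be illustrated with a short sketch. It assumes the simplest case, in which the second-to-first peak ratio rises from 0 to 1 as 1 - exp(-t/tau), and estimates tau by a log-linear least-squares fit. This is a deliberate simplification: the one-phase association model in Prism fits the baseline, plateau and rate constant by nonlinear regression. The function name and the synthetic data are hypothetical.

```python
import math

def fit_recovery_tau(intervals_s, ratios):
    """Estimate tau (s) for ratio(t) = 1 - exp(-t / tau).

    Uses ln(1 - ratio) = -t / tau, fitted as a line through the origin.
    Assumes full recovery to a ratio of 1 and no recovery at t = 0.
    """
    num = 0.0
    den = 0.0
    for t, r in zip(intervals_s, ratios):
        if not 0.0 <= r < 1.0:
            continue  # log transform is undefined once ratio reaches 1
        num += t * math.log(1.0 - r)
        den += t * t
    if den == 0.0 or num >= 0.0:
        raise ValueError("data do not describe a rising recovery curve")
    slope = num / den          # equals -1 / tau
    return -1.0 / slope

# Synthetic, noise-free example generated with tau = 2 s (not real data).
true_tau = 2.0
intervals = [0.1, 0.3, 1.0, 3.0, 10.0]                       # inter-pulse intervals (s)
peak_ratios = [1.0 - math.exp(-t / true_tau) for t in intervals]  # P2 / P1
tau_est = fit_recovery_tau(intervals, peak_ratios)
print(f"estimated tau = {tau_est:.3f} s")
```

      On real recordings, noisy ratios close to 1 dominate the log transform, which is one reason a nonlinear fit of the untransformed data is generally preferred.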

      E) Please indicate the species used for both functional and Cryo-EM (rat Gluk1 isoform?).

      Thanks for pointing this out. We have now specified in relevant methodology sections that Rattus norvegicus GluK1 and Neto proteins were used in this study.

      F) Please describe the nanodisc reconstitution protocol and how the nanodisc protein was purified, if appropriate.

      The MSP1E3D1 was purified by following the protocol given by the Sligar group in 2014 (doi: 10.1016/S0076-6879(09)64011-8). The nanodisc reconstitution protocol has now been elaborated in the revised manuscript.

      G) Site-directed mutagenesis methodology is incomplete. Please check.

      We have now elaborated this section to include more details.

      Minor concerns:

      (1) Authors state that splice residues are ~30A away from the TM domain. Currently, there is no friendly representation showing the localization of the splice in the structure, besides Fig.6E. The manuscript could benefit itself if authors include a better 3D representation or a scheme to highlight the position of the splice relative to critical domains.

      Thanks for pointing this out. The distance between TRP 381 CA (ATD) and LEU 636 CA (TM3) is 92.10 Å. We have changed the value in the text to ~92 Å.

      Author response image 1.

      (2) Authors mention that mutations in the insert to alanine show normal traffic to the plasma membrane but low current amplitude. Then, I wonder if single-channel conductance, mean open time or open probability is affected by the splice insert. Showing the effects of the insert on single-channel properties would strengthen the manuscript's quality.

      It is a good suggestion. However, as can be observed from our whole-cell and outside-out patch data, we obtained low peak amplitudes (<50 pA) for many of our receptor-only constructs and also suffered from high SEM for some recordings due to heterogeneity between cells of the same population. We will consider studying the single-channel properties of these receptors in future experiments.

      (3) It is unclear how the insert or the mutations specifically affect glutamate- or kainate-induced responses because authors analyze IK/IG ratios only. Maybe authors could consider including an analysis of the role of the insert on specific glutamate- or kainate-induced response to gain insights about ligand selectivity.

      All the values have been included in the Excel file of raw data. We have included the desensitization kinetics of the mutant receptors in the presence of glutamate and compared them to wild-type GluK1-1a. Kainate-induced responses were very heterogeneous (high SEM for % desensitization) and hence have not been included in the main data.

      (4) Please be consistent with nomenclature along the manuscript to avoid confusion. For example, Are Gluk-1-1 and Gluk-1-1a referring to the same variant?

      GluK1-1 has been used in the abstract and the introduction, where we introduce the N-terminal splice variant that either has the 15 residues (termed GluK1-1) or lacks them (GluK1-2). The C-terminal splice variants of GluK1 are named “a-d”, with “a” being the variant with the shortest C-terminal domain. Later in the manuscript, we have used only the GluK1-1a terminology to represent the ATD splice variant with the shortest C-terminal domain.

      The introduction and spatiotemporal results talk about the GluK1-1 receptors wherein the 

      (5) Legend figure 2: Repeated phrase should be removed. Please check.

      (6) Page 8: "This is similar to the effect observed in GluK1-2 receptors whereby the glutamate EC50 was shown to increase by Neto proteins [Neto1: 34-fold and Neto2: 7.5-fold (Palacios-Filardo et al., 2016) and Neto1/2: 10-30X (Fisher, 2015)]". It seems that values from Fisher's paper are backward. Please correct. 

      (7) Page 9. Second paragraph. Spelling mistake when referring to Fig. 3G.

      Thanks for pointing out the inadvertent errors; we have now corrected all of them.

      (8) Figure 3: The title in Y axis overlaps with the figure. Please check.

      We have corrected the error.

      (9) Page 10: "In addition, K375/379/382H376-E mutant also exhibited a slowdown in the recovery (K375/379/382H376-E: 4.83 ± 0.31 s P=0.2774) (Figure 4C; Table 1)." Statistical analysis indicates this is not correct. Please tone down this statement. For example: "...mutant also exhibited a trend to a slowdown in the recovery although differences do not reach statistical significance".

      Thanks. We have modified the statement as suggested.

      (10) Page 11: "and a reduction was observed for K375/379/382H376-E receptors (1.17 ± 0.28 P=0.3733) compared to wild-type (Figure 4D; Table 1)." Same issue as the previous minor comment.

      Thanks. We have modified the statement as suggested.

      (11) Page 11: "We observed that mutants K368-E and K368/375/379/382H376-E, desensitize significantly slower in the presence of Neto1" This statement is not true for K368/375/379/382H376-E mutant. Please correct.

      Thanks. We have modified the statement as suggested and specified the difference.

      (12) Legend Figure 4. Colored asterisks are not clear in the figure. Please check.

      Thanks. The reference to colored asterisks has been removed from the legend as they are not used.

      (13) Representative data shown in Fig 5 sup.2A do not match very well with the final quantification shown in Fig 5A. Please check. Also, the authors state in the result section (page 10) that data shown in Fig. 5A indicate that "GluK1-1a modulation by Neto 1 is influenced by the splice residues". This could be true only for residue K368; however, this is not so obvious since the two mutants containing K368E are inconsistent. Please check and clarify.

      Only representative traces are shown in Fig 5 sup 2 A. However, the quantification shown in Fig 5 A is from multiple cells. We have rechecked all the data and found it to be consistent. We have rewritten this section and modified it for better clarity.

      (14) Figure 6-supplement 2: Please incorporate missing values of MW standards in panel B.

      Thanks. We have modified the figure to include values for MW standards.

      (15) It is not clear the rationale for showing construct C552Y C557V C575S in Fig. 6 sup.3, panel A. This mutant is not mentioned in the manuscript.

      It has been mentioned in the methodology section under “Construct design for expression and purification of rat GluK1-1aEM”. It (C552Y C557V C576S) is one of the constructs used in optimizations that were checked for good protein yields. Based on FSEC protein profiles, we used C552Y, C557V (2X Cys mutant) as GluK1-1aEM, which is mentioned in the same section.

      (16) Fig. 6 sup.4 Not clear what does mean w.r.c. Please specify in the legend.

      With respect to (w. r. t.) has been specified in the manuscript.

      (17) Suggestion to improve data presentation in Fig. 4D and Fig. 3 sup.1B: For easier comparison of IK/IG ratios, representative traces for kainate and glutamate in the same group could be shown using the same Y-scale.

      It has been purposely shown with two different Y-scales due to the differences in peak amplitudes in the presence of glutamate or kainate. 

      (18) Fig. 3 sup.1A: Based on the figure legend, horizontal bars representing the application of glutamate are not consistent with time scale bars. Please, check. In the same figure, panel B, the representative traces shown for GluK-1a-Neto1 are not consistent with IK/IG ratio shown in Fig. 3D.

      Thanks, we have corrected the horizontal bars representing glutamate application. The representative traces shown for GluK-1a-Neto1 were rechecked and are consistent with the IK/IG ratio shown in Fig. 3D.

      (19) I wonder if the authors could discuss the lack of Neto1 effect on the wild type Gluk1-2a channel, as proposed previously.

      Sheng et al., 2015 showed that Neto1 enhances the desensitization onset of GluK1. However, it is unclear which GluK1 splice variants were used in that study. GluK1 has several splice variants, but in the present study we specifically compared GluK1-1a and GluK1-2a. In our case, we did not observe an effect of Neto1 on wild-type GluK1-2a with either of the two techniques (whole-cell and outside-out patch) that we used. However, as can be observed from our data, the GluK1-2a receptor alone shows faster desensitization kinetics than in the previous study (Copits et al., 2011). The differences could stem from different experimental conditions, such as the constructs and recording conditions used.

      Copits BA, Robbins JS, Frausto S, Swanson GT. Synaptic targeting and functional modulation of GluK1 kainate receptors by the auxiliary neuropilin and tolloid-like (NETO) proteins. Journal of Neuroscience. 2011 May 18;31(20):7334-40.

      Sheng N, Shi YS, Lomash RM, Roche KW, Nicoll RA. Neto auxiliary proteins control both the trafficking and biophysical properties of the kainate receptor GluK1. Elife. 2015 Dec 31;4:e11682. doi: 10.7554/eLife.11682. PMID: 26720915; PMCID: PMC4749551.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      The bacterial neurotransmitter:sodium symporter homologue LeuT is a well-established model system for understanding the fundamental basis for how human monoamine transporters, such as the dopamine and serotonin transporters, couple ions with neurotransmitter uptake. Here the authors provide convincing data to show that K+ catalyses the return step of the transport cycle in LeuT by binding to one of the two sodium sites. The paper is an important contribution, but it is still unclear exactly where K+ binds in LeuT, and how to incorporate K+ binding into a transport cycle mechanism.

      Public Reviews:

      Reviewer #1 (Public Review):

      This manuscript tackles an important question, namely how K+ affects substrate transport in the SLC6 family. K+ effects have previously been reported for DAT and SERT, but the prototypical SLC6-fold transporter LeuT was not known to be sensitive to the K+ concentration. In this manuscript, the authors demonstrate convincingly that K+ inhibits Na+ binding, and Na+-dependent amino acid binding at high concentrations, and that K+ inside of vesicles containing LeuT increases the transport rate. However, outside K+ apparently had very little effect. Uptake data are supplemented with binding data, using the scintillation proximity assay, and transition metal FRET, allowing the observation of the distribution of distinct conformational states of the transporter.

      Overall, the data are of high quality. I was initially concerned about the use of solutions of very high ionic strength (the Km for K+ is in the 200 mM range), however, the authors performed good controls with lower ionic strength solutions, suggesting that the K+ effect is specific and not caused by artifacts from the high salt concentrations.

      The major issue I have with this manuscript is with the interpretation of the experimental data. Granted that the K+ effect seems to be complex. However, it seems counterintuitive that K+ competes with Na+ for the same binding site, while at the same time accelerating the transport rate. Even if K+ prevents rebinding of Na+ on the inside of vesicles, it would be expected that K+ then stabilizes this Na+-free conformation, resulting in a slowing of the transport rate. However, the opposite is found. I feel that it would be useful to perform some kinetic modeling of the transport cycle to identify a mechanism that would allow K+ to act as a competitive inhibitor of Na+ binding and rate-accelerator at the same time.

      This ties into the second point: It is not mentioned in the manuscript what the configuration of the vesicles is after LeuT reconstitution. Are they right-side out? Is LeuT distributed evenly in inside-out and right-side out orientation? Is the distribution known? If yes, how does it affect the interpretation of the uptake data with and without K+ gradient?

      Finally, mutations were only made to the Na1 cation binding site. These mutations have an effect mostly to be expected, if K+ would bind to this site. However, indirect effects of mutations can never be excluded, and the authors acknowledge this in the discussion section. It would be interesting to see the effect of K+ on a couple of mutants that are far away from Na+/substrate binding sites. This could be another piece of evidence to exclude indirect effects, if the K+ affinity is less affected.

      Reviewer #2(Public Review):

      To characterize the relationship between Na+ and K+ binding to LeuT, the effect of K+ on Na+-dependent [3H]leucine binding was studied using a scintillation proximity assay. In the presence of K+ the apparent affinity for sodium was reduced but the maximal binding capacity for this ion was unchanged, consistent with a competitive mechanism of inhibition between Na+ and K+.
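
      The signature of competitive inhibition described here (a reduced apparent affinity with an unchanged maximum) can be sketched numerically. The example below assumes the simplest single-site, non-cooperative binding model, in which the competitor scales the apparent dissociation constant by (1 + [K+]/Ki). The function name and every constant are hypothetical illustrations, not values from the manuscript, and real Na+ binding to LeuT may well be cooperative.

```python
def fractional_binding(na_mM, kd_na_mM, k_mM=0.0, ki_k_mM=float("inf")):
    """Fractional occupancy of a single site by Na+ with a competing ion.

    Competitive model: Kd_app = Kd * (1 + [K+] / Ki); the maximum stays at 1.
    """
    kd_app = kd_na_mM * (1.0 + k_mM / ki_k_mM)
    return na_mM / (na_mM + kd_app)

# Hypothetical constants chosen only to make the arithmetic easy to follow.
KD_NA = 10.0    # mM, Na+ dissociation constant without K+
KI_K = 200.0    # mM, inhibition constant of K+

half_without_k = fractional_binding(10.0, KD_NA)               # half-saturation at 10 mM Na+
half_with_k = fractional_binding(20.0, KD_NA, 200.0, KI_K)     # shifted to 20 mM Na+
max_with_k = fractional_binding(1e6, KD_NA, 200.0, KI_K)       # still approaches 1
print(half_without_k, half_with_k, round(max_with_k, 4))
```

      In this toy case, K+ at a concentration equal to its Ki doubles the Na+ concentration needed for half-maximal binding, while saturating Na+ still reaches the same maximum.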

      To obtain a more direct readout of K+ binding to LeuT, tmFRET was used. This method relies on the distance-dependent quenching of a cysteine-conjugated fluorophore (FRET donor) by a transition metal (FRET acceptor). This method is a conformational readout for both ion- and ligand-binding. Along with the effect of K+ on Na+-dependent [3H]leucine binding, the findings support the existence of a specific K+ binding site in LeuT and that K+ binding to this site induces an outward closed conformation.

      It was previously shown that in liposomes inlaid with LeuT by reconstitution, intra-vesicular K+ increases the concentrative capacity of [3H]alanine. To obtain insights into the mechanistic basis of this phenomenon, purified LeuT was reconstituted into liposomes containing a variety of cations, including Na+ and K+, followed by measurements of [3H]alanine uptake driven by a Na+ gradient.

      The ionic composition of the external medium was manipulated to determine if the stimulation of [3H]alanine uptake by K+ was due to an outward directed potassium gradient serving as a driving force for sodium-dependent substrate transport by moving in the direction opposite to that of sodium and the substrate. Remarkably it was found that it is the intra-liposomal K+ per se that increases the transport rate of alanine and not a K+ gradient, suggesting that binding of K+ to the intra-cellular face of the transporter could prevent the rebinding of sodium and the substrate thereby reducing their efflux from the cell. These conclusions assume that the measured radioactive transport is via right-side-out liposomes rather than from their inverted counterparts (in case of a random orientation of the transporters in the proteoliposomes). Even though this assumption is likely to be correct, it should be tested.

      Since K+- and Na+-binding are competitive and K+ excludes substrate binding, the Authors chose to focus on the Na1 site where the carboxyl group of the substrate serves as one of the groups which coordinate the sodium ion. This was done by the introduction of conservative mutations of the amino acid residues forming the Na1 site. The potassium interaction in these mutants was monitored by sodium dependent radioactive leucine binding. Moreover, the effect of Na+ with and without substrate, as well as that of potassium, on the conformational equilibria was measured by tmFRET on the mutants introduced in the construct enabling the measurements. The results suggest that K+-binding to LeuT modulates substrate transport and that the K+ affinity and selectivity for LeuT is sensitive to mutations in the Na1 site, pointing toward the Na1 site as a candidate site for facilitating the interaction between K+ in some NSS members.

      The data presented in this manuscript are of very high quality. They are a detailed extension of results by the same group (Billesbolle et. al, Ref. 16 from the list) providing more detailed information on the importance of the Na1 site for potassium interaction. Clearly this begs for the identification of the binding site in a potassium bound LeuT structure in the future. Presumably LeuT was studied here because it appears that it is relatively easy to determine structures of many conformational states. Furthermore, convincing evidence showed that the stimulatory effect of K+ on transport is not because of energization of substrate accumulation but is rather due to the binding of this cation to a specific site.

      Reviewer #1 (Recommendations For The Authors):

      • Include a transport mechanism that can account for the K+ effects.

      We appreciate the opportunity to elaborate further regarding how we envision this complex mechanism. It is generally known that, within the LeuT-fold transporters, the return step is rate-limiting for the transport process. Our data suggest that K+ binds to the inward-facing apo form.

      Accordingly, we propose that the role of K+ binding is to help LeuT overcome the rate-limiting step. We propose the following mechanistic model: When Na+ and substrate are released to the intracellular environment, the transporter must return to the outward-facing conformation. This can happen in (at least) two ways: 1) The transporter in its apo-form closes the inner gate and opens to the extracellular side, now ready to perform a new transport cycle. 2) The transporter rebinds Na+, which allows for the rebinding of substrate. It can now go in reverse (efflux) or once again release its content. The transporter can naturally also rebind only Na+ and release it again to the cytosol.

      The purpose of K+ binding is to prevent Na+ rebinding and to promote a conformational state of the transporter, which does not allow Na+ binding. Even though Na+ has a higher affinity for the site, K+ is much more abundant.

      This model is supported by our previous experiment, showing that intravesicular K+ prevents [3H]alanine efflux while LeuT performs Na+-dependent alanine transport. Thus, the increase in Vmax could be due to a decreased efflux (exchange mode), or a facilitation of the rate-limiting step, or a combination of the two.

      Note that the model does not require that K+ is counter-transported. It just has to prevent Na+ rebinding. However, even though we failed to show K+ counter-transport, it does not mean that it does not happen. Further experiments must clarify this issue.

      To be more explicit about our proposed mechanistic model, we have expanded the last paragraph in the Discussion section. It now reads:

      “We propose that K+ binding either facilitates LeuT transition from inward- to outward-facing (the rate-limiting step of the transport cycle), or solely prevents the rebinding and possible efflux of Na+ and substrate. It could also be a combination of both. Either way, intracellular K+ will lead to an increase in Vmax and concentrative capacity. Note that our previous experiment showed an increased [3H]alanine efflux when LeuT transports alanine in the absence of intra-vesicular K+ (ref. 16). Specifically, the mechanistic impact of K+ could be to catalyze LeuT away from the state that allows the rebinding of Na+ and substrate. This way, K+ binding would decrease the possible rebinding of intracellularly released Na+ and substrate, thereby rectifying the transport process and increasing the concentrative capacity and Vmax (Figure 6). Our results suggest that K+ is not counter-transported but rather promotes LeuT to overcome an internal rate-limiting energy barrier. However, further investigations must be performed before any conclusive statement can be made here.”

      • Describe the orientation of the transporter in the vesicles.

      When working with reconstituted NSS, the transport activity is determined by the Na+ gradient. This is also evident in the experiments where we dissipate the Na+ gradient: here we find transport activity comparable to background. We also note that, in the literature, directionality is rarely determined for transport proteins in reconstituted systems. That said, it is difficult to know how the inside-out LeuT molecules contribute to the transport process. Will they work in reverse and contribute to the accumulation of intravesicular [3H]alanine? If so, to what extent? They will likely not be affected by the intravesicular K+. Therefore, their possible contribution will ‘work against’ our results and decrease the apparent K+ effects reported herein. Taken together, unless the vast majority of LeuT molecules are inside-out, knowing the actual proportion will not, in our view, affect our interpretations and conclusions of the data.

      That said, we have also been curious about this issue and, prompted by the reviewer's question, we performed the suggested experiment. We have inserted the results in Figure 3 – Figure supplement 1D. The figure shows that a fraction of the reconstituted LeuT is susceptible to thrombin cleavage of the accessible C-terminal. We have quantified the cleaved fraction to be around 40% of the total (see Author response image 1 below). It is, however, a crude estimate, since it is difficult to perform reliable densitometry on bands that lie so close together. Thus, we are reluctant to add a quantitative measure to the article text.

      Author response image 1.

      We have inserted the following in the main text:

      “It is difficult to control the directionality of proteins when they are reconstituted into lipid vesicles. They will be inserted in both orientations: outside-out and inside-out. In the case of LeuT, it is the imposed Na+ gradient which determines the directionality of transport. Uptake through the inside-out transporters will probably also happen. Note that the inside-out LeuT will not have the K+ binding site exposed to the intra-vesicular environment. Accordingly, a proportion of the transporters will likely not be influenced by the added K+ and will tend to mask the contribution of K+ to the transport mode from the right-side out LeuT. To investigate LeuT directionality in our reconstituted samples, we performed thrombin cleavage of accessible C-terminals on intact and perforated vesicles, respectively. The result suggests that the proportion of LeuT inserted as outside-out is larger than the proportion with an inside-out directionality (Figure 3 – Figure supplement 1D).”

      For the inserted Figure 3 – Figure supplement 1D, we have added the following legend:

      “(D) SDS-PAGE analysis of LeuT proteoliposomes following time-dependent thrombin digestion of accessible C-terminals (reducing the mass of LeuT by ~1.3 kDa). The reaction was terminated by the addition of PMSF at the specified time points. The lanes corresponding to the time-dependent proteolysis are flanked by lanes containing proteoliposomes without thrombin (left, 0 min) or digested in the presence of DDM (right, 180 min+DDM). Arrows indicate bands of full-length (top) and cleaved (bottom) LeuT.”

      • Check the effects of mutations away from the Na1 cation binding site.

      We have included LeuT K398C in the study as a negative control for unspecific effects on Na+ and K+ binding. The mutant exhibits Na+-dependent [3H]leucine binding and K+-dependency similar to LeuT WT – see Table 2 and Table 2 - Figure Supplement 1G.

      As a minor point, the authors use the term "affinity" liberally. However, unless these are direct binding experiments, the term "apparent affinity" may be more appropriate, since Km values are affected by the transport cycle (in uptake), as well as binding of cations/substrate.

      We thank the reviewer for emphasizing this important point. We have revised the manuscript accordingly. We use ‘affinity’ when it has been determined under equilibrium conditions, either as a SPA binding experiment or based on tmFRET. We use the term ‘Km’ when the apparent affinity has been determined during non-equilibrium conditions such as during substrate transport.

      Reviewer #2 (Recommendations For The Authors):

      As mentioned in part 2, it is important to show the effect of internal potassium on transport in-sided liposomes. This could be done using the methodology developed by Tsai et. al. Biochemistry 51 (2012) 1557-1585.

      We appreciate this important point and have performed the suggested experiment. See our response to Reviewer #1's comment on the orientation of the transporter in the vesicles.

      In the Abstract and throughout it is mentioned that K+ is not counter transported, yet on the bottom of p. 16 it is mentioned that this is possible.

      We have tried to be very cautious with any interpretation about whether K+ is only binding or whether it is also counter-transported. Either way, it must facilitate a transition towards a non-Na+ binding state. We tried to differentiate between the two possibilities by investigating if an outward-directed K+ gradient alone could drive transport (Figure 3E). We do not observe any significant difference from background (no gradient). However, the information gained is rather weak: It is still possible that K+ is counter-transported, but that the K+ gradient does not impose any driving force. Instead, it ensures a rectification of the Na+-dependent substrate transport. If so, this experiment would come up negative even if K+ is counter-transported.

      To be more explicit, we have changed the wording on page 16.

      Our results suggest that K+ is not counter-transported, but rather promotes LeuT to overcome an internal rate-limiting energy barrier. However, further investigations must be performed before any conclusive statement can be made here.

      Fig.2-Fig. Supplement 1: it is important to show that the effect of leucine is sodium-dependent by adding the control K+ and leucine.

      We thank the reviewer for suggesting this important control. We have added the experiment to Figure 2 – Figure supplement 1 as suggested. The effect is not different from K+ alone supporting the SPA-binding data that K+-binding does not promote substrate binding.

      Point for discussion: Whereas potassium is counter transported in SERT, there are conflicting interpretations on this in DAT (Ref. 15 from the list and Bhat et al. eLife (2021) 10:e67996). The situation in LeuT seems like the scenario described by Bhat et al.

      We appreciate the suggestion of a link between LeuT and hDAT. However, as mentioned above, we consider it too early to be certain about this option. We have now mentioned the mechanistic similarity in the Discussion following our description of the proposed mechanistic model (see first request from reviewer #1):

      “If K+ is not counter-transported, LeuT might comply with the mechanism previously suggested for the human DAT31.”

      Fig. 5-Fig. Supplement 1: Why are no data on N27Q and N286Q given? If these mutants have no transport activity this should be stated. Moreover, alanine uptake by A22V is almost sodium independent and is also very fast, suggesting binding, not transport. Are the counts sensitive to ionophores like nigericin?

      We appreciate this important point. Indeed, the LeuT N27Q and N286Q are transport inactive. This information is now inserted in the main text when describing the conformational dynamics of N27QtmFRET and N286QtmFRET.

      We agree with the reviewer that the [3H]alanine uptake for A22V is not very conclusive. The vesicles with Na+ on both sides (open diamonds) do allow [3H]alanine binding. Vesicles with added gramicidin are similar in activity. The fast rate could indeed suggest a binding event. This we also do not rule out in the main text. However, the contribution in activity from LeuT A22V in vesicles with a Na+ gradient cannot be explained by a binding event alone. Then it should bind more [3H]alanine in the presence of a Na+ gradient, which is possible, but hard to imagine. Also, the alanine affinity for LeuT A22V is ~1 µM (Table 1). At this affinity it should be literally impossible to detect any binding because the off-rate is so fast that it would all dissociate during the washing procedure.

      We have described the data and left out any interpretation (e.g. changed ‘[3H]alanine transport’ to ‘[3H]alanine activity’). In addition, we have replaced: “This correlates with the lack of changes in conformational equilibrium observed in the tmFRET data between the NMDG+, Na+ and K+ states.” with: “Further investigations must clarify whether the changes in observed [3H]alanine activity constitute a transport or a binding event.”

      Lower part of p. 16. The Authors speculate "that the mechanistic impact of K+ binding could be to accelerate a transition away from the conformation where Na+ and substrate are released, to a state where they can no longer rebind and thus revert the transport process (efflux)". This could be easily tested by measuring exchange, which should not be influenced by potassium.

      We performed this experiment in Billesbolle et al. 2016. Nat Commun (Fig. 1f). We show that the exchange is decreased in the presence of K+. We hypothesize that this is because K+ binding forces LeuT away from the exchange mode.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Positive comments:

      We appreciate the positive comments of the editor and reviewers. The editor noted that the paper presents a “technological advance” that has enabled “important insights about the brain circuits through which the cerebellum could participate in social interactions.” Reviewer 1 thought this was a “timely and important study with solid evidence for correlative conclusions” and that the experiments were “technically challenging” and “well-performed”. Reviewer 2 stated that the finding of correlated activity between the regions is “interesting as non-motor functions of the cerebellum are relatively little explored.” They also thought “that the data are presented clearly, and the manuscript is well-written”. Reviewer 3 mentioned that “this approach can be useful for many neuroscientists”. We thank the editors and all the reviewers for their positive comments.

      Reviewer #1 (Public Review)

      While the novelty of the device is strongly emphasized, I find that its value is somewhat diminished by the wire-free device developed by the same group as it should thus be possible to perform calcium imaging wire-free and electrophysiological recording via a single conventional cable (or also via wireless headstages).

      While it would be potentially possible to use a wire-free Miniscope in parallel with a wired electrophysiology recording system, this would result in a larger footprint on the animal’s head, more than a gram in increased weight due to an added LiPo battery, a larger electrophysiology head-stage, and limited recording length due to a battery capacity of around 20 minutes. Our main goal for the development of the E-scope platform was to develop an expandable electrophysiology recording board that would work with all previously built UCLA Miniscopes while also streamlining the integration of power and data into the coaxial cable connection already familiar to hundreds of labs using Miniscopes. The vast majority of Miniscope experiments are done using wired systems and we aimed to support the expansion of those systems instead of requiring a more substantial switch to using wire-free Miniscopes.

      The role of the identified network activations in social interactions is not touched upon.

      We agree with the reviewer that we have not discovered a causal role for the co-modulated activity patterns we have observed. As these causal experiments will require the development of real-time techniques for blocking socially evoked changes in firing rate in cerebellum and ACC, we are currently planning experiments to address causality. These results will be described in a future publication.

      Reviewer #1 (Recommendations for the Authors):

      Please provide the number of recorded mice.

      The number is now provided in the revised manuscript.

      If the recorded areas (cerebellar cortex, DN, and ACC) are part of the same circuit regulating social interactions, it would be nice to get insights into the directionality of the circuit. The authors favor the possibility that during social behavior, cerebellar efferences indirectly influence ACC activities (as in Figure 4A), however, no evidence is presented to support this interpretation. ACC activities might also indirectly influence PC firing. It may be possible to get insights into this by comparing the timing of neuronal activity in the different areas with respect to social onset.

      For this study, we mainly focused on the output of the cerebellar circuit to the cortex, as previous work shows that the dentate nucleus projects to the thalamus, which in turn projects to ACC and other cortical regions (Badura et al., eLife, 2018; Kelly et al., Nat. Neurosci., 2020). The temporal resolution of calcium imaging is limited (the rise time of calcium events with genetically-encoded indicators takes hundreds of milliseconds), so it is insufficient to precisely assess the relative onset timing of the two regions. Our work certainly does not rule out cortical influences on PC firing.

      Reviewer #2 (Public Review)

      However, the causal relationship is far from established with the methods used, leaving it unclear if these two brain regions are similarly engaged by the behavior or if they form a pathway/loop.

      As indicated in our response to Reviewer #1’s similar critique, the goal of the presented study is to demonstrate the feasibility and capabilities of this novel device. This new tool will allow us to conduct a comprehensive and rigorous study to assess the causal role of the interactions between the cerebellum and ACC in social behavior (as well as other behaviors). These experiments are being designed now.

      Reviewer #2 (Recommendations for the Authors):

      It is unclear what is entirely unique about the E-scope. It seems that its advance is simply a common cable that allows interfacing with both devices (lighter weight than two cables is stated in the Discussion). Is this really an advance? What are its limitations? E.g., how close can the recording sites be to one another? How can it be configured for any other extracellular recording approach (tetrodes, 64-channel arrays, or Neuropixels)?

      In our experience, multiple wires tethered to different head-mounted devices on an animal significantly impact its behavior. Therefore, one of the major advantages of the UCLA Miniscope Platform is the use of a single, flexible coaxial cable to minimize the impact of tethering on behavior. The E-Scope platform builds on top of this work by incorporating electrophysiology recording capabilities into this single, flexible coaxial cable. Additionally, the electrophysiology recording hardware is backwards compatible with all previously built UCLA Miniscopes and can run through open-source and commercial commutators already used in Miniscope experiments.

      The available bandwidth within the shared single coaxial cable can handle megapixel Miniscope imaging along with the maximum data output of a 32-channel Intan Ephys IC. The E-Scope platform presented here runs the Intan Ephys IC at 20 kSps for all 32 channels instead of the maximum 30 kSps due to microcontroller speed limitations, but this could be overcome by using a faster microcontroller or clock, or by slightly reducing the total number of electrodes sampled. Finally, the E-Scope was designed to support any electrode type supported by the Intan Ephys IC. This includes up to 32 channels of passive probes such as single electrodes, tetrodes, silicon probes, and flexible multi-channel arrays, but does not include Neuropixels, as Neuropixels use custom active electronics on the probe to multiplex, sample, and serialize electrophysiology data.
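
      As a rough check on the bandwidth statement above, the raw electrophysiology data rate can be estimated from the channel count and the sampling rate. The sketch below is illustrative only: the 16-bit sample width is an assumption that is not stated in this response, the function name is hypothetical, and any protocol overhead on the coaxial link is ignored.

```python
def ephys_data_rate_mbps(n_channels=32, sample_rate_hz=20_000, bits_per_sample=16):
    """Raw sample throughput in Mbit/s.

    bits_per_sample=16 is an assumed value, not taken from the response.
    Link/protocol overhead is not included.
    """
    return n_channels * sample_rate_hz * bits_per_sample / 1e6


# Rate at the sampling rate used in the study, and at the stated maximum.
rate_20k = ephys_data_rate_mbps(sample_rate_hz=20_000)
rate_30k = ephys_data_rate_mbps(sample_rate_hz=30_000)
```

      Under that assumption, raising the sampling rate from 20 kSps to 30 kSps increases the raw data rate by half, which is consistent with the response describing the limit as one of microcontroller speed rather than cable bandwidth.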

      The authors only analyzed simple spikes in PCs for social-related activity. What about complex spikes? Is this correlated with ACC activity?

      Complex spikes were detectable to the extent that we could identify the recorded cell as a PC, but because these cells were recorded in freely behaving mice, complex spike detection was not reliable enough to be used for further correlational analyses.

      The data is sampled in the two regions (cerebellum and ACC) at very different rates (imaging is much slower than electrophysiology; ephys data was binned). How does this affect the correlation plots?

      We generated firing rate maps for the cerebellar neural activity using a bin size that matched the sampling frequency of calcium imaging (see Methods). As mentioned in our Methods, to study the relationship between the electrophysiology and calcium imaging data we binned the spike trains using 33 ms bins to match the calcium imaging sampling rate (~30 Hz). This limits the temporal resolution available for calculating fine-scale correlations, but the correlations that we report are on a behaviorally relevant temporal scale. However, the fine temporal resolution of the electrophysiology data can still be used to examine the relationship between cerebellar output and specific social behavior epochs in more detail.
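
      The binning step described above can be sketched as follows. This is a minimal illustration, not the analysis code used in the study: the function names are hypothetical, spike times are assumed to be given in seconds from the start of the recording, and spikes falling after the last complete bin are simply dropped.

```python
def bin_spike_train(spike_times_s, duration_s, bin_s=0.033):
    """Count spikes in consecutive bins of width bin_s (seconds).

    bin_s=0.033 matches the ~30 Hz imaging rate mentioned in the text.
    Only complete bins are kept; later spikes are ignored.
    """
    n_bins = int(duration_s / bin_s)
    counts = [0] * n_bins
    for t in spike_times_s:
        idx = int(t / bin_s)
        if 0 <= idx < n_bins:
            counts[idx] += 1
    return counts


def counts_to_rate(counts, bin_s=0.033):
    """Convert per-bin spike counts to firing rate in spikes per second."""
    return [c / bin_s for c in counts]


# Example: four spikes in the first second of a recording.
counts = bin_spike_train([0.010, 0.020, 0.040, 0.100], duration_s=1.0)
rates = counts_to_rate(counts)
```

      Each entry of `counts` then lines up with one imaging frame, so the binned spike train and a calcium trace can be compared sample by sample.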

      For the correlation analysis, over what time frame was the activity relationship examined? How was this duration determined?

      Author response image 1.

      The main criterion for the time frame used in the correlation analysis was the behavioral timescale of social interaction [see figure above for the number of social (red) and object (blue) interaction bouts (a), their duration (b) and coefficient of variation (CV) (c)]. Overall, the time frame for the activity relationship was based on the average duration of the social interactions (~3 sec). Periods of 3.8 sec before and 5.8 sec after interaction onset were analyzed. Accordingly, the cross-correlograms were constructed using a maximum lag length of 5 sec. In the article we reported the correlation at lag 0.
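
      A lagged correlation of the kind described here can be sketched as below. This is an illustration under stated assumptions rather than the method used in the study: both signals are assumed to be sampled on the same ~30 Hz frame clock (so a 5 sec maximum lag is about 150 frames), the Pearson normalization at each lag is a choice made for this sketch, and the function names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (NaN if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


def cross_correlogram(x, y, max_lag):
    """Correlation of x[t] with y[t + lag] for lag in -max_lag..max_lag (in frames)."""
    n = len(x)
    out = {}
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            a, b = x[:n - lag], y[lag:]
        else:
            a, b = x[-lag:], y[:n + lag]
        out[lag] = pearson(a, b)
    return out


# Toy example: y is x delayed by two frames, so the peak sits at lag +2.
x = [0, 1, 4, 2, 7, 3, 9, 5, 6, 8, 1, 2]
y = [0, 0] + x[:-2]
cc = cross_correlogram(x, y, max_lag=3)
peak_lag = max(cc, key=cc.get)
```

      The value `cc[0]` corresponds to the lag-0 correlation that the response says was reported; the other lags show whether one signal tends to lead the other at the resolution the frame rate allows.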

      The relationship between the cerebellum and ACC seems unconvincing. If two brain regions are similarly engaged by the behavior, wouldn't they have a high correlation? Is the activity in one region driving the other?

      We reference studies showing an indirect anatomical and functional connection between the cerebellum and the ACC or prefrontal cortex (Badura et al., eLife, 2018). Also, as stated in the introduction, the ACC is a recognized brain area for social behavioral studies. In the results, we stated that the increase in correlations among groups of neurons that are similarly engaged during a specific epoch of the social interaction was an expected finding. What was not expected was that there would be no difference in the distributions of correlations when the social epochs were removed, suggesting that intrinsic connectivity does not drive a difference in correlations.

      However, since there is a cerebello-cortical loop, further study will be needed to understand which area initiates this type of activity during social behavior.

      • In the figures, the color-coded scale bars should be labeled as z-scores (confusing without them).

      • In Figure 4, the color differences for Soc-ACC, Soc+ACC and SocNS ACC should be more striking as it is hard to tell them apart because they are all similar shades of blue-gray.

      We thank the reviewer for their suggestions for improving the figures. We have incorporated these changes in Figures 2 and 3, along with their figure supplements. Graphs in Figure 4D-G have been edited to make the lines more visible to the reader.

      Reviewer #3 (Public Review)

      However, a mouse weighs between 20 and 40 g, so that an implant of 4.5 g is still quite considerable. It can be expected that this has an impact on the behavior and, possibly, the well-being of the animals. Whether this is the case or not, is not really addressed in this study.

      The weight of the E-Scope (4.5 g) is near the maximum that is tolerated by animals in our experience. We therefore acclimated the mouse to the weight with dummy scopes of increasing weights over a 7-10 day period. During this period, we observed the animal to have normal exploratory behavior. Specifically, there is no change in the sociability of the animals (Figure 2A) and animals cover the large arena (48 x 48 cm, Figure 2H).

      Overall, the description of animal behavior is rather sparse. The methods state only that stranger age-matched mice were used, but do not state their gender. The nature of the social interactions was not described. Was there aggressive behavior, sexual approach and/or intercourse? Did the stranger mice attack/damage the E-Scope? Were the interactions comparable (using which parameters?) with and without E-Scope attached? It is not even described what the authors define as an "interaction bout" (Figure 2A). The number of interaction bouts is counted per 7 minutes, I presume? This is not specified explicitly.

      As mentioned in the methods section of the original version of our manuscript, all the target mice were age-matched “male” mice. As per the reviewer’s suggestion, we now have added in the manuscript that before any of our social interaction behavioral experiments, aggressive or agitated mice were removed after assessing their behavior in the arena during habituation. For all trials, all mice were introduced for the first time.

      We also mention in the methods section of our manuscript that social behaviors were evaluated by proximity between the subject mouse and the novel target mouse (2 cm from the body, head, or base of tail). From our recordings, we did not observe any aggressive, mounting, or other dominance behavior toward the E-Scope subject mouse during the 7 minutes of social interaction assessment. Figure 2A shows the average number of social interaction bouts during the recording time. This has now been expanded upon in our revised manuscript.
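
      One way a proximity criterion like this could be turned into interaction bouts is sketched below. The response gives only the 2 cm criterion; how frames were grouped into bouts is not described here, so the segmentation rule (each uninterrupted run of frames within the threshold counts as one bout), the per-frame distance input, the 30 frames-per-second default, and the function name are all assumptions made for illustration.

```python
def find_bouts(distances_cm, threshold_cm=2.0, fps=30.0):
    """Return a list of (onset_s, duration_s), one per run of frames within threshold.

    distances_cm: per-frame distance between subject and target (hypothetical input).
    threshold_cm: proximity criterion; 2 cm is the value given in the text.
    """
    bouts = []
    start = None
    for i, d in enumerate(distances_cm):
        close = d <= threshold_cm
        if close and start is None:
            start = i  # a bout begins on this frame
        elif not close and start is not None:
            bouts.append((start / fps, (i - start) / fps))
            start = None
    if start is not None:  # a bout still open when the recording ends
        bouts.append((start / fps, (len(distances_cm) - start) / fps))
    return bouts


# Toy trace at 1 frame per second: two bouts, lasting 2 s and 1 s.
example_bouts = find_bouts([5, 1, 1, 5, 5, 1, 5], fps=1.0)
```

      From a list like `example_bouts`, the number of bouts and their durations follow directly, which are the quantities the reviewer asks to have defined.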

      It would be very insightful if the authors would describe which events they considered to be action potentials, and which not. Similarly, the raw traces of Figure 1E are declared to be single-unit recordings of Purkinje cells. Partially due to the small size of the traces (invisible in print and pixelated in the digital version), I have a hard time recognizing complex spikes and simple spikes in these traces. This is a bit worrisome, as the authors declare the typical duration of the pause in simple spike firing after a complex spike to be 20-100 ms. In my experience, such long pauses are rare in this region, and definitely not typical. In the right panel of Figure 1A, an example of a complex spike-induced pause is shown. This pause is around 15 ms, so not typical according to the text, and starts only around 4 ms after the complex spike, which should not be the case and suggests either a misalignment of the figure or the detection of complex spike spikelets as simple spikes, while the abnormally long pause suggests that the authors fail to detect a lot of simple spikes. The authors could provide more confidence in their data by including more raw data, making explicit how they analyzed the signals, and by reporting basic statistics of firing properties (like rate, cv or cv2, pause duration). In this respect, Figure 2 - figure supplement 3 shows quite a large percentage of cells to have either a very low or a very high firing rate.

      We now provide a better example of simple spikes and complex spikes in Fig 1E and have corrected our comment in the body of the manuscript. As the reviewer mentions, the previous version of the SS x CS cross-correlation histogram in Figure 1G was not the best example because of the detected CS spikelets. However, the detection of CS spikelets has little impact on the interpretation of the results. We have replaced this figure with a better example of the SS x CS cross-correlation histogram.

      The number of Purkinje cells recorded during social interactions is quite low: only 11 cells showed a modulation in their spiking activity (unclear whether in complex spikes, simple spikes or both). During object interaction, only 4 cells showed a significant modulation. Unclear is whether the latter 4 are a subset of the former 11, or whether "social cells" and "object cells" are different categories. Having so few cells, and with these having different types of modulation, the group of cells for each type of modulation is really small, going down to 2 cells/group. It is doubtful whether meaningful interpretation is possible here.

      While the number of neurons is not as high as those reported for other regions, the number presented depicts the full range of responses to social behavior. It is extremely difficult to obtain stable neurons in freely behaving, socially interacting animals, and only a handful of neurons could be recorded in each animal. Among these recorded neurons, only a subset responds to social interactions, further reducing the numbers. The results, however, are consistent among cell types, and the direction of modulation fits with the inhibitory connectivity between PCs and DN neurons. To our knowledge, we are the first group to publish neuronal activity of PC and DN neurons from freely behaving mice during social behavior.

      Neural activity patterns observed during social interaction do not necessarily relate specifically to social interaction, but can also occur in a non-social context. The authors control this by comparing social interactions with object interactions, but I miss a direct comparison between the two conditions, both in terms of behavior (now only the number of interactions is counted, not their duration or intensity), and in terms of neural activity. There is some analysis done on the interaction between movement and cerebellar activity (Figure 2 - figure supplement 4), but it is unclear to what extent social interactions and movements are separated here. It would already help to indicate the social interactions in the trajectory plots (e.g., Fig. 2H), for example with social interaction-related movements in red and the rest of the trajectories in black.

      We have updated the social interaction plots in Figure 2H in the revised version of the manuscript.

      Reviewer #3 (Recommendations for the Authors):

      Increase the number of cerebellar neurons that are recorded.

      Due to the difficulty of the experiment and the low yield of cerebellar recordings, substantially increasing the number of neurons would require many more experiments, which are not feasible at this time.

      Include more raw data and make the analysis procedure more insightful with illustrations of intermediate steps.

      We have included a more thorough description of the analysis in the methods section of the revised manuscript.

      Provide a better description of the behavior.

      We have increased the level of detail regarding the mouse behavior in the Results and Methods sections. This includes a more detailed description of the parameters we used to analyze the social interaction.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1:

      Summary:

      The authors aim to test the sensory recruitment theory of visual memory, which assumes that visual sensory areas are recruited for working memory, and that these sensory areas represent visual memories in a similar fashion to how perceptual inputs are represented. To test the overlap between working memory (WM) and perception, the authors use coarse stimulus (aperture) biases that are known to account for (some) orientation decoding in the visual cortex (i.e., stimulus energy is higher for parts of an image where a grating orientation is perpendicular to an aperture edge, and stimulus energy drives decoding). Specifically, the authors show gratings (with a given "carrier" orientation) behind two different apertures: one is a radial modulator (with maximal energy aligned with the carrier orientation) and the other an angular modulator (with maximal energy orthogonal to the carrier orientation). When the subject detects contrast changes in these stimuli (the perceptual task), orientation decoding only works when training and testing within each modulator, but not across modulators, showing the impact of stimulus energy on decoding performance. Instead, when subjects remember the orientation over a 12s delay, orientation decoding works irrespective of the modulator used. The authors conclude that representations during WM are therefore not "sensory-like", given that they are immune to aperture biases. This invalidates the sensory recruitment hypothesis, or at least the part assuming that when sensory areas are recruited during WM, they are recruited in a manner that resembles how these areas are used during perception.

      Strengths:

      Duan and Curtis very convincingly show that aperture effects that are present during perception, do not appear to be present during the working memory delay. Especially when the debate about "why can we decode orientations from human visual cortex" was in full swing, many may have quietly assumed this to be true (e.g., "the memory delay has no stimuli, and ergo no stimulus aperture effects"), but it is definitely not self-evident and nobody ever thought to test it directly until now. In addition to the clear absence of aperture effects during the delay, Duan and Curtis also show that when stimulus energy aligns with the carrier orientation, cross-generalization between perception and memory does work (which could explain why perception-to-memory cross-decoding also works). All in all, this is a clever manipulation, and I'm glad someone did it, and did it well.

      Weaknesses:

      There seems to be a major possible confound that prohibits strong conclusions about "abstractions" into "line-like" representation, which is spatial attention. What if subjects simply attend the endpoints of the carrier grating, or attend to the edge of the screen where the carrier orientation "intersects" in order to do the task? This may also result in reconstructions that have higher BOLD at areas close to the stimulus/screen edges along the carrier orientation. The question then would be if this is truly an "abstracted representation", or if subjects are merely using spatial attention to do the task.

      Alternatively (and this reaches back to the "fine vs coarse" debate), another argument could be that during memory, what we are decoding is indeed fine-scale inhomogeneous sampling of orientation preferences across many voxels. This is clearly not the most convincing argument, as the spatial reconstructions (e.g., Figure 3A and C) show higher BOLD for voxels with receptive fields that are aligned to the remembered orientation (which is in itself a form of coarse-scale bias), but could still play a role.

      To conclude that the spatial reconstruction from the data indeed comes from a line-like representation, you'd need to generate modeled reconstructions of all possible stimuli and representations. Yes, Figure 4 shows that a line results in a modeled spatial map that resembles the WM data, but many other stimuli might too, and some may better match the data. For example, the alternative hypothesis (attention to grating endpoints) may very well lead to a very comparable model output to the one from a line. However, testing this would not suffice, as there may be an inherent inverse problem (with multiple stimuli that can lead to the same visual field model).

      The main conclusion, and title of the paper, that visual working memories are abstractions of percepts, is therefore not supported. Subjects could be using spatial attention, for example. Furthermore, even if it is true that gratings are abstracted into lines, this form of abstraction would not generalize to any non-spatial feature (e.g., color cannot become a line, contrast cannot become a line, etc.), which means it has limited explanatory power.

      We thank the reviewer for bringing up these excellent questions.

      First, to test the alternative hypothesis of spatial attention, we fed a dot image into the image-computable model. We placed the dot where we suspect one might place their spatial attention, namely, at the edge of the stimulus that is tangent to the orientation of the grating. We generated the model response for three orientations and their combination by rotating and averaging. From Author response image 1 below, one can see that this model does not match the line-like representation we reported. Nonetheless, we would like to avoid making the argument that attention does not play a role. We strongly suspect that if one were attending to multiple places along a path that makes up a line, it would produce the results we observed. But there begins a circularity in the logic, where one cannot distinguish between attention to a line-like representation and a line of attention being the line-like representation.

      Author response image 1.

      Reconstruction maps for the dot image at the edge of 15°, 75°, 135°, and the combined across three orientation conditions.

      Second, we remain agnostic to the question of whether fine-scale inhomogeneous sampling of orientation-selective neurons may drive some of the decoding results we report here. It is possible that our line-like representations are driven by neurons tuned to the sample orientation that have receptive fields that lie along the line. Here, we instead focus on testing the idea that WM decoding does not depend on aperture biases.

      Finally, we agree with the reviewer that there is much more work to be done in this area. Our working hypothesis, that WM representations are abstractions of percepts, is admittedly based on Occam's razor and an appeal to efficient coding principles. We also agree that these results may not generalize to all forms of WM (e.g., color). As always, there is a tradeoff between interpretability (visual spatial formats in retinotopically organized maps) and generalizability. Frankly, we have no idea how one might be able to test these ideas when subjects might be using the most common type of memory reformatting - linguistic representations, which are incredibly efficient.

      Additional context:

      The working memory and perception tasks are rather different. In this case, the perception task does not require the subject to process the carrier orientation (which is largely occluded, and possibly not that obvious without paying attention to it), but attention is paid to contrast. In this scenario, stimulus energy may dominate the signal. In the WM task, subjects have to work out what orientation is shown to do the task. Given that the sensory stimulus in both tasks is brief (1.5s during memory encoding, and 2.5s total in the perceptual task), it would be interesting to look at decoding (and reconstructions) for the WM stimulus epoch. If abstraction (into a line) happens in working memory, then this perceptual part of the task should still be susceptible to aperture biases. It allows the authors to show that it is indeed during memory (and not merely the task or attentional state of the subject) that abstraction occurs.

      Again, this is an excellent question. We used a separate perceptual task instead of the stimulus epoch as a control mainly for two reasons. First, we used a control task in which participants had to process the contrast, not orientation, of the grating because we were concerned that participants would reformat the grating into a line-like representation to make the judgments. To avoid this, we used a task similar to the one used when previous researchers first found the stimulus vignetting effect (Roth et al., 2018). Again, our main goal was to try to focus on the bottom-up visual features. Second, because of the sluggishness of the BOLD response, combined with our task design (i.e., the memory delay always followed the target stimulus), we cannot disentangle the visual and memory responses that co-exist at this epoch. Any result could be misleading.

      What's also interesting is what happens in the passive perceptual condition, and the fact that spatial reconstructions for areas beyond V1 and V2 (i.e., V3, V3AB, and IPS0-1) align with (implied) grating endpoints, even when an angular modulator is used (Figure 3C). Are these areas also "abstracting" the stimulus (in a line-like format)?

      We agree these findings are interesting and replicate what we found in our previous paper (Kwak & Curtis, Neuron, 2022). We believe that these results do imply that these areas indeed store a reformatted line-like WM representation that is not biased by vignetting. We would like to extend a note of caution, however, because the decoding results in the higher order areas (V3AB, IPS0-1, etc) are somewhat poor (especially in comparison to V1, V2, V3) (see Figure 2).

      Reviewer #2:

      Summary:

      According to the sensory recruitment model, the contents of working memory (WM) are maintained by activity in the same sensory cortical regions responsible for processing perceptual inputs. A strong version of the sensory recruitment model predicts that stimulus-specific activity patterns measured in sensory brain areas during WM storage should be identical to those measured during perceptual processing. Previous research casts doubt on this hypothesis, but little is known about how stimulus-specific activity patterns during perception and memory differ. Through clever experimental design and rigorous analyses, Duan & Curtis convincingly demonstrate that stimulus-specific representations of remembered items are highly abstracted versions of representations measured during perceptual processing and that these abstracted representations are immune to aperture biases that contribute to fMRI feature decoding. The paper provides converging evidence that neural states responsible for representing information during perception and WM are fundamentally different, and provides a potential explanation for this difference.

      Strengths:

      (1) The generation of stimuli with matching vs. orthogonal orientations and aperture biases is clever and sets up a straightforward test regarding whether and how aperture biases contribute to orientation decoding during perception and WM. The demonstration that orientation decoding during perception is driven primarily by aperture bias while during WM it is driven primarily by orientation is compelling.

      (2) The paper suggests a reason why orientation decoding during WM might be immune to aperture biases: by weighting multivoxel patterns measured during WM storage by spatial population receptive field estimates from a different task, the authors show that remembered - but not actively viewed - orientations form "line-like" patterns in retinotopic cortical space.

      We thank the reviewer for noting the strengths in our work.

      Weaknesses:

      (1) The paper tests a strong version of the sensory recruitment model, where neural states representing information during WM are presumed to be identical to neural states representing the same information during perceptual processing. As the paper acknowledges, there is already ample reason to doubt this prediction (see, e.g., earlier work by Kok & de Lange, Curr Biol 2014; Bloem et al., Psych Sci, 2018; Rademaker et al., Nat Neurosci, 2019; among others). Still, the demonstration that orientation decoding during WM is immune to aperture biases known to drive orientation decoding during perception makes for a compelling demonstration.

      We agree with the reviewer, and would add that the main problem with the sensory recruitment model of WM is that it remains underspecified. The work cited above and in our paper, and the results in this report, are only the beginning of efforts to fully detail what it means to recruit sensory mechanisms for memory.

      (2) Earlier work by the same group has reported line-like representations of orientations during memory storage but not during perception (e.g., Kwak & Curtis, Neuron, 2022). It's nice to see that result replicated during explicit perceptual and WM tasks in the current study, but I question whether the findings provide fundamental new insights into the neural bases of WM. That would require a model or explanation describing how stimulus-specific activation patterns measured during perception are transformed into the "line-like" patterns seen during WM, which the authors acknowledge is an important goal for future research.

      We agree with the reviewer that perhaps some might see the current results as an incremental step given our previous paper. However, we would point out that researchers have been decoding memorized orientation from the early visual cortex for 15 years, and not one of those highly impactful studies had ever done what we did here, which was to test if decoded WM representations are the product of aperture biases. Not only do our results indicate that decoding memorized orientation is immune to these biases, but they critically suggest a reason why one can decode orientation during WM.

      Reviewer #3:

      Summary:

      In this work, Duan and Curtis addressed an important issue related to the nature of working memory representations. This work is motivated by findings illustrating that orientation decoding performance for perceptual representations can be biased by the stimulus aperture (modulator). Here, the authors examined whether the decoding performance for working memory representations is similarly influenced by these aperture biases. The results provide convincing evidence that working memory representations have a different representational structure, as the decoding performance was not influenced by the type of stimulus aperture.

      Strengths:

      The strength of this work lies in the direct comparison of decoding performance for perceptual representations with working memory representations. The authors take a well-motivated approach and illustrate that perceptual and working memory representations do not share a similar representational structure. The authors test a clear question, with a rigorous approach and provide convincing evidence. First, the presented oriented stimuli are carefully manipulated to create orthogonal biases introduced by the stimulus aperture (radial or angular modulator), regardless of the stimulus carrier orientation. Second, the authors implement advanced methods to decode the orientation information present, in visual and parietal cortical regions, when directly perceiving or holding an oriented stimulus in memory. The data illustrates that working memory decoding is not influenced by the type of aperture, while this is the case in perception. In sum, the main claims are important and shed light on the nature of working memory representations.

      We thank the reviewer for noting the strengths in our work.

      Weaknesses:

      I have a few minor concerns that, although they don't affect the main conclusion of the paper, should still be addressed.

      (1) Theoretical framing in the introduction: Recent work has shown that decoding of orientation during perception does reflect orientation selectivity, and it is not only driven by the stimulus aperture (Roth, Kay & Merriam, 2022).

      Excellent point, and similar to the point made by Reviewer 1. We now adjust our text and cite the paper in the Introduction.

      Below, we paste our response to Reviewer 1:

      “Second, we remain agnostic to the question of whether fine-scale inhomogenous sampling of orientation selective neurons may drive some of the decoding we report here. It is possible that our line-like representations are driven by neurons tuned to the sample orientation that have receptive fields that lie along the line. Here, we instead focus on testing the idea that WM decoding does not depend on aperture biases.”

      (2) Figure 1C illustrates the principle of how the radial and angular modulators bias the contrast energy extracted by the V1 model, which in turn would influence orientation decoding. It would be informative if the carrier orientations used in the experiment were shown in this figure, or at a minimum it would be mentioned in the legend that the experiment used 3 carrier orientations (15°, 75°, 135°) clockwise from vertical. Related, when trying to find more information regarding the carrier orientation, the 'Stimuli' section of the Methods incorrectly mentions that 180 orientations are used as the carrier orientation.

      We apologize for not clearly indicating the stimulus features in the figure. We have now added the information about the target orientations to the Figure 1C legend. We have also corrected the mistakes in the Methods section about the carrier orientation and the details of the task. Briefly, participants were asked to use a continuous report over 180 orientations. We now clarify that “We generated 180 orientations for the carrier grating to cover the whole orientation space during the continuous report task.”

      (3) The description of the image computable V1 model in the Methods is incomplete, and at times inaccurate. i) The model implements 6 orientation channels, which is inaccurately referred to as a bandwidth of 60° (should be 180/6=30). ii) The steerable pyramid combines information across phase pairs to obtain a measure of contrast energy for a given stimulus. Here, it is only mentioned that the model contains different orientation and spatial scale channels. I assume there were also 2 phase pairs, and they were combined in some manner (squared and summed to create contrast energy). Currently, it is unclear what the model output represents. iii) The spatial scale channel with the maximal response differences between the 2 modulators was chosen as the final model output. What spatial frequency does this channel refer to, and how does this spatial frequency relate to the stimulus?

      (i) First, we thank the reviewer for pointing out this mistake since the range of orientations should be 180deg instead of 360deg. We corrected this in the revised version.

      (ii) Second, we apologize for not being clear. In the second paragraph of the “Simulate model outputs” section, we wrote,

      “For both types of stimuli, we used three target orientations (15°, 75°, and 135° clockwise from vertical), which had two kinds of phases for both the carriers and the modulators. We first generated the model’s responses to each target image separately, then averaged the model responses across all phases for each orientation condition.”

      We have corrected this text by now writing,

      “For both types of stimuli, we used three target orientations (15°, 75°, and 135° clockwise from vertical), two phases for the carrier (0 or π), and two phases for the modulator (sine or cosine phase). We first generated the model responses to each phase condition separately, then averaged them across all phases for each orientation condition.”

      (iii) Third and again we apologize for the misunderstanding. Since both modulated gratings have the same spatial frequency, the channel with the largest response should be equal to the spatial frequency of the stimulus. We corrected this by now writing,

      “For the final predicted responses, we chose the subband with maximal responses (the 9th level), which corresponds to the spatial frequency of the stimulus (Roth, Heeger, and Merriam 2018).”

      (4) It is not clear from the Methods how the difficulty in the perceptual control task was controlled. How were the levels of task difficulty created?

      Apologies for not being clear. The task difficulty was set by the contrast difference between the two stimuli. The easiest level pairs the first and last contrasts in the set, while the hardest level pairs two adjacent contrasts. We added these sentences:

      “The contrast for each stimulus was generated from a predefined set of 20 contrasts uniformly distributed between 0.5 and 1.0 (0.025 step size). We created 19 levels of task difficulty based on the contrast distance between the two stimuli. Thus, the difficulty ranged from choosing contrast pairs with the largest difference (0.5, easiest) to contrast pairs with the smallest difference (0.025, hardest). Task difficulty level changed based on an adaptive, 1-up-2-down staircase procedure (Levitt 1971) to maintain performance at approximately 70% correct.”
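
      The staircase rule quoted above can be sketched in a few lines. This is only an illustration, not the authors' task code: the class and field names are hypothetical, and it assumes that "1-up-2-down" means one error makes the next trial easier while two consecutive correct responses make it harder, which is the rule that Levitt (1971) shows converges near 70.7% correct. The mapping from a difficulty level to a specific contrast pair is left out because the response does not spell it out.

```python
from dataclasses import dataclass


@dataclass
class Staircase:
    """Hypothetical 1-up-2-down staircase over discrete difficulty levels."""

    n_levels: int = 19  # level 1 = easiest, level n_levels = hardest (assumed)
    level: int = 1      # current difficulty level
    streak: int = 0     # consecutive correct responses since the last change

    def update(self, correct: bool) -> int:
        """Record one response and return the level for the next trial."""
        if correct:
            self.streak += 1
            if self.streak == 2:
                # Two correct in a row: make the task harder.
                self.level = min(self.level + 1, self.n_levels)
                self.streak = 0
        else:
            # A single error: make the task easier.
            self.level = max(self.level - 1, 1)
            self.streak = 0
        return self.level
```

      Tracking only the level index keeps the rule separate from the stimulus set, so the same update could drive any ordered list of contrast differences.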

      Recommendations For The Authors

      (Reviewer #1):

      (1) If the black circle (Fig 3A & C) is the stimulus size, and the stimulus (12º) is roughly half the size of the entire screen (24.8º), then how are spatial reconstructions generated for parts of the visual field that fall outside of the screen? I am asking because in Figure 3 the area over which spatial reconstructions are plotted has a diameter at least 3 times the diameter of that black circle (the stimulus). I'm guessing this is maybe possible when using a very liberal fitting approach to prf's, where the center of a prf can be outside of the screen (so you'd fit a circle to an elongated blob, assuming that blob is the edge of a circle, or something). Can you really reliably estimate that far out into visual space/ extrapolate prf's that exist in a part of the space you did not fully map (because it's outside of the screen)?

      We thank the reviewer for pointing out this confusing issue.

      First, the spatial reconstruction map has a diameter 3 times the diameter of the stimulus because we included voxels whose pRF eccentricities were within 20º in the reconstruction, the same as Kwak & Curtis, 2022. There are reasons for doing so. First, while the height of the screen is 24.8º, the width of the screen is 44º. Thus, it is possible to have voxels whose pRF eccentricities are >20º. Second, for areas outside the height boundaries, there might not be pRF centers, but the whole pRF Gaussian distributions might still cover the area. Moreover, when creating the final map combined across three orientation conditions, we rotated them to be centered vertically, which then required a 20x20º square. Finally, inspecting the reconstruction maps, we noticed that the area that was twice the stimulus size (black circle) made very little contribution to the reconstructions. Therefore, the results depicted in Figure 3A&C are justified, but see the next comment and our response.

      (2) Is the quantification in 3B/C justified? The filter line uses a huge part of visual space outside of the stimulus (and even the screen). For the angular modulator in the "perception" condition, this means that there is no peak at -90/90 degree. But if you were to only use a line that is about the size of the stimulus (a reasonable assumption), it would have a peak at -90/90 degree.

      This is an excellent question. We completely agree that it is more reasonable to use filter lines that have the same size (12º) as the stimulus instead of the whole map size (40º). Based on the feedback from the Reviewer, we redid the spatial reconstruction analyses and now include the following changes to Figure 3.

      (1) We fitted the lines using pixels only within the stimulus. In Figure 3A and Figure 3C, we now replaced the reconstruction maps.

      (2) We added the color bar in Figure 3A.

      (3) We regenerated the filtered responses and calculated the fidelity results by using line filters with the stimulus size. We replaced the filtered responses and fidelity results in Figure 3B and Figure 3D. With the new analysis, as anticipated by the Reviewer, we now found peaks at -90/90 degrees for the angular modulated gratings in the perceptual control task in V1 and V2. Thank you Reviewer 1!!!!

      (4) We also made corresponding changes in the Supplementary Figure S4 and S5, as well as the statistical results in Table S4 and S5.

      (5) In the “Methods” section, we added “within the stimulus size” for both “fMRI data analysis: Spatial reconstruction” and “Quantification and statistical analysis” subsections.
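
      For readers unfamiliar with this kind of analysis, the general idea of a pRF-weighted spatial reconstruction restricted to the stimulus can be sketched as follows. This is a simplified illustration under stated assumptions, not the pipeline used in the paper: it assumes isotropic 2D Gaussian pRFs, applies no normalization, and uses hypothetical function and parameter names. The 20º extent and the 12º stimulus diameter are taken from the responses above.

```python
import numpy as np


def reconstruct_map(betas, x0, y0, sigma, extent=20.0, n_pix=81):
    """Sum each voxel's Gaussian pRF, weighted by that voxel's response.

    betas, x0, y0, sigma: one value per voxel (response amplitude, pRF
    center in degrees, pRF size in degrees). Returns the pixel grids and
    the reconstructed map in visual-field coordinates.
    """
    axis = np.linspace(-extent, extent, n_pix)
    X, Y = np.meshgrid(axis, axis)
    recon = np.zeros_like(X)
    for b, cx, cy, s in zip(betas, x0, y0, sigma):
        recon += b * np.exp(-((X - cx) ** 2 + (Y - cy) ** 2) / (2 * s ** 2))
    return X, Y, recon


def stimulus_mask(X, Y, stim_diameter=12.0):
    """Boolean mask of pixels that fall inside the circular stimulus."""
    return np.hypot(X, Y) <= stim_diameter / 2
```

      Any later quantification, such as fitting or filtering with oriented lines, would then use only `recon[stimulus_mask(X, Y)]`, which is the spirit of the "within the stimulus size" restriction described in the list above.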

      (3) Figure 4 is nice, but not exactly quantitative. It does not address that the reconstructions from the perceptual task are hugging the stimulus edges much more closely compared to the modeled map. Conversely, the yellow parts of the reconstructions from the delay fan out much further than those of the model. The model also does not seem to dissociate radial/angular stimuli, while in the perceptual data the magnitude of perceptual reconstruction is clearly much weaker for angular compared to radial modulator.

      We thank the reviewer for this question. First, we admit that Figure 4 is more qualitative than quantitative. However, we see no alternative that better depicts the similarity in the model prediction and the fMRI results for the perceptual control and WM tasks. The figure clearly shows the orthogonal aperture bias. Second, we agree that aspects of the observed fMRI results are not perfectly captured by the model. This could be caused by many factors, including fMRI noise, individual differences, etc. Importantly, the different modulators induce orthogonal aperture biases in the perceptual but not the WM task, so these discrepancies do not have a major impact on the conclusions.

      (4) The working memory and perception tasks are rather different. In this case, the perception task does not require the subject to process the carrier orientation (which is largely occluded, and possibly not that obvious without paying attention to it), but attention is paid to contrast. In this scenario, stimulus energy may dominate the signal. In the WM task, subjects have to work out what orientation is shown to do the task. Given that the sensory stimulus in both tasks is brief (1.5s during memory encoding, and 2.5s total in the perceptual task), it would be interesting to look at decoding (and reconstructions) for the WM stimulus epoch. If abstraction (into a line) happens in working memory, then this perceptual part of the task should still be susceptible to aperture biases. It allows the authors to show that it is indeed during memory (and not merely the task or attentional state of the subject) that abstraction occurs.

      We addressed the same point in the response for Reviewer 1, “additional context” section.

      Recommendations for improving the writing:

      (1) The main text had too little information about the Methods. Of course, some things need not be there, but others are crucial to understanding the basics of what is being shown. For example, the main text does not describe how many orientations are used (well... actually the caption to Figure 1 says there are 2: horizontal and vertical, which is confusing), and I had to deduce from the chance level (1/3) that there must have been 3 orientations. Also, given how important the orthogonality of the carrier and modulator are, it would be good to have this explicit (I would even want an analysis showing that indeed the two are independent). A final example is the use of beta weights, and for delay period decoding only the last 6s (of the 12s delay) are modeled and used for decoding.

      We thank the reviewer for identifying aspects of the manuscript that were confusing. We made several changes to the paper to clarify these details.

      First, we added the information about the orientations we used in the caption for Figure 1 and made it clear that Figure 1C is just an illustration using vertical/horizontal orientations. Second, the carrier and the modulator are different in many ways. For example, the carrier is a grating with orientation and contrast information, while the modulator is the aperture that bounds the grating without these features. Their phases are orthogonal, and we added this in the second paragraph of the “Stimuli” section. Last, in the main text and the captions, we now denote “late delay” when writing about our procedures.

      (2) Right under Figure 3, the text reads "angular modulated gratings produced line-like representations that were orthogonal carrier orientation reflecting the influence of stimulus vignetting", but the quantification (Figure 3D) does not support this (there is no orthogonal "bump" in the filtered responses from V1-V3, and one aligned with the carrier orientation in higher areas).

      This point was addressed in the “recommendations for the authors (Reviewer 1), point 2” above.

      Minor corrections to text and figures:

      (1) Abstract: "are WM codes" should probably be "WM codes are".

      We prefer to keep “are WM codes” as it is grammatically correct.

      (2) Introduction: Second sentence 2nd paragraph: representations can be used to decode representations? Or rather voxel patterns can be used...

      Changed to “On the one hand, WM representations can be decoded from the activity patterns as early as primary visual cortex (V1)...”

      (3) Same paragraph: might be good to add more references to support the correlation between V1 decoding and behavior. There's an Ester paper, and Iamchinina et al. 2021. These are not trial-wise, but trial-wise can also be driven by fluctuating arousal effects, so across-subject correlations help fortify this point.

      We added these two papers as references.

      (4) Last paragraph: "are WM codes" should probably be "WM codes are".

      See (1) above.

      (5) Figure 1B & 2A caption: "stimulus presenting epoch" should probably be "stimulus presentation epoch".

      Changed to “stimulus epoch”.

      (6) Figure 1C: So this is very unclear, to say stimuli are created using vertical and horizontal gratings (when none of the stimuli used in the experiment are either).

      We solved and answered this point in response to Reviewer 3, point 2.

      (7) Figure 2B caption "cross" should probably be "across".

      We believe “cross” is fine since cross here means cross-decoding.

      (8) Figure 3A and C are missing a color bar, so it's unclear how these images are generated (are they scaled, or not) and what the BOLD values are in each pixel.

      All values in the map were scaled to be within -1 to 1. We added the color bar in both Figure 3 and Figure 4.

      (9) Figure 3B and D (bottom row) are missing individual subject data.

      We use SEM to indicate the variance across subjects.

      (10) Figure D caption: "early (V1 and V2)" should probably be "early areas (V1 and V2)".

      Corrected.

      (11) Methods, stimuli says "We generated 180 orientations for the carrier grating to cover the whole orientation space." But it looks like only 3 orientations were generated, so this is confusing.

      We solved and answered this point in response to Reviewer 3, point 2.

      (12) Further down (fMRI task) "random jitters" is probably "random jitter"

      Corrected.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      The manuscript by Sayeed et al. uses a comprehensive series of multi-omics approaches to demonstrate that late-stage human cytomegalovirus (HCMV) infection leads to a marked disruption of TEAD1 activity, a concomitant loss of TEAD1-DNA interactions, and extensive chromatin remodeling. The data are thoroughly presented and provide evidence for the role of TEAD1 in the cellular response to HCMV infection.

      However, a key question remains unresolved: is the observed disruption of TEAD1 activity a direct consequence of HCMV infection, or could it be secondary to the broader innate antiviral response? In this respect, the study would benefit from experiments that assess the effect of TEAD1 overexpression or knockdown/deletion on HCMV replication dynamics. Such functional assays could help delineate whether TEAD1 perturbation directly influences viral replication or is part of a downstream/indirect cellular response, providing deeper mechanistic insights.

      To examine the effect of TEAD1 on HCMV, we performed an experiment in primary human foreskin fibroblasts (HFF) which were stably transduced with constitutive TEAD1. To constitutively express TEAD1, we cloned the open reading frame of TEAD1 into pLenti-puro (Plasmid #39481 from Addgene). We selected for transduced cells using puromycin. For these experiments, we first assessed two multiplicities of infection (MOI): 1 and 10 (Reviewer Response Figure 1). Based on the TEAD1 expression in these cells relative to non-transduced HFF cells, we performed HCMV infection experiments in cells transduced with TEAD1 lentivirus at an MOI of 1.

      For infections, we used a version of HCMV in which the C terminus of the capsid-associated tegument protein pUL32 (pp150) is tagged by enhanced green fluorescent protein (GFP) (PMID: 15708994). This experimental design allowed us to assess the impact of constitutive TEAD1 expression on HCMV infection. GFP and immediate early protein expression levels were measured 48 hours after infection by flow cytometry.

      After infecting parent cells (no constitutive TEAD1) and TEAD1 constitutively expressing cells with a GFP-positive HCMV at MOIs of 0.3 and 1, we identified equivalent GFP expression in the two conditions, indicating equivalent levels of HCMV infection 48 hours after initial infection (Reviewer Response Figure 1A). We also identified equivalent immediate early protein expression at 48 hours after infection, as measured both by percent positivity (Reviewer Response Figure 1B) and mean fluorescence intensity (Reviewer Response Figure 1C). At 96 hours with an MOI of 3, constitutive expression of TEAD1 led to a slight reduction in the expression of the HCMV proteins pp65 (encoded by UL83) and UL44 at 72 and 96 hours post initial infection (Reviewer Response Figure 1D). These results suggest that TEAD1 expression has minimal effects, if any, on the expression of these two late HCMV proteins in fibroblasts. Regulation of particular HCMV genes by TEAD1 is likely to be central for HCMV replication and reactivation in other specialized cell types relevant to viral pathogenesis and disease. However, definitive studies are beyond the scope of the current study.

      Author response image 1.

      Constitutive TEAD1 expression reduces expression of two HCMV late genes at 72 and 96 hours after infection. A-C. Primary human foreskin fibroblasts with and without constitutive TEAD1 expression were infected with pp150-GFP HCMV at a multiplicity of infection (MOI) of 0.3 or 1 and assessed 48 hours post infection. A. HCMV positive cells were quantified by measuring the percent of cells that were GFP positive. B. The percentages of immediate early (IE1/IE2) positive cells were quantified by flow cytometry. C. The mean fluorescence intensity of immediate early positive cells was quantified by flow cytometry. D. Primary human foreskin fibroblasts with and without constitutive TEAD1 expression were infected with pp150-GFP HCMV at an MOI of 1 and assessed by Western blot at various time points post infection. UL44 and pp65 are expressed late in the cascade of HCMV gene expression. TEAD1 expression levels and uncropped Westerns are provided in Supplemental Figure S8.

      Reviewer Response Methods:

      Flow cytometric analysis of viral entry and spread using GFP expression and HCMV immediate early (IE) protein staining

      Parental and TEAD1 transduced human foreskin fibroblasts were seeded into 12-well plates at 1.0 × 10⁵ cells per well and either mock infected or infected with pp150-GFP HCMV (PMID: 15708994) at MOIs of 0.3 or 1 on the same day. Cells were trypsinized at appropriate time points and then neutralized with complete medium. Cell suspensions were spun down at 500g for 5 minutes, and the cell pellet was fixed in 70% ethanol for 30 minutes. Following fixation, cells were permeabilized in phosphate-buffered saline (PBS) containing 0.5% bovine serum albumin (BSA) and 0.5% Tween 20 for 10 minutes at 4°C, pelleted, and then stained with IE1/IE2 antibody (mAb810-Alexa Fluor 488) diluted in PBS supplemented with 0.5% BSA for 2 hours. Cells were washed with PBS supplemented with 0.5% BSA–0.5% Tween 20 and then resuspended in PBS. Cells were analyzed using a flow cytometer (BD Biosciences). Infected cells were also trypsinized at appropriate time points, neutralized in the appropriate media, and directly analyzed for GFP positivity on the flow cytometer.

      Western blot analyses of HCMV protein expression in infected cells with and without constitutive TEAD1 expression

      TEAD1 transduced and parental human foreskin fibroblasts were seeded into 6-well cell culture plates at a density of 3.0 × 10⁵ cells per well and either mock infected or infected with pp150-GFP HCMV (PMID: 15708994) at an MOI of 1. Whole-cell lysates were collected at various time points post-infection, separated by SDS-PAGE, and transferred to nitrocellulose for Western blot analysis. Western blots were probed with the following primary antibodies: anti-IE1/IE2 (Chemicon), anti-UL44 (kind gift of John Shanley), anti-pp65 (Virusys Corporation), and cellular β-actin antibody (Bethyl Laboratories). Next, each blot was incubated with appropriate horseradish peroxidase-conjugated anti-rabbit or anti-mouse IgG secondary antibodies. Chemiluminescence was detected and quantified using a C-DiGit blot scanner from Li-Cor.

      Reviewer #2 (Public review):

      Summary:

      This work uses genomic and biochemical approaches for HCMV infection in human fibroblasts and retinal epithelial cell lines, followed by comparisons and some validations using strategies such as immunoblots. Based on these analyses, they propose several mechanisms that could contribute to the HCMV-induced diseases, including closing of TEAD1-occupying domains and reduced TEAD1 transcript and protein levels, decreased YAP1 and phospho-YAP1 levels, and exclusion of TEAD1 exon 6.

      Strengths:

      The genomics experiments were done in duplicate and the data analyses show good technical reproducibility. Data analyses are performed to show changes at the transcript and chromatin levels, followed by some Western blot validations.

      Weaknesses:

      This work, at the current stage, is quite correlative since no functional studies are done to show any causal links. For readers who are outside the field, some clarifications of the system and design need to be stated.

      Reviewer #2 (Recommendations for the authors):

      Here are some specific questions:

      (1) Since all current analyses are correlative, it is difficult to know which changes are of biological significance. For example, experiments manipulating TEAD transcription factor or YAP with effects on how cells respond to HCMV infection would significantly strengthen the conclusions, which are largely speculations now.

      Please see response to Reviewer 1, which highlights newly added functional assays that include the constitutive (forced) expression of TEAD1, as suggested.

      (2) How closely do these cell lines (human fibroblasts and retinal epithelial cell lines) resemble the cells that are actually infected in patients and lead to symptoms?

      In infected cells in patients, HCMV initially infects both fibroblasts and epithelial cells. HCMV penetrates fibroblasts by fusion at the cell surface but is endocytosed into epithelial cells (PMID: 18077432). Thus, most experimental studies of HCMV in vitro use primary human foreskin fibroblasts and a retinal epithelial cell line, as we do in this study.

      Additional information on primary human fibroblasts as a model of HCMV infection in humans

      There is a nice review article that provides the history of the study of the molecular biology of HCMV that describes how Stanley Plotkin from the Wistar Institute first identified human fibroblast HCMV infected cells (PMID: 24639214). The primary fibroblasts of the foreskin of neonates are available commercially (sometimes called HS68) and model neonatal HCMV infection. Neonatal HCMV, or Congenital Cytomegalovirus, is a leading cause of congenital infection and a significant cause of non-genetic hearing loss in the US (https://www.cdc.gov/cytomegalovirus/congenital-infection/index.html). While many infected newborns appear healthy at birth, a substantial percentage experience long-term health problems, including hearing loss, developmental delays, and vision problems (PMID: 39070527). 

      More information on ARPE-19 as a model of HCMV infection in humans

      HCMV retinitis is a leading cause of vision loss and results from HCMV infection of retinal cells. Retinal epithelial cells are the primary target for HCMV infection in the eye. The cell line ARPE-19 is derived from a primary human adult retinal pigment epithelium explant, is commonly used to study HCMV, and is thought to be physiologically relevant to the human infection (PMID: 8558129 and 28356702). When compared to primary retinal pigment epithelia, ARPE-19 cells develop a similar cellular and molecular phenotype to primary cells from adults and neonates (PMID: 28356702).

      (3) What is the rationale for using 48 hours' infection? Is this the typical timeframe for patients to develop symptoms?

      HCMV genes are expressed in a temporally controlled manner (PMID: 35417700). Early genes (within the first 4 hours) are involved in regulating transcription, while genes within 4-48 hours are involved in DNA replication and further transcriptional regulation. The 48 hour mark corresponds to the onset of significant viral replication and interactions between the virus and the host immune response. After 48 hours, late genes are expressed, which encode structural proteins as well as viral proteins that inhibit host anti-viral responses.  Most studies that focus on the role of HCMV’s early and immediate early genes are performed at 24 or 48 hours. Similarly, most studies that assess the initial innate immune response to HCMV are performed within the initial 48 hours after in vitro infection.

      In most people with healthy immune systems, there are no symptoms (PMID: 34168328). While 60% of people in developed countries and 90% of those in developing countries are serologically positive for past infection, it is challenging to study the kinetics of symptom development due to heterogeneity in the initial virion exposure, the cell types that are initially infected, and immune response. HCMV persists throughout the lifetime of the infected individual by establishing latent infection.

      Also, among all these large-scale global changes, what are primary and what are secondary?

      A kinetic study with many timepoints would be needed to identify the primary and secondary genomic changes associated with HCMV infection. These experiments, while exciting, are beyond the scope of this manuscript.

      (4) Fig.2: In addition to the changes for each cell type, comparison of unchanged, closed and opened with infection regions between the two cell types could be informative for commonalities and differences between cell types.

      This was a good suggestion. We have added a new Supplemental Figure S2, which compares the differentially accessible regions between the two cell types.

      We have also added the following sentence to the Results section:

      “Comparison of differentially accessible chromatin between ARPE and HFF revealed that the vast majority of the HCMV-induced changes are specific to one of the two cell types (Supplemental Figure S2).”

      (5) "Of the 23,018 loops present in both infected and uninfected cells, only 10 are differential at a 2-fold cutoff and a false discovery rate (FDR) <0.01."

      We thank the reviewer for drawing our attention to the differential chromatin looping analysis.  Your comment prompted us to re-examine the methodologies we employed to identify differential chromatin looping events between uninfected and infected cells.  In the process, we realized that the relatively low resolution of chromatin looping assays such as HiChIP might require additional care in classifying a particular loop as shared or differential when comparing two experimental conditions. We have thus revamped our differential chromatin looping methodologies by adding 5kb “pads” to either end of each chromatin loop “anchor”.
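
      The padded-anchor comparison described above can be sketched as follows. This is a minimal illustration and not the authors' pipeline: the `Loop` record, the half-open coordinate convention, the choice to pad both loop sets, and the helper names are all assumptions made for the example. Only the 5 kb pad size comes from the response.

```python
from typing import NamedTuple

PAD_BP = 5_000  # 5 kb pad on either end of each anchor, as stated in the response


class Loop(NamedTuple):
    """Hypothetical loop record: two anchors with half-open [start, end) coordinates."""
    chrom1: str
    start1: int
    end1: int
    chrom2: str
    start2: int
    end2: int


def padded(start: int, end: int, pad_bp: int) -> tuple[int, int]:
    """Extend an interval by pad_bp on both sides, clamping at position 0."""
    return max(0, start - pad_bp), end + pad_bp


def anchors_overlap(chrom_a, start_a, end_a, chrom_b, start_b, end_b, pad_bp):
    """True if the two padded anchors are on the same chromosome and overlap."""
    if chrom_a != chrom_b:
        return False
    a0, a1 = padded(start_a, end_a, pad_bp)
    b0, b1 = padded(start_b, end_b, pad_bp)
    return a0 < b1 and b0 < a1


def same_loop(a: Loop, b: Loop, pad_bp: int = PAD_BP) -> bool:
    """Two loops count as the same event only if BOTH padded anchors overlap."""
    left = anchors_overlap(a.chrom1, a.start1, a.end1, b.chrom1, b.start1, b.end1, pad_bp)
    right = anchors_overlap(a.chrom2, a.start2, a.end2, b.chrom2, b.start2, b.end2, pad_bp)
    return left and right


def classify(uninfected: list[Loop], infected: list[Loop], pad_bp: int = PAD_BP) -> dict:
    """Split loops into shared, lost (uninfected only) and gained (infected only).

    Quadratic in the number of loops; a real analysis over >100,000 loops
    would need an interval index instead of this nested scan.
    """
    shared = [u for u in uninfected if any(same_loop(u, i, pad_bp) for i in infected)]
    lost = [u for u in uninfected if not any(same_loop(u, i, pad_bp) for i in infected)]
    gained = [i for i in infected if not any(same_loop(u, i, pad_bp) for u in uninfected)]
    return {"shared": shared, "lost": lost, "gained": gained}


# Made-up coordinates: u1 and i1 are offset by a few kb and only match once padded.
u1 = Loop("chr1", 100_000, 105_000, "chr1", 500_000, 505_000)
u2 = Loop("chr2", 1_000_000, 1_005_000, "chr2", 2_000_000, 2_005_000)
i1 = Loop("chr1", 108_000, 113_000, "chr1", 497_000, 502_000)
i2 = Loop("chr3", 300_000, 305_000, "chr3", 900_000, 905_000)
result = classify([u1, u2], [i1, i2])
```

      In this toy example the slightly shifted pair `u1`/`i1` is treated as one shared loop rather than as one lost and one gained loop, which is the kind of misclassification the padding is meant to avoid at HiChIP resolution.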

      The corresponding passage now reads:

      “We next used the HiChIP data to identify HCMV-dependent differential chromatin looping events (see Methods). In total, uninfected cells have 143,882 loops. With HCMV infection, 90,198 of these loops are lost, and 44,045 new loops are gained (Supplemental Dataset 3). Because the number of altered loops was large, we repeated loop calling and differential analysis with FDR values less than 0.05, 0.01, and 0.001 (Supplemental Dataset 3). For all three cutoffs, the percentage of loops specific to an infection state was very similar. We also randomly downsampled the number of input pairs used for calling loops to verify that our results were not due to a difference in read depth (Supplemental Dataset 3). For the three smaller subsets of data, the number of loops specific to an infection state only changed slightly. The full quantification of each chromatin looping event and comparisons of events between conditions are provided in Supplemental Dataset 6.”
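
      The loop counts quoted above imply the following totals. This is a short arithmetic sketch that assumes every "lost" loop is one of the 143,882 uninfected loops and every "gained" loop is absent from uninfected cells. The retained and infected totals are derived here and are not numbers stated in the passage.

```python
# Counts taken from the quoted passage
uninfected_total = 143_882
lost = 90_198      # present in uninfected cells, absent after infection
gained = 44_045    # absent in uninfected cells, present after infection

# Derived quantities (assumptions as described in the lead-in)
retained = uninfected_total - lost    # loops present in both conditions
infected_total = retained + gained    # loops present in infected cells

frac_lost = lost / uninfected_total       # share of uninfected loops that are lost
frac_gained = gained / infected_total     # share of infected loops that are new

print(f"retained: {retained}, infected total: {infected_total}")
print(f"lost: {frac_lost:.1%} of uninfected loops")
print(f"gained: {frac_gained:.1%} of infected loops")
```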

      Are these cells asynchronous and how to determine whether certain changes are not due to cell cycle stage differences?

      Cells were plated to an identical density of cells per well before either mock or HCMV infection for this study. Based on the differentially expressed genes, cell cycle pathways were not amongst the top 50 enriched molecular pathways.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      […] Weaknesses:

      While there are no glaring weaknesses in this study, it should be noted that a great deal of literature has pinpointed the MPOA (and specifically inhibitory cells in this area) as being critical to sexual behavior, including female mating. However, no study to my knowledge has explored self-paced female mating with such fine control over manipulating and monitoring cellular activity in this region. In addition, this study may act to inspire others to further explore the additional brain regions found to show upregulation of neural activity (Fos) during mating completion in the female using the data sets generated here.

      Reviewer #2 (Public Review):

      […] Weaknesses:

      The authors include an elegant manipulation of ejaculation-activated neurons in the MPOA using DREADD. However, this study was limited to show that activation of previously activated cells was sufficient to reduce approach behavior in a paced mating paradigm and receiving intromissions in a home cage mating paradigm. An inhibition approach using DREADD would have been a great complement to this study as it would have examined if activation of the cells was required. Moreover, additional tests for sexual motivation would have greatly strengthened the overall conclusions.

      Reviewer #3 (Public Review):

      […] Weaknesses:

      (1) Their activity-dependent labeling strategy is not exclusive to mating completion but instead includes all neurons active before, during, and after the social encounter. In the manuscript, the authors did not discuss the time course of Fos activation or the timeframe of the FosTRAP labeling strategy. Fos continues to be expressed and is detectable for hours following neural activation. Therefore, the FosTRAP strategy also labels neurons that were activated 3 hours before the injection of 4-OHT. The original FosTRAP2 paper which is cited in this manuscript (DeNardo et al, 2019) performed a detailed analysis of the labeling window in Supplementary Figure 2 of that paper. Here is quoted text from that paper: "Resultant patterns of tdTomato expression revealed that the majority of TRAPing occurred within a 6-hour window centered around the 4-OHT injection." Thus, the FosTRAP "mating completion" groups throughout this manuscript also include neurons activated 3 hours before mating completion, which includes neurons activated during appetitive and consummatory mating behaviors.

      This makes all of the FosTRAP data very difficult to interpret. Compounding this is the issue that the two groups the authors compare in their experiments are females administered 4-OHT following appetitive investigation behaviors (with the male removed before mating behaviors occurred) and females administered 4-OHT following mating completion. The "appetitive" group labeled neurons activated only during appetitive investigation, but the "completion" group labeled neurons activated during appetitive investigations, consummatory mating bouts, and mating completion. Therefore, in the brain-wide analysis of Figure 2, it is impossible to identify brain regions that were activated exclusively by mating completion and not by consummatory mating behaviors. This could have been achieved if the "completion" group was compared to a group of females that had commenced consummatory mating behaviors but were separated from the male before mating was completed. Then, any neurons labeled by the "completion" FosTRAP but not the "consummatory" FosTRAP would be neurons specifically activated by mating completion. In the current brain-wide analysis experiments, neurons activated by consummatory behaviors and mating completion can not be disassociated.

      This same issue is present in the interpretation of the chemogenetic activation data in Figure 6. In the experiments of Figure 6, the authors are activating neurons naturally activated during consummatory mating behaviors as well as those activated during mating completion.
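
      The labeling-window argument in this point can be made concrete with a small sketch. It treats the 6-hour window quoted from DeNardo et al. as a hard cutoff centered on the 4-OHT injection, which is a simplification (the quote says only that the majority of TRAPing occurs within it). The event times are hypothetical and are not taken from the manuscript.

```python
WINDOW_HOURS = 6.0  # total window width, from the DeNardo et al. quote


def in_trap_window(event_time_h: float, injection_time_h: float,
                   window_h: float = WINDOW_HOURS) -> bool:
    """True if an event falls within +/- window_h / 2 of the 4-OHT injection."""
    return abs(event_time_h - injection_time_h) <= window_h / 2


# Hypothetical event times in hours relative to the 4-OHT injection (t = 0)
events = {
    "appetitive investigation": -0.5,
    "consummatory mating": -0.25,
    "mating completion": 0.0,
}

labeled = [name for name, t in events.items() if in_trap_window(t, 0.0)]
print(labeled)
```

      Under these assumptions all three behavioral phases fall inside the window, so a "completion" cohort would also capture neurons active during the appetitive and consummatory phases.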

      We appreciate the reviewer's comments and concerns about the TRAP method.

      First, we agree that the FosTRAP method does not have the sensitivity to separate ensembles that are activated within a short time window. In our preliminary results, we observed that injecting 4-OHT after mating completion induced more tdTomato-labeled cells in the MPN than injection after appetitive behavior or consummatory behavior (Author response image 1).

      To further compare the “consummatory” and “completion” ensembles, we included an additional cohort in which we TRAPed cells responding to consummatory behavior. This cohort was added to Figures 2, 6, S3, S4, S9, S10 and S11. From the whole-brain mapping of TRAP cells, we found that many hypothalamic and extended amygdala areas, including the medial preoptic area and the bed nucleus of the stria terminalis, had significantly higher tdTomato+ cell density in the completion group than in the appetitive group, while the consummatory group also tended to have higher cell density than the appetitive group. In the Gq-DREADD experiment, we found that the Completion-hM3Dq group, but not the Consummatory-hM3Dq group, showed reduced sexual motivation in the self-paced mating assay (Figure 6). The Completion-hM3Dq group, but not the Consummatory-hM3Dq group, also showed significantly fewer intromission events and tended to show lower receptivity in the home cage mating assay (Figure S10). Furthermore, post-hoc histological analysis showed that the population of c-Fos+ and TRAP-labeled cells in the MPN tended to be larger in the Completion-hM3Dq group than in the Consummatory-hM3Dq group (Figure S9). These results, together with the in vivo calcium imaging experiments in Figures 3, 4 and 5, suggest that the MPN contains male-ejaculation-responsive cells that are distinct from the male-mounting-responsive cells and that they are sufficient to suppress female sexual motivation.

      However, it is true that with the current state of mouse genetic tools, we do not have any methods with higher temporal resolution. We have discussed the limitations of the FosTRAP method regarding its low temporal resolution in the Discussion section.

      Author response image 1.

      Representative image showing TRAP labeling in the MPN after mating completion and intromission

      (2) This study does not definitively show that the female mice used in this study display decreased sexual motivation after the completion of mating. The females exhibit reduced interaction with males that had also just completed mating, but it is unclear if the females would continue to show reduced interaction time if given the choice to interact with a male that was not in the post-ejaculatory refractory period. Perhaps, these females have a natural preference to interact more with sexually motivated males compared to recently mated (not sexually motivated) males. To definitively show that these females exhibit decreased sexual motivation the authors should perform two control experiments: 1) provide the females with access to a fully sexually motivated male after the females have completed mating with a different male to see if interaction time changes, and 2) compare interaction time toward mated and non-mated males using the self-paced mating assay. These controls would show that the reduction in the interaction time is because the females have reduced sexual motivation and not because these females just naturally interact with sexually motivated males more than males in the post-ejaculatory refractory period.

      We highly appreciate the reviewer's comments regarding the interpretation of the self-paced mating assay. To address these concerns, we added an experiment in which the female subjects were introduced to a novel, sexually motivated male mouse in the self-paced mating assay immediately after receiving ejaculation (Figure S2). As a result, we found that, similar to the self-paced mating assay using the same male animal, the female subjects spent significantly more time in the isolation zone on the post-ejaculation day than on the pre-ejaculation day.

      (3) It is unclear how the transient 90-second response of these MPOA neurons following the completion of mating causes the prolonged reduction in female sexual motivation that is at the minutes to hours timeframe. No molecular or cellular mechanism is discussed.

      (4) The authors discuss potential cell types and neural population markers within the MPOA and go into some detail in Figure S3. However, their experiments are performed with only the larger excitatory and inhibitory MPOA neural populations.

      While the molecular or cellular mechanism of prolonged activity of MPOA neurons is critical to understanding how sustained neural activity in the MPOA suppresses female sexual motivation, it is beyond the scope of the current manuscript and a subject for future studies. We have added a section to the Discussion to further discuss the potential molecular mechanisms.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      If the authors haven't already, it would be useful if the authors could make the brain-wide analysis of Fos activity publicly available.

      We have deposited the data at https://dandiarchive.org/.

      I would also make sure the n's are included in each Figure Legend for each panel (some are missing in the Supplementals).

      We appreciate the comment. We have added the number of subjects to Figures 3, 4 and 5.

      It would also be best to provide clearer labels to some of the Figures, for example, Figure 5D, the Types should also be labeled with what behaviors they correspond to.

      We appreciate the comment. Figure 5 is focused on post-ejaculation neural activity. The cell types are categorized based on neural activity after experiencing male ejaculation and do not correspond to any behaviors.

      Reviewer #2 (Recommendations For The Authors):

      (1) A first recommendation is to replace the use of the term "mating completion" with "ejaculation". Male and female rodents display a period of reduced approach behavior following display or experiencing ejaculation, which is referred to as the post-ejaculatory interval. The current studies investigate the neural ensemble that contributes to this post-ejaculatory interval in female mice. In addition, male and female rodents will display a prolonged period of sexual inactivity referred to as satiety, which is typically observed after repeated display or experience of ejaculations. The current studies do not investigate satiety. Moreover, in the current studies, female mice appeared to display approach behavior (time in the interaction zone) even within the 10 minutes following experiencing ejaculation (Fig 1F). Hence, the term "completion" is not accurate and should be replaced by "ejaculation" in all figures and throughout the manuscript. Replacing completion with ejaculation will also clarify what defines "onset of completion", which this reviewer assumes refers to the onset of ejaculatory behavior observed in the male.

      Thank you for the comment. We agree that the term “mating completion” was inappropriate. We have changed the wording to “ejaculation” or “post-ejaculatory period”.

      (2) Likewise, a variety of other terms and descriptions need to be adjusted for consistency and accuracy. For example, "room" when referring to the interaction or isolation zones; "onset of mating completion" when referring to ejaculation; "male intruder" to refer to the introduction of the male mating partner, but using a term typically used for an intruder-resident aggression test. Replacing these terms will aid in reducing confusion for the reader and more accurately describe the behavioral parameters.

      We appreciate the comment. We have updated the terms “male intruder” to “partner”, “room” to “area” or “zone”.

      (3) The use of the paced mating paradigm is a strength of these studies. This paradigm has been widely used and validated to study female sexual behavior in rodents. Please refer to recent reviews and landmark papers using this paradigm in addition to the current cited papers to better reflect the vast wealth of studies that previously reported the behavioral data that were replicated in this study.

      We have added a section discussing the self-paced mating assay and its merits and caveats (P8).

      (4) In the paced mating test, females can pace the receipt of sexual stimulation, and latencies to withdraw and return to the male-containing chamber are considered indicators of sexual motivation. Female withdrawal will increase with the intensity of the sexual stimulation and latency to return is longer following ejaculation. Paced mating is thus a balance of approach and withdrawal behaviors that increases reward and likelihood of pregnancy for females. Moreover, ejaculation-induced withdrawal and longer latencies to return and approach are altered by hormonal status and by the introduction of a novel male partner. Thus, female sexual behavior is complex and withdrawal behavior (in this paper measured as time spent in an isolation zone) needs to be interpreted with caution and not simply referred to as sexual motivation. I recommend expanding the description of the paradigm to highlight the strengths and limitations of this paradigm and use caution to interpret time spent in the isolation zone as a lack of sexual motivation. I also recommend referring to the period after ejaculation as the post-ejaculatory interval (instead of completion).

      Thank you for the comment. We have changed the wording in the manuscript to adjust the way it refers to sexual motivation.

      (5) In the current paper, time in the isolation zone and the number of transitions are used as the behavioral measures. Latencies, which are typically included in paced mating studies, were missing from the data. If data are available for latencies to withdraw and return to the interaction zone after mount, intromission, and ejaculation, please add these data. If such data were not collected or are not available, please recognize this caveat.

      Thank you for the comment. For Figure 1, in which all animals experienced male ejaculation, we added a latency analysis (Figure 1I and 1P). The result indicates that, as suggested in the literature, female mice took significantly longer to return to the interaction zone after male ejaculation.

      (6) The brain-wide mapping study of cFos expression after ejaculation confirms and extends prior findings, mostly in rats. Please reference prior papers in female rodents showing cFos after ejaculation and discuss how the current data replicate or differ from prior data.

      In the manuscript P8 L351, we have referred to Pfaus et al., 1993 to discuss the similarity in the c-Fos expression pattern studied in rats. We have further added descriptions to emphasize the similarity between the two datasets.

      (7) A paragraph describing the specific cell types that are activated in the MPOA is an essential part of the study and is described in detail, but only shown in supplementary figures. Given the emphasis on this particular part of the study, a recommendation is to incorporate these data as a regular figure instead of supplementary material.

      While we greatly appreciate the comment, we consider that the molecular characterization of MPOA neurons is not the main focus of the paper and decided to keep it in the supplementary figures.

      (8) Calcium imaging studies were performed in the home cage for obvious practical reasons. However, in the home cage testing, the females withdraw from the males using a different approach and do not exit an interaction zone through a division. There may also be differences in the male sexual behavior patterns and thus the stimulation that females receive from the male. Yet, it appears that ejaculation induces similar patterns of neural activation in this paradigm. Thus, it is likely that neuron activation is a result of receiving ejaculation, rather than withdraw behavior. Please briefly discuss the comparisons between the cFos and calcium imaging conclusions in these two different paradigms.

      We have added a section discussing the self-paced mating assay and its merits and caveats (P8). Withdrawal, latency, and their interpretation are discussed in this section.

      (9) The final study includes the manipulation of ejaculation-activated neurons in the MPOA using DREADD. This study was limited to show that activation of previously activated cells was sufficient to reduce approach behavior in a paced mating paradigm and receiving intromissions in a home cage mating paradigm. An inhibition approach using DREADD would have been a great complement to this study as it would have shown if activation of the cells was required. Moreover, additional tests for sexual motivation, such as partner preference tests would have greatly strengthened the results since a lack of entering an interaction zone can also be explained by impaired sensory processing or locomotor behavior. Finally, CNO also appeared to impact time in the isolation zone for a subset of animals in the ejaculation (completion) control group and the appetitive group. These effects didn't reach statistical significance, but groups also had low sample sizes (n=6-7) and may thus have been underpowered. The recommendation is to include these caveats and shortcomings in the discussion of these results.

      We appreciate the comments. We first added an inhibitory approach to test the necessity of MPOA neurons. As a result, we found that the inhibition of these neurons did not affect behavior in the self-paced mating assay but increased the subjects' sexual receptivity (Figure S11). Regarding the low sample size, we have added a power analysis to the statistical section.

      (10) The studies utilized ovariectomized females with hormone priming. Since sexual receptivity in females is highly dependent on the hormonal milieu, the authors are encouraged to add an explanation of why ovariectomized females were used and if the results may have differed in cycling females.

      We appreciate the comments. The female subjects used in the TRAP experiment needed to experience ejaculation from the male mice twice: once to label the cells, and a second time during the reactivation. To avoid pregnancy during the first experience, we ovariectomized the females and controlled their hormonal conditions. This method has been used successfully in other sexual behavior studies (Yang et al., 2013; Ring, 1944). This was described in P11. We have further demonstrated in Figure 1N-T that female mice that were not ovariectomized and were under the natural estrus cycle showed similar suppression of sexual interaction after the completion of mating. The manuscript was updated to discuss that the behavior change after mating completion is not dependent on the ovary.

      (11) Overall, the paper lacks references to relevant prior studies. For example, many studies have been reported over the past 2-3 decades about the effects of female rodent sexual behavior on activation in the brain and the effects of different vaginocervical stimulation on pregnancy and fertility. It is absolutely the case that much remains unknown about the complex neural circuitries that control behavior during the post-ejaculatory interval and sexual satiety in both male and female rodents, but studies have indicated roles for hypothalamic areas, bed nucleus of the stria terminalis, ventral tegmental area, posterior thalamus, and prefrontal cortex. Hence, the current introduction and discussion do not adequately summarize or acknowledge these prior investigations and therefore do not place these new findings in the context of what was previously known.

      We appreciate the comment and have added references at P2 L65 and P8 L355-357 to discuss the existing literature on c-Fos mapping analysis after ejaculation or genital stimulation in female rats.

      (12) Finally, sample sizes appear to be modest, ranging n=4-8 (except n=14 in the completion group in Figure S7) and vary between groups within and between studies. Please explain in the methods section how sample sizes were pre-determined and acknowledge if studies may have potentially been underpowered.

      The sample sizes for the behavioral experiments in this study were n = 6-9. These were predetermined based on previous studies examining female sexual behavior (Ishii et al. 2017, Liu et al. 2022, Yin et al. 2022). To further examine the number of animals required for our behavioral experiments, we pooled data from this study (n = 111 pooled data points; control n = 94, stim n = 17) and conducted a power analysis using the variance calculated from the pooled average time in the isolation zone. The control data were pooled from control animals in each experiment (e.g., animals injected with GFP control virus, animals injected with saline, etc.). The average time in the isolation zone was 420 ± 210 seconds after ejaculation or after reactivating the completion cells, and 49 ± 91 seconds in the control group (mean ± s.d.). Within this population, we found that 5 animals were sufficient to detect the difference (p < 0.05, power = 0.8) with Student's t-test. We have added this explanation to the supplemental experimental procedures (P18, lines 817-827).
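
      For readers who want to check the order of magnitude, a sample-size estimate of this kind can be sketched with the standard normal-approximation formula for a two-sided, two-sample comparison. This is not the authors' calculation: the equal-weight pooling of the two standard deviations and the use of a normal rather than a t distribution are assumptions, and the normal approximation tends to give a somewhat smaller n than an exact t-based calculation at small sample sizes.

```python
from math import ceil, sqrt
from statistics import NormalDist


def n_per_group(mean1: float, sd1: float, mean2: float, sd2: float,
                alpha: float = 0.05, power: float = 0.8) -> int:
    """Approximate animals per group for a two-sided two-sample comparison.

    Uses n = 2 * ((z_{1-alpha/2} + z_{power}) / d)^2, where d is the
    standardized effect size with an equal-weight pooled s.d. (assumption).
    """
    pooled_sd = sqrt((sd1 ** 2 + sd2 ** 2) / 2)
    d = abs(mean1 - mean2) / pooled_sd
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return ceil(2 * ((z_alpha + z_power) / d) ** 2)


# Means and s.d. (seconds in the isolation zone) as reported in the response
n = n_per_group(mean1=420, sd1=210, mean2=49, sd2=91)
print(n)
```

      With the reported means and standard deviations the standardized effect size is large (above 2), so the approximation returns a small n per group that does not exceed the 5 animals reported in the response.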

      Reviewer #3 (Recommendations For The Authors):

      The authors should discuss the fact that the FosTRAP2 strategy labels neurons activated 3 hours before the 4-OHT injection. As the manuscript is written, it seems to suggest that the 4-OHT injection given following mating completion only labeled neurons activated during mating completion. This is very misleading. I respect the amount of work and rigor that went into these experiments. The single-cell imaging, implementation of the FosTRAP strategy, and behavioral analysis are all well executed. Novel insights into the neural regulation of female sexual drive can be gleaned from the neural imaging experiments. Unfortunately, the limitations of the FosTRAP strategy make those studies very difficult to interpret, and therefore, a more candid discussion and re-interpretation of the data from the FosTRAP experiments is needed.

      We appreciate the reviewer's comments and concerns about the TRAP method.

      First, we agree that the FosTRAP method does not have the sensitivity to separate ensembles that are activated within a short time window. In our preliminary results, we observed that injecting 4-OHT after mating completion induced more tdTomato-labeled cells in the MPN than injection after appetitive behavior or consummatory behavior (Author response image 1).

      To further compare the “consummatory” and “completion” ensembles, we included an additional cohort in which we TRAPed cells responding to consummatory behavior. This cohort was added to Figures 2, 6, S3, S4, S9, S10 and S11. From the whole-brain mapping of TRAP cells, we found that many hypothalamic and extended amygdala areas, including the medial preoptic area and the bed nucleus of the stria terminalis, had significantly higher tdTomato+ cell density in the completion group than in the appetitive group, while the consummatory group also tended to have higher cell density than the appetitive group. In the Gq-DREADD experiment, we found that the Completion-hM3Dq group, but not the Consummatory-hM3Dq group, showed reduced sexual motivation in the self-paced mating assay (Figure 6). The Completion-hM3Dq group, but not the Consummatory-hM3Dq group, also showed significantly fewer intromission events and tended to show lower receptivity in the home cage mating assay (Figure S10). Furthermore, post-hoc histological analysis showed that the population of c-Fos+ and TRAP-labeled cells in the MPN tended to be larger in the Completion-hM3Dq group than in the Consummatory-hM3Dq group (Figure S9). These results, together with the in vivo calcium imaging experiments in Figures 3, 4 and 5, suggest that the MPN contains male-ejaculation-responsive cells that are distinct from the male-mounting-responsive cells and that they are sufficient to suppress female sexual motivation.

      However, it is true that with the current state of mouse genetic tools, we do not have any methods with higher temporal resolution. We have discussed the limitations of the FosTRAP method regarding its low temporal resolution in the Discussion section.

      Editor notes:

      Should you choose to revise your manuscript, please include full statistical reporting in the main text including test statistic, degrees of freedom, an exact P value.

      Thank you for the comment. The statistical values were added to the manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Strengths:

      The authors embarked on an ambitious journey to seek the answer regarding 3D genome changes predisposing to metastatic organotropism. The authors succeeded in the assembly of a comprehensive panel of breast cancer cell lines and the aggregation of the 3D genome structure data to conduct a hypothesis-driven computational analysis. The authors also succeeded in including proper controls representing normal non-cancerous epithelium and the end organ of interest. The authors did well in the citation of relevant references in 3D genome organization and EMT.

      Weaknesses:

      (1) The authors should clearly indicate how they determine the patterns of spread of the breast cancer cell lines being utilized in this manuscript. How did the authors arrive at the conclusion that certain cell lines would be determined as "localized spread" and "metastatic tropism to the lung"? This definition is crucial, and I will explain why.

      It is indeed a critical point to clearly define and explain what qualifies as metastatic potential to particular organs in our system. Here, we intentionally limited our scope to metastasis that had occurred within the human system. Our cell lines are chosen based on their sites of origin and etiological history in the patients from which they were derived. For example, the cancer cell line BT474 was classified as “localized” because these cells were derived from a solid tumor in the breast itself. Meanwhile, MCF7 and T47D cell lines are considered lung metastatic because these cells were collected from the pleural effusion from the lung. We therefore model human organotropism from the breast to the lung by using cells that originated from infiltrative ductal carcinoma (human breast) but were collected from pleural effusions (human lung). We then use as a comparison a human lung cancer-derived cell line that was itself purified from a pleural effusion. In this way, we can compare the genome structure of a lung cancer cell in the lung environment to a breast cancer cell that has metastasized to the lung environment.

      In our revised version, we further clarify this definition in the text as well as in additional annotations in our supplemental table of all cell line information.

      Todd Golub's team from the Broad Institute of MIT and Harvard published "A metastasis map of human cancer cell lines" to exhaustively create a first-generation metastasis map (MetMap) that reveals organ-specific patterns of metastasis. (By the way, this work was not cited in the references of this manuscript.) The MetMap Explorer (https://depmap.org/metmap/vis-app/index.html) is a public resource that could be openly accessed to visualize the metastatic potential of each cell line as determined by the in vivo barcoding approach as described in the MetMap paper in the format of petal plots. Five organs were tested in the MetMap paper, including brain, lung, liver, kidney, and bone. The authors would discover that some of the organ-specific metastasis patterns defined in the MetMap Explorer would be different from the authors' classification. For example, the authors defined MCF7 as a lung metastatic line, and rightly so: the MetMap charted a signal towards lung with low penetrance and low metastatic potential. The authors defined ZR751 as a line with localized spread; however, the MetMap charted a signal towards the kidney with low penetrance and low metastatic potential, the signal strength similar to the lung metastasis in MCF7. A similar argument could be made for T47D. The TNBC line MDA-MB-231 is indeed highly metastatic; however, in MetMap data, its metastasis is not specific to the lung but directed towards all 5 organs with high penetrance and metastatic potential. The authors defined the 2 lung cancer cell lines mentioned in this study, A549 and H460, as localized spread to the lung. However, the MetMap data clearly indicated that A549 and H460 are highly metastatic to all 5 organs with high penetrance and high metastatic potential.

      We acknowledge the valuable contributions of animal models in metastatic cancer studies, but we also want to avoid the potentially confounding variable of the animal microenvironment. The MetMap Explorer contains valuable information (and as part of our clarification on this point, we now cite the MetMap in the text), but the “metastatic potential of each cell line” for this tool is measured in a mouse environment. Knowing that a particular cell line, which originated from a human lung metastasis, can further metastasize to other organs in a mouse does not necessarily mean that those cells could do so in humans. The microenvironment responses to metastatic colonization recapitulate the events in wound repair, and these can differ among species (https://pubmed.ncbi.nlm.nih.gov/28916657/; https://pubmed.ncbi.nlm.nih.gov/39729995/). Further, the changes a cell needs to make to adapt to a new organ system in a mouse could be confounded by the changes needed to adapt to mouse conditions in general. Finally, migration from a site of ectopic injection may not mimic migration from an initial tumor site. These factors lead to well-known cases where MetMap does not reflect the metastatic potential of cancers in humans. As a classic example, prostate cancer frequently metastasizes to bone in humans, and the PC3 cell line was derived from a bone metastatic prostate cancer. However, MetMap shows no evidence of PC3 being able to metastasize to bone in a mouse.

      We agree that the very best data would come from matched primary and metastatic tumors in the same human patient, but those data do not currently exist and generating them would require future work beyond the scope of this study.

      Since results will vary among different experimental models testing metastatic organotropism (intracardiac injection was the metastasis model adopted in the MetMap), the authors should state more clearly which experimental model system served as the basis for their definition of organ-specific metastasis. In my opinion, this is the most crucial first step for this entire study to be sound and solid.

      Taking all the above into account, in our revision, we have now included further clarification in the main text to more clearly explain how and why we chose the cell lines we did and what the advantages and limitations of this choice are.

      (2) Figure 1b: The authors found that "MDA-MB-231 cells were grouped with the lung carcinoma cells. This implies that the genome organization of this cell line is closer to that of lung cells than to other breast epithelial cell lines.". In fact, another TNBC line BT549 was also clustered under the same clade. So this clade consisted of normal-like and highly metastatic lines. Therefore, the authors should be mindful of the fact that the compartment features might not directly link to metastasis (or even metastatic organotropism).

      In figure 1b, the grouping that connects MDA-MB-231 (lung metastatic breast cancer) to A549 and H460 (lung cancer) occurs at a distance of about 0.2. If the clustering tree were cut at a distance of 0.26, six separate clusters would result: two clusters of Luminal subtypes (all labeled red), one that includes all healthy epithelial cells (both lung and breast, all labeled green), one that links two localized breast cancers, one that links MDA-MB-231 to lung carcinoma cell lines, and then BT549 by itself. So, while BT549 appears next to MDA-MB-231 along the horizontal axis, this is just a coincidence of the representation: the dendrogram shows it is quite distant from all the other cell lines in this cluster according to compartment profile.
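
      The relationship between a cut height and the resulting flat clusters can be sketched as follows. This is a minimal illustration, not the analysis code used for the manuscript: the leaf names and merge heights are hypothetical placeholders rather than values read from Figure 1b, and the helper `clusters_at_cut` is introduced here only for illustration.

```python
def clusters_at_cut(leaves, merges, cut_height):
    """Return the flat clusters obtained by cutting a dendrogram at cut_height.

    merges is a list of (member_a, member_b, height) tuples, where any leaf of
    a branch may stand in for that branch. Merges that occur above the cut
    height are ignored, so the branches they would join stay separate.
    """
    parent = {leaf: leaf for leaf in leaves}

    def find(x):
        # Follow parent links to the representative of x's cluster.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, height in merges:
        if height <= cut_height:
            parent[find(a)] = find(b)

    groups = {}
    for leaf in leaves:
        groups.setdefault(find(leaf), []).append(leaf)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical toy dendrogram (names and heights are made up).
toy_leaves = ["cancerA", "lung1", "lung2", "cancerB"]
toy_merges = [
    ("lung1", "lung2", 0.10),
    ("cancerA", "lung1", 0.20),
    ("cancerB", "cancerA", 0.45),
]
flat = clusters_at_cut(toy_leaves, toy_merges, cut_height=0.26)
print(flat)
```

      In this toy tree, a leaf that joins the rest only above the cut height ends up in its own cluster, even if a plotting routine happens to draw it next to another leaf along the horizontal axis.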

      So, it is only MDA-MB-231 that is very closely linked with the lung cancer cell types.

      It is true that the healthy lung cells (HTBE) are clustered separately and are more similar to normal/non-tumorigenic breast epithelial cells (HMEC and MCF10A) than to any cancer cell type. This could suggest that there are aspects of the compartment pattern that represent any healthy epithelium as compared to cancer. What we find in the compartment profile, in both the clustering and the PCA analysis, is that compartment signatures contain information about cell properties on several overlapping levels: there is an aspect of the compartment profile that distinguishes healthy from cancerous cells, an aspect that distinguishes luminal cancers from other subtypes, a part that associates with organotropism, and an aspect that captures EMT status. The final compartment status is a composite of these numerous factors.

      We have clarified the text to indicate that we mean MDA-MB-231 clusters near lung cancer, not necessarily healthy lung cell models.

      (3) Figure 3: In the text, the authors stated, "To further investigate this result, we examined the transcription status of genes that changed compartment across the EMT spectrum and, conversely, the compartment status of genes that changed transcription (Fig. 3b, c, and d)". However, it was not apparent in the figure that the cell lines were arranged according to an EMT spectrum.

      To display these comparisons more clearly, we have now revised figure 3b, c, and d in two ways: First, we have defined the gene and cell line clustering by one set of data (for example, compartment identity in 3b) and then displayed the other data (gene expression) with all genes and cell lines in the same order. Therefore, for each column, genes and cell lines can be compared visually between top and bottom rows. Second, we have colored cell line names from purple to yellow according to their EMT scores as shown in Supplementary Figure 1a. This allows a visual indication of how the clustering separates cell lines by EMT status.

      Also, the clustering heatmaps did not provide sufficient information regarding the genes with concordant/divergent compartments vs transcription changes. It would be more informative if the authors could spend more effort in annotating these genes/pathways.

      We want to clarify that the genes plotted in the heatmaps in Figure 3 are also the genes whose functional enrichment we present in figures 1 and 2. So, the genes that segregate strongly based on A/B compartment (but not gene expression) in figure 3b are the same genes whose GO terms are annotated in Figure 1d. Likewise, the genes that segregate strongly based on gene expression, but not A/B compartment, in figure 3c and d are the same genes whose GO terms are annotated in Figure 2b. We have now made this connection clearer in the text.

      But, we also agree with the reviewer that it is important to explore a bit further the relationship between these divergent sets of genes. Our explorations have led to several observations:

      (1) In some cases, the compartment-segregated genes and the transcription-segregated genes are different members of the same pathways. In Author response image 1 below, for example, we show interactions (according to STRING) for genes from figure 3c that are highly expressed in the epithelial-like cell lines and are annotated as involved in epithelial development (green). We then added to the network genes from figure 3b that are specifically in the A compartment in the epithelial-like cell lines but not mesenchymal cell lines that are also annotated as involved in epithelial development (red). Most of these epithelial development genes that change expression are in the A compartment in all cell lines and therefore do not rely on spatial compartment changes for their regulation. But some additional epithelial development genes, which are interconnected in this same network, are changing compartments across the EMT spectrum. One example, FOXA1, is a key hub in the network and is known to be a pioneer transcription factor involved in development and differentiation. Controlling this gene at the level of spatial genome organization rather than local transcriptional control could be important in the stable cell fate changes that can happen with EMT.

      Author response image 1.

      (2) Overall, the set of genes that change compartments does not have as strong functional enrichment as the transcription change set of genes. This could indicate that some of the compartment changes that occur with EMT are not directly gene regulatory but rather enable an overall conformational change of the chromatin that is needed for the alterations in physical cell state or to accomplish long distance gene regulation changes.

      (3) Related to long distance gene regulation changes, we also see cases in which the gene that changes transcription but not compartment across EMT is adjacent to regions that switch compartments.

      A good example is TFF3 (yellow, Supplementary figure 1C). TFF3 is one of the genes that strongly segregates across EMT by transcription, being more highly expressed in epithelial-like (bottom 4 tracks) than in mesenchymal-like (top 4 tracks) cancers. Despite this differential expression, it is almost always in the A compartment across all cell lines. However, it is adjacent to regions that show strong compartment change EMT signatures. So, even though this specific gene region is not changing compartment, its regulation may be influenced by the entire region being A-associated in epithelial-like cancers while neighboring regions become B-associated in mesenchymal-like cancers.

      TFF3 is expressed in normal breast epithelium and has been implicated as a biomarker for endocrine therapy response in breast cancer.

      Meanwhile, many genes that are in these compartment switching regions (BACE2, DSCAM, PDE9A) are not among the strongest expression signature genes.

      (4) Interestingly, some of the regions (such as the region shown in Supplementary figure 1C) that change compartment across the breast cancer spectrum overlap with regions that we found change compartment in the progression of prostate cancer, as shown in the string.db enrichment analysis below.

      Author response image 2.

      In our revised manuscript, we now include more of these explanations in the text and include the example offset compartment and transcription change region shown above as panel c of Supplementary Figure 1.

      (4) Figure 4: The title of the subheading of this section was "Lung metastatic breast cancer cell lines acquire lung-like genome architecture". Echoing my comments in point 1, I am a bit hesitant to term it as "lung metastatic" but rather "metastatic" in general since cell lines such as MDA-MB-231 do metastasize to other organs as well. However, I do get the point that the definition of "lung metastasis" is derived from the common metastasis features among the cell lines here (MCF7, T47D, SKBR3, MDA-MB-231). There might be another argument about whether the "lung" carcinoma cell lines can be considered "localized" since they are also capable of metastasizing to other organs.

      Rather than classifying cells on metastatic “potential” (as measured in a mouse), our cell lines are chosen based on their sites of origin and etiological history in the patients from which they were derived. Cancer cell lines called “lung metastasis” were collected from the pleural effusion from the human lung. Likewise, we call a cancer “localized” because it was taken from the tissue where the cancer originated, even if it might, if placed into a different context, be able to metastasize. We would argue that the genome structure features of the “localized” cancers reflect cancers that have not yet metastasized (even if they could in the future) while the “metastatic” cancers have already gone to a certain location (even if they could in theory have gone to a different location).

      In a way, what the authors probably were trying to leverage here is the "tissue" identity of that organ.

      Having said this, in addition to showing the "lung permissive changes", the authors should show the "breast identity conservation" as well. Because this section started to deal with the concept of "tissue/lineage identity", the authors should also clarify whether these breast cancer cell lines capable of making lung metastasis are also preserving their original tissue identity from the compartment features (which would most likely be the case).

      This is a great question. We have now more explicitly checked the proportions of genomic regions that change compartments to match lung vs. maintaining breast-specific compartment identity. The graphs in Supplementary Figure 2 begin with all genomic bins that have distinctive compartment identity between non-cancerous breast and lung epithelial cells. Then, the plots show what fraction of these tissue-specific bins change compartment to match lung vs. maintaining breast identity in each breast cancer cell line category. As we have shown in other graphs, particularly for switches to the A compartment, more bins change to match lung in the metastatic vs. primary site cell lines. In most cases, more than 50% of the tissue-specific bins shift to look more like lung.
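
      The bookkeeping described above can be sketched in a few lines. This is a hedged illustration under simplifying assumptions, not the code used to produce Supplementary Figure 2: each genomic bin is assumed to carry a single binary "A"/"B" label per sample, and the function name and toy labels are hypothetical.

```python
def tissue_identity_fractions(breast, lung, cancer):
    """Summarize how a cancer sample behaves at tissue-specific bins.

    breast, lung, cancer are equal-length sequences of 'A'/'B' compartment
    labels, one per genomic bin. Only bins where the non-cancerous breast and
    lung references disagree are counted as tissue-specific.
    """
    specific = [i for i, (b, l) in enumerate(zip(breast, lung)) if b != l]
    if not specific:
        return {"n_specific": 0, "match_lung": 0.0, "keep_breast": 0.0}
    n_lung = sum(1 for i in specific if cancer[i] == lung[i])
    n_breast = sum(1 for i in specific if cancer[i] == breast[i])
    return {
        "n_specific": len(specific),
        "match_lung": n_lung / len(specific),
        "keep_breast": n_breast / len(specific),
    }


# Hypothetical toy labels for six bins (not real data).
summary = tissue_identity_fractions(
    breast=list("AABBAB"),
    lung=list("ABBAAA"),
    cancer=list("ABBBAA"),
)
print(summary)
```

      With binary labels the two fractions sum to one, so a value above 0.5 for `match_lung` corresponds to a sample in which most tissue-specific bins look more like lung than breast.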

      (5) Rest of the sections: The authors started to claim that the organ-specific metastasis permissive compartmental features mimic the destination organ. The authors utilized additional non-breast cancer cell lines (prostate cancer cell lines LNCaP as localized and DU145 as brain metastatic) in brain metastasis to strengthen this claim. (DU145 in MetMap again is highly metastatic to lung, brain, and kidney). However, this makes one wonder: for cell lines that are capable of metastasizing to multiple organ sites (eg. MDA-MB-231, DU145, A549, H460), does it mean that they all acquire the permissive features for all these organs? This scenario is clinically relevant in Stage 4 patients who often present with not only one metastatic lesion in one single organ but multiple metastatic lesions in more than one organ (eg. concomitant liver and lung metastasis). Do the authors think that there might be different clones having different tropism-permissive 3D genome features or there might be an evolutionary trajectory in this?

      In my opinion, to further prove this point, the authors might need to consider doing in vivo experiments to collect paired primary and organ-specific metastatic samples to look at the 3D genome changes.

      We agree that an ideal experimental follow up to this study would be to collect paired metastatic and primary tumors, either in mouse xenograft or, even better, from patients. This is beyond the scope of what we can do for our current paper, but we have added a statement to the discussion of further experiments that would be required to clarify this point.

      (6) Technically, the study utilized public Hi-C data without generating new Hi-C data. The resolution of the Hi-C data for compartments was set at 250KB as the binning size indicating that the Hi-C data was at lower resolution so it might not be ideal to address other 3D genome architecture changes such as TADs or long-range loops. It is therefore unknown whether there might be permissive TAD/loop changes associated with organotropism and this is the limitation of this study.

      Our decision to focus on A/B compartmentalization rather than TAD or loop structure in this analysis was intentional and biologically motivated, rather than solely being a reflection of data resolution. Both compartments and topologically associating domains (TADs) are key parts of genome organization and disruption of these structures has the potential to alter downstream gene regulation, as shown by numerous studies. However, compartments have been found, more so than TADs, to be strongly associated with cell type and cell fate. Therefore, in this manuscript, we decided to focus only on the compartment organization changes between different healthy and cancerous cells as they are more likely to represent the stable alterations of genome organization that accompany malignant transformation.

      (7) In the final sentence of the discussion the authors stated "Overall, our results suggest that genome spatial compartment changes can help encode a cell state that favors metastasis (EMT)". The "metastasis (EMT)" was in fact not clearly linked inside the manuscript. The authors did not provide a strong link between metastasis and EMT in their result description. It is also unclear whether the EMT-associated compartment identity would also correlate with the organotropic compartment identity.

      We agree that this statement involves too strong of an assumption. The literature on this topic is vast and complex, and while there is abundant evidence that pathways of EMT can play important roles in facilitating metastasis, there are other pathways at play in the metastatic process as well (https://journals.plos.org/Plosbiology/article?id=10.1371/journal.pbio.3002487). We have made a clearer statement about this in the text now.

      To address the question of whether the organotropic changes are related to the EMT changes, we calculated the overlap between the genomic bins that strongly segregated cell lines in the compartment principal component analysis (PC1) and those that showed “organotropic” changes. As you can see in supplementary table 3, this overlap is actually very small: only 3% of bins are important both for the EMT segregation of cell lines and for organotropism.

      We have now included this overlap information as supplementary table 3 and have addressed this in the text.
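
      An overlap of this kind can be computed with plain set operations, as in the sketch below. It is illustrative only: the bin identifiers are hypothetical, and because the choice of denominator changes the reported percentage, the sketch returns the shared fraction relative to each set and to their union rather than assuming which one underlies supplementary table 3.

```python
def overlap_summary(emt_bins, organotropic_bins):
    """Count bins shared by two bin sets and express the overlap as fractions."""
    emt, org = set(emt_bins), set(organotropic_bins)
    shared = emt & org
    union = emt | org

    def fraction(part, whole):
        return len(part) / len(whole) if whole else 0.0

    return {
        "shared": len(shared),
        "fraction_of_emt": fraction(shared, emt),
        "fraction_of_organotropic": fraction(shared, org),
        "fraction_of_union": fraction(shared, union),
    }


# Hypothetical bins identified by (chromosome, start) at 250 kb resolution.
toy_emt = [("chr1", 0), ("chr1", 250_000), ("chr2", 500_000), ("chr3", 0)]
toy_org = [("chr1", 250_000), ("chr4", 0)]
overlap = overlap_summary(toy_emt, toy_org)
print(overlap)
```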

      Reviewer #2 (Public review):

      Summary:

      This work addresses an important question of chromosome architecture changes associated with organotropic metastatic traits, showing important trends in genome reorganization. The most important observation is that 3D genome changes are consistent with adaptations for new microenvironments, including lung metastatic breast cells exhibiting signatures of the genome architecture typical to a lung cell-like conformation and brain metastatic prostate cancer cells showing compartment shifts toward a brain-like state.

      Strengths:

      This work presents interesting original results, which will be important for future studies and for the biomedical implications of epigenetic regulation in normal and pathological states.

      Weaknesses:

      The authors used publicly available data for 15 cell types. They should show how many different sources the data were obtained from and demonstrate that obtained results are consistent if the data from different sources were used.

      In our revised version, we have provided a clarified table of information about all the publicly available data used from all the cell lines, indicating the sources of the data. The 17 datasets used come from 8 different studies. So, indeed, the reviewer is correct that many different sources of data were used. To address the question of whether our results would be consistent if data from different sources were used, we created a comparison map of the A/B compartment profiles for data from multiple sources when it was available. You can see below that the Hi-C data from different sources for the same cell lines cluster quite closely and show high correlation and are well separated from different cell lines. So, we do not think that source batch effects play a major role in our results.

      Author response image 3.
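
      The kind of consistency check described above can be sketched as a pairwise Pearson correlation between compartment profiles. This is a minimal illustration with made-up numbers, not the authors' pipeline: the profile names and values are hypothetical, and each profile is assumed to be a list of per-bin compartment scores computed on the same bins.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences.

    Assumes neither sequence is constant (non-zero variance).
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(profiles):
    """Map every ordered pair of profile names to their correlation."""
    names = list(profiles)
    return {(p, q): pearson(profiles[p], profiles[q]) for p in names for q in names}


# Hypothetical per-bin compartment scores (positive = A-like, negative = B-like).
toy_profiles = {
    "lineX_sourceA": [0.8, -0.5, 0.3, -0.9, 0.6],
    "lineX_sourceB": [0.7, -0.6, 0.4, -0.8, 0.5],
    "lineY_sourceA": [-0.6, 0.7, -0.2, 0.5, -0.8],
}
corr = correlation_matrix(toy_profiles)
print(round(corr[("lineX_sourceA", "lineX_sourceB")], 3))
```

      If source batch effects dominated, profiles would be expected to correlate by source rather than by cell line; in the toy data the two `lineX` profiles correlate strongly with each other and negatively with `lineY`.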

      Recommendations for the authors:  

      Reviewer #1 (Recommendations for the authors):

      (1) Figure 1a: This figure could be re-formatted without the arrows. Arrows usually indicate upstream-to-downstream relationships along certain processes. Using arrows here would mislead people to think that the cell lines were derived from one another. The same could apply to the supplementary figures.

      We have now edited figure 1a to include lines linking cell lines, indicating conceptual relationships, rather than arrows, which would imply direct derivation.

      (2) Figure 1c: The PCA (PC2 axis) indeed seemed to separate the HER2 status quite well. One concern is MCF7, it is labeled as ERpos/HER2neg in MetMap but seems to be clustered as HER2pos in this study. Are they the same? (This again highlights the importance of cell line definition and annotation).

      It is a good point that MCF7, while generally considered HER2 negative (we indicate this negative status in Supplementary Table 1), falls near HER2 positive cells in PCA space. This indicates that PCA captures tendencies but is not a perfect classifier. In a high dimensional, complex system, it is expected that an unsupervised analysis such as this will not capture just one biological feature in a given principal component, and therefore something like HER2 status may not segregate perfectly. However, this analysis does suggest that MCF7 3D genome structure has features that are more similar to other HER2+ cell lines. This raises the interesting possibility that it may actually behave like HER2+ cells in some ways even while being HER2- itself. We have more clearly stated the MCF7 discrepancy in the text.

      Reviewer #2 (Recommendations for the authors):

      (1) The description of results can be shortened, to make it easier to read and understand.

      In our revision, we have tried to clarify where possible, but it was difficult to shorten without losing important caveats and context (especially to make important points emphasized by reviewer 1).

      (2) "100 most positive and negative eigenvalues for PC1" - please provide the correct description.

      We have altered this to make it clearer and more correct: “using the genes from the regions with the top 100 most positive and 100 most negative eigenvector loadings for this PC1”
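
      The distinction matters because a principal component has a single eigenvalue but one eigenvector loading per genomic bin, so only the loadings can be ranked. A minimal sketch of such a selection is given below; the function name, bin identifiers, and loading values are hypothetical and are not taken from the manuscript.

```python
def extreme_loading_bins(loadings, k=100):
    """Return (most_positive, most_negative) bin ids ranked by PC1 loading.

    loadings maps a bin id to its PC1 eigenvector loading. k is capped at half
    the number of bins so the two lists never share a bin.
    """
    ranked = sorted(loadings, key=loadings.get)  # ascending by loading
    k = min(k, len(ranked) // 2)
    if k == 0:
        return [], []
    most_negative = ranked[:k]
    most_positive = ranked[-k:][::-1]  # largest loading first
    return most_positive, most_negative


# Hypothetical loadings for six bins; k=2 stands in for the 100 used in the text.
toy_loadings = {
    "bin1": 0.30, "bin2": -0.25, "bin3": 0.05,
    "bin4": -0.40, "bin5": 0.45, "bin6": -0.01,
}
top_pos, top_neg = extreme_loading_bins(toy_loadings, k=2)
print(top_pos, top_neg)
```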

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This is an interesting theoretical study examining the viability of the Virtual Circular Genome (VCG) model, a recently proposed scenario of prebiotic replication in which a relatively long sequence is stored as a collection of its shorter subsequences (and their complements). It was previously pointed out that the VCG model is prone to so-called sequence scrambling, which limits the overall length of such a genome. In the present paper, additional limitations are identified. Specifically, it is shown that the VCG is well replicated when the oligomers are elongated by sufficiently short chains from the “feedstock” pool. However, ligation of oligomers from the VCG itself results in a high error rate. I believe the research is of high quality and well written. However, the presentation could be improved and the key messages could be clarified.

      Strengths:

      High-quality theoretical modeling of an important problem is implemented.

      Weaknesses:

      The conclusions are somewhat convoluted and could be presented better.

      (1) It is not clear from the paper whether the observed error has the same nature as sequence scrambling.

      We thank the Reviewer for pointing out that this important point was not clearly explained. The sequence errors observed in our model are indeed of the same nature as sequence scrambling previously identified by Chamanian and Higgs (Chamanian and Higgs, PLoS Comp Biol 2022). The core issue is the ligation of two oligomers representing non-adjacent segments of the genome sequence, leading to the formation of “chimeric” products that are not part of the desired genome.

      Our analysis identifies the ligation of VCG oligomers (V+V reactions) as the primary mechanism driving sequence scrambling. This allowed us to propose two strategies to mitigate sequence scrambling: (i) tuning the length and concentration of the VCG oligomers, and (ii) considering scenarios where only feedstock monomers contribute to elongation (non-reactive VCG oligomers). We modified the Introduction and Results section of our manuscript to convey this connection more clearly.

      (2) The authors introduce two important lengths LS1 and LS2 only in the conclusions and do not sufficiently explain why each of them is important. It would make sense to discuss this early in the manuscript.

      We agree with the Reviewer and have followed the suggestion to introduce the two important length scales earlier in the manuscript (in the Model section of the main text). In the updated version, we refer to these length scales as the exhaustive coverage length L<sub>E</sub> (formerly LS1) and the unique subsequence length L<sub>U</sub> (formerly LS2). The exhaustive coverage length L<sub>E</sub> is defined as the maximum motif length for which all possible sequences of that length appear somewhere in the genome. In contrast, the unique subsequence length L<sub>U</sub> is the minimum motif length such that each subsequence of that length occurs only once in the genome, thus giving each motif a unique “address”.

      Generally, a genome of length L<sub>G</sub> contains at most 2L<sub>G</sub> distinct subsequences of any given length, whereas there are 4<sup>L</sup> possible sequences of length L. This implies that L<sub>E</sub> can be at most L<sup>max</sup><sub>E</sub> = ⌊log<sub>4</sub>(2L<sub>G</sub>)⌋, and L<sub>U</sub> must be at least L<sup>min</sup><sub>U</sub> = ⌈log<sub>4</sub>(2L<sub>G</sub>)⌉, where ⌊...⌋ and ⌈...⌉ denote the next lower and higher integer, respectively. While the previous version of the manuscript focused exclusively on the limiting case L<sub>E</sub> = L<sup>max</sup><sub>E</sub> and L<sub>U</sub> = L<sup>min</sup><sub>U</sub>, we have extended our analysis to genomes with a broader range of L<sub>E</sub> and L<sub>U</sub> values in the revised manuscript.
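
      The two length scales and the counting bounds can be made concrete with a short sketch. It is a hedged illustration rather than the code from the Supplementary Material: it assumes an RNA alphabet (A, C, G, U), a circular genome read on both strands (giving 2L<sub>G</sub> motif positions per length), and a made-up four-nucleotide toy genome; all function names are hypothetical. The bounds follow from comparing the 4<sup>L</sup> possible motifs of length L with the 2L<sub>G</sub> available positions.

```python
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def motifs(genome, length):
    """All motifs of a given length on a circular genome and its reverse complement."""
    reverse_complement = genome.translate(COMPLEMENT)[::-1]
    found = []
    for strand in (genome, reverse_complement):
        wrapped = strand + strand[:length - 1]  # wrap around the circle
        found.extend(wrapped[i:i + length] for i in range(len(strand)))
    return found


def exhaustive_coverage_length(genome):
    """L_E: largest L for which all 4**L possible motifs occur in the genome."""
    length = 0
    while (length < len(genome)
           and len(set(motifs(genome, length + 1))) == 4 ** (length + 1)):
        length += 1
    return length


def unique_subsequence_length(genome):
    """L_U: smallest L for which every motif position holds a distinct motif."""
    for length in range(1, len(genome) + 1):
        found = motifs(genome, length)
        if len(set(found)) == len(found):
            return length
    return None  # no such length, e.g. for highly repetitive sequences


def length_bounds(genome_length):
    """(largest possible L_E, smallest possible L_U) for 2 * L_G motif positions."""
    positions = 2 * genome_length
    max_e = 0
    while 4 ** (max_e + 1) <= positions:
        max_e += 1
    min_u = 0
    while 4 ** min_u < positions:
        min_u += 1
    return max_e, min_u


toy_genome = "AACG"  # hypothetical example, L_G = 4
l_e = exhaustive_coverage_length(toy_genome)
l_u = unique_subsequence_length(toy_genome)
bounds = length_bounds(len(toy_genome))
print(l_e, l_u, bounds)
```

      For this toy genome the dinucleotide CG occurs on both strands, so uniqueness is only reached at length 3, one nucleotide above the lower bound; this illustrates that a given genome need not sit at the limiting case.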

      This extended analysis reveals that, for accurate and efficient replication, the VCG oligomer length must always exceed L<sub>U</sub>, regardless of the choice of L<sub>E</sub>. The required margin beyond L<sub>U</sub> depends on the distribution of intermediate-length motifs (i.e., with L<sub>E</sub> < L < L<sub>U</sub>), but is typically only a few nucleotides.

      We have integrated these new findings into the Results section of the main text and expanded the discussion of their implications for the prebiotic relevance of the VCG scenario in the Discussion section. Full methodological details are provided in the Supplementary Material (Sections S1 and S8).

      (3) It is not entirely clear why a specific length distribution for VCG oligomers has to be assumed rather than emerging from simulations.

      We thank the Reviewer for this insightful question. Our choice to assume specific length distributions for VCG oligomers is motivated by both conceptual and practical considerations. We explain our reasoning more clearly in the revised manuscript, in the beginning of the Model section of the main text.

      Conceptually, our study focuses on the propagation of sequence information by an already-formed VCG, rather than its emergence from a random pool. As discussed by Chamanian and Higgs, the spontaneous formation of a VCG from randomly interacting oligomers is a rare event. Our aim is to understand whether, once formed, such a structure can robustly replicate under prebiotic conditions. This question is best addressed when the genome and the oligomer pool (including their lengths and concentrations) can be systematically controlled.

      From a practical standpoint, working with a controllable pool of oligomers facilitates direct comparison to recent experimental studies that use predefined and well-characterized oligomer pools (Ding et al. JACS 2023). With our current methods and realistic rate constants, simulating the emergence of such pools from simple building blocks (e.g., monomers and dimers) would be computationally prohibitive, due to the low ligation rate. For example, in a system containing monomers (concentration 0.1 mM) and octamers (concentration 1 µM) in a volume of V = 3.3 µm<sup>3</sup>, simulating the time between two ligation events takes over 300 hours of compute time (see SI Fig. S2). This renders dynamic pool generation unfeasible for the scope of our study.
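
      To give a sense of the system size in this example, the quoted concentrations and volume can be converted into molecule numbers using N = c · N<sub>A</sub> · V. The sketch below performs only this unit conversion (1 µm<sup>3</sup> = 10<sup>-15</sup> L); the function name is hypothetical, and the snippet makes no claim about how the simulation itself is implemented.

```python
AVOGADRO = 6.02214076e23             # molecules per mole
LITERS_PER_CUBIC_MICROMETER = 1e-15  # 1 um^3 = 1e-18 m^3 = 1e-15 L


def molecule_count(concentration_molar, volume_um3):
    """Expected number of molecules at a molar concentration in a volume given in um^3."""
    return concentration_molar * AVOGADRO * volume_um3 * LITERS_PER_CUBIC_MICROMETER


monomers = molecule_count(0.1e-3, 3.3)  # 0.1 mM monomers
octamers = molecule_count(1e-6, 3.3)    # 1 uM octamers
print(f"{monomers:.3g} monomers, {octamers:.3g} octamers")
```

      The conversion gives about 2 × 10<sup>5</sup> monomers and about 2 × 10<sup>3</sup> octamers in the simulated volume.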

      (4) Furthermore, the problem has another important length, L0, that is never introduced or discussed: a minimal hybridization length with a lifetime longer than the ligation time. From the parameters given, it appears that L0 is sufficiently long (∼ 10 bases). In other words, it appears that the study is done in a somewhat suboptimal regime: most hybridization events do not lead to a ligation. Am I right in this assessment? If that is the case, the authors might want to explore another regime, L_0 < LS_1, by considering a higher ligation rate.

      Indeed, we assume that the ligation rate is smaller than both the hybridization and dehybridization rates for any oligomer typically included in the pool (up to length 10). In terms of effective length scales, this corresponds to L<sub>0</sub> ≈ 10 nt, with L<sub>0</sub> defined as stated by the Reviewer, i.e., the hybridization length corresponding to a lifetime comparable to the ligation time. Most of our analysis actually exploits the small ligation rate, by employing an adiabatic approximation in which ligation is assumed to be slower than any hybridization or dehybridization process in the pool irrespective of oligomer length. As the Reviewer states, in this regime most hybridization events are transient and will not result in ligation, since the complexes typically dissociate before ligation can occur.

      While we agree that this assumption limits the overall yield of replication, it has a beneficial effect on replication fidelity. Oligomers that hybridize with mismatches tend to unbind more quickly due to the destabilizing effect of mismatches. In the slow-ligation regime, such complexes are likely to dissociate before a ligation can occur, preventing the formation of incorrect products. In contrast, if the ligation rate were comparable to the unbinding rate of mismatched hybrids, these incorrect associations could undergo ligation, thereby lowering the fidelity of replication. We thus view the regime L<sub>0</sub> > L<sub>V</sub> as more favorable for studying the error-suppressing potential of the VCG mechanism, though we acknowledge that exploring the effects of faster ligation rates is an interesting question for future work.
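
      The fidelity argument can be illustrated with a deliberately simplified two-rate picture, which is not the kinetic model of the manuscript. If a bound complex either ligates with rate k<sub>lig</sub> or dissociates with rate k<sub>off</sub>, the probability that ligation happens first is k<sub>lig</sub> / (k<sub>lig</sub> + k<sub>off</sub>). The rate values below are hypothetical and in arbitrary units; the sketch ignores rebinding and all other reactions.

```python
def ligation_probability(k_lig, k_off):
    """Probability that a bound complex ligates before it dissociates."""
    return k_lig / (k_lig + k_off)


def error_ratio(k_lig, k_off_match, k_off_mismatch):
    """Per-binding-event ligation probability of a mismatched complex
    relative to a correctly matched one (smaller means better discrimination)."""
    return (ligation_probability(k_lig, k_off_mismatch)
            / ligation_probability(k_lig, k_off_match))


# Hypothetical rates: mismatched complexes unbind 100 times faster.
K_OFF_MATCH, K_OFF_MISMATCH = 1.0, 100.0

slow = error_ratio(1e-3, K_OFF_MATCH, K_OFF_MISMATCH)  # ligation much slower than unbinding
fast = error_ratio(1e3, K_OFF_MATCH, K_OFF_MISMATCH)   # ligation much faster than unbinding
print(f"slow ligation: {slow:.3f}, fast ligation: {fast:.3f}")
```

      In the slow-ligation limit the ratio approaches k<sub>off,match</sub> / k<sub>off,mismatch</sub>, so the full stability difference is converted into discrimination; when ligation outpaces unbinding, both complexes ligate almost every time they form and the ratio approaches one.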

      Reviewer #2 (Public review):

      Summary:

      This important theoretical and computational study by Burger and Gerland attempts to set environmental, compositional, kinetic, and thermodynamic constraints on the proposed virtual circular genome (VCG) model for the early non-enzymatic replication of RNA. The authors create a solid kinetic model using published kinetic and thermodynamic parameters for non-enzymatic RNA ligation and (de)hybridization, which allows them to test a variety of hypotheses about the VCG. Prominently, the authors find that the length (longer is better) and concentration (intermediate is better) of the VCG oligos have an outsized impact on the fidelity and yield of VCG production with important implications for future VCG design. They also identify that activation of only RNA monomers, which can be achieved using environmental separation of the activation and replication, can relax the constraints on the concentration of long VCG component oligos by avoiding the error-prone oligo-oligo ligation. Finally, in a complex scenario with multiple VCG oligo lengths, the authors demonstrate a clear bias for the extension of shorter oligos compared to the longer ones. This effect has been observed experimentally (Ding et al., JACS 2023) but was unexplained rigorously until now. Overall, this manuscript will be of interest to scientists studying the origin of life and the behavior of complex nucleic acid systems.

      Strengths:

      • The kinetic model is carefully and realistically created, enabling the authors to probe the VCG thoroughly.

      • Fig. 6 outlines important constraints for scientists studying the origin of life. It supports the claim that the separation of activation and replication chemistry is required for efficient non-enzymatic replication. One could easily imagine a scenario where activation of molecules occurs, followed by their diffusion into another environment containing protocells that encapsulate a VCG. The selective diffusion of activated monomers across protocell membranes would then result in only activated monomers being available to the VCG, which is the constraint outlined in this work. The proposed exclusive replication by monomers also mirrors the modern biological systems, which nearly exclusively replicate by monomer extension.

      • Another strength of the work is that it explains why shorter oligos extend better compared to the long ones in complex VCG mixtures. This point is independent of the activation chemistry used (it simply depends on the kinetics and thermodynamics of RNA base-pairing) so it should be very generalizable.

      We thank the Reviewer for the careful assessment of our work and this concise summary of our main points.

      Weaknesses:

      • Most of the experimental work on the VCG has been performed with the bridged 2-aminoimidazolium dinucleotides, which are not featured in the kinetic model of this work. Other studies by Szostak and colleagues have demonstrated that non-enzymatic RNA extension with bridged dinucleotides has superior kinetics (Walton et al. JACS 2016, Li et al. JACS 2017), fidelity (Duzdevich et al. NAR 2021), and regioselectivity (Giurgiu et al. JACS 2017) compared to activated monomers, establishing the bridged dinucleotides as important for non-enzymatic RNA replication. Therefore, the omission of these species in the kinetic model presented here can be perceived as problematic. The major claim that avoidance of oligo ligations is beneficial for VCGs may be irrelevant if bridged dinucleotides are used as the extending species, because oligo ligations (V + V in this work) are kinetically orders of magnitude slower than monomer extensions (F + V in this work) (Ding et al. NAR 2022). Formally adding the bridged dinucleotides to the kinetic model is likely outside of the scope of this work, but perhaps the authors could test if this should be done in the future by simply increasing the rate of monomer extension (F + V) to match the bridged dinucleotide rate without changing the rate of V + V ligation?

      We thank the Reviewer for this insightful comment. Indeed, we did not design our model to specifically describe the use of bridged 2-aminoimidazolium dinucleotides as feedstock for the VCG scenario. Adding the bridged dinucleotides to our model would require allowing for feedstock that effectively changes its length during the ligation reaction. As anticipated already by the Reviewer, this is outside the scope of our current modeling framework, which was chosen to explore the generic issue of sequence scrambling in the VCG scenario without distinguishing between different types of activation chemistries.

      Along the lines of the Reviewer’s suggestion, we clarified in the revised manuscript that we consider two limiting cases out of a family of models with two different ligation rate constants, k<sub>lig,1</sub> for ligations involving a monomer and k<sub>lig,>1</sub> for ligations involving no monomer, allowing for kinetic discrimination between these processes. We consider the two limiting cases where either k<sub>lig,1</sub> = k<sub>lig,>1</sub> or k<sub>lig,>1</sub>/k<sub>lig,1</sub> → 0. The latter case captures the behavior expected from an activation chemistry that enables fast primer extension but slow ligation, thereby suppressing sequence scrambling via V+V ligation events. The corresponding results, presented in Figures 6 and 7, indeed show that the VCG replication efficiency approaches 100% for pools that are rich in VCG oligomers.

      Our coarse-grained model, which does not explicitly describe the activation chemistry, was sufficient to capture important kinetic and thermodynamic constraints of the VCG scenario, and to qualitatively explain the experimental observation of a preferential extension of short over long VCG oligomers (Fig. 7B). For future work, we plan to extend our model to account for the activation chemistry in detail, to allow for a more quantitative comparison between theory and experiment.

      • The kinetic and thermodynamic parameters for oligo binding appear to be missing two potentially important components. First, base-paired RNA strands that contain gaps where an activated monomer or oligo can bind have been shown to display significantly different kinetics of ligation and binding/unbinding than complexes that do not contain such gaps (see Prywes et al. eLife 2016, Banerjee et al. Nature Nanotechnology 2023, and Todisco et al. JACS 2024). Would inclusion of such parameters alter the overall kinetic model?

      We thank the Reviewer for highlighting these recent studies. Todisco et al. (JACS 2024) report that complexes with gaps are well described by standard nearest-neighbor models, while stacking interactions at nick sites confer additional stability beyond these predictions. Our model is therefore expected to capture the thermodynamics of complexes with gaps accurately, but likely underestimates the stability of complexes containing nicks. In the VCG pool, all productive ligation complexes (F+F, F+V, V+V) inherently contain a nick and thus benefit from this stabilization, whereas unproductive complexes typically do not. The added stability is expected to increase the residence time of oligomers in productive complexes, thereby enhancing overall extension rates. However, since this stabilization applies uniformly across all productive complexes, it does not shift the relative contributions of different ligation pathways (in particular, correct vs. incorrect).

      This reasoning assumes that hybridization and dehybridization occur on timescales faster than ligation or primer extension. It is conceivable that this separation of timescales does not hold, particularly for oligomers binding to templates with gaps, where association is slower due to steric hindrance, while dissociation is further slowed by stabilizing nicks. As a result, the residence time of such complexes can become comparable to (or longer than) the ligation timescale. We now discuss this aspect more thoroughly in the revised Results and Discussion sections. Capturing the resulting effects in our analytical framework would require relaxing the adiabatic assumption, which is beyond the scope of this work. We recognize the relevance of the non-adiabatic regime of the dynamics, and hope to explore this regime in follow-up work.

      • Second, it has been shown that long base-paired RNA can tolerate mismatches to an extent that can result in monomer ligation to such mismatched duplexes (see Todisco et al. NAR 2024). Would inclusion of the parameters published in Todisco et al. NAR 2024 alter the kinetic model significantly?

      In contrast to complexes with nicks and gaps, mismatched complexes (Todisco et al. NAR 2024) will decrease replication fidelity relative to the results presented in our manuscript. Our current model assumes perfect base pairing, such that replication errors arise only from binding events involving regions too short to reliably identify the correct genomic position (sequence scrambling). Allowing mismatches will indeed introduce an additional error mechanism via imperfect yet sufficiently stable duplexes, thereby increasing the rate of incorrect extensions. However, we expect this effect to be limited. Due to the thermodynamic cost of internal loops, mismatched duplexes most often have their mismatches near the ends of the hybridized region, where their destabilizing effect is weakest (Todisco et al. NAR 2024). Terminal mismatches at the 3′ end of the primer have been shown to reduce the primer-extension rate significantly via a stalling effect (Rajamani et al. JACS 2010, Leu et al. JACS 2013). Hence, we would expect errors due to mismatched duplexes to primarily occur for mismatches at the 5′ end. Such errors could be mitigated by a VCG pool consisting only of oligomers that are sufficiently long relative to the unique motif length of the virtual genome.

      We have extended the Discussion section to address this interesting issue.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      • ’(apostrophes) should be prime symbols instead of apostrophes

      We thank the Reviewer for spotting this mistake, which we have now corrected.

      • In the Introduction, the section that discusses the fidelity of enzyme-free copying should include a reference to Duzdevich et al. NAR 2021, as that work measured the fidelity experimentally.

      We have included this reference together with other references on the kinetics of hybridization/dehybridization to nicks and gaps in the main text.

      • The term feedstock oligomers may be problematic, because these also include monomers. In the “Templated ligation” section of the Model, the statement “We consider pools in which all oligomers are activated, as well as pools in which only monomers are activated” is imprecise. “All oligomers, including monomers,...” would be better so as to avoid confusion in readers accustomed to standard RNA language.

      We thank the Reviewer for this helpful suggestion. In the revised manuscript, we now use the term feedstock (rather than feedstock oligomers) to avoid confusion. We have also revised the sentence in the “Templated ligation” section to read “all oligomers, including monomers, ...” as recommended.

      • The “Experimentally determined association rate constants” passage cites references 24-26, which measured the rate constants for DNA. Considering that the authors are modeling RNA, I wonder if Ashwood et al. Biophysical Journal 2023 contains any relevant RNA data that could help refine the model?

      We thank the Reviewer for pointing us to the study by Ashwood et al. We have added this reference to the corresponding paragraph in the revised manuscript. Their RNA association rate constant (∼ 5 × 10<sup>7</sup> M<sup>−1</sup> s<sup>−1</sup>) is larger than the one we used (∼ 1 × 10<sup>6</sup> M<sup>−1</sup> s<sup>−1</sup>); however, a larger association rate is in fact beneficial for the validity of our adiabatic approximation, and thus would not affect our results, as long as the thermodynamic stability remains the same. This is because faster association then also implies faster dissociation, so the ratio of the (de)hybridization timescales to the ligation timescale becomes even smaller, which is the regime where the adiabatic approximation made in our analysis is justified.
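
      The timescale argument above can be sketched numerically. The snippet below is a minimal illustration, not part of the manuscript's model: the dissociation constant `K_D` and the ligation rate constant `K_LIG` are assumed placeholder values, and only the two association rate constants are taken from the text. It holds the thermodynamic stability fixed, derives the off-rate from k<sub>off</sub> = k<sub>on</sub> K<sub>d</sub>, and compares the dehybridization timescale with the ligation timescale.

```python
# Illustrative check of the timescale separation behind the adiabatic approximation.
# K_D and K_LIG are assumed placeholder values, not parameters from the manuscript.

def dissociation_rate(k_on: float, k_d: float) -> float:
    """Off-rate (1/s) implied by a fixed dissociation constant: K_d = k_off / k_on."""
    return k_on * k_d


def timescale_ratio(k_on: float, k_d: float, k_lig: float) -> float:
    """Dehybridization time (1/k_off) divided by ligation time (1/k_lig)."""
    return k_lig / dissociation_rate(k_on, k_d)


K_D = 1e-6    # M, assumed dissociation constant, identical in both scenarios
K_LIG = 1e-4  # 1/s, assumed ligation rate constant

# Association rate constants quoted in the response (M^-1 s^-1).
ratio_used = timescale_ratio(1e6, K_D, K_LIG)
ratio_ashwood = timescale_ratio(5e7, K_D, K_LIG)

print(f"ratio with k_on = 1e6: {ratio_used:.2e}")
print(f"ratio with k_on = 5e7: {ratio_ashwood:.2e}")
```

      With the stability held fixed, the larger association rate constant shrinks the ratio by the same factor of 50, which moves the system deeper into the regime where hybridization equilibrates before ligation.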

      • In “Triplexes of type 1—8 and 1—9...”, the word triplexes will confuse readers with RNA expertise, as triplexes imply a triple-stranded RNA.

      We thank the Reviewer for pointing out the potentially ambiguous nomenclature. To avoid confusion with triple-stranded RNA structures, we now refer to binary (ternary, ...) complexes instead of duplexes (triplexes, ...) throughout the revised manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews: 

      Reviewer #1 (Public Review): 

      Summary: 

      The authors addressed the influence of DKK2 on colorectal cancer (CRC) metastasis to the liver using an orthotopic model transferring AKP-mutant organoids into the spleens of wild-type animals. They found that DKK2 expression in tumor cells led to enhanced liver metastasis and poor survival in mice. Mechanistically, they associate Dkk2-deficiency in donor AKP tumor organoids with reduced Paneth-like cell properties, particularly Lyz1 and Lyz2, and defects in glycolysis. Quantitative gene expression analysis showed no significant changes in Hnf4a1 expression upon Dkk2 deletion. Ingenuity Pathway Analysis of RNA-Seq data and ATAC-seq data point to a Hnf4a1 motif as a potential target. They also show that HNF4a binds to the promoter region of Sox9, which leads to LYZ expression and upregulation of Paneth-like properties. By analyzing available scRNA data from human CRC data, the authors found higher expression of LYZ in metastatic and primary tumor samples compared to normal colonic tissue; reinforcing their proposed link, HNF4a was highly expressed in LYZ+ cancer cells compared to LYZ- cancer cells.

      Strengths: 

      Overall, this study contributes a novel mechanistic pathway that may be related to metastatic progression in CRC. 

      Weaknesses: 

      The main concerns are related to incremental gains, missing in vivo support for several of their conclusions in murine models, and missing human data analyses. Additionally, methods and statistical analyses require further clarification. 

      Main comments: 

      (1) Novelty 

      The authors previously described the role of DKK2 in primary CRC, correlating increased DKK2 levels to higher Src phosphorylation and HNF4a1 degradation, which in turn enhances LGR5 expression and "stemness" of cancer cells, resulting in tumor progression (PMID: 33997693). A role for DKK2 in metastasis has also been previously described (sarcoma, PMID: 23204234). 

      (2) Mouse data 

      a) The authors analyzed liver mets, but the main differences between AKP and AKP/Dkk2 KO organoids could arise during the initial tumor cell egress from the intestinal tissue (which cannot be addressed in their splenic injection model), or during pre-liver stages, such as endothelial attachment. While the analysis of liver mets is interesting, given that Paneth cells play a role in the intestinal stem cell niche, it is questionable whether a study that does not involve the intestine can appropriately address this pathway in CRC metastasis.

      We value the reviewer’s comment that the splenic injection model cannot represent metastasis from primary tumors, including intravasation and extravasation. Therefore, we performed orthotopic transplantation of AKP and KO organoids directly into the colon and then tested for metastasis.

      Author response image 1.

      Primary tumor formation and liver metastasis by orthotopic transplantation of AKP or KO colon cancer organoids. 6-8 week-old male C57BL/6J mice were treated with 2.5% DSS dissolved in drinking water for 5 days, followed by regular water for 2 days to remove gut epithelium. After recovery with the regular water, the colon was flushed with 1000 μl of 0.1% BSA in PBS. Then, 200,000 dissociated organoid cells in 200 μl of 5% Matrigel and 0.1% BSA in PBS were instilled into the colonic luminal space. After infusion, the anal verge was sealed with Vaseline. 8 weeks after transplantation, the mice were sacrificed to measure primary tumor formation and liver metastasis.

      As a result, 4 out of 6 mice in the control group successfully formed colorectal primary tumors, whereas only 2 out of 6 mice showed primary tumor formation in the KO group (Author response image 1A). The size of tumors was reduced by about half (10-12 mm to 5-7 mm). Only one AKP mouse developed metastasized nodules in the liver (Author response image 1B). Next, to measure the circulating tumor cells, we harvested at least 500 μl of blood from the portal vein and then analyzed tdTomato-positive tumor cells (Author response image 2). Flow cytometry analysis of PBMCs showed the presence of tdTomato<sup>hi</sup>CD45<sup>−</sup> cells as well as tdTomato<sup>mid</sup>CD45<sup>+</sup> cells in 2 out of 6 AKP mice, while no tdTomato-positive cells were observed in the PBMCs of KO organoid-transplanted mice.

      Because only a limited number of mice showed primary and metastatic tumor formation, we cannot provide a statistical analysis of DKK2-mediated metastasis. However, our revised data indicate a trend that DKK2 KO reduced primary tumor formation, the number of circulating tumor cells, and liver metastasis. This trend is consistent with our previous report in the iScience paper, which showed that DKK2 KO reduced AOM/DSS-induced polyp formation by about 60%, and with the decreased metastasis in the splenic injection model system in this manuscript. Further studies are necessary to confirm this trend and to provide the underlying mechanisms of intravasation and extravasation of circulating tumor cells.
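
      To illustrate why counts of this size do not support a formal statistical claim, the following minimal sketch applies a two-sided Fisher's exact test to the primary tumor counts quoted above (4 of 6 control mice versus 2 of 6 KO mice). The choice of test is an assumption made here for illustration only; it is not an analysis reported in the response.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of seeing x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum every table that is at most as likely as the observed one.
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))


# Rows: AKP control, Dkk2 KO; columns: primary tumor formed, no tumor.
p_value = fisher_exact_two_sided(4, 2, 2, 4)
print(f"two-sided p = {p_value:.3f}")
```

      Under this assumed test the p-value lies far above conventional significance thresholds, in line with describing the result as a trend rather than a statistically supported difference.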

      Author response image 2.

      Flow cytometry analysis of tdTomato+ circulating colon tumor cells in PBMCs. PBMCs were harvested via the portal vein after euthanasia. CD45 and tdTomato were analyzed by flow cytometry.

      b) The overall number of Paneth cells found in the scRNA-seq analysis of liver mets was strikingly low (17 cells, Figure 3), and assuming that these cells are driving the differences seems somewhat far-fetched. Adding to this concern is inappropriate gating in the flow plot shown in Figure 6. This should be addressed experimentally and in the interpretation of data. 

      We appreciate the reviewer’s comments and the opportunity to clarify this point. Since the number of LYZ+ cells is low in our scRNA-seq analysis, we performed flow cytometry (Figure 6H), which shows a clear LYZ-expressing population in the same splenic injection model of metastasis. Figure 6H is a representative image of triplicates for each group, and we performed this experiment three times independently. As suggested, we changed the graph format and updated the gating and statistical analysis in Fig. 6H and 6I. This in vivo result confirmed our in vitro data showing that DKK2 KO reduced LYZ+ cells while increasing HNF4α1 protein levels.

      c) Figures 3, 5, and 6 show the individual gene analyses with unclear statistical data. It seems that the p-values were not adjusted, and it is unclear how they reached significance in several graphs. Additionally, it was not stated how many animals per group and cells per animal/group were included in the analyses. 

      In Fig. 3, mouse scRNA-seq data were generated from pooled cancer samples from 5 animals per group. The Wilcoxon signed-rank test was performed for each gene and/or regulon activity. Since multiple testing was not performed, a p-value adjustment is neither needed nor applicable.

      In Fig. 5, human data were analyzed. Cells from the same sample are dependent, but differential gene expression (DEG) analysis typically calculates statistics under the assumption that they are independent. This assumption may explain the low p-values observed in our data. To address this issue, we applied pseudobulk DEG analysis to our human single-cell data. Even after correcting for statistical error, we confirmed that the genes of interest still exhibited significantly different expression patterns (Author response image 3).

      Author response image 3.

      Pseudobulk DEG analysis confirmed the differential expression of the genes of interest.
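
      The idea behind the pseudobulk approach described above can be sketched as follows. This is a minimal illustration on hypothetical toy counts, not the pipeline used for Author response image 3: cells are first aggregated per sample and cell group so that each sample contributes one value, and only then are the groups compared. Dedicated count-based DEG tools are typically used for the comparison step; a Welch t statistic stands in for them here as an assumption.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean, variance

# Hypothetical toy data: (sample_id, cell_group, count of one gene in one cell).
CELLS = [
    ("s1", "LYZ+", 5), ("s1", "LYZ+", 7), ("s1", "LYZ-", 1), ("s1", "LYZ-", 3),
    ("s2", "LYZ+", 6), ("s2", "LYZ+", 8), ("s2", "LYZ-", 0), ("s2", "LYZ-", 2),
    ("s3", "LYZ+", 9), ("s3", "LYZ+", 7), ("s3", "LYZ-", 2), ("s3", "LYZ-", 4),
]


def pseudobulk(cells):
    """Collapse cells to one mean count per (group, sample) pair."""
    totals = defaultdict(lambda: [0, 0])  # (group, sample) -> [count sum, cell number]
    for sample, group, count in cells:
        totals[(group, sample)][0] += count
        totals[(group, sample)][1] += 1
    profiles = defaultdict(list)
    for (group, _sample), (count_sum, n_cells) in sorted(totals.items()):
        profiles[group].append(count_sum / n_cells)
    return dict(profiles)


def welch_t(x, y):
    """Welch t statistic computed on per-sample values, not on individual cells."""
    return (mean(x) - mean(y)) / sqrt(variance(x) / len(x) + variance(y) / len(y))


profiles = pseudobulk(CELLS)
t_stat = welch_t(profiles["LYZ+"], profiles["LYZ-"])
print(profiles, round(t_stat, 3))
```

      Because the test sees three values per group instead of six cells per group, the sample, rather than the cell, is treated as the independent unit, which is the dependence issue the pseudobulk step is meant to address.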

      In Fig.6H-6I, the number of animals per group is provided in the figure legend.

      d) Figure 6 suggests a signaling cascade in which the absence of DKK2 leads to enhanced HNF4A expression, which in turn results in reduced Sox9 expression and hence reduced expression of Paneth cell properties. It is therefore crucial that the authors perform in vivo (splenic organoid injection) loss-of-function experiments, knockdown of Sox9 expression in AKP organoids, and Sox9 overexpression experiments in AKP/Dkk2 KO organoids to demonstrate Sox9 as the central downstream transcription factor regulating liver CRC metastasis. 

      Sox9 is a well-established marker gene for Paneth cell formation in the gut. Therefore, overexpression or knockout of the Sox9 gene would result in either an increase or decrease in Paneth cells in the organoids. We believe that the suggested experiments fall outside the scope of this manuscript. Instead, we demonstrated the change in the Paneth cell differentiation marker, Sox9, in the presence or absence of DKK2.

      e) Given the previous description of the role of DKK2 in primary CRC, it is important to define the step of liver metastasis affected by Dkk2 deficiency in the metastasis model. Does it affect extravasation, liver survival, etc.? 

      We appreciate the reviewer’s insights and perspectives. Regarding liver survival, it is well known that stem cell niche formation is a critical step for the outgrowth of metastasized cancer cells (Fumagalli et al. 2019, Cell Stem Cell). LYZ+ Paneth cells are recognized as stem cell niche cells in the intestine, and human scRNA-seq data have shown that LYZ+ cancer cells express stem cell niche factors such as Wnt and Notch ligands. To determine whether LYZ+ cancer cells act as stem cell niche cells, we performed confocal microscopy to assess whether LYZ+ cancer cells express WNT3A and DLL4 in AKP organoids (Author response image 4). The results show that LYZ labeling co-localizes with DLL4 and WNT3A expression, while the organoid reporter tdTomato is evenly distributed. Additionally, our in vitro and in vivo data indicate that DKK2 deficiency leads to a reduction of LYZ+ cancer cells, which may contribute to stem cell niche formation. Based on these findings, we propose that DKK2 is an essential factor for stem cell niche formation, which is required for cancer cell survival in the liver during the early stages of metastasis. Although our revised data confirmed the trend that DKK2 deficiency decreases liver metastasis, we have not yet determined whether DKK2 is involved in extravasation. This research topic should be addressed in future studies.

      Author response image 4.

      Confocal microscopy analysis for lysozyme (LYZ) and Paneth cell-derived stem cell niche factors, WNT3A and DLL4 in AKP colon cancer organoids.

      The method is described in the supplemental information. The list of antibodies used: DLL4 (delta-like 4) Polyclonal Antibody (Invitrogen, PA5-85931), WNT3A Polyclonal Antibody (Invitrogen, PA5-102317), Goat anti-Rabbit IgG (H+L) Cross-Adsorbed Secondary Antibody, Alexa Fluor™ 488 (Invitrogen, A-11008), Anti-Lysozyme C antibody (H-10, Santa Cruz, sc-518083), Goat anti-Mouse IgM (Heavy chain) Secondary Antibody, Alexa Fluor™ 647 (Invitrogen, A-21238).

      (3) Human data 

      Can the authors address whether the expression of Dkk2 changes in human CRC and whether mutations in Dkk2 are correlated with metastatic disease or CRC stage?

      The human data were useful in identifying the presence of LYZ+ cancer cells with Paneth cell properties. However, due to the limited number of late-stage patient samples with high DKK2 expression, the results were not statistically significant. Nevertheless, the trend suggests a positive correlation between DKK2 expression and the malignant stage of CRC.

      (4) Bioinformatic analysis 

      The authors did not provide sufficient information on bioinformatic analyses. The authors did not include information about the software, cutoffs, or scripts used to make their analyses or output those figures in the manuscript, which challenges the interpretation and assessment of the results. Terms like "Quantitative gene expression analyses" (line 136) and "visualized in a Uniform Approximation and Projection" (line 178) do not explain what was inputted and the analyses that were executed. There are multiple ways to align, preprocess, and visualize bulk, single cell, ATAC, and ChIP-seq data, and depending on which was used, the results vary greatly. For example, in the single-cell data, the authors did not report how many cells were sequenced, nor how many cells remained after alignment and quality filtering (RNA count, mt count, etc.), so the result on Paneth+ to Goblet+ percent in lines 184 and 185 cannot be assessed because it depends on this information. The absence of a clustering cutoff for the single-cell data is concerning since this greatly affects the resulting cluster number (https://www.nature.com/articles/s41592-023-01933-9). The authors should provide a comprehensive explanation of all the data analyses and the steps used to obtain those results.

      We apologize for the insufficient information. Below, we provide detailed information on the data analyses, which are also available in the GEO database (Bulk RNA-seq: GSE157531, ATAC-seq: GSE157529, ChIP-seq: GSE277510). Methods are updated in the current version of supplemental information.

      (5) Clarity of methods and experimental approaches 

      The methods were incomplete and they require clarification. 

      We’ve updated our methods as requested by the reviewer.

      Reviewer #2 (Public Review): 

      Summary: 

      The authors propose that DKK2 is necessary for the metastasis of colon cancer organoids. They then claim that DKK2 mediates this effect by permitting the generation of lysozyme-positive Paneth-like cells within the tumor microenvironmental niche. They argue that these lysozyme-positive cells have Paneth-like properties in both mouse and human contexts. They then implicate HNF4A as the causal factor responsive to DKK2 to generate lysozyme-positive cells through Sox9. 

      Strengths: 

      The use of a genetically defined organoid line is state-of-the-art. The data in Figure 1 and the dependence of DKK2 for splenic injection and liver engraftment, as well as the long-term effect on animal survival, are interesting and convincing. The rescue using DKK2 administration for some of their phenotype in vitro is good. The inclusion and analysis of human data sets help explore the role of DKK2 in human cancer and help ground the overall work in a clinical context. 

      Weaknesses: 

      In this work by Shin et al., the authors expand upon prior work regarding the role of Dickkopf-2 in colorectal cancer (CRC) progression and the necessity of a Paneth-like population in driving CRC metastasis. The general topic of metastatic requirements for colon cancer is of general interest. However, much of the work focuses on characterizing cell populations in a mouse model of hepatic outgrowth via splenic transplantation. In particular, the concept of Paneth-like cells is primarily based on transcriptional programs seen in single-cell RNA sequencing data and needs more validation. Although including human samples is important for potential generality, the strength could be improved by doing immunohistochemistry in primary and metastatic lesions for Lyz+ cancer cells. Experiments that further bolster the causal role of Paneth-like CRC cells in metastasis are needed. 

      Recommendations for the Authors:

      Reviewing Editor (Recommendations for the Authors): 

      Here we note several key concerns with regard to the main conclusions of the paper. Additional experiments to directly address these concerns would be required to substantially update the reviewer evaluation. 

      (1) Demonstration of a causal role of Paneth-like cells in CRC metastasis, for example by sorting the Paneth-like cells - either by the markers they identified in the subsequent single cell or by scatter - to establish whether the frequency of the Paneth-like cells in a culture of organoids is directly correlated with tumorigenicity and engraftment. 

      We sincerely appreciate the reviewing editor’s comment. First, as previously reported (Shin et al., iScience 2021), there is no difference in proliferation between WT and KO during in vitro organoid culture or in vivo colitis-induced tumors. However, DKK2 deficiency led to morphological changes, which we analyzed using bulk RNA-seq. As described in the manuscript, Paneth cell marker genes, such as Lysozymes and defensins, were significantly reduced in DKK2 KO AKP organoids.

      Due to the nature of these markers, it is technically challenging to isolate live LYZ+ cancer cells. To address this issue in the future, we plan to develop organoids that express a reporter gene specific for Paneth cells. In this manuscript, we demonstrated a correlation between DKK2 and the formation of LYZ+ cancer cells. In both the splenic injection model (Fig. 1) and the orthotopic transplantation model (Fig. R1-R2), we observed that transplantation of cancer organoids with reduced numbers of LYZ+ cells (KO organoids) led to decreased metastatic tumor formation. The number of LYZ+ cells in KO-transplanted mice remained low in liver metastasized tumor nodules (Fig. 6H-6I). Immunohistochemistry further confirmed that LYZ+ cancer cells were barely detectable in KO samples (Author response image 5). These data suggest that DKK2 is essential for the formation of LYZ+ cancer cells, which are necessary for outgrowth following metastasis.

      Author response image 5.

      Histology of Lysozyme-positive cells in metastasized tumor nodules in the liver of colon cancer organoid-transplanted mice. Immunohistochemistry of Lysozyme-positive Paneth-like cells in liver metastasized colon cancer (upper panels, DAB staining). Identification of tumor nodules by H&E staining (lower panels, Scale bar = 100 μm). Magnified tumor nodules are shown in the 2nd and 3rd columns (Scale bar = 25 μm). Arrows indicate Lysozyme-positive Paneth-like cells in tumor epithelial cells. Infiltration of Lysozyme-positive myeloid cells is detected in both AKP and KO tumor nodules. AKP: Control colon cancer organoids carrying mutations in Apc, Kras and Tp53 genes. KO: Dkk2 knockout colon cancer organoids.

      (2) Further characterization of Lyz+/Paneth-like cells to further the authors' argument for the unique function that they have in their tumor model. Specifically, do the cells with Paneth-like cells secrete Wnt3, EGF, Notch ligand, and Dll4 as normal Paneth cells do?

      We appreciate the reviewing editor’s comment. In response, we performed confocal microscopy analysis to examine the protein levels of LYZ, Wnt3A, and DLL4 in AKP colon cancer organoids (Author response image 4). The data presented above show that LYZ+ cancer cells express both Wnt3A and DLL4, suggesting that LYZ+ colon cancer cells may function similarly to Paneth cells, which are stem cell niche cells. Furthermore, using the Panglao database, we demonstrated that LYZ+/Paneth-like cells exhibit typical Paneth cell properties in human scRNA-seq data (Fig. 4 and Fig. 5). These findings suggest that LYZ+ colon cancer cells possess Paneth cell properties.

      (3) Experiments to test metastasis, ideally from orthotopic colonic tumors, to ensure phenotypes aren't restricted to the splenic model of hepatic colonization and outgrowth used at present. 

      We are in agreement with the reviewing editor and reviewers, which is why we conducted the orthotopic transplantation experiment. However, we encountered challenges in establishing this model effectively. After multiple trials, we observed that many mice did not form primary tumors, and the variability, particularly in metastasis, was difficult to control. Only a few AKP-transplanted mice developed liver metastasis. The representative revision data have been provided above. Nevertheless, we believe that this model needs further improvement and optimization to reliably study metastasis originating from primary tumors.

      (4) To generalize claims to human cancer, the authors should test whether loss of DKK2 impacts LYZ+ cancer cells in human organoids and affects their engraftment in immunodeficient mice compared to control. Another more correlative way to validate the LYZ+ expression in human colon cancer would be to stain for LYZ in metastatic vs. primary colon cancer, expecting metastatic lesions to be enriched for LYZ+ cells. 

      We agree with your point, and this will be addressed in future studies.

      (5) Clarifying inconsistencies regarding effect of DKK2 loss on HNF4A (Figure 1E vs Figure 6I). 

      In Figure 1E, we measured the mRNA levels of HNF4A in metastasized foci by qPCR, while in Figure 6I, we measured the protein level of HNF4A by flow cytometry. Recent studies, including our previous report, have shown that HNF4A protein levels are regulated by proteasomal degradation mediated by pSrc (Mori-Akiyama et al. 2007, Gastroenterology; Bastide et al. 2007, Journal of Cell Biology; Shin et al. 2021, iScience). Consequently, while the mRNA levels remained unchanged in Fig. 1E, we observed a reduction of HNF4A protein levels in Figure 6I.

      (6) Addressing concerns about statistics and reporting as outlined by Reviewer 1. 

      Thank you very much for your assistance in improving our manuscript. The updates have been incorporated as detailed above.

      These are the central reviewer concerns that would require additional experimentation to update the editorial summary. Other concerns should be addressed in a revision response but do not require additional experimentation. 

      Reviewer #1 (Recommendations For The Authors): 

      Specific comments: 

      • Do Dkk2-KO organoids grow normally?

      Yes, in vitro.

      Since the authors reported on the effects of Dkk2 in the induction/maintenance of the Paneth cell niche, changes in AKP organoid numbers or growth rate between Dkk2-WT and KO would be an expected outcome. 

      Disruption of Paneth cell formation in normal organoids is expected to alter growth. However, DKK2 KO in colon cancer organoids with mutations in the Apc, Kras, and Tp53 genes exhibits growth rates and organoid sizes similar to those of WT AKP controls. In contrast to these in vitro observations, we observed a significant reduction in metastasized tumor growth in vivo. Further analyses of factors derived from LYZ+ cancer cells will help explain why the absence of DKK2 has different effects in vitro and in vivo.

      • Figure 1: 

      - Panel C: The legend indicates what c.p. stands for.

      c.p.m. stands for counts per minute in the in vivo imaging analysis. This has been updated in the Figure legend.

      - Panel E: Please comment on the possible underlying reasons for the lack of change in HNF4a1 levels. 

      This has been updated in response to the reviewing editor’s comment (5) above.

      - Panel E: Number of mice from which isolated cancer nodules were harvested. 

      Total mice per group were 5. This has been updated in the legend.

      • Figure 2: 

      - Suggestion: Panel A should be presented in Figure 1 since Dkk2 KO organoids are already used in Figure 1. 

      We added this to present the recovery of DKK2 by adding recombinant DKK2 proteins in Fig.2.

      - Panel B: Please explain why these genes are marked in blue. 

      It has been described in the legend. “Paneth cell marker genes are highlighted as blue circles (AKP=3 and KO=5 biological replicates were analyzed).”

      • Figure 3: 

      - Indicate the number of cells recovered from AKP vs. KO mice (since liver metastasis was already reduced in KO mice). This should be shown in a UMAP. 

      - Panel A: 4th line in the pathways, correct "Singel" typo. 

      We appreciate your correction. It has been fixed.

      - Panel A: There are multiple versions of PanglaoDB with different markers; a list of all that was used to determine cell type should be provided. 

      - Panel C: Bar value for the WNT pathway is not displayed, and there is no legend to indicate the direction of the analysis (that is, AKPvsKO or KOvsAKP). 

      It is KOvsAKP, described in the figure legend.

      - Panel C: Ingenuity pathway analysis is not a good tool to look at this type of result because it does not include the gene fold changes in the analysis, so it only provides a Z-score of the presence of that pathway and not the degree it is increased or fold changes - recommend substituting any type of GSEA analysis, such as fgsea.

      - Panel D: the term "Patient" to refer to mice is confusing. Use "Mice" or "Treatment" or "Condition" instead. 

      Corrected

      - Panel D: Information about the number of mice per group, cells per animal (or liver let) used, and additional clarification about the statistical analysis used is required, as differences shown in this panel appear subtle given the standard variation in each group. Box plots need to show individual/raw values. 

      • Figure 4: 

      - Panel E: It would be helpful to show the cutoff lines for the Paneth cell score and Lyz expression in the graphs. 

      It has been updated in response to the reviewer’s request.

      • Figure 5: 

      - Panel B: again, information about the number of "patients" or cells used and clarification about the statistical analysis used is required as the display of data generates concerns about the distribution within groups. Box plots need to show individual/raw values

      It has been updated in response to the reviewer’s request.

      • Figure 6: 

      - Panel A: Add a legend to inform the direction of the process (e.g., red, activation, blue, repression). We noticed the Yap1 bar data had no color. Is there a reason for that? Please explain this point in the revised manuscript. 

      Red color added for the Yap1.

      - Panel A: Ingenuity pathway analysis is not a good tool to look at this type of results because it does not include the gene Foldchanges in the analysis, so it only provides a Z-score of the presence of that pathway and not the degree it is increased or not. I recommend substituting any type of GSEA analysis, such as fgsea. 

      - Panels A&B: Again, only p-value scores were provided, while fold changes are necessary to define the ratio of presence increase of normal vs. AKP. 

      - Panel D: No raw or pre-processed ChIP-seq data was provided. Additionally, please indicate exactly the genome location (it seems the image was edited from a raw made on UCSC genome browser-it should be remade by adding coordinates and other important information (genes around, epigenetic, etc.). 

      - Panel H/I: Flow cytometry gating is inappropriate, as it is catching cells that are negative for LYZ in both AKP and KO cells, resulting in an overestimation of the number of Lyz cells. Gating should specifically select very few LYZ-positive cells in the top/left quadrant. 

      The updates have been made, and the statistical data have been re-analyzed.

      - Panel J: Information about the number of animals/organoids or cells used and clarification about the statistical analysis used is required, as the display of data generates concerns about the distribution within groups. Box plots need to show individual/raw values. 

      • Overall: 

      - A supplementary table with all the sequenced libraries and their depth, read length/cell count should be provided.

      All of the information is now available in the GEO database. We used previously published human epithelial datasets for human single cell analysis (Joanito*, Wirapati*, Zhao*, Nawaz* et al, Nat Genetics, 2022, PMID: 35773407).

      - The Hallmark Geneset used is very broad, and the authors should confirm the results on GO bp. 

      Using Gene Ontology biological processes (GO bp), we observed that glycolysis-related genes were enriched in our newly described cell population, although the adjusted p-value did not exceed 0.05.

      Author response image 6.

      GSEA with GOBP pathways highlighted glycoprotein processes and protein localization to the extracellular region, both of which are related to Paneth cell functions. Paneth cells secrete α-defensins, angiogenin-4, lysozyme and secretory phospholipase A2. The enriched glycoprotein process and protein localization to the extracellular region reflect the characteristics of Paneth cells. 

       

      - qPCR is not a good way to confirm sequencing results; while PCR data is pre-normalized, sequencing is normalized only after quantification, so results on 6 E and F should be shown on the sequencing data. 

      The expression level of Sox9 is relatively low. In our bulk RNA-seq data, the averages for Sox9 in AKP versus DKK2 KO are 28.2 and 25.1, respectively. While there is a similar trend, the difference is not statistically significant in this dataset, and we did not include an experimental group for reconstitution. Therefore, we conducted qPCR experiments for the reconstitution study by adding recombinant DKK2 (rmDKK2) protein to the culture. Furthermore, it is well established that Sox9 is an essential transcription factor for the formation of LYZ+ Paneth cells. Based on this, we assessed the levels of LYZ and Sox9 using qPCR and confocal microscopy in the presence or absence of DKK2.

      • Edits in the text: 

      - There are several typographical errors. Specific suggestions are provided below. 

      - Line 43: "Chromatin immunoprecipitation followed by sequencing analysis," state analysis of what cells before continuing with "revealed..." 

      - Line 77: Recent findings have identified 

      - Line 138: were reduced in KO tumor samples → rephrase to clarify "KO-derived liver tumors" 

      - Line 167: Recombinant mouse DKK2 protein treatment in KO organoids partially rescued this effect. Add "partially" since adding rmDkk2 didn't fully restore Lyz1 and Lyz2 levels. 

      - Line 185-187: the authors should not reference Figure 6 because it has not been introduced yet. 

      - Line 198-199: The authors claimed a correlation between Dkk2 expression and Lgr5 expression; however, the graph presented in Figure 3B does not indicate this. The R-value was 0.11, which does not indicate a correlative expression between these genes. 

      - Line 232-233: the authors need to show any connection to Dkk2 gene expression in human samples in order to draw that conclusion. 

      - Line 294: expression, leading to the formation 

      - Line 347: Wnt ligand (correct Wng typo) 

      We have modified our manuscript in accordance with the reviewer’s suggestions.

      Reviewer #2 (Recommendations For The Authors): 

      Specific criticisms/suggestions: 

      Author claim 1: Dkk2 is necessary for liver metastasis of colon cancer organoids.

      This model is one of hepatic colonization and eventual outgrowth and not metastasis. Metastasis is optimally assessed using autochthonous models of cancer generation, with the concomitant intravasation, extravasation, and growth of cancer cells at the distant site. The authors should inject their various organoids in an orthotopic colonic transplantation assay, which permits the growth of tumors in the colon, and they can then identify metastasis in the liver that results from that primary cancer lesion (i.e., to better model physiologic metastasis from the colon to liver). 

      The orthotopic colonic transplantation data have been provided above (Author response images 1 and 2).

      Author claim 2: DKK2 is required for the formation of lysozyme-positive cells in colon cancer. 

      It would greatly strengthen the authors' claim if supraphysiologic or very high amounts of DKK2 enhance CRC organoid line engraftment ( i.e., the specific experiment being pre-treatment with high levels of DKK2 and immediate transplantation to see a number of outgrowing clones). If DKK2 is causal for the engraftment of the tumors, increased DKK2 should enhance their capacity for engraftment. 

      Paneth cells have physical properties permitting sorting and are readily identifiable on flow cytometry. The authors should demonstrate increased tumorigenicity and engraftment by sorting the Paneth-like cells-either by the markers they identified in the subsequent single cell or by scatter to establish whether the frequency of the Paneth-like cells in a culture of organoids is directly correlated with engraftment potential. 

      Further characterization of the Paneth-like cells would help further the authors' argument for the unique function that they have in their tumor model. Specifically, do the Paneth-like cells secrete Wnt3, EGF, Notch ligand, and Dll4 as normal Paneth cells do? Immunofluorescence, sorting, or western blots would all be reasonable methods to assess protein levels in the sorted population. 

      This has been performed and provided above (Author response images 1 and 3).

      Author claim 3: Lysozyme (LYZ)+ cancer cells exhibit Paneth cell properties in both mouse and human systems. 

      For the claim to be general to human cancer, the author should demonstrate that loss of DKK2 impacts LYZ+ cancer cells in human organoids and affects their engraftment in immunodeficient mice compared to control. Another more correlative way to validate the LYZ+ expression in human colon cancer would be to stain for LYZ in metastatic vs. primary colon cancer, expecting metastatic lesions to be enriched for LYZ+ cells. 

      The claims on the metabolic function of Paneth-like cells need more clarification. Do the cancer cells with Paneth features have a distinct metabolic profile compared to the other cell populations? The authors should address this through metabolic characterization of isolated LYZ+ cells with Seahorse or comparison of Dkk2 KO to WT organoids (i.e., +/-LYZ+ cancer cell population). 

      To address this question, we need to develop organoids with a Paneth cell reporter gene. We appreciate the reviewer’s comment, and this should be pursued in future studies.

      Author claim 4: HNF4A mediates the formation of Lysozyme (Lyz)-positive colon cancer cells by DKK2. 

      The authors implicate HNF4A and Sox9 as causal effectors of the Paneth-like cell phenotype and subsequent metastatic potential. There appears to be some discordance regarding the effect of DKK2 loss on HNF4A. In Figure 1E, the authors show that gene expression in metastatic colon cancer cells for HNF4A in DKK2 knockout vs AKP control is insignificant. However, in Figure 6I, there is a highly significant difference in the number of HNF4A positive cells, more than a 3-fold percentage difference, with a p-value of <0.0001. If there is the emergence of a rare but highly expressing HNF4A cell type that on aggregate bulk expression leads to no difference, but sorts differentially, why is it not identified in the single-cell data set? These data together are highly inconsistent with regards to the effect of DKK2 on HNF4A and require clarification. 

      Previous studies have demonstrated that HNF4A is regulated by proteasomal degradation mediated by pSrc. As a result, the mRNA level of HNF4A remains unchanged, while the protein level is significantly reduced in colon cancer cells. DKK2 KO leads to decreased Src phosphorylation, resulting in the recovery of HNF4A protein levels. This explains why HNF4A cannot be detected in scRNA-seq datasets, which measure mRNA. We have shown this in our previous report. In this manuscript, based on ChIP-seq data using an anti-HNF4A monoclonal antibody, as well as confocal microscopy and qPCR data for the Sox9 gene, we propose that HNF4A acts as a regulator of cancer cells exhibiting Paneth cell properties.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This manuscript provides novel and important findings regarding the impact of noradrenergic signaling from the locus coeruleus on hippocampal gene expression. The locus coeruleus is the sole source of noradrenaline to the hippocampus and many rapid molecular changes induced by stress are regulated by noradrenaline. This manuscript provides a rigorous investigation into hippocampal genes uniquely regulated by noradrenaline in the presence or absence of stress. Data were collected and analyses were performed using solid methodology, and the results mostly convincingly support the conclusion made with few weaknesses. The study would benefit from a more comprehensive analyses of sex differences.

      Response: We thank the reviewers and the editors for the positive evaluation of our work and for the constructive feedback. To address some of the key criticisms, we have performed several new experiments and analyses. Importantly, we now provide a much more rigorous comparison of males and females, which strongly suggests that there are no major sex differences in the transcriptomic response to stress and noradrenaline in the hippocampus. We think that these - and other additions discussed below - significantly strengthen the manuscript. We provide detailed responses to all the reviewers' comments. We have added numbers to the reviewers' comments for easier referencing.

      Reviewer #1 (Public Review):

      Comment 1: Privitera et al., provide a comprehensive and rigorous assessment of how noradrenaline (NA) inputs from the locus coeruleus (LC) to the hippocampus regulate stress-induced acute changes in gene expression. They utilize RNA-sequencing with selective activation/inhibition of LC-NA activity using pharmacological, chemogenetic and optogenetic manipulations to identify a great number of reproducible sets of genes impacted by LC activation. It is noteworthy that this study compares transcriptomic changes in the hippocampus induced by stress alone, as compared with selective circuit activation/inhibition. This reveals a small set of genes that were found to be highly reproducible. Further, the publicly available data will be highly useful to the scientific community.

      Response: We are very grateful for this positive evaluation.

      Comment 2: A major strength of the study is the inclusion of both males and females. However, with this aspect of the study also lies the biggest weakness. While the experiments tested males and females, they were not powered for identifying sex differences. There are vast amounts of literature documenting the inherent sex differences, both under resting and stress-evoked conditions, in the LC-NA system and this is a major missed opportunity to better understand if there is an impact of these sex-specific differences at the genetic level in a major LC projection region. There are many instances whereby sex effects are apparent, but do not pass multiple testing correction due to low n's. The authors highlight one of them (Ctla2b) in supplemental figure 6. This gene is only upregulated by stress in females. It is appreciated that the manuscript provides an incredible amount of novel data, making the investigation of sex differences ambitious. Data are publicly available for others to conduct follow up work, and therefore it may be useful if a list of those genes that were different based on targeted interrogation of the dataset be provided with a clear statement that multiple testing corrections failed. This will aid further investigations that are powered to evaluate sex effects.

      Response: The assessment of the reviewers and the editorial feedback encouraged us to look more thoroughly into potential sex differences, because we believe it would indeed be a major additional strength if our manuscript could make a firm statement on this important issue. To this end, we have expanded the manuscript in two major ways:

      (1) To expand the analysis of sex effects also to the dorsal hippocampus, and to increase robustness of the data, we have performed RNA-seq in 32 additional samples of male and female mice exposed to stress (or control) and propranolol (or saline) injection. Figure 1f-h and Supplementary Figure 1d-f have been updated to reflect this new addition, and the results are presented in a new section on Pages 3-4 (pasted below for ease of reviewing). In summary, the new data strongly support our initial observation that the effects of stress on gene expression, as well as the effects of propranolol on blocking stress-induced effects, are highly similar in both sexes.

      (2) To further increase the power for detection of sex-effects, we have performed a small meta-analysis. For this, we combined several RNAseq datasets from the current manuscript and published datasets from our previous work (Floriou-Servou et al., 2018; von Ziegler et al., 2022), which also investigated transcriptomic sex-differences in the hippocampus 45 min after cold swim stress exposure in the same setup as used for the current manuscript. This approach increased our sample size to 51 males and 20 females. In summary, this well-powered approach shows no evidence for sex differences in the transcriptional response to stress, even when more lenient analyses were applied. These results are described in a new section on page 4, and summarized in Supplementary Figures 1f+g. This section is pasted below for ease of reviewing.

      "While blocking β-adrenergic receptors was able to block stress-induced gene expression, we did not test whether propranolol might decrease gene expression already at baseline, independent of stress. Additionally, all tests had thus far been conducted in male mice, raising the question about potential sex differences in NA-mediated transcriptomic responses. To address these two issues, we repeated the experiment in both sexes and included a group that received a propranolol injection but was not exposed to stress (Fig. 1f). Combining the data from both experiments, we repeated the analysis for each region, to identify genes whose response to stress was inhibited by propranolol (Figure 1g). As in the previous experiment, we found that many of the stress-induced gene expression changes were blocked by propranolol injection in both dHC (Figure 1g, left panel) and vHC (Figure 1g, right panel). Importantly, propranolol did not change the expression level of these genes in the absence of stress. We then directly compared the genes sensitive to stress and propranolol treatment in both dHC and vHC. To this end, we plotted the union of genes showing a significant stress:propranolol interaction in either region in one heatmap across both dHC and vHC (Supplementary Figure 1d). This showed again that the stress-induced changes were very similar in dHC and vHC, and that propranolol similarly blocked many of them. Finally, we asked whether the response differs between males and females. Despite clear sex differences in gene expression at baseline (data not shown), we found no significant sex differences in response to stress or propranolol between male and female mice (FDR<0.05; Fig. 1g). To more directly visualize this, we compared females and males by plotting the log2-fold changes of the stress:propranolol interaction across all stress-induced genes that were blocked by propranolol. We find very similar regulation patterns in both sexes (Figure 1h). 
Although none of these sex differences are significant, some genes seem to show quantitative differences, so we plotted the expression patterns of the 5 genes showing the largest difference in interaction term as box-plots, which suggest that these spurious differences are likely due to noisy coefficient estimates (Supplementary Fig. 1e). To address concerns that our analysis of sex differences might not have been sufficiently powered, we performed a meta-analysis of the experiments shown here along with previously published datasets from our lab (Floriou-Servou et al. 2018; von Ziegler et al. 2022). In all these experiments, the vHC of male and female mice was profiled 45 min after exposure to an acute swim stress challenge. This resulted in a sample size of 51 males and 20 females. Despite this high number of independent samples, we could not identify any statistically significant interaction between sex and the stress response. To identify candidates that might not reach significance while discounting differences due to noise in fold-change estimates, we reproduced the same analysis using DESeq2 with Approximate Posterior Estimation for generalized linear model (apeglm) logFC shrinkage (A. Zhu, Ibrahim, and Love 2018). This analysis also did not reveal any sex differences in the stress response (Supplementary Fig. 1f). We then tailored the meta-analysis specifically to the set of stress-responsive genes that were blocked by propranolol, and also for these genes the response to stress was strikingly similar in both sexes (Supplementary Fig. 1g). Altogether, we conclude that there are no major sex differences in the rapid transcriptomic stress response in the hippocampus, and that blocking beta-receptors prevents a large set of stress-induced genes in both females and males."
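
      For readers less familiar with interaction terms, the quantity plotted as the "log2-fold change of the stress:propranolol interaction" can be illustrated as a difference of differences on the log2 scale. The snippet below is only a minimal sketch under simplifying assumptions: it works on group means in a 2x2 design, the function names and the example counts are hypothetical, and it does not reproduce the statistical models used in the manuscript.

```python
import math


def log2_fold_change(treated, control):
    """log2 ratio of two mean expression values (both must be > 0)."""
    return math.log2(treated / control)


def interaction_log2fc(ctrl_saline, stress_saline, ctrl_drug, stress_drug):
    """Difference of differences on the log2 scale.

    A value near 0 means the drug does not change the stress response.
    For a stress-induced gene, a negative value means the drug blunts
    the induction.
    """
    stress_effect_saline = log2_fold_change(stress_saline, ctrl_saline)
    stress_effect_drug = log2_fold_change(stress_drug, ctrl_drug)
    return stress_effect_drug - stress_effect_saline


# Hypothetical mean normalized counts for one gene (not data from the study):
# stress induces the gene 4-fold under saline, and not at all under the drug.
example = interaction_log2fc(
    ctrl_saline=100.0, stress_saline=400.0,
    ctrl_drug=100.0, stress_drug=100.0,
)
print(example)
```

      In this toy case the drug fully blocks a 4-fold induction, so the interaction term equals minus the stress effect under saline. Comparing such interaction terms between females and males is, conceptually, what a sex-by-stress-response comparison asks.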

      To put these findings in context with existing literature, we agree with the reviewer that there are many studies that have reported sex differences in the LC-circuitry as summarized by Bangasser and colleagues (Bangasser et al., 2016, 2019). However, these studies primarily focus on the LC itself, suggesting that female rats have more LC neurons, denser LC-dendrites in the peri-LC region, and that LC neurons are more readily activated by stress in females because of heightened sensitivity to CRF-signaling. A recent study in mice reports, in contrast, that females have fewer TH-positive neurons in the LC, but they also find enhanced excitability of LC neurons in females (Mariscal et al., 2023). Similarly, one study has suggested molecular differences in the makeup of the LC (Mulvey et al., 2018). Our experiments, however, focus on the impact of NA release in a projection region (hippocampus). Further, we use a strong stress induction protocol (swim stress) and various potent modes of direct LC activation, so differences in "LC-excitability" are likely less relevant in this context. We added evidence showing that we trigger powerful NA release in both sexes (Supplementary Figure 2c-h; see response to Reviewer #2, Comment #3 for more details). In addition, we show that the intensity or pattern of LC stimulation does not appear to alter the molecular response (Figure 3a-b), and that various stressors (mild or intense) all trigger the same NA-dependent molecular changes (Figure 4a-b). Therefore, our results suggest that once NA is released (in the hippocampus), the molecular downstream effects on gene expression are very similar - independent of stimulation intensity, sex, or hippocampal subregion (dorsal/ventral). 
This does not mean that there are no sex differences for activation of LC, but rather that the transcriptional response to NA release in the hippocampus is robust across sexes, and that propranolol seems to block NA-dependent effects similarly in both sexes. This does not rule out quantitative differences between sexes that only emerge with targeted analyses of individual genes, or once fluctuations in ovarian hormones are taken into account. We have updated the section in the discussion to summarize these considerations in light of the new results (see pages 20-21, section: "A uniform molecular response to stress and noradrenaline release in both sexes").

      Comment 3: A major finding of the present study is the involvement of noradrenergic transcriptomic changes occurring in astrocytic genes in the hippocampus. Given the stated importance of this finding within the discussion, it seems that some additional dialogue integrating this with current literature about the role of astrocytes in the hippocampus during stress or fear memory would be important.

      Response: We thank the reviewer for giving us an opportunity to add a more detailed discussion about the role of astrocytes and thyroid hormones in the hippocampus during learning and memory formation. We have added these statements to the discussion:

      “Within the hippocampus, astrocytic pathways are emerging as important players for learning and memory processes (Gibbs, Hutchinson, and Hertz 2008; Bohmbach et al. 2022). In fact, it is well-known that NA enhances memory consolidation (Schwabe et al. 2022; McGaugh and Roozendaal 2002), and recent work suggests that these effects are mediated by astrocytic β-adrenergic receptors (Gao et al. 2016; Iqbal et al. 2023). Our transcriptomic screens revealed Dio2 as the most prominent target influenced by LC activity. Dio2 is selectively expressed in astrocytes and encodes for the intracellular type II iodothyronine deiodinase, which converts thyroxine (T4) to the bioactive thyroid hormone 3,3',5-triiodothyronine (T3) and therefore regulates the local availability of T3 in the brain (Bianco et al. 2019). Enzymatic activity of DIO2 has further been shown to be increased by prolonged noradrenergic transmission through desipramine treatment in LC projection areas (Campos-Barros et al. 1994). This suggests that the LC-NA system and its widespread projections could act as a major regulator of brain-derived T3. Notably, T3-signaling plays a role in hippocampal memory formation (Rivas and Naranjo 2007; Sui et al. 2006), raising the possibility that NA-induced Dio2 activity in astrocytes might mediate some of these effects.”

      Comment 4: The comparison of the candidate genes activated by the LC in the present study (swim) with datasets published by Floriou-Servou et al., 2018 (Novelty, swim, restraint, and footshock) is an interesting and important comparison. Were there other stressors identified in this paper or other publications that do not regulate these candidate genes? Further, can references be added to clarify to the reader, that prior studies have identified that novelty, restraint and footshock all activate LC-NA neurons.

      Response: Thank you for the positive feedback. We have only tested the stressors reported in Figure 4a-b (novelty, swim, restraint, and footshock). It is known that all these stressors trigger noradrenaline release; in fact, we are not aware of stressors that do not trigger NA release. This reproducible finding supports the notion that the identified set of genes is indeed highly NA-responsive. As suggested, we have now included references that show increased NA release in response to all these stressors:

      “Therefore, we assessed their expression in a dataset comparing the effect of various stressors on the hippocampal transcriptome (Floriou-Servou et al., 2018). The stressors included restraint, novelty and footshock stress, which have all previously been shown to increase hippocampal NA release (Hajós-Korcsok et al., 2003; Lima et al., 2019; Masatoshi Tanaka et al., 1982).”

      Comment 5: Comparisons are made between chemogenetic studies and yohimbine, stating that fewer genes were activated by chemogenetic activation of LC neurons. There is clear justification for why this may occur, but a caveat may need to be mentioned, that evidence of neuronal activation in the LC by each of these methods were conducted at 90 (yohimbine) versus 45 (hM3Dq) minutes, and therefore it cannot be ruled out that differences in LC-NA activity levels might also contribute.

      Response: The reviewer raises an important point about some inconsistencies between the time points chosen in our study, an aspect that was also pointed out by Reviewer #2. We have chosen the 45 and 90 min time points for two different reasons. On the one hand, cFos changes on the protein level are known to peak 90 min after neuronal activation, and we wanted to capture the strongest possible cFos signal in the LC. On the other hand, we wanted to measure gene expression changes triggered by NA release, which already occur 45 min after noradrenergic activation (Roszkowski et al., 2016). Thus, when the experimental design allowed separate experiments (e.g. systemic yohimbine injection), we chose to measure gene expression after 45 min, but to validate cFos activation in the LC separately after 90min. In response to DREADD activation, however, we wanted to confirm within the same animal that LC activation was successful, and thus we collected LC and hippocampus simultaneously (Figure 2c,d). While the cFos increase is already very pronounced at the 45min time point (Figure 2g), the quality of IHC is slightly lower because the tissue cannot be perfused in this experimental design. Therefore, we do not think that the time point for cFos sampling matters in this context. However, we agree with the reviewer that it remains unclear whether yohimbine and DREADDs activate the LC with similar potency. To directly compare NA release would require a set of photometry-based experiments to measure NA release using genetically-encoded NA-sensors. While we have added such experiments for LC activation with DREADDs and optogenetics to show rapid NA release indeed occurs in the hippocampus (see Reviewer #2, Comment 3; Supplementary Figure 2c-h), yohimbine interferes with the NA-sensors as explained in detail in response to Reviewer 2, Comment 3. 
Thus, it was too challenging for us to directly compare the release dynamics in response to DREADDs and yohimbine, which was also not the main focus of our work. To explicitly address this caveat, we have extended the corresponding section in the discussion:

      "Finally, our observation that systemic administration of the α2-adrenergic receptor antagonist yohimbine very closely recapitulates the transcriptional response to stress stands in contrast to the much more selective transcriptional changes observed after chemogenetic or optogenetic LC-NA activation. This difference could be due to various factors. First, it remains unclear how strongly the LC is activated by yohimbine versus hM3Dq-DREADDs. However, given the potent LC activation observed after DREADD activation, it seems unlikely that yohimbine would lead to a more pronounced LC activation, thus explaining the stronger transcriptional effects. Second, contrary to LC-specific DREADD-activation, systemic yohimbine injection will also antagonize postsynaptic α2-adrenergic receptors throughout the brain (and periphery). More research is needed to determine whether this could have a more widespread impact on the hippocampus (and other brain regions) than isolated LC-NA activation, further enhancing excitability by preventing α2-mediated inhibition of cAMP production. Finally, systemic yohimbine administration and noradrenergic activity have been shown to induce corticosterone release into the blood (Johnston, Baldwin, and File 1988; Leibowitz et al. 1988; Fink 2016). Thus, yohimbine injection could have broader transcriptional consequences, including corticosteroid-mediated effects on gene expression."

      Comment 6: Please add information about how virus or cannula placement was confirmed in these studies. Were missed placements also analyzed separately?

      Response: Pupillometry recordings were performed in all animals that underwent optogenetic or chemogenetic manipulations of the LC, before subjecting them to stress experiments. These assessments account for both correct optic fiber placement and virus expression (Privitera et al., 2020). If an animal did not show a clear pupil response, it was excluded from the rest of the study. To demonstrate correct cannula placement for infusion of isoproterenol into the dorsal hippocampus, we added a representative image of cannula placement in Supplementary Figure 1h.

      Comment 7: Time of day for tissue collection used in genetic analysis should be reported for all studies conducted or reanalyzed.

      Response: Thank you for pointing out this omission. Tissue collection for RNA-seq analysis was always performed between 11am and 5pm during the dark phase of the reversed light-dark cycle. We have added this information to the corresponding method section (“Tissue collection”).

      Reviewer #1 (Recommendations For The Authors):

      Comment 8: This is a well written, comprehensive and rigorous manuscript that will be of great interest to those in the scientific community.

      Response: Thank you for the positive evaluation of our work and for the constructive feedback.

      Reviewer #2 (Public Review):

      Comment 1: The present manuscript investigates the implication of locus coeruleus-noradrenaline system in the stress-induced transcriptional changes of dorsal and ventral hippocampus, combining pharmacological, chemogenetic, and optogenetic techniques. Authors have revealed that stress-induced release of noradrenaline from locus coeruleus plays a modulatory role in the expression of a large scale of genes in both ventral and dorsal hippocampus through activation of β-adrenoreceptors. Similar transcriptional responses were observed after optogenetic and chemogenetic stimulation of locus coeruleus. Among all the genes analysed, authors identified the most affected ones in response to locus coeruleus-noradrenaline stimulation as being Dio2, Ppp1r3c, Ppp1r3g, Sik1, and Nr4a1. By comparing their transcriptomic data with publicly available datasets, authors revealed that these genes were upregulated upon exposure to different stressors. Additionally, authors found that upregulation of Ppp1r3c, Ppp1r3g, and Dio2 genes following swim stress was sustained from 90 min up to 2-4 hours after stress and that it was predominantly restricted to hippocampal astrocytes, while Sik1 and Nr4a1 genes showed a broader cellular expression and a sharp rise and fall in expression, within 90 min of stress onset.

      Overall, the paper is well written and provides a useful inventory of dorsal and ventral hippocampal gene expression upregulated by activation of LC-NA system, which can be used as starting point for more functional studies related to the effects of stress-induced physiological and pathological changes.

      Response: We thank the reviewer for the careful assessment of our work.

      Comment 2: However, I believe that the study would have benefited of a more comprehensive analyses of sex differences. Experiments in females were conducted only in one experiment and analyses restricted to the ventral hippocampus.

      Response: In response to the comments by the reviewer, as well as Reviewer #1 and the editors, we have sequenced an additional 32 brain samples to expand the comparison of sex effects in females and males across dorsal and ventral hippocampus, and we included a new meta-analysis of 3 experimental datasets (51 male and 20 female samples) to thoroughly assess sex differences in the transcriptomic response to stress. We refer the reviewer to our detailed response provided above to Reviewer #1, comment #2, and the updated results section on pages 3-4.

      Comment 3: Although, the experiments were overall sound and the results broadly support the conclusion made, I think some methodological choices should be better explained and rationalized. For instance, the study focuses on identifying transcriptional changes in the hippocampus induced by stress-mediated activation of the LC-NA system, however NA release following stress exposure and pharmacological or optogenetic manipulation was mostly measured in the cortex.

      Response: Because the hippocampus was used for RNA-sequencing, we could not assess NA release in the hippocampus (as this would require fiber implants that would interfere with molecular measures, or different tissue processing for HPLC). Nonetheless, we wanted to assess the transcriptional changes in the hippocampus, while simultaneously measuring successful stimulation of the LC-NA system in the same animals. To achieve this, we pursued 3 routes: 1) we used pupillometry to confirm functional LC activation; 2) we measured cFOS in the LC to directly demonstrate LC activation; 3) we assessed NA release using uHPLC (which requires larger tissue samples) and we chose the cortex because both cortex and hippocampus receive NA predominantly from the LC (Samuels & Szabadi, 2008). Importantly, we had previously shown that chemogenetic LC activation leads to a similar NA turnover in both the cortex and hippocampus, as measured by uHPLC (Zerbi et al., 2019). The relevant figure from that paper is inserted below to quickly show the striking similarity between hippocampus and cortex.

      Author response image 1.

      Levels of noradrenaline (NE) turnover (MHPG/NE ratio) in the cortex (CTX) and hippocampus (HC), measured in whole tissue with uHPLC 90min after hM3Dq-DREADD activation of the LC (copied and cropped from Zerbi et al, 2019, Neuron).

      In response to the reviewer's comment, we performed additional experiments to directly demonstrate that LC activation with DREADDs as well as optogenetics causes an increase in hippocampal NA release. We recorded NA release in the hippocampus (using fiber photometry combined with genetically encoded NA sensors). For DREADD activation, we observed a strong increase in hippocampal noradrenaline that started a few minutes after clozapine administration, and this increase was sustained throughout the duration of the 21 minute recording (see Supplementary Figure 2c-e). For optogenetic LC activation, we found a rapid and sharp increase in NA levels in the hippocampus (Supplementary Figure 2f-h). These experiments were performed in females and males and triggered similar responses. An adapted and cropped version of Supplementary Figure 2 is pasted below for ease of reading.

      Please note that we could not perform a similar experiment using yohimbine, because the GRABNE sensors are based on the alpha-2 adrenergic receptor, thus yohimbine administration interferes with the photometry recording. However, we believe that it is clear from this response that strong activation of the LC leads to uniform release of NA in the hippocampus and cortex.

      Author response image 2.

      c, Schematic of fiber photometry recording of hippocampal NA during chemogenetic activation of the LC. After 5 min baseline recording in the homecage, animals were injected with clozapine (0.03 mg/kg, i.p.) and placed in the OFT for 21 min. d, Average ΔF/F traces of GRABNE2m photometry recordings in response to chemogenetic activation of the LC (mean±SEM for hM3DGq+ and hM3DGq- split into females and males, n=3/group/sex). e, Peak ΔF/F response of fiber photometry trace. f, Schematic of fiber photometry recording of hippocampal NA during optogenetic activation of the LC. Animals were lightly anesthetized (1.5% isoflurane) and recorded in a stereotaxic frame. After 1 min baseline recording, animals were stimulated three times with 5 Hz for 10 s (10 ms pulse width, ~8 mW laser power) and recorded for 2 min post-stimulation. g, Average ΔF/F traces of the NA sensors GRABNE1m and nLightG in response to optogenetic activation of the LC (mean±SEM for females and males, n(females)=10, n(males)=5). h, Peak ΔF/F response of fiber photometry trace.
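
      For readers unfamiliar with the ΔF/F and peak-response measures named in the legend above, the following is a minimal sketch of how such values are commonly computed. It is not the authors' analysis pipeline: the baseline definition (mean of the pre-stimulation samples), the function names, and the example numbers are all assumptions made for illustration.

```python
from statistics import mean


def delta_f_over_f(trace, baseline_samples):
    """Return the dF/F trace relative to the mean of the first `baseline_samples` points.

    Assumption: F0 is the mean fluorescence over the baseline window.
    """
    f0 = mean(trace[:baseline_samples])
    return [(f - f0) / f0 for f in trace]


def peak_response(dff, baseline_samples):
    """Largest dF/F value observed after the baseline window."""
    return max(dff[baseline_samples:])


# Hypothetical raw fluorescence values (arbitrary units), not data from the study.
raw = [100.0, 100.0, 100.0, 100.0, 104.0, 110.0, 108.0, 105.0]
dff = delta_f_over_f(raw, baseline_samples=4)
peak = peak_response(dff, baseline_samples=4)
```

      In this toy trace the baseline is 100, so a raw value of 110 corresponds to a ΔF/F of 0.10 (a 10% increase over baseline).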

      Comment 4: Furthermore, behavioral changes following systemic pharmacologic or chemogenetic manipulation were observed in the open field task immediately after peripheral injections of yohimbine or CNO, respectively. Is this timing sufficient for both drugs to cross the blood brain barrier and to exert behavioral effects?

      Response: We have previously shown that chemogenetic activation of the LC through clozapine elicits pupil responses within 1-2 minutes after injection (Privitera et al., 2020; Zerbi et al., 2019). This indicates that clozapine rapidly crosses the blood brain barrier and affects LC activity within a few minutes after injection. Our additional experiments using genetically encoded sensors in the hippocampus show this even more directly (Supplementary Figure 2d), see also the response to Comment 3 above.

      Similarly, yohimbine also rapidly crosses the blood brain barrier within the same time frame (Hubbard et al., 1988). These observations are consistent with the rapid behavioral effects that can be detected within a few minutes after injection of clozapine for LC-DREADD activation (Zerbi et al., 2019), and for yohimbine as well (von Ziegler et al., 2023). In response to another comment of this reviewer, we have also re-analyzed the behavior presented in the current manuscript in time-bins of 3 minutes, which also shows the rapid onset of effects in response to yohimbine (within the first 3 min) and DREADDs (within 6 min), see Supplementary Fig. 3.

      Comment 5: Finally, the study shows that activation of noradrenergic hippocampus-projecting LC neurons is sufficient to regulate the expression of several hippocampal genes, although the necessity of these projection to induce the observed transcriptional effects has been tested to some extent through systemic blockade of beta-adrenoceptor, I believe the study would have benefited of more selective (optogenetic or chemogenetic) necessity experiments.

      Response: We understand the reviewer's point that blocking the LC during stress exposure would be an interesting experiment. However, it is very hard to completely silence the LC during intense stressors. In fact, despite intense efforts, we have not been able to silence the LC during swim stress exposure using DREADDs or other chemogenetic approaches (PSAM/PSEM). We were in fact able to silence the LC with the optogenetic inhibitor JAWS (and others have reported successful LC silencing with GtACR2), but there is a major issue involving the "rebound effect", where more NA is released once the inhibition is stopped. We would thus have had to optogenetically silence the LC for 45-90 min, which would create heat artifacts, and require challenging control experiments to draw firm conclusions. Given all these issues, we reasoned that blocking adrenergic receptors is a simple and elegant solution, which provides clear evidence for the necessity of beta-adrenergic signaling.

      Reviewer #2 (Recommendations For The Authors):

      Major concerns:

      Comment 6: The study focuses on the identification of transcriptional changes in the hippocampus induced by stress-mediated activation of the LC-NA system, however, noradrenaline release following stress exposure or yohimbine injection was measured in the cortex. Authors should consider measuring NA concentrations in the hippocampus after exposure to swim stress or administration of yohimbine, or at least explain their choice to analyse to cortex in the manuscript.

      Response: We have addressed this issue in detail in Response to "Reviewer 2, Comment #3", where we provided an overview of the additional data that support our approach. As mentioned before, measuring NA release after yohimbine is not compatible with our GRABNE-photometry approach, as the GRAB-sensor is based on alpha2-adrenoceptor. Here, we would like to add that measuring NA release using photometry during swim stress is also challenging. The challenge is the vigorous movement (swimming, typically in one direction), which creates pressure on the cables/implants. We felt that overcoming these experimental challenges (setup, troubleshooting and controls) would be beyond the scope of the paper, given that it is already known that this stressor leads to strong NA release in the hippocampus. We have now included references that demonstrate that all the stressors used in our work trigger NA increase in the hippocampus (see response to Reviewer 1, Comment 3): “Therefore, we assessed their expression in a dataset comparing the effect of various stressors on the hippocampal transcriptome (Floriou-Servou et al., 2018). The stressors included restraint, novelty and footshock stress, which have all previously been shown to increase hippocampal NA release (Hajós-Korcsok et al., 2003; Lima et al., 2019; Masatoshi Tanaka et al., 1982).”

      Comment 7: Concerning the experiment aimed at investigating sex differences in gene expression, it is not clear the reason why authors decided to restrict their analyses in females to the ventral hippocampal only. The explanation that in males they did not detect major differences between the dorsal and ventral hippocampus is not sufficient, because there could have been different effects in females. Therefore, the conclusion made by the authors that their "results suggest that the transcriptomic response is independent of sex" is not entirely correct, since sex differences were only evaluated in the ventral hippocampus.

      Response: We appreciate the reviewer's critique. As described above, we have now also sequenced the dorsal hippocampal tissue from the propranolol experiment (males and females, 32 samples) and additionally added an extensive meta-analysis of three large datasets (n=71) to compare transcriptional sex differences in response to stress. A detailed description of these experiments and how they have extended and supported our conclusions has been provided in response to Reviewer #1, Comment #2.

      Comment 8: Besides the effects on females, the same experiment examined whether propranolol by itself (in the absence of stress) would have been able to alter gene expression: such effects were not examined in the dorsal hippocampus. In contrast, in a different experiment, the effects of isoproterenol on genes expression were restricted to the dorsal hippocampus only. Furthermore, related to this latter experiment, intra-dorsal hippocampal injection of isoproterenol should presumably mimic the rise in NA observed after stress exposure, why was gene expression measured 90 min after isoproterenol central injections while in the other experiments gene expression was determined 45 min after stress, that is when authors observe the peak NA concentration?

      Response: We have addressed the reviewer's critique of dorsal vs ventral hippocampus by reanalyzing 32 additional samples from dorsal hippocampus of male and female mice after propranolol (or saline) injection. Please see response to Reviewer #1, comment #2.

      Regarding the time points: We have chosen the 45 and 90 min time points mainly for two reasons. First, cFos protein changes are known to be strongest 90 min after neuronal activation. Second, because we wanted to capture gene expression changes triggered by NA release, we reasoned that these effects must be fast and should thus be measured at an early transcriptional time point (45 min). However, after performing the time-course experiment after swim stress exposure (Figure 4d,c), we observed that the LC-NA-sensitive genes (e.g. Dio2 and several PP1-subunits) show the strongest changes 90 min after stress exposure. Therefore, in some of our experiments we opted to analyze gene expression changes at 90 min, converging with the time point we typically use for cFos staining. Contrary to the reviewer's statement, peak NA concentrations are not observed 45 min after the various interventions; rather, the peak in the main metabolite (MHPG) is observed then, due to the temporal dynamics of NA release and breakdown. NA release occurs immediately upon stress exposure (or direct LC activation), which we also show in the new photometry data described above. Thus, rapid NA release triggers intracellular cascades that lead to downstream transcriptional changes, which presumably peak 45-90 min later.

      Comment 9: Behavioral changes following systemic pharmacologic or chemogenetic manipulation were observed in the open field task immediately after peripheral injections of yohimbine or CNO, respectively. Is this timing sufficient for both drugs to cross the blood brain barrier and to exert behavioral effects? It is also not immediately clear the reason why the open field tasks have different durations depending on the experiments, which can also impact the results. Authors might also consider to split the open field data analyses in 2 or 3 min time-bins, to allow for a better comparison across the different results.

      Response: We thank the reviewer for the suggestion to plot the behavior data as time-bins. We have implemented this change for the yohimbine and DREADD experiments, and updated the corresponding figure accordingly (Supplementary Figure 3, pasted below for ease of reading). The new visualization clearly shows that yohimbine injection triggers rapid behavioral effects already in the first three minutes, whereas the LC-DREADD activation triggers behavioral changes within 3-6 minutes after injection. Thus, clear drug effects are visible in the first 10 minutes, which is comparable to the standard OFT test (10min testing) shown in response to swim stress exposure (Suppl. Figure 3a). The choice to expose mice to the OFT for 21 minutes in total was due to the fact that we based our experimental approach on the optogenetic LC-stimulation protocol first published by McCall and colleagues (McCall et al, Neuron, 2015), in which the LC is stimulated for 3 min followed by 3 min pauses (see Suppl. Figure 3d). Because of this on-off design, we decided to keep the optogenetic analysis simple and show the overall effect (Supplementary Figure 3d), particularly as we know that NA dynamics do not recover rapidly enough after 3 min continuous stimulation to justify a bin-analysis (unpublished data).
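
      As an illustration of the time-bin re-analysis described above, the sketch below splits a 21 min open field session into consecutive 3 min bins and sums the distance traveled in each. It is only a schematic of the binning idea under stated assumptions: the input format (per-sample timestamps in seconds and per-sample distances), the function name, and the example values are hypothetical and do not reflect the tracking software or data used in the study.

```python
def bin_distance(timestamps_s, distances, bin_minutes=3, total_minutes=21):
    """Sum per-sample distance into consecutive bins of `bin_minutes`.

    Samples at or beyond `total_minutes` are ignored.
    """
    bin_s = bin_minutes * 60
    n_bins = total_minutes // bin_minutes
    totals = [0.0] * n_bins
    for t, d in zip(timestamps_s, distances):
        idx = int(t // bin_s)
        if 0 <= idx < n_bins:
            totals[idx] += d
    return totals


# Hypothetical samples (time in seconds, distance in arbitrary units).
times = [10, 100, 179, 180, 400, 1259, 1260]
dists = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
binned = bin_distance(times, dists)
```

      With these defaults the session yields seven bins; the sample at 1260 s falls exactly at the end of the 21 min session and is therefore dropped.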

      Author response image 3.

      Effects of acute stress and noradrenergic stimulation on anxiety-like behaviour in the open field test. a, Stress-induced changes in the open field test 45 min after stress onset. Stressed animals show overall reductions in distance traveled (unpaired t-test; t=3.55, df=22, p=0.0018), time in center (Welch unpaired t-test; t=3.50, df=13.61, p=0.0036), supported rears (unpaired t-test; t=3.39, df=22, p=0.0026) and unsupported rears (unpaired t-test; t=5.53, df=22, p=1.47e-05) compared to controls (Control n = 12; Stress n = 12). These data have been previously published (von Ziegler et al., 2022). b, Yohimbine (3 mg/kg, i.p.) injected animals show reduced distance traveled (unpaired t-test; t=2.39, df=10, p=0.03772), reduced supported rears (unpaired t-test; t=6.56, df=10, p=0.00006) and reduced unsupported rears (Welch unpaired t-test; t=3.69, df=4.4, p=0.01785) compared to vehicle injected animals (Vehicle n = 6; Yohimbine n = 7). c, Chemogenetic LC activation induced changes in the open field test immediately after clozapine (0.03 mg/kg, i.p.) injection. hM3Dq+ animals show reduced distance traveled (unpaired t-test; t=6.28, df=13, p=0.00003), reduced supported rears (unpaired t-test; t=4.28, df=13, p=0.0009), as well as reduced unsupported rears (Welch unpaired t-test; t=4.28, df=13, p=0.00437) compared to hM3Dq- animals (hM3Dq- n = 7; hM3Dq+ n = 8). d, Optogenetic 5 Hz LC activation induced changes during the open field test. ChR2+ animals show reduced supported rears (unpaired t-test; t=2.42, df=64, p=0.0185) and reduced unsupported rears (unpaired t-test; t=2.91, df=64, p=0.00499) compared to ChR2- animals (ChR2- n = 32; ChR2+ n = 36). Data expressed as mean ± SEM. *p < 0.05, **p < 0.01, ***p < 0.001, ****p < 0.0001.
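
      The legend above reports both standard and Welch unpaired t-tests, and the Welch tests carry fractional degrees of freedom (e.g. df=13.61). The sketch below shows where such fractional values come from, using the textbook Welch statistic and the Welch-Satterthwaite approximation. It is a generic illustration with made-up numbers, not a re-analysis of the study's data; the function name is hypothetical.

```python
from statistics import mean, variance


def welch_t(a, b):
    """Return (t, df) for Welch's unequal-variance t-test on two samples.

    df uses the Welch-Satterthwaite approximation and is generally not an integer.
    """
    va = variance(a) / len(a)  # squared standard error of group a
    vb = variance(b) / len(b)  # squared standard error of group b
    t = (mean(a) - mean(b)) / (va + vb) ** 0.5
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Hypothetical groups with unequal variances and sizes.
group_a = [10.0, 12.0, 14.0, 16.0]
group_b = [5.0, 6.0, 7.0]
t_stat, dof = welch_t(group_a, group_b)
```

      For these toy groups a pooled-variance test would have 5 degrees of freedom, whereas the Welch approximation gives a value slightly above 4, which is the kind of non-integer df seen in the legend.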

      Comment 9: The study shows that activation of noradrenergic hippocampus-projecting LC neurons is sufficient to regulate the expression of several hippocampal genes. I believe the study would have benefited of more selective necessity experiments. Authors might consider adding optogenetic (or chemogenetic) experiments aimed at inhibiting LC-NA hippocampal projections during stress exposure (or, alternatively, perform intrahippocampal pharmacological blockade of β-adrenoreceptors during stress exposure), and determine the effects on gene expression.

      Response: We kindly refer the reviewer to our previous response to Comment #2 above.

      Minor concerns:

      There is a typo in the abstract. Please correct "LN-NA" with "LC-NA"

      Response: Thank you, we have corrected it.

      References

      Bangasser, D. A., Eck, S. R., & Ordoñes Sanchez, E. (2019). Sex differences in stress reactivity in arousal and attention systems. Neuropsychopharmacology: Official Publication of the American College of Neuropsychopharmacology, 44(1), 129–139.

      Bangasser, D. A., Wiersielis, K. R., & Khantsis, S. (2016). Sex differences in the locus coeruleus-norepinephrine system and its regulation by stress. Brain Research, 1641, 177–188.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      Mice can learn to associate sensory cues (sound and light) with a reward or activation of dopamine neurons in the ventral tegmental area (VTA), and then anticipate the reward from the sensory cue only. Using this paradigm, Harada et al. showed that after learning, the cue is able to induce dopamine release in the projection targets of the VTA, namely the nucleus accumbens and lateral hypothalamus (LH). Within the LH, dopamine release from VTA neurons (either by presentation of the cue or direct optical stimulation of VTA neurons) activates orexin neurons, measured as an increase in intracellular calcium levels.

      Strengths:

      This study utilized genetically encoded optical tools to selectively stimulate dopamine neurons and to monitor dopamine release in target brain areas and the calcium response of orexin neurons. This allowed a direct assessment of the relationship between the behavioral response of the animals, the release of a key neurotransmitter in select brain areas, and its effect on target cells, with a precision previously not possible. The results shed light on the mechanism underlying reward-related learning and expectation.

      Weaknesses:

      • The Ca increase in orexin neurons in response to optical stimulation of VTA DA neurons is convincing. However, there is an accumulated body of literature indicating that dopamine inhibits orexin neurons through D2 receptors, particularly at high concentrations both directly and indirectly (PMID 15634779, 16611835, 26036709, 30462527; but note that synaptic effects at low conc are excitatory - PMID 30462527, 26036709). There should be a clear acknowledgment of these previous studies and a discussion directly addressing the discrepancy. Furthermore, there are in-vivo studies that investigated the role of dopamine in the LH involving orexin neurons in different behavioral contexts (e.g. PMID 24236888). The statement found in the introduction "whether and how dopamine release modulates orexin neuronal activity has not been investigated vigorously" (3rd para of Introduction) is an understatement of these previous reports.

      We thank the Reviewer for pointing out that we missed several important citations. We have added the references mentioned, and the discrepancy is now addressed in the discussion section.

      • Along these lines, previous reports of concentration-dependent bidirectional dopaminergic modulation of orexin neurons suggest that high and low levels of DA would affect orexin neurons differently. Is there any way to estimate the local concentration of DA released by the laser stimulation protocol used in this study? Could there be a dose dependency between the intensity of laser stimulation and the orexin neuron response?

      We agree that this is an interesting point. However, one limitation of our study, and of intensity-based genetically-encoded sensors in general, is that the estimation of the concentration is technically difficult. The sensor effectively reports changes in extra-synaptic levels of neurotransmitters, but to get the absolute value other modalities would be needed such as fast scan voltammetry. This limitation is now included in the discussion section.

      • The transient dip in DA signal during omission sessions in Fig 2C (approx 1% decrease from baseline) is similar in amplitude compared to the decrease seen in non-laser trials shown in Fig 1C right panel (although the time course of the latter is unknown as the data is truncated). The authors should clarify whether those dips are a direct effect of the cue itself or indeed reward prediction error.

      Thanks for raising this important point. Indeed, there is a dip in the signal during non-stimulation trials. On day 1, the delivery of the cue triggered a dip; on day 10, there was a slight increase in the signal followed by a dip. The data are difficult to interpret, but our hypothesis is that two components trigger this dip. One is the aversiveness of the cue. Because a relatively loud sound (90 dB) was used for the cue, it would not be surprising if the auditory cue was slightly aversive to the experimental animals. It has been shown that aversive stimuli induce a dip of dopamine in the NAc, although this is specific to NAc subregions. The second component is reward prediction error. Although the non-laser-paired cue never triggered laser stimulation, it is similar to the laser-paired one: both consist of a loud tone and a visual cue of the same color (presented at a different location). We think it is possible that the reward-related neuronal circuit was slightly activated by the non-laser-paired cue. In line with this interpretation, a small increase in the signal was observed on day 10 but not on day 1. If our hypothesis is true, the signal reflects two components, which unfortunately makes further analysis difficult.

      • There seem to be orexin-negative-GCaMP6 positive cells (Fig. 4B), suggesting that not all cells were phenotypically orexin+ at the time of imaging. The proportion of GCaMP6 cells that were ORX+ or negative and whether they responded differently to the stimuli should be indicated.

      While we acknowledge the observation of orexin-negative-GCaMP6 positive cells in Figure 4B, it's important to note that this phenomenon is consistent with the characteristics of the hOX-GCaMP virus used in prior experiments. The virus has undergone thorough characterization, and it has been reported to exhibit over 90% specificity, as demonstrated in prior work conducted in the laboratory of one of our contributing authors (PMID: 27546579). To address the concern raised by the reviewer, we have included Supplemental Figure 4 confirming that all mice consistently exhibited qualitatively similar hOX-GCaMP transients upon dopaminergic terminal stimulation. This additional evidence supports the reliability and specificity of our experimental approach.

      • Laser stimulation of DA neurons at the level of cell bodies (in VTA) induces an increase in DA release within the LH (Fig. 3C, D), however, there is no corresponding Ca signal in orexin neurons (Fig.4C).

      We realize that the figures were not clear, which may explain why the reviewer did not see a corresponding Ca signal. However, such a signal is present: we have now added Supplemental Figure 3 to show that there is a Ca signal already on day 1.

      In contrast, stimulating DA terminals within the LH induces a robust, long-lasting Ca signal (> 30s) in orexin neurons (Fig. 5). The initial peak is blocked by raclopride but the majority of Ca signal is insensitive to DA antagonists (please add a positive control or cite references indicating that the dose of antagonists used was sufficient; also the timing of antagonist administration should be indicated).

      This is now included in the discussion section. In addition, the timing and dose of the antagonists are now described in the methods section.

      Taken together, these results seem to suggest that DA does not directly increase Ca signal in orexin neurons. What could be mediating the remaining component?

      This point has been included in the discussion section.

      • Similarly, there is an elevation of Ca signal in orexin neurons that remains significantly higher after the cue/laser stimulation (Fig. 4F). It appears that it is this sustained component that is missing in omission trials. This can be analyzed further.

      It is true that there is a sustained component in stimulation trials that is missing in omission trials. Most likely, it is evoked by the stimulation of dopamine neurons. We argue that this component is isolated in Fig 5, where we have analyzed it as far as our data allow.

      • Mice of both sexes were used in this study; it would be interesting to know whether sex differences were observed or not.

      We agree that this is an important point. However, our sample size is not large enough to make a meaningful comparison between males and females.

      Reviewer #2 (Public Review):

      Summary:

      This is an interesting and well-written study assessing the role of dopaminergic inputs from the VTA on orexin cell responses in an opto-pavlovian conditioning task. These data are consistent with a possible role of this system in reward expectation and are surprisingly one of the first demonstrations of a role for dopamine in this phenomenon.

      Strengths:

      The study has used an interesting opto-Pavlovian approach combined with fibre photometry.

      Weaknesses:

      It is unclear what n size was used or analysed, particularly for AUC measures e.g. Figures 1 D/E and 3 G. The number of trials reflected and the animal numbers need clarification.

      The sample size is indicated in the legend section.

      The study focused on opto-stim omissions - this work would be significantly strengthened by a comparison to a real-world examination where animals are trained for a radiation reward (food pellet).

      We agree that this would be an important experiment. This experiment has been partially done in the laboratory of one of the contributing authors (doi.org/10.1101/2022.04.13.488195) and would be one of our follow-up studies.

      Have the authors considered the role of orexin in the opposing situation i.e. a surprise addition of reward?

      That would be an interesting experiment. To do that, a natural reward, rather than optical stimulation, would have to be used as the reinforcer. This could be part of a follow-up study.

      Similarly, there remains some conjecture regarding the role of these systems in reward and aversion - have the authors considered aversive learning paradigms - fear, or fear extinction - to further explore the roles of this system? There are some (important) discussions about the possible role of orexin in negative reinforcement. Further studies to address this could be warranted.

      It is true that dopamine also plays a significant role in aversive learning. Therefore, this would be an interesting experiment. The discussion section now includes this point.

      I think some further discussion of the work by Linehan concerning the interesting bidirectional actions of D1/D2 receptor signalling on glutamatergic transmission onto orexin neurons is worthwhile. While this work is currently cited, the nuance and perhaps relevance to D1 and D2 signalling could be contextualised a little more (https://doi.org/10.1152/ajpregu.00150.2018).

      Thanks for the suggestion. The discussion has been expanded.

      Reviewer #3 (Public Review):

      Summary:

      Harada and colleagues describe an interesting set of experiments characterizing the relationship between dopamine cell activity in the ventral tegmental area (VTA) and orexin neuron activity in the lateral hypothalamus (LH). All experiments are conducted in the context of an opto-Pavlovian learning task, in which a cue predicts optogenetic stimulation of VTA dopamine neurons. With training, cues that predict DA stimulation come to elicit dopamine release in LH (a similar effect is seen in accumbens). After training, omission trials (cue followed by no laser) result in a dip (inhibition) of dopamine release in LH, characteristic of reward prediction error observed in the striatum. Across cue training, the activity pattern of orexin neurons in LH mirrors that of LH DA levels. However, unlike the DA signal, orexin neurons do not exhibit a decrease in activity in omission trials. Systemic blockade of D2 but not D1 receptors blocked DA release in LH following VTA DA cell stimulation.

      Strengths: Although much work has been dedicated to examining projections from orexin cells to VTA, less has been done to characterize reciprocal projections and their function. In this way, this paper is a very important addition to the literature. The experiments are technically sound (with some limitations, below) and utilize sophisticated approaches, the manuscript is nicely written, and the conclusions are mostly reasonable based on the data collected.

      Weaknesses:

      I believe the impact of the paper could be enhanced by considering and/or addressing the following:

      Major:

      • I encourage the authors to discuss in the Introduction previous work on DA regulation of orexin neurons. In particular, the authors cite, but do not describe in any detail, the very relevant Linehan paper (2019; Am J Physiol Regul) which shows that DA differentially alters excitatory/inhibitory input onto orexin neurons and that these actions are reversed by D1 vs D2 receptor antagonists. Another paper (Bubser, 2005, EJN) showed that dopamine agonists increase the activity of orexin neurons and that these effects are blocked by D1/D2 antagonists. The current findings should be discussed in the context of these (and any other relevant) papers in the Discussion, too.

      Thanks for the valuable suggestion. This point has been integrated and the introduction and discussion sections have been revised carefully.

      • In the Discussion, the authors provide two (plausible) explanations for why they did not observe a dip in the calcium signal of orexin neurons during omission trials. Is it not possible that these cells do not encode for this type of RPE?

      We completely agree that this is possible. Our current hypothesis is that dopamine in the LH encodes RPE and that this information is transmitted to orexin neurons. Orexin neurons integrate other information and encode something else, which we call ‘multiplexed cognitive information’. What exactly this means is still an open question. This point is now mentioned in the discussion section.

      • Related to the above - I am curious about the authors' thoughts on why there is such redundancy in the system. i.e. why is dopamine doing the same thing in NAC and LH in the context of cue-reward learning?

      Thank you for the question. This is an important point, indeed. Our current hypothesis is described in the discussion section.

      ‘Our data indicate that dopamine in both the NAc and LH encodes reward prediction error (RPE). One open question is the existence of such a redundant mechanism. We hypothesize that dopamine in the LH boosts dopamine release via a positive feedback loop between the orexin and dopamine systems. It has already been established that some orexin neurons project to dopaminergic neurons in the VTA, positively modulating firing. On the other hand, our data indicate that dopamine in the LH stimulates orexinergic neurons. These collective findings suggest that when either the orexin or dopamine system is activated, the other system is also activated consequently. Although the current findings align with this idea, the hypothesis should be carefully challenged and scrutinized.’

      • The data, as they stand, are largely correlative and do not indicate that DA recruitment of orexin neurons is necessary for learning to occur. It would be compelling if blocking the orexin cell recruitment affected some behavioral outcomes of learning. Similarly - does raclopride treatment across training prevent learning?

      We appreciate the insightful comment. It is indeed a limitation of our study that we lack behavioral data. However, given the extensive previous research on the crucial role of orexin in motivated behavior, we argue that establishing dopaminergic regulation of the orexin system itself is a valuable contribution. This perspective is thoroughly discussed in the dedicated section of our paper. It's important to note that the injection of D2 antagonists, including raclopride, is known to induce significant sedation. Due to this sedative effect, combining behavioral experiments with these drugs poses considerable challenges.

      • Only single doses of SCH23390 and raclopride were used. How were these selected? It would be nice to use more of a dose range to show that 1) an effect of D1R blockade was not missed, and 2) that the reduction in orexin signal with raclopride was dose-dependent.

      The rationale for the doses has been added to the discussion section. These doses are reported to block dopamine receptors. Although we agree that it would be nice to have a dose-response curve, we are reluctant to increase the doses, to avoid adverse effects on the experimental animals. The doses we used effectively induced hypo-locomotion, although the data are not shown.

      • Fig 1C, could the effect the authors observed be due to movement?

      We argue this is unlikely. We recorded two channels, one for the control and the other for the signal. Motion-related artifacts are corrected based on the control channel. One example trace around the laser stimulation is shown below. Please note that a typical motion-related artifact is a fast dip of the signal, normally observed in both the 405 and 465 nm channels.
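
      A common way to use a control channel for motion correction in fibre photometry is to fit the 405 nm trace to the 465 nm trace and express the signal relative to the fitted control. The sketch below illustrates that general idea only. The response does not state the exact correction procedure, so the least-squares fit and the function names `fit_line` and `motion_corrected_dff` are assumptions made for illustration.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y ~ slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("control channel is constant; cannot fit a line")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def motion_corrected_dff(signal_465, control_405):
    """Hypothetical correction: scale the control channel onto the signal
    channel, then report (signal - fitted control) / fitted control.

    Fluctuations shared by both channels (e.g. a fast motion dip) are
    absorbed by the fitted control; changes present only in the signal
    channel remain in the output.
    """
    slope, intercept = fit_line(control_405, signal_465)
    fitted = [slope * c + intercept for c in control_405]
    return [(s - f) / f for s, f in zip(signal_465, fitted)]
```

      In this sketch, a dip that appears in both channels is largely cancelled, whereas an increase seen only in the 465 nm channel survives the correction.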

      Relatedly, what was the behavior like when the cue was on? Did mice orient/approach the cue?

      Although it has been reported that rats approach the cue (PMID: 30038277) in a similar task, it was not obvious in our case. It could be because we used both visual and auditory cues. Mice showed a general increase of locomotion during the cue and the stimulation but the direction was not clear to the experimenter.

      Also, when does the learning about the cue occur? Does it take all 10 days of learning or does this learning/cue-induced increase in dopamine signaling occur in less than 10 days?

      It is hard to say when the learning occurs. Looking at the learning curves in Figures 1, 3 and 4, the response to the cue seems to plateau at day 5, but since we do not have behavioral data, the assessment relies only on the neuronal signal.

      • Also related to the above, could the observed dopamine signal be a result of just the laser turning on? It would seem important to include mice with a control sensor.

      We recorded two channels, at 405 nm and 465 nm. The 405 nm signal did not show an increase, while the 465 nm signal did. An example trace is shown. Moreover, the sensor has already been characterized by the corresponding author, so we argue that this is unlikely.

      Author response image 1.

      Fig 1E, the effect seems to be driven by one mouse which looks like it could be a statistical outlier. The inclusion of additional animals would make these data more compelling.

      We agree that adding more mice would make the data more compelling. However, considering that dopamine in the accumbens has been investigated extensively and our data are in line with prior studies, we argue that we have enough data to support our conclusion.

      • For Fig 1C, 3D, 3F, and 4D, could the authors please show the traces for the entire length of laser onset? It would be helpful to see both the rise and the fall of dopamine signals.

      For Fig 1C, one panel has been added. For Figs 3 and 4, a supplemental figure was created to show the signal around laser stimulation.

      • Fig 2C, could the authors comment on how they compared the AUC to baseline? Was this comparison against zero? Because of natural hills and troughs during signals prior to cue (which may not equate to a zero), comparing the omission-induced dip to a zero may not be appropriate. A better baseline might be using the signals prior to the cue.

      The signal immediately before the cue onset was taken as the baseline, and this baseline was subtracted. Zero and baseline are therefore the same in our analysis.
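
      The baseline logic described here can be sketched as follows. This is a minimal illustration, not the analysis code used for the manuscript: the length of the pre-cue baseline period, the trapezoidal integration and the function name `baseline_subtracted_auc` are assumptions, and the analysis window is left as a parameter because different windows were used for different recording sites.

```python
def baseline_subtracted_auc(times, values, cue_onset, baseline_len, window):
    """Area under a trace within `window`, relative to a pre-cue baseline.

    times, values : equal-length sequences describing one trial
    cue_onset     : time of cue onset (same units as `times`)
    baseline_len  : duration before cue onset used as baseline (assumed)
    window        : (start, end) of the analysis window, inclusive
    """
    pre_cue = [v for t, v in zip(times, values)
               if cue_onset - baseline_len <= t < cue_onset]
    if not pre_cue:
        raise ValueError("no samples fall in the baseline period")
    baseline = sum(pre_cue) / len(pre_cue)

    start, end = window
    # After subtraction, a trace that stays at baseline integrates to zero,
    # so testing the AUC against zero is testing it against baseline.
    points = [(t, v - baseline) for t, v in zip(times, values)
              if start <= t <= end]

    auc = 0.0
    for (t0, v0), (t1, v1) in zip(points, points[1:]):
        auc += 0.5 * (v0 + v1) * (t1 - t0)  # trapezoidal rule
    return auc
```

      With this definition, an omission-induced dip gives a negative AUC and a flat trace gives zero, which is why a comparison against zero and a comparison against baseline coincide.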

      • Could the authors comment on how they came up with the 4-5.3s window to observe the AUC in Fig 3H?

      Since the kinetics of dopamine in the NAc and LH are different, different time windows were used to observe a dip of dopamine. An analysis of the kinetics has been added.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Specific feedback to the authors

      • Sample size for each experiment/group could not be found.

      The sample size is now included in the legends.

      • In most figures, the timing of onset for the cue and laser stimulation is unclear. This makes the data interpretation difficult. They should be labeled as in Fig. 3C, for example.

      Panels have been updated to address this point.

      • Please provide the rationale for selecting the time range for the measurement of AUC for different experiments (e.g. Fig. 2C, 3H, 4A, 5F).

      The kinetics of dopamine in NAc and LH are different. This is now shown in the new Supplemental Figure 2. Based on this difference, different windows were chosen.

      • Fig. 1E, 3G right, 4E right: statistical analysis should use two-way repeated measures ANOVA rather than one-way ANOVA. Fig 1D, 3G left and 4E left panels can also be analyzed by two-way repeated measures ANOVA.

      We realized that those panels were redundant. Some panels have been removed and the analysis has been conducted according to this point.

      Minor comments:

      Fig. 2C can also show non-omission trials as a comparison.

      The panel has been updated.

      • The term "laser cue" is confusing, as the cue itself does not involve a laser.

      ‘Laser-paired cue’ is used instead.

      • Color contrast can be improved for some figures, including Fig. 2C right, Fig. 3H right, and green and blue fluorescent fonts.

      The panels have been updated.

      • Figure legends: Tukey's test, rather than Tekey's test.

      This has been fixed.

      • There are some long-winded sentences that are hard to follow.

      Edited.

      • p.2, line 11 from bottom: should read ...the VTA evokes the release of dopamine.

      Edited.

      • p.3, line 9: remove e from release.

      This has been addressed.

      Reviewer #3 (Recommendations For The Authors):

      Minor:

      • When discussing the understudied role of dopamine in brain regions other than the striatum in the Introduction, it might be helpful to cite this article: https://elifesciences.org/articles/81980 where the authors characterize dopamine in the bed nucleus of stria terminalis in associative behaviors and reward prediction error.

      The discussion section has been updated accordingly.

      • In the Discussion, it might be better to refrain from describing the results as 'measuring dopamine release' in the LH. Since there was no direct detection of dopamine release, rather a dopamine binding to the dLight receptors, referring to the detection as dopamine signaling/binding/transients is a better alternative.

      This point has been addressed.

      • In the Discussion, without measuring tonic dopamine release, it is difficult to say that there was a tonic dopamine release in the LH prior to negative RPE. In addition, I wouldn't describe the negative RPE as silencing of dopamine neurons projecting to the LH since this was not directly measured and it is hard to say for sure if the dip in dopamine is caused by silencing of the neurons. There certainly seems to be a reduction in extra-synaptic dopamine signaling in LH, however, what occurs upstream is unknown.

      We respectfully disagree with this point. In our opinion, the dopamine transient is more important than the firing of dopamine neurons because what matters for downstream neurons is dopamine concentration. For example, administration of cocaine increases the dopamine concentration extra-synaptically via blockade of DAT, while the firing of dopamine neurons goes down via activation of D2 receptors expressed in dopamine neurons. Administration of cocaine is not known to induce negative RPE.

      • Typo at multiple places: 'Tekey's multiple comparison test'.

      This has been fixed.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      The experimental rigor and design of the nocturnal IOP experiments was weak with low n values and differing methods of IOP measurement (conscious versus anesthetized). The same method of IOP measurement needs to be used for all measurements to make any conclusions on the circadian patterns of IOP in each condition.

      One of the goals of our study was to confirm the results of the Patel et al. study (2021; PMID: 33853948), in which nocturnal IOP measurements were conducted in anesthetized mice and diurnal IOP measurements in awake animals, but we agree with both Reviewers that IOP should be measured under identical experimental conditions. Parenthetically, the number of animals per treatment paradigm in the original version (N = 4) was sufficient to produce statistical significance for the diurnal control vs diurnal TGFB2 and diurnal control vs nocturnal control conditions.

      To address the comment, we generated an additional cohort of TGFB2-expressing mice (N = 6) in which nocturnal and diurnal measurements were performed in awake animals. The results are shown in the revised Figure 6. Similar to the anesthetized cohort, the diurnal IOP in Lv-TGFB2 mice was statistically indistinguishable from the nocturnal value, indicating that TGFB2-induced OHT is not additive to physiological (circadian) OHT. The TRPV4-dependence of ocular hypertension induced by physiological and pathological methods suggests that the channel functions as a final common mechanism for ocular hypertension.

      Reviewer #2 (Public review):

      Figure 1A-C. Often there is a difference between the massage (message?, op. authors) and transcript data. I recommend the authors to confirm with qPCR data with another mode of protein measurements.

      We are not sure we understand the Reviewer’s comment regarding the “difference between the message and transcript data” but note that the mRNA data shown in panels A & B are confirmatory of previously published transcriptomic and proteomic screens (e.g., Fleenor et al., IOVS 2006; Bollinger et al., IOVS 2011; Callaghan et al., Scientific Reports 2022; Li et al., Current Eye Research 2022) and were included to show that the transcriptional response of canonical SMAD and pro-fibrotic genes unfolds as predicted from previous work. With regard to TRPV4 signaling, we expand transcriptomic data with protein analysis (Western blots) and functional analyses (measurements of TRPV4-mediated current and calcium imaging). Transcriptomic, protein expression, electrophysiological and imaging experiments revealed a remarkable consistency in TGFB2-dependence of gene (Fig. 1C) and protein expression (Fig. 1D), transmembrane current (Fig. 3C) and intracellular calcium (Fig. 2).

      Parenthetically, we attempted to get a sense for the TGFB2-dependence of Piezo1 protein expression by conducting Western blots with multiple antibodies and experimental conditions. These efforts were unsuccessful, presumably due to the complexity (30-40 TM domains) and large molecular weight (280-300 kDa) of the protein. We note, however, that Piezo1 signaling cannot account for the observed OHT given that studies by us and others (Yarishkin et al., 2021, PMID: 33226641 and Zhu et al., 2021; PMID: 33532718) associated Piezo1 signaling with facility increases. The revised manuscript reads: “The suppression of outflow facility by Piezo1 inhibitors applied under in vitro and in vivo conditions (39, 81) instead suggests that Piezo1 opposes the hypertensive functions of TRPV4.” The preprint by Redmon et al. (bioRxiv 2024, PMID 39041037) expands the TRPV4-dependence of OHT to microbead-induced, steroid-induced and nocturnal models of OHT to indicate that TRPV4 functions as a universal driver of elevated IOP. We reiterate this in the revised Discussion.

      Does direct TRPV4 activation also induce the expression of these markers? Does inhibition of TRPV4, after TGF-β treatment, prevent the expression of these markers? Is TRPV4 acting downstream of this response?

      An RNASeq study conducted by us (Rudzitis et al., under review) suggests that the agonist GSK101 has minimal effect on the fibrotic and canonical pathways shown in panels A and B. These data are beyond the scope of the present study and will be published elsewhere; however, we include the data associated with the genes depicted in panels A and B for the reviewer at the end of this Response.

      We conducted an additional series of experiments to test whether TGFB2-induced upregulation of the TRPV4 and Piezo1 genes is itself TRPV4-dependent. As shown in the new SFig. 1, upregulation of the two genes is unaffected by TRPV4 inhibition.

      Figure 1D. Beta tubulin is not a membrane marker. Having staining of b tubulin in membrane fraction shows contamination from the cytoplasm. Does the overall expression also increase?

      β-tubulin associates with the plasma membrane by binding to integral membrane proteins in the plasma and organellar membranes through palmitoylation and attachment to linker proteins and as an integral component of exocytotic vesicles (Wolff, BBA 2009; Hogerheide et al., PNAS 2017). The protein is often used as a loading control for the TRPV4 protein (please see https://www.cellsignal.com/products/primary-antibodies/trpv4-antibody/65893; Grove et al., Science Signaling 2019 and Moore et al., PNAS 2013). Parenthetically, our RNASeq studies did not find modulation of β-tubulin expression by TGFβ2 [CNR and DK, unpublished observations].

      We examined the overall (cytosolic and membrane) TRPV4 expression and observed, similarly to the membrane fraction alone (Figure 2), upregulation following cytokine stimulation:

      Author response image 1.

      Western blot, total protein extract from control and TGFB2-treated TM cells [Alomone antibody].

      These results in our estimation do not add to the overall narrative and were not included into the paper.

      Figure 4A: it is not very clear. I recommend including a zoom image or better resolution image.

      We include a whole-page image as the new SFigure 4.

      Figures 5B and 6B. Why is there a difference between groups in the pre-injection panel? In Figure 5A, there is no pre-injection difference between LV-TGFβ and LV-control, while in 5B there is a significant difference between these groups.

      We revised the Figure Legends to clarify that “pre-injection” in Figures 5B and 6B refers to IOP measurements before the intracameral injection of HC-06, not to pre-injection of lentiviral constructs.

      Discussion section. Line 279: "TRPV4 channels in cells treated with TGFβ2 are likely to be constitutively active" ... needs to be discussed further.

      We rewrote the paragraph to clarify that TRPV4 is a thermosensitive channel that is expected to be constitutively active at the incubator temperature:

      “The effectiveness of TRPV4 inhibition in suppressing TGFB2-induced contractility (Fig. 4) is consistent with constitutive activation of TRPV4 channels in incubator-cultured cells. TRPV4 is a thermosensitive channel (Q10 ~10). Mouse TRPV4 is activated by physiological temperatures (Chung et al., 2003; Shibasaki et al., 2007) with peak activation between ~34–37 °C (Guler et al., 2003). The several-fold increase in functional expression of the channel in TGFB2-treated cells (Fig. 2) would be expected to promote tonic influx of Ca2+ and Ca2+-dependent cellular signaling. The abrogation of the contractile response in the presence of HC-06 indicates that TRPV4-mediated Ca2+ influx represents the principal source of calcium that drives the contractile response. Consistent with this, supplementation with the agonist GSK101 was sufficient to evoke TM contraction (Fig. 4B).”

      Line 280: "The residual contractility in HC-06-treated cells may reflect TGFβ2-mediated contributions from Piezo1." Piezo1 has a low threshold for mechanosensitivity. How do the authors discuss the observation that, in the presence of Piezo1, TRPV4 has a more prominent mechanosensory function? Is this tied to TGFβ signalling?

      This is an interesting question. Our macroscopic and single channel recordings of Piezo1 activity in TM cells recapitulate the time course published in the original Coste et al. (2010) study, showing the channel inactivates within 10-100 msec (Yarishkin et al., 2021). Thus, it is likely that the channel is largely inactivated during chronic ocular hypertension. Indeed, it has been suggested that resting membrane tension alone may be sufficient to inactivate Piezo1 (Lewis and Grandl, 2015), with cells grown on stiff substrates (e.g., under our experimental conditions) experiencing almost complete Piezo1 inactivation. We propose that the primary function of Piezo channels may be to sense and transduce transient mechanical loading. The remarkable IOP-lowering effectiveness of TRPV4 antagonists and knockdown indicates that - in contrast to Piezo1 - TRPV4 activation is sustained.  

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      The complete strain name for the Trpv4-/- mice is missing.

      Corrected.

      The layout for Figure 6 is confusing as HC-06 was only used in panels B and C but the labels are above panel A.

      Corrected.

      Reviewer #2 (Recommendations for the authors):

      Only two mice were used for the nocturnal IOP experiments. Justification for retreating the same mice in opposite eyes and counting it as n=4 is not rigorous or justified.

      The number of mice investigated in the original submission was four. In Week 1, two mice underwent PBS injections and two mice were treated with HC-06. After the baseline was re-established in Week 2, the treatments were reversed.

      We supplemented these numbers with an additional cohort of 6 mice, with identical results re: nocturnal vs diurnal IOP. These data are presented in the revised Figure 6.

      Why are daytime IOPs measured in awake mice but nocturnal IOPs measured in isoflurane-anesthetized mice? Anesthesia is well known to affect IOP and using two different methods could alter the results, especially when comparing between the groups. This could be why you did not see a nocturnal rise in the TGFB injected eyes. The same method needs to be used for all measurements to make any conclusions on the circadian patterns of IOP in each condition.

      This is a good point, please see our response above.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      Building on their own prior work, the authors present valuable findings that add to our understanding of cortical astrocytes, which respond to synaptic activity with calcium release in subcellular domains that can proceed to larger calcium waves. The proposed concept of a spatial "threshold" is based on solid evidence from in vivo and ex vivo imaging data and the use of mutant mice. However, details of the specific threshold should be taken with caution and appear incomplete unless supported by additional experiments with higher resolution in space and time.

      We thank the reviewers and editors for the positive assessment of our work as containing valuable findings that add to our understanding of cortical astrocytes. We also appreciate their positive appraisal of the proposed concept of a spatial threshold supported by solid evidence. 

      Regarding their specific comments, we truly appreciate them because they have helped to clarify issues and to improve the study. Point-by-point responses to these comments are provided below. Regarding the general comment on the spatial and temporal resolution of our study, we would like to clarify that the spatial and temporal resolution used in the current study (i.e., 2 - 5 Hz framerate using a 25x objective with 1.7x digital zoom with pixels on the order of 1 µm²) is within the norm in the field, does not compromise the results, nor diminish the main conceptual advancement of the study, namely the existence of a spatial threshold for astrocyte calcium surge. 

      We respect the thoughtfulness of the reviewers and editors towards improving the paper.

      Public Reviews:

      Reviewer #1 (Public Review):

      Lines et al., provide evidence for a sequence of events in vivo in adult anesthetized mice that begin with a footshock driving activation of neural projections into layer 2/3 somatosensory cortex, which in turn triggers a rise in calcium in astrocytes within "domains" of their "arbor". The authors segment the astrocyte morphology based on SR101 signal and show that the timing of "arbor" Ca2+ activation precedes somatic activation and that somatic activation only occurs if at least ≥22.6% of the total segmented astrocyte "arbor" area is active. Thus, the authors frame this ≥22.6% activation as a spatial property (spatial threshold) with certain temporal characteristics - i.e., must occur before soma and global activation. The authors then elaborate on this spatial threshold by providing evidence for its intrinsic nature - is not set by the level of neuronal stimulus and is dependent on whether IP3R2, which drives Ca2+ release from the endoplasmic reticulum (ER) in astrocytes, is expressed. Lastly, the authors suggest a potential physiologic role for this spatial threshold by showing ex vivo how exogenous activation of layer 2/3 astrocytes by ATP application can gate glutamate gliotransmission to layer 2/3 cortical neurons - with a strong correlation between the number of active astrocyte Ca2+ domains and the slow inward current (SIC) frequency recorded from nearby neurons as a readout of glutamatergic gliotransmission. This is interesting and would potentially be of great interest to readers within and outside the glia research community, especially in how the authors have tried to systematically deconstruct some of the steps underlying signal integration and propagation in astrocytes. Many of the conclusions posited by the authors are potentially important but we think their approach needs experimental/analytical refinement and elaboration.

      We thank the reviewer for her/his positive appraisal and comments that have helped us to improve the study. In response to their insights, we aim to address the key points raised below:

      (1) Sequence of Events: We acknowledge the reviewer's interest in our findings regarding the sequence of events. We have provided a more detailed description of the methods and results to clarify the spatiotemporal relationships between domain activation and spatiotemporal clustering, to centripetal and centrifugal calcium propagation in relation to soma activation.

      (2) Spatial Threshold: The reviewer accurately identifies our characterization of a spatial threshold (≥22.6% activation) with temporal characteristics as a crucial aspect of our study. We have expanded upon this concept by offering a clearer illustration of how this threshold relates to somatic and global activation.

      (3) Intrinsic Nature of Spatial Threshold: The reviewer's insightful observation regarding the inherent quality of the spatial threshold, regardless of its dependence on neuronal stimuli is noteworthy. We have provided additional details to substantiate this claim, shedding more light on the fundamental nature of this phenomenon.

      (4) Physiological Implications: The reviewer rightly highlights the potential physiological significance of our findings, particularly in relation to gliotransmission in cortical neurons. We have enhanced our discussion by elaborating on the implications of these observations.
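
      The spatial threshold summarized in point (2) can be illustrated with a short sketch. It is only a toy restatement of the rule described in the review (somatic activation when at least 22.6% of the segmented arbor area is active) and is not the computational tool used in the study; the per-domain area representation and the function names `fraction_arbor_active` and `exceeds_spatial_threshold` are assumptions.

```python
SPATIAL_THRESHOLD = 0.226  # fraction of arbor area, as quoted in the review


def fraction_arbor_active(domain_areas, domain_active):
    """Fraction of the total segmented arbor area that is active.

    domain_areas  : area of each arbor domain (any consistent unit)
    domain_active : booleans, True where the domain shows a Ca2+ event
    """
    total = sum(domain_areas)
    if total <= 0:
        raise ValueError("total arbor area must be positive")
    active = sum(a for a, on in zip(domain_areas, domain_active) if on)
    return active / total


def exceeds_spatial_threshold(domain_areas, domain_active,
                              threshold=SPATIAL_THRESHOLD):
    """True if the active arbor fraction reaches the spatial threshold."""
    return fraction_arbor_active(domain_areas, domain_active) >= threshold
```

      Under these assumptions, a cell with 10% of its arbor area active would fall below the threshold, while a cell with 30% active would exceed it.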

      The primary issue for us, and which we would encourage the authors to address, relates to the low spatial-temporal resolution of their approach. This issue does not necessarily compromise the concept of a spatial threshold, but more refined observations and analyses are likely to provide more reliable quantitative parameters and a more comprehensive view of the mode of Ca2+ signal integration in astrocytes. 

      We agree with the reviewer that our spatial-temporal resolution (2 – 5 Hz framerate using a 25x objective and 1.7x digital zoom with pixels on the order of 1 µm) does not compromise the proposed concept of the existence of a spatial threshold for the intracellular calcium expansion.

      For this reason, and because their observations might be perceived as both a conceptual and numerical standard in the field, we believe that the authors should proceed with both experimental and analytical refinement. Notably, we have difficulty with the reported mean delays of astrocyte Ca2+ elevations upon sensory stimulation. The 11s delay for response onset in "arbor" and 13s in the soma are extremely long, and we do not think they represent a true physiologic latency for astrocyte responses to the sensory activity. Indeed, such delays appear to be slower even than those reported in the initial studies of sensory stimulation in anesthetized mice with limited spatial-temporal resolution (Wang et al. Nat Neurosci., 2006) - not to say of more recent and refined ones in awake mice (Stobart et al. Neuron, 2018) that identified even sub-second astrocyte Ca2+ responses, largely preserved in IP3R2KO mice. Thus, we are inclined to believe that the slowness of responses reported here is an indicator of experimental/analytical issues. There can be several explanations of such slowness that the authors may want to consider for improving their approach: (a) The authors apparently use low zoom imaging for acquiring signals from several astrocytes present in the FOV: do all of these astrocytes respond homogeneously in terms of delay from sensory stimulus? Perhaps some are faster responders than others and only this population is directly activated by the stimulus. Others could be slower in activation because they respond secondarily to stimuli. In this case, the authors could focus their analysis specifically on the "fast-responding population". (b) By focusing on individual astrocytes and using higher zoom, the authors could unmask more subtle Ca2+ elevations that precede those reported in the current manuscript. 
      These signals have been reported to occur mainly in regions of the astrocyte that are GCaMP6-positive but SR101-negative and constitute a large percentage of its volume (Bindocci et al., 2017). By restricting analysis to the SR101-positive part of the astrocyte, the authors might miss the fastest components of the astrocyte Ca2+ response likely representing the primary signals triggered by synaptic activity. It would be important if they could identify such signals in their records, and establish if none/few/many of them propagate to the SR101-positive part of the astrocyte. In other words, is there only a single spatial threshold, the one the authors reported, or are there two or more of them along the path of signal propagation towards the cell soma that leads eventually to the transformation of the signal into a global astrocyte Ca2+ surge?

      We thank the reviewer for these excellent and important comments. The long mean delays of astrocyte activation result from averaging together astrocyte responses to a 20-second stimulus. Astrocyte responses are heterogeneous and many astrocytes respond much more quickly, as can be seen in the example traces in Figs. 1D, 1G, and 3C. Variability exists in any biological system; here, however, we take the averaged responses in order to identify a general property of astrocyte calcium dynamics: the existence of a spatial threshold for astrocyte calcium surge. We have now included a paragraph in the Discussion section on this subject on P15, L16-22:

      “We were able to discover this general phenomenon of astrocyte physiology through the use of a novel computational tool that allowed us to combine almost 1000 astrocyte responses. Variation is rife in biological systems, and there are sure to be eccentricities within astrocyte calcium responses. Here, we focused on grouped data to better understand what appears to be an intrinsic property of astrocyte physiology. We used different statistical examinations and tested our hypothesis in vivo and in situ, and all these methods together provide a more complete picture of the existence of a spatial threshold for astrocyte calcium surge.”

      The specialized work of Stobart et al. 2018 was focused more on the fast activation of microdomain subpopulations than on the induction of later somatic activation. Indeed, Stobart et al. 2018 and Wang et al. 2006 also found that somatic responses of astrocytes were delayed in the range of seconds. Importantly, Wang et al. 2006 describe that the activation of astrocytes is frequency dependent: the higher the frequency, the faster and higher the activation. In the present work, we stimulated at just 2 Hz to better investigate the spatial threshold. Excitingly, the results shown by Stobart et al. 2018 agree with ours and with those of Rupprecht et al. 2024 and Fedotova et al. 2023: there is a sequence of activation from the domains to the somas, which could be due to the time that is required for the summation of the initial microdomain signal to reach a threshold capable of activating the soma. The studies referenced above have many similarities with our own but differ in the underlying scientific question, which led to diverging methodology. However, we want to stress that we agree with the reviewers that our methods provide sufficient evidence for the cell-scale scientific phenomenon that we are studying, which is the spatial threshold for astrocyte calcium surge. Finally, we have included an additional figure (new Figure 5) that only looks at the calcium dynamics of early responding cells and found no significant difference in the spatial threshold in this population compared to our original quantification.

      In this context, there is another concept that we encourage the authors to better clarify: whether the spatial threshold that they describe is constituted by the enlargement of a continuous wavefront of Ca2+ elevation, e.g. in a single process, that eventually reaches 22.6% of the segmented astrocyte, or can it also be constituted by several distinct Ca2+ elevations occurring in separate domains of the arbor, but overall totaling 22.6% of the segmented surface? Mechanistically, the latter would suggest the presence of a general excitability threshold of the astrocyte, whereas the former would identify a driving force threshold for the centripetal wavefront. In light of the above points, we think the authors should use caution in presenting and interpreting the experiments in which they use SIC as a readout. Their results might lead some readers to bluntly interpret the 22.6% spatial threshold as the threshold required for the astrocyte to evoke gliotransmitter release. Indeed, SIC are robust signals recorded somatically from a single neuron and likely integrate activation of many synapses all belonging to that neuron. On the other hand, an astrocyte impinges on a myriad of synapses belonging to several distinct neurons. In our opinion, it is quite possible that more local gliotransmission occurs at lower Ca2+ signal thresholds (see above) that may not be efficiently detected by using SIC as a readout; a more sensitive approach, such as the use of a gliotransmitter sensor expressed all along the astrocyte plasma-membrane could be tested to this aim.

      The reviewer raises an excellent point. Whether the spatial threshold of 22.6% must be reached by a continuous wavefront within the segmented astrocyte, or may be reached by activity in separate domains of the arbor, is an important question, and we address it with a novel analysis shown in the new figure (new Figure 5) in the revised version of the manuscript. In this new analysis, we demonstrate that the average distance between active domains is not significantly different between subthreshold activity and the activity that precedes or follows the suprathreshold cellular activation. In contrast, we do find a significant difference in the average time between domain activations between subthreshold activity and activity that precedes and follows suprathreshold activation. We go further with a generalized linear model to show that the percent area of active domains and temporal clustering, but not spatial clustering, are related to soma activation. This suggests that domain activation does not need to be spatially clustered to induce soma activation and subsequent calcium surge; rather, domain activation must exceed the spatial threshold and occur within a limited timeframe. This has been added to the Results on P10, L2-40:

      “Our results demonstrate the relationship between the percentage of active domains and soma activation and subsequent calcium surge. Next, we were interested in the spatiotemporal properties of domain activity leading up to and during calcium surge. Because we imaged groups of astrocytes, we were able to constrain our analyses to fast responders (onset < median population onset) in order to evaluate astrocytes that were more likely to respond to neuronal-evoked sensory stimulation and not nearby astrocyte activation (Figure 5A). In this population the spatial threshold was 23.8%, within the 95% confidence interval of [21.2%, 24.0%]. First, we created temporal maps, where each domain is labeled as its onset relative to soma activation, of individual astrocyte calcium responses to study the spatiotemporal profile of astrocyte calcium surge (Bindocci et al., 2017; Rupprecht et al., 2024) (Figure 5B). Using temporal maps, we quantified the spatial clustering of responding domains by measuring the average distance between active domains. We found that the average distance between active domains in subthreshold astrocyte responses was not significantly different from pre-soma suprathreshold activity (16.3 ± 0.4 µm in No-soma cells versus 16.2 ± 0.3 µm in Pre-soma cells, p = 0.75; n = 286 No-soma vs n = 326 Pre-soma, 30 populations and 3 animals; Figure 5C). Following soma activation, astrocyte calcium surge was marked with no significant change in the average distance between active domains (16.0 ± 0.3 µm in Post-soma cells versus 16.3 ± 0.4 µm in No-soma cells, p = 0.57 and 16.2 ± 0.3 µm in Pre-soma cells, p = 0.31; n = 326 soma active and n = 286 no soma active, 30 populations and 3 animals; Figure 5C). Taken together this suggests that on average domain activation happens in a nonlocal fashion that may illustrate the underlying nonlocal activation of nearby synaptic activity. 
Next, we interrogated the temporal patterning of domain activation by quantifying the average time between domain responses, and found that the average time between domain responses was significantly decreased in pre-soma suprathreshold activity compared to subthreshold activities without subsequent soma activation (9.4 ± 0.3 s in No-soma cells versus 4.4 ± 0.2 s in Pre-soma cells, p < 0.001; n = 326 soma active vs n = 286 not soma active, 30 populations and 3 animals; Figure 5D). The average time between domain activation was even less after the soma became active during calcium surge (2.1 ± 0.1 s in Post-soma versus 9.4 ± 0.3 s in No-soma cells, p < 0.001 and 4.4 ± 0.1 s in Pre-soma cells, p < 0.001; n = 326 soma active and n = 286 not soma active, 30 populations and 3 animals; Figure 5D). This corroborates our findings in Figure S2 and highlights the difference in temporal profiles between subthreshold activity and astrocyte calcium surge. 

      We then tested the contribution of each of our three variables describing domain activation (percent area, average distance and time) to elicit soma activation by creating a general linear model. We found that overall, there was a significant relationship between these variables and the soma response (p = 5.5e-114), with the percent area having the largest effect (p = 3.5e-70) followed by the average time (p = 3.6e-7), and average distance having no significant effect (p = 0.12). Taken together this suggests that the overall spatial clustering of active domains has no effect on soma activation, and that the percent area of active domains within a constrained time window has the largest effect.”
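
      The kind of model described in the excerpt above can be illustrated with a minimal sketch. The excerpt does not state the link function or the software used, so the sketch assumes a binomial model with a logit link fitted by iteratively reweighted least squares. The data are synthetic, and every name in it (fit_logistic_glm, percent_area, mean_distance_um, mean_interval_s, soma_active) is hypothetical rather than taken from the manuscript.

```python
import numpy as np

def fit_logistic_glm(X, y, n_iter=50, ridge=1e-8):
    """Fit a binomial GLM (logit link) by iteratively reweighted least squares.

    X: (n_cells, n_predictors) array without an intercept column.
    y: 0/1 array, 1 if the soma responded (assumed coding).
    Returns the coefficient vector with the intercept first.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(X)), X])
    beta = np.zeros(design.shape[1])
    for _ in range(n_iter):
        eta = np.clip(design @ beta, -30.0, 30.0)   # linear predictor
        mu = 1.0 / (1.0 + np.exp(-eta))             # predicted probability
        w = np.clip(mu * (1.0 - mu), 1e-9, None)    # IRLS weights
        z = eta + (y - mu) / w                      # working response
        xtw = design.T * w
        beta = np.linalg.solve(
            xtw @ design + ridge * np.eye(design.shape[1]), xtw @ z
        )
    return beta

# Synthetic data only: soma activation is made to depend on percent area and
# on the mean interval between domain onsets, but not on mean distance.
rng = np.random.default_rng(0)
n_cells = 600
percent_area = rng.uniform(0.0, 60.0, n_cells)
mean_distance_um = rng.normal(16.0, 3.0, n_cells)
mean_interval_s = rng.uniform(1.0, 12.0, n_cells)
logit = 0.25 * (percent_area - 22.6) - 0.3 * (mean_interval_s - 6.0)
prob = 1.0 / (1.0 + np.exp(-logit))
soma_active = (rng.uniform(size=n_cells) < prob).astype(float)

# Standardize predictors so that coefficient magnitudes are comparable.
predictors = np.column_stack([percent_area, mean_distance_um, mean_interval_s])
predictors = (predictors - predictors.mean(axis=0)) / predictors.std(axis=0)
beta = fit_logistic_glm(predictors, soma_active)
print(dict(zip(["intercept", "area", "distance", "interval"], beta.round(2))))
```

      The sketch returns coefficients only. The p-values quoted in the excerpt would require standard errors or a likelihood-ratio test, which are omitted here.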

      Regarding comments on SIC, we fully agree with the reviewer. In the revised version of the manuscript, we have included text in the discussion to ensure the correct interpretation of the results, i.e., the observed 22.6% spatial threshold for the SIC does not necessarily indicate an intrinsic property of gliotransmitter release; rather, since SICs have been shown to be calcium-dependent, it is not surprising that their presence, monitored at the whole-cell soma, matches the threshold for the intracellular calcium extension. We have added to the Discussion P16, L15-30:

      “Astrocyte calcium activity induces multiple downstream signaling cascades, such as the release of gliotransmitters (Araque et al., 2014; de Ceglia et al., 2023). Using patch-clamp recordings of a single nearby neuron we showed that a nearby population of astrocyte calcium surge is also correlated to the increase in slow inward currents (SICs), previously demonstrated to be dependent on astrocytic vesicular release of glutamate (Araque et al., 2000; Durkee et al., 2019; Fellin et al., 2004). The increase of SICs we observed from patching a single neuron is likely the integration of gliotransmitter release onto synapses from a group of nearby astrocytes. Indeed, subthreshold astrocyte calcium increases alone can trigger activity in contacted dendrites (Di Castro et al., 2011). An exciting avenue of future research would be to observe the impact of a single astrocyte calcium surge on nearby neurons (Refaeli and Goshen, 2022). How many neurons would be affected, and would this singular event be observable through patch clamp from a single neuron? The output of astrocyte calcium surge is equally important to network communication as the labeling of astrocyte calcium surge, as it identifies a biologically relevant effect onto nearby neurons. Many downstream signaling mechanisms may be activated following astrocyte calcium surge, and the effect of locally concentrated domain activity vs astrocyte calcium surge should be studied further on different astrocyte outputs.”

      Additional considerations are that the authors propose an event sequence as follows: stimulus - synaptic drive to L2/3 - arbor activation - spatial threshold - soma activation - post soma activation - gliotransmission. This seems reminiscent of the sequence underlying neuronal spike propagation - from dendrite to soma to axon, and the resulting vesicular release. However, there is no consensus within the glial field about an analogous framework for astrocytes. Thus, "arbor activation", "soma activation", and "post soma activation" are not established `terms-of-art´. Similarly, the way the authors use the term "domain" contrasts with how others have (Agarwal et al., 2017; Shigetomi et al., 2013; Di Castro et al., 2011; Grosche et al., 1999) and may produce some confusion. The authors could adopt a more flexible nomenclature or clarify that their terms do not have a defined structural-functional basis, being just constructs that they justifiably adapted to deal with the spatial complexity of astrocytes in line with their past studies (Lines et al., 2020; Lines et al., 2021).

      We agree there is no consensus within the glial field about this event sequence. One major difference between this sequence of events and neuronal spike propagation is directionality from dendrite to soma to axon. It is unknown whether directionality of the calcium signal exists in astrocytes. However, our finding in Figure 5E suggests a directionality of centripetal propagation from the arborization to the soma to elicit calcium surge that leads to centrifugal propagation. We have added the following to the Results on P10-11, L41-8:

      “Recent work studying astrocyte integration has suggested a centripetal model of astrocyte calcium, where more distal regions of the astrocyte arborization become active initially and activation flows towards the soma (Fedotova et al., 2023; Rupprecht et al., 2024). Here, we confirm this finding, where activated domains located distal from the soma respond sooner than domains more proximal to the soma (linear correlation: p < 0.05, R2 = 0.67; n = 30 populations, 3 animals; Figure 4E). Next, we build upon this result to also demonstrate that following soma activation, astrocyte calcium surge propagates outward in a centrifugal pattern, where domains proximal to the soma become active prior to distal domains (linear correlation: p < 0.01, R2 = 0.89; n = 30 populations, 3 animals; Figure 4E). Together these results detail that intracellular astrocyte calcium follows a centripetal model until the soma is activated leading to a calcium surge that flows centrifugally. This suggests that astrocytes have the capabilities to integrate the nearby local synaptic population, and if this activity exceeds the spatial threshold then it leads to a whole-cell response that spreads outward.” 

      And in the Discussion P15, L3-15:

      “Close examinations of the calcium surge uncovered distinct propagations whether before or after soma activation. Firstly, our analysis found that temporal clustering changed before and after calcium surge, with both being above subthreshold activity, and that this characteristic was absent when assessing spatial clustering. When comparing the percent area, spatial and temporal clustering of active domains using a GLM, we found that the percent area was the most significant parameter describing a threshold to soma activation. We then compared the delay of domain activation and its distance from the soma, and recreated previous results that suggest a centripetal model of astrocytic calcium responses from the distal arborizations to the soma (Fedotova et al., 2023; Rupprecht et al., 2024). Here, we went a step further and discovered that soma activation switches this directionality for astrocytic calcium surge to propagate outward in a centrifugal manner away from the soma. Taken together, these results demonstrate the integrative potential of astrocyte calcium responses and characterize further the astrocyte calcium surge to relay this to other parts of the astrocyte.”

      The term “microdomain” is used in the references above to define distal subcellular domains in contact with synapses, and in order to dissociate from this term we adopt the nomenclature “domain” to define all subcellular domains in the astrocyte arborization. These items have been discussed and clarified in the revised version of the manuscript on P5, L17-19:

      “The concept of domain to define all subcellular domains in the astrocyte arborization should not be confused with the concept of microdomain, that usually refers to the distal subcellular domains in contact with synapses.”

      Our previous points suggest that the paper would be significantly strengthened by new experimental observations focusing on single astrocytes and using acquisitions at higher spatial and temporal resolution. If the authors will not pursue this option, we encourage them to at least improve their analysis, and at the same time recognize in the text some limitations of their experimental approach as discussed above. We indicate here several levels of possible analytical refinement.

      We believe our spatial (25x objective and 1.7x digital zoom with pixels on the order of 1 µm) and temporal (2–5 Hz frame rate) resolution is within the range used in the glial field. In any case, the existence of a spatial threshold for astrocyte calcium surge is not compromised by the use of this imaging resolution.

      The first relates to the selection of astrocytes being analyzed, and the need to focus on a much narrower subpopulation than (for example) 987 astrocytes used for the core data. This selection would take into greater consideration the aspects of structure and latency. With the structural and latency-based criteria for selection, the number of astrocytes to analyze might be reduced by 10-fold or more, making our second analytical recommendation much more feasible.

      We agree that individual differences exist, however, establishing a general concept requires the sampling of many astrocytes. Nevertheless, we have included a new figure (new Figure 5) that analyzes early responders.

      For structure-based selection - Genetically-encoded Ca2+ indicators such as GCaMP6 are in principle expressed throughout an astrocyte, even in regions that are not labelled by SR101. Moreover, astrocytes form independent 3D territories, so one can safely assume that the GCaMP6 signal within an astrocyte volume belongs to that specific astrocyte (this is particularly evident if the neighboring astrocytes are GCaMP6-negative). Therefore, authors could extend their analysis of Ca2+ signals in individual astrocytes to the regions that are SR101-negative and try to better integrate fast signals in their spatial threshold concept. Even if they decided to be conservative on their methods, and stick to the astrocyte segmentation based on the SR-101 signal, they should acknowledge that SR101 dye staining quality can vary considerably between individual astrocytes within a FOV - some astrocytes will have much greater structural visibility in the distal processes than others. This means that some astrocytes may have segmented domains extending more distally than others and we think that authors should privilege such astrocytes for analysis. However, cases like the representative astrocytes shown in Figure 4A or Figure S1B, have segmented domains localized only to proximal processes near the soma. Accordingly, given the reported timing differences between "arbor" and "soma" activation, one might expect there to be comparable timing differences between domains that are distal vs proximal to the soma as well. Fast signals in peripheral regions of astrocytes in contact with synapses are largely IP3R2-independent (Stobart et al., 2018). However, the quality of SR101 staining has implications for interpreting the IP3R2 KO data. There is evidence IP3R2 KO may preferentially impact activity near the soma (Srinivasan et al., 2015). Thus, astrocytes with insufficient staining - visible only in the soma and proximal domains - might show a biased effect for IP3R2 KO. 
While not necessarily disrupting the core conclusions made by the authors based on their analysis of SR101-segmented astrocytes, we think results would be strengthened if astrocytes with sufficient SR101 staining - i.e. more consistent with previous reports of L2/3 astrocyte area (Lanjakornsiripan et al., 2018) - were only included. This could be achieved by using max or cumulative projections of individual astrocytes in combination with SR101 staining to construct more holistic structural maps (Bindocci et al., 2017).

      We agree with the ideas concerning SR101, and indeed there could be variability in the origins of the astrocyte calcium signal. Astrocyte territory boundaries can be difficult to discern when both astrocytes express GCaMP6. Also, SR101-negative domains could encapsulate an area that is only partially astrocyte territory and that also includes extracellular space. Here we take a conservative approach and constrain ROIs to SR101-positive astrocyte territory outlines, without invading neighboring cells or extracellular space, in order to reduce error in the estimate of a spatial threshold. The effect of IP3R2 KO preferentially impacting activity near the soma is interesting, and in line with our conclusions. We agree that the findings from SR101-negative pixels would not necessarily disrupt the core conclusions of the study, and the additional analysis suggested would further strengthen results. We have since included a paragraph on the limitations of the study in the Discussion P15, L31-37:

      “In this study, we chose to limit our examinations to calcium activity that was within the bounds determined by SR101 staining. Much work has shown that astrocyte territories are more akin to sponge-like morphology with small microdomains making up the end feet of their distal arborizations (Baldwin et al., 2024). Here, we took a conservative approach to not incorporate these fine morphological processes and only take SR101-positive pixels for analysis in order to reduce the possible error of including a neighboring astrocyte or extracellular space in our analyses. Much work can be done to extend these results.”

      For latency-based selection - The authors record calcium activity within a FOV containing at least 20+ astrocytes over a period of 60s, during which a 2Hz hindpaw stimulation at 2mA is applied for 20s. As discussed above, presumably some astrocytes in a FOV are the first to respond to the stimulus series, while others likely respond with longer latency to the stimulus. For the shorter-latency responders <3s, it is easier to attribute their calcium increases as "following the sensory information" projecting to L2/3. In other cases, when "arbor" responses occur at 10s or later, only after 20 stimulus events (at 2Hz), it is likely they are being activated by a more complex and recurrent circuit containing several rounds of neuron-glia crosstalk etc., which would be mechanistically distinct from astrocytes responding earlier. We suggest that authors focus more on the shorter latency response astrocytes, as they are more likely to have activity corresponding to the stimulus itself.

      We agree that differences in the timing of astrocyte calcium increases may be due to different mechanisms outside of the astrocyte. We believe the spatial threshold will be intrinsic to these external variables; yet we believe that longer latency responses are physiological and may carry important information for determining the astrocyte calcium responses. Indeed, we have performed the spatial threshold analysis on early responders (first half of responding cells), and found that the spatial threshold in that population (23.8%) is within the 95% confidence interval [21.2%, 24.0%]. The spatial threshold of the slow responders (22.6%) was also within the confidence interval.

      The second level of analysis refinement we suggest relates specifically to the issue of propagation and timing for the activity within "arbor", "soma" and "post-soma". Currently, the authors use an ROI-based approach that segments the "arbor" into domains. We suggest that this approach could be supplemented by a more robust temporal analysis. This could for example involve starting with temporal maps that take pixels above a certain amplitude and plot their timing relative to the stimulus-onset, or (better) the first active pixel of the astrocyte. This type of approach has become increasingly used (Bindocci et al., 2017; Wang et al., 2019; Ruprecht et al., 2022) and we think its use can greatly help clarify both the proposed sequence and better characterize the spatial threshold. We think this analysis should specifically address several important points:

      We agree that the creation of temporal maps from our own data would be interesting, and we provide the results of the suggested analysis within the new figure (new Figure 5) in the revised version of the manuscript. In this analysis we show that subthreshold, pre-soma and post-soma dynamics are significantly different in time. These added results of including temporal maps strengthen our claim of a spatial threshold, by quantifying the distinct temporal and spatial dynamics of domain activation before and after the spatial threshold is met (i.e. soma activation), and highlights differences in subthreshold and suprathreshold activity.
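
      As a rough illustration of the temporal-map metrics mentioned above, the following sketch computes domain onsets relative to soma onset, the average distance between active domains, and the average time between domain responses for a single astrocyte. It assumes that "average distance" means the mean pairwise Euclidean distance between domain centroids and that "average time" means the mean interval between consecutive sorted onsets. These definitions, the function names and the example values are assumptions for illustration, not the manuscript's stated procedure.

```python
import numpy as np

def relative_onsets(domain_onsets_s, soma_onset_s):
    """Temporal-map values: each domain's onset relative to soma onset (s)."""
    return np.asarray(domain_onsets_s, dtype=float) - float(soma_onset_s)

def mean_pairwise_distance(centroids_um):
    """Mean Euclidean distance over all pairs of active-domain centroids."""
    c = np.asarray(centroids_um, dtype=float)
    if len(c) < 2:
        return float("nan")
    diff = c[:, None, :] - c[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    upper = np.triu_indices(len(c), k=1)   # count each pair once
    return float(dist[upper].mean())

def mean_inter_onset_interval(onsets_s):
    """Mean time between consecutive domain onsets after sorting."""
    t = np.sort(np.asarray(onsets_s, dtype=float))
    if len(t) < 2:
        return float("nan")
    return float(np.diff(t).mean())

# Hypothetical astrocyte with three active domains (centroids in µm).
centroids = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
onsets = [2.0, 0.5, 4.5]    # seconds after stimulus onset
soma_onset = 5.0

print(relative_onsets(onsets, soma_onset))     # negative values precede the soma
print(mean_pairwise_distance(centroids))
print(mean_inter_onset_interval(onsets))
```

      Pre-soma and post-soma values would be obtained by splitting the domains on the sign of the relative onset before applying the two summary functions.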

      (1) Where/when does the astrocyte activation begin? Understanding the beginning is very important, particularly because another potential spatial threshold - preceding the one the authors describe in the paper - could gate the initial activation of more distal processes, as discussed above. This sequentially earlier spatial threshold could (for example) rely on microdomain interaction with synaptic elements and (in contrast) be IP3R2 independent (Srinivasan et al., 2015, Stobart et al., 2018). We would be interested to know whether, in a subset of astrocytes that meet the structure and latency criteria proposed above and can produce global activation, there is an initial local GCaMP6f response of a minimal size that must occur before propagation towards the soma begins. The data associated with varying stimulus parameters could potentially be useful here and reveal stimulus intensity/duration-dependent differences.

      This is a very important point. It is difficult to pinpoint the beginning of the signal, which is why we rely on the average of responses. The additional analysis we provide based on temporal maps (new Figure 5) shows a very interesting result in that there is no significant difference between the spatial clustering of, or average distance between, activated domains in subthreshold and pre-soma suprathreshold activity. This result, along with the General Linear Model, suggests that there is not another subcellular potential spatial threshold, as the activity is the same. Instead, the main difference between activity in the domains that leads to soma activation or not is the overall percentage of domains active and not necessarily how that spatial activity is organized. We have also added this point in the Discussion section to highlight the importance of this result. P15, L3-8:

      “Close examinations of the calcium surge uncovered distinct propagations whether before or after soma activation. Firstly, our analysis found that temporal clustering changed before and after calcium surge, with both being above subthreshold activity, and that this characteristic was absent when assessing spatial clustering. When comparing the percent area, spatial and temporal clustering of active domains using a GLM, we found that the percent area was the most significant parameter describing a threshold to soma activation.”

      (2) Whether the propagation in the authors' experimental model is centripetal? This is implied throughout the manuscript but never shown. We think establishing whether (or not) the calcium dynamics are centripetal is important because it would clarify whether spatially adjacent domains within the "arbor" need to be sequentially active before reaching the threshold and then reaching the soma. More broadly, visualizing propagation will help to better visualize summation, which is presumably how the threshold is first reached (and overcome).

      The alternative hypothesis of a general excitability threshold, as discussed above, would be challenged here and possibly rejected, thereby clarifying the nature of the Ca2+ process that needs to reach a threshold for further expansion to the soma and other parts of the astrocyte.

      We agree that our view is centripetal when considering activity leading up to soma activation. Indeed, we have found arborization activity precedes soma activity (Figure 3), soma activity appears to rely on the percent area of domain activity (Figure 4), and pre-soma domain activity comes online earlier in domains distal from the soma (new Figure 5). However, whether this is intrinsic or due to the fact that synapses are more likely to occur in the periphery requires further studies. Our new results in the new Figure 5 demonstrating that subthreshold activity has a spatial organization that is not significantly different than pre-soma activity in suprathreshold cases argues in favor of a general excitability threshold hypothesis. However, we do not see these hypotheses as mutually exclusive. Excitingly, we have also found that following soma activation, calcium surge appears to follow a centrifugal propagation. We have since added the topic of a centripetal-centrifugal experimental model to the Discussion P15, L8-15:

      “We then compared the delay of domain activation and its distance from the soma, and recreated previous results that suggest a centripetal model of astrocytic calcium responses from the distal arborizations to the soma (Fedotova et al., 2023; Rupprecht et al., 2024). Here, we went a step further and discovered that soma activation switches this directionality for astrocytic calcium surge to propagate outward in a centrifugal manner away from the soma. Taken together, these results demonstrate the integrative potential of astrocyte calcium responses and characterize further the astrocyte calcium surge to relay this to other parts of the astrocyte.”

      (3) In complement to the previous point: we understand that the spatial threshold does not per se have a location, but is there some spatial logic underlying the organization of active domains before the soma response occurs? One can easily imagine multiple scenarios of sparse heterogeneous GCaMP6f signal distributions that correspond to ≥22.6% of the arborization, but that would not be expected to trigger soma activation. For example, the diagram in Figure 4C showing the astrocyte response to 2Hz stim (which lacks a soma response) underscores this point. It looks like it has ≥22.6% activation that is sparsely localized throughout the arborization. If an alternative spatial distribution for this activity occurred, such that it localized primarily to a specific process within the arbor, would it be more likely to trigger a soma response?

      This is an interesting point, and our new spatiotemporal analysis in the new figure (new Figure 5) aims to shed some light on it, as described in our responses above. To our knowledge, there is no mechanism in astrocytes to impose directionality on calcium propagation, like rectifying voltage-gated sodium channels in neuronal voltage propagation. We found that the delay of domain activation compared to soma onset is significantly correlated with the distance from the soma (new Figure 5E). In addition, spatial clustering in pre-soma activity is not significantly different from that in non-responders or in post-soma activity. Together this suggests that centripetal propagation may be occurring throughout the entire cell and not in a locally clustered way. Our findings also suggest that following soma activation astrocyte calcium surge follows a mostly centrifugal pattern (new Figure 5E).
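
      The delay-versus-distance relationship referred to above can be sketched as an ordinary least-squares line fit. In this illustration, delays are expressed relative to soma onset, so a negative slope before soma activation (distal domains earlier) would be read as centripetal and a positive slope after soma activation (proximal domains earlier) as centrifugal. The sign convention, the function name and the numbers are assumptions made for the example and are not data from the study.

```python
import numpy as np

def delay_vs_distance(distance_um, delay_s):
    """Least-squares line of delay on distance; returns slope, intercept, R^2."""
    x = np.asarray(distance_um, dtype=float)
    y = np.asarray(delay_s, dtype=float)
    slope, intercept = np.polyfit(x, y, 1)
    r = np.corrcoef(x, y)[0, 1]
    return float(slope), float(intercept), float(r ** 2)

# Hypothetical binned distances from the soma (µm) and mean delays (s).
distance = np.array([5.0, 10.0, 15.0, 20.0, 25.0, 30.0])
jitter = np.array([0.10, -0.10, 0.05, -0.05, 0.10, -0.10])
pre_soma_delay = -0.2 * distance + jitter    # distal domains lead the soma
post_soma_delay = 0.1 * distance + jitter    # proximal domains follow first

slope_pre, _, r2_pre = delay_vs_distance(distance, pre_soma_delay)
slope_post, _, r2_post = delay_vs_distance(distance, post_soma_delay)
print("pre-soma slope:", round(slope_pre, 3), "R2:", round(r2_pre, 3))
print("post-soma slope:", round(slope_post, 3), "R2:", round(r2_post, 3))
```

      A slope estimate alone does not establish propagation along a single process; it only summarizes how onset time varies with distance across domains.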

      (4) Does "pre-soma" activation predict the location and onset time of "post-soma" activation? For example, are arbor domains that were part of the "pre-soma" response the first to exhibit GCaMP6f signal in the "post-soma" response?

      Please see above comments.

      Reviewer #2 (Public Review):

      Lines et al investigated the integration of calcium signals in astrocytes of the primary somatosensory cortex. Their goal was to better characterize the mechanisms that govern the spatial characteristics of calcium signals in astrocytes. In line with previous reports in the field, they found that most events originated and stayed localized within microdomains in distal astrocyte processes, occasionally coinciding with larger events in the soma, referred to as calcium surges. As a single astrocyte communicates with hundreds of thousands of synapses simultaneously, understanding the spatial integration of calcium signals in astrocytes and the mechanisms governing the latter is of tremendous importance to deepen our understanding of signal processing in the central nervous system. The authors thus aimed to unveil the properties governing the emergence of calcium surges. The main claim of this manuscript is that there would be a spatial threshold of ~23% of microdomain activation above which a calcium surge, i.e. a calcium signal that spreads to the soma, is observed. Although the study provides data that is highly valuable for the community, the conclusions of the current version of the manuscript seem a little too assertive and general compared with what can be deduced from the data and methods used.

      The major strength of this study is the experimental approach that allowed the authors to obtain numerous and informative calcium recordings in vivo in the somatosensory cortex in mice in response to sensory stimuli as well as in situ. Notably, they developed an interesting approach to modulating the number of active domains in peripheral astrocyte processes by varying the intensity of peripheral stimulation (its amplitude, frequency, or duration).

      We thank the reviewer for their kind and thoughtful review of our study.

      The major weakness of the manuscript is the method used to analyze and quantify calcium activity, which mostly relies on the analysis of averaged data and overlooks the variability of the signals measured. As a result, the main claims from the manuscript seem to be incompletely supported by the data. The choice of the use of a custom-made semi-automatic ROI-based calcium event detection algorithm rather than established state-of-the-art software, such as the event-based calcium event detection software AQuA (DOI: 10.1038/s41593-019-0492-2), is insufficiently discussed and may bias the analysis. Some references on this matter include: Semyanov et al, Nature Rev Neuro, 2020 (DOI: 10.1038/s41583-020-0361-8); Covelo et al 2022, J Mol Neurosci (DOI: 10.1007/s12031-022-02006-w) & Wang et al, 2019, Nat Neuroscience (DOI: 10.1038/s41593-019-0492-2). Moreover, the ROIs used to quantify calcium activity are based on structural imaging of astrocytes, which may not be functionally relevant.

      Unfortunately, there is no general consensus for calcium analysis in the astrocyte or neuronal field, and many groups use either software written in-house or dedicated packages such as GECIquant, STARDUST, AQuA or AQuA2. While AQuA is event-based calcium event detection software, not including inactive domains that are SR101-positive could underestimate the spatial threshold for calcium surge. Our analysis is not based on functional events but on calcium signals with structural constraints within a single astrocyte. This is crucial to properly determine the ratio of active vs inactive pixels within a single astrocyte.

      For the reasons listed above, the manuscript would probably benefit from some rephrasing of the conclusions and a discussion highlighting the advantages and limitations of the methodological approach. The question investigated by this study is of great importance in the field of neuroscience as the mechanisms dictating the spatio-temporal properties of calcium signals in astrocytes are poorly characterized, yet are essential to understand their involvement in the modulation of signal integration within neural circuits.

      We thank the reviewer for their suggestions to benefit the conclusions and discussion. We have now included a paragraph outlining the limitations of the study in the Discussion P15, L23-37:

      “The investigation of the spatial threshold could be improved in the future in a number of ways. One being the use of state-of-the-art imaging in 3D (Bindocci et al., 2017). While the original publication using 3D imaging to study astrocyte physiology does not necessarily imply that there would be different calcium dynamics in one axis over another, the three-dimensional examination of the spatial threshold could refine the findings we present here. To better control the system, mice imaged here were under anesthesia, and this is a method that has been used to characterize many foundational physiological results in the field (Hubel and Wiesel, 1962; Mountcastle et al., 1957). However, assessing the spatial threshold in awake freely moving animals would be the next logical step. In this study, we chose to limit our examinations of calcium activity that was within the bounds determined by SR101 staining. Much work has shown that astrocyte territories are more akin to sponge-like morphology with small microdomains making up the end feet of their distal arborizations (Baldwin et al., 2024). Here, we took a conservative approach to not incorporate these fine morphological processes and only take SR101-positive pixels for analysis in order to reduce the possible error of including a neighboring astrocyte or extracellular space in our analyses. Much work can be done to extend these results.”

      Reviewer #3 (Public Review):

      Summary:

      The study aims to elucidate the spatial dynamics of subcellular astrocytic calcium signaling. Specifically, they elucidate how subdomain activity above a certain spatial threshold (~23% of domains being active) heralds a calcium surge that also affects the astrocytic soma. Moreover, they demonstrate that processes on average are included earlier than the soma and that IP3R2 is necessary for calcium surges to occur. Finally, they associate calcium surges with slow inward currents.

      Strengths:

      The study addresses an interesting topic that is only partially understood. The study uses multiple methods including in vivo two-photon microscopy, acute brain slices, electrophysiology, pharmacology, and knockout models. The conclusions are strengthened by the same findings in both in vivo anesthetized mice and in brain slices.

      We thank the reviewer for the positive assessment of the study and his/her comments.

      Weaknesses:

      The method that has been used to quantify astrocytic calcium signals only analyzes what seems to be a small proportion of the total astrocytic domain on the example micrographs, where a structure is visible in the SR101 channel (see for instance Reeves et al. J. Neurosci. 2011, demonstrating to what extent SR101 outlines an astrocyte). This would potentially heavily bias the results: from the example illustrations presented it is clear that the calcium increases in what is putatively the same astrocyte go well beyond what is outlined with automatically placed small ROIs. The smallest astrocytic processes are an order of magnitude smaller than the resolution of optical imaging and would not be outlined by either SR101 or with the segmentation method judged by the ROIs presented in the figures. Completely ignoring these very large parts of the spatial domain of an astrocyte, in particular when making claims about a spatial threshold, seems inappropriate. Several recent methods published use pixel-by-pixel event-based approaches to define calcium signals. The data should have been analyzed using such a method within a complete astrocyte spatial domain in addition to the analyses presented. Also, the authors do not discuss how two-dimensional sampling of calcium signals from an astrocyte that has processes in three dimensions (see Bindocci et al, Science 2017) may affect the results: if subdomain activation is not homogeneously distributed in the three-dimensional space within the astrocyte territory, the assumptions and findings regarding a correlation between subdomain activation and somatic activation may be affected.

      In order to reduce noise from individual pixels, we chose to segment astrocyte arborizations into domains of several pixels. As pointed out previously, including pixels outside of the SR101-positive territory runs the risk of including a pixel that may be from a neighboring cell or composed mostly of extracellular space, and we chose the conservative approach to avoid this source of error. We agree that the results have limitations from being acquired in 2D instead of 3D, but it is reasonable to assume that the 3D astrocyte is homogeneously distributed and that the 2D plane is representative of the whole astrocyte. Indeed, no dimensional effects were reported in Bindocci et al, Science 2017. We have included a paragraph in the discussion to address this limitation in our study on P15, L23-27:

      “The investigation of the spatial threshold could be improved in the future in a number of ways. One being the use of state-of-the-art imaging in 3D (Bindocci et al., 2017). While the original publication using 3D imaging to study astrocyte physiology does not necessarily imply that there would be different calcium dynamics in one axis over another, the three-dimensional examination of the spatial threshold could refine the findings we present here.”

      The experiments are performed either in anesthetized mice, or in slices. The study would have come across as much more solid and interesting if at least a small set of experiments were performed also in awake mice (for instance during spontaneous behavior), given the profound effect of anesthesia on astrocytic calcium signaling and the highly invasive nature of preparing acute brain slices. The authors mention the caveat of studying anesthetized mice but claim that the intracellular machinery should remain the same. This explanation appears a bit dismissive as the response of an astrocyte not only depends on the internal machinery of the astrocyte, but also on how the astrocyte is stimulated: for instance synaptic stimulation or sensory input likely would be dependent on brain state and concurrent neuromodulatory signaling which is absent in both experimental paradigms. The discussion would have been more balanced if these aspects were dealt with more thoroughly.

      Yes, we agree that this is a limitation, and we acknowledge this in the Discussion P15, L27-31:

      “To better control the system, mice imaged here were under anesthesia, and this is a method that has been used to characterize many foundational physiological results in the field (Hubel and Wiesel, 1962; Mountcastle et al., 1957). However, assessing the spatial threshold in awake freely moving animals would be the next logical step.”

      The study uses a heaviside step function to define a spatial 'threshold' for somata either being included or not in a calcium signal. However, Fig 4E and 5D showing how the method separates the signal provide little understanding for the reader. The most informative figure that could support the main finding of the study, namely a ~23% spatial threshold for astrocyte calcium surges reaching the soma, is Fig. 4G, showing the relationship between the percentage of arborizations active and the soma calcium signal. A similar plot should have been presented in Fig 5 as well. Looking at this distribution, though, it is not clear why ~23% would be a clear threshold to separate soma involvement, one can only speculate how the threshold for a soma event would influence this number. Even if the analyses in Fig. 4H and the fact that the same threshold appears in two experimental paradigms strengthen the case, the results would have been more convincing if several types of statistical modeling describing the continuous distribution of values presented in Fig. 4E (in addition to the heaviside step function) were presented.

      We agree with the reviewer and have added to the paper a justification for our use of the Heaviside step function, which is included in the Methods section. We chose the Heaviside step function to represent the on/off situation that we observed in the data, which suggested a threshold in the biology. We agree with the reviewer that Fig. 4G is informative and demonstrates that under 23% most of the soma fluorescence values are clustered at baseline. We agree that a different statistical model describing the data would be more convincing. We therefore confirmed the spatial threshold with a confidence interval in the text, and used a general linear model to support the use of percent domains active for this threshold over other properties such as spatial or temporal clustering. P18-19, L34-2:

      “Heaviside step function

      The Heaviside step function below in equation 4 is used to mathematically model the transition from one state to the next and has been used in simple integrate and fire models (Bueno-Orovio et al., 2008; Gerstner, 2000).

      The Heaviside step function 𝐻(𝑎) is zero everywhere before the threshold area and one everywhere afterwards. In the data shown in Figure 4E, each point (𝑆(𝑎)) is an individual astrocyte response, described by its percent area of domains active (𝑎) and by whether the soma was active or not, denoted by a 1 or 0 respectively. To determine the threshold area in our data, we iteratively subtracted 𝐻(𝑎) from 𝑆(𝑎) for all possible values of the threshold to create an error term over 𝑎. The area at the minimum of that error term was denoted the threshold area.”
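
      The threshold search described in the quoted Methods text can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the 0.1% candidate grid, the example values, and the use of a summed absolute difference as the error term are all assumptions, since the text does not specify them.

```python
def heaviside(a, threshold):
    """Return 1 if a is at or above the threshold, otherwise 0."""
    return 1 if a >= threshold else 0


def fit_spatial_threshold(percent_active, soma_active, candidates=None):
    """Scan candidate thresholds and keep the one with the smallest error.

    percent_active: percent of domains active, one value per astrocyte response.
    soma_active:    1 if the soma was active in that response, else 0.
    The error term (sum of |S(a) - H(a)|) is an assumed choice.
    """
    if candidates is None:
        # Assumed grid: 0.0% to 100.0% in steps of 0.1%.
        candidates = [c / 10 for c in range(0, 1001)]

    best_threshold, best_error = None, float("inf")
    for threshold in candidates:
        error = sum(
            abs(s - heaviside(a, threshold))
            for a, s in zip(percent_active, soma_active)
        )
        # Strict "<" keeps the lowest threshold when several tie.
        if error < best_error:
            best_threshold, best_error = threshold, error
    return best_threshold, best_error


if __name__ == "__main__":
    # Hypothetical responses, not data from the study.
    percent = [5, 10, 15, 20, 30, 40, 60, 80]
    soma = [0, 0, 0, 0, 1, 1, 1, 1]
    print(fit_spatial_threshold(percent, soma))
```

      When several candidates tie, this sketch keeps the lowest threshold that reaches the minimum error; other tie-breaking rules would be equally compatible with the description.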

      The description of methods should have been considerably more thorough throughout. For instance which temperature the acute slice experiments were performed at, and whether slices were prepared in ice-cold solution, are crucial to know as these parameters heavily influence both astrocyte morphology and signaling. Moreover, no monitoring of physiological parameters (oxygen level, CO2, arterial blood gas analyses, temperature etc) of the in vivo anesthetized mice is mentioned. These aspects are critical to control for when working with acute in vivo two-photon microscopy of mice; the physiological parameters rapidly decay within a few hours with anesthesia and following surgery.

      We have made our Methods section more thorough, in particular by stating that body temperature and respiration were monitored throughout anesthesia.

      Recommendations for the authors:  

      Reviewer #1 (Recommendations For The Authors):

      (1) We think it would improve the paper if the authors provided a frame-by-frame example over (for example) 10-15 frames showing the spatiotemporal evolution of responses, where each frame represents 1s or 2s. This could be included with the temporal maps we proposed above.

      We agree that this is a useful example and have included it in our new figure (new Figure 5, specifically see Figure 5A) that uses temporal maps to analyze the spatiotemporal properties of calcium dynamics (Figure 5B).

      (2) Concerning the evidence in the present manuscript, we are not clear on what "populations" means. Can the authors clarify in methods? It is our understanding that 987 astrocytes from 30 populations from 3 mice were the source for the core data in the paper. What are the 30 populations, and how were the 987 astrocytes distributed across the populations? Are they roughly 10 FOVs per mouse? If so, please clarify roughly how far apart FOVs from the same mouse were, and how much delay between stim protocol application there was when a FOV was changed to a new FOV. Also, if for example, the 10th FOV from mouse 1 "saw" 9 rounds of stimulation before recording the response to the 10th stim round. To this point, was there any indication of response differences in populations that were recorded earlier vs later in the experimental sequence for each mouse?

      Descriptions of data will be included with the uploaded datasets following acceptance.

      (3) The description of the results on page 6 is a bit confusing for us. In lines 1-4, are the authors saying that 57.7% of astrocytes in a FOV exhibited responses within their soma and arborization, while 15.1% had responses only in arborization? If so, this is not clear to us from Figure 2C, where we count ~25 astrocytes in the FOV, maybe 8 or 9 astrocytes with activity in the arborization + soma (after stimulation), and 8 or 9 astrocytes with responses only in arborization. Is there something we do not understand, or is the second panel simply not representative of the group data?

      Figure 2D is representative of the group data and does indeed show that 57.7% of the population responds within the soma and arborization, and that 15.1% of astrocytes respond only in their arborizations. It is not possible to tell from this image whether whole arborizations are active or whether activity increases in only one or a few domains, as the latter may not be enough activity to be detected when sampling over the entire arborization.

      (4) In the second part of page 6 - when the authors apply linear regression - are they saying that there is a linear relationship between the amount (area) of activity measured in the arborization versus the soma, where populations of astrocytes with 50% activation of the arborization also tend to have 50% activation in their somas? If so, then this is not apparent by the map provided in Figure 2C, where it looks like soma activation (within the subpopulation) is 100% irrespective of the apparent activity in the arborization. This needs to be clarified. If not, and what they mean is that the probability of finding an active soma is related to the amount of activation within the arborization, this needs to be stated more clearly.

      When testing the linear relationship between somas active vs arborizations active, we find a significant linear correlation (p < 0.001, R² = 0.90).
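
      For reference, a linear relationship of this kind can be quantified with an ordinary least-squares fit. The sketch below is only an illustration under stated assumptions: the function name and the example values are hypothetical and are not the study's data, and only the slope, intercept, and R² are computed (no p-value).

```python
def linear_fit(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept, with R^2."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # For simple linear regression, R^2 equals the squared Pearson correlation.
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared


if __name__ == "__main__":
    # Hypothetical per-population percentages, not data from the study.
    arborizations_active = [10, 25, 40, 55, 70]
    somas_active = [5, 22, 35, 58, 66]
    print(linear_fit(arborizations_active, somas_active))
```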

      (5) In the experiments where stimulation duration, frequency, and intensity were varied to determine the percentage of domains that were on, it would be helpful to better understand the protocol in terms of sequence. In the methods it seems that hindpaw stimulation intensity was first pseudo-randomly varied at 2Hz for 10s, followed by pseudorandomly varied stimulation frequency and then pseudo-randomly varied duration - both at 2mA for 10s. Is this correct?

      We have since updated the methods section to better describe the experimental protocol.

      (6) In Figure 3E the alignment of the "arbor" to the somatic response is a bit misleading. The signals being averaged for the "arbor" are composed of temporally heterogeneous sources (from distal and proximal domains) and when averaged will produce an artificially slow rise time. In contrast, the averaged somatic signals are composed of much more homogenous sources (arising from a more singular event) and therefore have a sharp rise time. It would make more sense to align their kinetics relative to the stimulus onset. It would also make more sense to compare the somatic response of astrocytes to the "arbor" of astrocytes which respond rapidly vs slowly to the foot-shock.

      Aligning the responses to the stimulus onset would exacerbate the artificially slow rise time for the soma and arborization, as not all cells come online at the same time after stimulus onset.

      Reviewer #2 (Recommendations For The Authors):

      Data availability

      It seems that the data is not shared on a public repository, while it appears to be necessary according to eLife's general principles (see https://elife-rp.msubmit.net/html/eliferp_author_instructions.html#dataavailability).

      We will upload raw data to a repository upon acceptance of the manuscript.

      Data analysis

      - Why did the authors choose the heaviside step function to characterize conditions for somatic event initiation? It seems that this approach is averaging very heterogeneous data (some cells do not display somatic events even with ~50% domains active while some display somatic events with < 5 it seems).

      Please see the discussion of variability in our responses to the public reviews. We have since included more discussion on the use of the Heaviside step function in the Methods section.

      - Averaging of the data. It seems that the approach chosen to quantify calcium activity overlooks the variability of the signals measured ("Astrocyte calcium quantifications were averaged over all astrocytes of a single video and these values were used in statistical testing.", l.22-23, page 15). What is the variability of the measured features between different astrocytes? Between different animals? To what extent does this averaging strategy overlook the variability of the signals/how much information do we expect to lose? The manuscript would probably benefit from a more advanced statistical approach to analyze the data.

      Is it possible to extract information from the data that would indicate mechanisms allowing somatic activity when the percentage of domain activation was lower than the threshold? How about the opposite (i.e when no global event was triggered even when the percentage of domain activation was high)?

      We are indeed combining many diverse astrocyte responses, and we see this as a strength of the paper. Variation is a hallmark of biology, and we have added this to the discussion. In the rare cases where astrocyte somas do not come online when the percentage of active arborization is over threshold, or conversely where somas activate with little domain activation, we would say this is most likely due to imaging in 2D instead of imaging the entire 3D cell. We have also added this to our discussion.

      - Here are a few suggestions for additional analysis that might be of interest to the community:

      - Measuring calcium activity in domains depending on their distance from the soma. This would allow us to better understand the spatial integration of the signals and notably answer the following question: Does the emergence of somatic events depend on the spatial distribution of active domains? (and does a smaller domain-soma distance facilitate the emergence of a calcium surge with a lower percentage of active domains?) These measurements could be visualized with plots of xy position of the domains (domain-soma distance) = f(time) with a colormap reflecting dF/F0, for example, at different times pre- and post-somatic events. Instead of DF/F0, these plots could also display the correlation between domain activities.

      We have performed this analysis, and it is now in the new figure (new Figure 5).

      - Adding temporality to the data analysis. It seems that calcium activity is "concatenated" during the whole duration prior to the somatic event (pre-soma) and after (post-soma). However, it is unclear how long the domains remained active and how many domains were still active at the onset of the somatic event. Adding a finer temporal analysis might help answer questions such as the potential need for some degree of synchronization of domain activity to trigger calcium surges.

      It could notably be interesting to measure the level of synchrony of events as a function of their distance from the soma and to analyze how it correlates with the properties of the somatic event.

      We have now included temporal analysis of astrocyte calcium surge in our new figure (new Figure 5). While we did see examples of spatially clustered domain activation in our data, those examples usually included other non-clustered domain activities, and when including all of the active domains within an astrocyte's arborization, we found no difference in the distance between activated domains before and after soma activation, even when comparing to subthreshold domain activity.

      Experiments

      - Would it be possible to apply different levels of stimulation to a given cell in order to discriminate whether the "no-soma" cells can display somatic events when neuronal activity is enhanced?

      Increased sensory stimulation does increase soma activity (please see Lines et al., Nature Communications, 2020). An example of increased stimulation leading to somatic activation that was not present with weaker stimuli can be seen in Figure 4A-C.

      - Why choose a stimulation of 2 mA, 2 Hz for 20 sec in the experiments on IP3R2-/- mice?

      Has the same set of various stimulation protocols featured in Figure 4 been applied to IP3R2-/- mice? If so, were more domains activated as stimulation intensity (amplitude; duration, or frequency) increased? Could it trigger somatic events? This information seems necessary to be able to assert that calcium surges rely on the IP3R2 pathway.

      These experiments were not performed.

      -  Adding intermediary values of ATP pulse duration to Figure 6 (e.g. 50 ms and 75 ms) might strengthen the claim that the linear increase of SIC frequency with ATP application duration is only observed above the ~23% threshold.

      Agreed; however, these experiments were not performed.

      Minor corrections to the text and figures.

      Methods

      The reader might benefit from a little more detail regarding the analysis of calcium signals. Notably, what was the duration of the calcium recordings? Was it constant across the different conditions tested in the study? Was it different in slice experiments versus in vivo experiments? What were the durations of the pre- and post- soma recordings and their variability? Was the calcium activity normalized for each astrocyte or animal? If not, why not consider normalizing the post-stimulation activity with pre-stimulation baseline activity?

      Similarly, some information on the stimulation protocol seems to be lacking: what was the frequency and intensity of the stimulus in the experiments where stimulus duration varied? Concurrently, what were the duration and intensity when frequency varied? What were the duration and frequency when the intensity varied?

      It might be beneficial to add further information on the algorithm of the Calsee software. What is it performing? How was it tested? Why is it referred to as "semi"-automatic, i.e. what might the user be needing to do manually? The segmentation seems to be omitting some branches connecting distal ROIs to the soma (see e.g. Fig S1.E). How would this influence the analysis and results?

      Results

      - Some assessments in the manuscript seem a bit too assertive/general compared to what can be deduced from the evidence presented in the figures. It could be beneficial to the reader to rephrase the latter. Some examples are listed below:

      - "These results indicate that astrocyte responses occurred initially in the arborizations, which is consistent with the idea that synapses are likely to be accessed at the astrocyte arborization", l.11-12 page 7. The fact that the time to peak is shorter in the arborization does not necessarily mean that signals initiate there. It could be because the kinetics/pathways in those compartments are different or there could be a dilution effect in the soma. Indeed, an influx of the same amount of calcium ions in the soma vs in a small domain will not correspond to the same DF/F0 in those compartments and might thus remain undetected in the soma.

      - "Using transgenic IP3R2-/- mice, we found that the activation of type-2 IP3 receptors is necessary for the generation of astrocyte calcium surge" (page 4, line 1-2), "present data further demonstrate that IP3R2 are necessary for the propagation of astrocyte calcium surge." (l. 18-19 page 13) -> As discussed above, the evidence does not seem to be strong enough to assert that IP3R2 is necessary to trigger somatic events. The results indicate that the IP3R2 pathway seems to facilitate the emergence of somatic events. As astrocytes differ strongly in terms of morphology and expression profiles depending on physiological conditions, the conclusions of this study might only apply to the specific experimental conditions used: region studied, age of the animal, type of sensory stimuli performed, and so on.

      - "These results indicate that spatial threshold of the astrocyte calcium surge has a functional impact on gliotransmission, which have important consequences on the spatial extension of the astrocyte-neuron communication and synaptic regulation", l.41-48 page 11. Figure 6 seems to indicate a correlation between the proportion of astrocyte domains activated and the frequency of SICs. The data seems insufficient to conclude that there is a causal relationship between calcium surge in the astrocyte and gliotransmission or SIC frequency.

      - "These results indicate that, on average, subcellular calcium events located in astrocyte arborizations are related to soma activation.", page 6 l 15-16. It may be more informative to specify the correlation measured: i.e. the larger the arborization activity, the larger the percentage of active somas.
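
      The dilution argument raised in the first comment of this list can be illustrated with a short calculation. The sketch below assumes idealized spherical compartments, hypothetical radii (5 µm for a soma, 0.5 µm for a small domain), a hypothetical ion count, and no calcium buffering; none of these values come from the manuscript.

```python
import math

AVOGADRO = 6.02214076e23  # particles per mole


def sphere_volume_um3(radius_um):
    """Volume of a sphere in cubic micrometers."""
    return 4.0 / 3.0 * math.pi * radius_um ** 3


def concentration_rise_molar(n_ions, volume_um3):
    """Concentration change (mol/L) from adding n_ions to a volume.

    Ignores buffering and extrusion, so it is an upper bound on free calcium.
    """
    litres = volume_um3 * 1e-15  # 1 cubic micrometer = 1e-15 L
    return n_ions / AVOGADRO / litres


if __name__ == "__main__":
    n_ions = 10_000  # hypothetical influx, identical in both compartments
    soma = concentration_rise_molar(n_ions, sphere_volume_um3(5.0))
    domain = concentration_rise_molar(n_ions, sphere_volume_um3(0.5))
    # The ratio depends only on the cube of the radius ratio.
    print(f"domain rise / soma rise = {domain / soma:.0f}")
```

      Under these assumptions, the same influx produces a concentration change that scales with the inverse cube of the compartment radius, which is why an event visible in a small domain could stay below detection in the soma.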

      Figures

      Figure 2: Adding more details in the figure legend explaining how the different parameters are calculated might be useful to the reader. Notably, what does soma active (%) refer to?

      Figure 3: Could it be possible to add individual traces of calcium activity in the soma and arborization of individual cells to provide a glimpse of the variability of the signals measured?

      Fig4. B-C: Could it be possible to add in the legend information on the timeline between stimulation and calcium signal recording? (and the duration of the latter).

      Fig4 D-E: Why is the maximum number of active domains in panel D ~50-60% but goes up to ~100% in panel E? Could it be that plotting SEM rather than STD might misrepresent the variability in the percentage of active domains for each stimulus property?

      Fig4F: It seems that the threshold changes with the frequency of the stimulus: e.g. at 10 Hz, the threshold seems larger than 22.6%. What would that mean?

      Fig4G: - Why do some data points display a soma amplitude < 0 DF/F0 ?

      - Why choose a sigmoid fit? What are the statistics associated to the fit? Is it in accordance with the threshold of 23%? Would a linear fit provide a good fit?

      Fig5F: - It seems that a few IP3R2-/- astrocytes displayed somatic events? If so, it might be interesting to mention this in the discussion section and to speculate on why that might be. - It seems that panel 5F displays the average percentage of somas that got activated rather than the probability of somatic events.

      - Is it possible that the effect seen in domains vs arborization is due to statistical effects (as n=2450 vs 112)?

      Fig S1: Panel D legend: double labeling of the radius used for each plot might be useful, notably for colorblind readers as the colors might be hard to see.
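
      The question about SEM versus STD in the Fig4 D-E comment above comes down to the relation SEM = STD / sqrt(n): error bars drawn as SEM shrink with sample size, while STD reflects the spread of the individual values. A minimal sketch with hypothetical values (not data from the study):

```python
import math
import statistics


def std_and_sem(values):
    """Return the sample standard deviation and the standard error of the mean."""
    std = statistics.stdev(values)       # sample STD (n - 1 in the denominator)
    sem = std / math.sqrt(len(values))   # SEM shrinks as n grows
    return std, sem


if __name__ == "__main__":
    # Hypothetical percentages of active domains for one stimulus condition.
    percent_active = [10, 20, 30, 40, 50, 60, 70, 80, 90]
    std, sem = std_and_sem(percent_active)
    print(f"STD = {std:.1f}, SEM = {sem:.1f}")
```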

      Discussion

      - The Discussion section might benefit from addressing the similarity between the data presented here and previous reports of similar results, i.e. that most calcium signals in astrocytes were located in the distal processes, forming microdomains that rarely propagated to the soma. These include Bindocci et al 2017 Science (DOI: 10.1126/science.aai8185) and Georgiou et al, Science Advances, 2022 (DOI: 10.1126/sciadv.abe5371).

      Thank you for the suggestions. We have now changed portions of the Methods, Results and Discussion sections.

      Reviewer #3 (Recommendations For The Authors):

      The text could potentially be improved somewhat.

      Thank you.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this study, Maestri et al. use an integrative framework to study the evolutionary history of coronaviruses. They find that coronaviruses arose recently rather than having undergone ancient codivergences with their mammalian hosts. Furthermore, recent host switching has occurred extensively, but typically between closely related species. Humans have acted as an intermediate host, especially between bats and other mammal species.

      Strengths:

      The study draws on a range of data sources to reconstruct the history of virus-host codivergence and host switching. The analyses include various tests of robustness and evaluations through simulation.

      Weaknesses:

      The analyses are limited to a single genetic marker (RdRp) from coronaviruses, but using other sections of the genome might lead to different conclusions. The genetic marker also lacks resolution for recent divergences, which precludes the detailed examination of recent host switches. Careful and detailed reconstruction of the timescale would be helpful for clarifying the evolutionary history of coronaviruses alongside their hosts.

      The use of a single short genetic marker (the RdRp palmprint region) from coronaviruses is indeed a limitation. However, this marker is the one that is currently used for routinely delimiting operational taxonomic units in RNA viruses and reconstructing their evolutionary history (Edgar et al. 2022, see also the Serratus project; https://serratus.io/); therefore, we took the conscious decision early on to rely on this expertise. Unfortunately, this marker cannot provide robust timescale reconstructions for coronavirus evolution (previous estimates of coronavirus origin range from around 10 thousand years ago to 293 million years ago depending on modeling assumptions). Only future genomic work across Coronaviridae that will characterize multiple genetic regions with different evolutionary rates will allow us to precisely elucidate the timescale of the evolutionary history of coronaviruses alongside their hosts. In the meantime, we show here that, while the RdRp palmprint region cannot by itself resolve the precise timescale of coronavirus evolution, it strongly suggests, when used along with cophylogenetic approaches, a recent evolutionary origin in bats.

      We now further discuss these issues and the perspectives offered by future genomic work on lines 462-485.  

      Reviewer #2 (Public Review):

      Summary:

      In their study titled "Recent evolutionary origin and localized diversity hotspots of mammalian coronaviruses," authors Benoît Perez-Lamarque, Renan Maestri, Anna Zhukova, and Hélène Morlon investigate the complex evolutionary history of coronaviruses, particularly those affecting mammals, including humans. The study focuses on unraveling the evolutionary trajectory of these viruses, which have shown a high propensity for causing pandemics, as evidenced by the SARS-CoV-2 outbreak.

      The research addresses a significant gap in our understanding of the evolutionary dynamics of coronaviruses, particularly their history, patterns of host-to-host transmission, and geographical spread. These aspects are important for predicting and managing future pandemic scenarios.

      Historically, studies have employed cophylogenetic tests to explore virus-host relationships within the Coronaviridae family, often suggesting a long history of virus-host codiversification spanning millions of years. However, the team led by Perez-Lamarque proposes a novel phylogenetic framework that contrasts this traditional view. Their approach, which involves adapting gene tree-species tree reconciliation, is designed to robustly test the validity of two competing scenarios: an ancient origination and codiversification versus a more recent emergence and diversification through host switching.

      Upon applying this innovative framework to the study of coronaviruses and their mammalian hosts, the authors' findings challenge the prevailing notion of a deep evolutionary history. Instead, their results strongly support a scenario where coronaviruses have a more recent origin, likely in bat populations, followed by diversification predominantly through host-switching events. This diversification, interestingly, seems to occur preferentially within mammalian orders.

      A critical aspect of their findings is the identification of hotspots of coronavirus diversity, particularly in East Asia and Europe. These regions align with the proposed scenario of a relatively recent origin and subsequent localized host-switching events. The study also highlights the rarity of spillovers from bats to other species, yet underscores the relatively higher likelihood of such spillovers occurring towards humans, suggesting a significant role for humans as an intermediate host in the evolutionary journey of these viruses.

      The research also points out the high rates of host-switching within mammalian orders, including between humans, domesticated animals, and non-flying wild mammals.

      In conclusion, the study by Perez-Lamarque and colleagues presents an important quantitative advance in our understanding of the evolutionary history of mammalian coronaviruses. It suggests that the long-held belief in extensive virus-host codiversification may have been substantially overestimated, paving the way for a reevaluation of how we understand, predict, and potentially control the spread of these viruses.

      Strengths:

      The study is conceptually robust, and its conclusions are convincing.

      Weaknesses:

      Despite the availability of a dated host tree the authors were only able to use the "undated" model in ALE, with the dated method (which only allows time-consistent transfers) failing on their dataset (possibly due to dataset size?). Further exploration of the question would be potentially valuable.

      Our intuition is that ALE in its “dated” version does not necessarily fail on our dataset because of its size: ALE runs, but it provides unrealistic parameter estimates and is not able to output possible reconciliations, as mentioned in our Material and Methods section. We think this issue is mostly due to the absence of any pattern of codiversification: the coronavirus and mammal trees are so distinct that finding a reconciliation scenario between them with time-consistent switches is very difficult, and ALE fails to estimate an amalgamated likelihood for such an unlikely scenario. We have now run the dated version of ALE independently on the smaller alpha- and betacoronavirus datasets. It still fails on the betacoronavirus dataset. On the alphacoronavirus dataset, it does output significant reconciliations; however, transfer and loss events form the majority of events in these reconciliations, confirming that codiversification is unlikely in this clade.

      Reviewer #3 (Public Review):

      Summary:

      This work uses tools and concepts from co-phylogenetic analyses to reconstruct the evolutionary and diversification history of coronaviruses in mammals. It concludes that cross-species transmissions from bats to humans are a relatively common event (compared to bats to other species). Across all mammals, the diversification history of coronaviruses suggests that there is potential for further evolutionary diversification.

      Strengths:

      The article uses an interesting approach based on jointly looking at the extant network of coronaviruses-mammals interactions, and the phylogenetic history of both these organisms. The authors do an impressive job of explaining the challenges of reconstructing evolutionary dynamics for RNA viruses, and this helps readers appraise the relevance of their approach.

      Weaknesses:

      I remain unconvinced by the argument that sampling does not introduce substantial biases in the analyses. As the authors highlight, incomplete knowledge of the extant interactions would lead to a biased reconstruction of the diversification history. In a recent paper (Poisot et al. 2023, Patterns), we look at sampling biases in the virome of mammals and suggest that is a fairly prominent issue, that is furthermore structured by taxonomy, space, and phylogenetic position. Case in point, even for betacoronaviruses, there have been many newly confirmed hosts in recent years. For organisms that have received less intense scrutiny, I think a thorough discussion of potential gaps in data would be required (see for example Cohen et al. 2022, Nat. Comms).

      I was also surprised to see little discussion of the differences between alpha and beta coronaviruses - there is evidence that they may differ in their cross-species transmission (see Caraballo et al. 2022 Micr. Spectr.), which could call into question the relevance of treating all coronaviruses as a single, homogeneous group.

      Some of the discussions in this paper also echo previous work by e.g. Geoghegan et al. (see 2017, PLOS Pathogens), which I was surprised to not see discussed, as it is a much earlier investigation of the relative frequencies of co-divergence and host switches for different viral families, with a deep discussion of how this may structure future evolutionary dynamics.

      We totally agree that sampling biases in the virome of mammals are a prominent issue, which is why we conducted a series of sensitivity analyses to test their effect on our main conclusions. We thoroughly tested the effect of (i) the unequal sampling effort across mammalian species that have been screened and (ii) the unequal screening of mammalian species across the mammalian tree of life by subsampling the data to correct for the unequal sampling effort (see Supporting Information Text). In both cases, we still found low support for a scenario of codiversification, and we still recovered the origin in bats in East Asia, the preferential host switches within mammalian orders, and the rare spillovers from bats to humans. The robustness of our findings to sampling biases may be explained by the fact that the cophylogenetic approach we used (ALE) explicitly accounts for undersampling by assuming that all host switches involve unsampled intermediate hosts. To address the reviewer's comment, we now better underline the importance of sampling biases in our main text (see Discussion, lines 487-494) with supporting references (note that we did not find the Cohen et al. Nature Comm reference). We also better highlight our sensitivity analyses by moving them from the Supporting Information Text to the main text.
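
      As an illustration only (this is not the authors' code, and the actual subsampling scheme is the one described in their Supporting Information Text), correcting for unequal sampling effort by subsampling could look like the minimal sketch below. The input format (a list of host/virus pairs), the function name `subsample_by_host`, and the per-host cap `max_per_host` are all assumptions made for the example.

```python
import random
from collections import defaultdict


def subsample_by_host(associations, max_per_host, seed=0):
    """Keep at most `max_per_host` virus records for each host species.

    `associations` is a list of (host, virus) pairs. This input format and
    the capping rule are hypothetical, chosen only for illustration.
    """
    rng = random.Random(seed)  # seeded so that the subsample is reproducible

    # Group the virus records by host species.
    by_host = defaultdict(list)
    for host, virus in associations:
        by_host[host].append(virus)

    kept = []
    for host in sorted(by_host):
        viruses = by_host[host]
        # Heavily screened hosts are randomly thinned down to the cap;
        # hosts with few records are kept as they are.
        if len(viruses) > max_per_host:
            viruses = rng.sample(viruses, max_per_host)
        kept.extend((host, v) for v in viruses)
    return kept
```

      Repeating such a subsampling with different seeds and rerunning the downstream analysis each time is one common way to check whether conclusions depend on a few intensively screened hosts.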

      We agree that distinguishing between alpha and beta coronaviruses provides useful additional insights. We have run separate cophylogenetic analyses for these two sub-clades and now report the results of these additional analyses in the revised manuscript, and put them in context with the existing literature about the two sub-clades.

      We were not aware of the work of Geoghegan et al. (see 2017, PLOS Pathogens), thank you for providing this reference that is now cited. 

      Reviewer #1 (Recommendations For The Authors):

      (1) Overall I found this paper to be quite difficult to follow. The text needs clearer structure, which can be helped by writing in shorter paragraphs and adding section headings. For example, there are some very long paragraphs starting on L83, L176, L215, L511, and L598.

      We have now added section headings and divided these paragraphs into smaller ones.

      (2) It would be helpful to define some of the key terminology relating to the evolutionary interactions between the viruses and their hosts. Some of the terms that are typically used in the context include "coevolution", "cospeciation", "codivergence", and "codiversification". These have different meanings and need to be used carefully. The paper mostly deals with "codivergence" between coronaviruses and their host species.

      We now provide a list of definitions in Box S1. These definitions are as in our recent article clarifying the differences between these patterns/processes (Perez-Lamarque & Morlon 2024).

      Specific comments

      L83-L105: This paragraph can be written more concisely.

      We prefer to keep this paragraph like this as it contains key explanations that are necessary for understanding our approach and results.  

      Figure 1: The timescales of the trees are rather confusing. The different scales are indicated by the gray shading but this is easy to overlook. Maybe stretching or compressing the trees horizontally would help to emphasise the different timescales.

      Done.

      Figure 2: Note that the maximum clade credibility tree is a specific tree sampled from the posterior distribution - it is not a consensus tree. In the figure caption, the meaning of "location" is unclear.

      We have removed the word “consensus”, thank you for noting this. We have replaced “location” by “branching order”. 

      L461: How was the model chosen, and why were different models used in the BEAST and PhyloBayes analyses?

      We did our PhyloBayes analyses first and used the LG model following methodology outlined in previous studies using ALE (e.g. Groussin et al. 2017; Dorrell et al. 2021). Unfortunately, the LG model is not available in the default version of BEAST2 so we had to use a different model (the WAG model). We have now run BEAST2 with the LG model (thanks to the BEAST_CLASSIC package) and we obtained very similar results (see Figure below showing the BEAST consensus trees obtained with the WAG or LG models – they only slightly differ by the branching of the u7351 OTU). We have now added this information in the Methods section. 

      Author response image 1.

      L477: It is not clear to me how the PhyloBayes and BEAST analyses differ. Please expand the explanation of why PhyloBayes was used here.

      We have now clarified this (lines 594-597). 

      L568: Why not test explicitly for recombination?

      We did test for the occurrence of recombination using several approaches, including OpenRDP (https://github.com/PoonLab/OpenRDP), our own custom code, and Gubbins (Croucher et al. 2015). These tests were however inconclusive, indicating either the absence or presence of recombination, thus suggesting that the palmprint region is too short to infer anything about recombination. We thus do not exclude the possibility that recombination occurred, and test the robustness of our results to recombination by running our analyses on different sub-parts of the palmprint region. We have clarified this in our Material & Methods.

      L618: "DNA sequences" -> "RNA sequences"

      Done.

      The paper contains numerous minor grammatical errors and would benefit from careful proofreading and editing. Please check the use of plurals and apostrophes. Some of the errors are listed below:

      L49: "As several" -> "As with several"

      Done.

      L178: "reconciliates" -> "reconciles"?

      Done.

      L199: "extent" -> "extant"

      Done.

      L289: This sentence needs rephrasing to avoid a triple negative ("cannot ... reject ... not present")

      Done.

      L469: "temporary" -> "temporal"

      Done.

      L470: "neglectable" -> "negligible"

      Done.

      L577: "not only relying" -> "not relying only"

      Done.

      Reviewer #2 (Recommendations For The Authors):

      The study is generally well-constructed and its results are convincing. However, considering the availability of a dated host tree, conducting a dated reconciliation analysis could be beneficial. Creating a smaller sub-dataset and performing a dated reconciliation analysis would likely be a valuable addition to the research.

      We have now run the dated version of ALE on both the alpha and betacoronaviruses subclades. ALE dated still does not output reconciliations on the betacoronaviruses dataset, but it does on the smaller alphacoronaviruses dataset. We found significant reconciliations, indicating that mammal-alphacoronavirus associations are not random with respect to phylogeny, but the reconciliations involved more host switch and loss events (38 switches + 29 losses) than cospeciation events (65), indicating cophylogenetic signal in the absence of phylogenetic congruence (Perez-Lamarque & Morlon 2024). We now present the results on lines 264-282.  

      Reviewer #3 (Recommendations For The Authors):

      I think the results are written in a very speculative way, with many sentence fragments that should really be part of the discussion.

      We have carefully checked our Results section and rephrased or removed formulation that may have been perceived as speculative.  

      There are a lot of considerations in this manuscript about spread and future pandemics, but I think this is very far from the topic of this paper. When we quantified the coevolutionary risk of bats-betacovs in a recent paper (Forero et al. 2024, Virus Evol.), we only briefly touched upon this discussion because we compared our outputs with a measure of human population density. I don't think the manuscript needs to talk about epidemiology at all, and it would probably be more useful as a purely evo-bio piece.

      We think that it is useful to discuss the potential implications of our results for future pandemics, even though we agree that this discussion is rather speculative. We have removed the mention of predictions in the Abstract and have softened our wording in the Discussion.  

      References:

      Croucher, N.J., Page, A.J., Connor, T.R., Delaney, A.J., Keane, J.A., Bentley, S.D., et al. (2015). Rapid phylogenetic analysis of large samples of recombinant bacterial whole genome sequences using Gubbins. Nucleic Acids Res., 43, e15.

      Dorrell, R.G., Villain, A., Perez-Lamarque, B., Audren de Kerdrel, G., McCallum, G., Watson, A.K., et al. (2021). Phylogenomic fingerprinting of tempo and functions of horizontal gene transfer within ochrophytes. Proc. Natl. Acad. Sci., 118, e2009974118.

      Edgar, R.C. et al. (2022). Petabase-scale sequence alignment catalyses viral discovery. Nature 602, 142–147.

      Groussin, M., Mazel, F., Sanders, J.G., Smillie, C.S., Lavergne, S., Thuiller, W., et al. (2017). Unraveling the processes shaping mammalian gut microbiomes over evolutionary time. Nat. Commun., 8, 14319.

      Perez-Lamarque, B. & Morlon, H. (2024). Distinguishing cophylogenetic signal from phylogenetic congruence clarifies the interplay between evolutionary history and species interactions. Syst. Biol.

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer 1 (Public Review):

      Thank you for the helpful comments. Below, we have quoted the relevant sections from the revised manuscript as we respond to the reviewer’s comments item-by-item.

      Weaknesses:

      While the task design in this study is intentionally stimulus-rich and places a minimal constraint on the animal to preserve naturalistic behavior, this is, unfortunately, a double-edged sword, as it also introduces additional variables that confound some of the neural analysis. Because of this, a general weakness of the study is a lack of clear interpretability of the task variable neural correlates. This is a limitation of the task, which includes many naturally correlated variables - however, I think with some additional analyses, the authors could strengthen some of their core arguments and significantly improve clarity.

      We acknowledge the weakness and have included additional analyses to compensate for it. The details are as follows in our reply to the subsequent comments.  

      For example, the authors argue, based on an ANN decoding analysis (Figure 2b), that PFC neurons encode spatial information - but the spatial coordinate that they decode (the distance to the active foraging zone) is itself confounded by the fact that animals exhibit different behavior in different sections of the arena. From the way the data are presented, it is difficult to tell whether the decoder performance reflects a true neural correlate of distance, or whether it is driven by behavior-associated activity that is evoked by different behaviors in different parts of the arena. The authors' claim that PFC neurons encode spatial information could be substantiated with a more careful analysis of single-neuron responses to supplement the decoder analysis. For example, 1) They could show examples of single neurons that are active at some constant distance away from the foraging site, regardless of animal behavior, and 2) They could quantify how many neurons are significantly spatially modulated, controlling for correlates of behavior events. One possible approach to disambiguate this confound could be to use regression-based models of neuron spiking to quantify variance in neuron activity that is explained by spatial features, behavioral features, or both.

      First of all, we would like to point out that while the recording was made during naturalistic foraging with minimal behavioral constraints, a well-trained rat displayed an almost fixed sequence of actions within each zone. The behavioral repertoires performed in the zones were very different from each other: exploratory behaviors in the N-zone, navigating back and forth in the F-zone, and licking sucrose while avoiding attacks in the E-zone. Therefore, the entire arena is divided not only by geographical features but also by the distinct set of behaviors performed in each zone. This is evident in the data showing a higher decoding accuracy of spatial distance in the F-zone than in the N- or E-zone. In this sense, the heterogeneous encoding reflects the heterogeneous distribution of dominant behaviors (navigation in the F-zone and attack avoidance while foraging in the E-zone) and hence corroborates the reviewer’s comment at a macroscopic scale encompassing the entire arena.

      Having said that, the more critical question is whether the neural activity is more correlated with moment-to-moment microscopic behaviors than with the location decoded in the F-zone. As the reviewer suggested, the first step is to analyze single-neuron activity to identify whether direct neural correlates of location exist. To this end, traditional place maps were constructed for individual neurons. Most neurons did not show cohesive place fields across different regions, indicating little-to-no direct place coding by individual neurons. Only a few neurons displayed recognizable place fields in a consistent manner. However, even these place fields were irregular and patchy, and therefore not comparable to those of the place cells or grid cells found in the hippocampus or entorhinal cortex. Some example firing maps have been added to Figure 2 and characterized in the text as below.

      “To determine whether location-specific neural activity exists at the single-cell level in our mPFC data, a traditional place map was constructed for individual neurons. Although most neurons did not show cohesive place fields across different regions in the arena, a few neurons modulated their firing rates based on the rat’s current location. However, even these neurons were not comparable to place cells in the hippocampus (O’Keefe & Dostrovsky, 1971) or grid cells in the entorhinal cortex (Hafting et al., 2005) as the place fields were patchy and irregular in some cases (Figure 2B; Units 66 and 125) or too large, spanning the entire zone rather than a discrete location within it (Units 26 and 56). The latter type of neuron has been identified in other studies (e.g., Kaefer et al., 2020).”

      Next, to verify whether the location decoding reflects neuronal activity due to external features or a particular type of action, the predicted location was compared between the opposite directions within the F-zone, inbound and outbound in reference to the goal area (Lobsterbot). If the encoding were specifically tied to a particular action or environmental stimulus, there should be a discrepancy when the ANN decoder trained on the outbound trajectory is tested on the inbound path, and vice versa. However, the results showed no significant difference between the two trajectories, suggesting that the decoded distance was not simply reflecting neural responses to location-specific activities or environmental cues during navigation.

      “To determine whether the accuracy of the regressor varied depending on the direction of movement, we compared the decoding accuracy of the regressor for outbound (from the N- to E-zone) vs. inbound (from the E- to N- zone) navigation within the F-zone. There was no significant difference in decoding accuracy between outbound vs. inbound trips (paired t-test; t(39) = 1.52, p =.136), indicating that the stability of spatial encoding was maintained regardless of the moving direction or perceived context (Figure 2E).”
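
      For readers unfamiliar with the statistic quoted above, the following is a minimal sketch of a paired t-test written from the textbook definition; it is not the authors' analysis code. A paired test on n pairs has n - 1 degrees of freedom, so t(39) corresponds to 40 paired values. The function name `paired_t` and the toy numbers in the usage note are assumptions for illustration.

```python
import math


def paired_t(x, y):
    """Return (t, df) for a paired t-test on equal-length samples x and y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")

    # The test operates on the within-pair differences only.
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    mean = sum(diffs) / n

    # Unbiased sample variance of the differences (divides by n - 1).
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)

    # Standard error of the mean difference. If every difference is
    # identical, se is zero and the division below raises ZeroDivisionError.
    se = math.sqrt(var / n)
    return mean / se, n - 1
```

      For example, `paired_t([1, 2, 3, 4], [0, 2, 1, 3])` works on the differences 1, 0, 2, 1 and returns a t statistic equal to the square root of 6 with 3 degrees of freedom. Converting t to a p-value additionally requires the Student t distribution, which is left out of this sketch.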

      Additionally, we applied the same regression analysis to a subset of data that were recorded while the door to the robot compartment was closed during the Lobsterbot sessions. This way, it is possible to test the decoding accuracy when the most salient spatial feature, the Lobsterbot, is blocked out of sight. The subset represents an average of 38.92% of the entire session. Interestingly, the decoding accuracy with the subset of data was higher than that with the entire dataset, indicating that the neural activities were not driven by a single salient landmark. This finding supports our conclusion that the location information can be decoded from a population of neurons rather than from individual neurons that are associated with environmental or proprioceptive cues. We have added the following description of results in the manuscript.

      “Previous analyses indicated that the distance regressor performed robustly regardless of movement direction, but there is a possibility that the decoder detects visual cues or behaviors specific to the E-zone. For example, neural activity related to Lobsterbot confrontation or licking behavior might be used by the regressor to decode distance. To rule out this possibility, we analyzed a subset of data collected when the compartment door was closed, preventing visual access to the Lobsterbot and sucrose port and limiting active foraging behavior. The regressor trained on this subset still decoded distance with a MAE of 12.14 (± 3.046) cm (paired t-test; t(39) = 12.17, p <.001). Notably, the regressor's performance was significantly higher with this subset than with the full dataset (paired t-test; t(39) = 9.895, p <.001).”

      As for the comment on “using regression-based models of neuron spiking to quantify variance in neuron activity that is explained by spatial features, behavioral features, or both”, it is difficult to isolate a particular behavioral event, let alone timestamp it, since the rat’s location was being monitored in a constantly moving, naturalistic stream of behaviors. However, as mentioned above, a new section entitled “Overlapping populations of mPFC neurons adaptively encode spatial information and defensive decision” argues against a single-neuron-based account by performing a feature importance analysis. The results showed that even when the top 20% of the most informative neurons were excluded, the remaining neural population could still decode both distance and events. This analysis supports the idea of a population-wide mode shift rather than distinct subgroups of neurons specialized in processing different sensory or motor events. This idea is also expressed in the schematic diagrams featured in Figure 8 of the revision.
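
      The exclusion step described above can be sketched as follows. This is a hypothetical illustration, not the authors' pipeline: how the importance scores are obtained and how the decoder is retrained are not shown, and the function name `drop_most_informative` is an assumption.

```python
def drop_most_informative(importances, fraction=0.2):
    """Return the indices of units kept after removing the top `fraction`.

    `importances` holds one importance score per unit (higher means more
    informative). How the scores are computed is outside this sketch.
    """
    n = len(importances)
    n_drop = int(n * fraction)  # number of units to exclude, rounded down

    # Rank unit indices from most to least informative.
    ranked = sorted(range(n), key=lambda i: importances[i], reverse=True)
    dropped = set(ranked[:n_drop])

    # Preserve the original unit order among the remaining units.
    return [i for i in range(n) if i not in dropped]
```

      A decoder retrained only on the returned units can then be compared with the full-population decoder; similar accuracy would indicate that the information is distributed rather than carried by a few units.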

      To substantiate the claim that PFC neurons really switch between different coding "modes," the authors could include a version of this analysis where they have regressed out, or otherwise controlled for, these confounds. Otherwise, the claim that the authors have identified "distinctively different states of ensemble activity," as opposed to simple coding of salient task features, seems premature.

      A key argument in our study is that the mPFC neurons encode different abstract internal representations (distance and avoidance decision) at the population level. This has been emphasized in the revision with additional analyses and discussions. Most of all, we performed single-neuron-based analyses for both spatial encoding (place fields for individual neurons) and avoidance decision (PETHs for head entry and head withdrawal) and contrasted the results with the population analysis. Although some individual neurons displayed fractured “place cell-like” activity, and some others showed modulated firing at the head-entry and the head-withdrawal events, the ensemble decoding extracted distance information for the current location of the animal at a much higher accuracy. Furthermore, the PCA identified abstract feature dimensions, especially regarding the activity in the E-zone, that cannot be attributed to a small number of sensory- or motor-related neurons.

      To mitigate the possibility that the PCA is driven primarily by a small subset of units responsive to salient behavioral events, we also applied PCA to the dataset excluding the activity in the 2-second time window surrounding the head entry and withdrawal. While this approach does not eliminate all cue- or behavior-related activity within the E-zone, it does remove the neural activity associated with emotionally significant events, such as entry into the E-zone, the first drop of sucrose, head withdrawal, and the attack. Even without these events, the PC identified in the E-zone was still separated from those in the F-zone and N-zone. This result again argues in support of distinct states of ensemble activity formed in accordance with different categories of behaviors performed in different zones. Finally, the Naïve Bayesian classifier trained with ensemble activity in the E-zone was able to predict the success and failure of avoidance that occur a few seconds later, indicating that the same population of neurons is encoding the avoidance decision rather than the location of the animal.

      Reviewer 1 (Recommendations):

      The authors include an analysis (Figure 4) of population responses using PCA on session-wide data, which they use to support the claim that PFC neurons encode distinctive neural states, particularly between the encounter zone and nesting/foraging zones. However, because the encounter zone contains unique stimulus and task events (sucrose, threat, etc.), and the samples for PCA are drawn from the entire dataset (including during these events), it seems likely that the Euclidean distance measures analyzed in Figure 4b are driven mostly by the neural correlates of these events rather than some more general change in "state" of PFC dynamics. This does not invalidate this analysis but renders it potentially redundant with the single neuron results shown in Figure 5 - and I think the interpretation of this as supporting a state transition in the coding scheme is somewhat misleading. The authors may consider performing a PCA/population vector analysis on the subset of timepoints that do not contain unique behavior events, rather than on session-wide data, or otherwise equalizing samples that correspond to behavioral events in different zones. Observing a difference in PC-projected population vectors drawn from samples that are not contaminated by unique encounter-related events would substantiate the idea that there is a general shift in neural activity that is more related to the change in context or goal state, and less directly to the distinguishing events themselves.

      Thank you for the comments. Indeed, this is a recurring theme where the reviewers expressed concerns and doubts about heterogeneous encoding of different functional modes. In addition to the systematic presentation of the results in the manuscript, from PETH to ANN and to Bayesian classifier, we argue that the activity of the mPFC neurons is better represented by the population than by a loose collection of stimulus- or event-related neurons.

      The PCA results that we included as evidence of distinct functional separation might reflect activities driven by a small number of event-coding neurons in different zones. As mentioned in the public review, we conducted the same analysis on a subset of data that excluded neural activity potentially influenced by significant events in the E-zone. The critical times are defined as ± 1 second from these events and excluded from the neural data. Despite these exclusions, the results continued to show population-level differences between zones, reinforcing the notion that neurons encode abstract behavioral states (decision to avoid or stay) without the sensory- or motor-related activity. Although this analysis does not completely eliminate all possible confounding factors emerging in different external and internal contexts, it provides extra support for the population-level switch occurring in different zones.
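
      The exclusion of critical times can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' code: time bins are represented by their center times in seconds, a bin is excluded when its center lies within the half-window of any event (boundaries included), and the function name `keep_mask` is hypothetical.

```python
from bisect import bisect_left


def keep_mask(bin_times, event_times, half_window=1.0):
    """Return one boolean per time bin: True if the bin should be kept.

    A bin is dropped when its time lies within `half_window` seconds of
    any event (a +/- 1 s window by default, boundaries included).
    """
    events = sorted(event_times)
    mask = []
    for t in bin_times:
        i = bisect_left(events, t)  # index of the first event at or after t
        near = False
        # Only the nearest event on each side can fall inside the window.
        if i < len(events) and events[i] - t <= half_window:
            near = True
        if i > 0 and t - events[i - 1] <= half_window:
            near = True
        mask.append(not near)
    return mask
```

      The resulting mask would be applied to the rows of the binned population activity before the PCA, so that only bins outside the event windows contribute to the components.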

      In Figure 7, the authors include a schematic that suggests that the number of neurons representing spatial information increases in the foraging zone, and that they overlap substantially with neurons representing behaviors in the encounter zone, such as withdrawal. They show in Figure 3 that location decoding is better in the foraging zone, but I could not find any explicit analysis of single-neuron correlates of spatial information as suggested in the schematic. Is there a formal analysis that lends support to this idea? It would be simple, and informative, to include a quantification of the fraction of spatial- and behavior-modulated neurons in each zone to see if changes in location coding are really driven by "larger" population representations. Also, the authors could quantify the overlap between spatial- and behavior-modulated neurons in the encounter zone to explicitly test whether neurons "switch" their coding scheme.

      Figure 7 (now Figure 8) has been completely revised. The schematic diagram is modified to show spatial and avoidance decision encoding by the overlapping population of mPFC neurons (Figure 8a). Most notably, there are very few neurons that encode location but not the avoidance decision or vice versa. This is indicated by the differently colored units in F-zone vs. E-zone. The model also includes units that are “not” engaged in any type of encoding or engaged in only one type of encoding, although they are not the majority.

      We have also added a schematic for hypothetical switching mechanisms (Figure 8b) to describe the conceptual scheme for the initiation of encoding-mode switching (sensory-driven vs. arbitrator-driven process).

      “Two main hypotheses could explain this switch. A bottom-up hypothesis suggests sensory inputs or upstream signals dictate encoding priorities, while a top-down hypothesis proposes that an internal or external “arbitrator” selects the encoding mode and coordinates the relevant information (Figure 8B). Although the current study is only a first step toward finding the regulatory mechanism behind this switch, our control experiment, where rats reverted to a simple shuttling task, provides evidence that might favor the top-down hypothesis. The absence of the Lobsterbot degraded spatial encoding rather than enhancing it, indicating that simply reducing the task demand is not sufficient to activate one particular type of encoding mode over another. The arbitrator hypothesis asserts that the mPFC neurons are called on to encode heterogeneous information when the task demand is high and requires behavioral coordination beyond automatic, stimulus-driven execution. Future studies incorporating multiple simultaneous tasks and carefully controlling contextual variables could help determine whether these functional shifts are governed by top-down processes involving specific neural arbitrators or by bottom-up signals.”

      Related to this difference in location coding throughout the environment, the authors suggest in Figure 3a-b that location coding is better in the foraging zone compared to the nest or encounter zones, evidenced by better decoder performance (smaller error) in the foraging zone (Figure 3b). The authors use the same proportion of data from the three zones for setting up training/test sets for cross-validation, but it seems likely that overall, there are substantially more samples from the foraging zone compared to the other two zones, as the animal traverses this section frequently, and whenever it moves from the nest into the encounter zone (based on the video). What does the actual heatmap of animal location look like? And, if the data are down-sampled such that each section contributes the same proportion of samples to decoder training, does the error landscape still show better performance in the foraging zone? It is important to disambiguate the effects of uneven sampling from true biological differences in neural activity.

      Thank you for the comment. We agree with the concern regarding uneven data size from different sections of the arena. Indeed, as the heatmap below indicates, the rats spent most of their time in two critical locations, one being a transition area between the N- and F-zones and the other near the sucrose port. This imbalance needs to be corrected. In fact, the manuscript already includes a procedure to correct this sampling bias. The Results section “Non-navigational behavior reduces the accuracy of decoded location” reports the following.

      Author response image 1.

      Heatmap of the animal’s position during one example session. (Left) Unprocessed occupancy plot. Each dot represents 0.2 seconds. (Right) Smoothed occupancy plot using a Gaussian filter (sigma: 10 pixels, filter size: 1001 pixels). The white line indicates a 10 cm length.

      “To correct for the unequal distribution of location visits (more visits to the F- than to other zones), the regressor was trained using a subset of the original data, which was equalized for the data size per distance range (see Materials and Methods). Despite the correction, there was a significant main effect of the zone (F(1.16, 45.43) = 119.2, p <.001) and the post hoc results showed that the MAEs in the N-zone (19.52 ± 4.46 cm; t(39) = 10.45; p <.001) and the E-zone (26.13 ± 7.57 cm; t(39) = 11.40; p <.001) had significantly higher errors when compared to the F-zone (14.10 ± 1.64 cm).”

      Also in the method section, we have stated that:

      “In the dataset adjusted for uneven location visits, we divided distance values into five equally sized bins. Then, a sub-dataset was created that contains an equal number of data points for each of these bins.”
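
      The balancing step quoted above can be sketched in a few lines. This is a minimal illustration rather than the authors' code: the function name `balance_by_distance`, the equal-width reading of "equally sized bins", the random draw within each bin, and the toy data are all assumptions made for the example.

```python
import random
from collections import defaultdict


def balance_by_distance(distances, n_bins=5, seed=0):
    """Return sample indices with the same number of samples in every distance bin.

    Assumption: bins are equal-width over the observed distance range.
    """
    lo, hi = min(distances), max(distances)
    if hi == lo:
        raise ValueError("all distances are identical; cannot form bins")
    width = (hi - lo) / n_bins

    bins = defaultdict(list)
    for i, d in enumerate(distances):
        # Clamp so that the maximum distance falls into the last bin.
        b = min(int((d - lo) / width), n_bins - 1)
        bins[b].append(i)

    # Every bin is cut down to the size of the least-populated non-empty bin.
    n_keep = min(len(idx) for idx in bins.values())
    rng = random.Random(seed)
    kept = []
    for b in sorted(bins):
        kept.extend(rng.sample(bins[b], n_keep))
    return sorted(kept)


# Toy data: evenly spread distances plus 500 extra samples piled up at 50 cm,
# mimicking an over-visited part of the arena.
distances = [i * 0.1 for i in range(1000)] + [50.0] * 500
balanced_idx = balance_by_distance(distances)
print(len(distances), "->", len(balanced_idx), "samples after balancing")
```

      The decoder would then be trained only on the rows selected by `balanced_idx`, so that no distance range dominates the fit.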

      Why do the authors choose to use a multi-layer neural network (Figure 2b-c) to decode the animal's distance to the encounter zone?(…) The authors may consider also showing an analysis using simple regression, or maybe something like an SVM, in addition to the ANN approach.

      We began with a simple linear regression model and progressed to more advanced methods, including SVM and multi-layer neural networks. As shown below, simpler methods could decode distance to some extent, but neural networks and random forest regressors outperformed others (Neural Network: 16.61 cm ± 3.673; Linear Regression: 19.85 cm ± 2.528; Quadratic Regression: 18.68 cm ± 4.674; SVM: 18.88 cm ± 2.676; Random Forest: 13.59 cm ± 3.174).

      We chose the neural network model for two main reasons: (1) previous studies demonstrated its superior performance compared to Bayesian regressors commonly used for decoding neural ensembles, and (2) its generalizability and robustness against noisy data. Although the random forest regressor achieved the lowest decoding error, we avoided using it due to its tendency to overfit and its limited generalization to unseen data.

      Overall, we expect similar results with other regressors but with different statistical power for decoding accuracy. We also speculate that the neural network’s use of multiple nodes contributes to robustness against noise from single-unit recordings and enables the network to capture distributed processing within neural ensembles.

      In Figure 6c, the authors show a prediction of withdrawal behavior based on neural activity seconds before the behavior occurs. This is potentially very interesting, as it suggests that something about the state of neural dynamics in PFC is potentially related to the propensity to withdraw, or to the preparation of this behavior. However, another possibility is that the animal behaves differently, in more subtle ways, while it is anticipating threat and preparing withdrawal behavior - since PFC neurons are correlated with behavior, this could explain decoder performance before the withdrawal behavior occurs. To rule out this possibility, it would be useful to analyze how well, and how early, withdrawal success can be decoded only on the basis of behavioral features from the video, and then to compare this with the time course of the neural decoder. Another approach might be to decode the behavior on the basis of video data as well as neural data, and using a model comparison, measure whether inclusion of neural features significantly increases decoder performance.

      We appreciate this important point, as mPFC activity might indeed reflect motor preparation preceding withdrawal behavior. Another reviewer raised a similar concern regarding potential micro-behavioral influences on mPFC activity prior to withdrawal responses. However, our behavioral analysis suggests that highly trained rats engage in sucrose licking, which shows little variability regardless of the subsequent behavioral decision. In support of this, 95% of inter-lick intervals were less than 0.25 seconds, which is not enough time to perform any additional behavior during encounters.

      Author response image 2.

      To further clarify this, we included an additional video showing both avoidance and escape withdrawals at close range. This video was recorded during the development of the behavioral paradigm, though we did not routinely collect this view, as animals consistently exhibited stable licking behavior in the E-zone. As demonstrated in the video, the rat remains highly focused on the lick port with minimal body movement during encounters. Therefore, we believe that the neural ensemble dynamics observed in the mPFC are unlikely to be driven by micro-behavioral changes.

      Reviewer 2 (Public Review):

      Thank you for the positive comment on our behavior paradigm and constructive suggestions on additional analysis. We came to think that the role of mPFC could be better portrayed as representing and switching between different encoding targets under different contexts, which in part, was more clearly manifested by the naturalistic behavioral paradigm. In the revision we tried to convey this message more explicitly and provide a new perspective for this important aspect of mPFC function.

      It is not clear what proportion of each of the ensembles recorded is necessary for decoding distance from the threat, and whether it is these same neurons that directly 'switch' to responding to head entry or withdrawal in the encounter phase within the total population. The PCA gets closest to answering this question by demonstrating that activity during the encounter is different from activity in the nesting or foraging zones, but in principle this could be achieved by neurons or ensembles that did not encode spatial parameters. The population analyses are focused on neurons sensitive to behaviours relating to the threat encounter, but even before dividing into subtypes etc., this is at most half of the recorded population.

      In our study, the key idea we aim to convey is that mPFC neurons adapt their encoding schemes based on the context or functional needs of the ongoing task. Other reviewers also suggested strengthening the evidence that the same neurons directly switch between encoding two different tasks. The counteracting hypothesis to "switching functions within the same neurons" posits that there are dedicated subsets of neurons that modulate behavior—either by driving decisions/behaviors themselves or being driven by computations from other brain regions.

      To test this idea, we included an additional analysis chapter in the results section titled Overlapping populations of mPFC neurons adaptively encode spatial information and defensive decision. In this section, we directly tested this hypothesis by examining each neuron's contribution to the distance regressor and the event classifier. The results showed that the histogram of feature importance—the contribution to each task—is highly skewed towards zero for both decoders, and removing neurons with high feature importance does not impair the decoder’s performance. These findings suggest that 1) there is no direct division among neurons involved in the two tasks, and 2) information about spatial/defensive behavior is distributed across neurons.

      Furthermore, we tested whether there is a negative correlation between the feature importance of spatial encoding and avoidance encoding. Even if there were no “key neurons” that transmit a significant amount of information about either spatial or defensive behavior, it is still possible that neurons with higher information in the navigation context might carry less information in the active-foraging context, or vice versa. However, we did not observe such a trend, suggesting that mPFC neurons do not exhibit a preference for encoding one type of information over the other.

      Lastly, another reviewer raised the concern that the PCA results, which we used as evidence of functional separation of different ensemble functions, might be driven by a small number of event-coding neurons. To address this, we conducted the same analysis on a subset of data that excluded neural activity potentially influenced by significant events in the E-zone. In the Peri-Event Time Histogram (PETH) analysis, we observed that some neurons exhibit highly-modulated activity upon arrival at the E-zone (head entry; HE) and immediately following voluntary departure or attack (head withdrawal; HW). We defined 'critical event times' as ± one second from these events and excluded neural data from these periods to determine if PCA could still differentiate neural activities across zones. Despite these exclusions, the results continued to show populational differences between zones, reinforcing the notion that neurons adapt their activity according to the context. We acknowledge that this analysis still cannot eliminate all of the confounding factors due to the context change, but we confirmed that excluding two significant events (delivery onset of sucrose and withdrawal movement) does not alter our result.
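
      The exclusion of 'critical event times' described above amounts to masking time bins before running PCA. The sketch below is only an illustration under stated assumptions: the function name `outside_event_windows`, the use of bin timestamps in seconds, the inclusive treatment of the ± 1 second boundary, and the toy timestamps are not taken from the manuscript.

```python
def outside_event_windows(bin_times, event_times, window=1.0):
    """Return one boolean per time bin: True if the bin lies more than
    `window` seconds away from every event (head entry or head withdrawal).

    Assumption: bins exactly `window` seconds from an event are excluded.
    """
    return [all(abs(t - e) > window for e in event_times) for t in bin_times]


# Toy example: 0.5 s bins from 0 to 4.5 s and a single event at 2.0 s.
bin_times = [i * 0.5 for i in range(10)]
event_times = [2.0]

mask = outside_event_windows(bin_times, event_times)
kept_times = [t for t, keep in zip(bin_times, mask) if keep]
print(kept_times)
```

      The same mask would be applied to the rows of the binned firing-rate matrix, so that only activity outside the event windows enters the population analysis.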

      To summarize, these additional results further support the conclusion that spatial and avoidance information is distributed across the neural population rather than being handled by distinct subsets. The analyses revealed no negative correlation between spatial and avoidance encoding, and excluding event-driven neural activity did not alter the observed functional separation, confirming that mPFC neurons dynamically adjust their activity to meet contextual demands.

      A second concern is also illustrated by Fig. 7: in the data presented, separate reward and threat encoding neurons were not shown - in the current study design, it is not possible to dissociate reward and threat responses as the data without the threat present were only used to study spatial encoding integrity.

      Thank you for this valuable feedback. Other reviewers have also noted that Figure 7 (now Figure 8) is misleading and contains assertions not supported by our experiments. In response, we have revised the model to more accurately reflect our findings. We have eliminated the distinction between reward coding and threat coding neurons, simplifying it to focus on spatial encoding and avoidance encoding neurons. The updated figure aligns more closely with our findings and claims. A. Distinct functional states (spatial vs. avoidance decision) encoded by the same population of neurons are separable by region (F- vs. E-zone). B. Hypothetical control models by which mPFC neurons assume different functional states.

      Thirdly, the findings of this work are not mechanistic or functional but are purely correlational. For example, it is claimed that analyzing activity around the withdrawal period allows for ascertaining their functional contributions to decisions. But without a direct manipulation of this activity, it is difficult to make such a claim. The authors later discuss whether the elevated response of Type 2 neurons might simply represent fear or anxiety motivation or threat level, or whether they directly contribute to the decision-making process. As is implicit in the discussion, the current study cannot differentiate between these possibilities. However, the language used throughout does not reflect this. 

      We acknowledge that our experiments are purely correlational and that this is a weakness. Although we tried to choose our wording carefully to avoid implying causation, we agree that some of the language might mislead readers into thinking that we found a direct functional contribution. We therefore changed the expressions as shown below.

      “We then further analyzed the (functional contribution ->)correlation between neural activity and success and failure of avoidance behavior. If the mPFC neurons (encode ->)participate in the avoidance decisions, avoidance withdrawal (AW; withdrawal before the attack) and escape withdrawal (EW; withdrawal after the attack) may be distinguishable from decoded population activity even prior to motor execution.”

      We also added the passage below to the Discussion section to clarify the limitations of the study.

      “Despite this interesting conjecture, any analysis based on recording data is only correlational, mandating further studies with direct manipulation of the subpopulation to confirm its functional specificity.”

      Fourthly, the authors mention the representation of different functions in 'distinct spatiotemporal regions' but the bulk of the analyses, particularly in terms of response to the threat, do not compare recordings from PL and IL although - as the authors mention in the introduction - there is prior evidence of functional separation between these regions.

      Thank you for bringing this part to our attention. As we mentioned in the introduction, we acknowledge the functional differences between the PL and IL regions. Although differences in spatial encoding between these two areas were not deeply explored, we anticipated finding differences in event encoding, given the distinct roles of the PL and IL in fear and threat processing. However, our initial analysis revealed no significant differences in event encoding between the regions, and as a result, we did not emphasize these differences in the manuscript. To address this point, we have reanalyzed the data separately and included the following findings in the manuscript.

      “However, we did not observe a difference in decoding accuracy between the PL and IL ensembles, and there were no significant interactions between regressor type (shuffled vs. original) and regions (mixed-effects model; regions: p=.996; interaction: p=.782). These results indicate that the population activity in both the PL and IL contains spatial information (Figure 2D, Video 3).

      […]

      Furthermore, we analyzed whether there is a difference in prediction accuracy between sessions with different recorded regions, the PL and the IL. A repeated two-way ANOVA revealed no significant difference between recorded regions, nor any interaction (regions: F(1, 38) = 0.1828, p = 0.671; interaction: F(1, 38) = 0.1614, p = 0.690).

      […]

      We also examined whether there is a significant difference between the PL and IL in the proportion of Type 1 and Type 2 neurons. In the PL, among 379 recorded units, 143 units (37.73%) were labeled as Type 1, and 75 units (19.79%) were labeled as Type 2. In contrast, in the IL, 156 units (61.66%) and 19 units (7.51%) of 253 recorded units were labeled as Type 1 and Type 2, respectively. A Chi-square analysis revealed that the PL contains a significantly higher proportion of Type 2 neurons (χ²(1, 632) = 18.07, p < .001), while the IL contains a significantly higher proportion of Type 1 neurons compared to the other region (χ²(1, 632) = 34.85, p < .001).”
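
      The two chi-square statistics can be re-derived from the unit counts alone. The sketch below assumes a Pearson chi-square test on a 2 × 2 table (region × "is this type" vs. "is not this type") without continuity correction; the response does not state which variant was used, and the function name is chosen only for this example. Under that assumption, the counts above yield the two reported values, 34.85 and 18.07.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic (no continuity correction) for the
    2 x 2 contingency table [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


# Unit counts taken from the paragraph above.
pl_total, il_total = 379, 253
pl_type1, il_type1 = 143, 156
pl_type2, il_type2 = 75, 19

# Each table compares "units of this type" vs. "all other units" across regions.
chi2_type1 = chi_square_2x2(pl_type1, pl_total - pl_type1,
                            il_type1, il_total - il_type1)
chi2_type2 = chi_square_2x2(pl_type2, pl_total - pl_type2,
                            il_type2, il_total - il_type2)

print(f"Type 1 proportion, PL vs. IL: chi2(1, N=632) = {chi2_type1:.2f}")
print(f"Type 2 proportion, PL vs. IL: chi2(1, N=632) = {chi2_type2:.2f}")
```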

      To summarize our additional results, we did not observe performance differences in distance decoding or event decoding. The only difference we observed was the proportional variation of Type 1 and Type 2 neurons when we separated the analysis by brain region. These results are somewhat counterintuitive, considering the distinct roles of the two regions—particularly the PL in fear expression and the IL in extinction learning. However, since the studies mentioned in the introduction primarily used lesion and infusion methods, this discrepancy may be due to the different approach taken in this study. Considering this, we have added the following section to the discussion.

      “Interestingly, we found no difference between the PL and IL in the decoding accuracy of distance or avoidance decision. This is somewhat surprising considering the distinct roles of these regions in the long line of fear conditioning and extinction studies, where the PL has been linked to fear expression and the IL to fear extinction learning (Burgos-Robles et al., 2009; Dejean et al., 2016; Kim et al., 2013; Quirk et al., 2006; Sierra-Mercado et al., 2011; Vidal-Gonzalez et al., 2006). On the other hand, more Type 2 neurons were found in the PL and more Type 1 neurons were found in the IL. To recap, typical Type 1 neurons increased their activity briefly after the head entry and then remained inhibited, while Type 2 neurons showed a burst of activity during head entry and sustained increased activity. One study employing a context-dependent fear discrimination task (Kim et al., 2013) also identified two distinct types of PL units: short-latency CS-responsive units, which increased firing during the initial 150 ms of tone presentation, and persistently firing units, which maintained firing for up to 30 seconds. Given the temporal dynamics of Type 2 neurons, it is possible that our unsupervised clustering method may have merged the two types of neurons found in Kim et al.’s study.

      While we did not observe decreased IL activity during dynamic foraging, prior studies have shown that IL excitability decreases after fear conditioning (Santini et al., 2008), and increased IL activity is necessary for fear extinction learning. In our paradigm, extinction learning was unlikely, as the threat persisted throughout the experiment. Future studies with direct manipulation of these subpopulations, particularly examining head withdrawal timing after such interventions, could provide insight into how these subpopulations guide behavior.”

      Additionally, we made some changes in the introduction, mainly replacing the PL/IL with mPFC to be consistent with the main body of results and conclusion and also specifying the correlational nature of the recording study.

      “Machine learning-based populational decoding methods, alongside single-cell analyses, were employed to investigate the correlations between neuronal activity and a range of behavioral indices across different sections within the foraging arena.”

      Reviewer 2 (Recommendations):

      The authors consistently use parametric statistical tests throughout the manuscript. Can they please provide evidence that they have checked whether the data are normally distributed? Otherwise, non-parametric alternatives are more appropriate.

      Thank you for mentioning this important issue in the analysis. We re-ran the test of normality for all our data using the Shapiro-Wilk test with a significance threshold of .05 and found that the following data sets require non-parametric tests, as summarized in Author response table 1 below. For those analyses which did not pass the normality test, we used a non-parametric alternative test instead. We also updated the methods section. For instance, repeated measures ANOVA for supplementary figure S1 and PCA results were changed to the Friedman test with Dunn’s multiple comparison test.

      Author response table 1.

      Line 107: it is not clear here or in the methods whether a single drop of sucrose solution is delivered per lick or at some rate during the encounter, both during the habituation or in the final task. This is important information in order to understand how animals might make decisions about whether to stay or leave and how to interpret neural responses during this time period. Or is it a large drop, such that it takes multiple licks to consume? Please clarify.

      The apparatus we used incorporated an IR-beam sensor-controlled solenoid valve. As the beam sensor was located right in front of the pipe, the rat’s tongue activated the sensor. As a result, each lick opened the valve for a brief period, releasing a small amount of liquid, and the rat had to continuously lick to gain access to the sucrose. We carefully regulated the flow of the liquid and installed a small sink connected to a vacuum pump, so any remaining sucrose not consumed by the rat was instantly removed from the port. We clarified how sucrose was delivered in the methods section and also in the results section.

      Method:

      “The sucrose port has an IR sensor which was activated by a single lick. The rat usually stays in front of the lick port and continuously licks at a rate of up to 6.3 times per second to obtain sucrose. Any sucrose droplets dropped in the bottom sink were immediately removed by negative pressure so that the rat’s behavior was focused on the licking.”

      Result:

      “The lick port was activated by an IR-beam sensor, triggering the solenoid valve when the beam was interrupted. The rat gradually learned to obtain rewards by continuously licking the port.”

      However, I'm not sure I understand the authors' logic in the interpretation: does the S-phase not also consist of goal-directed behaviour? To me, the core difference is that one is mediated by threat and the other by reward. In addition, it would be helpful to visualize the behaviour in the S-phase, particularly the number of approaches. This difference in the amount of 'experience' so to speak might drive some of the decrease in spatial decoding accuracy, even if travel distance is similar (it is also not clear how travel distance is calculated - is this total distance?) Ideally, this would also be included as a predictor in the GLM.

      We agree that the behaviors observed during the shuttling phase can also be considered goal-directed, as the rat moves purposefully toward explicit goals (the sucrose port and the N-zone during the return trip). However, we argue that there is a significant difference in the level of complexity of these goals.

      During the L-phase, the rat not only has to successfully navigate to the E-zone for sucrose but also pay attention to the robots, either to avoid an attack from the robot's forehead or escape the fast-striking motion of the claw. When the rat runs toward the E-zone, it typically takes a side-approaching path, similar to Kim and Choi (2018), and exhibits defensive behaviors such as a stretched posture, which were not observed in the S-phase. This behavioral characteristic differs from the S-phase, where the rat adopted a highly stereotyped navigation pattern fairly quickly (within 3 sessions), evidenced by more than 50 shuttling trajectories per session. In this phase, the rat exhibited more stimulus-response behavior, simply repeating the same actions over time without deliberate optimization.

      In our additional experiment with two different levels of goal complexity (reward-only vs. reward/threat conflict), we used a between-subject design in which both groups experienced both the S-phase and L-phase before surgery and underwent only one type of session afterward. This approach ruled out the possibility of differences in contextual experience. Additionally, since we initially designed the S-phase as extended training, behaviors in the apparatus tended to stabilize after rats completed both the S-phase and L-phase before surgery. As a result, we compared the post-surgery Lobsterbot phase to the post-surgery shuttling phase to investigate how different levels of goal complexity shape spatial encoding strength.

      To clarify our claim, we edited the paragraph below.

      “This absence of spatial correlates may result from a lack of complex goal-oriented navigation behavior, which requires deliberate planning to acquire more rewards and avoid potential threats.

      […]

      After the surgery, unlike the Lob-Exp group, the Ctrl-Exp group returned to the shuttling phase, during which the Lobsterbot was removed. With this protocol, both groups experienced sessions with the Lobsterbot, but the Ctrl-Exp group's task became less complex, as it was reduced to mere reward collection.

      Given these observations, along with the mPFC’s lack of consistency in spatial encoding, it is plausible that the mPFC operates in multiple functional modes, and the spatial encoding mode is preempted when the complexity of the task requires deliberate spatial navigation.”

      Additionally, we added behavioral data from the initial S-phase to Supplementary Figure 1.

      It is a good point that the amount of experience might drive the decrease in spatial decoding accuracy. To test this hypothesis, we added a new variable, the number of Lobsterbot sessions after surgery, to the previous GLM analysis. The updated model predicted the outcome variable with significant accuracy (F(4,44) = 10.31, p < .001), with an R-squared value of 0.4838. The regression coefficients were as follows: presence of the Lobsterbot (2.76, standard error [SE] = 1.11, t = 2.42, p = .020), number of recorded cells (-0.43, SE = .08, t = -5.22, p < .001), recording location (0.90, SE = 1.11, p = .424), and number of L sessions (0.002, SE = 0.11, p = .981). These results indicate that the number of exposures to the Lobsterbot sessions, as a measure of experience, did not affect spatial decoding accuracy.
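
      As a quick consistency check on the model summary above, the overall F statistic of a linear model can be recovered from its R-squared and degrees of freedom. The sketch assumes an ordinary least-squares fit with an intercept and four predictors; the function name is hypothetical and chosen for this example.

```python
def f_from_r_squared(r_squared, df_model, df_resid):
    """Overall F statistic of a linear model, computed from R-squared:
    F = (R^2 / df_model) / ((1 - R^2) / df_resid)."""
    return (r_squared / df_model) / ((1.0 - r_squared) / df_resid)


# Values reported in the response: R^2 = 0.4838 with F(4, 44).
f_stat = f_from_r_squared(0.4838, df_model=4, df_resid=44)

# With an intercept, the residual df implies n = df_resid + df_model + 1 observations.
n_obs = 44 + 4 + 1

print(f"F(4, 44) = {f_stat:.2f} from {n_obs} observations")
```

      Under these assumptions the computed F agrees with the reported F(4,44) = 10.31, and the degrees of freedom imply 49 observations in the regression.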

      As a minor edit, we changed the term to “total travel distance”.

      Relating to the previous point, it should be emphasized in both sections on removing the Lobsterbot and on non-navigational behaviours that the spatial decoding is all in reference to distance from the threat (or reward location). The language in these sections differs from the previous section where 'distance from the goal' is mentioned. If the authors wish to discuss spatial decoding per se, it would be helpful to perform the same analysis but relative to the animals' own location which might have equal accuracy across locations in the arena. Otherwise, it is worth altering the language in e.g. line 258 onwards to state the fact that distance to the goal is only decodable when animals are actively engaged in the task.

      Thank you for this comment. We changed the term to “distance from the conflict zone” or “distance of the rat to the center of the E-zone” to clarify our experimental setup.

      In Fig. 5, why is the number of neurons shown in the PETHs less than the numbers shown in the pie charts?

      The difference in the number of neurons between the PETHs and the pie charts in Figure 5 arises because PETHs are drawn only for 'event-responsive' units. For visualizing the neurons, we selectively included those that met the criteria described in the Methods section (Behavior-responsive unit analysis). We have updated the caption for Figure 5 as follows to minimize confusion.

      “Multiple subpopulations in the mPFC react differently to head entry and head withdrawal.

      (A) Top: The PETH of head entry-responsive units is color-coded based on the Z-score of activity.

      (C) The PETH of head withdrawal-responsive units is color-coded based on the Z-score of activity.”

      I appreciate the amount of relatively unprocessed data plotted in Figure 5, but it would be great to visualize something similar for AW vs. EW responses within the HW2 population. In other words, what is there that's discernably different within these responses that results in the findings of Fig. 6?

      To visualize the difference in neural activity between AW and EW, we included an additional supplementary figure (Supplementary Figure 5). We divided the neurons into Type 1 and Type 2 and plotted PETH during Avoidance Withdrawal (AW) and Escape Withdrawal (EW). Consistent with the results shown in Figure 6d, we could visually observe increased activity in Type 2 neurons before the execution of AW compared to EW. However, we couldn’t find a similar pattern in Type 1 neurons.

      On a related note, it would add explanatory power if the authors were able to more tightly link the prediction accuracy of the ensemble (particularly the Type 2 neurons) to the timing of the behaviour. Earlier in the manuscript it would be helpful to show latency to withdraw in AW trials; are animals leaving many seconds before the attack happens, or are they just about anticipating the timing of the attack? And therefore when using ensemble activity to predict the success of the AW, is the degree to which this can be done in advance (as the authors say, up to 6 seconds before withdrawal) also related to how long the animal has been engaged with the threat?

      We agree that the timing of head withdrawal, particularly in AW trials, is a critical factor in describing the rat's strategy toward the task. To test whether the rat uses a precise timing strategy—for instance, leaving several seconds before the attack or exploiting the discrete 3- and 6-second attack durations—we plotted all head withdrawal timepoints during the 6-second trials. The distribution was more even, without distinguishable peaks (e.g., at the very initial period or at the 3- or 6-second mark). This indicates a lack of precise temporal strategy by the rat. We included additional data in the supplementary figure (Supplementary Figure 6) and added the following to the results section.

      “We monitored all head withdrawal timepoints to assess whether rats developed a temporal strategy to differentiate between the 3-second and 6-second attacks. We found no evidence of such a strategy, as the timings of premature head withdrawals during the 6-second attack trials were evenly distributed (see Supplementary Figure S1).”

      As depicted in the new supplementary figure, head withdrawal times during avoidance behavior vary from sub-seconds to the 3- or 6-second attack timepoints. After receiving the reviewer’s comment, we became curious whether there is a decoding accuracy difference depending on how long the animal engaged with the threat. We selected all 6-second attack and avoidance withdrawal trials and checked if correctly classified trials (AW trials classified as AW) had different head withdrawal times—perhaps shorter durations—compared to misclassified trials (AW trials classified as EW). As shown in Author response image 3 below, there was no significant difference between these two types, indicating that the latency of head withdrawal does not affect prediction accuracy.
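
      As an illustration only, a comparison of withdrawal latencies between correctly classified and misclassified trials could be sketched as below. The statistical test used for Author response image 3 is not stated here, so the two-sided permutation test on the difference in means, the function name `permutation_p_value`, and the example latencies are all assumptions.

```python
import random
from statistics import mean


def permutation_p_value(group_a, group_b, n_permutations=2000, seed=0):
    """Two-sided permutation test on the absolute difference in means.

    Hypothetical helper: labels are reshuffled between the two groups and
    the observed difference is compared with the reshuffled differences.
    """
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (extreme + 1) / (n_permutations + 1)


# Made-up withdrawal latencies in seconds, not data from the study.
correct_trials = [1.0, 2.0, 3.0]
misclassified_trials = [1.0, 2.0, 3.0]
p_same = permutation_p_value(correct_trials, misclassified_trials)
p_apart = permutation_p_value([0.0, 1.0, 2.0, 3.0, 4.0],
                              [100.0, 101.0, 102.0, 103.0, 104.0])
```

      A large p-value, as for the identical groups above, would be consistent with latency having no detectable effect on classification.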

      Author response image 3.

      Finally, there remain some open questions. One is how much encoding strength - of either space or the decision to leave during the encounter - relates to individual differences in animal performance or behaviour, particularly because this seems so variable at baseline. A second is how stable this encoding is. The authors mention that the distance encoding must be stable to an extent for their regressor to work; I am curious whether this stability is also found during the encounter coding, and also whether it is stable across experience. For example, in a session when an individual has a high proportion of anticipatory withdrawals, is the proportion of Type 2 neurons higher?

      Thank you for these questions. To recap, we used five rats in the Lobsterbot experiments and three rats in the control experiment, in which we removed the Lobsterbot after training. Indeed, there were individual differences in performance (i.e., avoidance success rate), number of recorded units (related to the recording quality), and baseline behaviors. To clarify these differences, see Author response image 4 below.

      Author response image 4.

      We used a GLM to measure how much of the decoder’s accuracy was explained by individual differences. The results showed that individual differences explained 38.96% of the distance regressor’s performance and 12.14% of the event classifier’s performance. Since recording quality was highly dependent on the animal, the high subject variability detected in the distance regression might be attributed to the number of recorded cells. Rat00, which had the lowest mean absolute error, also had the highest number of recorded cells, with an average of 18. Compared to the distance regression, there was less subject variability in event classification. Indeed, the GLM results showed that the variability explained by the number of cells was only 0.62% in event classification.

      The reason we mentioned that "distance encoding must be stable for our regressor to work" is entirely based on the population-level analysis. Because we used neural data and behaviors from entire trials within a session, the regressor or classifier would have low accuracy if encoding dynamics changed within the session. In other words, if the way neurons encode avoidance/escape predictive patterns changed within a training set, the classifier would fail to generate an optimized separation function that works well across all datasets.

      To further investigate whether changes in experience affect event classification results over time, we plotted an additional graph below. Although there are individual and daily fluctuations in decoding accuracy, there was no observable trend throughout the experiments.

      Author response image 5.

      Regarding the correlation between the ratio of avoidance withdrawal and the proportion of Type 2 neurons, we were also curious and analyzed the data. Across 40 sessions, the correlation was -0.0716. For Type 1 neurons, it was slightly higher at 0.1459. We believe this indicates no significant relationship between the two variables.

      Minor points:

      I struggled with the overuse of acronyms in the paper. Some might be helpful but F-zone/N-zone, for example, or HE/HW, AW/EW are a bit of a struggle. After reading the paper a few times I learned them but a naive reader might need to often refer back to when they were first defined (as I frequently had to).

      To increase readability, we removed acronyms that are not often used and changed HE/HW to head-entry/head-withdrawal.

      I have a few questions about Figure 1F: in the text (line 150) it says that 'surgery was performed after three L sessions when the rats displayed a range of 30% to 60% AW'. This doesn't seem consistent with what is plotted, which shows greater variability in the proportion of AW behaviours both before and after surgery. It also appears that several rats only experienced two days of the L1 phase; please make clear if so. And finally, what is the line at 50% indicating? Neither the text nor the legend discuss any sort of thresholding at 50%. Instead, it would be best to make the distinction between pre- and post-surgery behaviour visually clearer.

      Thank you for pointing out this issue. We acknowledge there was an error in the text description. As noted in the Methods section, we proceeded with surgery after three Lobsterbot sessions. We have removed the incorrect part from the Results section and revised the Methods section for clarity.

      “After three days of Lobsterbot sessions, the rats underwent microdrive implant surgery, and recording data were collected from subsequent sessions, either Lobsterbot or shuttling sessions, depending on the experiment. For all post-surgery sessions, those with fewer than 20 approaches in 30 minutes were excluded from further analysis.”

      Among the five rats, Rat2 and Rat3 did not approach the robot during the entire Lob2 session, which is why these two rats do not have Lob2 data points. We updated the caption to address this issue.

      Initially, we added a 50% reference line, but we agree it is unnecessary as we do not discuss this reference. We have updated the figure to include the surgery point, as shown in Supplementary Figure 1.

      Fig. 2C: each dot is an ensemble of simultaneously recorded neurons, i.e. a subset of the total 800-odd units if I understand correctly. How many ensembles does each rat contribute? Similarly, is this evenly distributed across PL and IL?

      Yes, each dot represents a single session, with a total of 40 sessions. Five rats contributed 11, 9, 8, 7, and 5 sessions, respectively. Although each rat initially had more than 10 sessions, we discarded sessions with a low unit count (fewer than 10 units; as detailed in Materials and Methods - Data Collection). We collected 25 sessions from the PL and 15 sessions from the IL. Our goal was to collect more than 200 units per region.

      Please show individual data points for Fig. 2D.

      We updated the figure with individual data points.

      Is there a reason why the section on removing the Lobsterbot (lines 200 - 215) does not have associated MAE plots? Particularly the critical comparison between Lob-Exp and Ctl-Exp.

      We intentionally removed some graphs to create a more compact figure, but we appreciate your suggestion and have included the graph in Figure 2.

      Some references to supplementary materials are not working, e.g. line 333.

      The submitted version of our manuscript contained reference errors. In the current version, we used plain text, and the references are fixed.

      The legend for Supp. Fig. 2B is incorrect.

      We greatly appreciate this point. We changed the caption to match the figure.

      Reviewer 3 (Public Review):

      Thank you for recognizing our efforts in designing an ethologically relevant foraging task to uncover the multiple roles of the mPFC. While we acknowledge certain limitations in our methodology—particularly that we only observed correlations between neural activity and behavior without direct manipulation—we have conducted additional analyses to further strengthen our findings.

      Weakness:

      The primary concern with this study is the absence of direct evidence regarding the role of the mPFC in the foraging behavior of the rats. The ability to predict heterogeneous variables from the population activity of a specific brain area does not necessarily imply that this brain area is computing or using this information. In light of recent reports revealing the distributed nature of neural coding, conducting direct causal experiments would be essential to draw conclusions about the role of the mPFC in spatial encoding and/or threat evaluation. Alternatively, a comparison with the activity from a different brain region could provide valuable insights (or at the very least, a comparison between PL and IL within the mPFC).

      Thank you for the comment. Indeed, the fundamental limitation of the recording study is that it is only correlational, and any causal relationship between neural activity and behavioral indices is only speculative. We made this clearer in the revision and refrained from expressing any speculative ideas suggesting causality. While we did not provide direct evidence that the mPFC is computing or utilizing spatial/foraging information, we based our assertion on previous studies that have directly demonstrated the mPFC's role in complex decision-making tasks (Martin-Fernandez et al., 2023; Orsini et al., 2018; Zeeb et al., 2015) and in certain types of spatial tasks (De Bruin et al., 1994; Sapiurka et al., 2016). We would like to emphasize that, to the best of our knowledge, no previous study has investigated mPFC function while an animal is solving multiple heterogeneous problems in a semi-naturalistic environment. Therefore, although our recording study only provides speculative causal inference, it provides a foundation for investigating mPFC function. Future studies employing more sophisticated, cell-type-specific manipulations will be needed to confirm the hypotheses from the current study.

      One of the key questions of this manuscript is how multiple pieces of information are represented in the recorded population of neurons. Most of the studies mentioned above use highly structured experimental designs, which allow researchers to study only one function of the mPFC. In the current study, the semi-naturalistic environment allows rats to freely switch between multiple behavioral sets, and our decoding analysis quantitatively assesses the extent to which spatial/foraging information is embedded during these sets. Our goal is to demonstrate that two different task hyperspaces are co-expressed in the same region and that the degree of this expression varies according to the rat’s current behavior (See Figure 8(b) in the revised manuscript).

      Alternatively, we added multiple analyses. First, we included a single unit-level analysis looking at the place cell-like property to contrast with the ensemble decoding. Most neurons did not show well-defined place fields although there were some indications for place cell-like property. For example, some neurons displayed fragmented place fields or unusually large place fields only at particular spots in the arena (mostly around the gates). The accuracy from this place information at the single-neuron level is much lower than that acquired from population decoding. Likewise, although there were neurons with modulated firing around the time of particular behavior (head entry and withdrawal), overall prediction accuracy of avoidance decision was much higher when the ensemble-based classifier was applied.

      Moreover, given that high-dimensional movement has been shown to be reflected in the neural activity across the entire dorsal cortex, more thorough comparisons between the neural encoding of task variables and movement would help rule out the possibility that the heterogeneous encoding observed in the mPFC is merely a reflection of the rats' movements in different behavioral modes.

      Thanks for the comment. We acknowledge that the neural activity may reflect various movement components across different zones in the arena. We performed several analyses to test this idea. First, our run-and-stop event analysis provides insight into whether mPFC neurons encode location independently of significant motor events. The rats typically move across the F-zone fairly routinely and swiftly (as if they are “running”) to reach the E-zone, where they slow almost to a halt (“stopping”). The PETHs around these critical motor events, however, did not show any significant modulation of neural activity, indicating that most neurons we recorded from the mPFC did not respond to movement.

      We added this analysis to demonstrate that these sudden stops did not evoke the characteristic activation of Type 1 and Type 2 neurons observed during head entry into the E-zone. When we isolated these sudden stops outside the E-zone, we did not observe this neural signature (Supplementary Figure 2).

      Second, our PCA results showed that population activity in the E-zone during dynamic foraging behavior was distinct from the activity observed in the N- and F-zones during navigation. However, there is a possibility that the two behaviorally significant events—entry into the E-zone and voluntary or sudden exit—might be driving the differences observed in the PCA results. To account for this, we designated ±1 second from head entry and head withdrawal as "critical event times," excluded the corresponding neural data, and reanalyzed the data. This method removed neural activity associated with sudden movements in specific zones. Despite this exclusion, the PCA still revealed distinct population activity in the E-zone, different from the other zones (Supplementary Figure 4). This result reduces the likelihood that the observed heterogeneous neural activity is merely a reflection of zone-specific movements.
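
      The exclusion step described above can be sketched as follows. This is a minimal illustration rather than the analysis code itself: the function name `keep_mask`, the representation of neural data as time bins, and the inclusive window boundary are assumptions.

```python
def keep_mask(bin_times, event_times, half_width=1.0):
    """Return one boolean per time bin.

    True means the bin lies outside every +/- half_width window around an
    event (head entry or head withdrawal) and is kept for the PCA.
    """
    mask = []
    for t in bin_times:
        near_event = any(abs(t - e) <= half_width for e in event_times)
        mask.append(not near_event)
    return mask


# Example: 0.5 s bins from 0 to 5 s, with one head entry at 2.0 s.
bin_times = [i * 0.5 for i in range(11)]
mask = keep_mask(bin_times, event_times=[2.0])
kept = [t for t, keep in zip(bin_times, mask) if keep]
```

      Only the bins flagged True would then be passed to the dimensionality reduction, so that activity tied to the entry or exit movement itself cannot drive the separation between zones.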

      Lastly, the main claim of the paper is that the mPFC population switches between different functional modes depending on the context. However, no dynamic analysis or switching model has been employed to directly support this hypothesis.

      Thank you for this comment. Since we did not conduct a manipulation experiment, there is a clear limitation in uncovering how switching occurs between the two task contexts. To make the most of our population recording data, we added an additional results section that examines how individual neurons contribute to both the distance regressor and the event classifier. Our findings support the idea that distance and dynamic foraging information are distributed across neurons, with no distinct subpopulations dedicated to each context. This suggests that mPFC neurons adjust their coding schemes based on the current task context, aligning with Duncan’s (2001) adaptive coding model, which posits that mPFC neurons adapt their coding to meet the task's current demands.

      Reviewer 3 (Recommendations):

      The evidence for spatial encoding is relatively weak. In the F-zone (50 x 48 cm), the average error was approximately 17 cm, constituting about a third of the box's width and likely not significantly smaller than the size of a rat's body. The errors in the shuffled data are also not substantially greater than those in the original data. An essential test indicates that spatial decoding accuracy decreases when the Lobsterbot is removed. However, assessing the validity of the results is difficult in the current state. There is no figure illustrating the results, and no statistics are provided regarding the test for matching the number of neurons.

      We acknowledge that the average error (~17 cm) measured in our study is relatively large, even though it is significantly smaller than that of the shuffled control model (22.6 cm). Previous studies reported smaller prediction errors but in different experimental conditions: 16 cm in Kaefer et al. (2020) and less than 10 cm in Ma et al. (2023) and Mashhoori et al. (2018). Most notably, the average number of units used in our study (15.8 units per session) is considerably smaller than in the previous works, which used 63, 49, and 40 units, respectively. As our GLM results demonstrated, the number of recorded cells significantly influenced decoding accuracy (β = -0.43 cm/neuron). With a similar number of recorded cells, we would likely have achieved comparable decoding accuracy. In addition, unlike other studies that have employed a dedicated maze such as the virtual track or the 8-shaped maze, we exposed rats to a semi-naturalistic environment where they exhibited a variety of behaviors beyond simple navigation. As argued throughout the manuscript, we believe that the spatial information represented in the mPFC is susceptible to disruption when the animal engages in other activities. A similar phenomenon was reported by Mashhoori et al. (2018), where the decoder, which typically showed a median error of less than 10 cm, exhibited a much higher error (nearly 100 cm) near the feeder location.

      As for the reviewer’s request for comparing spatial decoding without the Lobsterbot, we added a new figure to illustrate the spatial decoding results, including statistical details. We also applied a Generalized Linear Model to regress out the effect of the number of recorded neurons and statistically assess the impact of Lobsterbot removal. This adjustment directly addresses the reviewer's request for a clearer presentation of the results and helps contextualize the decoding performance in relation to the number of recorded neurons.

      As indicated in the public review, drawing conclusions about the role of the mPFC in navigation and avoidance behavior during the foraging task is challenging due to the exclusively correlational nature of the results. The accuracy in AW/EW discrimination increases a few seconds before the response, implying that changes in mPFC activity precede the avoidance/escape response. However, one must question whether this truly reflects the case. Could this phenomenon be attributed to rats modifying their "micro-behavior" (as evidenced by changes in movement observed in the video) before executing the escape response, and subsequently influencing mPFC activity?

      We appreciate the reviewer's thoughtful observation regarding the correlational nature of our results and the potential influence of pre-escape micro-behaviors on mPFC activity. We acknowledge that the increased accuracy in AW/EW discrimination preceding the response could also be correlated with micro-behaviors. However, there is very little room for extraneous behavior other than licking the sucrose delivery port within the E-zone, as the rats are highly trained to perform this stereotypical behavior. To support this, we measured the time delays between licking events (inter-lick intervals). The results show a sharp distribution, with 95% of the intervals falling within a quarter second, indicating that the rats were stable in the E-zone, consistently licking without altering their posture.
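
      The inter-lick interval summary mentioned above amounts to a short calculation, sketched here for illustration. The function name `fraction_within`, the 0.25 s threshold argument, and the example timestamps are assumptions and do not come from the recorded data.

```python
def fraction_within(lick_times, max_interval=0.25):
    """Fraction of inter-lick intervals no longer than max_interval seconds."""
    ordered = sorted(lick_times)
    intervals = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    if not intervals:
        raise ValueError("need at least two lick timestamps")
    return sum(i <= max_interval for i in intervals) / len(intervals)


# Made-up lick timestamps in seconds: three short gaps and one long pause.
fraction = fraction_within([0.0, 0.125, 0.25, 0.375, 1.0])
```

      A value close to 1 would indicate nearly continuous licking, which is the pattern the response describes.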

      To complement the data presented in Author response image 2, a video clip showing a rat engaged in licking behavior was included. We carefully designed the robot compartment and adjusted the distance between the Lobsterbot and the sucrose port to ensure that rats could exhibit only limited behaviors inside the E-zone. The video confirms that no significant micro-behaviors were observed during the rat’s activity in the E-zone.

      If mPFC activity indeed switches mode, the results do not clearly indicate whether individual cells are specifically dedicated to spatial representation and avoidance or if they adapt their function based on the current goal. Figure 7, presented as a schematic illustration, suggests the latter option. However, the proportion of cells in the HE and HW categories that also encode spatial location has not been demonstrated. It has also not been shown how the switch is manifested at the level of the population.

      Thank you for this comment. As the reviewer pointed out, we suggest that mPFC neurons do not diverge based on their functions, but rather adapt their roles according to the current goal. To support this assertion, we added an additional results section that calculates the feature importance of decoders. This analysis allows us to quantitatively measure each neuron’s contribution to both the distance regressor and the event decoder. Our results indicate that distance and defensive behavior are not encoded by a small subset of neurons; instead, the information is distributed across the population. Shuffling the neural data of a single neuron resulted in a median increase in decoding error of 0.73 cm for the distance regressor and 0.01% for the event decoder, demonstrating that the decoders do not rely on a specific subset of neurons that exclusively encode spatial and/or defensive behavior.
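
      The shuffling procedure described above resembles permutation feature importance, which can be sketched as below. This is a generic illustration under stated assumptions, not the decoders used in the study: the toy decoder, the helper names `mean_abs_error` and `permutation_importance`, and the choice to average over repeats are all hypothetical.

```python
import random


def mean_abs_error(predict, rows, targets):
    """Mean absolute error of `predict` over paired rows and targets."""
    return sum(abs(predict(r) - t) for r, t in zip(rows, targets)) / len(rows)


def permutation_importance(predict, rows, targets, column, n_repeats=20, seed=0):
    """Average increase in error after shuffling one column (one neuron)
    across samples while leaving every other column untouched."""
    rng = random.Random(seed)
    baseline = mean_abs_error(predict, rows, targets)
    increases = []
    for _ in range(n_repeats):
        shuffled_values = [r[column] for r in rows]
        rng.shuffle(shuffled_values)
        shuffled_rows = [
            r[:column] + [v] + r[column + 1:]
            for r, v in zip(rows, shuffled_values)
        ]
        increases.append(mean_abs_error(predict, shuffled_rows, targets) - baseline)
    return sum(increases) / n_repeats


# Toy decoder that reads only neuron 0; neuron 1 carries no information.
rows = [[float(i), float(i % 3)] for i in range(10)]
targets = [2.0 * r[0] for r in rows]


def decoder(row):
    return 2.0 * row[0]


importance_neuron0 = permutation_importance(decoder, rows, targets, column=0)
importance_neuron1 = permutation_importance(decoder, rows, targets, column=1)
```

      In this toy case the error rises only when neuron 0 is shuffled. A population in which every neuron produces a small increase, as reported above, points instead to distributed coding.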

      Although we found supporting evidence that mPFC neurons encode two different types of information depending on the current context, we acknowledge that we could not go further in answering how this switch is manifested. One simple explanation is that the function is driven by current contextual information and goals—in other words, a bottom-up mechanism. However, in our control experiment, simplifying the navigation task worsened the encoding of spatial information in the mPFC. Therefore, we speculate that an external or internal arbitrator circuit determines what information to encode. A precise temporal analysis of the timepoint when the switch occurs in more controlled experiments might answer these questions. We have added this discussion to the discussion section.

      PL and IL are two distinct regions; however, there is no comparison between the two areas regarding their functional properties or the representations of the cells. Are the proportions of cell categories (HE vs HW or HE1 vs HE2, spatial encoding vs no spatial encoding) different in IL and PL? Are areas differentially active during the different behaviors?

      Thank you for bringing up this issue. As mentioned in our response to the public review, we included a comparison between the PL and IL regions. While we did not observe any differences in spatial encoding (feature importance scores), the only distinction was in the proportion of Type 1 and Type 2 neurons, as the reviewer suggested. We have incorporated our interpretation of these results into the discussion section.

      The results and interpretations of the cluster analysis appear to be highly dependent on the parameters used to define a cluster. For example, the HE2 category includes cells with activity that precedes events and gradually decreases afterward, as well as cells with activity that only follows the events.

      We strongly agree that dependency on hyperparameters is a crucial point when using unsupervised clustering methods. To eliminate any subjective criteria in defining clusters, we carefully selected our clustering approach, which requires only two hyperparameters: the number of initial clusters (set to 8) and the minimum number of cells required to be considered a valid cluster (cutoff limit, 50). The rationale behind these choices was: 1) a higher number of initial clusters would fail to generalize neural activity, 2) clusters with fewer than 50 neurons would be difficult to analyze, and 3) to prevent the separation of clusters that show noisy responses to the event.

      Author response table 2 shows the differences in the number of cell clusters when we varied these two parameters. As demonstrated, changing these two variables does result in different numbers of clusters. However, when we plotted each cluster type’s activity around head entry (HE) and head withdrawal (HW), an increased number of clusters resulted in the addition of small subsets with low variation in activity around the event, without affecting the general activity patterns of the major clusters.

      The example mentioned by the reviewer—possible separation of HE2—appears when using a hyperparameter set that results in 4 clusters, not 3. In this result, 83 units, which were labeled as HE2 in the 3-cluster hyperparameter set, form a new group, HE3 (Group 3). This group of units shows increased activity after head entry and exhibits characteristics similar to HE2, with most of the units classified as HW2, maintaining high activity until head withdrawal. Among the 83 HE3 units, 36 were further classified as HW2, 44 as non-significant, and 3 as HW1. Therefore, we believe this does not affect our analysis, as we observed the separation of two major groups, Type 1 (HE1-HW1) and Type 2 (HE2-HW2), and focused our analysis on these groups afterward.

      Despite this validation, there remains a strong possibility that our method might not fully capture small yet significant subpopulations of mPFC units. As a result, we have included a sentence in the methods section addressing the rationale and stability of our approach.

      “(Materials and Methods) To compensate for the limited number of neurons recorded per session, the hyperparameter set was chosen to generalize their activity and categorize them into major types, allowing us to focus on neurons that appeared across multiple recording sessions. Although changes in the hyperparameter sets resulted in different numbers of clusters, the major activity types remained consistent (Supplementary Figure S8). However, there is a chance that this method may not differentiate smaller subsets of neurons, particularly those with fewer than 50 recorded neurons.”

      Author response table 2.

      Minor points:

      Line 333: Error! Reference source not found. This was probably the place for citing Figure S2?

      Lines 339, 343: Error! Reference source not found.

      Thank you for mentioning these comments. In the new version, all reference functions from Word have been replaced with plain text.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This is a very well written and performed study describing a TOPBP1 separation of function mutation, resulting in defective MSCI maintenance but normal sex body formation. The phenotype differs from that of a previous TOPBP1 null allele, in which both MSCI and sex body formation were defective. Additional defects in CHK phosphorylation and SETX localization are also described.

      Strengths:

      The study is very rigorous, with a remarkably large number of MSCI marks assayed, phosphoproteomics (leading to the interesting SETX discovery) and 10X RNAseq, allowing the MSCI phenotype to be further deconvolved. The approaches in most cases are robust.

      Weaknesses:

      There aren't many; please find list below:

      1) The authors are committed to the idea that maintenance of MSCI is the major defect here. However, based on the data, an alternative would be that some cells achieve sex body formation and MSCI normally, while others do not. It would only take a small percentage of cells exhibiting MSCI failure to kill all the cells in the same germinal epithelium, so this could still explain the complete pachytene block. This isn't a major point...this phenotype is clearly different to the TOPBP1 KO, but a broader discussion of possibilities in the discussion would help. I raise this in the context of both the cytology and 10X analysis:

      a) The assessment that sex body formation is normal is based on cytology in Supp 8 and 9, but a more rigorous approach would be to assess condensation of the XY pair in stage-matched spread cells (maybe they have that data already) by measuring distances between the X and Y centromere, or looking at stage IV of the seminiferous cycle, where all cells should have oval sex bodies but sex body mutants have persistent elongated XY pairs (see work of Namekawa and Turner). The authors do actually mention that gH2AX spreading is defective in many cells....and if this is true, condensation to form a sex body would almost certainly not have taken place in those cells.

      We appreciate the reviewer’s comment and have performed the experiment suggested, counting the number of elongated sex bodies in all sex body-positive cells in seminiferous tubules stained with γH2AX and DAPI (as done by Turner in Hirota et al., 2018). The experiment did not show significant differences between Topbp1+/+ and Topbp1B5/B5 as shown in Author response image 1.

      Author response image 1.

      Topbp1B5/B5 displays normal condensation of the XY-pair. A) Immunostaining of XY condensation in Topbp1+/+ and Topbp1B5/B5 testes sections (γH2AX: green and DAPI: gray). B) Quantification of all sex body-positive cells per tubule (Topbp1+/+ number of cells counted = 781, number of tubules counted = 28, number of mice = 3; Topbp1B5/B5 number of cells counted = 967, number of tubules counted = 28, number of mice = 3). C) Quantification of elongated-sex body cells per tubule (Topbp1+/+ number of cells counted = 19 and 762 normal round/oval-sex bodies cells, number of tubules counted = 28, number of mice = 3; Topbp1B5/B5 number of cells counted = 45 and 922 normal round/oval-sex bodies cells, number of tubules counted = 28, number of mice = 3).

      b) Regarding the 10X data, the finding that expression of some XY genes is elevated and others are not is also consistent with a "partial" phenotype (some cells have normal XY bodies and MSCI, others fail in both). In Fig 6E, X expression looks to be elevated in B5 vs wt at all stages...if this were a maintenance issue, shouldn't it be equal to that in wt and then elevate later?

      We understand the point raised by the reviewer; however, we do not favor the “partial” phenotype model because of the absence of any post-pachytene spermatocytes in the B5 mutant. If some cells had escaped the MSCI defect, we would expect to detect cells progressing further in meiosis. Because we cannot completely rule out the possibility of a subtle disruption in XY silencing initiation, we decided to better emphasize this point in the discussion (lines 391-394).

      In Figure 6E, the X-linked genes were normalized against chromosome 9-linked genes. The normalization against pre-leptotene was done for the results displayed on Figure 7, in which we demonstrate the maintenance issue. Furthermore, for the 10X analysis, while the same number of cells were loaded for wild-type and mutant, the composition of cells varied between these two samples. Despite the fact that very few “spermatocyte 3” cells were detected in the mutant, those cells displayed much higher X-linked gene expression than the wild-type spermatocyte 3 cells.

      2) How is the quantitation showing impaired localization of select markers (e.g. SETX) normalized? How do we know that the antibody staining simply didn't work as well on the mutant slides?

      The quantification showing impaired localization of the selected markers such as SETX was done as described by Sims, et al. 2022 and Adams, et al. 2018. In brief, the green signal was measured along (XY cores) or across (XY DNA loops) the X and Y chromosomes and normalized against the analogous signal on the autosomal chromosomes. The possibility that the antibody simply did not work as well on the mutant is unlikely since multiple biological replicates were performed and we reproducibly followed standard practices in the field for meiotic spreads staining, imaging, and quantification. We also note that our findings published in Sims et al, 2022 show that ATR inhibition strongly impairs SETX localization to the sex body, further substantiating our claim that signaling via ATR-TOPBP1 controls SETX.
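
      The within-cell normalization described above can be illustrated with a short sketch. Expressing the result as a ratio of mean intensities is an assumption made for this example, as are the function name `relative_xy_signal` and the intensity values; the cited protocols should be consulted for the exact procedure.

```python
from statistics import mean


def relative_xy_signal(xy_intensities, autosome_intensities):
    """Mean signal on the XY pair divided by mean signal on autosomes
    measured in the same nucleus (ratio normalization is an assumption)."""
    autosome_mean = mean(autosome_intensities)
    if autosome_mean == 0:
        raise ValueError("autosomal signal is zero; ratio is undefined")
    return mean(xy_intensities) / autosome_mean


# Made-up intensity profiles for one nucleus (arbitrary units).
ratio = relative_xy_signal([40.0, 60.0, 50.0], [10.0, 10.0, 10.0])
```

      Because both measurements come from the same nucleus, a uniformly weaker stain on one slide scales the numerator and denominator together and leaves the ratio unchanged.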

      3) Is testis TOPBP1 protein expression reduced in the B5 mutant?

      TOPBP1 protein abundance in the B5 mutant is reduced in lysates from whole testis, measured via western blot. We did not detect a significant reduction in TOPBP1 signal intensity measured by immunofluorescence in pachytene spreads of the B5 mutant.

      4) 10X analysis: how were the genes on the y-axis in Supp 24 arranged? Is this by location on the X chromosome?

      These genes were sorted by their location along the X chromosome.

      5) The final analyses in Fig 7: X-genes are subdivided based on their behavior (up, down, unchanged). What isn't clear to me is whether the authors have considered the fact that there are global changes in gene expression during meiosis (very low in lep , zyg and early pach, then ramps up hugely from mid pach). In other words, is this normalized to autosomal gene expression?

      For the final analysis in Fig. 7, the expression of each gene was normalized to its expression at the pre-leptotene stage. Moreover, the analysis compared X-linked gene behavior in wild-type vs. B5 mutant.
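
      As an illustration only, normalization to the pre-leptotene stage can be sketched as below. The function name, stage labels, and values are hypothetical, and the sketch assumes a simple division of each stage's expression by the pre-leptotene expression of the same gene.

```python
def normalize_to_baseline(stage_expression, baseline_stage="pre-leptotene"):
    """Express one gene's per-stage expression relative to its
    expression at the baseline stage (hypothetical helper)."""
    baseline = stage_expression[baseline_stage]
    if baseline == 0:
        raise ValueError("baseline expression is zero; ratio is undefined")
    return {stage: value / baseline for stage, value in stage_expression.items()}


# Toy expression profile for a single X-linked gene (made-up values).
profile = {"pre-leptotene": 4.0, "SP2": 1.0, "SP3": 2.0}
normalized = normalize_to_baseline(profile)
print(normalized)
```

      Because each gene is compared with its own earlier expression, a rise between two later stages can be read as a change in that gene's silencing rather than as a difference in its absolute expression level.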

      6) Again regarding the 10X analysis, my prediction would be that not ALL X and Y gene would increase in pach if MSCI were ablated...we should remember that XY genes have been subject to MSCI for some 160 million years of evolution, and this will mean that many enhancers that originally drove their expression prior to the evolution of MSCI will now be lost. This has been our experience: many XY genes aren't elevated at pach even in mutants in which MSCI is totally defective. I'd urge the authors to consider this possibility when they use XY gene expression patterns to diagnose the severity or timing of the MSCI phenotype. This could be a discussion point.

      We greatly appreciate the reviewer’s suggestion and have added discussion of this point (lines 392-400).

      Reviewer #2 (Public Review):

      Summary:

      This paper describes the role of BRCT repeat 5 in TOPBP1, a DNA damage response protein, in the maintenance of meiotic sex chromosome inactivation (MSCI). By analyzing a Topbp1 mutant mouse with amino acid substitutions in BRCT repeat 5, the authors found reduced phosphorylation of a DNA/RNA helicase, Senataxin, and decreased localization of the protein to the X-Y sex body in pachynema. Moreover, the authors also found decreased repression of several genes on the sex chromosomes in the male mice.

      Strengths:

      The works including phospho-proteomics and single-cell RNA sequencing with lots of data have been done with great care and most of the results are convincing.

      Weaknesses:

      One concern is that, although the Topbp1 mutant spermatocytes show very severe defects after the stage of late pachynema, the defect in gene silencing in the sex body is relatively weak. It is a bit difficult to explain how such a weak misregulation of gene silencing in mice causes the complete loss of cells in the late stage of spermatogenesis.

      We appreciate the reviewer’s comment. We note that even subtle misregulation of XY gene silencing has been reported to lead to significant loss of cells in late prophase I (Ichijima et al., 2011; Modzelewski et al., 2012). Moreover, it is possible that some cells with drastic changes in X-gene expression were excluded from the downstream analysis due to high levels of mitochondrial gene expression (cells that were likely undergoing apoptosis). The exclusion of cells with high levels of mitochondrial gene expression is a common practice in downstream analysis of scRNA sequencing data.
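
      The quality-control step mentioned above can be sketched as follows. This is a minimal sketch, not the filter used in the analysis: the 10% cutoff, the function names, and the toy counts are assumptions. The only convention relied on is that mouse mitochondrial gene symbols start with "mt-".

```python
def mito_fraction(counts, mito_prefix="mt-"):
    """Fraction of a cell's transcript counts that come from
    mitochondrial genes (mouse symbols start with 'mt-')."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    mito = sum(n for gene, n in counts.items() if gene.startswith(mito_prefix))
    return mito / total


def filter_cells(cells, max_mito_fraction=0.1):
    """Keep cells at or below the cutoff; the 0.1 default is an assumed value."""
    return {
        cell_id: counts
        for cell_id, counts in cells.items()
        if mito_fraction(counts) <= max_mito_fraction
    }


# Toy count tables for two cells (made-up values).
cells = {
    "cell_ok": {"mt-Co1": 5, "Actb": 95},
    "cell_dying": {"mt-Co1": 60, "Actb": 40},
}
kept = filter_cells(cells)
print(sorted(kept))
```

      A filter of this kind removes cells regardless of their X-linked expression, which is why strongly affected cells that are already dying may be missing from the downstream comparison.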

      Reviewer #3 (Public Review):

      The work presented by Ascencao and coworkers aims to delve deeper into the process of sex chromosome inactivation during meiosis (MSCI) as a critical factor in the regulation of meiosis progression in male mammals. For this purpose, they have generated a transgenic mouse model in which a specific domain of TOPBP1 protein has been mutated, hampering the binding of a number of protein partners and interfering with the regulatory cascade initiated by ATR. Through the use of immunolocalization of an impressive number of markers of MSCI, phosphoproteomics and single cell RNA sequencing (scRNAseq), the authors are able to show that despite a proper morphological formation of the sex body and the incorporation of most canonical MSCI markers, sex chromosome-linked genes are reactivated at some point during pachytene and this triggers meiosis progression breakdown, likely due to a defective phosphorylation of the helicase SETX.

      The manuscript presents a clear advance in the understanding of MSCI and meiosis progression with two main strengths. First, the generation of a mouse model with a very uncommon phenotype. Second, the use of a vast methodological approach. The results are well presented and illustrated. Nevertheless, the discussion could be still a bit tuned by the inclusion of some ideas, and perhaps speculations, that have not been considered.

      We appreciate the reviewer’s comment and have improved the discussion section to address the points raised in the “Recommendations For The Authors”.

      Reviewer #1 (Recommendations For The Authors):

      I don't have any additional points here

      Reviewer #2 (Recommendations For The Authors):

      The paper by Ascencao et al. describes a separation-of-function allele of TOPBP1, a protein critical for the DNA damage response (DDR), that confers a specific defect in XY sex chromosome inactivation during male mouse meiosis. The authors constructed a Topbp1 separation-of-function mouse by introducing amino acid substitutions in BRCT repeat 5 and found that the mice, which have a normal DDR in mitosis and meiosis, show male infertility. Topbp1(B5/B5) mice do not contain spermatocytes after diplonema and, as a result, have few spermatids/sperm. In the mice, most of the meiotic events in prophase I, including chromosome synapsis and meiotic recombination as well as the formation of the sex body, are normal. The detailed proteomic analysis revealed reduced ATR-dependent phosphorylation of a DNA/RNA helicase, Senataxin. In addition, single-cell RNA sequencing found that some genes on the sex chromosomes are not silenced as well as in the control. The work, with lots of data, has been done with great care and most of the results are convincing. One clear concern is that, although the authors nicely showed a defect in gene silencing in sex chromosomes in the Topbp1(B5/B5) mice, how a small defect in gene silencing leads to the complete loss of diplotene spermatocytes remains unaddressed.

      Major points:

      Although the authors showed a change in the transcriptome in spermatocytes of Topbp1(B5/B5) male mice, the authors cannot explain the complete lack of spermatids in this mouse. Even the transcriptome seems not to provide a clue.

      1) Given that the TOPBP1-B5 protein cannot bind to both 53BP1 and BLM, it is interesting to check the localization of both proteins on meiotic chromosome spreads (in the case of 53BP1, the localization in MEFs with DNA damage).

      We appreciate the reviewer’s comment. We have tried to stain BLM in meiotic spreads using several different antibodies; however, we were not successful in obtaining specific signals for BLM. In the case of 53BP1, we monitored its localization, and it was not significantly different from Topbp1-/- meiotic spreads (please refer to Supplemental Figure 11). While we appreciate the reviewer’s suggestion of looking at the localization of 53BP1 in MEFs with DNA damage, we opted not to perform the experiment because we have shown that 53BP1 can still bind the BRCT 1 and 2 domains of TOPBP1, as previously described (Bigot et al., 2019; Cescutti et al., 2010; Liu et al., 2017). Additionally, both male and female 53BP1 KO mice are fertile (Ward et al., 2003); thus, the partial disruption in binding to 53BP1 that we observed in the TOPBP1 B5 mutant is likely not causing the infertility phenotype.

      2) A recent preprint by Fujiwara et al. (doi: https://doi.org/10.1101/2023.04.12.536672) showed the accumulation of R-loops in spermatocyte spreads in Senataxin knockout mice. The authors may check the R-loop on the sex body in Topbp1-B5 mice.

      We thank the reviewer for the suggestion. We have tried several protocols to stain R-loops (including the protocol used in the paper mentioned above) but were not successful.

      3) The authors need to check the protein level (and band shift) of Senataxin in the testis by western blotting analysis.

      We have tried several SETX antibodies, and none worked for western blot analysis.

      4) If possible, the authors can see any protein interaction between TOPBP1 and Senataxin.

      We appreciate the suggestion, and we will investigate this interaction in future work.

      5) The authors need to check the statistics in the paper.

      (1) It is better to show actual P-values in the case of "ns".

      P-values were added to the respective figure legends.

      (2) In focus counting such as Figures 3D, G, H, 4B, D, F, H, 5E, and F (and in Supplemental Figures), please indicate how many spreads were counted in each mouse. Moreover, the distribution of focus numbers and intensity of fluorescence are not parametric (not normal distribution). It is better to use a non-parametric method such as Mann-Whitney's U test.

      We appreciate the reviewer's comment. Upon consulting with a statistician at the Cornell Statistical Consulting Unit (CSCU), we were advised to use a linear mixed effect model to take into account the variability in cells within each mouse when comparing mice between groups (Topbp1+/+ vs Topbp1B5/B5). We then reanalyzed all quantified meiotic spreads using this mixed effect model, and the p-value, number of mice, and number of cells counted for each group are displayed in the respective figure legends. Upon going through all the quantified meiotic spreads, we found a minor error in one of the previous data points related to SETX staining in Topbp1+/+ and have fixed it. Using the previous quantification data and the new statistical analysis, the p-value for cores was 0.5598 and the p-value for loops was 0.0273. Using the corrected values and the new statistical analysis, the p-value for cores is 0.5987 and the p-value for loops is 0.0452. The correction did not change the conclusion drawn from these data, which are now displayed in the new Figure 5. We also found a mistake in the ATR quantification that was introduced when the spreadsheet was moved from Excel to GraphPad. Using the previous quantification and the new statistical analysis, the p-value for cores was 0.2451 and the p-value for loops was 0.8933. Using the corrected values and the new statistical analysis, the p-value for cores is 0.4068 and the p-value for loops is 0.9396. The correction did not change the conclusion drawn from these data, which are now displayed in the new Figure 4. Moreover, we realized that we used n = 8 (n = number of mice) for MDC1 quantification and n = 2 for pCHK1_S345, instead of n = 3 as shown in the preprint version of the manuscript. Corrected values were added to the respective figures and figure legends.

      (3) From Figures 6E, 7B, and 7C, the authors conclude the difference in the expression profile between wild type and Topbp1(B5) spermatocytes. It is better to show P-values for the comparison. Particularly, in Figure 7C, Xiap expression kinetics look similar between wild type and the mutant.

      We have added p-values for Figures 6E and 7B to the respective figures or figure legends.

      In Figure 7C, we now recognize that the Δ could have been misleading, as we meant to compare Wild-type SP2 to Wild-type SP3 and Mutant SP2 to Mutant SP3, not Wild-type SP3 to Mutant SP3. Therefore, the Δ was removed from Figure 7C. For the comparisons between expression levels of SP2 and SP3, it is challenging to calculate p-values for a single gene since these cells have started X-gene silencing and expression values are very low. Meaningful p-values for the comparison between Wild-type SP3 and Mutant SP3 can be found in Figure 7B, where the comparison is based on the number of genes instead of the expression level of each gene.

      Minor comments:

      1) Line 34: SPO11 is NOT a nuclease. Just delete it.

      It has been deleted (see line 34).

      2) Line 71, a protein: Is this protein ATR? Is so, please write it. If not, please give the name of the protein.

      In line 71 (now lines 79-80), we refer to TOPBP1-interacting proteins in general, since many of these interactions depend on phosphorylation of the TOPBP1 interactor. This is the case for BLM, 53BP1, FANCJ, and RAD9. ATR interacts with TOPBP1 through TOPBP1’s AAD domain, and this is not a phospho-mediated interaction. We restructured the sentence for clarity.

      3) In the Introduction, the authors often refer to a review by Cimprich and Cortez (2008) in various places. It is better to cite an original paper or the other an appropriate review.

      We have accepted the reviewer’s suggestion and added original papers when appropriate.

      4) Line 143-145: The authors generated eight charge reversal point mutations in the BRCT domain 5 of TOPBP1. If possible, it is helpful to mention the logic to generate these substitutions and also why BRCT domain 5, is not other domains.

      We generated eight charge reversal point mutations to abrogate all possible phospho-dependent interactions and avoid potential residual interactions. We have mutated other BRCT domains as well, which will be published separately.

      5) Line 174 (and Figure 2E): RPA should be either RPA2 or RPA32.

      Corrected (it is RPA2).

      6) Figure 5C-F: Please explain in more detail how the authors quantified the SETX signals. Why the two results are different?

      The quantification was done as described by Sims, et al. 2022, yielding separate data for XY cores and DNA loops. In brief, the green signal was measured along (XY cores) or across (XY DNA loops) the X and Y chromosomes. Signals were normalized by the signal in the autosomal chromosomes.

      Reviewer #3 (Recommendations For The Authors):

      I have no major criticisms, but I include a list of comments and suggestions (some of them conceptual, and disputable) that could help the authors to improve some parts of the manuscript.

      1) Line 52: I realize that the term protein "sequestration" (used in many instances along the manuscript) has been widespread in the literature related to MSCI in the last years. While this might be a cool way to describe the dynamics of proteins accumulating in the sex body, this reviewer considers this term is totally inappropriate. It is confusing and introduces at least two mistakes regarding protein accumulation in the sex body. First, it seems to indicate that once trapped in the sex body, proteins are incapable of leaving it, which might be completely wrong (histone replacement refutes this idea). Second, it is suggested that DDR proteins are attracted by the sex body and cannot remain associated to autosomes even if DNA repair has not been completed. This has also been demonstrated to be incorrect (see for example PMID 19714216). Moreover, DDR proteins can associate de novo to chromosomes if needed, for instance upon DNA damage caused by chemicals or irradiation. Thus, I suggest that the use of "sequestration" should be evaluated more critically, evaluating the misleading ideas that are subjacent to this term. The use of protein "accumulation" is much more objective and descriptive of the real facts.

      We thank the reviewer for the suggestion and have addressed it in lines 52, 97 and 324.

      2) Line 88: Just as a deference to the original ideas, it would be nice to acknowledge that the inactivation of sex chromosomes and the formation of a sex body in mouse meiosis was described more than 50 years ago (PMID 5833946; 4854664). Likewise, the ideas about the sequential achievement and reinforcement of MSCI during pachytene have been developed during the last 20 years, far before the recent reports cited in the manuscript. Citations to these "old-fashioned" works would be great.

      We appreciate the reviewer’s suggestion and have addressed it in line 86.

      3) Line 90. Please, take into consideration that such a strong effect on meiosis progression occurs mainly in some knockout mice models and that in many other models (including hybrid mice models from natural populations) autosomal regions can remain unsynapsed and accumulate DDR proteins without impairing meiosis. In other mammalian species, meiosis is even more permissive to these MSUC phenomena.

      We appreciate the reviewer’s suggestion and have addressed it at line 88.

      4) Line 211: The differences in the abundance of MLH1 and MLH3 are remarkable. If these two proteins are supposed to form a heterodimer leading to crossover formation, then the increase of only MLH1 might be related to a different process, not leading to crossover (even not class II ones).

      We agree with the reviewer’s comment and have included this point in the discussion (lines 491- 497).

      5) Line 217: I have some doubts about the results presented in Supplementary Figure 9. First, it is not clear to me how the represented cells counts were performed. Each spot is supposed to represent cell counts in a single individual, but how many cells were counted per individual? The proportion of cells could be a better indicator. Second, some B5/B5 individuals' counts were close to the ones displayed in the wild type. Did mutant animals show a high divergence compared to each other? It could be great to have each individual data displayed in a pie chart, and not only the aggregated data.

      We have now addressed this in the legend of the new Supplemental Figure 9. Each dot in the graph represents the sum of cells counted for one individual. We counted cells from 8 mice for each genotype, Topbp1+/+ and Topbp1B5/B5.

      Here we summarize the total cells counted per individual:

      Author response table 1.

      6) Line 222: The data on 53BP1 deserve further attention. On the one side, from the analysis presented in Supplementary Figure 11, it seems that 53BP1 tends to show a lower intensity in Topbp1B5/B5 mice. Since only 2 mice were analyzed, while for most of the other proteins 3-8 animals were studied, I suggest increasing the number of animals analyzed for 53BP1 localization, to test if this slight difference turns significant. This is relevant since: 1) the association of 53BP1 protein in somatic cells was clearly affected, and 2) 53BP1 is one of the last MSCI markers incorporated to the sex body at mid-late pachytene. These results should be moved to the main text and not appear as supplementary data. On the other hand, if no differences were to be found in meiosis, compared to somatic cells, how do authors explain these differences? Would 53BP1 have another partner at the sex body apart from TOPBP1? Could TOPBP1 have other BRCT domains (apart from domain 5) able to bind 53BP1?

      We appreciate the reviewer’s suggestion; however, we had an issue with the 53BP1 antibody. We analyzed 2 mice and needed to re-order the antibody. This antibody was backordered for almost one year, and when we finally received the order, the company had changed the clone for this antibody, and it no longer worked for meiotic spreads. In somatic cells (HEK-293T), we see a partial disruption in the binding of 53BP1 to TOPBP1 B5 by IP-MS and IP-Western blot. The disruption is only partial due to the binding of 53BP1 to other domains in TOPBP1, such as BRCT 1 and 2 (Bigot et al., 2019; Cescutti et al., 2010; Liu et al., 2017). However, in assays in which we would expect a phenotypic response caused by impaired 53BP1, such as survival after IR (using the mice) and survival after phleomycin challenge (using MEFs), we did not see any effect. Moreover, 53BP1 KO mice, males and females, are fertile (Ward et al., 2003), so the partial disruption in binding to 53BP1 that we observed in the TOPBP1 B5 mutant is likely not causing the infertility phenotype.

      7) Line 250: I do not understand what is represented in Figure 5A. Why did the author mix two different experiments (differences in phosphoprotein abundance in B5/B5 compared to wild type and the interference of ATR with AZ20)?

      To account for the differences in cell population observed in the whole testis between Topbp1+/+ and Topbp1B5/B5, and to know exactly which phosphorylation changes were due to disruption of ATR signaling and not to pleiotropic effects, we combined two different phosphoproteomes: one from the comparison between Topbp1+/+ and Topbp1B5/B5 and another from the comparison between vehicle- and ATR inhibitor-treated mice. With this approach, we only consider hits that were disrupted in both analyses. A similar method was used by Sims et al. (2022).
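
      The logic of combining the two phosphoproteomes can be illustrated with a minimal sketch. It is not the analysis code: the function name, site labels, and log2 fold changes are hypothetical, and it assumes a hit is a phosphosite whose abundance is reduced by at least a chosen cutoff in both comparisons.

```python
def shared_reduced_sites(mutant_vs_wt, inhibitor_vs_vehicle, log2_cutoff=-1.0):
    """Phosphosites reduced in both comparisons (hypothetical helper).

    Each argument maps a site label to a log2 fold change; the cutoff
    of -1.0 (a two-fold reduction) is an assumed value.
    """
    reduced_in_mutant = {
        site for site, lfc in mutant_vs_wt.items() if lfc <= log2_cutoff
    }
    reduced_by_inhibitor = {
        site for site, lfc in inhibitor_vs_vehicle.items() if lfc <= log2_cutoff
    }
    # Only sites affected in both datasets are kept as hits.
    return reduced_in_mutant & reduced_by_inhibitor


# Toy log2 fold changes with made-up site labels (not real data).
mutant_vs_wt = {"ProteinA_S10": -2.0, "ProteinB_T55": -1.5, "ProteinC_S7": 0.1}
inhibitor_vs_vehicle = {"ProteinA_S10": -1.8, "ProteinB_T55": -0.2, "ProteinC_S7": -3.0}
hits = shared_reduced_sites(mutant_vs_wt, inhibitor_vs_vehicle)
print(sorted(hits))
```

      Requiring agreement between the two datasets discards changes that only reflect the altered cell composition of the mutant testis (present in one comparison) or ATR targets unrelated to the B5 mutation (present in the other).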

      8) It is not clearly explained what is represented in Figure 6B. There is no explanation in the text or the figure legend. Does this represent the difference between scRNAseq in control and Topbp1B5/B5? If so, please, clarify.

      We thank the reviewer for the comment and have addressed it in the legend of Figure 6B.

      9) Line 342 and following. The authors describe a decrease of gene silencing. The use of two negative concepts is always confusing and results in the conversion to a positive one. I suggest considering the possibility of just talking about increase of gene expression, in order to make the message clearer.

      We appreciate the reviewer’s point here, but it is important to note that the phenomenon disrupted in our mutants is MSCI, which is by definition a gene silencing mechanism. This phenotype is not as simple as “increased gene expression”, it is the removal of a mechanism that is a key feature of prophase I. Thus, because we are focusing on the mechanism of MSCI, it is crucial to maintain this (albeit unusual) terminology.

      10) As for the classification of spermatocytes into 9 categories, I am curious about which spermatocytes are included in each of these categories. For instance, from cytology it seems that in Topbp1B5/B5 mice, spermatocytes are able to reach mid-late pachytene. However, in the spermatocyte categories established by scRNAseq they only reach class 3. Therefore, which are the populations included in the remaining 6 classes of spermatocytes? Do authors have any morphological correlation to these scRNAseq categories? Is it possible that in this mutant morphological advance of meiosis and gene expression profiles are uncoupled?

      The clustering of cells into a specific group is based on RNA expression, which does not always match cytological features. Moreover, during the analysis, cells with high expression of mitochondrial genes are excluded (these are dying cells that do not pass quality control). Thus, while Topbp1B5/B5 reaches a mid-late pachytene stage according to cytological analyses, in the single-cell RNA-seq analysis we could only detect one pachytene stage. The other 6 categories of spermatocytes can be classified according to their best-fit gene expression profile. For that, we use the classification described by Chen et al., 2018 and Lau et al., 2020: Spermatocytes 3-5 = Pachytene, Spermatocytes 6-7 = Diplotene, Spermatocytes 8-9 = secondary spermatocytes (metaphase I/II). The gene markers used for this classification are displayed in Author response image 2.

      Author response image 2.

      Genes used as markers of spermatocytes captured in the scRNAseq analysis. Violin plots display the distribution of cells expressing Gm960 (Leptotene marker), Meiob (Leptotene/Zygotene marker), Psma8 (Pachytene marker), Piwil1 (Pachytene marker), Pou5f2 (Diplotene marker), and Ccna1 (Secondary Spermatocytes marker).

      11) Figure 6E shows that overexpression of X-linked genes is not a feature of spermatocytes but it is initiated in spermatogonia. This fact has not been properly stated in the text and perhaps not sufficiently highlighted.

      We noticed subtle changes during the spermatogonia stage and have addressed the reviewer’s comment in lines 317-322. However, the downstream analyses of the defect in X-gene silencing maintenance displayed in Figure 7 were based on normalization of each gene’s expression to its expression at the pre-leptotene stage.

      12) Supplementary Figure 24 shows that some X-linked genes are more expressed in Topbp1B5/B5 compared to control mice. In the figure it can be observed that many genes accumulate at the bottom of the graph. Does this have any correlation to the location of these genes along the X chromosome, for instance near or within the PAR? This could correlate with the defects in γH2AX accumulation at this region.

      These are the locations along the chromosome. Only the bottom 5 rows are within the PAR region, so this accumulation is not within the PAR region specifically. The bottom tenth of the genes in the heatmap correspond to roughly a 17 Mb region.

      13) The authors only analyzed the overexpression of genes located on the X chromosome. It would be interesting to show the behavior of Y-linked genes as well.

      The coverage of Y-linked genes was not very high and that is why we have not shown the results in the paper. However, the results for Y-linked genes were similar to the X-linked genes and can be visualized in Author response image 3.

      Author response image 3.

      Single cell RNAseq reveals that Topbp1B5/B5 spermatocytes initiate MSCI but fail to promote full silencing of Y chromosome-linked genes. Violin plot displaying the ratio of the average expression of Y chromosome genes by the average expression of chromosome 9 genes at different stages of spermatogenesis for Topbp1+/+ and Topbp1B5/B5 cells.

      14) Line 425: Authors indicate that it is not known if association of TOPBP1 and BLM, 53BP1 or other proteins is disrupted in Topbp1B5/B5 spermatocytes. Could these experiments be performed in the testis, as they were in somatic cells?

      The cellular composition in Topbp1+/+ and Topbp1B5/B5 testes is very different so it would not be a fair comparison. While we have tried to isolate pachytene cells to perform these experiments, we were successful only when using Topbp1+/+ but not Topbp1B5/B5, likely due to the extremely small size of the mutant testis.

      15) Line 455 and following. I find that the discussion about the role of SETX is not completely clear. It seems that a failure of SETX function could result in defective or no transcription, as a consequence of the impossibility to resolve RNA-DNA hybrid molecules. Therefore, should impairment of SETX lead to reduced or enhanced transcription? Please clarify. On the other hand, this defect in SETX function should affect the whole genome, and not only sex chromosomes. Do authors have any clues about this broad effect?

      We thank the reviewer for the comment and have expanded the discussion in lines 470-474. We agree with the reviewer’s point that an impairment of SETX should affect the whole genome; however, during the pachytene stage, SETX is mostly localized to the sex body. Topbp1B5/B5 shows a specific defect in X and Y silencing maintenance during the pachytene stage; thus, we hypothesize that an impairment in SETX localization during pachytene should especially affect the X and Y chromosomes.

      16) As a general comment to the discussion section, I think authors could extend into some specific ideas or speculations. It is shocking that sex chromosome-linked genes are able to escape silencing without dismantling the complex (almost complete) MSCI response in the Topbp1 mutant (although perhaps this is not so surprising considering the high number of escapees reported in the inactivated X chromosome in female somatic cells).

      How to explain this paradox? One possibility (which would make a real breakthrough) is that the expression of sex chromosome-linked genes represents a regulated response to meiotic defects, and not just an unfortunate consequence of a defective MSCI. Thus, MSCI might be somehow irrelevant to prevent the execution of this sex chromosome-based program to stop meiosis progression when needed. The fact that this regulated activation was never proposed is perhaps due to the fact that most of the meiosis mutants characterized so far are unable to reach the stage at which MSCI is properly established, which is the most remarkable difference with the Topbp1 mutant studied here.

      Although naïve, the critical point for the activation of this sex chromosome-based program seems to depend simply on the transcription of Zfy1 and Zfy2 (encoding for transcription factors). The signaling cascades up and downstream these genes are the real mystery, awaiting further studies.

      We thank the reviewer for raising this very interesting point. Our interpretation of the data is that X and Y silencing, being a dynamic process, requires an initiation step and a maintenance step driven/controlled by the DDR machinery, and that Topbp1B5/B5 shows a grossly normal initiation of X and Y silencing but fails to maintain MSCI. Moreover, the expression of Zfy1 and Zfy2 has previously been shown to be sufficient to trigger cell death (Royo et al., 2010; Vernet et al., 2016), and Topbp1B5/B5 cells show increased expression of these genes. However, we do not exclude the very interesting possibility, raised by the reviewer, that the expression of XY-linked genes represents a regulated response to meiotic defects that stops meiosis progression, leading to the cell death observed in Topbp1B5/B5. This would make Topbp1B5/B5 a unique model for these studies, as most of the previous meiosis mutants are unable to reach the stage at which MSCI is properly established. We have added discussion of this exciting point in lines 513-522.

      17) Scale bars are impossible to read in Figures 1I and J, and are missing in all the other image figures. Please, correct.

      We have addressed this in the new Figure 1. For figures displaying meiotic spreads, adding a scale bar is not a common practice in the field as these cells are swollen while being prepared.

      18) Line 828. Since Paula Cohen is an author of the manuscript, it seems weird to acknowledge herself in this section.

      Corrected.

      References

      Adams SR, Maezawa S, Alavattam KG, Abe H, Sakashita A, Shroder M, Broering TJ, Sroga Rios J, Thomas MA, Lin X, Price CM, Barski A, Andreassen PR, Namekawa SH. 2018. RNF8 and SCML2 cooperate to regulate ubiquitination and H3K27 acetylation for escape gene activation on the sex chromosomes. PLoS Genet 14. doi:10.1371/journal.pgen.1007233

      Bigot N, Day M, Baldock RA, Watts FZ, Oliver AW, Pearl LH. 2019. Phosphorylation-mediated interactions with TOPBP1 couple 53BP1 and 9-1-1 to control the G1 DNA damage checkpoint. Elife 8:1–28.

      Cescutti R, Negrini S, Kohzaki M, Halazonetis TD. 2010. TopBP1 functions with 53BP1 in the G1 DNA damage checkpoint. EMBO J 29:3723–3732.

      Chen Y, Zheng Y, Gao Y, Lin Z, Yang S, Wang T, Wang Q, Xie N, Hua R, Liu M, Sha J, Griswold MD, Li J, Tang F, Tong M-H. 2018. Single-cell RNA-seq uncovers dynamic processes and critical regulators in mouse spermatogenesis. Cell Res 28:879–896.

      Hirota T, Blakeley P, Sangrithi MN, Mahadevaiah SK, Encheva V, Snijders AP, ElInati E, Ojarikre OA, de Rooij DG, Niakan KK, Turner JMA. 2018. SETDB1 Links the Meiotic DNA Damage Response to Sex Chromosome Silencing in Mice. Dev Cell 47:645-659.e6.

      Ichijima Y, Ichijima M, Lou Z, Nussenzweig A, Camerini-Otero RD, Chen J, Andreassen PR, Namekawa SH. 2011. MDC1 directs chromosome-wide silencing of the sex chromosomes in male germ cells. Genes Dev 25:959–971.

      Lau X, Munusamy P, Ng MJ, Sangrithi M. 2020. Single-Cell RNA Sequencing of the Cynomolgus Macaque Testis Reveals Conserved Transcriptional Profiles during Mammalian Spermatogenesis. Dev Cell 54:548-566.e7.

      Liu Y, Cussiol JR, Dibitetto D, Sims JR, Twayana S, Weiss RS, Freire R, Marini F, Pellicioli A, Smolka MB. 2017. TOPBP1Dpb11 plays a conserved role in homologous recombination DNA repair through the coordinated recruitment of 53BP1Rad9. J Cell Biol 216:623–639.

      Modzelewski AJ, Holmes RJ, Hilz S, Grimson A, Cohen PE. 2012. AGO4 regulates entry into meiosis and influences silencing of sex chromosomes in the male mouse germline. Dev Cell 23:251–264.

      Royo H, Polikiewicz G, Mahadevaiah SK, Prosser H, Mitchell M, Bradley A, De Rooij DG, Burgoyne PS, Turner JMA. 2010. Evidence that meiotic sex chromosome inactivation is essential for male fertility. Curr Biol 20:2117–2123.

      Sims JR, Faça VM, Pereira C, Ascenção C, Comstock W, Badar J, Arroyo-Martinez GA, Freire R, Cohen PE, Weiss RS, Smolka MB. 2022. Phosphoproteomics of ATR signaling in mouse testes. Elife 11. doi:10.7554/eLife.68648

      Vernet N, Mahadevaiah SK, de Rooij DG, Burgoyne PS, Ellis PJI. 2016. Zfy genes are required for efficient meiotic sex chromosome inactivation (MSCI) in spermatocytes. Hum Mol Genet 25:5300–5310.

      Ward IM, Minn K, van Deursen J, Chen J. 2003. p53 Binding protein 53BP1 is required for DNA damage responses and tumor suppression in mice. Mol Cell Biol 23:2556–2563.

      Yeo AJ, Becherel OJ, Luff JE, Graham ME, Richard D, Lavin MF. 2015. Senataxin controls meiotic silencing through ATR activation and chromatin remodeling. Cell Discovery 1. doi:10.1038/celldisc.2015.25

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewers’ Comments:

      Reviewer #1 (Remarks to the Author):

      Summary:

      Fang Huang et al found that RBM7 deficiency promotes metastasis by coordinating MFGE8 splicing switch and NF-kB pathway in breast cancer by utilizing clinical samples as well as cell and tail vein injection models.

      Strengths:

      This study uncovers a previously uncharacterized role of MFGE8 splicing alteration in breast cancer metastasis, and provides evidence supporting RBM7 function in splicing regulation. These findings facilitate the mechanistic understanding of how splicing dysregulation contributes to metastasis in cancer, a direction that has increasingly drawn attention recently, and provides a potentially new prognostic and therapeutic target for breast cancer.

      We thank the reviewer for appreciating the novelty and importance of this study, and have provided new data to address the following concerns raised by the reviewer.

      Weaknesses:

      This study can be strengthened in several aspects by additional experiments or at least by further discussions. First, how RBM7 regulates NF-kB, and how it coordinates splicing and canonical function as a component of NEXT complex should be clarified. Second, although the roles of MFGE8 splicing isoforms in cell migration and invasion have been demonstrated in transwell and wound healing assays, it would be more convincing to explore their roles in vivo such as the tail vein injection model. Third, the clinical significance would be considerably improved, if the therapeutic value of targeting MFGE8 splicing could be demonstrated.

      We are thankful for the constructive suggestions. A preliminary study on the mechanism by which RBM7 regulates the NF-κB pathway is already underway. We found that RBM7 depletion markedly promoted the expression of IL-1β, as judged by qPCR and ELISA assays (new Figure S5G-S5I, also see below). IL-1β, commonly known as a pro-inflammatory cytokine, can bind to IL-1R and initiate a multistage enzymatic reaction that triggers activation of the NF-κB pathway (Weber et al., 2010; Guo et al., 2024). We therefore speculate that the upregulation of IL-1β might be a causal factor in the activation of NF-κB signaling induced by RBM7 depletion. It will be interesting to determine the complete molecular mechanism in a future study. In addition, we performed a co-IP experiment and found that RBM7 interacts with the RNA splicing factor SF3B2, a component of the spliceosomal U2 snRNP complex (new Figure S6B, also see below). Consistent with the AS regulation of MFGE8 by RBM7, depletion of SF3B2 also promoted exon7 skipping, implying a cooperative effect of the two proteins in regulating MFGE8 splicing (new Figure S6C-S6D, also see below). This agrees with a previous study showing that the RRM domain of RBM7 can bind a proline-rich segment within SF3B2 (Falk, Finogenova et al., 2016). This interaction mode strongly resembles the interaction between the RBM7 RRM domain and the ZCCHC8 proline-rich segment in the NEXT complex, which indicates mutually exclusive binding of SF3B2 and ZCCHC8 to RBM7. Thus, RBM7 appears to play dual, but not conflicting, roles in RNA processing depending on its interaction with the spliceosome or the exosome (see line 427-437 in the new manuscript).

      Author response image 1.

      The mRNA levels of IL-1β in MDA-MB-231 or BT549 cells with stable RBM7 knockdown or control vector were examined by qRT-PCR approach.

      Author response image 2.

      Supernatants from RBM7-knockdown MDA-MB-231 or BT549 cells were collected and protein expression of IL-1β was measured by ELISA kit.

      Author response image 3.

      The knockdown efficiency of RBM7 in two breast cancer cell lines was determined by qRT-PCR.

      Author response image 4.

      Immunoprecipitation assay was performed in breast cancer cells expressing HA-RBM7 and Flag-SF3B2 or empty vector. The Flag-tagged precipitated complexes and lysates were analyzed through western blotting.

      Author response image 5.

      The splicing shift of MFGE8 upon SF3B2 knockdown in breast cancer cells was examined by RT-PCR approach. The mean ± SD of PSI values derived from three independent replicates is shown.

      Author response image 6.

      The SF3B2 knockdown efficiency was examined by qRT-PCR.

      To further corroborate the roles of the two MFGE8 isoforms in cell invasion, we performed fluorescent gelatin degradation assays to investigate invadopodia formation. Consistent with the transwell assay results, MFGE8-L up-regulation suppressed the invasion of breast cancer cells through a layer of extracellular matrix, whereas breast cancer cells with ectopic expression of MFGE8-S acquired an enhanced ability to degrade matrix and invade (new Figure 5B, also see below). In addition, to determine the therapeutic value of targeting MFGE8 splicing, we transfected triple-negative breast cancer cells with ASOs targeting the RBM7-binding motif and examined the potential impact on cell aggressiveness. The results showed an obvious increase in the exon7-skipped variant of MFGE8 compared with the scramble negative control ASOs. Meanwhile, the migratory and invasive ability of breast cancer cells treated with splice-targeting ASOs was significantly boosted (new Figure 6B and S5B, also see below), further suggesting that the aggressiveness of breast cancer stimulated by RBM7 knockdown relies at least partially on the splicing switch of MFGE8.

      Author response image 7.

      Gelatin degradation assay was performed to test the effect of RBM7 knockdown on invadopodia function. 10,000 cells were plated onto FITC-gelatin substrates (green) and cultured for 48 h. Representative images are shown (red, Cy3-phalloidin; blue, DAPI) and the degraded areas were quantified with ImageJ software. Scale bars = 50 μm. P values were determined by one-way ANOVA with Tukey's multiple comparison test (n = 3).

      Author response image 8.

      Representative transwell analysis of the migratory/invasive capability of breast cancer cells transfected with 500 nM ASO directed against the RBM7-binding region in MFGE8 pre-mRNA. P values were determined by one-way ANOVA with Tukey's multiple comparison test.

      Author response image 9.

      RT-PCR quantification of two MFGE8 isoforms after transfecting breast cancer cells with 500 nM ASO directed against RBM7-binding region in MFGE8 pre-mRNA. P values were calculated by one-way ANOVA with Tukey's multiple comparison test.

      The minor concerns

      (1) Several figure legends do not match with the images, for example, Figure 2K, Figure 4, Figure 7D, and 7E, and the description of Figure 7F is missing in the text.

      As suggested by the reviewer, we have checked all of the figure legends carefully and corrected all of the mismatches.

      (2) The statistical methods for Figure1A and Figure1B should be indicated.

      As suggested by the reviewer, we have included the statistical methods for Figure1A and 1B in Figure1 legend. Data in Figure 1A and 1B are presented as means ± SD and P values were obtained by Mantel-Cox log-rank test.

      (3) The molecular weight of the proteins in the Western Blot images should be marked.

      As suggested by the reviewer, we have added the molecular weight of proteins in all of the western blot images.

      (4) The sequences where RBM7 binds on MFGE8 RNA should be clearly indicated.

      We thank the reviewer for this question. We analyzed the sequence of alternative exon 7 and the motifs near its 5’ and 3’ splice sites, and found two potential RBM7-binding motifs positioned proximal to the pseudo 3’ splice site. Subsequent RT-PCR on the precipitates from RIP assays confirmed that RBM7 binds the upstream sequence containing 5’-UUUCUU-3’ motifs adjacent to the intron6/exon7 junction of the MFGE8 cassette exon, but not another nearby region. To pinpoint the potential cis-element for AS regulation by RBM7, we designed antisense oligonucleotides (ASOs) to block the potential RBM7-binding sites (UUUCUU). As shown in revised Figure 4F, the targeting ASOs promoted exclusion of exon7 when compared with the scramble ASO. Additionally, we constructed an exogenous MFGE8 splicing reporter containing exons 6-8 and partial intron sequences to determine the binding site for AS regulation by RBM7. Depletion of RBM7 still induced the splicing shift of the minigene reporter by elevating the MFGE8-S variant. When the binding motif UUUCUU was removed or mutated, however, RBM7 failed to affect the splicing outcome of MFGE8 (new Figure S3C, also see below). Because of its close proximity to the 3’ splice site, the UUUCUU sequence bound by RBM7 is very likely to participate in spliceosome assembly at the upstream 3’ splice site of exon7, which may explain why disruption of the motif led to almost complete exon7 skipping. Together, these data suggest that RBM7 regulates exon skipping of MFGE8 by binding to UUUCUU located six nucleotides upstream of the 3’ splice site of exon7.

      Author response image 10.

      Upper: the red line in the diagram indicates the ASO-targeted region, which contains UUUCUU; lower: MCF7 and MDA-MB-231 cells were transfected with ASOs targeting MFGE8 pre-mRNA for 48 h and then analyzed by RT-PCR. P values were determined by one-way ANOVA with Tukey's multiple comparison test.

      Author response image 11.

      Upper: cartoon of the MFGE8 minigene splicing reporters carrying a mutation in the RBM7-binding site or in a non-specific binding site; lower: RT-PCR assays were performed to determine the splicing outcomes of the MFGE8 reporters when RBM7 was depleted in breast cancer cells.
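
      As a minimal sketch of how the position of such a motif relative to a 3’ splice site could be checked computationally, the Python snippet below scans the 3’ end of an intron for UUUCUU and reports how many nucleotides separate each hit from the intron/exon junction. The sequence `toy_intron_tail` and the function name `motif_gaps_to_3ss` are hypothetical and used only for illustration; they are not the MFGE8 sequence or the authors' analysis pipeline.

```python
def motif_gaps_to_3ss(intron_seq, motif="UUUCUU"):
    """For each motif hit, return the number of nucleotides between the end
    of the motif and the 3' end of the intron (taken as the 3' splice site)."""
    seq = intron_seq.upper().replace("T", "U")  # accept DNA or RNA letters
    gaps = []
    start = seq.find(motif)
    while start != -1:
        gaps.append(len(seq) - (start + len(motif)))
        start = seq.find(motif, start + 1)  # step by one so overlapping hits are found
    return gaps


# Hypothetical intron tail (NOT the real MFGE8 intron 6), ending in the AG dinucleotide.
toy_intron_tail = "GCAUUUCUUCCACAG"
gaps = motif_gaps_to_3ss(toy_intron_tail)
print(gaps)
```

      In this toy sequence the single hit lies six nucleotides from the junction; with a real intron sequence the same function would list every candidate site and its spacing.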

      (5) Some typos, graphic errors, and sentences are hard to understand and need to be corrected, such as lines 80-81, 249-250, line 221 "motfs", line 319 "RBM4". Please carefully proofread and revise the entire manuscript.

      As suggested by the reviewer, we have corrected typos and graphic errors mentioned above. In addition, this manuscript was also extensively edited to improve grammar and sentence structure.

      (6) Define the abbreviations when they first appear, such as MFGE8-L, RBM, etc.

      We thank the reviewer for raising this point. We have defined the abbreviations where they first appear in the manuscript.

      Reviewer #2 (Public Review):

      Summary:

      In this manuscript, the authors reported the biological role of RBM7 deficiency in promoting metastasis of breast cancer. They further used a combination of genomic and molecular biology approaches to discover a novel role of RBM7 in controlling alternative splicing of many genes in cell migration and invasion, which is responsible for the RBM7 activity in suppressing metastasis. They conducted an in-depth mechanistic study on one of the main targets of RBM7, MFGE8, and established a regulatory pathway between RBM7, MFGE8-L/MFGE8-S splicing switch, and NF-κB signaling cascade. This link between RBM7 and cancer pathology was further supported by analysis of clinical data.

      Strengths:

      Overall, this is a very comprehensive study with lots of data, and the evidence is consistent and convincing. Their main conclusion was supported by many lines of evidence, and the results in animal models are pretty impressive.

      Weaknesses:

      However, there are some controls missing, and the data presentation needs to be improved. The writing of the manuscript needs some grammatical improvements because some of the wording might be confusing.

      We thank the reviewer for the positive comments on this work, and have addressed all the concerns raised by the reviewer.

      Specific comments:

      (1) Figure 2. The figure legend is missing for Figure 2C, which caused many mislabels in the rest of the panels. The labels in the main text are correct, but the authors should check the figure legend more carefully. Also in Figure 2C, it is not clear why the authors choose to examine the expression of this subset of genes. The authors only refer to them as "a series of metastasis-related genes", but it is not clear what criteria they used to select these genes for expression analysis.

      We thank the reviewer for raising this question. We have included the figure legend for Figure 2C and improved other figure legends throughout the article. For the second question, gene ontology analysis of RNA-seq data in RBM7-depleted breast cancer cells showed that a series of differentially expressed genes were enriched in metastasis-associated processes. We therefore examined the expression of this subset of genes in breast cancer cells in the presence or absence of RBM7 by qRT-PCR and presented the results as a differential heatmap. To clarify this point and address the reviewer’s concern, we have improved the relevant description of this part (see line 174-180 in the new manuscript).

      (2) Line 218-220. The comparison of PSI changes in different types of AS events is misleading. Because these AS events are regulated in different mechanisms, they cannot draw the conclusion that "the presence of RBM7 may promote the usage of alternative splice sites". For example, the regulators of SE and IR may even be opposite, and thus they should discuss this in different contexts. If they want to conclude this point, they should specifically discuss the SE and A5SS rather than draw an overall conclusion.

      We are thankful for the reviewer’s valuable comment. Following the suggestion, we have removed the overall conclusion and now discuss SE and A5SS events specifically.

      (3) In the section starting at line 243, they first referred to the gene and isoforms as "EFG-E8" or "EFG-E8-L", but later used "EFGE8" and "EFGE8-L". Please be consistent here. In addition, it will be more informative if the authors add a diagram of the difference between two EFGE8 isoforms in terms of protein structure or domain configuration.

      As suggested by the reviewer, we keep using the name “MFGE8-L” for the canonical MFGE8 isoform and “MFGE8-S” for the truncated isoform in this manuscript. In addition, to clarify the structural basis for the different tumor invasion-related functions of two MFGE8 isoforms, we have included a diagram of their domain configuration in new Figure S4F and predicted protein structure in new Figure S4G. The details in the revised manuscript are given below:

      Author response image 12.

      Schematic diagram of the domain composition of two MFGE8 isoforms. Upper: the full-length variant with exon7 indicated by a yellow square; lower: the truncated variant with exon7 skipped.

      Author response image 13.

      The model structure of two MFGE8 isoforms was implemented using SwissModel software. The F5/8 type C2 protein domain excluded from MFGE8-S variant was marked in red.

      (4) Figure 7B and 7C. The figures need quantification of the inclusion of MFGE exon7 (PSI value) in addition to the RT-PCR gel. The difference seems to be small for some patients.

      As suggested by the reviewer, we have included the relative quantification of PSI for endogenous MFGE8 in breast cancer patients and found an increased proportion of exon7 exclusion in most tumor samples when compared to normal tissues (case#1: 86:94; case#2: 84:86; case#3: 79:85; case#4: 63:75; case#5: 69:93; case#6: 71:80) (new Figure 7B, also see below). In addition, we have expanded the number of metastatic breast cancer cases and quantified the AS events within MFGE8 by analyzing the PSI values. The lymph node metastases contain a higher proportion of the MFGE8 variant with skipped exon7 in comparison with paired primary tumor tissues (case#1: 80:95; case#2: 86:97; case#3: 84:90; case#4: 70:78; case#5: 83:89) (Figure 7C). This is consistent with the decreased RBM7 expression levels found in breast cancer with lymph node metastasis.

      Author response image 14.

      The splicing alteration of MFGE8 in 6 pairs of primary breast cancer tissues and adjacent normal tissues was examined using RT-PCR. The quantification of PSI values was based on relative band intensities using ImageJ software.

      Author response image 15.

      The splicing alteration of MFGE8 in primary breast cancer tissues and corresponding lymph node metastases was identified by RT-PCR assays. PSI values were quantified with ImageJ software.
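
      For readers less familiar with the metric, PSI (percent spliced in) from an RT-PCR gel is conventionally the intensity of the exon-included band divided by the summed intensity of the included and skipped bands. The sketch below illustrates that calculation and then computes the per-case PSI difference for the six tumor/normal pairs quoted above. It assumes each pair is written as tumor:normal, which the response does not state explicitly; the band intensities passed to `psi` are made-up values, and the function name is hypothetical.

```python
def psi(included, skipped):
    """Percent spliced in from two band intensities (arbitrary units)."""
    total = included + skipped
    if total <= 0:
        raise ValueError("band intensities must sum to a positive value")
    return 100.0 * included / total


# Made-up band intensities, for illustration only.
example_psi = psi(included=30.0, skipped=10.0)

# PSI pairs quoted in the response, read here as (tumor, normal); this ordering is an assumption.
tumor_normal_psi = [(86, 94), (84, 86), (79, 85), (63, 75), (69, 93), (71, 80)]
delta_psi = [tumor - normal for tumor, normal in tumor_normal_psi]
mean_delta = sum(delta_psi) / len(delta_psi)
print(example_psi, delta_psi, round(mean_delta, 1))
```

      Under that reading, a negative difference means less exon7 inclusion in the tumor than in the paired normal tissue.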

      Minor comments:

      The writing in many places is a little odd or somewhat confusing, I am listing some examples, but the authors need to polish the whole manuscript more to improve the writing. 1. Line 169-170, "...followed by profiling high-throughput transcriptome by RNA sequencing", should be "followed by high-throughput transcriptome profiling with RNA sequencing". 2. Line 170, "displayed a wide of RBM7-regulated genes were enriched...", they should add a "that" after the "displayed" as the sentence is very long. 3. Line 213, "PSI (percent splicing inclusion)" is not correct, PSI stands for "percent spliced in". 4. Line 216-217, the sentence is long and fragmented, they should break it into two sentences. 5. Line 224, the "tethering" should be changed to "recognizing". There is a subtle difference in the mechanistic implication between these two words. 6. Line 250, should be changed to "...in the ratio of two MFGE8 isoforms".

      We thank the reviewer for the detailed comments. The points mentioned above have been addressed one by one, and the manuscript was also extensively edited to improve grammar and sentence structure.

      References

      Weber A, Wasiliew P, Kracht M (2010) Interleukin-1 (IL-1) pathway. Science Signaling.

      Guo Q, Jin Y, Chen X, Ye X, Shen X, Lin M, Zeng C, Zhou T, Zhang J (2024) NF-κB in biology and targeted therapy: new insights and translational implications. Signal Transduction and Targeted Therapy.

      Falk S, Finogenova K, Melko M, Benda C, Lykke-Andersen S, Jensen TH, Conti E (2016) Structure of the RBM7–ZCCHC8 core of the NEXT complex reveals connections to splicing factors. Nature Communications.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review): 

      Summary: 

      Balasubramanian et al. characterized the cell types comprising mouse Schlemm's canal (SC) using bulk and single-cell RNA sequencing (scRNA-seq). The results identify expression patterns that delineate the SC inner and outer wall cells and two inner wall 'states'. Further analysis demonstrates expression patterns of glaucoma-associated genes and receptor-ligand pairs between SECs and neighboring trabecular meshwork.

      Strengths: 

      While mouse SC has been profiled in previous scRNA-seq studies (van Zyl et al 2020, Thomson et al 2021), these data provide higher resolution of SC cell types, particularly endothelial cell (SEC) populations. SC is an important regulator of anterior chamber outflow and has important consequences for glaucoma. 

      We thank the reviewers for their thorough reading of our manuscript and their insightful comments.

      Weaknesses: 

      (1) Since SC has previously been characterized in mouse, human, and other species by scRNA-seq in other studies, this study would benefit from more direct comparisons to published datasets. For example, Table 4 could be expanded to list the SC cell numbers profiled in each study. Expression patterns highlighted in this study could be independently verified by plotting in publicly available mouse SC datasets. Further, a comparison to human expression patterns would assess whether type-specific expression patterns are conserved. Alternatively, an integrated analysis could be performed. Indeed, the authors mention that an integrated analysis was attempted but the data is not shown. It is unclear if this was because of a lack of agreement between datasets or other reasons.

      Table 4 now includes an expanded list of SC cell numbers in each study. We profiled the expression of Npnt, Selp, and Ccl21a in the Thomson et al., 2021 dataset and have included the concurring results in Figure S5. We were unable to do a similar profile using the van Zyl et al., 2020 dataset because of its small number of SC cells. As previously mentioned, differences such as read depth, strain of animals used (including pigmented vs albino), method of cell isolation (including drug exposure), and number of cells profiled pose a significant impediment to integration with previously published datasets. A comparison to a human atlas is a focus of future work.

      (2) Figure 1 presents bulk RNA seq results comparing SEC, BEC, and LEC expression patterns. These populations were isolated using cell surface markers and enrichment by FACS. Since each EC population is derived from the same sample, the accuracy of this data hinges on the purity of enrichment. However, a reference is not given for this method and it is not clear how purity was validated. The authors later note that marker Emcn, which was used to identify BECs, is also expressed in SECs and LECs at lower levels. It should be demonstrated that these populations are clearly separated by flow cytometry. 

      We have added the following clarifying text to the methods section: Forward and side scatter gates were first used to eliminate events with low scatter, which include debris, cell fragments and pyknotic cells. Then propidium iodide positive dead cells were gated out. Further gating on the viable cells was applied so that distinct populations of cells were isolated: a) SECs: GFP+ Lyve1-, b) LECs: GFP+ Lyve1+, c) BECs: GFP- Endomucin+.

      We show here a representative of the flow sort showing the clear distinction in SEC and LEC cell isolation.

      Author response image 1.

      Flow-sorted SECs and LECs. We obtained two distinct populations: 1. SECs (GFP+LYVE1-, blue); 2. LECs (GFP+LYVE1+, red). Note that eFluor 660 emission was collected using the Alexa647 (A647) setting of the flow cytometer. Additionally, SEC marker expression from bulk RNA-seq aligns with signature gene expression from SECs in single cell RNA-seq (Figure S3).
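
      The gating hierarchy described above can be summarized as a small decision function. This is only a sketch under simplifying assumptions: each marker is reduced to a positive/negative call, whereas real gates are drawn on continuous fluorescence intensities, and the function and argument names are hypothetical rather than taken from the authors' sorting software.

```python
def classify_event(low_scatter, pi_positive, gfp_positive,
                   lyve1_positive, endomucin_positive):
    """Assign a sorted event to SEC, LEC or BEC following the stated gates.

    Returns None for events that are excluded or fall outside all gates.
    """
    if low_scatter:      # debris, cell fragments, pyknotic cells
        return None
    if pi_positive:      # propidium iodide positive, i.e. dead cells
        return None
    if gfp_positive:
        return "LEC" if lyve1_positive else "SEC"
    if endomucin_positive:
        return "BEC"
    return None


print(classify_event(False, False, True, False, False))  # GFP+ Lyve1- event
```

      Writing the gates this way makes explicit that SECs and LECs differ only in the Lyve1 call, so the purity of that separation rests on how cleanly the Lyve1 signal resolves.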

      (3) Bulk RNA-seq analysis infers similarity from the number of DEGs between samples, however, this is not a robust indicator. A correlation analysis should be run to verify conclusions. 

      We have provided a heatmap with hierarchical clustering based on Euclidean distance of the EC subtypes (Figure 1B) analyzed by bulk RNA seq in addition to number of DE genes between subtypes.
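
      To make the two similarity measures under discussion concrete, the sketch below computes the pairwise Euclidean distance (the basis of the hierarchical clustering mentioned above) and the Pearson correlation (the analysis the reviewer requested) between expression profiles. The three profiles are invented toy values over four hypothetical genes, not data from the study, and the helper names are illustrative.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length expression vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length vectors."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


# Toy log-expression values for four hypothetical genes (NOT data from the study).
profiles = {
    "SEC": [5.0, 3.0, 0.5, 4.0],
    "LEC": [4.5, 3.5, 1.0, 4.2],
    "BEC": [1.0, 0.5, 6.0, 2.0],
}
pairs = [("SEC", "LEC"), ("SEC", "BEC"), ("LEC", "BEC")]
distances = {p: euclidean(profiles[p[0]], profiles[p[1]]) for p in pairs}
correlations = {p: pearson(profiles[p[0]], profiles[p[1]]) for p in pairs}

for p in pairs:
    print(p, round(distances[p], 2), round(correlations[p], 2))
```

      In these toy data both measures rank SEC closer to LEC than to BEC; on real data the two can disagree, which is one reason to report both.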

      (4) Figures 2-4 present three different datasets targeting the same tissue: 1) C57bl/6j scRNA-seq, 2) C57bl/6j snRNA-seq, 3) 129/sj scRNA-seq. Integrated analysis comparing datasets #1 to #2 and #3 is also presented. Integration methods are not described beyond 'normalization for cell numbers'. It is unclear if additional alignment methods were used. Integration across each of these datasets needs careful consideration, especially since different filtering methods were used (e.g. <20% mito in scRNA-seq and <5% in snRNA-seq). Improper integration could affect the ability to cluster or exaggerate differences between cell/types and states. It would be useful to demonstrate the contribution of different samples and datasets to each cell type/state to verify that these are not driven by batch effects, mouse strain, or collection platform. 

      We agree that integration should be performed with careful consideration of confounding factors. We now show the contribution of different samples and datasets to demonstrate that our datasets integrated well (we have added panels to Figures 3C and 4C). The contribution to each cell type/state was uniformly distributed across methods (C57BL/6J single cell and single nuc) and backgrounds (C57BL/6J and 129/Sj), indicating that the clusters were not a result of integration.
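
      One simple way to inspect whether a cluster is driven by a single dataset is to tabulate, for each cluster, the fraction of cells contributed by each dataset. The sketch below does this with toy labels; the cluster names, dataset names and cell assignments are hypothetical and serve only to show the bookkeeping, not the authors' actual integration workflow.

```python
from collections import Counter, defaultdict


def dataset_fractions(cluster_labels, dataset_labels):
    """Return {cluster: {dataset: fraction of that cluster's cells}}."""
    counts = defaultdict(Counter)
    for cluster, dataset in zip(cluster_labels, dataset_labels):
        counts[cluster][dataset] += 1
    return {
        cluster: {ds: n / sum(c.values()) for ds, n in c.items()}
        for cluster, c in counts.items()
    }


# Toy per-cell labels (hypothetical; one entry per cell).
clusters = ["IW1", "IW1", "IW1", "IW1", "OW", "OW", "OW", "OW"]
datasets = ["B6_sc", "B6_sn", "129_sc", "B6_sc", "B6_sc", "B6_sn", "129_sc", "B6_sn"]
fractions = dataset_fractions(clusters, datasets)

for cluster, comp in fractions.items():
    print(cluster, comp)
```

      A cluster whose cells come almost entirely from one dataset, strain or platform would be a warning sign for batch effects, whereas mixed composition is what well-integrated data are expected to show.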

      (5) IW1 and IW2 are not well separated, and it is unclear if these represent truly different cell states. Figure 5b shows the staining of CCL21A and describes expression in the 'posterior portion' but in the image there are no DAPI+ nuclei in the anterior portion, suggesting the sampling in this section is different from Figure 5a. This would be improved by co-staining NPNT and CCL21A to demonstrate specificity. 

      Since both our antibodies are derived from the same species (goat), a co-labeling wasn’t possible. To be prudent, we used adjacent sections, flat-mounts, and RNAscope and provided further evidence of the anterior/posterior “bias” in supplemental figures.

      (6) The substructures observed within clusters in sc/snRNA-seq data suggest that overall profiling may still not be comprehensive. This should be noted in the discussion. 

      We agree and have added this note in the discussion: “With greater sampling and deeper transcriptomic depth, it is likely that additional SEC cell states/types will be identified.”

      Reviewer #2 (Public Review):

      Summary: 

      This article has characterized the mouse Schlemm's canal expression profile using a comprehensive approach based on sorted SEC, LEC, and BEC total RNA-Seq, scRNA-Seq, and snRNA-Seq to enrich the selection of SECs. The study has successfully profiled genome-wide gene expression using sorted SECs, demonstrating that SECs have a closer similarity to LECs than BECs. The combined scRNA- and snRNA-Seq data with deep coverage of gene expression led to the successful identification of many novel biomarkers for inner wall SECs, outer wall SECs, collector channel ECs, and pericytes. In addition, the study also identified two novel states of inner wall SECs separated by new markers. The study provides significant novel information about the biology and expression profile of SECs in the inner and outer walls. It is of great significance to have this novel, convincing, and comprehensive study led by leading researchers published in this journal. 

      Strengths: 

      This is a comprehensive study using various data to support the expression characterization of mouse SECs. First, the study profiled genome-wide expression using sorted SECs, LECs, and BECs from the same tissue/organ to identify the similarities and differences among the three types of cells. Second, snRNA-Seq was applied to enrich the number of SECs from mouse ocular tissues significantly. Increased sampling of SECs and other cells led to more comprehensive coverage and characterization of cells, including pericytes. Third, the combined scRNA- and snRNA-Seq data analyses increase the power to further characterize the subtle differences within SECs, leading to identifying the expression markers of Inner and Outer wall SECs, collector channel ECs, and distal region cells. Fourth, the identified unique markers were validated for RNA and protein expression in mouse ocular tissues. Fifth, the study explored how the IOP- and glaucoma-associated genes are expressed in the ScRNA- and snRNA-Seq data, providing potential connections of these GWAS genes with IOP and glaucoma. Sixth, the initial pathway and network analyses generated exciting hypotheses that could be tested in other independent studies. 

      We thank the reviewer for their comments on the strengths of this study.

      Weaknesses: 

      A few minor weaknesses have been noted. First, since snRNA-Seq and scRNA-Seq generated different coverage of expressed genes in the cells, how did the combined analyses balance the un-equal sequencing coverage and missing data points in the snRNA-Seq data? Second, the RNA/protein validation of selected SEC molecular markers was done using mouse anterior segment tissues. It would be more helpful to examine whether these molecular markers for SECs could work well in human SECs. Third, the effort to characterize the GWAS-identified IOP- and glaucoma-associated genes is exciting but with limited new information. Additional work could be performed to prioritize these genes.

      Integration of sc-Seq and sn-Seq data: We have addressed a similar integration question from reviewer 1 and have now included a plot showing the distribution of cells upon integration. Integration methods are not perfect and generally result in some loss of data, especially when datasets of unequal sequencing coverage are integrated. However, we did not observe any obvious differences between the original (un-integrated) and integrated datasets. We also noted that the contribution to each cell type/state was similarly distributed across methods (C57BL/6J single cell and single nuc) and backgrounds (C57BL/6J and 129/Sj), and that clustering was not a result of batch effects.

      We agree about the human relevance of SEC markers, and this will be a focus of future work.

      Another focus of our future work is to understand how GWAS identified IOP and glaucoma genes change in disease states.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors): 

      Minor: 

      (1) Figure 5- DAPI should be listed in the legend. 

      (2) Figure 5- It would be helpful to label IW1 and IW2 regions in the UMAPs. 

      We have incorporated the suggestions in Figure 5 and legend.

      Reviewer #2 (Recommendations For The Authors): 

      (1) The study has validated RNA/protein expression of the selected biomarkers for IW/OW SECs in mouse eyes. It would be more helpful to confirm that these newly identified molecular biomarkers for SECs could apply to human eyes. This could be examined through available human scRNA-/snRNA-Seq data or targeted RNA and protein staining experiments. The additional validation in human SECs would make the current discovery more convincing. 

      We agree with the importance of validation in human samples; this is within the scope of future work.

      (2) The combination of scRNA-Seq and snRNA-Seq from three batches of experiments increased the statistical power to identify subtypes of SECs. It would be helpful to include more details on how the qc, missing data, and normalization across different batches were dealt with. 

      We have incorporated more details in the methods section of the paper.

      (3) The authors explored the underlying molecular connection between the newly identified IOP/glaucoma-associated genes using the newly generated SEC-targeted scRNA/snRNA-Seq data. Many of these associated genes were present in the same SEC cells. It would be interesting to see how many of these genes' expression levels are correlated with each other via a network. These potential correlation networks across SECs could lead to identifying novel upstream regulators or network hubs, which could target many IOP-associated genes for future studies. 

      We agree with the importance of a correlation network analysis, but this is a focus of future work, especially in normal and disease states.

    1. Author response:

      The following is the authors’ response to the current reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This paper examines patterns of diversity and divergence in two closely related sub-species of Zea mays. While the data are interesting and the authors have tried to exclude multiple confounding factors, many patterns cannot clearly be ascribed to one cause or another.

      Strengths:

      The paper presents interesting data from sets of sympatric populations of the two sub-species, maize and teosinte. This sampling offers unique insights into the diversity and divergence between the two, as well as the geographic structure of each. Many analyses and simulations to check analyses have been carried out.

      Weaknesses:

      The strength of conclusions that can be drawn from the analyses was low, partly because there are many strange patterns. The authors have done a good job of adding caveats, but clearly, these species do not meet many assumptions of our methods.

      Thank you for the comments. We appreciate the multiple rounds of revision the manuscript has undergone, and the work has improved as a consequence. Overall we disagree that the patterns are strange, and have made considerable efforts to explain in the text and in our responses why the patterns make sense based on what we know about the history of Zea mays from previous research. We agree that currently available methods are not capable of adequately answering all the questions we pose. This reflects both limitations with the available data for these populations (i.e. phenotypes and spatially explicit sampling), and limitations in available methods tailored to the questions at hand (spatially explicit inference of the range over which an allele is adaptive). We have made considerable effort to point out the places where our inferences are likely to have low accuracy or limited resolution. These limitations are in many ways inherent to all inference-based science and should not be considered a weak point specific to this work, nor do they take away from the fundamental conclusions, which have changed quantitatively but not qualitatively over the course of peer review.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      -The manuscript should say something about the fact that range-wide PSMC does not show a decline.

      We did not use PSMC methods but instead mushi as described in the methods. On line 356 we described how the lower sample size and strong regularization are the most likely explanations for the lack of a population size decline in the rangewide samples.

      - The manuscript should explain how rdmc was run and what "overlapping" means.

      We described how sweep intervals were inferred starting on line 823 (Methods subsection “Identifying Selective Sweeps”). Sweep regions were defined as the outermost coordinates from all populations that shared any overlap in their respectively defined sweep intervals. The details of how we ran rdmc, including all of the parameters, are described starting on line 895 (Methods subsection “Inferring modes of convergent adaptation”).
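
      The sweep-region rule described above can be sketched as follows. This is an illustration only, not the authors' pipeline: the function name `shared_sweep_regions`, the population labels, and the coordinates are hypothetical. It also assumes that intervals lie on a single chromosome and that overlap is applied transitively (A overlaps B and B overlaps C gives one region), which the response does not state explicitly.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]


def shared_sweep_regions(
    per_pop: Dict[str, List[Interval]]
) -> List[Tuple[int, int, List[str]]]:
    """Union overlapping sweep intervals across populations.

    Each returned region spans the outermost coordinates of all intervals
    that overlap one another, and lists the populations that contributed.
    """
    # Flatten to (start, end, population) and sort by start coordinate.
    tagged = sorted(
        (start, end, pop) for pop, ivs in per_pop.items() for start, end in ivs
    )
    regions: List[Tuple[int, int, set]] = []
    for start, end, pop in tagged:
        if regions and start <= regions[-1][1]:
            # Overlaps the current region: extend it to the outermost end.
            prev_start, prev_end, pops = regions[-1]
            regions[-1] = (prev_start, max(prev_end, end), pops | {pop})
        else:
            regions.append((start, end, {pop}))
    return [(s, e, sorted(p)) for s, e, p in regions]


# Hypothetical toy intervals (base-pair coordinates).
toy = {
    "maize_A": [(100, 200), (900, 950)],
    "teosinte_B": [(150, 300)],
    "teosinte_C": [(290, 400)],
}
regions = shared_sweep_regions(toy)
```

      In this toy input the first three intervals collapse into one region shared by three populations, while the last interval remains a region private to one population.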

      - Figure 4: "Negative log10" is messed up

      Thank you. This has been fixed for the Version Of Record.

      - Line 318: "accruacy"

      Thank you. We have edited this typo for the Version Of Record.

      - New Table S3: why don't the proportions add to 1?

      These values represent the proportion of fixed differences at 0-fold sites that are unique to each population. The denominator is the total number of fixed differences for each population separately, so each proportion is specific to its population and the proportions should not sum to one across populations. The table caption has been reworded to clarify this for the Version Of Record.
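
      A small numerical sketch may make the denominator point concrete. The counts below are invented for illustration and are not the values in Table S3; the population names are placeholders.

```python
# Hypothetical counts of fixed differences at 0-fold sites (not from Table S3).
fixed_total = {"pop1": 1000, "pop2": 800, "pop3": 1200}
fixed_unique = {"pop1": 300, "pop2": 400, "pop3": 600}

# Each proportion uses that population's own total as the denominator.
prop_unique = {pop: fixed_unique[pop] / fixed_total[pop] for pop in fixed_total}

# Because the denominators differ, nothing constrains the sum to equal one.
sum_of_proportions = sum(prop_unique.values())
```

      Here the three proportions are 0.3, 0.5 and 0.5, which sum to 1.3 rather than 1.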


      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This paper examines patterns of diversity and divergence in two closely related sub-species of Zea mays. While the patterns are interesting, the strength of evidence in support of the conclusions drawn from these patterns is weak overall. Most of the main conclusions are not supported by convincing analyses.

      Strengths:

      The paper presents interesting data from sets of sympatric populations of the two sub-species, maize and teosinte. This sampling offers unique insights into the diversity and divergence between the two, as well as the geographic structure of each.

      Weaknesses:

      There were issues with many parts of the paper, especially with the strength of conclusions that can be drawn from the analyses. I list the major issues in the order in which they appear in the paper.

      (1) Gene flow and demography.

      The f4 tests of introgression (Figure 1E) are not independent of one another. So how should we interpret these: as gene flow everywhere, or just one event in an ancestral population? More importantly, almost all the significant points involve one population (Crucero Lagunitas), which suggests that the results do not simply represent gene flow between the sub-species. There was also no signal of increased migration between sympatric pairs of populations. Overall, the evidence for gene flow presented here is not convincing. Can some kind of supporting evidence be presented?

      We agree that the standard approach to f4 tests that we employed here is not without limitations, namely, that the tests are conducted independently, while the true evolutionary history is not. While a joint demographic inference across all populations would be useful, it did not seem tractable to perform over all of our populations with currently available methods, given the number of populations being analyzed, nor does it directly address the question of interest. Our purpose for including the f4 tests was to test whether there was more gene flow between sympatric pairs than in other comparisons (we have made that point clearer in the text near line 174). As described in the text, the distribution of Z scores is generated by pairing focal populations with all other non-focal populations across both subspecies, which means the gene flow signal of interest is marginalized over the effects of gene flow in the other non-focal populations. This is not nearly as rich as inferring the full history, but it gives us some sense of the average amount of gene flow experienced between populations and allows us to address one of our primary questions of interest when conceiving this paper - do sympatric pairs show more gene flow than other pairs? We agree with the reviewer that the answer is largely no, and the writing reflects this.

      Overall, we think both points mentioned by the reviewer here (that most but not all tests involved Crucero Lagunitas maize, and that sympatric pairs don’t show higher gene flow) contribute nicely to the overall theme of the paper: the history of both subspecies is idiosyncratic and has been impacted by humans in ways that do not reflect geographic proximity and that we did not anticipate (see expectations near line 110). We have emphasized the connection between f4 tests and the revised rdmc results near line 653.
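
      For readers unfamiliar with the statistic, the sketch below shows the standard definition of f4 as the mean across sites of (pA - pB)(pC - pD), with a Z score from a delete-one-block jackknife. It is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the equal-size contiguous blocks, and the random toy allele frequencies are all hypothetical, and the software and block definitions used in the manuscript are not reproduced here.

```python
import math
import random
from typing import List, Sequence


def f4(pA: Sequence[float], pB: Sequence[float],
       pC: Sequence[float], pD: Sequence[float]) -> float:
    """Mean over sites of (pA - pB) * (pC - pD) for allele frequencies p."""
    return sum((a - b) * (c - d) for a, b, c, d in zip(pA, pB, pC, pD)) / len(pA)


def f4_z(pA: Sequence[float], pB: Sequence[float],
         pC: Sequence[float], pD: Sequence[float], n_blocks: int = 5) -> float:
    """Z score for f4 using a delete-one-block jackknife over contiguous blocks."""
    size = len(pA) // n_blocks
    if n_blocks < 2 or size < 1:
        raise ValueError("need at least two non-empty blocks")
    used = n_blocks * size  # drop any trailing sites that do not fill a block
    cols: List[List[float]] = [list(p[:used]) for p in (pA, pB, pC, pD)]
    estimate = f4(*cols)
    leave_one_out = []
    for k in range(n_blocks):
        lo, hi = k * size, (k + 1) * size
        kept = [c[:lo] + c[hi:] for c in cols]  # remove block k from every population
        leave_one_out.append(f4(*kept))
    mean_loo = sum(leave_one_out) / n_blocks
    variance = (n_blocks - 1) / n_blocks * sum(
        (x - mean_loo) ** 2 for x in leave_one_out
    )
    if variance == 0.0:
        raise ValueError("jackknife variance is zero; Z score undefined")
    return estimate / math.sqrt(variance)


# Hypothetical toy data: 50 sites, four populations, random frequencies.
rng = random.Random(0)
freqs = [[rng.random() for _ in range(50)] for _ in range(4)]
z = f4_z(*freqs)
```

      A distribution of such Z scores, obtained by pairing a focal population with each non-focal population in turn, is the kind of summary the response refers to.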

      The paper also estimates demographic histories (changes in effective population sizes) for each population, and each sub-species together. The text (lines 191-194) says that "all histories estimated a bottleneck that started approximately 10 thousand generations ago" but I do not see this. Figure 2C (not 2E, as cited in the text) shows that teosinte had declines in all populations 10,000 generations ago, but some of these declines were very minimal. Maize has a similar pattern that started more recently, but the overall species history shows no change in effective size at all. There's not a lot of signal in these figures overall.

      I am also curious: how does the demographic model inferred by mushi address inbreeding and homozygosity by descent (lines 197-202)? In other words, why does a change in Ne necessarily affect inbreeding, especially when all effective population sizes are above 10,000?

      All maize populations show a decline beginning 10,000 generations ago. The smallest decline for maize is from 100,000 to 30,000. All teosinte populations show a reduction in population size. The smallest of these drops more than 70% from around 300,000 to 100,000. Three of the teosinte populations showed a reduction in population size from ~10^5 to ~10^3, which is well below 10,000. Thus all populations show declines.

      These large reductions should lead to inbreeding and increased homozygosity by descent. Mushi does not specifically model these features of the data, yet as we show, simulations under the model estimated by Mushi matched the true HBD levels fairly well (Figure 2D).

      The rangewide sample does not show declines, likely because there is enough isolation between populations that the reduction in variation at any given locus is not shared, and is maintained in the populations that did not experience the population decline.

      (2) Proportion of adaptive mutations.

      The paper estimates alpha, the proportion of nonsynonymous substitutions fixed by positive selection, using two different sampling schemes for polymorphism. One uses range-wide polymorphism data and one uses each of the single populations. Because the estimates using these two approaches are similar, the authors conclude that there is little local adaptation. However, this conclusion is not justified.

      There is little information as to how the McDonald-Kreitman test is carried out, but it appears that polymorphism within either teosinte or maize (using either sampling scheme) is compared to fixed differences with an outgroup. These species might be Z. luxurians or Z. diploperennis, as both are mentioned as outgroups. Regardless of which is used, this sampling means that almost all the fixed differences in the MK test will be along the ancestral branch leading to the ancestor of maize or teosinte, and on the branch leading to the outgroup. Therefore, it should not be surprising that alpha does not change based on the sampling scheme, as this should barely change the number of fixed differences (no numbers are reported).

      The lack of differences in results has little to do with range-wide vs restricted adaptation, and much more to do with how MK tests are constructed. Should we expect an excess of fixed amino acid differences on very short internal branches of each sub-species tree? It makes sense that there is more variation in alpha in teosinte than maize, as these branches are longer, but they all seem quite short (it is hard to know precisely, as no Fst values or similar are reported).

      The section “Genetic Diversity” in the methods provides details about how luxurians and diploperennis were used as outgroups. The section “Estimating the Rate of Positive Selection, α” in the methods includes the definition of α, the full joint non-linear regression equation, the software used to estimate it (brms), and the relevant citations crediting the authors of the original method. However, some of the relevant information about the SFS construction is provided in the previous section entitled “Genetic Diversity”. We added a reference to this in the results near line 800.
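
      As a point of reference for the discussion of α, the classic McDonald-Kreitman point estimate is α = 1 - (Ds · Pn) / (Dn · Ps), where Dn and Ds are nonsynonymous and synonymous fixed differences and Pn and Ps are the corresponding polymorphism counts. The sketch below implements only this classic estimate; it is not the joint non-linear regression fitted with brms in the manuscript, and the function name and counts are hypothetical.

```python
def mk_alpha(dn: int, ds: int, pn: int, ps: int) -> float:
    """Classic MK estimate of the proportion of adaptive nonsynonymous substitutions."""
    if dn == 0 or ps == 0:
        raise ValueError("alpha is undefined when Dn or Ps is zero")
    return 1.0 - (ds * pn) / (dn * ps)


# Hypothetical counts. Divergence is held fixed while polymorphism differs
# between a range-wide sample and a single-population sample.
alpha_rangewide = mk_alpha(dn=60, ds=100, pn=30, ps=100)
alpha_single_pop = mk_alpha(dn=60, ds=100, pn=24, ps=80)
```

      With divergence counts held fixed, α changes between sampling schemes only through the ratio Pn/Ps, which is the dependence at issue in the reviewer's comment. In this toy case both schemes have Pn/Ps = 0.3 and therefore give the same α.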

      While we appreciate the concern that “almost all the fixed differences in the MK test will be along the ancestral branch leading to the ancestor of maize or teosinte”, this is only a problem if there aren’t enough fixed differences that are unshared between populations. This is more of a concern for maize than teosinte, which we make clear as a caveat in the manuscript in several places already. The fact that there is variation in alpha among teosinte populations is evidence that these counts do differ among populations. As we can see in the population trees in Figure 1, there is a considerable amount of terminal branch length for all the populations. Indeed if we look at the number of fixed differences at 0-fold sites across populations:

      The variation in the number of fixed differences, particularly across teosinte means that a large number cannot be shared between populations. We can estimate the fixed differences unique to each subpopulation (and total count) demonstrating that, in general, there are a large number of substitutions unique to each population. This is good evidence the rangewide estimates do not reflect a lack of variation within populations, at least not for teosinte. This is now included in the supplement (Table S3).

      Finally, we note that the branches leading to outgroups are likely not substantially longer than those among populations. Given our estimates of Ne, the coalescent within maize and teosinte should be relatively deep (with Ne of 30K it should be ~120K years). The divergence time between Zea mays and these outgroup taxa has been estimated at ~150K years (Chen et al. 2022). This is now mentioned in the text on line 407.
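
      The arithmetic behind the coalescent depth quoted above can be sketched as follows. The assumptions are ours, not statements from the manuscript: the figure of ~120K is taken to come from the standard neutral coalescent expectation for diploids, E[TMRCA] = 4·Ne·(1 - 1/n) generations for n sampled gene copies, and a generation time of one year is assumed for an annual plant.

```python
def expected_tmrca_generations(ne: float, n_copies: int) -> float:
    """Expected time to the most recent common ancestor, in generations,
    for n sampled gene copies in a diploid population of effective size ne."""
    if n_copies < 2:
        raise ValueError("need at least two sampled gene copies")
    return 4.0 * ne * (1.0 - 1.0 / n_copies)


GENERATION_TIME_YEARS = 1.0  # assumption: one generation per year

# Ne of 30,000 as quoted in the response; the sample size is hypothetical.
depth_years = expected_tmrca_generations(30_000, 1_000) * GENERATION_TIME_YEARS
```

      For large samples the expectation approaches 4·Ne = 120,000 generations, which is of the same order as the ~150K-year divergence estimate cited in the response.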

      We have added a caveat about the reviewer’s concern regarding the non-independence of fixed differences for maize near line 386.

      (3) Shared and private sweeps.

      In order to make biological inferences from the number of shared and private sweeps, there are a number of issues that must be addressed.

      One issue is false negatives and false positives. If sweeps occur but are missed, then they will appear to be less shared than they really are. Table S3 reports very high false negative rates across much of the parameter space considered, but is not mentioned in the main text. How can we make strong conclusions about the scale of local adaptation given this? Conversely, while there is information about the false positive rate provided, this information doesn't tell us whether it's higher for population-specific events. It certainly seems likely that it would be. In either case, we should be cautious saying that some sweeps are "locally restricted" if they can be missed more than 85% of the time in a second population or falsely identified more than 25% of the time in a single population.

      The reviewer brings up a worthwhile point. The simulation results indeed call into question how many of the sweeps we claim are exclusive to one population actually are. This caveat is already made, but we now make clearer the reviewer’s concern regarding the high false negative rate (near line 299). However, if anything this suggests sweeps are shared even more often than what is reported. One of the major takeaways from the paper is that convergent adaptation is more common than we expected. The most interesting part about the unique sweeps is the comparison between maize and teosinte. While the true proportions may vary, the relatively higher proportion of sweeps exclusive to one population in teosinte compared to maize is unlikely to be affected by false negatives, since the accuracy of sweep identification is pretty similar across subspecies (though perhaps with some exceptions for the populations with stronger bottlenecks). Further, these criticisms are specific to the raisd results. All sweeps shared across multiple populations were analyzed using rdmc. After adjustments made to the number of proposed sites for selection (see response below), there is good agreement between the raisd and rdmc results - the regions we proposed as selective sweeps with raisd all show evidence of convergence using rdmc. Recall too that rdmc uses a quite different approach to inference - all populations are used jointly, labelling those that did and did not experience the sweep. If sweeps were present in populations that were labeled as neutral (or vice versa), this would weaken the power to infer selection at the locus. Much of the parameter space we explored is for quite weak selection, and the simulated analysis shows we are likely to miss those instances, often entirely. For strong sweeps, however, our simulations show we have appreciable accuracy.

      Together, there is reason to be optimistic about our detection of strong shared sweeps and that the main conclusions we make are sound.

      Finally, we note that we are unaware of any other empirical study that has performed similar estimates of the accuracy of the sweep calling in their data (as opposed to using simulations). We thus see these analyses as a significant contribution towards transparency that is completely lacking from most papers.

      A second, opposite, issue is shared ancestral events. Maize populations are much more closely related than teosinte (Figure 2B). Because of this, a single, completed sweep in the ancestor of all populations could much more readily show a signal in multiple descendant populations. This is consistent with the data showing more shared events (and possibly more events overall). There also appear to be some very closely (phylogenetically) related teosinte populations. What if there's selection in their shared ancestor? For instance, Los Guajes and Palmar Chico are the two most closely related populations of teosinte and have the fewest unique sweeps (Figure 4B). How do these kinds of ancestrally shared selective events fit into the framework here?

      The reviewer brings up another interesting point and one that likely impacts some of our results.

      As the reviewer describes, this is an issue that is of more concern to the more closely related populations and is less likely to explain results across the subspecies. We have added this as a caveat (near line 456). As is clear in the writing, sharing across subspecies is our primary interest for the rdmc results.

      These analyses of shared sweeps are followed by an analysis of sweeps shared by sympatric pairs of teosinte and maize. Because there are not more events shared by these pairs than expected, the paper concludes that geography and local environment are not important. But wouldn't it be better to test for shared sweeps according to the geographic proximity of populations of the same sub-species? A comparison of the two sub-species does not directly address the scale of adaptation of one organism to its environment, and therefore it is hard to know what to conclude from this analysis.

      We did not intend to conclude that local adaptation is not important. Especially for teosinte, we report and interpret evidence that many sweeps are happening exclusively in one population, which is consistent with the action of local adaptation and consistent with some of our expectations.

      More directly, this is another instance of us having clear hypotheses going into the paper and constructing specific analyses to test them. As we explain in the paper, we expected the scale of local adaptation to be very small, such that subspecies growing next to each other have more opportunities to exchange alleles that are locally adapted to their shared environment. The analysis we conducted makes sense in light of this expectation. We considered conducting tests regarding geographic proximity, but there is limited power with the number of populations we have within subspecies, and the meaning of the tests is unclear if all populations of both subspecies are naively included together. This analysis shows that, at least for sweeps and fixations, the scale of adaptation is larger than a single location. While it may not be a complete description on its own, the work here does provide information about the scale of adaptation and is useful to our overall claims and objectives of the paper. As mentioned in the paper, the story might be very different if we were to study it through the lens of polygenic adaptation. We also now mention in several places in the discussion where broader sampling could improve inference.

      (4) Convergent adaptation

      My biggest concern involves the apparent main conclusion of the paper about the sources of "convergent adaptations". I believe the authors are misapplying the method of Lee and Coop (2017), and have not seriously considered the confounding factors of this method as applied. I am unconvinced by the conclusions that are made from these analyses.

      The method of Lee and Coop (referred to as rdmc) is intended to be applied to a single locus (or very tightly linked loci) that shows adaptation to the same environmental factor in different populations. From their paper: "Geographically separated populations can convergently adapt to the same selection pressure. Convergent evolution at the level of a gene may arise via three distinct modes." However, in the current paper, we are not considering such a restricted case. Instead, genome-wide scans for sweep regions have been made, without regard to similar selection pressures or to whether events are occurring in the same gene. Instead, the method is applied to large genomic regions not associated with known phenotypes or selective pressures.

      I think the larger worry here is whether we are truly considering the "same gene" in these analyses. The methods applied here attempt to find shared sweep regions, not shared genes (or mutations). Even then, there are no details that I could find as to what constitutes a shared sweep. The only relevant text (lines 802-803) describes how a single region is called: "We merged outlier regions within 50,000 Kb of one another and treated as a single sweep region." (It probably doesn't mean "50,000 kb", which would be 50 million bases.) However, no information is given about how to identify overlap between populations or sub-species, nor how likely it is that the shared target of selection would be included in anything identified as a shared sweep. Is there a way to gauge whether we are truly identifying the same target of selection in two populations?

      The question then is, what does rdmc conclude if we are simply looking at a region that happened to be a sweep in two populations, but was not due to shared selection or similar genes? There is little testing of this application here, especially its accuracy. Testing in Lee and Coop (2017) is all carried out assuming the location of the selected site is known, and even then there is quite a lot of difficulty distinguishing among several of the non-neutral models. This was especially true when standing variation was only polymorphic for a short time, as is estimated here for many cases, and would be confused for migration (see Lee and Coop 2017). Furthermore, the model of Lee and Coop (2017) does not seem to consider a completed ancestral sweep that has signals that persist into current populations (see point 3 above). How would rdmc interpret such a scenario?

      Overall, there simply doesn't seem to be enough testing of this method, nor are many caveats raised in relation to the strange distributions of standing variation times (bimodal) or migration rates (opposite between maize and teosinte). It is not clear what inferences can be made with confidence, and certainly the Discussion (and Abstract) makes conclusions about the spread of beneficial alleles via introgression that seem to outstrip the results.

      We have fixed the “50,000 Kb” typo.

      There are several important points the reviewer makes here worth considering. First and most importantly, the method of Lee and Coop (2017) actually does include sites as part of the composite likelihood calculation. For computational feasibility, the number of positions we initially considered was 20 (20 different positions along the input sequence were proposed as the site of the shared beneficial mutation). In efforts to further address the reviewer’s concern about adaptive mutations at distinct loci, we have increased the number of proposed selected sites to 200. This fact should greatly diminish the reviewer’s concern that we are picking up independent sweeps that happened at different nucleotide positions in the same region - evidence for a beneficial mutation must be shared by the selected populations at a proposed site. As the revisions show, this has modified the results of our paper in a number of ways, including changing all of the previous neutral regions to shared via standing variation or migration. Despite these changes, our previous conclusions are intact, including the pattern that migration rates are high when maize populations share the sweep. Relatedly, we disagree with the reviewer’s characterization of the migration results. The pattern is quite clear and makes sense - when a maize population is involved in the sweep, migration rate is inferred to be high. Sweeps exclusive to teosinte are rarer and are inferred to have a low migration rate. This relates directly to the idea that humans have moved maize relatively rapidly across the landscape.

      We have now included a plot showing how the difference between the maximum composite likelihood (CLE) site and the next highest CLE site varies across our inferences (Figure S8), which strongly suggests that patterns are not muddled across multiple loci, but are centered at a focal region where the beneficial allele is inferred to be located. While there are too many to show in the manuscript across all sweeps, here is a nice example of what inference looks like for one of the proposed sweep regions.

      Author response image 1.

      Furthermore, the situation the reviewer is describing would be selection acting on independent mutations (mutations at different loci), which would not create an increase in the amount of allele frequency covariance above and beyond what would be expected by drift under the migration and standing variation models.

      We also note that we are not alone in applying this approach to shared outlier signals in the absence of known genes; indeed the authors of the DMC method have applied it to regions of shared outlier signal themselves (e.g. https://journals.plos.org/plosgenetics/article?id=10.1371/journal.pgen.1008593).

      Reviewer #2 (Public Review):

      Summary:

      The authors sampled multiple populations of maize and teosinte across Mexico, aiming to characterise the geographic scale of local adaptation, patterns of selective sweeps, and modes of convergent evolution between populations and subspecies.

      Strengths & Weaknesses:

      The population genomic methods are standard and appropriate, including Fst, Tajima's D, α, and selective sweep scans. The whole genome sequencing data seems high quality. However, limitations exist regarding limited sampling, potential high false-positive sweep detection rates, and weak evidence for some conclusions, like the role of migration in teosinte adaptation.

      Aims & Conclusions:

      The results are interesting in supporting local adaptation at intermediate geographic scales, widespread convergence between populations, and standing variation/gene flow facilitating adaptation. However, more rigorous assessments of method performance would strengthen confidence. Connecting genetic patterns to phenotypic differences would also help validate associations with local adaptation.

      Impact & Utility:

      This work provides some of the first genomic insights into local adaptation and convergence in maize and teosinte. However, the limited sampling and need for better method validation currently temper the utility and impact. Broader sampling and connecting results to phenotypes would make this a more impactful study and valuable resource. The population genomic data itself provides a helpful resource for the community.

      Additional Context:

      Previous work has found population structure and phenotypic differences consistent with local adaptation in maize and teosinte. However, genomic insights have been lacking. This paper takes initial steps to characterise genomic patterns but is limited by sampling and validation. Additional work building on this foundation could contribute to understanding local adaptation in these agriculturally vital species.

      We appreciate the reviewer’s thoughtful reading of the paper and scrutiny. We hope that the added caveats made in response to reviewer 1 (as well as the previous rounds of peer review) will provide readers with the proper amount of skepticism about the accuracy of some of our initial sweep results, while also demonstrating that many of our conclusions are robust to the concerns raised over the various stages of review.

      We agree with the reviewer that better sampling and the incorporation of inference about phenotypic data would be excellent additions, but the information is not available for the studied populations, and is outside the scope of this paper.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      - Sometimes alpha is described as a rate, and sometimes as a proportion. The latter is correct.

      We have updated this. Thanks.

      - Line 79: are they really "discrete" populations?

      The teosinte populations sampled are all clearly separated from each other and are physically discrete. The maize population samples came from individual farmer fields. Traditional maize is grown as open-pollinated (outcrossing) populations, and farmers save seed for subsequent generations. An individual farmer’s field thus behaves as a discrete population for our purposes, impacted of course by gene flow, selection, and other evolutionary processes.

      - Lines 418-420: "Large genomes may lead to more soft sweeps, where no single mutation driving adaptive evolution would fix (Mei et al. 2018)." I'm not sure I understand this statement. Why is this a property of genome size?

      Mei et al. 2018 lay out the logic, but essentially they present data arguing that the total number of functionally relevant base pairs increases with genome size (less than linearly). If true, genomes with a large number of potentially functional bp are more likely to undergo soft sweeps (see theory by Hermisson and Pennings cited in Mei et al. 2018).

      - Lines 500-1: selection does not cause one to underestimate effective population sizes. Selection directly affects Ne. I'm not sure what biases the sentences on lines 502-508 are trying to explain.

      We have simplified this section. Not accounting for linked selection (especially positive selection) results in a biased inference of demographic history. See Marsh and Johri (2024) for another example. https://doi.org/10.1093/molbev/msae118

      - Line 511-3: does Uricchio et al. (2019) show any difference in the estimate of alpha from Messer and Petrov (2013) when taking background selection into account?

      What we initially wrote was incorrect. The aMK method of Messer and Petrov (2013) accounts for weakly deleterious polymorphisms, but it does not account for positively selected ones. We have updated this text and suggested our method may underestimate alpha if positively selected segregating alleles are common (near line 539).

      - Lines 598-599: "which would limit the rate of new and beneficial mutations." I don't understand this - shouldn't a bottleneck only affect standing variation? Why would a bottleneck affect new mutations?

      This is simply to say that during the low Ne period of a bottleneck, fewer total mutations (and therefore fewer beneficial mutations) will be generated, since there are fewer individuals for mutations to occur in. We have changed “rate” to “amount” to clarify that we do not mean the mutation rate itself.
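
      The distinction between the per-site mutation rate and the total supply of new mutations can be illustrated with simple arithmetic. In a diploid population of N individuals, the expected number of new mutations per generation across L sites is 2·N·μ·L. The values of μ, L, and N below are placeholders chosen for illustration, not estimates from the manuscript.

```python
def new_mutations_per_generation(n_individuals: float, mu: float, n_sites: float) -> float:
    """Expected number of new mutations entering a diploid population each generation."""
    return 2.0 * n_individuals * mu * n_sites


MU = 3e-8        # hypothetical per-site, per-generation mutation rate
SITES = 1e6      # hypothetical number of sites considered

before = new_mutations_per_generation(100_000, MU, SITES)  # pre-bottleneck size
during = new_mutations_per_generation(1_000, MU, SITES)    # bottleneck size
fold_reduction = before / during
```

      The mutation rate μ is identical in both cases; only the number of genomes in which mutations can arise changes, so the supply of new mutations scales with population size (a 100-fold reduction in this toy case).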

      Reviewer #2 (Recommendations For The Authors):

      Experiments/Analyses:

      (1) Consider simulating polygenic adaptation in addition to hard and soft sweeps to see if this improves the power to detect adaptive signatures shared between populations. This could involve simulating the coordinated change in allele frequencies across many loci to match a specified shift in trait value due to selection. The ability to detect shared polygenic adaptation between population replicates could be assessed using methods tailored to polygenic signals, such as the Polygenic Selection Score approach. Comparing the power to detect shared polygenic adaptation versus shared hard and soft sweeps would provide further insight into what adaptive modes current methods can uncover. If the power to detect shared polygenic adaptation is very low, the extent of shared adaptation between populations may be even more common than currently inferred. Adding simulations of polygenic adaptation would strengthen the study.

      While this would be a worthwhile undertaking in general, it would be a considerable amount of work outside of the scope and aims of this paper.

      (2) Explore using machine learning approaches like S/HIC to improve power over summary statistic methods potentially.

      We in fact put considerable effort into applying diploS/HIC before switching to raisd for this project. While predictions on simulations had good power to detect sweeps, applying the method to our actual data classified a dubious number of windows as sweeps (e.g. >90% of the genome), which we believed to be false positives. We speculated that this may have to do with sensitivity to demographic or other types of misspecification in the simulations, such as our choice of window sizes compared to local recombination rates. It would likely be fruitful to put further effort into using machine learning methods for maize and teosinte, but a deeper exploration of the right hyperparameters and simulation choices is likely needed to apply them effectively.

      (3) Increase geographic sampling density, if possible, especially near population pairs showing high differentiation, to better understand the scale of local adaptation.

      We agree this would be valuable research. Hopefully this work inspires further efforts into the question of the spatial and temporal scales of local adaptation, with more ambitious spatial sampling designed at the outset.

      Writing/Presentation:

      (1) Provide more intuition about the biological interpretation of the migration rates inferred under the migration model of convergence. What do the rates imply about the amount or timing of gene flow?

      We have expanded the discussion sections (starting near line 653) to elaborate on the migration results and connect the rdmc and f4 tests more explicitly. The timing of gene flow is more challenging to address directly with the approaches we used, but we agree it would be interesting to explore more in future papers.

      (2a) Expand the discussion of power limitations and the need for simulation tests. Consider adding ROC curves for sweep detection on simulated data. The relatively low proportion of shared selective sweeps between population replicates highlights limitations in the power to detect sweeps, especially incomplete or soft sweeps. I think it would be a good idea to expand the discussion of the power tradeoffs shown in the simulation analyses. In particular, the ROC curves in Figure S4 clearly show how power declines for weaker selection coefficients across the different sweep types. I suggest making these ROC curves part of the main figures to feature the issue of power limitations more prominently.

      (2b) The discussion would benefit from commenting on how power changes across the sweep simulation scenarios. Adding a summary figure to visualise the effects of sweep type, selection strength, and frequency on detectability could further clarify the power constraints. Stating the proportion of sweeps likely missed strengthens the argument that sharing adaptive alleles is likely even more common than inferred. Discussing power will also motivate the need for developing methods with improved abilities to uncover incomplete and soft sweeps.

      While these are useful suggestions (2a and 2b), the aim of this paper is at its core empirical, and it was not intended to give an exhaustive analysis of the power to detect sweeps. We report what parts of the analysis may be impacted by low power and what aspects of our inferences have higher uncertainty due to power. We agree that there is more work to be done to improve methods to detect selection given our findings (see also our response above concerning our efforts to use machine learning). While we do not highlight this in the paper, we also note that ours is one of very few empirical studies that actually perform power analyses on real data (as opposed to simulations). We think this extra transparency is by itself of substantial utility to the community in demonstrating that the results from simulation studies performed in publications describing a method do not necessarily translate well to empirical data.

      (3) Improve clarity in describing f4 test results. Consider visualising results on a map to show spatial patterns.

      We have expanded the discussion concerning f4 tests (see several responses to Reviewer 1). We are not clear on how to effectively visualize f4 spatially, but hope the updates have made the results clearer.

      Minor:

      -  Increase the font size of figure axis labels for improved readability.

      We have looked over the figures and increased font sizes where possible.

      -  Add units to selection coefficient axis labels in Figure 5.

      Selection coefficients are derived in Lee and Coop (2017) from classical population genetics theory. They do not have units, but denote the relative fitness advantage of the heterozygous genotype carrying the beneficial mutation of interest.

      -  Fix the typo 'cophenetic' in Figure S3 caption.

      Fixed. Thank you.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This study provides convincing evidence on the infraslow oscillation of DG cells during NREM sleep, and how serotonergic innervation modulates hippocampal activity pattern during sleep and memory.

      Strengths and Weaknesses:

      The authors used state-of-the-art techniques to carry out these experiments. Given that the functional role of infraslow rhythm still remains to be studied, this study provides convincing evidence of the role of DG cells in regulating infraslow rhythm, sleep microarchitecture, and memory.

      I have a few minor comments.

      (1) Decreased infraslow rhythm during NREMs in the 5ht1a KO mice is striking. It would be helpful to know whether sleep-wake states, MAs, and transitions to REMs are changed.

      We agree with the reviewer that serotonin receptors may be involved in sleep regulation, and it is therefore important to analyze the effect of their manipulation. We would also like to point out that in this case we restricted the 5ht1a manipulation to the hippocampus, which does not have a known impact on sleep-wake regulation. The analysis of our recorded dataset from these mice confirmed this notion: we did not see any changes in sleep metrics (see supplementary figure 6A).

      (2) It would be interesting to discuss whether the magnitude in changes of infraslow rhythm strength is correlated with memory performance (Figure 6).

      We agree with the reviewer that this could be an interesting point. In our experiments we wanted to minimize the impact of the surgical procedures on behavior, so we used separate cohorts for the photometry recordings and for the behavioral experiments. We are therefore unable to correlate behavior and infraslow oscillatory amplitudes in our dataset.

      However, a similar experiment was carried out in a recent paper in which the authors discovered that the norepinephrine system also displays infraslow oscillatory cycles during NREM sleep (Kjaerby et al., 2022). The authors of that paper gradually decreased the magnitude of the NE pulses during NREM by optogenetic manipulation of the locus coeruleus, which led to a fragmented sleep phenotype characterized by increased microarousal occurrence, decreased REM and reduced spindle activity. They also tested the memory performance of the mice in a novel object recognition task and found diminished performance in the opto group. Serotonin has multiple roles in the brain, many of which overlap with proposed functions of the noradrenergic system, including regulation of plasticity and signaling of rewarding or fearful stimuli. Therefore, we speculate that modifying serotonin dynamics during sleep will most likely interfere with memory performance.

      We inserted this paragraph in the discussion part of our paper.

      (3) The authors should cite the Oikonomou Neuron paper that describes slow oscillatory activity of DRN SERT neurons during NREM sleep.

      Thank you for the suggestion. We have inserted this paper in the manuscript.

      (4) The authors should clarify how they define the phasic pattern of the photometry signal.

      We have added the details in the Methods.

      Reviewer #2 (Public review):

      Summary:

      The authors investigated DG neuronal activity at the population and single-cell level across sleep/wake periods. They found an infraslow oscillation (0.01-0.03 Hz) in both granule cells (GC) and mossy cells (MC) during NREM sleep.

      The important findings are:

      (1) The antiparallel temporal dynamics of DG neuron activities and serotonin neuron activities/extracellular serotonin levels during NREM sleep, and

      (2) The GC Htr1a-mediated GC infraslow oscillation.

      Strengths:

      (1) The combination of polysomnography, Ca-fiber photometry, two-photon microscopy, and gene depletion is technically sound. The coincidence of microarousals and dips in DG population activity is convincing. The dip in activity in upregulated cells is responsible for the dip at the population level.

      (2) DG GCs express excitatory Htr4 and Htr7 in addition to inhibitory Htr1a, but deletion of Htr1a is sufficient to disrupt DG GC infraslow oscillation, supporting the importance of Htr1a in DG activity during NREM sleep.

      Weaknesses:

      (1) The current data set and analysis are insufficient to interpret the observation correctly.

      a. In Figure 1A, during NREM, the peaks and troughs of GC population activities seem to gradually decrease over time. Please address this point.

      Thank you for the suggestion. We have analyzed and compared the magnitude of the oscillatory signals in the first and last minute of the NREM sleep epochs in Dock10-Cre mice and found no significant difference. However, we did observe that the ISO amplitude is smaller in the early stage of the first NREM epochs, defined as those preceded by more than 5 minutes of wakefulness (new supplementary figure 1).

      b. In Figure 1F, about 30% of Ca dips coincided with MA (EMG increase) and 60% of Ca dips did not coincide with EMG increase. If this is true, the readers can find 8 Ca dips which are not associated with MAs from Figure 1E. If MAs were clustered, please describe this properly.

      We did not find evidence that MAs were clustered in our dataset (see a representative example in supplementary figure 1A). We replaced the example trace with a new one which shows calcium dips with and without MAs. We believe this new trace better represents the data.

      c. In Figure 1F, the legend stated the percentage during NREM. If the authors want to include the percentage of wake and REM, please show the traces with Ca dips during wake and REM. This concern applies to all pie charts provided by the authors.

      Figure 1F (and all other pie charts) shows the brain state that follows a calcium-dip episode. That is, Ca-dips during NREM were followed by MAs in 30% of the cases, and 59% of the Ca-dips led to the maintenance of NREM (no MAs), while in 2% and 9% of the cases we detected either a transition to REM or awakening of the animal. These numbers correspond very well with a similar analysis in a recent paper on the infraslow oscillatory behavior of the norepinephrine system during NREM sleep (Kjaerby et al., 2022). We apologize if the wording in the manuscript was misleading; we have modified the figure legends to clarify what the pie charts represent.
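
      The quantification behind such a pie chart amounts to labelling the brain state that follows each calcium dip and tallying the labels. A minimal sketch, assuming each dip has already been assigned an outcome; the label names and the example list are hypothetical, constructed only to reproduce the percentages quoted above (with 2% read as REM and 9% as wake, which is an assumption about the order in that sentence):

```python
from collections import Counter


def outcome_percentages(outcomes):
    """Return the percentage of calcium dips followed by each brain-state outcome."""
    counts = Counter(outcomes)
    total = sum(counts.values())
    return {state: 100.0 * n / total for state, n in counts.items()}


# Hypothetical labels for 100 dips, one label per dip (not the real dataset).
example_outcomes = ["MA"] * 30 + ["NREM"] * 59 + ["REM"] * 2 + ["wake"] * 9
percentages = outcome_percentages(example_outcomes)

for state, pct in percentages.items():
    print(f"{state}: {pct:.0f}%")
```

      Because every dip receives exactly one outcome label, the four categories are mutually exclusive and the percentages sum to 100.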

      d. In Figure 1C, please provide line plots connecting the same session. This request applies to all related figures.

      We have replaced the dot plots in all related figures with line plots.

      e. In Figure 2C, the significant increase during REM and the same level during NREM are not convincing. In Figure 2A, the several EMG increasing bouts do not appear to be MA, but rather wakefulness, because the duration of the EMG increase is greater than 15 seconds. Therefore, it is possible that the wake bouts were mixed with NREM bouts, leading to the decrease of Ca activity during NREM. In fact, In Figure 2E, the 4th MA bout seems to be the wake bout because the EMG increase lasts more than 15 seconds.

      We have replaced Figure 2C with line plots as suggested above. It is clear that MC activity during REM sleep is higher than during NREM sleep, whereas the overall difference between wake and NREM is not significant (some increased, some decreased). Regarding the MAs, we have added a trace of averaged EMG signals in Figure 2G, showing that the averaged EMG bursts during MAs are shorter than 5 seconds.

      f. Figure 5D REM data are interesting because the DRN activity is stably silenced during REM. The varied correlation means the varied DG activity during REM. The authors need to address it.

      We thank the reviewer for this suggestion. We have added this point to the discussion. We speculate that inputs from the supramammillary nucleus or entorhinal cortex to the DG during REM sleep may both contribute to this variability.

      g. In Figure 6, the authors should show the impact of DG Htr1a knockdown on sleep/wake structure including the frequency of MAs. I agree with the impact of Htr1a on DG ISO, but possible changes in sleep bout may induce the DG ISO disturbance.

      As suggested, we have performed sleep analysis in the Htr1a knockdown experiments, including MA quantification. We found no significant difference between Htr1a-knockdown and control mice in any of the sleep metrics (see supplemental figure 6). Our interpretation is that the lack of changes in sleep/wake cycles is likely due to the hippocampus not being directly involved in regulating these brain states.

      (2) It is acceptable that DG Htr1a KO induces the reduced freezing in the CFC test (Figure 6E, F), but it is too much of a stretch that the disruption of DG ISO causes impaired fear memory. There should be a correlation.

      We have modified the discussion accordingly.

      (3) It is necessary to describe the extent of AAV-Cre infection. The authors injected AAV into the dorsal DG (AP -1.9 mm), but the histology shows the ventral DG (Supplementary Figure 4), which reduces the reliability of this study.

      The histology image shown in the manuscript was taken from the -2.5 mm anteroposterior level, which we still consider to be part of the dorsal DG. For additional clarity, we have replaced the figure with new histology images from a slightly more anterior position (AP ~2.0 mm).

      Reviewer #3 (Public review):

      Summary:

      The authors employ a series of well-conceived and well-executed experiments involving photometric imaging of the dentate gyrus and raphe nucleus, as well as cell-type specific genetic manipulations of serotonergic receptors, that together serve to directly implicate serotonergic regulation of dentate gyrus (DG) granule (GC) and mossy cell (MC) activity in association with an infraslow oscillation (ISO) of neural activity that has been previously linked to general cortical regulation during NREM sleep and microarousals.

      Strengths:

      There are a number of novel and important results, including the modulation of dentate granule cell activity by the infraslow oscillation during NREM sleep, the selective association of different subpopulations of granule cells to microarousals (MA), and the anticorrelation of raphe activity with infraslow dentate activity.

      The discussion includes a general survey of ISOs and recent work relating to their expression in other brain areas and other potential neuromodulatory system involvement, as well as possible connections with infraslow oscillations, micro-arousals, and sensory sensitivity.

      Weaknesses:

      (1) The behavioral results showing contextual memory impairment resulting from 5-HT1a knockdown are fine but are over-interpreted. The term memory consolidation is used several times, as well as references to sleep-dependence. This is not what was tested. The receptor was knocked down, and then 2 weeks later animals were found to have fear conditioning deficits. They can certainly describe this result as indicating a connection between 5-HT1a receptor function and memory performance, but the connection to sleep and consolidation would just be speculation. The fact that 5-HT1a knockdown also impacted DG ISOs does not establish dependency. Some examples of this are:

      a. The final conclusion asserts "Together, our study highlights the role of neuromodulation in organizing neuronal activity during sleep and sleep-dependent brain functions, such as memory.". However, the reported memory effects (impairment of fear conditioning) were not shown to be explicitly sleep-dependent.

      We thank the reviewer for this comment. We have revised the sentence.

      b. Earlier in the discussion it mentions "Finally, we showed that local genetic ablation of 5-HT1a receptors in GCs impaired the ISO and memory consolidation". The effect shown was on general memory performance - consolidation was not specifically implicated.

      We have revised the sentence.

      (2) The assertion on page 9 that the results demonstrate "that the 5-HT is directly acting in the DG to gate the oscillations" is a bit strong given the magnitude of effect shown in Figure 6D, and the absence of demonstration of negative effect on cortical areas that also show ISO activity and could impact DG activity (see requested cortical sigma power analysis).

      We have revised the sentence.

      (3) Recent work has shown that abnormal DG GC activity can result from the use of the specific Ca indicator being used (GCaMP6s). (Teng, S., Wang, W., Wen, J.J.J. et al. Expression of GCaMP6s in the dentate gyrus induces tonic-clonic seizures. Sci Rep 14, 8104 (2024). https://doi.org/10.1038/s41598-024-58819-9). The authors of that study found that the effect seemed to be specific to GCaMP6s and that GCaMP6f did not lead to abnormal excitability. Note this is of particular concern given similar infraslow variation of cortical excitability in epilepsy (cf Vanhatalo et al. PNAS 2004). While I don't think that the experiments need to be repeated with a different indicator to address this concern, you should be able to use the 2p GCaMP7 experiments that have already been done to provide additional validation by repeating the analyses done for the GCaMP6s photometry experiments. This should be done anyway to allow appropriate comparison of the 2p and photometry results.

      We would like to thank the reviewer for this comment. We also analyzed the two-photon data in the same manner as the photometry data. However, the only supportive evidence in the two-photon data, recorded at the somatic level, that might be related to the ISO was decreased fluorescence during MAs in the NREM-upregulated cell group (see Figure 3D, E). We are unsure why this discrepancy exists, but we have discussed it in the manuscript and offered some alternative explanations. One hypothesis we are currently exploring relates to the different subcellular compartments sampled by the two imaging techniques. The photometry probe was implanted above the dentate gyrus, and since light collection efficiency declines sharply with distance from the probe tip (Pisano et al., 2019), we hypothesize that the ISO is stronger at the dendritic level, which directly receives the inputs from the entorhinal cortex and which is closest to the probe's tip. We are now conducting multiplane two-photon imaging experiments in our labs to test this hypothesis.

      (4) While the discussion mentions previous work that has linked ISOs during sleep with regulation of cortical oscillations in the sigma band, oddly no such analysis is performed in the current work even though it is presumably available and would be highly relevant to the interpretation of a number of primary results including the relationship between the ISOs and MAs observed in the DG and similar results reported in other areas, as well as the selective impact of DG 5-HT1a knockdown on DG ISOs. For example, in the initial results describing the cross-correlation of calcium activity and EMG/EEG with MA episodes (paragraph 1, page 4), similar results relating brief arousals to the infraslow fluctuation in sleep spindles (sigma band) have been reported also at .02 Hz associated with variation in sensory arousability (cf. Cardis et al., "Cortico-autonomic local arousals and heightened somatosensory arousability during NREMS of mice in neuropathic pain", eLife 2021). It would be important to know whether the current results show similar cortical sigma band correlations. Also, in the results on ISO attenuation following 5-HT1 knockdown on page 7 (Figure 6), how is cortical EEG affected? Is ISO still seen in EEG but attenuated in DG?

      Thank you for this valuable comment. We performed the analysis and found a positive correlation between cortical sigma band activity and DG activity during NREM sleep (see supplementary figure 1C-1E). Additionally, we conducted further analyses using the local 5-HT1a KO mouse model but did not observe significant changes in sleep architecture or MA frequency (see supplementary figure 6A). It is also important to note that ISO was only analyzed using calcium signals, not EEG signals. The standard filtering settings in our EEG data collection (0.5-500 Hz) do not allow us to analyze signals in such a low-frequency range.
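
      The filtering limitation can be made explicit with a simple check. The sketch below takes the infraslow band as 0.01-0.03 Hz (the range given in the Reviewer #2 summary) and treats the 0.5-500 Hz acquisition filter as an ideal band-pass, which is a simplification. The function name is hypothetical, and the 10-15 Hz sigma range is an assumed, commonly used value that is not stated in this response.

```python
def band_within_passband(band_hz, passband_hz):
    """Return True if the whole frequency band lies inside the filter passband."""
    band_low, band_high = band_hz
    pass_low, pass_high = passband_hz
    return pass_low <= band_low and band_high <= pass_high


EEG_PASSBAND_HZ = (0.5, 500.0)  # acquisition filter settings quoted above
ISO_BAND_HZ = (0.01, 0.03)      # infraslow band from the Reviewer #2 summary
SIGMA_BAND_HZ = (10.0, 15.0)    # assumed sigma range, not given in the response

iso_recoverable = band_within_passband(ISO_BAND_HZ, EEG_PASSBAND_HZ)
sigma_recoverable = band_within_passband(SIGMA_BAND_HZ, EEG_PASSBAND_HZ)

print("ISO band inside EEG passband:", iso_recoverable)
print("sigma band inside EEG passband:", sigma_recoverable)
```

      Under these assumptions the sigma band passes the acquisition filter while the infraslow band falls below the 0.5 Hz cutoff, consistent with the ISO being analyzed only in the calcium signal.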

      (5) The illustrations of the effect of 5-HT1a knockdown shown in Figure 6 are somewhat misleading. The examples in panels B and C show an effect that is much more dramatic than the overall effect shown in panel D. Panels B and C do not appear to be representative examples. Which of the sample points in panel D are illustrated in panels B and C? It is not appropriate to arbitrarily select two points from different animals for comparison, or worse, to take points from the extremes of the distributions. If the intent is to illustrate what the effect shown in D looks like in the raw data, then you need to select examples that reflect the means shown in panel D. It is also important to show the effect on cortical EEG, particularly in sigma band to see if the effects are restricted to the DG ISOs. It would also be helpful to show that MAs and their correlations as shown in Figure 1 or G as well as broader sleep architecture are not affected.

      We agree with the reviewer that the chosen example may appear somewhat exaggerated. However, we must point out that visually assessing missing or downregulated frequency components can be challenging. To provide a more objective presentation, we included Supplementary Figure 6B-C, in which we performed an analysis similar to that in Fig. 1G in 5-HT1a KO mice. These figures show a significant decrease in ISO amplitude, though the blockade is not complete, owing to the incomplete nature of genetic manipulation with viral injection (see Suppl Fig 5). Furthermore, recent studies (Dong et al., 2023; Zhang et al., 2024; Kjaerby et al., 2022) have identified several other neuromodulatory and peptidergic systems that might affect DG activity during MAs.

      To explore this further, we conducted pharmacological experiments. We administered 8-hydroxy-DPAT, a 5-HT1a agonist (1 mg/kg, i.p.), in Dock10-Cre mice injected with AAV-FLEX-GCaMP6s in the DG. Since 5-HT1a receptors act as autoreceptors on raphe 5-HT neurons, this treatment effectively silences the serotonergic system, thereby “removing” 5-HT signaling from the brain. The results, shown in Author response image 1, indicate that pharmacological suppression of 5-HT dampens the ISO in the DG during subsequent sleep intervals, with the ISO recovering after the drug is washed out. These findings are consistent with the results obtained with the more specific local genetic manipulation. We have not included this result in the manuscript because we believe that the local downregulation is a cleaner experiment whose interpretation is more straightforward.

      Author response image 1.

      Finally, we also performed sleep analysis in 5-HT1a KO mice, showing that the local downregulation of 5-HT1a receptors had no significant effect on sleep metrics (Suppl Fig 6A). The hippocampus is not typically involved in regulating sleep-wake cycles, so we believe this result is consistent with that understanding.

      (6) On page 9 of the results it states that GCs and MCs are upregulated during NREM and their activity is abruptly terminated by MAs through a 5-HT mediated mechanism. I didn't see anything showing the 5-HT dependence of the MA activity correlation. The results indicate a reduction in ISO modulation of GC activity but not the MA-correlated activity. I would like to see the equivalent of Figure 1,2 G panels with the 5-HT1a manipulation.

      We agree with the reviewer on this point. We did not conduct any pharmacological or genetic manipulation in 2-photon calcium imaging experiments. We have removed that statement. As for the suggested analysis, please see our explanation above (Suppl Fig 6B-C).

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      (1) Since the authors did not monitor DG neuronal activity with an electrophysiological tool, please rephrase the following sentence: "In this study, we investigated the neuronal activity of the dentate gyrus (DG) with electrophysiological and optical imaging tools during sleep-wake cycles." in the Abstract.

      We have rephrased the sentence as suggested.

      (2) Since the authors did not manipulate the serotonin release during sleep to investigate whether serotonin release modulates DG ISO, please edit the following sentence: "Further experiments revealed that the infraslow oscillation in the DG is modulated by rhythmic serotonin release during sleep" in the Abstract.

      We have rephrased the sentence as suggested.

      (3) Single-cell recording in DG with two-photon microscopy may address the issue raised in the 4th paragraph of the Discussion. In addition, in Fig 6C, the photometry has only captured the diminished oscillation in Htr1a KO, but cannot distinguish whether the activity levels of GC remain at high or low, which is a clear disadvantage of photometry.

      We agree with the reviewer, and have added text to the discussion.

      Reviewer #3 (Recommendations for the authors):

      (1) Some of the figures are missing labels in the spectrogram panels (e.g. no freq units in Figures 4 and 6).

      We have added information in those figures.

      (2) Missing specific locations for EEG electrodes/screws. The text states "we predrilled 2 holes on the right side of the skull (1.5 mm posterior of the Bregma) for implanting recording electrodes". 2 holes on the right side of the skull are pretty vague.

      We have added this information in the Methods.

      (3) Some additional work that could be cited particularly when discussing the serotonergic impact on hippocampal function as it might relate to sleep and memory would include work linking mesopontine activity (both serotonergic and non-serotonergic) to memory-associated hippocampal sharp-wave ripple activity (e.g. Jelitai et al. Front. Neural Circ. 2021, Wang et al Nat. Neuro. 2015).

      We have cited these papers.

      (4) The work cited at the beginning of the Results describing higher population calcium activity during sleep states (15,18,30) is generally appropriate but not explicitly related to GCaMP imaging. Pilz et al. "Functional Imaging of Dentate Granule Cells in the Adult Mouse Hippocampus", J.Neurosci. 2016 might be a more relevant citation.

      We have added the citation.

    1. Author response:

      The following is the authors’ response to the original reviews.

      We thank the three reviewers for their positive comments and useful suggestions. We have implemented most of the reviewers’ recommendations and hope the manuscript is clearer now.

      The main modifications are:

      - A revision of the introduction to better explain what Transitional Probabilities are and clarify the rationale of the experimental design

      - A revision of the discussion

      - To tone down and better explain the interpretation of the different responses between duplets after a stream with phonetic or voice regularities (possibly an N400).

      - To better clarify the framing of statistical learning as a universal learning mechanism that might share computational principles across features (or domains).

      Below, we provide detailed answers to each reviewer's point.

      Response to Reviewer 1:

      There are no significant weaknesses to signal in the manuscript. However, in order to fully conclude that there is no obvious advantage for the linguistic dimension in neonates, it would have been most useful to test a third condition in which the two dimensions were pitted against each other, that is, in which they provide conflicting information as to the boundaries of the words comprised in the artificial language.

      This last condition would have allowed us to determine whether statistical learning weighs linguistic and non-linguistic features equally, or whether phonetic content is preferentially processed.

      We appreciate the reviewer's suggestion that a stream with conflicting information would provide valuable insights. In the present study, we started with a simpler case involving two orthogonal features (i.e., phonemes and voices), with one feature being informative and the other uninformative, and we found similar learning capacities for both. Future work should explore whether infants—and humans more broadly—can simultaneously track regularities in multiple speech features. However, creating a stream with two conflicting statistical structures is challenging. To use neural entrainment, the two features must lead to segmentation at different chunk sizes so that their effects lead to changes in power/PLV at different frequencies—for instance, using duplets for the voice dimension and triplets for the linguistic dimension (or vice versa). Consequently, the two dimensions would not be directly comparable within the same participant in terms of the number of distinguishable syllables/voices, memory demand, or SNR, given the 1/f decrease in amplitude of background EEG activity. This would require comparisons between two distinct groups, counterbalancing chunk size and the linguistic versus non-linguistic dimension. In the test phase, words for one dimension would be part-words for the other dimension. As we are measuring differences and not preferences, interpreting the results would also be difficult. Additionally, it may be difficult to find a sufficient number of clearly discriminable voices for such a design (triplets imply 12 voices). Therefore, an entirely different experimental paradigm would need to be developed.

      If such a design were tested, one possibility is that the regularities for the two dimensions are calculated in parallel, in line with the idea that the calculation of statistical regularities is a ubiquitous implicit mechanism (see Benjamin et al., 2024, for a proposed neural mechanism). Yet, similar to our present study, possibly only phonetic features would be used as word candidates. Another possibility is that only one informative feature would be explicitly processed at a time due to the serial nature of perceptual awareness, which may prioritise one feature over the other.

      We added one sentence in the discussion stating that more research is needed to understand whether infants can track both regularities simultaneously (p.13, l.270 “Future work could explore whether they can simultaneously track multiple regularities.”).

      Note: The reviewer’s summary contains a typo: the syllabic rate is 4 Hz (not 2 Hz), and the word rate is 2 Hz (not 4 Hz).
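
      The rates in this note follow from dividing the syllabic rate by the number of syllables per chunk, which is also why duplets and triplets would be tagged at different frequencies. The lines below are only a minimal sketch, assuming the 4 Hz syllabic rate stated above; the function name is ours and is not taken from the manuscript.

```python
from fractions import Fraction


def chunk_rate_hz(syllable_rate_hz, syllables_per_chunk):
    """Rate at which chunk boundaries recur for a given syllabic rate (hypothetical helper)."""
    return Fraction(syllable_rate_hz) / syllables_per_chunk


# Assumption: syllables are presented at 4 Hz, as stated in the note above.
duplet_rate = chunk_rate_hz(4, 2)   # duplets: 2 Hz, the word rate of the present study
triplet_rate = chunk_rate_hz(4, 3)  # triplets: 4/3 Hz, i.e. about 1.33 Hz

print(duplet_rate, triplet_rate)
```

      Under these assumptions, a duplet structure and a triplet structure carried by the same stream would be expected at 2 Hz and at about 1.33 Hz respectively.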

      Response to Reviewer 2:

      N400: I am skeptical regarding the interpretation of the phoneme-specific ERP effect as a precursor of the N400 and would suggest toning it down. While the authors are correct in that infant ERP components are typically slower and more posterior compared to adult components, and the observed pattern is hence consistent with an adult N400, at the same time, it could also be a lot of other things. On a functional level, I can't follow the author's argument as to why a violation in phoneme regularity should elicit an N400, since there is no evidence for any semantic processing involved. In sum, I think there is just not enough evidence from the present paradigm to confidently call it an N400.

      The reviewer is correct that we cannot definitively determine the type of processing reflected by the ERP component that appears when neonates hear a duplet after exposure to a stream with phonetic regularities. We interpreted this component as a precursor to the N400, based on prior findings in speech segmentation tasks without semantic content, where a ~400 ms component emerged when adult participants recognised pseudowords (Sander et al., 2002) or during structured streams of syllables (Cunillera et al., 2006, 2009). Additionally, the component we observed had a similar topography and timing to those labelled as N400 in infant studies, where semantic processing was involved (Parise et al., 2010; Friedrich & Friederici, 2011).

      Given our experimental design, the difference we observed must be related to the type of regularity during familiarisation (either phonemes or voices). Thus, we interpreted this component as reflecting lexical search, a process which could be triggered by a linguistic structure but which would not be relevant to a non-linguistic regularity such as voices. However, we are open to alternative interpretations. In any case, this difference between the two streams reveals that computing regularities based on phonemes versus voices does not lead to the same processes.

      We revised the abstract (p.2, l.33) and the discussion of this result (p.15, l.299), toning them down. We hope the rationale of the interpretation is clearer now, as is the fact that it is just one possible interpretation of the results.

      Female and male voices: Why did the authors choose to include male and female voices? While using both female and male stimuli of course leads to a higher generalizability, it also introduces a second dimension for one feature that is not present for this other (i.e., phoneme for Experiment 1 and voice identity plus gender for Experiment 2). Hence, couldn't it also be that the infants extracted the regularity with which one gender voice followed the other? For instance, in List B, in the words, one gender is always followed by the other (M-F or F-M), while in 2/3 of the part-words, the gender is repeated (F-F and M-M). Wouldn't you expect the same pattern of results if infants learned regularities based on gender rather than identity?

      We used three female and three male voices to maximise acoustic variability. The streams were synthesised using MBROLA, which provides a limited set of artificial voices. Indeed, there were not enough French voices of acceptable quality, so we also used two Italian voices (the phonemes used existed in both Italian and French).

      Voices differ in timbre, and female voices tend to be higher pitched. However, it is sometimes difficult to categorise low-pitched female voices and high-pitched male voices. Given that gender may be an important factor in infants' speech perception (newborns, for instance, prefer female voices at birth), we conducted tests to assess whether this dimension could have influenced our results.

      We report these analyses in the SI and refer to them in the methods section (p.25, l.468 “We performed post-hoc tests to ensure that the results were not driven by a perception of two voices: female and male (see SI).”).

      We first quantified the transitional probability matrices during the structured stream of Experiment 2, considering only two types of voices: Female and Male.

      For List A, all transition probabilities are equal to 0.5 (P(M|F), P(F|M), P(M|M), P(F|F)), resulting in flat TPs throughout the stream (see Author response image 1, top). Therefore, we would not expect neural entrainment at the word rate (2 Hz), nor would we anticipate ERP differences between the presented duplets in the test phase.

      For List B, P(M|F)=P(F|M)=0.66 while P(M|M)=P(F|F)=0.33. However, this does not produce a regular pattern of TP drops throughout the stream (see Author response image 1, bottom). As a result, strong neural entrainment at 2 Hz was unlikely, although some degree of entrainment might have occasionally occurred due to some drops occurring at a 2 Hz frequency. Regarding the test phase, all three Words and only one Part-word presented alternating patterns (TP=0.6). Therefore, the difference in the ERPs between Words and Part-words in List B might be attributed to gender alternation.

      However, it seems unlikely that gender alternation alone explains the entire pattern of results, as the effect is inconsistent and appears in only one of the lists. To rule out this possibility, we analysed the effects in each list separately.

      Author response image 1.

      Transition probabilities (TPs) across the structured stream in Experiment 2, considering voices processed by gender (Female or Male). Top: List A. Bottom: List B.
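
      The gender-level TPs quoted above can be illustrated with a small simulation. This is only a sketch under stated assumptions: the gender composition of the three duplets in each list is hypothetical (chosen so that all Words alternate gender in the List B-like case), duplets are assumed to be concatenated at random without immediate repetition, and the function and variable names are ours rather than the code used for the analyses.

```python
import random
from collections import Counter


def gender_tps(duplets, n_duplets=30000, seed=0):
    """Estimate P(next gender | current gender) in a random stream of duplets.

    `duplets` is a list of (gender, gender) pairs, one per duplet.
    Assumption: the same duplet is never presented twice in a row.
    """
    rng = random.Random(seed)
    stream, prev = [], None
    for _ in range(n_duplets):
        # Draw the next duplet uniformly among those that differ from the previous one.
        prev = rng.choice([i for i in range(len(duplets)) if i != prev])
        stream.extend(duplets[prev])
    pair_counts = Counter(zip(stream, stream[1:]))
    first_counts = Counter(stream[:-1])
    # Key (a, b) holds the estimate of P(b | a).
    return {(a, b): pair_counts[(a, b)] / first_counts[a]
            for a in "FM" for b in "FM"}


# Hypothetical gender assignments, not the actual voice lists of the study.
list_a_like = [("F", "F"), ("M", "M"), ("F", "M")]
list_b_like = [("F", "M"), ("M", "F"), ("F", "M")]

tps_a = gender_tps(list_a_like)
tps_b = gender_tps(list_b_like)
print({k: round(v, 2) for k, v in tps_a.items()})
print({k: round(v, 2) for k, v in tps_b.items()})
```

      With these hypothetical assignments, the List A-like stream yields gender TPs close to 0.5 throughout, whereas the List B-like stream yields alternation TPs close to 0.66 and repetition TPs close to 0.33.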

      We computed the mean activation within the time windows and electrodes of interest and compared the effects of word type and list using a two-way ANOVA. For the difference between Words and Part-words over the positive cluster, we observed a main effect of word type (F(1,31) = 5.902, p = 0.021), with no effects of list or interactions (p > 0.1). Over the negative cluster, we again observed a main effect of word type (F(1,31) = 10.916, p = 0.0016), with no effects of list or interactions (p > 0.1). See Author response image 2.

      Author response image 2:

      Difference in ERP voltage (Words – Part-words) for the two lists (A and B); W=Words; P=Part-Words.

      We conducted a similar analysis for neural entrainment during the structured stream on voices. A comparison of entrainment at 2 Hz between participants who completed List A and List B showed no significant differences (t(30) = -0.27, p = 0.79). A test against zero for each list indicated significant entrainment in both cases (List A: t(17) = 4.44, p = 0.00036; List B: t(13) = 3.16, p = 0.0075). See Author response image 3.

      Author response image 3.

      Neural entrainment at 2 Hz during the structured stream of Experiment 2 for Lists A and B.

      Words entrainment over occipital electrodes: Do you have any idea why the duplet entrainment effect occurs over the electrodes it does, in particular over the occipital electrodes (which seems a bit unintuitive given that this is a purely auditory experiment with sleeping neonates).

      Neural entrainment might be considered as a succession of evoked responses induced by the stream. After applying an average reference in high-density EEG recordings, the auditory ERP in neonates typically consists of a central positivity and a posterior negativity with a source located at the electrical zero in a single-dipole model, i.e. approximately in the superior temporal region (Dehaene-Lambertz & Dehaene, 1994). In adults, because of the average reference (i.e. the sum of voltages is equal to zero at each time point) and because the electrodes cannot capture the negative pole of the auditory response, the negativity is distributed around the head. In infants, however, the brain is higher within the skull, allowing for a more accurate recording of the negative pole of the auditory ERP (see Author response image 4 for the location of electrodes in an infant head model).

      Besides the posterior electrodes, we can see some entrainment on more anterior electrodes that probably corresponds to the positive pole of the auditory ERP.

      We added a phrase in the discussion to explain why we can expect phase-locked activity in posterior electrodes (p.14, l.277: “Auditory ERPs, after reference-averaged, typically consist of a central positivity and posterior negativity”).

      Author response image 4:

      International 10–20 sensors' location on the skull of an infant template, with the underlying 3-D reconstruction of the grey-white matter interface and projection of each electrode to the cortex. Computed across 16 infants (from Kabdebon et al, Neuroimage, 2014). The O1, O2, T5, and T6 electrodes project lower than in adults.

      Response to Reviewer 3:

      (1) While it's true that voice is not essential for language (i.e., sign languages are implemented over gestures; the use of voices to produce non-linguistic sounds, like laughter), it is a feature of spoken languages. Thus I'm not sure if we can really consider this study as a comparison between linguistic and non-linguistic dimensions. In turn, I'm not sure that these results show that statistical learning at birth operates on non-linguistic features, being voices a linguistic dimension at least in spoken languages. I'd like to hear the authors' opinions on this.

      On one hand, it has been shown that statistical learning (SL) operates across multiple modalities and domains in human adults and animals. On the other hand, SL is considered essential for infants to begin parsing speech. Therefore, we aimed to investigate whether SL capacities at birth are more effective on linguistic dimensions of speech, potentially as a way to promote language learning.

      We agree with the reviewer that voices play an important role in communication (e.g., for identifying who is speaking); however, they do not contribute to language structure or meaning, and listeners are expected to normalize across voices to accurately perceive phonemes and words. Thus, voices are speech features but not linguistic features. Additionally, in natural speech, there are no abrupt voice changes within a word as in our experiment; instead, voice changes typically occur on a longer timescale and involve only a limited number of voices, such as in a dialogue. Therefore, computing regularities based on voice changes would not be useful in real-life language learning. We considered that contrasting syllables and voices was an elegant way to test SL beyond its linguistic dimension, as the experimental paradigm is identical in both experiments.

      We have rephrased the introduction to make this point clearer. See p.5, l.88-92: “To test this, we have taken advantage of the fact that syllables convey two important pieces of information for humans: what is being said and who is speaking, i.e. linguistic content and speaker’s identity. While statistical learning…”.

      Along the same line, in the Discussion section, the present results are interpreted within a theoretical framework showing statistical learning in auditory non-linguistic (string of tones, music) and visual domains as well as visual and other animal species. I'm not sure if that theoretical framework is the right fit for the present results.

      (2) I'm not sure whether the fact that we see parallel and independent tracking of statistics in the two dimensions of speech at birth indicates that newborns would be able to do so in all the other dimensions of the speech. If so, what other dimensions are the authors referring to?

      The reviewer is correct that demonstrating the universality of SL requires testing additional modalities and acoustic dimensions. However, we postulate that SL is grounded in a basic mechanism of long-term associative learning, as proposed in Benjamin et al. (2024), which relies on a slow decay in the representation of a given event. This simple mechanism, capable of operating on any representational output, accounts for many types of sequence learning reported in the literature (Benjamin et al., in preparation).

      We have revised the discussion to clarify this theoretical framework.

      In p.13, l.264: “This mechanism might be rooted in associative learning processes relying on the co-existence of event representations driven by slow activation decays (Benjamin et al., 2024).”

      In p., l. 364: “Altogether, our results show that statistical learning works similarly on different speech features in human neonates with no clear advantage for computing linguistically relevant regularities in speech. This supports the idea that statistical learning is a general learning mechanism, probably operating on common computational principles across neural networks (Benjamin et al., 2024)…”.

      (3) Lines 341-345: Statistical learning is an evolutionary ancient learning mechanism but I do not think that the present results are showing it. This is a study on human neonates and adults, there are no other animal species involved therefore I do not see a connection with the evolutionary history of statistical learning. It would be much more interesting to make claims on the ontogeny (rather than phylogeny) of statistical learning, and what regularities newborns are able to detect right after birth. I believe that this is one of the strengths of this work.

      We did not intend to make claims about the phylogeny of SL. Since SL appears to be a learning mechanism shared across species, we use it as a framework to suggest that SL may arise from general operational principles applicable to diverse neural networks. Thus, while it is highly useful for language acquisition, it is not specific to it.

      We have removed the sentence “Statistical learning is an evolutionary ancient learning mechanism.” and replaced it with (p.18, l.364) “Altogether, our results show that statistical learning works similarly on different speech features in human neonates with no clear advantage for computing linguistically relevant regularities in speech.” We now emphasise in the discussion that infants compute regularities on both features and propose that SL might be a universal learning mechanism sharing computational principles (Benjamin et al., 2024) (see point 2).

      (4) The description of the stimuli in Lines 110-113 is a bit confusing. In Experiment 1, e.g., "pe" and "tu" are both uttered by the same voice, correct? ("random voice each time" is confusing). Whereas in Experiment 2, e.g., "pe" and "tu" are uttered by different voices, for example, "pe" by yellow voice and "tu" by red voice. If this is correct, then I recommend the authors to rephrase this section to make it more clear.

      To clarify, in Experiment 1, the voices were randomly assigned to each syllable, with the constraint that no voice was repeated consecutively. This means that syllables within the same word were spoken by different voices, and each syllable was heard with various voices throughout the stream. As a result, neonates had to retrieve the words based solely on syllabic patterns, without relying on consistent voice associations or specific voice relationships.

      In Experiment 2, the design was orthogonal: while the syllables were presented in a random order, the voices followed a structured pattern. Similar to Experiment 1, each syllable (e.g., “pe” and “tu”) was spoken by different voices. The key difference is that in Experiment 2, the structured regularities were applied to the voices rather than the syllables. In other words, the “green” voice was always followed by the “red” voice for example but uttered different syllables.

      We have revised the description of the stimuli and the legend of Figure 1 to clarify these important points.

      See p.6, l. 113: “The structure consisted of the random concatenation of three duplets (i.e., two-syllable units) defined only by one of the two dimensions. For example, in Experiment 1, one duplet could be petu with each syllable uttered by a random voice each time they appear in the stream (e.g. pe is produced by voice1 and tu by voice6 in one instance and in another instance pe is produced by voice3 and tu by voice2). In contrast, in Experiment 2, one duplet could be the combination [voice1- voice6], each uttering randomly any of the syllables.”

      p.20, l. 390 (Figure 1 legend): “For example, the two syllables of the word “petu” were produced by different voices, which randomly changed at each presentation of the word (e.g. “yellow” voice and “green” voice for the first instance, “blue” and “purple” voice for the second instance, etc.). In Experiment 2, the statistical structure was based on voices (TPs alternated between 1 and 0.5), while the syllables changed randomly (uniform TPs of 0.2). For example, the “green” voice was always followed by the “red” voice, but they were randomly saying different syllables “boda” in the first instance, “tupe” in the second instance, etc.”

      (5) Line 114: the sentence "they should compute a 36 x 36 TPs matrix relating each acoustic signal, with TPs alternating between 1/6 within words and 1/12 between words" is confusing as it seems like there are different acoustic signals. Can the authors clarify this point?

      Thank you for highlighting this point. To clarify, our suggestion is that neonates might not track regularities between phonemes and voices as separate features. Instead, they may treat each syllable-voice combination as a distinct item—for example, "pe" spoken by the "yellow" voice is one item, while "pe" spoken by the "red" voice is another. Under this scenario, there would be a total of 36 unique items (6 syllables × 6 voices), and infants would need to track regularities between these 36 combinations.

      We have modified this sentence in the manuscript to make it clearer.

      See p.7, l. 120: “If infants at birth compute regularities based on a neural representation of the syllable as a whole, i.e. comprising both phonetic and voice content, this would require computing a 36 × 36 TPs matrix relating each token.”
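
      The size of this token space, and the token-level TPs quoted by the reviewer in point (5), can be worked through as follows. This is only a sketch under stated assumptions: the last two syllable labels are placeholders, the syllable TPs are taken as 1 within words and 0.5 between words, and the voice of the next syllable is assumed to be drawn uniformly from all six voices (if immediate voice repetition is excluded, the voice factor would be 1/5 rather than 1/6).

```python
from fractions import Fraction
from itertools import product

# "pe", "tu", "bo", "da" appear in the text; "syl5" and "syl6" are placeholders.
syllables = ["pe", "tu", "bo", "da", "syl5", "syl6"]
voices = [f"voice{i}" for i in range(1, 7)]

# Each syllable-voice combination is treated as one distinct token.
tokens = list(product(syllables, voices))
n_matrix_cells = len(tokens) ** 2  # entries of the token-by-token TP matrix


def token_tp(syllable_tp, n_voice_choices):
    """TP between two tokens when the voice is drawn independently of the syllable."""
    return Fraction(syllable_tp) * Fraction(1, n_voice_choices)


within_word = token_tp(1, 6)                 # syllable TP of 1 inside a word
between_word = token_tp(Fraction(1, 2), 6)   # syllable TP of 0.5 across a word boundary

print(len(tokens), n_matrix_cells, within_word, between_word)
```

      Under these assumptions, the 36 tokens give a 36 × 36 matrix, with token-level TPs of 1/6 within words and 1/12 between words, which matches the values quoted from the original sentence.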

      Reviewer #1 (Recommendations for the authors):

      (1) The acronym TP should be spelled out, and a brief description of the fact that dips in TPs signal boundaries while high TPs signal a cohesive unit could be useful for non-specialist readers.

      We have added it at the beginning of the introduction (lines 52-60)

      (2) p.5, l.76: "Here, we aimed to further characterise the characteristics of this mechanism...". I suggest this is rephrased as "to further characterise this mechanism".

      We have changed it as suggested by the reviewer (now p.5, l.81)

      (3) p.9, l.172: "[...] this contribution is unlikely since the electrodes differ from the electrodes, showing enhanced word-rate activity at 2 Hz."

      It is unclear which electrodes differ from which electrodes. I figure that the authors mean that the electrodes showing stronger activity at 2 Hz differ from those showing it at 4 Hz, but the sentence could use rephrasing.

      This part has been rephrased (p.9, l.177-181)

      (4) p.10, l.182: "[...] the entrainment during the first minute of the structure stream [… ]".

      Structured stream.

      It has been corrected (p.10, l.190)

      (5) p.12, l.234: "we compared STATISTICAL LEARNING"

      Why the use of capitals?

      This was an error and it was corrected (p.12, l.242).

      (6) p.15, l.298: "[...] suggesting that such negativity might be related to semantic."

      The sentence feels incomplete. To semantics? To the processing of semantic information?

      The phrase has been corrected (p.15, l.314). Additionally, the discussion of the posterior negativity observed for duplets after familiarisation with a stream with regularities over phonemes has been rephrased (p.15, l.)

      (7) Same page, l.301: "3-mo-olds" 3-month-olds.

      It has been corrected (now in p.16, l.333)

      (8) Same page, l.307: "(see also (Bergelson and Aslin, 2017)" (see also Bergelson and Aslin, 2017).

      It has been corrected (now in p.17, l.340)

      (9) Same page, l.310: "[...] would be considered as possible candidate" As possible candidates.

      This has been rephrased and corrected (now in p.17, l.343)

      Reviewer #2 (Recommendations for the authors):

      (1) Figure 2: The authors mention a "thick orange line", which I think should be a "thick black line".

      We are sorry for this. It has been corrected.

      (2) Ln 166: Should be Figure 2C rather than 3C.

      It has been corrected (now in p.9, l.173)

      (3) Figure 4 is not referenced in the manuscript.

      We referred to it now on p. 12, l.236

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      The authors set out to define the molecular basis for LP as the origin of BRCA1-deficient breast cancers. They showed that LPs have the highest level of replicative stress, and hypothesise that this may account for their tendency to transform. They went on to identify ELF3 as a candidate driver of LP transformation and showed that ELF3 expression is up-regulated in response to replicative stress as well as BRCA1 deficiency. They went on to show that ELF3 inactivation led to a higher level of DNA damage, which may result from compromised replicative stress responses.

      While the manuscript supports the interesting idea wherein ELF3 may fuel LP cell transformation, it remains obscure how ELF3 promotes cell tolerance to DNA damage. Interestingly the authors proposed that ELF3 suppresses excessive genomic instability, but in my opinion, I do not see any evidence that supports this claim. In fact, one might think that genomic instability is key to cell transformation.

      We greatly appreciate your thorough review and insightful comments on our manuscript. We have taken your feedback seriously and have made several key revisions to address your concerns.

      To your primary point about how ELF3 helps cells tolerate DNA damage, we have expanded our discussion to clarify the role of ELF3 in the context of BRCA1 deficiency and high replicative stress. We clarified that while ELF3 may not directly suppress excessive genomic instability, it plays a role in maintaining a balance that prevents catastrophic damage in BRCA1-deficient cells. Both BRCA1 deficiency and increased replication stress induce up-regulation of ELF3, which acts as a transcription factor, and its up-regulation increases the expression of a variety of DNA replication-associated proteins that help to maintain homeostasis in the DNA replication process (Figure 5 E and F). Defects in ELF3 do lead to disruption of the DNA replication process (Figure 5 G-I). While ELF3 cannot completely eliminate genomic instability, it essentially maintains genomic instability within a dangerous yet non-lethal range: higher than in normal cells, but not so high as to cause cell death.

      This precarious balance can facilitate the transformation of LPs into a malignant state, as you pointed out.

      In the revised manuscript, we emphasized that in cells with inherently low replicative stress, such as other non-LP mammary cells, the ELF3-associated mechanism might help cells endure the high replicative stress caused by BRCA1 deficiency without leading to cancerous changes. However, in LP cells, which naturally experience higher replicative stress, this ELF3-related mechanism may make them more susceptible to transformation into cancer cells. This supports our hypothesis that the combination of high replicative stress and BRCA1 deficiency specifically predisposes LP cells to tumorigenesis.

      We have modified the working model to make it clearer.

      Reviewer #2 (Public Review):

      Summary:

      The manuscript focuses on a persistent question of why germline mutations in BRCA1 which impair homology-directed repair of DNA double-strand breaks predispose to primarily breast and ovarian cancers but not other tissues. The authors propose that replication stress is elevated in the luminal progenitor (LP) cells and apply the gene signature from Dreyer et al as a measure of replication stress in populations of cells selected by FACS previously (published by Lim et al.) and suggest an enrichment of replication stress among the LP cells. This is followed by single-cell RNA seq data from a small number of breast tissues from a small number of BRCA1 mutation carriers but the pathogenic variants are not listed. The authors perform an elegant analysis of the effects of BRCA1 knockdown in MCF10A cells, but these cells are not considered a model of LP cells.

      Overall, the manuscript suffers from significant gaps and leaps in logic among the datasets used. The connection to luminal progenitor cells is not adequately established because the models used are not representative of this population of cells. Therefore, the central hypothesis is not sufficiently justified.

      Strengths:

      The inducible knockdown of BRCA1 provided compelling data pointing to an upregulation of ELF3 in this setting as well as a small number of other genes. It would be useful to discuss the other genes for completeness and explain the logic for focusing on ELF3. Nonetheless, the connection with ELF3 is reasonable. The authors provide significant data showing a role for ELF3 in breast epithelial cells and its role in cell survival.

      Weaknesses:

      The initial observations in primary breast cells have small sample sizes. The mutations in BRCA1 seem to be presumed to be all the same, but we know that pathogenic variants differ among individuals and range from missense mutations affecting interactions with one critical partner to large-scale truncations of the protein.

      The figure legends are missing critical details that make it difficult for the reader to evaluate the data. The data support the notion that ELF3 may participate in relieving replication stress, but does not appear to be limited to LP cells as proposed in the hypothesis.

      We would like to sincerely thank you for your thorough review and constructive feedback on our manuscript. Your insightful comments and suggestions have been invaluable in guiding our revisions.

      (1) Acknowledgment of Data Set Limitations and Additional Analyses:    We fully acknowledge the importance of the concerns raised regarding the datasets used in our study. We have supplemented our manuscript with the missing information you pointed out and conducted additional analyses as suggested. These efforts have

      (2) Challenges in LP Cell Experiments:

      One of the most critical issues you raised was the lack of validation in LP cells, particularly concerning the role of ELF3 in these cells. We are acutely aware of the significance of this point. Following your review, we made extensive efforts to isolate and culture LP cells from both BRCA1-proficient and BRCA1-deficient patient samples. We tried various methods and invested substantial resources, including time, manpower, and materials, to establish a reliable protocol for isolating and cultivating LP cells in vitro. Unfortunately, despite our best efforts, we were unable to obtain a sufficient number of high-quality cells to generate solid and reproducible results.

      The challenges we faced included the limited availability of patient tissues and the technical difficulties in consistently obtaining viable LP cells. Given the already extended timeline for the revision of this manuscript, we regretfully decided to forgo further attempts to perform these critical experiments with LP cells. In the revised manuscript, we have explicitly addressed the limitations of our cell models and provided a detailed discussion of the challenges faced in isolating LP cells. Despite these limitations, we believe that the consistency between our results and LP cell sequencing data provides valuable insights and a solid foundation for future studies.

      (3) Data Presentation Improvements:

      In response to your feedback, we have also made significant improvements to the data presentation in our manuscript. We updated and optimized figure legends and narrative sections to ensure that the data are clearly and accurately conveyed. These changes aim to enhance the readability and comprehensibility of our findings.

      We greatly appreciate your valuable feedback, which has significantly contributed to the improvement of our manuscript. Your suggestions have helped us refine our arguments and present a more robust and nuanced interpretation of our data. 

      Thank you once again for your critical and constructive review. We look forward to your feedback on our revised manuscript.

      Recommendations for the authors:  

      Reviewer #1 (Recommendations For The Authors):  

      As such, in addition to consolidating the role of ELF3 in promoting cell tolerance to replicative stress (or in suppressing genomic instability), I have a few comments the authors should consider to improve their manuscript.  

      (1) I am not sure how cells have gained a growth advantage if they were arrested (Line 105-106). Perhaps the authors can elaborate.

      Thanks for pointing this out, and we are sorry for the misleading statement. We have revised the manuscript to clarify that “survival advantage” is more accurate than “growth advantage”. Long-term DOX treatment decreased cell survival, as indicated by the reduced number of colonies in Supplemental Fig. S1D, so many cells died during DOX treatment. Therefore, the cells that survived DOX treatment and were collected for sequencing may have gained a survival advantage over their counterparts that failed to survive.

      (2) Figure 3D - From Western blotting of ELF3, forced expression of E2F6 does not appear to "block" HU-induced ELF3 up-regulation, but merely down-regulate basal level of ELF3, with the effect of HU still notable.

      Thanks for the comment. We agree that E2F6 down-regulates the baseline expression level of ELF3 and does not fully block ELF3 up-regulation. After calculating the fold change after E2F6 overexpression, we confirmed that E2F6 overexpression still partially blocks HU-induced ELF3 up-regulation, with the fold change decreasing from 3.32 to 2.40, supporting our conclusion that HU-induced ELF3 up-regulation is regulated by the ATR-Chk1-E2F axis. However, it cannot be excluded that E2F6 also regulates ELF3 expression in other replication stress-independent ways, and we have revised the manuscript accordingly.

      (3) Figure 3J & K - In my opinion, if BRCA1 knockdown were more efficient it remains formally possible that co-depletion of BRCA1 and GATA3 may exhibit additive effects in up-regulating ELF3 mRNA level.

      Thank you for the comment. The BRCA1 knockdown efficiency for Figure 3J is shown in Supplemental Fig. S3B; notably, both BRCA1 and GATA3 knockdown were numerically more efficient in the double-knockdown group than in the respective single-knockdown groups. Thus, the higher ELF3 up-regulation in the double-knockdown group in Figure 3J could be caused by the superior knockdown efficiency of both BRCA1 and GATA3. Nonetheless, we agree that BRCA1 and GATA3 may still have separate functions in this experimental setting and that a marginal additive effect may exist, and we have revised the manuscript accordingly.

      (4) Figure 4 - Perhaps the authors can change its title to better summarise the findings. Cell sensitivity assays and xenograft experiments may not necessarily relate to genomic instability.

      Thank you for the great suggestion. To summarize the results more accurately, we have revised the title as “ELF3 can help cells tolerate replication stress and sustain cell survival”.

      (5) Figure 5B&C - It would be important to document the time-dependent resolution of HU-induced DNA lesions by including additional time-points before, during, and after HU treatment.

      We appreciate the suggestion to include additional time points to document the time-dependent resolution of HU-induced DNA lesions. In our experiments, we observed that ELF3 knockdown leads to genomic instability both in the presence and absence of HU treatment. Specifically, Figure 5A and Figure S5 demonstrate that ELF3 knockdown increases genomic instability without HU treatment, indicating its role in maintaining genomic stability under normal conditions. On the other hand, Figure 5B, 5C, and 5D show that ELF3 knockdown under HU-induced replication stress further exacerbates genomic instability. This observation aligns with our finding that ELF3 expression increases in response to replication stress, suggesting its critical role in maintaining replication homeostasis under such conditions.

      (6) Figure 5F&I - Which ELF3 siRNA was used in these experiments? Since the authors did not exclude off-target effects, it may be worthwhile to include both ELF3 siRNAs for Panel F.

      Thanks for your advice. The qPCR (Figure 5F) and DNA fiber assay (Figure 5I) were performed using the siELF3-4 siRNA. We also repeated the qPCR experiments for Panel F using the siELF3-5 siRNA (Supplement Fig. S5B).

      We sincerely thank you for your thoughtful feedback and constructive suggestions. Addressing these points has strengthened our manuscript, and we are grateful for the opportunity to refine and clarify our work. We appreciate your critical evaluation and look forward to further constructive dialogue.

      Reviewer #2 (Recommendations For The Authors):  

      (1) The data driving the hypothesis uses gene expression signatures as an indirect measure of replication stress. This is a critical concern.

      a. At this time, numerous gene expression signatures have been reported to be biomarkers of replication stress. Therefore, it would be valuable to apply additional gene expression signatures to examine the performance and the overlap in the results.

      The recent work by Takahashi et al., 2022 (https://pubmed.ncbi.nlm.nih.gov/36381660/) provides a signature that was derived independently and offers one that can be used to assess the performance of the signatures and stability of the conclusions.

      Thank you for the valuable suggestion. We have evaluated replication stress in the mammary cell subgroups using the Repstress score developed in the work you mentioned. The result showed that LP cells tend toward higher replication stress than the other subgroups, although the difference was not statistically significant. This result is consistent with our previous analysis, which indicated that LP cells tend to have higher replication stress levels. We have added these data as the last line of Figure 1A in the revised version.

      Author response image 1.

      Replication stress pathway scores of different human normal mammary cell populations. The gene expression data were from Lim et al. (3).

      b. A direct measure of replication stress in LP cells would be important to confirm the gene expression signature. Therefore, performing immunostaining for markers of replication stress (eg gamma-H2AX foci, DNA fiber assays) would provide more direct data to support the assertions.

      Thank you for this suggestion and we totally agree that experiments revealing replication stress levels by investigating common markers, e.g., gamma-H2AX foci, DNA fiber assays, will provide vital evidence for our hypothesis. However, since our last response, we have been diligently trying to obtain LP cells for these experiments but encountered technical challenges while attempting to isolate and culture LP cells in vitro. 

      In the discussion part, we have revised the manuscript to emphasize that the data obtained from MCF10A should be interpreted with caution and there are certain gaps between the cell models and LP cells.

      (2) The depth of single-cell sequencing can often be limiting. Therefore, a supplementary table listing the genes used for the replication stress signature and the frequency with which they are observed in the single-cell sequencing data should be provided. This is needed to ensure that the replication stress score does not reflect a small subset of the replication stress signature genes.

      Thank you very much for this valuable suggestion. We have provided an expression matrix of the genes in the replication stress signature in the revised version (Supplementary Table S1), and we also calculated the average expression level of each gene across the cells. As shown in Author response image 2, these genes are expressed at relatively low levels at the single-cell level (counts ≤ 10), and the expression differences among genes are relatively small. Thus, we excluded the possibility that a few highly expressed genes dominate the replication stress score.
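      The check described here (whether a few genes dominate a summed signature score) can be sketched as follows. This is a minimal illustration on hypothetical toy counts with placeholder gene names, not the authors' pipeline; the three summary statistics are one reasonable choice among several.

```python
def gene_summary(counts: dict[str, list[int]]) -> dict[str, dict[str, float]]:
    """Per-gene mean count, detection frequency, and share of the summed signal.

    `counts` maps gene name -> raw counts across cells (all lists equal length).
    """
    total = sum(sum(values) for values in counts.values())
    summary = {}
    for gene, values in counts.items():
        n_cells = len(values)
        summary[gene] = {
            "mean_count": sum(values) / n_cells,
            # Fraction of cells in which the gene is detected at all.
            "detected_fraction": sum(v > 0 for v in values) / n_cells,
            # Contribution of this gene to the total signature signal.
            "share_of_total": sum(values) / total if total else 0.0,
        }
    return summary


# Hypothetical toy counts for three placeholder genes in five cells.
toy_counts = {
    "GENE_A": [0, 2, 1, 0, 3],
    "GENE_B": [1, 1, 0, 2, 1],
    "GENE_C": [4, 0, 5, 3, 2],
}

summary = gene_summary(toy_counts)
for gene, stats in summary.items():
    print(gene, stats)
```

      A score dominated by a small subset of genes would show up as a few large `share_of_total` values; comparable shares across genes would support the authors' conclusion.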

      Author response image 2.

      Average counts of Top 50 genes for the replication stress signature

      (3) As only 4 BRCA mutation carriers are analyzed, it is critical that the mutations be reported for these individuals because pathogenic variants differ in their effects and interactions with the DNA repair machinery in cells.

      Thanks for the suggestion. Information on the 4 BRCA1 mutation carriers has been added in Supplemental Table S2.

      (4) The figures throughout lack critical details making it difficult to evaluate. Figure 1A states that these are "replication stress pathway scores..." but there is no evaluation of levels of statistical differences. The heat map has what appears to be a log unit score between +2 and -2 but it is unclear whether it is log2 or log10 or some other unit. In 1B, the replication stress scores are visualized as relative values between 0 and 0.1, but there is no indication of what this means or whether there is a statistically significant difference in the levels among the populations. As tumors are composed of multiple cell types, it should be stated how the "tumor cells" are uniquely identified in the figure legend. The lack of critical information is common across many of the figures making review frustratingly difficult.

      Thanks for the suggestion. We have added the statistical analysis and scale in Figure 1A legend. For Figure 1B, replication stress was calculated by sum of replication stress gene expression and presented as ln value. We have provided a quantitative figure and statistical tests (by Mann-Whitney) of replication stress scores for various cell types (Supplementary Figure 1A). 
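      The scoring and testing procedure described here can be sketched in a few lines. The example below is an illustration under stated assumptions: the pseudocount (added to avoid ln(0)) and the toy expression values are hypothetical, and the Mann-Whitney p-value uses a plain normal approximation without tie or continuity correction, so an established statistics library (for example SciPy's `mannwhitneyu`) would be preferable for real data.

```python
import math


def ln_score(signature_expression, pseudocount=1.0):
    """Natural log of the summed signature-gene expression for one cell.

    The pseudocount is an assumption of this sketch, added to avoid ln(0).
    """
    return math.log(sum(signature_expression) + pseudocount)


def mann_whitney_u(x, y):
    """U statistic of sample x and a two-sided p-value (normal approximation)."""
    combined = sorted((v, g) for g, vals in enumerate((x, y)) for v in vals)
    n = len(combined)
    rank_sum_x = 0.0
    i = 0
    while i < n:
        # Find the run of tied values starting at position i.
        j = i
        while j < n and combined[j][0] == combined[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # mean of the 1-based ranks i+1 .. j
        rank_sum_x += avg_rank * sum(1 for k in range(i, j) if combined[k][1] == 0)
        i = j
    n1, n2 = len(x), len(y)
    u = rank_sum_x - n1 * (n1 + 1) / 2.0
    mean_u = n1 * n2 / 2.0
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (u - mean_u) / sd_u
    return u, math.erfc(abs(z) / math.sqrt(2.0))


# Hypothetical per-cell expression of three signature genes in two cell groups.
group_a = [ln_score(cell) for cell in ([5, 3, 2], [6, 4, 4], [7, 5, 3])]
group_b = [ln_score(cell) for cell in ([1, 0, 1], [2, 1, 0], [0, 1, 1])]
u_stat, p_value = mann_whitney_u(group_a, group_b)
print(u_stat, round(p_value, 3))
```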

      In addition, we added details on the identification of tumor cells to the Methods section of the revised manuscript. Briefly, the adjacent normal breast sample served as a control to filter the various types of normal cells from the tumor samples. The normal cells from the tumor sample were merged with the same types of normal cells from the adjacent normal breast samples, leaving one cell cluster composed only of cells from the tumor sample. These tumor-specific clusters were considered malignant cell populations. We further found that the malignant cell population showed higher UMI counts than the normal cell populations, consistent with active metabolism in the malignant cells. More importantly, ER, PR, and HER2 expression of the malignant cells in each case exactly matched the clinical records. Finally, we used InferCNV to validate the malignant cell subset, as more copy number alterations (CNAs) were detected in the malignant cells than in normal cells.

      (5) The hypothesis states that the LP cells are uniquely sensitive to deficiency in BRCA1 compared to other cells. However, the authors use knockdown of BRCA1 in MCF10A cells which are generally considered to be basal cells and not LP cells.

      Thanks for the comment. We fully agree that MCF10A cannot reflect LP features and was mainly used as a normal mammary cell line model. We tried to obtain human LP cells to perform some experiments, but all attempts failed because the cells are fragile and difficult to passage in vitro. The gap between MCF10A and LP cells is now stressed in the Discussion.

      (6) Figure 2, the number of samples being compared is not listed for most of the panels. It appears that ELF3 is enriched in subsets of breast cancers, but much of the data is not focused on BRCA1-deficient tumors. Therefore, the data appears to show that ELF3 expression is more of a generalized feature of TNBCs (which has been reported previously) and dilutes the support for the hypothesis. Therefore, panels C-G raise concerns regarding the overall hypothesis that LP cells are the cell type that is affected.

      Thanks for the suggestion. We have added the number of samples in Figure 2 legends.

      Our analysis focuses on the basal subtype because of the well-known relationship between BRCA1 deficiency and this subtype. Our results demonstrate an association between ELF3 expression and the basal, TNBC, and HER2+ subtypes, consistent with previous reports. Since TNBC also has high replication stress levels (NPJ Breast Cancer. 2020 Sep 7;6:40.), ELF3 upregulation in this subtype may not be solely due to BRCA1 deficiency, and we fully agree that this analysis may dilute the relationship between ELF3 and BRCA1. We have revised the Discussion to be more precise on this point.

      (7) Figure 3 provides experimental support for the hypothesis. While panel A is of interest, the legend lacks any description beyond "normal mammary tissue" and that there are non-carriers and carriers of BRCA1 mutations. Is this from bulk RNAseq data or single-cell RNAseq data? How many carriers and how many noncarriers? Panel E is ENCODE data from MCF7 cells that are ER+ luminal subtype so it is unclear if this is relevant to the LP cells that are the focus of the hypothesis.

      Thanks for the comments. Figure 3 panel A is from single-cell RNAseq data, including 3 BRCA1 WT patients and 4 BRCA1 mutant patients. All cells (normal cells and tumor cells) were included, and ELF3 expression was normalized by the reads in each cell. We have added this information to the figure legend.

      It has been difficult to obtain ENCODE data for LP cells. The effect of E2F1 on the regulation of ELF3 was validated experimentally in MCF10A cells and is consistent with the MCF7 ENCODE data; we therefore suggest that this effect may be conserved in mammary cells, but further confirmation in LP cells is needed. We have revised the manuscript to note this.

      (8) In Figure 4, the authors use BRCA1-deficient breast cancer cells to show the reliance on ELF3 and suggest that this is specific to this genetic lesion and not other subtypes. However, there is no data to show that this is not observed using ER+ cells or TNBC that are not BRCA1-deficient cell lines or models.

      Thanks for pointing this out. As ELF3 knockdown in MCF10A resulted in increased genomic instability (Supplement Fig S5) and a reduced capability to resolve replication stress (Figure 5B), we believe that ELF3 can help cells deal with replication stress not only in BRCA1-mutant cells but also in normal mammary cells and in multiple cell lines with distinct backgrounds, as suggested in Figure 4G, 4H and Supplement Fig S4G. The special link between ELF3 and BRCA1 is reflected by the significant upregulation of ELF3 upon BRCA1 deficiency, not by the downstream functions of ELF3.

      (9) Figure 5 provides the first direct evaluation of biomarkers of replication stress (gamma H2AX, 53BP1). DNA fiber assays provide the most direct evaluation of replication fork kinetics, and therefore, replication stress. The knockdown of BRCA1 and ELF3 appear to phenocopy one another in the HCC1937, but there is no other cell type to show whether this is specific for BRCA1-deficient cells. For example, the MCF7 cells show E2F1 binding to ELF3 (Figure 3E) and may show replication stress upon knockdown of ELF3. Without testing this, the authors cannot suggest that the effect is linked to BRCA1 status. The authors do not identify the BRCA1 mutation in these cells and whether there is homozygous loss. Similarly, the mutational status in the SUM149PT cells should also be stated. These need to be added to aid interpretation of the results.

      Thank you for the constructive advice. We have added information regarding the BRCA1 status of HCC1937 and SUM149PT. As discussed above, the results in Figure 4G and 4H suggest that ELF3 expression is associated with sensitivity to replication-stress-inducing drugs across many cell lines. Thus, the ability of ELF3 to maintain the stability of DNA replication is not specific to BRCA1-deficient cells. The reliance on ELF3 in BRCA1 deficiency that we propose rests mainly on the fact that ELF3 is upregulated under BRCA1-deficient conditions and that ELF3 may help cells tolerate replication stress during transformation; the resulting tumor cells (that is, BRCA1-deficient breast cancer cells) may therefore be more sensitive to the loss of ELF3 expression.

      (10) While the data in Figure 6 are valuable extensions of the gene signature derived from the MCF10A cells with BRCA1 knockdown, only 2 BRCA1 carriers are reported. As carriers bear heterozygous mutations in BRCA1, haplo-insufficiency would be necessary to generate the signature. The authors do note the publication by Panthania et al, but there are relatively few examples of haploinsufficiency. It should be noted that Sedic et al., 2015 also suggested haploinsufficiency in breast epithelial cell cultures from BRCA1 heterozygotes which appears to cause premature senescence, possibly via replication stress. However, this was observed in the basal epithelial cells. Therefore, this appears to be a feature of the breast epithelium more generally and is not enriched or limited to the LP cells.

      Thank you very much for your valuable suggestion. We have revised the Discussion to include this important work, and we fully agree that the replication stress caused by BRCA1 deficiency is not limited to LP cells. The point we would like to make in Figure 6, however, is that BRCA1 deficiency modulates the transcription profile of normal mammary cells towards that of LP-like cells, but not towards that of other subtypes. We observed a surprisingly similar profile between BRCA1-deficient cells and LP cells, suggesting that BRCA1 might have an inherent function in mediating LP gene transcription. Furthermore, the data indicate that ELF3 has a tighter association with LP genes than other recognized LP-specific transcription factors such as ELF5 and EHF, which belong to the same family as ELF3. This result is intriguing since ELF3 can be upregulated by BRCA1 deficiency and replication stress. We hypothesize that ELF3 could be a transcriptional node downstream of BRCA1 deficiency that modulates LP gene expression, and that this process might be limited to LP cells since ELF3 has the highest expression levels in LP cells. Nonetheless, this hypothesis still needs to be validated experimentally in LP cells.

      We would like to express our deepest gratitude to the reviewers for their thorough and constructive feedback. Their insightful comments have been invaluable in guiding the revisions of our manuscript, helping us to clarify our hypotheses and strengthen the presentation of our findings. While we encountered some challenges, particularly with the isolation and culturing of LP cells, we made significant efforts to address the reviewers' concerns to the best of our ability. We have updated our manuscript accordingly, ensuring that all issues raised have been addressed comprehensively. We believe that these revisions have substantially improved the quality and clarity of our work, and we are excited to share our findings with the scientific community. Thank you once again for the opportunity to revise our manuscript, and we look forward to your feedback on the updated version.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      This paper by Poverlein et al reports the substantial membrane deformation around the oxidative phosphorylation super complex, proposing that this deformation is a key part of super complex formation. I found the paper interesting and well-written but identified a number of technical issues that I suggest should be addressed:

      We thank Reviewer 1 for finding our work interesting. We have addressed the technical issues below.

      (1) Neither the acyl chain chemical makeup nor the protonation state of CDL are specified. The acyl chain is likely 18:2/18:2/18:2/18:2, but the choice of the protonation state is not straightforward.

      We thank the Reviewer for highlighting this missing information. We have now added this information in the Materials and Methods section:

      "…were performed in a POPC:POPE:cardiolipin (2:2:1) membrane containing 5 mol% QH<sub>2</sub> / Q (1:1 ratio). Cardiolipin was modeled as tetraoleoyl cardiolipin (18:1/18:1/18:1/18:1) with a headgroup modeled in a singly protonated state (with Q<sub>tot</sub>=-1)."

      (2) The analysis of the bilayer deformation lacks membrane mechanical expertise. Here I am not ridiculing the authors - the presentation is very conservative: they find a deformed bilayer, do not say what the energy is, but rather try a range of energies in their Monte Carlo model - a good strategy for a group that focuses on protein simulations. The bending modulus and area compressibility modulus are part of the standard model for quantifying the energy of a deformed membrane. I suppose in theory these might be computed by looking at the per-lipid distribution in thickness fluctuations, but this route is extremely perilous on a per-molecule basis. Instead, the fluctuation in the projected area of a lipid patch is used to imply the modulus [see Venable et al "Mechanical properties of lipid bilayers from molecular dynamics simulation" 2015 and citations within]. Variations in the local thickness of the membrane imply local variations of the leaflet normal vector (the vector perpendicular to the leaflet surface), which is curvature. With curvature and thickness, the deformation energy is analyzed.

      See:

      Two papers: "Gramicidin A Channel Formation Induces Local Lipid Redistribution" by Olaf Andersen and colleagues. Here the formation of a short peptide dimer is experimentally linked to hydrophobic mismatch. The presence of a short lipid reduces the influence of the mismatch. See below regarding their model cardiolipin, which they claim is shorter than the surrounding lipid matrix.

      Also, see:

      Faraldo-Gomez lab "Membrane transporter dimerization driven by differential lipid solvation energetics of dissociated and associated states", 2021. Mondal et al "Membrane Driven Spatial Organization of GPCRs" 2013 and many citations within these papers.

      While I strongly recommend putting the membrane deformation into standard model terms, I believe the authors should retain the basic conservative approach that the membrane is strongly deformed around the proteins and that making the SC reduces the deformation, then exploring the consequences with their discrete model.

      We thank the Reviewer for the suggestions and for pointing out the additional references, which are now cited in the revised manuscript. The analysis is indeed significantly more complex for large multi-million atom supercomplexes in comparison to small peptides (gramicidin A) or model systems of lipid membranes. However, in the revised manuscript, we have conducted further analysis on the membrane curvature effects based on the suggestions. We were able to estimate the energetic contribution of the changes in local membrane thickness and curvature, which are now summarized in Table 1, and described in the main text and SI. We find that both the curvature and local thickness contribute to the increased stability of SC.

      We have now extensively modified the Results section to properly differentiate between the different components of membrane strain:

      "We observe a local decrease in the membrane thickness at the protein-lipid interface (Fig. 2G, Fig. S2A,D,E), likely arising from the thinner hydrophobic belt region of the OXPHOS proteins (ca. 30 Å, Fig. S1A) relative to the lipid membrane (40.5 Å, Fig. S1). We further observe ∼30% accumulation of cardiolipin at the thinner hydrophobic belt regions (Fig. 2H, Fig. S2B,F,G), with an inhomogeneous distribution around the OXPHOS complexes. While specific interactions between CDL and protein residues may contribute to this enrichment (Fig. 2N), CDL prefers thermodynamically thinner membranes (∼38 Å, Fig. S1B, Fig. S5F). These changes are further reflected in the reduced end-to-end distance of lipid chains in the local membrane belt (see Methods, Fig. S6, cf. also Refs. (41-44)). In addition to the perturbations in the local membrane thickness, the OXPHOS proteins also induce a subtle inward curvature towards the protein-lipid interface (Fig. S5G), which could modulate the accessibility of the Q/QH<sub>2</sub> substrate into the active sites of CI and CIII<sub>2</sub> (see below, section Discussion). This curvature is accompanied by a distortion of the local membrane plane itself (Fig. 2A-F, Fig. S4A-C, Fig. S7), with perpendicular leaflet displacements reaching up to ~2 nm relative to the average leaflet plane.

      To quantify the membrane strain effects, we analyzed the cgMD trajectories by projecting the membrane surface onto a 2-dimensional grid and calculating the local membrane height and thickness at each grid point. From these values, we quantified the local membrane curvature (Fig. S5H), which measures the energetic cost of deforming the membrane from a flat geometry (ΔG<sub>curv</sub>). We also computed the energetics associated with changes in the membrane thickness, assessed from the deviations from an ideal local membrane in the absence of embedded proteins (ΔG<sub>thick</sub>, see Supporting Information, for technical details). Our analysis suggests that both contributions are substantially reduced upon formation of the SC, with the curvature decreasing by 19.8 ± 1.3 kcal mol<sup>-1</sup> and the thickness penalty by 2.8 ± 2.0 kcal mol<sup>-1</sup> (Table 1). These results indicate a significant thermodynamic advantage for SC formation, as it minimizes lipid deformation and stabilizes the membrane environment surrounding Complex I and III.”

      […]

      “Taken together, the analysis suggests that the OXPHOS complexes affect the mechanical properties of the membranes by inducing a small inwards curvature towards the protein-lipid interface (Fig. S5), resulting in a membrane deformation effect, while the SC formation releases some deformation energy relative to the isolated OXPHOS complexes. The localization of specific lipids around the membrane proteins, as well as local membrane perturbation effects, is also supported by simulations of other membrane proteins (45, 46), suggesting that the effects could arise from general protein-membrane interactions.”

      Our Supporting Information section now provides additional information about the membrane curvature.

      (41) R. M. Venable, F. L. H. Brown, R. W. Pastor, Mechanical properties of lipid bilayers from molecular dynamics simulation. Chemistry and Physics of Lipids 192, 60-74 (2015).

      (42) R. Chadda et al., Membrane transporter dimerization driven by differential lipid solvation energetics of dissociated and associated states. eLife 10, e63288 (2021).

      (43) S. Mondal et al., Membrane Driven Spatial Organization of GPCRs. Scientific Reports 3, 2909 (2013).

      (44) J. A. Lundbæk, S. A. Collingwood, H. I. Ingólfsson, R. Kapoor, O. S. Andersen, Lipid bilayer regulation of membrane protein function: gramicidin channels as molecular force probes. Journal of The Royal Society Interface 7, 373-395 (2009).

      We also expanded our SI Method section to account for the new calculations:

      “Analysis of lipid chain end-to-end length

      To probe the protein-induced deformation effect of the membrane, the membrane curvature (H), and the end-to-end distance between the lipid chains, were computed based on aMD and cgMD simulations. The lipid chain length was computed from simulations A1-A6 and C1 based on the first and last carbon atoms of each lipid chain. For example, the end-to-end length of a cardiolipin chain was determined as the distance between atom “CA1” and atom “CA18”.

      “Membrane Curvature and Deformation Energy

      The local mean curvature of the membrane midplane was computed by approximating the membrane surface as a height function Z(x,y), defined as the average location of the N-side and P-side leaflets at each grid point. Based on this, the mean curvature H(x,y) was calculated as,

      where the derivatives are defined as .

      The thickness deformation energy was computed from the local thickness d(x,y) relative to a reference thickness distribution F(d), derived from membrane-only simulations, and converted to a free energy profile via Boltzmann inversion. At each grid point, the F(d) was summed over the grid,

      The bending deformation energy was computed from the mean curvature field H(x,y), assuming a constant bilayer bending modulus κ (taken as 20 kJ mol<sup>-1</sup> = 4.78 kcal mol<sup>-1</sup>):

      where ΔA is the area of the grid cell.
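      The displayed equations of this Methods passage are not reproduced in the text above. For orientation, a standard set of expressions consistent with the quoted definitions is sketched below. It assumes the Monge-gauge formula for the mean curvature of a height field, a Boltzmann-inverted reference distribution P<sub>ref</sub>(d) for the thickness term, and a Helfrich-type bending energy with zero spontaneous curvature; the symbol P<sub>ref</sub> and the prefactor convention of the bending term are assumptions and may differ from the authors' exact form.

```latex
% Mean curvature of the midplane height field Z(x, y) (Monge gauge)
H(x,y) = \frac{(1 + Z_y^2)\, Z_{xx} - 2\, Z_x Z_y Z_{xy} + (1 + Z_x^2)\, Z_{yy}}
              {2\,(1 + Z_x^2 + Z_y^2)^{3/2}},
\qquad
Z_x = \frac{\partial Z}{\partial x},\;
Z_{xy} = \frac{\partial^2 Z}{\partial x\,\partial y},\ \text{etc.}

% Thickness energy: Boltzmann-inverted reference distribution, summed over grid points
F(d) = -k_\mathrm{B} T \ln P_\mathrm{ref}(d),
\qquad
G_\mathrm{thick} = \sum_{i,j} F\big(d(x_i, y_j)\big)

% Bending energy: Helfrich form with zero spontaneous curvature (assumed convention)
G_\mathrm{curv} = \frac{\kappa}{2} \sum_{i,j} \big(2 H(x_i, y_j)\big)^2 \, \Delta A
```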

      The thickness and curvature fields were obtained by projecting the coarse-grained MD trajectories (one frame per ns) onto a 2D-grid with a resolution of 0.5 nm. Grid points with low occupancy were downweighted to mitigate noise. More specifically, points with counts below 50% of the median grid count were scaled linearly by their relative count value. To focus the analysis on the region around the protein-membrane interface, only grid points within a radius of 20 nm from the center of the complex were included in the energy calculations. Energies were normalized to an effective membrane area of 1000 nm<sup>2</sup> to facilitate the comparison between systems. Bootstrapping with resampling over frames was performed to estimate the standard deviations of G<sub>thick</sub> and G<sub>curv</sub>.

      We find that G<sub>curv</sub> converges slowly due to its sensitivity to local derivatives and the small grid size required to resolve the curvature contribution near the protein. Consequently, tens of microseconds of simulations were necessary to obtain well-converged estimates of the curvature energy.”
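      The grid-based curvature analysis quoted above can be illustrated with a short NumPy sketch. It is not the authors' code: the function names are hypothetical, the bending energy uses the Helfrich form (κ/2)·Σ(2H)²·ΔA with zero spontaneous curvature as an assumed convention, and the low-occupancy weights are scaled relative to the 50%-of-median threshold, which is one possible reading of "scaled linearly by their relative count value". The test surface is a synthetic paraboloid, not simulation data.

```python
import numpy as np


def mean_curvature(z, spacing):
    """Mean curvature H(x, y) of a height field Z on a regular grid (Monge gauge)."""
    z_x, z_y = np.gradient(z, spacing)      # axis 0 = x, axis 1 = y
    z_xx, z_xy = np.gradient(z_x, spacing)
    _, z_yy = np.gradient(z_y, spacing)
    numerator = (1 + z_y**2) * z_xx - 2 * z_x * z_y * z_xy + (1 + z_x**2) * z_yy
    return numerator / (2 * (1 + z_x**2 + z_y**2) ** 1.5)


def occupancy_weights(counts):
    """Down-weight grid points whose counts fall below 50% of the median count."""
    threshold = 0.5 * np.median(counts)
    return np.where(counts < threshold, counts / threshold, 1.0)


def bending_energy(h, spacing, kappa=4.78, weights=None):
    """Weighted Helfrich-type bending energy; kappa in kcal/mol, lengths in nm."""
    cell_area = spacing**2
    w = np.ones_like(h) if weights is None else weights
    return 0.5 * kappa * np.sum(w * (2 * h) ** 2) * cell_area


spacing = 0.5                                # nm, grid resolution quoted above
x = np.linspace(-5.0, 5.0, 21)               # 21 points at 0.5 nm spacing
xx, yy = np.meshgrid(x, x, indexing="ij")
radius = 50.0                                # nm, curvature radius of the toy surface
z = (xx**2 + yy**2) / (2 * radius)           # paraboloid with H = 1/radius at the apex

h = mean_curvature(z, spacing)
counts = np.full(z.shape, 100.0)
counts[0, :] = 10.0                          # one poorly sampled grid row
w = occupancy_weights(counts)
energy = bending_energy(h, spacing, weights=w)
print(h[10, 10], energy)
```

      Because the curvature depends on second derivatives of a noisy height field, small grid spacings amplify sampling noise, which is consistent with the slow convergence of the curvature energy noted in the quoted text.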

      (1) If CDL matches the hydrophobic thickness of the protein it would disrupt SC formation, not favor it. The authors' hypothesis is that the SC stabilizes the deformed membrane around the separated elements. Lipids that are compatible with the monomer deformed region stabilize the monomer, similarly to a surfactant. That is, if CDL prefers the interface because the interface is thin and their CDL is thin, CDL should prevent SC formation. A simpler hypothesis is that CDL's unique electrostatics are part of the glue.

      We rephrased the corresponding paragraph in the Discussion section to reflect the role of electrostatics for the behavior of cardiolipin.

      "…supporting the involvement of CDL as a "SC glue". In this regard, electrostatic effects arising from the negatively charged cardiolipin headgroup could play an important role in the interaction of the OXPHOS complexes."

      Generally our simulations suggest that CDL prefers thinner membranes, which could rationalize these findings.

      "We find that CDL prefers thinner membranes relative to the neutral phospholipids (PE/PC, Fig. S5F),[…]”

      (2) Error bars for lipid and Q* enrichments should be computed averaging over multi-lipid regions of the protein interface, e.g., dividing the protein-lipid interface into six to ten domains, in particular functionally relevant regions. Anionic lipids may have long, >500 ns residence times, which makes lipid enrichment large and characterization of error bars challenging in short simulations. Smaller regions will be noisy. The plots depicted in, for example, Figure S2 are noisy.

      It is indeed challenging to capture lipid movements on the timescales accessible for atomistic MD, and hence the data in Figure S2 contains some noise. In this regard, for the cgMD data presented in the revised Fig. S2H,I, the concentration data was averaged for six domains of the protein-lipid interface.

      (3) The membrane deformation is repeatedly referred to as "entropic" without justification. The bilayer has significant entropic and enthalpic terms just like any biomolecule, why are the authors singling out entropy? The standard "Helfrich" energetic Hamiltonian is a free energy model in that it implicitly integrates over many lipid degrees of freedom.

      We apologize for the unclear message – our intention was not to claim that the effects are purely entropic, but could arise from a combination of both entropic and enthalpic effects. We hope that this has now been better clarified in the revised manuscript. We also agree that it is difficult to separate between entropic and enthalpic effects. However, we wish to point out that, e.g., the temperature-dependence of the SC formation suggests that the entropic contribution is also affecting the process.

      Regarding the Helfrich Hamiltonian, we note that the standard model assumes a homogeneous fluid-like sheet. We thus have difficulty applying this model to capture the local effects.

      Revisions / clarifications in the main manuscript:

      "SC formation is affected by both enthalpic and entropic effects."

      "We have shown here that the respiratory chain complexes perturb the IMM by affecting the local membrane dynamics. The perturbed thickness and alteration in the lipid dynamics lead to an energetic penalty, which can be related to molecular strain effects, as suggested by the changes of both the internal energy of lipid and their interaction with the surroundings (Fig. S2, S5, S6), which are likely to be of enthalpic origin. However, lipid binding to the OXPHOS complex also results in a reduction in the translational and rotational motion of the lipids and quinone (Fig. S8-S9), which could result in entropic changes. The strain effects are therefore likely to arise from a combination of enthalpic and entropic effects."

      (4) Figure S7 shows the surface area per lipid and leaflet height. This appears to show a result that is central to the interpretation of SC formation but which makes very little sense. One simply does not increase both the height and area of a lipid. This is a change in the lipid volume! The bulk compressibility of most anything is much higher than its Young's modulus [similar to area compressibility]. Instead, something else has happened. My guess is that there is *bilayer* curvature around these proteins and that it has been misinterpreted as area/thickness changes with opposite signs of the two leaflets. If a leaflet gets thin, its area expands. If the manuscript had more details regarding how they computed thickness I could help more. Perhaps they measured the height of a specific atom of the lipid above the average mid-plane normal? The mid-plane of a highly curved membrane would deflect from zero locally and could be misinterpreted as a thickness change.

      We thank the Reviewer for this insightful comment. We chose to define the membrane thickness based on the height of the lipid P-atoms above the average midplane normal. The Reviewer is correct that this measurement gives a changing thickness for a highly curved membrane. However, in this scenario, the thickness would always be overestimated [d<sub>true</sub> = d<sub>measured</sub> × cos (angle between global mid-plane normal and local mid-plane normal)]. Therefore, since we observe a smaller thickness at the protein-lipid interface, the effect is not likely to result from an artifact. For further clarification, see Fig. S4I showing the averaged local position of the P-atoms in the cgMD simulations, which further supports that there is a local deformation of the lipid.

      The changes in the local membrane thickness are also supported by our analysis of the membrane thickness (Fig.S2A) and by the lipid chain length distributions (Fig.S6).

      (5) The authors write expertly about how conformational changes are interpreted in terms of function but the language is repeatedly suggestive. Can they put their findings into a more quantitative form with statistical analysis? "The EDA thus suggests that the dynamics of CI and CIII2 are allosterically coupled."

      We extended our analysis on the allosteric effects, which is now described in the revised main text, the SI and the Methods section:

      "In this regard, our graph theoretical analysis (Fig. S11C,D) further indicates that ligand binding to Complex I induces a dynamic crosstalk between NDUFA5 and NDUFA10, consistent with previous work (50, 51), and affecting also the motion of UQCRC2 with respect to its surroundings. Taken together, these effects suggest that the dynamics of CI and CIII<sub>2</sub> show some correlation that could result in allosteric effects, as also indicated based on cryo-EM analysis (40)."

      “Extended Methods

      Allosteric Network Analysis. Interactions between amino acid residues were modeled as an interaction graph, where each residue was represented by a vertex. Two vertices were connected by an edge if the Cα atoms of the corresponding amino acid residues were closer than 7.5 Å for more than 50% of the frames of simulations S1-S6 (time step of frames: 1 ns). (7) This analysis was carried out for the aMD simulations of the supercomplex, analyzing differences between the Q bound and apo states (simulations A1+A2+A3 vs. A4+A5+A6).”
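
      The edge criterion described above can be sketched in a few lines. This is a minimal illustration, not the analysis code used for the manuscript: the function name `contact_edges`, the input layout (a list of frames, each a list of Cα coordinates in Å), and the toy coordinates are assumptions. Only the 7.5 Å cutoff and the 50% occupancy threshold come from the text.

```python
from itertools import combinations
from math import dist

def contact_edges(frames, cutoff=7.5, min_fraction=0.5):
    """Return residue pairs (i, j), i < j, whose C-alpha atoms are closer than
    `cutoff` (in Å) in more than `min_fraction` of the frames.

    frames: list of frames; each frame is a list of (x, y, z) tuples,
            one per residue (hypothetical input layout).
    """
    n_frames = len(frames)
    n_res = len(frames[0])
    edges = []
    for i, j in combinations(range(n_res), 2):
        # Count the frames in which this residue pair is within the cutoff.
        close = sum(1 for frame in frames if dist(frame[i], frame[j]) < cutoff)
        if close / n_frames > min_fraction:
            edges.append((i, j))
    return edges

# Toy trajectory: three residues on a line, four frames.
# Residue 2 moves away from residue 1 in the last frame only.
toy_frames = [
    [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (11.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (11.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (11.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (20.0, 0.0, 0.0)],
]
edges = contact_edges(toy_frames)
```

      Comparing the edge sets obtained for two ensembles (for example, a ligand-bound and an apo set of frames) then gives the contacts that are gained or lost between the states.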

      (6) The authors write "We find that an increase in the lipid tail length decreases the relative stability of the SC (Figure S5C)" This is a critical point but I could not interpret Figure S5C consistently with this sentence. Can the authors explain this?

      We apologize for this oversight. This sentence should refer to Fig. S5F, which has now been corrected. We have additionally updated the figure to provide an improved estimation of the thickness contribution based on the lipid tail length.

      "We find that an increase in the lipid tail length decreases the relative stability of the SC (Fig. S5F)"

      (7) The authors use a 6x6 and 15x15 lattice to analyze SC formation. The SC assembly has 6 units of E_strain favoring its assembly, which they take up to 4 kT. At 3 kT, the SC should be favored by 18 kT, or a Boltzmann factor of 10^8. With only 225 sites, specific and non-specific complex formation should be robust. Can the authors please check their numbers or provide a qualitative guide to the data that would make clear what I'm missing?
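
      The arithmetic behind this estimate can be checked directly. The sketch below only evaluates exp(ΔE/kT) for six strain units of 3 kT each, as stated in the comment; the variable names are illustrative.

```python
import math

n_units = 6          # strain units released on assembly (from the comment above)
e_strain_kT = 3.0    # strain energy per unit, in kT

delta_kT = n_units * e_strain_kT          # total stabilization, 18 kT
boltzmann_factor = math.exp(delta_kT)     # relative weight of the assembled state
log10_factor = delta_kT / math.log(10)    # the same number as a power of ten

print(f"{delta_kT:.0f} kT -> factor {boltzmann_factor:.2e} (10^{log10_factor:.1f})")
```

      The factor is about 6.6 × 10<sup>7</sup>, that is, close to the quoted 10<sup>8</sup> and several orders of magnitude larger than the 225 sites of a 15×15 lattice.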

      In the revised manuscript, we have now clarified the definition of the lattice model and the respective energies:

      In summary, the qualitative data presented are interesting (especially the combination of molecular modeling with simpler Monte Carlo modeling aiding broader interpretation of the results) ... but confusing in terms of the non-standard presentation of membrane mechanics and the difficulty of this reviewer to interpret some of the underlying figures: especially, the thickness of the leaflets around the protein and the relative thickness of cardiolipin. Resolving the quantitative interpretation of the bilayer deformation would greatly enhance the significance of their Monte Carlo model of SC formation.

      We thank the Reviewer for the helpful suggestion. We hope that the revisions help to clarify the non-standard presentation and connect to concepts used in the lipid membrane community.

      Reviewer #2 (Public review):

      Summary:

      The authors have used large-scale atomistic and coarse-grained molecular dynamics simulations on the respiratory chain complex and investigated the effect of the complex on the inner mitochondrial membrane. They have also used a simple phenomenological model to establish that the super complex (SC) assembly and stabilisation are driven by the interplay between the "entropic" forces due to strain energy and the enthalpic forces (specific and non-specific) between lipid and protein domains. The authors also show that the SC in the membrane leads to thinning and there is preferential localisation of certain lipids (Cardiolipin) in the annular region of the complex. The data report that the SC assembly has an effect on the conformational dynamics of individual proteins making up the assembled complex and they undergo "allosteric crosstalk" to maintain the stable functional complex. From their conformational analyses of the proteins (individual and while in the complex) and membrane "structural" properties (such as thinning/lateral organization etc.) as well as from the output of their phenomenological lattice model, the authors have provided possible implications and molecular origin about the function of the complex in terms of aspects such as charge currents in the inner mitochondrial membrane, proton transport activity and ATP synthesis.

      Strengths:

      The work is bold in terms of undertaking modelling and simulation of such a large complex that requires simulations of about a million atoms for long time scales. This requires technical acumen and resources. Also, the effort to make connections to experimental readouts has to be appreciated (though it is difficult to connect functional pathways with limited (additive forcefield) simulations).

      We thank the Reviewer for recognizing the challenge in simulating multimillion atom membrane proteins. We also thank the Reviewer for recognizing the connections we have made to different experiments. Our work indeed relies on atomistic and coarse-grained molecular simulations, which are widely recognized to provide accurate models of membrane proteins.

      Weakness:

      There are several weaknesses in the paper (please see the list below). Claims such as "entropic effect", "membrane strain energy" and "allosteric cross talks" are not properly supported by evidence and seem far-fetched at times. There are other weaknesses as well. Please see the list below.

      We thank the Reviewer for pointing out that key concepts needed further clarification. Please see answers to specific questions below:

      (i) Membrane "strain energy" has been loosely used and no effort is made to explain what the authors mean by the term and how they would quantify it. If the membrane is simulated in stress-free conditions, where are strains building up from?

      We thank the Reviewer for this important question. In the revised manuscript, we have toned down the assignment of the effects into pure entropic or enthalpic effects. We have also provided further clarification of the effects observed in the membrane.

      Example of revisions / clarifications in the main text:

      "SC formation is affected by both enthalpic and entropic effects."

      "We have shown here that the respiratory chain complexes perturb the IMM by affecting the local membrane dynamics. The perturbed thickness and alteration in the lipid dynamics lead to an energetic penalty, which can be related to molecular strain effects, as suggested by the changes of both the internal energy of lipid and their interaction with the surroundings (Fig. S2, S5, S6), which are likely to be of enthalpic origin. However, lipid binding to the OXPHOS complex also results in a reduction in the translational and rotational motion of the lipids and quinone (Fig. S8-S9), which could result in entropic changes. The strain effects are therefore likely to arise from a combination of enthalpic and entropic effects."

      We have also revised the result section, where we now have explicitly defined and clarified the different contributions to membrane strain, observed in our simulations:

      In the following, we define membrane strain as the local perturbations of the lipid bilayer induced by protein-membrane interactions. These include changes in (i) membrane thickness, (ii) the local membrane composition, (iii) lipid chain configurations, and (iv) local curvature of the membrane plane relative to an undisturbed, protein-free bilayer. Together, these phenomena reflect the thermodynamic effects associated with accommodating large protein complexes within the membrane.

      We now also provide a more quantitative estimation of the membrane strain based on the contribution of changes in local thickness and curvature, summarized in Table 1.

      (ii) In result #1 (Protein membrane interaction modulates the lipid dynamics ....), I strongly feel that the readouts from simulations are overinterpreted. Membrane lateral organization in terms of lipids having preferential localisation is not new (see doi: 10.1021/acscentsci.8b00143) nor membrane thinning and implications to function (https://doi.org/10.1091/mbc.E20-12-0794). The distortions that are visible could be due to a mismatch in the number of lipids that need to be there between the upper and lower leaflets after the protein complex is incorporated. Also, the physiological membrane will have several chemically different lipids that will minimise such distortions as well as would be asymmetric across the leaflets - none of which has been considered. Connecting chain length to strain energy is also not well supported - are the authors trying to correlate membrane order (Lo vs Ld) with strain energy?

      We thank the Reviewer for the suggestions. The role of the membrane in driving supercomplex formation has not, to our knowledge, been suggested before. There are certainly many important studies, which have been better highlighted in the revised manuscript. In this context, we also now cite the papers by Srivastava and coworkers and Tieleman and coworkers.

      “The localization of specific lipids around the membrane proteins, as well as local membrane perturbation effects, are also supported by simulations of other membrane proteins (45, 46), suggesting that the effects could arise from general protein-membrane interactions.”

      (45) V. Corradi et al., Lipid–Protein Interactions Are Unique Fingerprints for Membrane Proteins. ACS Central Science 4 (June 13, 2018).

      (46) K. Baratam, K. Jha, A. Srivastava, Flexible pivoting of dynamin pleckstrin homology domain catalyzes fission: insights into molecular degrees of freedom. Molecular Biology of the Cell 32 (2021 Jul 1).

      Physiological membrane will have several chemically different lipids that will minimise such distortions as well as would be asymmetric across the leaflets

      We agree with this point. As shown in Figs. 2H,N, S6, S13, we suggest that cardiolipin functions as a buffer molecule. However, very little is experimentally known about the asymmetric distribution of lipids in the IMM. Therefore, modelling the effect of asymmetry across the leaflets is outside the scope of this study. Moreover, as now better clarified in the revised manuscript, we agree that it is difficult to unambiguously divide the effect into enthalpic and entropic contributions.

      To address the main concern of the Reviewer, we have updated the main text and Supporting Information to clearly state the different aspects of how the protein-membrane interactions induce perturbations of the lipid bilayer. We define these effects as membrane strain. We now use the changes in local thickness and local curvature to quantify the effect of membrane strain on the stability of the respiratory SC.

      (iii) Entropic effect: What is the evidence towards the entropic effect? If strain energy is entropic, the authors first need to establish that. They discuss enthalpy-entropy compensation but there is no clear data or evidence to support that argument. The lipids will rearrange themselves or have a preference to be close to certain regions of the protein and that generally arises because of enthalpic reasons (see the body of work done by Carol Robinson with Mass Spec where certain lipids prefer proteins in the GAS phase, certainly there is no entropy at play there). I find the claims of entropic effects very unconvincing.

      We agree that it is difficult to distinguish the entropic vs. enthalpic contributions. In the revised manuscript, we better clarify that both effects are likely to be involved.

      The native MS work by Robinson and coworkers and others supports that many lipids are strongly bound to membrane proteins, as also supported by the local binding of certain lipid molecules, such as CDL, to the SC (Figs. S2, S6, S13).

      We suggest that the accumulation of cardiolipin at the protein-lipid interface involves a combination of entropic and enthalpic effects, arising from the reduction of the lipid mobility (entropy) as indicated by lowered diffusion (Fig. S9), and formation of noncovalent bonds between the lipid and the OXPHOS protein (Fig. S14).

      We added further clarification to the Discussion section.

      “Taken together, our combined findings suggest that the SC formation is affected by thermodynamic effects that reduce the molecular strain in the lipid membrane, whilst the perturbed micro-environment also affects the lipid and Q dynamics, as well as the dynamics of the OXPHOS proteins (see below).”

      (iv) The changes in conformational dynamics are subtle as reported by the authors and the allosteric arguments are made based on normal mode analyses. In the complex, there are large overlapping regions between the CI, CIII2, and SCI/III2. I am not sure how the allosteric crosstalk claim is established in this work - some more analyses and data would be useful. Normal mode analyses (EDA) suggest that the motions are coupled and correlated - I am not convinced that it suggests that there is allosteric cross-talk.

      Our analysis suggests that the SC changes the dynamics of the system. Although it is difficult to assign how these effects result in activity modulation of the system, we note these changes relate to sites that are central for the charge transfer reactions. We thank the Reviewer for suggesting to extend the analysis, which further suggests that regions of the proteins could be allosterically coupled.

      (v) The lattice model should be described better and the rationale for choosing the equation needs to be established. Specific interactions look unfavourable in the equation as compared to non-specific interactions.

      We have now provided further clarification of the lattice model in the Methods section. Addition to the main text:

      “Lattice model of SC formation. A lattice model of the CI and CIII<sub>2</sub> was constructed (Fig. 4A,B) by modeling the OXPHOS proteins in unique grid positions on a 2D N×N lattice. Depending on the relative orientation, the protein-protein interaction was described by specific interactions (giving rise to the energetic contribution E<sub>specific</sub> < 0) and non-specific interactions (E<sub>non-specific</sub> > 0). The membrane-protein interaction determined the strain energy of the membrane (E<sub>strain</sub>), based on the number of neighboring "lipid" occupied grids that are in contact with proteins (Fig. 4A). The interaction between the lipids was indirectly accounted for by the background energy of the model. The proteins could occupy four unique orientations on a grid ([North, East, South, West]). The states and their respective energies that the system can visit are summarized in Table S6.”

      “The conformational landscape was sampled by Monte Carlo (MC) using 10<sup>7</sup> MC iterations with 100 replicas. Temperature effects were modeled by varying β, and the effect of different protein-to-lipid ratios by increasing the grid area. The simulation details can be found in Table S7.”
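
      The sampling scheme described above can be illustrated with a deliberately reduced toy version. The sketch below is not the model of the manuscript: it places only two proteins on a periodic N×N lattice, uses four-neighbour contacts, and takes arbitrary energies in units of kT (the actual states and energies are those of Table S6, and the production runs used 10<sup>7</sup> iterations with 100 replicas). All function names and default values are assumptions made for illustration.

```python
import math
import random

# Unit steps for the four orientations [North, East, South, West].
STEPS = [(0, 1), (1, 0), (0, -1), (-1, 0)]

def site_in_direction(x, y, o, n):
    """Lattice site that a protein at (x, y) with orientation index o points to."""
    return ((x + STEPS[o][0]) % n, (y + STEPS[o][1]) % n)

def is_supercomplex(state, n):
    """True if the two proteins are adjacent and point at each other."""
    (x1, y1, o1), (x2, y2, o2) = state
    return (site_in_direction(x1, y1, o1, n) == (x2, y2)
            and site_in_direction(x2, y2, o2, n) == (x1, y1))

def energy(state, n, e_specific=-1.0, e_nonspecific=1.0, e_strain=1.0):
    """Toy energy (in kT) of two proteins on an n x n periodic lattice (n >= 3)."""
    (x1, y1, _), (x2, y2, _) = state
    proteins = {(x1, y1), (x2, y2)}
    contacts = {site_in_direction(x, y, o, n)
                for (x, y) in proteins for o in range(4)}
    # Strain: every lipid site that touches at least one protein.
    total = e_strain * len(contacts - proteins)
    if (x2, y2) in {site_in_direction(x1, y1, o, n) for o in range(4)}:
        # Adjacent proteins: specific if correctly oriented, else non-specific.
        total += e_specific if is_supercomplex(state, n) else e_nonspecific
    return total

def sc_fraction(n=6, beta=1.0, steps=20000, seed=0, **energies):
    """Metropolis sampling; returns the fraction of steps spent in the SC state."""
    rng = random.Random(seed)
    state = ((0, 0, 0), (n // 2, n // 2, 0))
    e_old = energy(state, n, **energies)
    hits = 0
    for _ in range(steps):
        k = rng.randrange(2)                      # protein to move
        other = state[1 - k]
        move = (rng.randrange(n), rng.randrange(n), rng.randrange(4))
        if move[:2] != other[:2]:                 # overlapping proposals are rejected
            trial = (move, other) if k == 0 else (other, move)
            e_new = energy(trial, n, **energies)
            # Metropolis criterion: accept with probability min(1, exp(-beta*dE)).
            if rng.random() < math.exp(min(0.0, -beta * (e_new - e_old))):
                state, e_old = trial, e_new
        hits += is_supercomplex(state, n)
    return hits / steps
```

      In this toy version, calling `sc_fraction(e_strain=0.0)` and `sc_fraction(e_strain=2.0)` compares the sampled SC population without and with a strain penalty, and changing `n` mimics a change in the protein-to-lipid ratio.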

      Reviewer #3 (Public review):

      Summary:

      In this contribution, the authors report atomistic, coarse-grained, and lattice simulations to analyze the mechanism of supercomplex (SC) formation in mitochondria. The results highlight the importance of membrane deformation as one of the major driving forces for SC formation, which is not entirely surprising given prior work on membrane protein assembly, but certainly of major mechanistic significance for the specific systems of interest.

      Strengths:

      The combination of complementary approaches, including an interesting (re)analysis of cryo-EM data, is particularly powerful and might be applicable to the analysis of related systems. The calculations also revealed that SC formation has interesting impacts on the structural and dynamical (motional correlation) properties of the individual protein components, suggesting further functional relevance of SC formation. Overall, the study is rather thorough and highly creative, and the impact on the field is expected to be significant.

      Weaknesses:

      In general, I don't think the work contains any obvious weaknesses, although I was left with some questions.

      We thank the Reviewer for acknowledging that our work is thorough and creative, and that it is likely to have a significant impact on the field.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Diffusion is quantified in speed units (Figure S8). The authors should explain why they have used an apparently incorrect model for quantifying diffusion. The variance of the distribution of a diffusing molecule is linear with time, not its standard deviation (as I suppose I would use for computing effective molecular speed). Perhaps they are quantifying residence times, in which molecules near a wall (protein) will appear to have half the movements of a bulk molecule. This is confusing.

      We thank the Reviewer for the comment. The data shown in the previous version of Figure S8 corresponded to the effective molecular velocity, which is now clarified in the revised figure (now Fig. S9). This measure was used to reflect the average residence time of the groups in the vicinity of the sites.

      However, as suggested by the Reviewer, we now also analyzed the position-dependent diffusion of the quinone in the new Figure S9:
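
      A diffusion coefficient in the usual sense rests on the linear growth of the mean-squared displacement (MSD) with time, which for lateral motion in a membrane plane reads MSD(t) = 4Dt. The following is a minimal sketch of that estimate, not the procedure used for the figure: the function names, the trajectory layout (a list of (x, y) positions at a fixed time step `dt`), and the single-lag estimate are assumptions.

```python
def msd(traj, lag):
    """Mean-squared displacement of a 2D trajectory at a lag of `lag` frames."""
    squared = [
        (traj[t + lag][0] - traj[t][0]) ** 2 + (traj[t + lag][1] - traj[t][1]) ** 2
        for t in range(len(traj) - lag)
    ]
    return sum(squared) / len(squared)

def lateral_diffusion(traj, lag, dt):
    """Estimate D from MSD = 4 * D * t (two-dimensional diffusion)."""
    return msd(traj, lag) / (4.0 * lag * dt)
```

      In practice D is usually obtained from a linear fit of the MSD over a range of lags rather than from a single lag, and a position-dependent profile is obtained by binning the trajectory segments by their distance from the protein.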

      (2) With a highly charged bilayer a large water layer is necessary to verify that the concentration of salt is plateauing at 150 mM at the box edge. 45 Å appears to be the default in CHARMM-GUI, but this default guidance is not based on the charge of the bilayer. I suggest the authors plot the average concentration of both anions and cations in mM units along the z coordinate of the simulation cell.

      We thank the Reviewer for the suggestion. We have now provided an analysis of the average ion concentrations along the z coordinate, supporting that the salt concentration plateaus at 150 mM at the box edge.
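
      The conversion from ion counts per slab to a concentration in mM can be sketched as follows. This is an illustrative calculation under simplifying assumptions, not the analysis script used for the revision: the box dimensions are taken as constant, the z coordinates are wrapped into [0, lz), and the function name and input layout are hypothetical.

```python
AVOGADRO = 6.02214076e23   # particles per mole
LITERS_PER_A3 = 1e-27      # 1 Å^3 expressed in liters

def concentration_profile_mM(frames_z, lx, ly, lz, n_bins):
    """Average ion concentration (mM) in n_bins slabs along z.

    frames_z: list of frames, each a list of ion z coordinates in Å.
    lx, ly, lz: box edge lengths in Å (assumed constant over the trajectory).
    """
    dz = lz / n_bins
    counts = [0] * n_bins
    for zs in frames_z:
        for z in zs:
            # Wrap into the box and guard against rounding at the upper edge.
            counts[min(int((z % lz) / dz), n_bins - 1)] += 1
    slab_liters = lx * ly * dz * LITERS_PER_A3
    n_frames = len(frames_z)
    # counts / n_frames = mean number of ions per slab; convert to mol/L, then mM.
    return [1000.0 * c / n_frames / (AVOGADRO * slab_liters) for c in counts]
```

      One ion in a volume of about 1660.5 Å<sup>3</sup> corresponds to 1 M, which gives a quick sanity check of the units. A plateau of both the cation and the anion profile at the box edge indicates that the water layer is thick enough.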

      Typos:

      SI: "POPC/POPE or CLD" should be CDL

      We apologize for the mistake. We have corrected the typos:

      "of the membrane thickness in a POPC/POPE/CDL/QH2 membrane and a CDL membrane."

      "a pure CDL membrane"

      Reviewer #2 (Recommendations for the authors):

      (1) Suggestion regarding membrane strain energy claims:

      Changes in area per lipid and membrane thinning are surely not akin to membrane strain energy changes. At best, the authors should calculate the area compressibility (both in bilayers with and without proteins) and then make comments. In general, if they are talking about the in-plane properties (bilayer being liquid in 2D), I do not see how they can discuss membrane strain energy with the NPT = 1 atm barostat reservoir that they are simulating against. At least they can try to plot the membrane lateral pressures in various conditions and then start making such comments. If it was a closed vesicle, I would expect some tension in the membrane due to the closed surface but in the conditions in which the simulations are run, I do not see how strain is so important. If the authors want to be more rigorous, they can calculate "atomic virial" values by doing a tessellation and showing the data to make their point. Strain energy would mean that there is a modulus in-plane. Bending modulus would surely change with membrane thinning and area compressibility changes (simple plate theory) but linear strain is surely something to be defined well before making claims out of it.

      Our work shows that the OXPHOS proteins alter the local membrane thickness and curvature, and we now quantify the deformation penalty associated with that (Table 1). As stated above, we now provide a better definition and description of 'membrane strain' and the observed effect, which is likely to contain both enthalpic and entropic contributions.

      As suggested by the Reviewer, we have computed the lateral pressure profiles around the OXPHOS proteins, further supporting that there are energetic effects related to the "solvation" of the membrane proteins in the IMM. To this end, Figs. S2D,E, S4I, and S5G,H show the membrane distortion effect, while Fig. S5A supports that the 'internal energy' of the lipids changes as a result of the SC formation, further justifying that these effects can be assigned as 'strain effects'. The analysis has also been extended by computing the end-to-end distances, shown in Fig. S6.

      Unfortunately, it is technically unfeasible to accurately estimate the area compressibility, bending modulus, or the atomic virial for the present multi-million-atom membrane protein simulations.

      Summary of Revisions/Additions:

      Fig. S2 [...] (D, E) Difference in the membrane thickness around the SC relative to CI (left) or relative to CIII<sub>2</sub> (right) from (D) aMD and (E) cgMD.

      Fig. S4. [...] (I) Visualization of the membrane distortion effect.

      Fig. S5. Analysis of membrane-induced distortion effects. (A) Strain effect relative to a lipid membrane from atomistic MD simulations of the SCI/III2, CI, and CIII<sub>2</sub>, suggesting reduction of the membrane strain (blue patches) in the SC surroundings. The figure shows the non-bonded energies relative to the average non-bonded energies from membrane simulations (simulation M4, Table S1). (B) The lipid strain contribution for different lipids calculated from non-bonded interaction energies of the lipids relative to the average lipid interaction in an IMM membrane model (simulation M4). The figure shows the relative strain contribution for nearby lipids (r < 2 Å, in color from panel (C)), and lipids > 5 Å from the OXPHOS proteins. (C) Selection of lipids (< 2 Å) interacting with the OXPHOS proteins. (D) Potential of mean force (PMF) of membrane thickness derived from thickness distributions from cgMD simulations of a membrane, the SCI/III2, CI, and CIII<sub>2</sub>. (E) Membrane thickness as a function of CDL concentration from cgMD simulations. (F) ΔG<sub>thick</sub> of the SC as a function of membrane thickness based on cgMD simulations. (G) Membrane curvature around the SCI/III2 (left), CI (middle), and CIII<sub>2</sub> (right) from atomistic simulations. (H) Squared membrane curvature obtained from cgMD simulations, within a 20 nm radius around the center of the system. These maps correspond to the curvature field used in the calculation of the bending deformation energy term (G<sub>curv</sub>).

      Fig. S6. Analysis of lipid end-to-end distance from aMD simulations of (A) SC, (B) CI, (C) CIII<sub>2</sub>.
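
      A PMF derived from a thickness distribution, as in Fig. S5D, amounts to a Boltzmann inversion, G(d) = -k<sub>B</sub>T ln P(d). The sketch below shows that inversion on a histogram of thickness samples; it is a generic illustration rather than the script behind the figure, and the function name, the binning scheme, and the sample values are assumptions.

```python
import math
from collections import Counter

def pmf_from_samples(samples, bin_width, kT=1.0):
    """Boltzmann-invert a histogram: G(d) = -kT * ln(P(d) / P_max).

    Returns a dict mapping the lower edge of each populated bin to its free
    energy relative to the most populated bin (in units of kT by default).
    """
    counts = Counter(math.floor(s / bin_width) for s in samples)
    max_count = max(counts.values())
    return {
        b * bin_width: -kT * math.log(c / max_count)
        for b, c in sorted(counts.items())
    }

# Hypothetical thickness samples in Å: four in the 38-39 Å bin, two in 39-40 Å.
toy_thickness = [38.2, 38.7, 38.1, 38.9, 39.5, 39.1]
toy_pmf = pmf_from_samples(toy_thickness, bin_width=1.0)
```

      The bin with half the population of the most populated bin lies ln 2 ≈ 0.69 kT higher in free energy.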

      (2) Membrane distortions:

      Membrane distortions can arise due to a mismatch in the area between the upper leaflet and the lower leaflet, especially when a protein is embedded. Authors can carefully choose the numbers to keep the membrane stable.

      We have further clarified in the revised manuscript that the membranes are stable in all simulation setups. When building the simulation setups, it was carefully considered that no leaflet introduced higher lipid densities that could result in artificial displacements. Our results of the local changes in the lipid dynamics and structure around the OXPHOS complexes are independently supported by both our atomistic and coarse-grained simulations, which contain significantly larger membranes. Moreover, as discussed in our work, the local membrane distortion is also experimentally supported by cryoEM analysis as well as recent in situ cryoTEM data, showing that the OXPHOS proteins indeed affect the local membrane properties.

      Clarifications/Additions to the main text:

      “We find that the individual OXPHOS complexes, CI and CIII<sub>2</sub>, induce pronounced membrane strain effects, supported both by our aMD (Fig. S2A) and cgMD simulations with a large surrounding membrane (Fig. 2G).”

      “The localization of specific lipids around the membrane proteins, as well as local membrane perturbation effects, are also supported by simulations of other membrane proteins (45, 46), suggesting that the effects could arise from general protein-membrane interactions.”

      "During construction of the simulation setups, it was carefully considered that no leaflet introduced higher lipid densities that could result in artificial displacement effects."

      (3) Strain energy as an entropic effect:

      Please establish that the strain energy (if at all present) can be called an entropic effect.

      We have now better clarified that the SC formation results from combined enthalpic and entropic effects. We apologize that the previous version of the text was unclear in this respect.

      To further probe the involvement of entropic effects, we derived entropic and enthalpic contributions from our lattice model. The model supports that increased strain contributions also alter the entropic contributions, further supporting the coupling between the effects.

      We have also clarified our definition of the effects:

      "The perturbed thickness and alteration in the lipid dynamics lead to an energetic penalty, which can be related to molecular strain effects, as suggested by the changes of both the internal energy of lipid and their interaction with the surroundings (Fig. S2, S5, S6), which are likely to be of enthalpic origin. However, lipid binding to the OXPHOS complex also results in a reduction in the translational and rotational motion of the lipids and quinone (Fig. S8-S9), which could result in entropic changes. The strain effects are therefore likely to arise from a combination of enthalpic and entropic effects."

      (4) Allosteric cross-talk:

      A thorough network analysis (looking at aspects like graph Laplacian, edge weights, eigenvector centrality, changes in characteristic path length, etc.) can be undertaken to establish allostery (see https://doi.org/10.1093/glycob/cwad094, Ruth Nussinov/Ivet Bahar papers).

      We have expanded the network analysis as suggested by the Reviewer. In this regard, we have expanded the analysis by computing the covariance matrix, further supporting that the SC could involve correlated protein dynamics. We observe a prominent change especially with respect to the ligand state of Complex I, indicative of some degree of allostery, while we find that the apo state of Complex I leads to a slight uncoupling of the motion between CI and CIII<sub>2</sub>.

      Additions in the main text:

      In this regard, our graph theoretical analysis (Fig. S11) further indicates that ligand binding to Complex I induces a dynamic crosstalk between NDUFA5 and NDUFA10, consistent with previous work (48, 49), and affecting also the motion of UQCRC2 with respect to its surroundings. Taken together, these effects suggest that the dynamics of CI and CIII<sub>2</sub> show some correlation that could result in allosteric effects, as also indicated based on the cryoEM analysis.

      (5) Lattice model:

      The equation needs to be rationalised. For example, specific interaction (g_i g_j) favours separation (lower energy when i and j are not next to each other), and nonspecific interaction favours proximity. Why is that? Also, the notation for degeneracy in the partition function and the notation for lattice point. It is mentioned that the "interaction between the lipids was indirectly accounted for by the "background energy" of the model". If the packing/thinning etc. are so important to the molecular simulations, will not the background energy change with changing lipid organisation during complex formation?

      We have further expanded the technical discussion of the energy terms in our lattice model.

      For example, specific interaction (g_i g_j) favours separation (lower energy when i and j are not next to each other), and non-specific interaction favours proximity. Why is that?

      "The g<sub>i</sub>g<sub>j</sub> -term assigns a specific energy contribution when the OXPHOS complexes are in adjacent lattice points only in a correct orientation (modeling a specific non-covalent interaction between the complexes such as the Arg29<sup>FB4</sup>-Asp260<sup>C1</sup>/Glu259<sup>C1</sup> interaction between CI and CIII<sub>2</sub>). The d<sub>i</sub>d<sub>j</sub> -term assigns a non-specific interaction for the OXPHOS complexes when they are in adjacent lattice points, but in a "wrong" orientation relative to each other to form a specific interaction. The term introduces a strain into all lattice points surrounding an OXPHOS complex, mimicking the local membrane perturbation effects observed in our molecular simulations.

      This leads to the partition function,

      where w<sub>i</sub> is the degeneracy of the state, modeling that the SC and OXPHOS proteins can reside at any lattice position of the membrane, and where β=1/k<sub>B</sub>T (k<sub>B</sub>, Boltzmann's constant; T, temperature). The probability of a given state i was calculated as,

      with the free energy (G) defined as,
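
      The displayed equations are not reproduced in this response. Assuming the standard canonical-ensemble forms for a set of states i with energies E<sub>i</sub> and degeneracies w<sub>i</sub>, the three quantities referred to above would read as sketched below. The symbols Z and p<sub>i</sub>, and the choice of defining G from the full partition function rather than per state, are assumptions and are not confirmed by the text.

```latex
Z = \sum_i w_i \, e^{-\beta E_i}, \qquad
p_i = \frac{w_i \, e^{-\beta E_i}}{Z}, \qquad
G = -k_\mathrm{B} T \ln Z
```

      With these forms the probabilities sum to one by construction, and a per-state free energy follows as G<sub>i</sub> = E<sub>i</sub> - k<sub>B</sub>T ln w<sub>i</sub>, so that p<sub>i</sub> = exp[-β(G<sub>i</sub> - G)].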

      This discussion has been included in the Methods section to ensure that our work remains readable for the biological community studying supercomplexes from biochemical, metabolic, and physiological perspectives.

      (6) This is a minor issue but the paper is poorly organised and can be fixed readily. The figures are not referenced in order. For example, Figure 2G is discussed before discussing Figures 2A-2F (never discussed). Figure S2 is referenced before Figure S1.

      Answer: We thank the Reviewer for pointing this out. The order of the figures was revised.

      Reviewer #3 (Recommendations for the authors):

      A few minor questions/suggestions, not necessarily in the order of importance:

      (1) The discussion of the timescale of simulations is a bit misleading. For example, the discussion cites a timescale of 0.3 ms of CG simulations. The value is actually the sum of multiple CG simulations on the order of 50-75 microseconds. These are already very impressive lengths of CG simulations, there is no need to use the aggregated time to claim even longer time scales.

      We thank the Reviewer for the suggestion on this important clarification. We have now modified the text and tables accordingly:

      "(0.3 ms in cumulative simulation time, 50-75 μs/cgMD simulation)"

      (2) The observation of cardiolipin (CDL) accumulation is interesting. How close are the head groups, relative to the electrostatic screening length at the interface? Should one worry about the potential change of protonation state coupled with the CDL redistribution?

      Answer: We thank the Reviewer for this excellent comment, which has also been on our mind. The CDLs indeed form contacts with various functional groups at the protein interface (as shown in Fig. S13), as well as with bulk ions (sodium), which could tune the pK<sub>a</sub> of the CDLs and result in a protonation change. We have clarified these effects in the revised manuscript:

      "While CDL was modeled here in the singly anionic charged state (but cf. Fig. S5E), we note that the local electrostatic environment could tune its pK<sub>a</sub>, resulting in protonation changes of the lipid, consistent with its function as a proton collecting antenna (62)."

      (3) The authors refer to the membrane strain effect as entropic. Since membrane bending implicates a free energy change that includes both enthalpic and entropic components, I wonder how the authors reached the conclusion that the effect is largely entropic in nature.

      We agree with the Reviewer that the effects are likely to comprise both enthalpic and entropic contributions, which are difficult to separate in practice. To reflect this, we have now better clarified why we consider that both contributions are involved. We apologize that our previous version of the manuscript was unclear in this respect. Clarifications in the main text:

      “The perturbed thickness and alteration in the lipid dynamics lead to an energetic penalty, which can be related to molecular strain effects, as suggested by the changes of both the internal energy of lipid and their interaction with the surroundings (Fig. S2, S5, S6), which are likely to be of enthalpic origin. However, lipid binding to the OXPHOS complex also results in a reduction in the translational and rotational motion of the lipids and quinone (Fig. S8-S9), which could result in entropic changes. The strain effects are therefore likely to arise from a combination of enthalpic and entropic effects.”

      (4) The authors refer to the computed dielectric constant as epsilon_perpendicular. Did the authors really distinguish the parallel and perpendicular component of the dielectric tensor, as was done by, for example, R. Netz and co-workers for planar surfaces?

      We have extracted the perpendicular dielectric constant from the total dielectric profiles. We clarify that this differs from the formal definition by Netz and co-workers.

      “The calculations were performed by averaging the total M over fixed z values from the membrane plane. Note that this treatment differs from extraction of radial and axial contributions of the dielectric tensor, as developed by Netz and co-workers (cf. Ref. (3) and refs therein) that requires a more elaborate treatment, which is outside the scope of the present work.”

      (3) P. Loche, C. Ayaz, A. Schlaich, Y. Uematsu, R.R. Netz. Giant Axial Dielectric Response in Water-Filled Nanotubes and Effective Electrostatic Ion-Ion Interactions from a Tensorial Dielectric Model. J Phys Chem B 123, 10850-10857 (2019).

      (5) Regarding the effect of SC formation on protein structure and dynamics, especially allosteric effects, most of the discussions are rather qualitative in nature. More quantitative analysis would be valuable. For example, the authors did compute covariance matrix although it appears that they chose not to discuss the results in depth. Is the convergence of concern and therefore no thorough discussion is given?

      We have now expanded the analysis by computing the covariance matrix, further supporting that the SC could involve correlated protein dynamics. We observe a prominent change, especially with respect to the ligand state of Complex I, indicative of some degree of allostery, while we find that the apo state of Complex I leads to a slight uncoupling of the motion between CI and CIII₂.

      Additions in the main text:

      “In this regard, our graph theoretical analysis (Fig. S11) further indicates that ligand binding to Complex I induces a dynamic crosstalk between NDUFA5 and NDUFA10, consistent with previous work (48, 49), and affecting also the motion of UQCRC2 with respect to its surroundings. Taken together, these effects suggest that the dynamics of CI and CIII₂ show some correlation that could result in allosteric effects, as also indicated based on the cryoEM analysis (40).”

      (6) The discussion of quinone diffusion is interesting, although I'm a bit intrigued by the unit of the diffusion constant cited in the discussion. Perhaps a simple typo?

      The plot showed the molecular velocity, which roughly corresponds to the residence times. However, as suggested by the Reviewer, we have now also analyzed the position-dependent diffusion of the quinone in the new Figure S9:

    1. Author Response

      Thank you for your letter and for the reviewers’ comments concerning our manuscript entitled “The cation channel mechanisms of subthreshold inward depolarizing currents in the VTA dopaminergic neurons and their roles in the depression-like behavior”. These comments are constructive and very helpful for improving our manuscript. We have studied the comments carefully and have made provisional revisions, which we hope will meet with approval. We also respond to the reviewers’ comments point by point as follows.

      Reviewer #1 (Public Review):

      Comment 1:

      The pharmacological tools used in this study are highly non-selective. Gd3+, used here to block NALCN, is actually more commonly used to block TRP channels. 2-APB inhibits not only TRPC channels, but also TRPM and IP3 receptors while stimulating TRPV channels (Bon and Beech, 2013), while FFA actually stimulates TRPC6 channels while inhibiting other TRPCs (Foster et al., 2009).

      We agree with the reviewer that the substances mentioned are not specific. In addition to the shRNA experiments against NALCN and TRPC6, we also used more specific pharmacological modulators of these two channels: L703,606 (an antagonist of NALCN)[1] and larixyl acetate (a potent TRPC6 inhibitor)[2]. The results are shown in figure 3E, F and figure 4C, E.

      Comment 2:

      -The multimodal approach including shRNA knockdown experiments alleviates much of the concern about the non-specific pharmacological agents. Therefore, the author's claim that NALCN is involved in VTA dopaminergic neuron pacemaking is well-supported.

      -However, the claim that TRPC6 is the key TRPC channel in VTA spontaneous firing is somewhat, but not completely supported. As with NALCN above, the pharmacology alone is much too non-specific to support the claim that TRPC6 is the TRP channel responsible for pacemaking. However, unlike the NALCN condition, there is an issue with interpreting the shRNA knockdown experiments. The issue is that TRPC channels often form heteromers with TRPC channels of other types (Goel, Sinkins and Schilling, 2002; Strübing et al., 2003). Therefore, it is possible that knocking down TRPC6 is interfering with the normal function of another TRPC channel, such as TRPC7 or TRPC4.

      Our single-cell RNA-seq results show that, unlike TRPC6, TRPC7 and TRPC4 are not broadly present in the VTA DA neurons. In experiments using single-cell PCR (sFig. 9A), only a very small proportion of TRPC6-positive DA cells (DAT+) expressed TRPC4 (sFig. 9Bi) or TRPC7 (sFig. 9Bii), consistent with the single-cell RNA-seq results (Fig.2). Therefore, knocking down TRPC6 may not interfere with the normal function of another TRPC channel, such as TRPC7 or TRPC4.

      Comment 3:

      The claim that TRPC6 channels in the VTA are involved in the depressive-like symptoms of CMUS is supported.

      • However, the connection between the mPFC-projecting VTA neurons, TRPC6 channels, and the chronic unpredictable stress model (CMUS) of depression is not well supported. In Figure 2, it appears that the mPFC-projecting VTA neurons have very low TRPC6 expression compared to VTA neurons projecting to other targets. However, in figure 6, the authors focus on the mPFC-projecting neurons in their CMUS model and show that it is these neurons that are no longer sensitive to pharmacological agents non-specifically blocking TRPC channels (2-APB, see above comment). Finally, in figure 7, the authors show that shRNA knockdown of TRPC6 channels (in all VTA dopaminergic neurons) results in depressive-like symptoms in CMUS mice. Due to the low expression of TRPC6 in mPFC-projecting VTA neurons, the authors' claim of "broad and strong expression of TRPC6 channels across VTA DA neurons" is not fully supported. Because of the messy pharmacological tools used, it cannot be claimed that TRPC6 in the mPFC-projecting VTA neurons is altered after CMUS. And because the knockdown experiments are not specific to mPFC-projecting VTA neurons, it cannot be claimed that reducing TRPC6 in these specific neurons is causing depressive symptoms.

      The reason we focused on the mPFC-projecting VTA DA neurons is that this pathway is implicated in the depressive-like behaviors of the CMUS model[3-5]. Although mPFC-projecting VTA DA neurons seem to have a lower level of TRPC6, we reason that the channels are still functional there. However, we agree with the reviewer that the statement “broad and strong expression of TRPC6 channels across VTA DA neurons” is not fully supported. We have changed the statements based on the reviewer’s suggestion. Furthermore, we selectively knocked down TRPC6 in the mPFC-projecting VTA DA neurons and then studied the behavior (Fig.8).

      Comment 4:

      It is important to note that the experiments presented in Figure 1 have all been previously performed in VTA dopaminergic neurons (Khaliq and Bean, 2010) including showing that low calcium increases VTA neuron spontaneous firing frequency and that replacement of sodium with NMDG hyperpolarizes the membrane potential.

      We agree with the reviewer that similar experiments have been performed previously [6]; we retained them for the flow of our manuscript and for general readers.

      Comment 5:

      -The authors' explanation for the increase in firing frequency in 0 calcium conditions is that calcium-activated potassium channels would no longer be activated. However, there is a highly relevant finding that low calcium enhances the NALCN conductance through the calcium sensing receptor from Dejian Ren's lab (Lu et al., 2010) which is not cited in this paper. This increase in NALCN conductance with low calcium has been shown in SNc dopaminergic neurons (Philippart and Khaliq, 2018), and is likely a factor contributing to the low-calcium-mediated increase in spontaneous VTA neuron firing.

      We agree with the reviewer and are grateful for the suggestion. A discussion of this point has been added.

      Comment 6:

      -One of the only demonstrations of the expression and physiological significance of TRPCs in VTA DA neurons was published by (Rasmus et al., 2011; Klipec et al., 2016) which are not cited in this paper. In their study, TRPC4 expression was detected in a uniformly distributed subset of VTA DA neurons, and TRPC4 KO rats showed decreased VTA DA neuron tonic firing and deficits in cocaine reward and social behaviors.

      We thank the reviewer for the suggestion. The references and a discussion of this point have been added.

      Comment 7:

      • Out of all seven TRPCs, TRPC5 is the only one reported to have basal/constitutive activity in heterologous expression systems (Schaefer et al., 2000; Jeon et al., 2012). Other TRPCs such as TRPC6 are typically activated by Gq-coupled GPCRs. Why would TRPC6 be spontaneously/constitutively active in VTA DA neurons?

      In the complex neuronal environment in which VTA DA neurons are located, multiple modulatory factors, including GPCRs, could be dynamically active; this could lead to the activation of TRP channels, including TRPC6.

      Comment 8:

      A new paper from the group of Myoung Kyu Park (Hahn et al., 2023) shows in great detail the interactions between NALCN and TRPC3 channels in pacemaking of SNc DA neurons.

      The reference mentioned has been added. We thank the reviewer.

      Reviewer #2 (Public Review):

      Comment 1:

      These results do not show that TRPC6 mediates stress effects on depression-like behavior. As stated by the authors in the first sentence of the final paragraph, "downregulation of TRPC6 proteins was correlated with reduced firing activity of the VTA DA neurons, the depression-like behaviors, and that knocking down of TRPC6 in the VTA DA neurons confer the mice with depression behaviors." Therefore, the results show associations between TRPC6 downregulation and stress effects on behavior, occlusion of the effects of one by the other on some outcome measures, and cell manipulation effects that resemble stress effects. There is no experiment that shows reversal of stress effects with cell/circuit-specific TRPC6 manipulations. Please adjust the title, abstract and interpretation accordingly.

      We agree with the reviewer’s suggestion. The title was changed to “The cation channel mechanisms of subthreshold inward depolarizing currents in the VTA dopaminergic neurons and their roles in the chronic stress-induced depression-like behavior”, and the abstract and interpretation were also adjusted accordingly.

      Comment 2:

      Statistical tests and results are unclear throughout. For all analyses, please report specific tests used, factors/groups, test statistic and p-value for all data analyses reported. In some cases, the chosen test is not appropriate. For example, in Figure 6E, it is not clear how an experiment with 2 factors (stress and drug) can be analyzed with a 1-way RM ANOVA. The potential impact of inappropriate statistical tests on results makes it difficult to assess the accuracy of data interpretation.

      We have redone the statistical analysis as suggested by the reviewer and added specific tests used, factors/groups, test statistic and p-value for all data analyses into the figure legends of the revised manuscript.

      Comment 3:

      Why were only male mice used? Please justify and discuss in the manuscript. Also, change the title to reflect this.

      Although most similar previous studies used male mice or rats[7, 8], we agree with the reviewer that female animals should also be tested, considering the possible role of sex hormones. We therefore repeated some key experiments in female mice (sFig. 1, 6, 8 and 13).

      Comment 4:

      Number of recorded cells is very low in Figure 1. Where in VTA did recordings occur? Given the heterogeneity in this brain region, this n may be insufficient. Additional information (e.g., location within VTA, criteria used to identify neurons) should be included. Report the number of mice (i.e., n = 6 cells from X mice) in all figures.

      Indeed, the number here is not high. More experiments were performed to increase the N/n numbers. The location of the recorded cells in the VTA and the number of mice used are now shown in all figures; the criteria used to identify neurons are stated in the Methods section “Identification of DA neurons and electrophysiological recordings”. At the end of the electrophysiological recordings, the recorded VTA neurons were collected for single-cell PCR. VTA DA neurons were identified by single-cell PCR for the presence of TH and DAT.

      Comment 5:

      Authors refer to VTA DA neurons as those that are DAT+ in line 276, although TH expression is considered the standard of DAergic identity, and studies (e.g., Lammel et al, 2008) have shown that a subset of VTA DA neurons have low levels of DAT expression. Authors should reword/clarify that these are DAT-expressing VTA DA neurons.

      The study published by Lammel et al. in 2015[9] showed the low dopamine specificity of transgene expression in the ventral midbrain of TH-Cre mice; on the other hand, DAT-Cre mice exhibit dopamine-specific Cre expression patterns, although DAT-Cre mice are likely to suffer from their own limitations (for example, low DAT expression in mesocortical DA neurons may make it difficult to target this subpopulation; see Lammel et al., 2008[10]). Hence, in our study, DAT was used as the criterion to identify DA neurons. In addition, both TH and DAT were tested by single-cell PCR to confirm that the recorded cells were DA neurons.

      Comment 6:

      Neuronal subtype proportions should be quantified and reported (Fig. 1Aii).

      Neuronal subtype proportions are now quantified and reported in Fig. 1Aii.

      Comment 7:

      In addition to reporting projection specificity of neurons expressing specific channels, it would be ideal to report these data according to spatial location in VTA.

      The spatial locations of the recorded cells in the VTA are now shown in all figures.

      Comment 8:

      The authors state that there are a small number of Glut neurons in VTA, then they state that a "significant proportion" of VTA neurons are glutamatergic.

      Thanks, “a significant proportion of neurons” has been changed to “less than half of sequenced DA neurons”.

      Comment 9:

      It is an overstatement that VTA DA neurons are the key determinant of abnormal behaviors in affective disorders.

      Thanks, we have amended the statement to “Dopaminergic (DA) neurons in the ventral tegmental area (VTA) play an important role in mood, reward and emotion-related behaviors”.

      Reviewer #3 (Public Review):

      Comment 1:

      The authors of this study have examined which cation channels specifically confer to ventral tegmental area dopaminergic neurons their autonomic (spontaneous) firing properties. Having brought evidence for the key role played by NALCN and TRPC6 channels therein, the authors aimed at measuring whether these channels play some role in so-called depression-like (but see below) behaviors triggered by chronic exposure to different stressors. Following evidence for a down-regulation of TRPC6 protein expression in ventral tegmental area dopaminergic cells of stressed animals, the authors provide evidence through viral expression protocols for a causal link between such a down-regulation and so-called depression-like behaviors. The main strength of this study lies on a comprehensive bottom-up approach ranging from patch-clamp recordings to behavioral tasks. However, the interpretation of the results gathered from these behavioral tasks might also be considered one main weakness of the abovementioned approach. Thus, the authors make a confusion (widely observed in numerous publications) with regard to the use of paradigms (forced swim test, tail suspension test) initially aimed (and hence validated) at detecting the antidepressant effects of drugs and which by no means provide clues on "depression" in their subjects. Indeed, in their hands, the authors report that stress elicits changes in these tests which are opposed to those theoretically seen after antidepressant medication. However, these results do not imply that these changes reflect "depression" but rather that the individuals under scrutiny simply show different responses from those seen in nonstressed animals. These limits are even more valid in nonstressed animals injected with TRPC6 shRNAs (how can 5-min tests be compared to a complex and chronic pathological state such as depression?). 
      With regard to anxiety, as investigated with the elevated plus-maze and the open field, the data, as reported, do not allow the authors' interpretation to be checked, as anxiety indices are either not correctly provided (e.g. absolute open arm data instead of percents of open arm visits without mention of closed arm behaviors) or subjected to possible biases (lack of distinction between central and peripheral components of the apparatus).

      We agree with the reviewer that it is debatable whether the behavior tests we used here represent a real depression state; this is an open question that could be discussed from different perspectives. Since these tests (forced swimming and tail suspension) are, as the reviewer noted, widely used in numerous publications, we used these seemingly only available options to reflect a “depression-like” state. One could argue that, since these tests were initially used (“validated”) for testing antidepressants, with decreased immobility time as an indication of antidepressive effects, an increased immobility time could likewise reflect a “depression-like” state. As for the anxiety tests, the data concerning the elevated plus-maze have also been changed based on the reviewer’s suggestion.

      Recommendations for the authors: please note that you control which revisions, if any, to undertake

      Reviewer #1 (Recommendations For The Authors):

      Recommendation 1 for improving the paper:

      -The paper needs extensive editing for both overall structural clarity and for the high number of typos and grammatical errors.

      We thank the reviewer for the suggestion. The revised manuscript has been edited extensively.

      Recommendation 2 for improving the paper:

      -Retrobeads are often toxic to cells and build up with increasing time. It is surprising that the authors wait 14-21 days for retrobead expression in their target cells. It is also a problem that the mPFC projecting cells have a longer time with the retrobeads than the other projection-targeting cells because the toxicity could be more extensive with the longer wait time thus confounding the results. The authors should repeat some mPFC experiments at the 14 day time point to confirm that the longer time with the beads is not influencing the differential effects in these cells.

      According to the methods published by Stephan Lammel and Jochen Roeper, “For sufficient labeling, survival periods for retrograde tracer transport depended on respective injection areas: DS and NAc lateral shell, 7 days; NAc core, NAc medial shell, and BLA, 14 days; and mPFC, 21 days”[10]. We therefore performed the experiments on mPFC-projecting cells at the 21-day time point. Consistent with this, labeling of mPFC-projecting cells at the 14-day time point was insufficient compared with that at the 21-day time point, as shown below.

      Author response image 1.

      Confocal images showing the anatomical distribution of mPFC-projecting DA neurons labelled with retrobeads (red) in the VTA after DAT immunofluorescence (green) staining at different time points (A, 14 d; B, 21 d) after retrobead injection. Scale bars = 10 μm.

      Recommendation 3 for improving the paper:

      -The experiment with FFA in Figure 4E seems weird. Why is there no baseline before the FFA application? And why is the baseline trending downward immediately? The authors should explain why this example experiment is presented differently from all the others.

      We apologize that this example time course is not typical. Since FFA is not a specific antagonist of TRPC6 and actually stimulates TRPC6 channels, we repeated the experiments with a more specific pharmacological modulator of TRPC6, larixyl acetate (LA); the results are shown in Figure 4C and 4F.

      Recommendation 4 for improving the paper:

      -It would be much more useful to see exact p values in the text, as it aids in interpreting the 'insignificance' of specific comparisons. Specifically, in Figure 5F, the 2-APB looks like it is having a small effect, and the already low firing rate (due to the TRPC6 knockdown) makes a big effect less likely. It would be useful to know what the actual p value is here (and everywhere).

      OK. We now report all P values in the figure legends of the revised version.

      Recommendation 5 for improving the paper:

      -In the results, it should be explained that the "RMP" of VTA DA neurons was obtained by treating the cells with TTX.

      A sentence indicating the presence of TTX when measuring “RMP” has been added to the Results section of the revised version.

      Recommendation 6 for improving the paper:

      -The spacing of the panels in the figures is somewhat odd. The figures could be more compact.

      Thanks, we have re-arranged all figures.

      Recommendation 7 for improving the paper:

      The paper is difficult to read because of significant grammatical errors. Here are some examples by line number, but this list is not at all exhaustive.

      We thank the reviewer for pointing out the grammatical errors, and we have corrected them.

      Reviewer #2 (Recommendations For The Authors):

      Recommendation 1 for improving the paper:

      Fix typos: e.g., change HCH to HCN, change EMP to EPM, "these finding", "compact par" should read "pars compacta", "substantial" in line 475 should read "substantia", Incomplete sentences on line 73 and line 107, etc. Also, what is meant by "autonomic" firing activity? What is meant by "expression files"? Change "depression behaviors" to depression-like behaviors. "The HCN" as written in line 69 is a bit misleading, as HCN channels in the heart and brain are different members of a family of channels, although as written in the text, it seems that they are identical. In Figure 2, rearrange order of brain regions (e.g., from "BLA-VTA" to "VTA-BLA"), because as written, it seems that the focus is on projections into the VTA from each brain region, rather than VTA neurons that project to each respective region.

      We thank the reviewer for pointing out these errors, and we have corrected them. “Autonomic firing activity” has been changed to “spontaneous firing activity”, and “expression files” to “expression levels”. All instances of “depression behavior” have been changed to “depression-like behaviors”. In Figure 2, all “xx-VTA” labels have been changed to “VTA-xx”.

      Reviewer #3 (Recommendations For The Authors):

      Recommendation 1 for improving the paper:

      Methodology: as opposed to sFig. 8 where the order through which mice were repeatedly tested is precise, such a key information is lacking in Fig. 6 as well as in the Methods section (for example, when such traumatic stress as forced swimming is performed with regard to the other tests?). Relevant to this point is the possible bias triggered by such chronological testing as exposure to the forced swim test likely affects the behaviors recorded in the other tests. Furthermore, the way this test is conducted is appealing as it is mentioned that the water depth was set to 10 cms which is quite low given that immobility scores might be affected by the ability of mice to stand on their tails.

      With regard to the elevated plus-maze, data are erroneously provided. Absolute values regarding open arm behaviors should be provided as percentages of the number of visits (or time spent therein) over the total (open + closed) number of arm visits. Indeed, closed arm visits should also be provided. This variable, also considered an index of locomotor activity, would allow the reader to exclude any effect of locomotion on the exploration in the open field.
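
      The normalization requested here can be written out as a short calculation. The following is a minimal sketch, not the authors' analysis: the function name, argument names, and the example counts are hypothetical, and it assumes open and closed arm entries and times have been scored separately for each animal.

```python
def open_arm_percentages(open_entries, closed_entries, open_time_s, closed_time_s):
    """Express open-arm behavior as a percentage of total arm activity.

    Closed-arm entries are returned as well, since they are commonly
    read as an index of locomotor activity.
    """
    total_entries = open_entries + closed_entries
    total_arm_time_s = open_time_s + closed_time_s
    # Guard against animals that never entered any arm.
    pct_entries = 100.0 * open_entries / total_entries if total_entries else float("nan")
    pct_time = 100.0 * open_time_s / total_arm_time_s if total_arm_time_s else float("nan")
    return {
        "pct_open_entries": pct_entries,
        "pct_open_time": pct_time,
        "closed_entries": closed_entries,
    }


# Hypothetical animal: 4 open and 12 closed entries, 60 s open and 180 s closed.
print(open_arm_percentages(4, 12, 60.0, 180.0))
```

      Reporting the percentages together with closed-arm entries lets a reader separate a change in open-arm avoidance from a general change in locomotion.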

      As they stand, data in the open field seem to indicate parallel changes at the center (center time) and the periphery (total distance), hence suggesting locomotor effects rather than anxiogenic effects. Data related to the center and the periphery should be clearly distinguished. Lastly, the number of weeks allowed for the mice to recover from surgeries aimed at delivering viruses is not mentioned. This is important as it could have affected the amplitude of the sensitivity to the stressors.

      We thank the reviewer for the suggestion. The information missing from Figure 6 and the Methods has now been supplied. We apologize for the incorrect value of “10 cm” in the forced swimming test; this has been corrected. The data concerning the elevated plus-maze have also been changed based on the reviewer’s suggestion. To address a possible locomotor effect, we tested the mice on the rotarod test. The results show no difference in locomotor activity between control and depressed-like mice (sFig.10G, sFig.12I and sFig.13G). We modified the experimental procedure timeline in Figure 6, and in the Methods section “AAV for gene knockdown or overexpression and viral construct and injection” we added “Mice were singly housed with enough food and water to recover for 4-5 weeks after injection of virus, before behavior tests and electrophysiological recordings.” to report the number of weeks allowed for the mice to recover from the surgeries used to deliver the virus.

      Recommendation 2 for improving the paper:

      Results/conclusions: as yet mentioned, the authors make a confusion in the interpretation of their tail suspension tests and forced swimming tests. I acknowledge that such a confusion is frequent but it is important to note that the tests used by the authors were INITIALLY aimed at detecting the antidepressant effects of drugs under investigation. However, it is not because a test reveals such antidepressant properties that they also provide indices of depression. The authors will surely agree that it is unlikely that a 5-min test provides a model of a chronic pathology accounted for by a complex intrication between genetics and environmental factors. I would propose the authors to read for example Molendijk and De Kloet (Eur J Neurosci 2022). I think that the authors should just neutrally mention their results without any interpretation related to depression. On the other hand, what could have been interesting is to test whether the so-called "depressive-like" responses recorded in the study were sensitive to chronic antidepressant treatments. This would have allowed the authors to further suggest some relevance (if any) with depression-like pathologies.

      As we discussed above, we again agree with the reviewer’s concern. However, if as stated by the reviewer that “However, it is not because a test reveals such antidepressant properties that they also provide indices of depression”, then the experiments suggested by the reviewer “….. to test whether the so-called "depressive-like" responses recorded in the study were sensitive to chronic antidepressant treatments”

      Recommendation 3 for improving the paper:

      A close examination of the responses to CMUS or chronic restraint suggests that indeed two populations of animals were detected, possibly sensitive and resilient to these stressors. Did the authors try to examine this possibility?

      Based on the results of the behavior tests in CMUS and CRS, the animals might be divided into two populations: highly sensitive and moderately sensitive ones.

      Recommendation 4 for improving the paper:

      There are some text changes that need to be performed:

      Page 2 line 46: ref 4 uses a social stress model which brings no clearcut evidence for it being a "depression" model. Indeed, this model can also be suggested to be a model of chronic anxiety (Kalueff et al., Science 2006; Chaouloff, Cell tissue Res 2013), hence indicating that VTA dopaminergic neurons might also be involved in anxiety.

      page 11, line 329: the references supporting the hypothesis that VTA DA neurons are linked to depression cannot be found in the reference list (10-15 do not correspond to the appropriate references).

      page 11, line 341: reference 47 does not fit with the authors' assertion as it did not include any behavior.

      Fig. S8: body weight data are likely provided as changes rather than absolute values (e.g. 8 g)

      We agree with the reviewer’s comments. In line 46, “……such as depression states” has been changed to “such as depression- or anxiety-related states”. We have also corrected the references in lines 329 and 341. Finally, the body weight data are now presented as the change in body weight.

      References:

      1. Um, K.B., et al., TRPC3 and NALCN channels drive pacemaking in substantia nigra dopaminergic neurons. Elife, 2021. 10.

      2. Urban, N., et al., Identification and Validation of Larixyl Acetate as a Potent TRPC6 Inhibitor. Mol Pharmacol, 2016. 89(1): p. 197-213.

      3. Zhong, P., et al., HCN2 channels in the ventral tegmental area regulate behavioral responses to chronic stress. Elife, 2018. 7.

      4. Liu, D., et al., Brain-derived neurotrophic factor-mediated projection-specific regulation of depressive-like and nociceptive behaviors in the mesolimbic reward circuitry. Pain, 2018. 159(1): p. 175.

      5. Walsh, J.J. and M.H. Han, The Heterogeneity of Ventral Tegmental Area Neurons: Projection Functions in a Mood-Related Context. Neuroscience, 2014. 282: p. 101-108.

      6. Khaliq, Z.M. and B.P. Bean, Pacemaking in dopaminergic ventral tegmental area neurons: depolarizing drive from background and voltage-dependent sodium conductances. J Neurosci, 2010. 30(21): p. 7401-13.

      7. Li, L., et al., Selective targeting of M-type potassium K(v) 7.4 channels demonstrates their key role in the regulation of dopaminergic neuronal excitability and depression-like behaviour. Br J Pharmacol, 2017. 174(23): p. 4277-4294.

      8. Friedman, A.K., et al., Enhancing depression mechanisms in midbrain dopamine neurons achieves homeostatic resilience. Science, 2014. 344(6181): p. 313-9.

      9. Lammel, S., et al., Diversity of transgenic mouse models for selective targeting of midbrain dopamine neurons. Neuron, 2015. 85(2): p. 429-38.

      10. Lammel, S., et al., Unique properties of mesoprefrontal neurons within a dual mesocorticolimbic dopamine system. Neuron, 2008. 57(5): p. 760-73.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:  

      Reviewer #1 (Public Review): 

      Summary: 

      In their manuscript entitled 'The domesticated transposon protein L1TD1 associates with its ancestor L1 ORF1p to promote LINE-1 retrotransposition', Kavaklıoğlu and colleagues delve into the role of L1TD1, an RNA binding protein (RBP) derived from a LINE1 transposon. L1TD1 proves crucial for maintaining pluripotency in embryonic stem cells and is linked to cancer progression in germ cell tumors, yet its precise molecular function remains elusive. Here, the authors uncover an intriguing interaction between L1TD1 and its ancestral LINE-1 retrotransposon. 

      The authors delete the DNA methyltransferase DNMT1 in a haploid human cell line (HAP1), inducing widespread DNA hypo-methylation. This hypomethylation prompts abnormal expression of L1TD1. To scrutinize L1TD1's function in a DNMT1 knock-out setting, the authors create DNMT1/L1TD1 double knock-out cell lines (DKO). Curiously, while the loss of global DNA methylation doesn't impede proliferation, additional depletion of L1TD1 leads to DNA damage and apoptosis.  

      To unravel the molecular mechanism underpinning L1TD1's protective role in the absence of DNA methylation, the authors dissect L1TD1 complexes in terms of protein and RNA composition. They unveil an association with the LINE-1 transposon protein L1-ORF1 and LINE-1 transcripts, among others.  

      Surprisingly, the authors note fewer LINE-1 retro-transposition events in DKO cells than in DNMT1 KO alone.  

      Strengths: 

      The authors present compelling data suggesting the interplay of a transposon-derived human RNA binding protein with its ancestral transposable element. Their findings spur interesting questions for cancer types, where LINE1 and L1TD1 are aberrantly expressed.  

      Weaknesses: 

      Suggestions for refinement:  

      The initial experiment, inducing global hypo-methylation by eliminating DNMT1 in HAP1 cells, is intriguing and warrants a more detailed description. How many genes experience misregulation or aberrant expression? What phenotypic changes occur in these cells? 

      This is an excellent suggestion. We have gene expression data on WT versus DNMT1 KO HAP1 cells and have included them now as Suppl. Figure S1. The  transcriptome analysis of DNMT1 KO cells showed hundreds of deregulated genes upon DNMT1 ablation. As expected, the majority were up-regulated and gene ontology analysis revealed that among the strongest up-regulated genes were gene clusters with functions in “regulation of transcription from RNA polymerase II promoter” and “cell differentiation” and genes encoding proteins with KRAB domains. In addition, the de novo methyltransferases DNMT3A and DNMT3B were up-regulated in DNMT1 KO cells suggesting the set-up of compensatory mechanisms in these cells. 

      Why did the authors focus on L1TD1? Providing some of this data would be helpful to understand the rationale behind the thorough analysis of L1TD1. 

      We have previously discovered that conditional deletion of the maintenance DNA methyltransferase DNMT1 in the murine epidermis results not only in the up-regulation of mobile elements, such as IAPs, but also in the induced expression of L1TD1 ([1], Suppl. Table 1 and Author response image 1). Similarly, L1TD1 expression was induced by treatment of primary human keratinocytes or squamous cell carcinoma cells with the DNMT inhibitor aza-deoxycytidine (Author response images 2 and 3). These findings are in accordance with the observation that inhibition of DNA methyltransferase activity by aza-deoxycytidine in human non-small cell lung cancer cells (NSCLCs) results in up-regulation of L1TD1 [2]. Our interest in L1TD1 was further fueled by reports on a potential function of L1TD1 as a prognostic tumor marker. We have included this information in the last paragraph of the Introduction in the revised manuscript.

      Author response image 1. RT-qPCR of L1TD1 expression in cultured murine control and Dnmt1 Δ/Δker keratinocytes. mRNA levels of L1td1 were analyzed in keratinocytes isolated at P5 from conditional Dnmt1 knockout mice [1]. Hprt expression was used for normalization of mRNA levels and wildtype control was set to 1. Data represent means ±s.d. with n=4. **P < 0.01 (paired t-test). 

      Author response image 2. RT-qPCR analysis of L1TD1 expression in primary human keratinocytes. Cells were treated with 5-aza-2'-deoxycytidine for 24 hours or 48 hours, with PBS for 48 hours or were left untreated. 18S rRNA expression was used for normalization of mRNA levels and PBS control was set to 1. Data represent means ±s.d. with n=3. **P < 0.01 (paired t-test).

      Author response image 3. Induced L1TD1 expression upon DNMT inhibition in squamous cell carcinoma cell lines SCC9 and SCCO12. Cells were treated with 5-aza-2'-deoxycytidine for 24 hours, 48 hours or 6 days. (A) Western blot analysis of L1TD1 protein levels using beta-actin as loading control. (B) Indirect immunofluorescence microscopy analysis of L1TD1 expression in SCC9 cells. Nuclear DNA was stained with DAPI. Scale bar: 10 µm. (C) RT-qPCR analysis of L1TD1 expression in primary human keratinocytes. Cells were treated with 5-aza-2'-deoxycytidine for 24 hours or 48 hours, with PBS for 48 hours or were left untreated. 18S rRNA expression was used for normalization of mRNA levels and PBS control was set to 1. Data represent means ±s.d. with n=3. *P < 0.05, **P < 0.01 (paired t-test).

      The finding that L1TD1/DNMT1 DKO cells exhibit increased apoptosis and DNA damage but decreased L1 retro-transposition is unexpected. Considering the DNA damage associated with retro-transposition and the DNA damage and apoptosis observed in L1TD1/DNMT1 DKO cells, one would anticipate the opposite outcome. Could it be that the observation of fewer transposition-positive colonies stems from the demise of the most transposition-positive colonies? Further exploration of this phenomenon would be intriguing. 

      This is an important point and we were aware of this potential problem. Therefore, we calibrated the retrotransposition assay by transfection with a blasticidin resistance gene vector to take into account potential differences in cell viability and blasticidin sensitivity. Thus, the observed reduction in L1 retrotransposition efficiency is not an indirect effect of reduced cell viability. We have added a corresponding clarification in the Results section on page 8, last paragraph. 
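The calibration described in this response can be illustrated with a minimal sketch. It assumes that calibration means dividing the colony count obtained with the L1 reporter by the colony count obtained with the blasticidin resistance control vector in the same cell line; the function names and all colony counts below are hypothetical and are not taken from the manuscript.

```python
def calibrated_efficiency(reporter_colonies: int, control_colonies: int) -> float:
    """Reporter colonies per control-vector colony in one cell line.

    Assumption: the control vector captures differences in viability,
    transfection efficiency and blasticidin sensitivity between lines.
    """
    if control_colonies <= 0:
        raise ValueError("control vector must yield at least one colony")
    return reporter_colonies / control_colonies


def relative_efficiency(sample: float, reference: float) -> float:
    """Calibrated efficiency of a sample line relative to a reference line."""
    return sample / reference


# Hypothetical counts, for illustration only.
ko = calibrated_efficiency(reporter_colonies=120, control_colonies=400)
dko = calibrated_efficiency(reporter_colonies=30, control_colonies=200)

raw_ratio = 30 / 120                          # ignores the viability difference
calibrated_ratio = relative_efficiency(dko, ko)
print(raw_ratio, round(calibrated_ratio, 2))
```

In this made-up case the uncalibrated comparison would suggest a four-fold drop, while the calibrated comparison attributes half of that drop to reduced colony formation in general and only the remaining two-fold to retrotransposition.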

      Based on previous studies with hESCs and germ cell tumors [3], it is likely that, in addition to its role in retrotransposition, L1TD1 has further functions in the regulation of cell proliferation and differentiation. L1TD1 might therefore attenuate the effect of DNMT1 loss in KO cells, generating an intermediate phenotype (as pointed out by Reviewer 2), and simultaneous loss of both L1TD1 and DNMT1 results in more pronounced effects on cell viability. This is in agreement with the observation that a subset of L1TD1-associated transcripts encode proteins involved in the control of cell division and cell cycle. It is possible that subtle changes in the expression of these proteins that were not detected in our mass spectrometry approach contribute to the antiproliferative effect of L1TD1 depletion, as discussed in the Discussion section of the revised manuscript. 

      Reviewer #2 (Public Review):           

      In this study, Kavaklıoğlu et al. investigated and presented evidence for the role of domesticated transposon protein L1TD1 in enabling its ancestral relative, L1 ORF1p, to retrotranspose in HAP1 human tumor cells. The authors provided insight into the molecular function of L1TD1 and shed some clarifying light on previous studies that showed somewhat contradictory outcomes surrounding L1TD1 expression. Here, L1TD1 expression was correlated with L1 activation in a hypomethylation-dependent manner, due to DNMT1 deletion in the HAP1 cell line. The authors then identified L1TD1-associated RNAs using RIP-Seq, which displays a disconnect between transcript and protein abundance (via Tandem Mass Tag multiplex mass spectrometry analysis). The one exception was for L1TD1 itself, which is consistent with a model in which the RNA transcripts associated with L1TD1 are not directly regulated at the translation level. Instead, the authors found the L1TD1 protein associated with L1-RNPs, and this interaction is associated with increased L1 retrotransposition, at least in the context of HAP1 cells. Overall, these results support a model in which L1TD1 is restrained by DNA methylation, but in the absence of this repressive mark, L1TD1 is expressed and collaborates with L1 ORF1p (either directly or through interaction with L1 RNA, which remains unclear based on current results), leading to enhanced L1 retrotransposition. These results establish the feasibility of this relationship existing in vivo in either development, disease, or both.   

      Recommendations for the authors:

      Reviewer #2 (Recommendations For The Authors):        

      Major 

      (1) The study only used one knockout (KO) cell line generated by CRISPR/Cas9. Considering the possibility of an off-target effect, I suggest the authors attempt one or both of these suggestions. 

      A) Generate or acquire a similar DNMT1 deletion that uses distinct sgRNAs, so that the likelihood of off-targets is negligible. A few simple experiments such as qRT-PCR would be sufficient to suggest the same phenotype.  

      B) Confirm the DNMT1 depletion also by siRNA/ASO KD to phenocopy the KO effect.

      (2) In addition to the strategies to demonstrate reproducibility, a rescue experiment restoring DNMT1 to the KO or KD cells would be more convincing. (Partial rescue would suffice in this case, as exact endogenous expression levels may be hard to replicate.)

      We have undertaken several approaches to study the effect of DNMT1 loss or inactivation. As described above, we have generated a conditional KO mouse with ablation of DNMT1 in the epidermis. DNMT1-deficient keratinocytes isolated from these mice show a significant increase in L1TD1 expression. In addition, treatment of primary human keratinocytes and two squamous cell carcinoma cell lines with the DNMT inhibitor aza-deoxycytidine led to upregulation of L1TD1 expression. Thus, the derepression of L1TD1 upon loss of DNMT1 expression or activity is not a clonal effect. Also, the spectrum of RNAs identified in RIP experiments as L1TD1-associated transcripts in HAP1 DNMT1 KO cells showed a strong overlap with the RNAs isolated by a related yet different method in human embryonic stem cells. When it comes to the effect of L1TD1 on L1 retrotransposition, a recent study has reported a similar effect of L1TD1 upon overexpression in HeLa cells [4].  

      All of these points together help to convince us that our findings with HAP1 DNMT1 KO cells are in agreement with results obtained in various other cell systems and are therefore not due to off-target effects. With that in mind, we would pursue the suggestion of Reviewer 1 to analyze the effects of DNA hypomethylation upon DNMT1 ablation.

      (3) As stated in the introduction, L1TD1 and ORF1p share "sequence resemblance" (Martin 2006). Is the L1TD1 antibody specific or do we see L1 ORF1p if Fig 1C were uncropped?

      (6) Is it possible the L1TD1 antibody binds L1 ORF1p? This could make Figure 2D somewhat difficult to interpret. Some validation of the specificity of the L1TD1 antibody would remove this concern (see minor concern below).  

      This is a relevant question. We are convinced that the L1TD1 antibody does not cross-react with L1 ORF1p for the following reasons: Firstly, the antibody does not recognize L1 ORF1p (40 kDa) in the uncropped Western blot for Figure 1C (Author response image 4A). Secondly, the L1TD1 antibody gives only background signals in DKO cells in the indirect immunofluorescence experiment shown in Figure 1E of the manuscript. 

      Thirdly, the immunogen sequence of L1TD1 that determines the specificity of the antibody was checked in the antibody data sheet from Sigma Aldrich. The corresponding epitope is not present in the L1 ORF1p sequence. Finally, we have shown that the ORF1p antibody does not cross-react with L1TD1 (Author response image 4B).

      Author response image 4. (A) Uncropped L1TD1 Western blot shown in Figure 1C. An unspecific band is indicated by an asterisk. (B) Western blot analysis of WT, KO and DKO cells with L1 ORF1p antibody.

      (4) In the abstract (P2), the authors mentioned that L1TD1 works as an RNA chaperone, but in the result section (P13), they showed that L1TD1 associates with L1 ORF1p in an RNA-independent manner. Those conclusions appear contradictory. Clarification or revision is required. 

      Our findings that both proteins bind L1 RNA, and that L1TD1 interacts with ORF1p are compatible with a scenario where L1TD1/ORF1p heteromultimers bind to L1 RNA. The additional presence of L1TD1 might thereby enhance the RNA chaperone function of ORF1p. This model is visualized now in Suppl. Figure S7C. 

      (5) Figure 2C fold enrichment for L1TD1 and ARMC1 is a bit difficult to fully appreciate. A 100 to 200-fold enrichment does not seem physiological. This appears to be a "divide by zero" type of result, as the CT for these genes was likely near 40 or undetectable. Another qRT-PCR-based approach (absolute quantification) would be a more revealing experiment. 

      This is the validation of the RIP experiments, and the presentation mode was specifically developed for quantification of RIP assays (Sigma Aldrich RIP-qRT-PCR: Data Analysis Calculation Shell). The unspecific binding of the transcript in the absence of L1TD1 in DNMT1/L1TD1 DKO cells is set to 1, and the value in KO cells represents the specific binding relative to the unspecific binding. The calculation also corrects for potential differences in the abundance of the respective transcript in the two cell lines. This is not a physiological value but the quantification of specific binding of transcripts to L1TD1. GAPDH as negative control shows no enrichment, whereas specifically associated transcripts show strong enrichment. We have explained the details of the RIP-qRT-PCR evaluation in Materials and Methods (page 14) and the legend of Figure 2C in the revised manuscript.       
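The normalization described in this response can be sketched as follows. This is a minimal illustration that assumes the commonly used ΔΔCt scheme for RIP-qRT-PCR: each IP Ct is normalized to its own input (adjusted for the fraction of lysate saved as input), and the DKO IP serves as the unspecific-binding reference set to 1. The function names, the 10% input fraction and all Ct values are hypothetical and do not come from the manuscript or the vendor's calculation shell.

```python
import math


def normalized_delta_ct(ct_ip: float, ct_input: float, input_fraction: float) -> float:
    """Ct of the IP relative to the input, with the input scaled to 100%."""
    # An input that is 10% of the lysate appears log2(10) cycles later than
    # the full lysate would, so that offset is subtracted from the input Ct.
    adjusted_input_ct = ct_input - math.log2(1.0 / input_fraction)
    return ct_ip - adjusted_input_ct


def fold_enrichment(ko_ip: float, ko_input: float,
                    dko_ip: float, dko_input: float,
                    input_fraction: float = 0.1) -> float:
    """Specific binding in KO cells relative to unspecific binding in DKO cells."""
    d_ko = normalized_delta_ct(ko_ip, ko_input, input_fraction)
    d_dko = normalized_delta_ct(dko_ip, dko_input, input_fraction)
    # Normalizing each IP to its own input corrects for differences in
    # transcript abundance between the two cell lines.
    return 2.0 ** (-(d_ko - d_dko))


# Hypothetical Ct values: a bound transcript and a GAPDH-like negative control.
print(round(fold_enrichment(25.0, 24.0, 32.0, 24.0), 1))  # 128.0
print(round(fold_enrichment(30.0, 20.0, 30.0, 20.0), 1))  # 1.0
```

Under this scheme a seven-cycle difference between the KO and DKO IPs already yields a 128-fold value, which is why enrichments in the hundreds reflect binding specificity rather than a physiological concentration.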

      (6) Is it possible the L1TD1 antibody binds L1 ORF1p? This could make Figure 2D somewhat difficult to interpret. Some validation of the specificity of the L1TD1 antibody would remove this concern (see minor concern below).            

      See response to (3).  

      (7) Figure S4A and S4B: There appear to be a few unusual aspects of these figures that should be pointed out and addressed. First, there doesn't seem to be any ORF1p in the Input (if there is, the exposure is too low). Second, there might be some L1TD1 in the DKO (lane 2) and lane 3. This could be non-specific, but the size is concerning. Overexposure would help see this.

      The ORF1p IP gives rise to strong ORF1p signals in the immunoprecipitated complexes even after short exposure. Under these conditions ORF1p is hardly detectable in the input. Regarding the faint band in DKO HAP1 cells, this might be due to a technical problem during Western blot loading. Therefore, the input samples were loaded again on a Western blot and analyzed for the presence of ORF1p, L1TD1 and beta-actin (as loading control); the result is shown as a separate panel in Suppl. Figure S4A. 

      (8) Figure S4C: This is related to our previous concerns involving antibody cross-reactivity. Figure 3E partially addresses this, where it looks like the L1TD1 "speckles" outnumber the ORF1p puncta, but overlap with all of them. This might be consistent with the antibody cross-reacting. The western blot (Figure 3C) suggests an upregulation of ORF1p by at least 2-3x in the DKO, but the IF image in 3E is hard to tell if this is the case (slightly more signal, but fewer foci). Can you return to the images and confirm the contrast is comparable? Can you massively overexpose the red channel in 3E to see if there is residual overlap? 

      In Figure 3E the L1TD1 antibody gives no signal in DNMT1/L1TD1 DKO cells, confirming that it does not recognize ORF1p. In agreement with the Western blot in Figure 3C, the L1 ORF1p signal in Figure 3E is stronger in DKO cells. In DNMT1 KO cells the L1 ORF1p antibody does not recognize all L1TD1 speckles. This result is in agreement with the Western blot shown above in Author response image 4B and indicates that the L1 ORF1p antibody does not recognize the L1TD1 protein. The contrast is comparable and after overexposure there are still L1TD1-specific speckles. This might be due to differences in abundance of the two proteins.

      (9) The choice of ARMC1 and YY2 is unclear. What are the criteria for the selection?

      ARMC1 was one of the top hits in a pilot RIP-seq experiment (IP versus input and IP versus  IgG IP). In the actual RIP-seq experiment with DKO HAP1 cells instead of IgG IP as a negative control, we found ARMC1 as an enriched hit, although it was not among the top 5 hits. The results from the 2nd RIP-seq further confirmed the validity of ARMC1 as an L1TD1-interacting transcript. YY2 was of potential biological relevance as an L1TD1 target due to the fact that it is a processed pseudogene originating from YY1 mRNA as a result of retrotransposition. This is mentioned on page 6 of the revised manuscript.

      (10) (P16) L1 is the only protein-coding transposon that is active in humans. This is perhaps too generalized of a statement as written. Other examples are readily found in the literature. Please clarify.  

      We will tone down this statement in the revised manuscript. 

      (11) In both the abstract and last sentence in the discussion section (P17), embryogenesis is mentioned, but this is not addressed at all in the manuscript. Please refrain from implying normal biological functions based on the results of this study unless appropriate samples are used to support them.

      Much of the published data on L1TD1 function are related to embryonic stem cells [3-7]. Therefore, it is important to discuss our findings in the context of previous reports.

      (12) Figure 3E: The format of Figures 1A and 3E are internally inconsistent. Please present similar data/images in a cohesive way throughout the manuscript.  

      We show now consistent IF Figures in the revised manuscript.

      Minor: 

      (1) Intro:           

      - Is L1TD1 in mice and humans? How "conserved" is it and does this suggest function?  

      Murine and human L1TD1 proteins share 44% identity on the amino acid level and it was suggested that the corresponding genes were under positive selection during evolution with functions in transposon control and maintenance of pluripotency [8].  

      - Why HAP1? (Haploid?) The importance of this cell line is not clear.          

      HAP1 is a nearly haploid human cancer cell line derived from the KBM-7 chronic myelogenous leukemia (CML) cell line [9, 10]. Due to its haploidy, it is perfectly suited and widely used for loss-of-function screens and gene editing. After gene editing, cells can be used in the nearly haploid or in the diploid state. We usually perform all experiments with diploid HAP1 cell lines. Importantly, in contrast to other human tumor cell lines, this cell line tolerates ablation of DNMT1. We have included a corresponding explanation in the revised manuscript on page 5, first paragraph.

      - Global methylation status in DNMT1 KO? (Methylations near L1 insertions, for example?) 

      The HAP1 DNMT1 KO cell line with a 20 bp deletion in exon 4 used in our study was validated in the study by Smits et al. [11]. The authors report a significant reduction in overall DNA methylation. However, we are not aware of a DNA methylome study on this cell line. We show now data on the methylation of L1 elements in HAP1 cells and upon DNMT1 deletion in the revised manuscript in Suppl. Figure S1B.

      (2) Figure 1:  

      - Figure 1C. Why is LMNB used instead of Actin (Fig1D)?  

      We show now beta-actin as loading control in the revised manuscript.  

      - Figure 1G shows increased Caspase 3 in KO, while the matching sentence in the result section skips over this. It might be more accurate to mention this and suggest that the single KO has perhaps an intermediate phenotype (Figure 1F shows a slight but not significant trend). 

      We fully agree with the reviewer and have changed the sentence on page 6, 2nd paragraph accordingly.  

      - Would 96 hrs trend closer to significance? An interpretation is that L1TD1 loss could speed up this negative consequence. 

      We thank the reviewer for the suggestion. We have performed a time course experiment with 6 biological replicates for each time point up to 96 hours and found significant changes in viability upon loss of DNMT1 and a further significant reduction in viability upon additional loss of L1TD1 (shown in Figure 1F). These data suggest that, as expected, loss of DNMT1 leads to a significant reduction in viability and that additional ablation of L1TD1 further enhances this effect.

      - What are the "stringent conditions" used to remove non-specific binders and artifacts (negative control subtraction?) 

      Yes, we considered only hits from both analyses, L1TD1 IP in KO versus input and L1TD1 IP in KO versus L1TD1 IP in DKO. This is now explained in more detail in the revised manuscript on page 6, 3rd paragraph.  

      (3) Figure 2:  

      - Figure 2A is a bit too small to read when printed. 

      We have changed this in the revised manuscript.

      - Since WT and DKO lack detectable L1TD1, would you expect any difference in RIP-Seq results between these two?

      Due to the lack of DNMT1 and the resulting DNA hypomethylation, DKO cells are more similar to KO cells than WT cells with respect to the expressed transcripts.

      - Legend says selected dots are in green (it appears blue to me). 

      We have changed this in the revised manuscript.           

      - Would you recover L1 ORF1p and its binding partners in the KO? (Is the antibody specific in the absence of L1TD1 or can it recognize L1?) I noticed an increase in ORF1p in the KO in Figure 3C.  

      Thank you for the suggestion. Yes, L1 ORF1p shows slightly increased expression in the proteome analysis and we have marked the corresponding dot in the Volcano plot (Figure 3A).

      - Should the figure panel reference near the (Rosspopoff & Trono) reference instead be Sup S1C as well? Otherwise, I don't think S1C is mentioned at all. 

      - What are the red vs. green dots in 2D? Can you highlight ERV and ALU with different colors? 

      We added the reference to Suppl. Figure S1C (now S3C) in the revised manuscript. In Figure 2D L1 elements are highlighted in green, ERV elements in yellow, and other associated transposon transcripts in red.     

      - Which L1 subfamily from Figure 2D is represented in the qRT-PCR in 2E "LINE-1"? Do the primers match a specific L1 subfamily? If so, which? 

      We used primers specific for the human L1.2 subfamily. 

      - Pulling down SINE element transcripts makes some sense, as many insertions "borrow" L1 sequences for non-autonomous retro transposition, but can you speculate as to why ERVs are recovered? There should be essentially no overlap in sequence. 

      In the L1TD1 evolution paper [8], a potential link between L1TD1 and ERV elements was discussed: 

      "Alternatively, L1TD1 in sigmodonts could play a role in genome defense against another element active in these genomes. Indeed, the sigmodontine rodents have a highly active family of ERVs, the mysTR elements [46]. Expansion of this family preceded the death of L1s, but these elements are very active, with 3500 to 7000 species-specific insertions in the L1-extinct species examined [47]. This recent ERV amplification in Sigmodontinae contrasts with the megabats (where L1TD1 has been lost in many species); there are apparently no highly active DNA or RNA elements in megabats [48]. If L1TD1 can suppress retroelements other than L1s, this could explain why the gene is retained in sigmodontine rodents but not in megabats." 

      Furthermore, Jin et al. report the binding of L1TD1 to repetitive sequences in transcripts [12]. It is possible that some of these sequences are also present in ERV RNAs.

      - Is S2B a screenshot? (the red underline). 

      No, it is a Powerpoint figure, and we have removed the red underline.

      (4) Figure 3: 

      - Text refers to Figure 3B as a western blot. Figure 3B shows a volcano plot. This is likely 3C but would still be out of order (3A>3C>3B referencing). I think this error is repeated in the last result section. 

      - Figure and legends fail to mention what gene was used for ddCT method (actin, gapdh, etc.). 

      - In general, the supplemental legends feel underwritten and could benefit from additional explanations. (Main figures are appropriate but please double-check that all statistical tests have been mentioned correctly).

      Thank you for pointing this out. We have corrected these errors in the revised manuscript.

      (5) Discussion: 

      - AluY connection is interesting. Is there an "Alu retrotransposition reporter assay" to test whether L1TD1 enhances this as well? 

      Thank you for the suggestion. There is indeed an Alu retrotransposition reporter assay reported by Dewannieux et al. [13]. The assay is based on a Neo selection marker. We have previously tested a Neo selection-based L1 retrotransposition reporter assay, but this system failed to work properly in HAP1 cells; therefore, we switched to a blasticidin-based L1 retrotransposition reporter assay. A corresponding blasticidin-based Alu retrotransposition reporter assay might be interesting for future studies (mentioned in the Discussion, page 11, paragraph 4 of the revised manuscript).

      (6) Materials and Methods:

      - The number of typos in the materials and methods is too numerous to list. Instead, please refer to the next section that broadly describes the issues seen throughout the manuscript. 

      Writing style  

      (1) Keep a consistent style throughout the manuscript: for example, L1 or LINE-1 (also L1 ORF1p or LINE-1 ORF1p); per or "/"; knockout or knock-out; min or minute; 3 times or three times; media or medium. Additionally, as TE naming conventions are not uniform, it is important to maintain internal consistency so as to not accidentally establish an imprecise version. 

      (2) There's a period between "et al" and the comma, and "et al." should be italic. 

      (3) The authors should explain what the key jargon is when it is first used in the manuscript, such as "retrotransposon" and "retrotransposition".    

      (4) The authors should show the full spelling of some acronyms when they use it for the first time, such as RNA Immunoprecipitation (RIP).  

      (5) Use a space between numbers and alphabets, such as 5 µg.  

      (6) 2.0 × 10⁵ cells, that's not an "x".  

      (7) Numbers in the reference section are lacking (hard to parse).  

      (8) In general, there are a significant number of typos in this draft which at times becomes distracting. For example, (P3) Introduction: Yet, co-option of TEs thorough (not thorough, it should be through) evolution has created so-called domesticated genes beneficial to the gene network in a wide range of organisms. Please carefully revise the entire manuscript for these minor issues that collectively erode the quality of this submission.  

      Thank you for pointing out these mistakes. We have corrected them in the revised manuscript. A native speaker from our research group has carefully checked the paper. In summary, we have added Supplementary Figure S7C and have changed Figures 1C, 1E, 1F, 2A, 2D, 3A, 4B, S3A-D, S4B and S6A based on these comments. 

      REFERENCES

      (1) Beck, M.A., et al., DNA hypomethylation leads to cGAS-induced autoinflammation in the epidermis. EMBO J, 2021. 40(22): p. e108234.

      (2) Altenberger, C., et al., SPAG6 and L1TD1 are transcriptionally regulated by DNA methylation in non-small cell lung cancers. Mol Cancer, 2017. 16(1): p. 1.

      (3) Narva, E., et al., RNA-binding protein L1TD1 interacts with LIN28 via RNA and is required for human embryonic stem cell self-renewal and cancer cell proliferation. Stem Cells, 2012. 30(3): p. 452-60.

      (4) Jin, S.W., et al., Dissolution of ribonucleoprotein condensates by the embryonic stem cell protein L1TD1. Nucleic Acids Res, 2024. 52(6): p. 3310-3326.

      (5) Emani, M.R., et al., The L1TD1 protein interactome reveals the importance of posttranscriptional regulation in human pluripotency. Stem Cell Reports, 2015. 4(3): p. 519-28.

      (6) Santos, M.C., et al., Embryonic Stem Cell-Related Protein L1TD1 Is Required for Cell Viability, Neurosphere Formation, and Chemoresistance in Medulloblastoma. Stem Cells Dev, 2015. 24(22): p. 2700-8.

      (7) Wong, R.C., et al., L1TD1 is a marker for undifferentiated human embryonic stem cells. PLoS One, 2011. 6(4): p. e19355.

      (8) McLaughlin, R.N., Jr., et al., Positive selection and multiple losses of the LINE-1-derived L1TD1 gene in mammals suggest a dual role in genome defense and pluripotency. PLoS Genet, 2014. 10(9): p. e1004531.

      (9) Andersson, B.S., et al., Ph-positive chronic myeloid leukemia with near-haploid conversion in vivo and establishment of a continuously growing cell line with similar cytogenetic pattern. Cancer Genet Cytogenet, 1987. 24(2): p. 335-43.

      (10) Carette, J.E., et al., Ebola virus entry requires the cholesterol transporter Niemann-Pick C1. Nature, 2011. 477(7364): p. 340-3.

      (11) Smits, A.H., et al., Biological plasticity rescues target activity in CRISPR knock outs. Nat Methods, 2019. 16(11): p. 1087-1093.

      (12) Jin, S.W., et al., Dissolution of ribonucleoprotein condensates by the embryonic stem cell protein L1TD1. Nucleic Acids Res, 2024. 52(6): p. 3310-3326.

      (13) Dewannieux, M., C. Esnault, and T. Heidmann, LINE-mediated retrotransposition of marked Alu sequences. Nat Genet, 2003. 35(1): p. 41-8.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The study by Deng et al. reports single-cell expression analysis of developing mouse hearts and examines the requirements for cardiac fibroblasts in heart maturation. Much of this work overlaps with previous studies, but the single-cell gene expression data may be useful to investigators in the field. The significance and scope of new findings are limited and major conclusions are largely based on correlative data.

      Strengths:

      The strengths of the manuscript are the new single-cell datasets and comprehensive approach to ablating cardiac fibroblasts in pre and postnatal development in mice.

      Weaknesses:

      There are several major weaknesses in the analysis and interpretation of the results.

      (1) The major conclusions regarding collagen signaling and heart maturation are based on gene expression patterns and are not functionally validated. The potential downstream signaling pathways were not examined and known structural contributions of fibrillar collagen to heart maturation are not discussed.

      We thank the reviewer for the comment. In this study, we mainly focused on the functional analysis of fibroblasts in heart development at embryonic and neonatal stages, using a cell ablation system and single-cell mRNA sequencing analysis. Further functional analysis of the collagen pathway is interesting but outside the scope of this study. We will continue this line of research and share the results in the future. Moreover, through the analysis of single-cell mRNA sequencing data, we have predicted the downstream genes that are regulated by the collagen pathway (Fig. 5C). We have also added sentences to highlight the structural role of collagen in the related heart developmental processes.

      (2) The heterogeneity of fibroblast populations and contributions to multiple structures in the developing heart are not well-considered in the analysis. The developmental targeting of fibroblasts will likely affect multiple structures in the embryonic heart and other organs. Lethality is described in some of these studies, but additional analysis is needed to determine the effects on heart morphogenesis or other organs beyond the focus on cardiomyocyte maturation being reported. In particular, the endocardial cushions and developing valves are likely to be affected in the prenatal ablations, but these structures are not included in the analyses.

      We thank the reviewer for the comment. We have included a new figure presenting the fibroblast heterogeneity in developing hearts (Fig S3). We have also compared the valve structural differences at E18.5 (Fig S11).

      (3) ECM complexity and extensive previous work on specific ECM proteins in heart development and maturation are not incorporated into the current study. Different types of collagen (basement membrane Col4, filamentous Col6, and fibrillar Col1) are known to be expressed in fibroblast populations in the developing heart and have been studied extensively. Much also has been reported for other ECM components mentioned in the current work.

      We thank the reviewer for the comment. We agree that the ECM is complex, and the functions of many of its components have been previously reported, as mentioned in the introduction. In this study, our focus is to analyze the spatial and temporal expression patterns of various ECM genes in fibroblasts throughout developmental progression (Fig. S5–7). To further acknowledge previous work, we have added additional sentences and cited relevant literature on the role of collagen genes in developing hearts (page 4).

      Reviewer #2 (Public review):

      This study aims to elucidate the role of fibroblasts in regulating myocardium and vascular development through signaling to cardiomyocytes and endothelial cells. This focus is significant, given that fibroblasts, cardiomyocytes, and vascular endothelial cells are the three primary cell types in the heart. The authors employed a Pdgfra-CreER-controlled diphtheria toxin A (DTA) system to ablate fibroblasts at various embryonic and postnatal stages, characterizing the resulting cardiac defects, particularly in myocardium and vasculature development. scRNA-seq analysis of the ablated hearts identified collagen as a crucial signaling molecule from fibroblasts that influences the development of cardiomyocytes and vascular endothelial cells. This is an interesting manuscript; however, there are several major issues, including an over-reliance on the scRNA-seq data, which shows inconsistencies between replicates. Some of the major issues are described below.

      The comments are the same as the comments for “Recommendations for the authors”. Please see the responses below.

      Reviewer #3 (Public review):

      The authors investigated fibroblasts' communication with key cell types in developing and neonatal hearts, with a focus on the critical roles of fibroblast-cardiomyocyte and fibroblast-endothelial cell networks in cardiac morphogenesis. They tried to map the spatial distribution of these cell types and reported the major pathways and signaling molecules driving the communication. They also used Cre-DTA system to ablate Pdgfra labeled cells and observed myocardial and endothelial cell defects at development. They screened the pathways and genes using sequencing data of ablated hearts. Lastly, they reported compensatory collagen expression in long-term ablated neonate hearts. Overall, this study provides us with important insight into fibroblasts' roles in cardiac development and will be a powerful resource for collagens and ECM-focused research.

      Strengths:

      The authors utilized good analyzing tools to investigate multiple databases of single-cell sequencing and Multiseq. They identified significant pathways and cellular and molecular interactions of fibroblasts. Additionally, they compared some of their analytic findings with a human database, and identified several groups of ECM genes with varying roles in mice.

      Weaknesses:

      This study is majorly based on sequencing data analysis. At the bench, they used a very strident technique to study fibroblast functions by ablating one of the major cell populations of the heart. Considering the importance of the fibroblast population, intriguing in vivo findings were expected. Also, they analyzed the downstream genes in ablated hearts, but did not execute any experimental validation for any of the targets.

      Recommendations for the authors:

      Reviewing Editor Comments:

      All three reviewers found the large amount of scRNA-Seq data compelling and valuable, and they noted that the study's conclusions based on the scRNA Seq and fibroblast ablating align closely with previously published studies. Therefore, a more thorough discussion and integration of the current findings with prior studies are recommended. Each reviewer provided specific feedback to improve the manuscript, correct errors, and strengthen the overall presentation, and please edit the manuscript accordingly. Additionally, further validation of the scRNA-Seq data through more data analysis, reference comparisons, or additional experiments is encouraged.

      Reviewer #1 (Recommendations for the authors):

      (1) The heterogeneity of fibroblasts and ECM components in the developing heart needs to be considered in the analysis and description of results. There are extensive reports in both of these areas that would inform the gene expression and ablation studies being reported.

      We thank the reviewer for the comment. We have added a supplemental figure (Fig. S3) analyzing the heterogeneity of fibroblasts during development and described the results on pages 3 and 4. Through the analysis of single-cell mRNA sequencing data, we identified four distinct populations of fibroblasts and further performed RNAscope to examine their spatial locations. Additionally, we agree with the reviewer that there are many types of ECM components, which we have addressed in the introduction (page 2). Furthermore, we have conducted a detailed analysis of the spatial and temporal expression patterns of ECM genes throughout developmental progression (Figs. S5–7).

      (2) One of the novel aspects of the work is the prenatal ablation of cardiac fibroblasts. Embryonic lethality was observed in some cases, but the specific cardiac structural anomalies or potential vascular effects were not described. The contributing role of cardiac fibroblasts to valvuloseptal development, which was likely affected in these studies, was not described.

      We thank the reviewer for the comment. Since the heart sections were not initially prepared to compare valve differences between control and ablation conditions, most sections do not include valve structures. However, in the small subset of sections that do contain valves, we have compared valve structures in control and ablated hearts at E18.5 following three doses of tamoxifen treatment from E15.5 to E17.5. In mutants, the valves appear shorter than in controls. Specifically, we observed that in control hearts, the mitral valve was already connected to the papillary muscle, whereas in ablated hearts, the valve leaflet at a similar position was not. We have included these images as a new supplemental figure (Fig. S11). Regarding vascular defects, we have described them in Fig. 3C and 3F.

      (3) The major conclusions regarding collagen signaling and heart development are based on correlations in gene expression and are not validated by functional data. What are the downstream signaling pathways affected and are they affected during development or with ablation? The main conclusions of the study do not take into account well-known structural functions of collagen in the developing heart.

      We thank the reviewer for the comment. Through regulatory prediction analysis, we identified the collagen ligands Col1a1, Col5a1, and Col4a1 from the collagen family (Fig. 5C), which regulate multiple genes in cardiomyocytes, including Masp1. Masp1 is a member of the lectin complement pathway and potentially regulates cardiomyocyte migration during development. These collagen ligands also regulate multiple mitochondria-related genes, such as Etfa, Ndufb10, Ndufs6, and Slc25a4, which are potentially important for cardiomyocyte development and maturation. Moreover, we agree with the reviewer that collagen is an important structural ECM protein, and its deletion or reduction could cause heart developmental defects due to its structural role. We have added a discussion on this possibility (page 8).

      (4) The postnatal ablation studies are very similar to studies with the same mouse lines reported by Kurabara et al 2022 in JMCC (PMID 35569524) which came to similar conclusions and was not cited in the current work.

      We thank the reviewer for the comment and apologize for overlooking this study. We have now included the citation on page 8.

      (5) The discussion of a regenerative response with DTA ablation of fibroblasts is confusing. Proliferation was examined in cardiomyocytes which lose their regenerative capacity after birth in mice. However, cardiac fibroblasts can proliferate in response to injury throughout life which is not really a regenerative process.

      We appreciate the reviewer’s comment. To avoid confusion, we have replaced the term "regeneration" with "response to cell loss" and "compensation."

      (6) Some of the descriptions of single-cell expression data are overstated (Page 7). Regulatory interactions, signaling pathway activation, or function cannot be determined from gene expression data alone.

      We thank the reviewer for the comment. We agree that these conclusions rely on results from multiple assays. We have weakened the description of the analysis by emphasizing that the findings are predictive results from scRNA-seq analysis.

      (7) In the last paragraph of the discussion "data not shown" should be shown or this information should be deleted. As written, the discussion does not present a clear description of what major new findings are being reported or why they are significant. The new insights into heart development are not specified.

      We thank the reviewer for the comment. We have added the data as a supplemental figure (Fig. S19). Since this paragraph is part of the discussion, we believe the results are not conclusive at this stage and require further research to explore the potential protective role of fibroblast ablation in neonatal hearts.

      Minor comments.

      (1) Figure legends are missing information needed to understand what is being shown. For example, in Figure 2, collagen is visualized using CHP staining.

      Thanks. We have gone through all figure legends to ensure that all necessary information has been provided.

      (2) The hearts in Figure S15 are upside down.

      Thanks. We have updated the figure.

      (3) In Figure S16A, "brian" should be "brain".

      Thanks. We have updated it.

      Reviewer #2 (Recommendations for the authors):

      This is an interesting manuscript; however, there are several major issues, including an overreliance on the scRNA-seq data, which shows inconsistencies between replicates. Some of the major issues are described below.

      (1) The CD31 immunostaining data (Figures 3B-G) indicate a reduction in endothelial cell numbers following fibroblast deletion using PdgfraCreER+/-; RosaDTA+/- mice. However, the scRNA-seq data show no percentage change in the endothelial cell population (Figure 4D). Furthermore, while the percentage of Vas_ECs decreased in ablated samples at E16.5, the results at E18.5 were inconsistent, showing an increase in one replicate and a decrease in another, raising concerns about the reliability of the RNA-seq findings.

      We thank the reviewer for the comment. We believe that measuring cell proportions in scRNA-seq results is sensitive and relies on a high number of total and target cells, similar to other cell counting assays such as FACS. As the reviewer pointed out, the proportions of Vas_EC in E18.5 replicates are inconsistent. Specifically, Col_4 at E18.5 showed a relatively low proportion of Vas_EC. Upon examining the cell numbers in each sample, we found that Col_4 had the lowest number of recovered cells, with approximately 760 in total, whereas the other samples had more than 920 cells each. Additionally, since immunofluorescence staining for CD31 marks both Vas_EC and Endo_EC, we combined these two cell types to increase the number of targeted cells. This analysis consistently showed that the ablated samples had lower proportions. However, given that the quantifications have also produced inconsistent results for other cell types, such as Ven_CM, as mentioned in the reviewer’s next question, we have decided to delete this plot to avoid confusion.

      Author response image 1.
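      The sensitivity of cell-proportion estimates to the number of recovered cells, as argued in the response above, can be illustrated with a minimal sketch of binomial sampling error. The cell totals (760 and 920) are the figures quoted in the response; the 10% cell-type proportion is a hypothetical value chosen for illustration, not a number reported in the study, and the simple binomial model is an assumption that ignores dissociation and capture biases.

```python
import math


def proportion_standard_error(p: float, n_cells: int) -> float:
    """Binomial standard error of an observed proportion p among n_cells."""
    return math.sqrt(p * (1.0 - p) / n_cells)


# Hypothetical cell-type proportion (NOT a value reported in the study).
assumed_proportion = 0.10

# 760 and 920 are the total recovered cell counts quoted in the response.
for n_cells in (760, 920):
    se = proportion_standard_error(assumed_proportion, n_cells)
    # An approximate 95% interval is the estimate plus or minus 1.96 * SE.
    low = assumed_proportion - 1.96 * se
    high = assumed_proportion + 1.96 * se
    print(f"n={n_cells}: SE={se:.4f}, approx. 95% interval=({low:.3f}, {high:.3f})")
```

      Under these assumptions, a sample with fewer recovered cells carries a wider interval around the same underlying proportion, so modest differences between replicates can arise from sampling alone.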

      (2) Similarly, while the percentage of Ven_CMs increased at E18.5, it exhibited differing trends at E16.5 (Figure 4E), further highlighting the inconsistency of the scRNA-seq analysis with the other data.

      We thank the reviewer for the comment. Please see the response above.

      (3) Furthermore, the authors noted that the ablated samples had slightly higher percentages of cardiomyocytes in the G1 phase compared to controls (Figures 4H, S11D), which aligns with the enrichment of pathways related to heart development, sarcomere organization, heart tube morphogenesis, and cell proliferation. However, it is unclear how this correlates with heart development, given that the hearts of ablated mice are significantly smaller than those of controls (Figure 3E). Additionally, the heart sections from ablated samples used for CD31/DAPI staining in Figure 3F appear much larger than those of the controls, raising further inconsistencies in the manuscript.

      We thank the reviewer for the comment. We observed changes in G1-phase cardiomyocytes at both E16.5 and E18.5, with pathway enrichment primarily identified in E16.5 cardiomyocytes. At E16.5, the ablated hearts exhibited myocardial defects, including an increased trabecular-to-compact myocardium ratio and reduced vascular density. By E18.5, the ablated embryos had smaller hearts with reduced vascular density, although the trabecular-to-compact myocardium ratio showed no obvious changes. Regarding the larger section size in the ablated hearts compared to the control hearts, two factors contributed to this discrepancy. First, the control and ablated heart sections had different scale bars: the image of the ablated heart was enlarged relative to the control section. Second, heart sections vary in size depending on their position. Sections taken from the middle of the heart are larger than those from the edges. In our initial comparison, we used an edge-positioned section from the control hearts and a middle-positioned section from the ablated hearts. To avoid confusion, we have now updated the control section to match the position of the ablated section more closely and used scale bars of the same size in the two images (Fig 3F).

      (4) The manuscript relies heavily on the scRNA-seq dataset, which shows inconsistencies between the two replicates. Furthermore, the morphological and histological analyses do not align with the scRNA-seq findings.

      We respectfully disagree with this comment from the reviewer. As shown in Figure 4B, the scRNA-seq data from the two replicates are highly consistent. For inconsistencies in cell proportions and tissue section sizes, please refer to our responses above.

      (5) There is a lack of mechanistic insight into how collagen, as a key signaling molecule from fibroblasts, affects the development of cardiomyocytes and vascular endothelial cells.

      We thank the reviewer for the comment. In this study, we primarily focused on analyzing fibroblast function in heart development using cell ablation and single-cell mRNA sequencing. While further mechanistic analysis of the collagen pathway is intriguing, it falls outside the scope of this study. Additionally, our scRNA-seq analysis identified multiple collagen ligands derived from fibroblasts that may regulate gene expression in Ven_CM and influence their development, as shown in Figure 5C. Although validating these predictions would be valuable, it is beyond the scope of this study. We will continue this line of research and share our findings in the future.

      (6) In Figure 1B, Col1a1 expression is observed in the epicardial cells (Figure 1A, E11.5), but this is not represented in the accompanying cartoon.

      We thank the reviewer for the comment. As stated in the main text (page 3), based on scRNA-seq and IF staining results, we observed that Col1a1 is also expressed in epicardial cells. In the cartoon, we depicted the pattern of fibroblasts rather than Col1a1-positive cells, which is why we did not include epicardial cells.

      (7) What is the genotype of the control animals used in the study?

      We thank the reviewer for the comment. We have added the genotype information for the control embryos in the legends of the relevant figures.

      (8) Do the PdgfraCreER+/-; RosaDTA+/- mice survive after birth when induced at E15.5, and do they exhibit any cardiac defects?

      We thank the reviewer for the comment. This is an interesting question; however, we did not perform the experiment because administering tamoxifen to pregnant mice from E15.5 to E18.5 causes delivery complications, as reported in the literature (PMID: 23139287). Unfortunately, this prevents us from exploring this question further.

      Reviewer #3 (Recommendations for the authors):

      Overall, this is a comprehensive study substantiated by the evidence the authors provided in their findings. However, I have a few concerns to be addressed.

      (1) The claim by the authors that "at E17.5 and P3, each FB was in contact with approximately one Vas_EC and four CMs at both stages" is not fully convincing. RNA scope images for Actn2 are not clear enough to lead the quantification (RNA scope images for Cdh5 look better). I suggest performing imaging at higher magnification and the Z stack technique to provide a better understanding of their localization. Also, no changes in FBs adjacent cell numbers (CM&EC) with ages (P3) compared to E17.5? Any thoughts on the explanation?

      We thank the reviewer for the comment. We imaged the staining results using a confocal microscope at 20X magnification. We also considered imaging them at 40X; however, due to the large areas that need to be imaged in these sections, it was challenging to do so. Additionally, we identified each CM based on Actn2 and DAPI staining information and are confident in the accuracy of our quantification results. Moreover, since each FB interacts with multiple CMs and Vas_ECs in three dimensions, but our calculations are based only on 2D imaging sections, there may be discrepancies compared to a true 3D environment. We have added a sentence to address this limitation (page 9). Regarding the similar number of interactions observed at E17.5 and P3, we think there are two possibilities. First, the three cell types may proliferate in a synchronized manner, maintaining a consistent number of interactions. Second, these cell types may exhibit minimal proliferation during late embryonic and early neonatal stages. Instead, heart growth primarily occurs through CM hypertrophy, which does not significantly alter the number of interactions.

      (2) Fix the Capitalized font of RNA markers in Figure S2.

      Thanks. We have updated them.

      (3) I appreciate the visualization of ligand-receptor interactions in collagen network comparison between FB to CM and FB to EC, and predictive analysis on the FB ligands that regulate differentially expressed genes in ablated heart CM and ECs.

      We appreciate the reviewer for the comment.

      (4) The authors depleted Pdgfra-Cre cells at E10.5, and reported 100% DTA+ lethality after 3 days. Induction at E13.5 to ablate Pdgfra-Cre cells resulted in survival at least up to E16.5 age. What could be the possible reasons authors think that lead to embryo lethality when induced at E10.5? Did the authors analyze the expression of Pdgfra at E10.5 to E13.5 using Pdgfra antibody or Pdgfra-Cre labeling, or using the ScRNA seq data?

      We thank the reviewer for the comment. The expression pattern of Pdgfra at E10.5 has been previously reported (PMID: 18297729) and shown to be highly expressed in the atrioventricular region, consistent with the Col1a1 expression pattern we profiled in this study. Therefore, we believe the embryonic lethality observed in the ablated embryos at E10.5 was likely due to the disruption of the atrioventricular structure. However, since Pdgfra is also expressed in other tissues at this stage, we cannot rule out the possibility that the ablation of non-cardiac tissues also contributed to the lethality.

      (5) In terms of the findings on the trabeculation and compaction defects, please provide the images of the ventricles with markers to indicate the compact and trabecular zones and their defects.

      Thanks! We have included images that illustrate the quantification of compact and trabecular myocardium thickness in control and ablated hearts (Fig. S10C).

      (6) Did the author check the expression of any other marker for the vascular system in addition to CD31 to see the effects of ablated FB on coronary vasculature development?

      We thank the reviewer for the comment. We analyzed only Cd31 to assess the effects of fibroblast ablation on the overall endothelial cell population. We did not separately examine the subpopulations, but this would be an interesting direction for future studies.

      (7) Can the authors interpret how findings from PHH3 proliferation explain thinner compact and thicker trabeculae in ablated hearts?

      We thank the reviewer for the comment and apologize for the misinterpretation of the results. We observed that the ablated hearts have a thinner compact myocardium, while the thickness of the trabecular myocardium remains unchanged, leading to an increased trabecular-to-compact myocardium ratio (Fig 3D). We have corrected the description in the manuscript accordingly. Moreover, since the compact myocardium has a higher proliferation rate than the trabecular myocardium, a reduction in overall cell proliferation is expected to have a more pronounced impact on the compact myocardium. Inhibition of compact myocardium proliferation has been reported to lead to a thinner compact myocardium and non-compaction defects (PMID: 31342111).

      (8) The authors did not execute experiments to find the downstream target that causes compaction defects and endothelial cell density defects upon ablation of FBs. Can you project from your sequencing analysis what could be the potential downstream if you could execute bench-side experiments on this?

      We appreciate the reviewer for the comment. We believe that the regulatory predictive results in Figures 5C and D from the scRNA-seq data analysis have provided a set of downstream candidates for validation. We could select some of the ligands, such as the collagen ligands Col1a1, Col4a1, and Col5a1, to treat the ablated embryos in vivo to assess whether they could partially rescue the myocardium defects. Additionally, we could conduct ex vivo experiments by co-culturing CM and FB, comparing them with CM alone and CM treated with the identified ligands. This would allow us to evaluate CM proliferation and the expression of downstream genes identified in the prediction results. However, as the reviewer suggested, these experiments are planned for future studies.

      (9) Please provide the echocardiographic M mode images with a comparable number of cardiac cycles in control and ablated (Fig. 6H). Also, the heart rate of the ablated heart is too low to compare other parameters with the control. If you could stabilize the heart rate at comparable values to control the heart, it is possible that EF and FS values will be largely changed.

      We thank the reviewer for the comment. As the echocardiographic analysis was performed on conscious mice, the lower heart rates in the ablated mice are a phenotype associated with the ablation. Unfortunately, we are unable to adjust them to match those of the control mice.

      (10) Can you provide a numerical dataset for any one of the cell chat figures? Like in figure 2A, supporting the claim "However, in terms of interaction strength, FB exhibited the highest values compared to those of other cell types (Fig. 2A)".

      Yes, we have added a supplemental table (Table S2) containing the numerical interaction weights. As shown in the table, the interactions between FB and other cell types have the highest values.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      Chen and Phillips describe the dynamic appearance of cytoplasmic granules during embryogenesis analogous to SIMR germ granules, and distinct from CSR-1-containing granules, in the C. elegans germline. They show that the nuclear Argonaute NRDE-3, when mutated to abrogate small RNA binding, or in specific genetic mutants, partially colocalizes to these granules along with other RNAi factors, such as SIMR-1, ENRI-2, RDE-3, and RRF-1. Furthermore, NRDE-3 RIP-seq analysis in early vs. late embryos is used to conclude that NRDE-3 binds CSR-1-dependent 22G RNAs in early embryos and ERGO-1-dependent 22G RNAs in late embryos. These data lead to their model that NRDE-3 undergoes small RNA substrate "switching" that occurs in these embryonic SIMR granules and functions to silence two distinct sets of target transcripts - maternal, CSR-1 targeted mRNAs in early embryos and duplicated genes and repeat elements in late embryos.

      Strengths:

      The identification and function of small RNA-related granules during embryogenesis is a poorly understood area and this study will provide the impetus for future studies on the identification and potential functional compartmentalization of small RNA pathways and machinery during embryogenesis.

      Weaknesses:

      (1) While the authors acknowledge the following issue, their finding that loss of SIMR granules has no apparent impact on NRDE-3 small RNA loading puts the functional relevance of these structures into question. As they note in their Discussion, it is entirely possible that these embryonic granules may be "incidental condensates." It would be very welcomed if the authors could include some evidence that these SIMR granules have some function; for example, does the loss of these SIMR granules have an effect on CSR-1 targets in early embryos and ERGO-1-dependent targets in late embryos?

      We appreciate reviewer 1’s concern that we do not provide enough evidence for the function of the SIMR granules. As suggested, we examined the NRDE-3-bound small RNAs more deeply, and we do observe a slight but significant increase in CSR-class 22G-RNA binding to NRDE-3 in late embryos of simr-1 and enri-2 mutants (see below, right). We hypothesize that this result could be due to a slower switch from CSR to ERGO 22G-RNAs in the absence of SIMR granules. We added these data to Figure 6G.

      (2) The analysis of small RNA class "switching" requires some clarification. The authors re-define ERGO-1-dependent targets in this study to arrive at a very limited set of genes and their justification for doing this is not convincing. What happens if the published set of ERGO-1 targets is used?

      As we mentioned in the manuscript, we initially attempted to use the previously defined ERGO targets. However, the major concern is that fewer than half of the genes classified as ERGO targets by Manage et al. and Fischer et al. overlap with one another (Figure 6—figure supplement 1D and below). We reason this might be because the gene sets were defined as genes that lose small RNAs in various ERGO pathway mutants and because different criteria were used to define the lists, as discussed in the manuscript (lines 471-476). As a result, some of the previously defined ERGO target genes may actually be indirect targets of the pathway. Here we focus on genes targeted by small RNAs enriched in an ERGO pathway Argonaute IP, which should be more specific.

      In this manuscript, we are interested specifically in the ERGO targets bound by NRDE-3; thus, we used the IP-small RNA sequencing data from young adult animals (Seroussi et al, 2023) to define a new ERGO list. We are confident in this list because 1) most of our new ERGO genes fall within the overlap between the ERGO-Manage and ERGO-Fischer lists (see Figure 6—figure supplement 1D in our manuscript and below), and 2) we observed the most significant decrease in small RNA levels and increase in mRNA levels in the nrde-3 mutants using our newly defined list (see Figure 6—figure supplement 1E-F in our manuscript).

      To further address reviewer 1’s concern about whether the data would look significantly different when using the ERGO-Manage and ERGO-Fischer lists, we made new scatter plots, shown in Author response image 1 panels A-C below (ERGO-Manage, purple; ERGO-Fischer, yellow; overlap, yellow with purple ring). We found that the small RNA switching pattern of NRDE-3 is consistent with that observed using our newly defined list, particularly when we look at the overlap of the ERGO-Manage and ERGO-Fischer lists (Author response image 1 panels D-F below, red).

      Author response image 1.

      Further, the NRDE-3 RIP-seq data is used to conclude that NRDE-3 predominantly binds CSR-1 class 22G RNAs in early embryos, while ERGO-1-dependent 22G RNAs are enriched in late embryos. a) The relative ratios of each class of small RNAs are given in terms of unique targets. What is the total abundance of sequenced reads of each class in the NRDE-3 IPs? 

      To address the reviewer’s question about the total abundance of sequenced reads of each class in the NRDE-3 IPs: Author response image 2 panels A-B below show the total RPM of CSR- and ERGO-class sRNAs in inputs and IPs at different stages. Focusing on late embryos, the total abundance of ERGO-dependent sRNAs is similar to that of CSR-class sRNAs in input, but much higher in IP, indicating an enrichment of ERGO-dependent 22G-RNAs in NRDE-3 consistent with our log2FC (IP vs input) in Figure 6B. These data support our conclusion that NRDE-3 preferentially binds to ERGO targets in late embryos.

      Author response image 2.
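      The two quantities compared in the response above, total RPM per small RNA class and log2 fold change of IP over input, can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function names, the pseudocount of 1, and all read counts are hypothetical and chosen only to show how similar input abundance with higher IP abundance yields a positive log2FC.

```python
import math


def reads_per_million(class_reads: int, total_mapped_reads: int) -> float:
    """Normalize the reads assigned to one small RNA class to reads per million."""
    return class_reads * 1_000_000 / total_mapped_reads


def log2_fold_change(ip_rpm: float, input_rpm: float, pseudocount: float = 1.0) -> float:
    """log2(IP / input); the pseudocount (an assumption) avoids division by zero."""
    return math.log2((ip_rpm + pseudocount) / (input_rpm + pseudocount))


# Hypothetical read counts (NOT data from the study).
total_input_reads = 10_000_000
total_ip_reads = 8_000_000
counts = {
    "ERGO": {"input": 20_000, "ip": 400_000},
    "CSR": {"input": 22_000, "ip": 40_000},
}

for rna_class, reads in counts.items():
    input_rpm = reads_per_million(reads["input"], total_input_reads)
    ip_rpm = reads_per_million(reads["ip"], total_ip_reads)
    lfc = log2_fold_change(ip_rpm, input_rpm)
    print(f"{rna_class}: input RPM={input_rpm:.0f}, IP RPM={ip_rpm:.0f}, log2FC={lfc:.2f}")
```

      In this toy example both classes have similar input RPM, so the class with the larger IP RPM is the one with the larger log2FC, which is the pattern the response describes for ERGO-dependent 22G-RNAs in late embryos.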

      b) The "switching" model is problematic given that even in late embryos, the majority of 22G RNAs bound by NRDE-3 is the CSR-1 class (Figure 5D). 

      It is important to keep in mind the difference in the total number of CSR target genes (3834) and ERGO target genes (119). The pie charts in Figure 6D show the proportion of genes enriched in the NRDE-3 IP that are CSR or ERGO targets. For the NRDE-3 IP in late embryos, 70/119 (58.8%) of ERGO targets are enriched, whereas only 172/3834 (4.5%) of CSR targets are enriched. These data are also supported by the RPM graphs in Author response image 2 panels A-B above, which show that the majority of the small RNAs bound by NRDE-3 in late embryos correspond to ERGO targets. Nonetheless, NRDE-3 still binds some CSR targets, as shown in Figure 6D and panel B, which may be because the amount of CSR-class 22G-RNAs is reduced gradually across embryonic development as the maternally deposited NRDE-3 loaded with CSR-class 22G-RNAs is diluted by newly transcribed NRDE-3 loaded with ERGO-dependent 22G-RNAs (lines 857-862).
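
      The arithmetic behind this argument can be checked with a short sketch. The gene counts are those quoted above; the final calculation assumes, purely for illustration, that only these two classes are counted among the enriched genes, which may not match how the pie chart was built.

```python
def percent(part, whole):
    """Return part/whole as a percentage."""
    return 100.0 * part / whole

# (enriched in the late-embryo NRDE-3 IP, total target genes in the class)
targets = {"ERGO": (70, 119), "CSR": (172, 3834)}

# Fraction of each class that is enriched in the IP.
for name, (enriched, total) in targets.items():
    print(f"{name}: {enriched}/{total} = {percent(enriched, total):.1f}% of class enriched")

# Share of the enriched genes contributed by each class
# (assumption: only CSR and ERGO genes are counted).
enriched_total = sum(enriched for enriched, _ in targets.values())
for name, (enriched, _) in targets.items():
    print(f"{name}: {percent(enriched, enriched_total):.1f}% of enriched genes")
```

      Because the CSR class is roughly 32 times larger than the ERGO class, CSR genes can outnumber ERGO genes among the enriched genes even though a far larger fraction of the ERGO class is bound.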

      c) A major difference between NRDE-3 small RNA binding in eri-1 and simr-1 mutants appears to be that NRDE-3 robustly binds CSR-1 22G RNAs in eri-1 but not in simr-1 in late embryos. This result should be better discussed.

      In the eri-1 mutant, we hypothesize that NRDE-3 robustly binds CSR-class 22G-RNAs because ERGO-class 22G-RNAs are not synthesized during mid-embryogenesis, so NRDE-3 is either unloaded (in granules at the 100-cell stage in Figure 2A) or misloaded with CSR-class 22G-RNAs (in the nucleus at the 100-cell stage in Figure 2A). We do not have a robust method to measure the proportion of loaded vs. unloaded NRDE-3, so it is difficult to determine the degree to which NRDE-3 is misloaded in the eri-1 mutant. In the simr-1 mutant, both classes of small RNAs are present and NRDE-3 is still preferentially loaded with ERGO-dependent 22G-RNAs, though we do see a subtle increase in association with CSR-class 22G-RNAs. These data could suggest less efficient loading of NRDE-3 with ERGO-dependent 22G-RNAs, but we would need more precise methods to address the loading dynamics in the simr-1 mutant.

      (3) Ultimately, if the switching is functionally important, then its impact should be observed in the expression of their targets. RNA-seq or RT-qPCR of select CSR-1 and ERGO-1 targets should be assessed in nrde-3 mutants during early vs late embryogenesis.

      The function of NRDE-3 at ERGO targets has been well studied (Guang et al, 2008) and is also assessed in our H3K9me3 ChIP-seq analysis in Figure 7E, where, in mixed-stage embryos, the H3K9me3 level on ERGO targets (labeled as ‘NRDE-3 targets in young adults’) is significantly reduced in the nrde-3 mutant.

      To understand the function of NRDE-3 binding on CSR targets in early embryos, we attempted RT-qPCR, smFISH, and anti-H3K9me3 CUT&Tag-seq on early embryos, and we either failed to obtain enough signal or failed to detect any significant difference (data not shown). We additionally tested the possibility that NRDE-3 functions with CSR-class 22G-RNAs in oocytes. We present new data showing that NRDE-3 represses RNA Pol II in oocytes to promote global transcriptional repression at the oocyte-to-embryo transition; we have now included these data in Figure 8.

      Reviewer #2 (Public review):

      Summary:

      NRDE-3 is a nuclear WAGO-clade Argonaute that, in somatic cells, binds small RNAs amplified in response to the ERGO-class 26G RNAs that target repetitive sequences. This manuscript reports that, in the germline and early embryos, NRDE-3 interacts with a different set of small RNAs that target mRNAs. This class of small RNAs was previously shown to bind to a different WAGO-clade Argonaute called CSR-1, which is cytoplasmic, unlike nuclear NRDE-3. The switch in NRDE-3 specificity parallels recent findings in Ascaris where the Ascaris NRDE homolog was shown to switch from sRNAs that target repetitive sequences to CSR-class sRNAs that target mRNAs.

      The manuscript also correlates the change in NRDE-3 specificity with the appearance in embryos of cytoplasmic condensates that accumulate SIMR-1, a scaffolding protein that the authors previously implicated in sRNA loading for a different nuclear Argonaute HRDE-1. By analogy, and through a set of correlative evidence, the authors argue that SIMR foci arise in embryogenesis to facilitate the change in NRDE-3 small RNA repertoire. The paper presents lots of data that beautifully documents the appearance and composition of the embryonic SIMR-1 foci, including evidence that a mutated NRDE-3 that cannot bind sRNAs accumulates in SIMR-1 foci in a SIMR-1-dependent fashion.

      Weaknesses:

      The genetic evidence, however, does not support a requirement for SIMR-1 foci: the authors detected no defect in NRDE-3 sRNA loading in simr-1 mutants. Although the authors acknowledge this negative result in the discussion, they still argue for a model (Figure 7) that is not supported by genetic data. My main suggestion is that the authors give equal consideration to other models - see below for specifics.

      We appreciate reviewer 2’s comments on the genetic evidence for the function of SIMR foci. A similar concern was also raised by reviewer 1. By re-examining our sequencing data, we found a modest but significant increase in NRDE-3 association with CSR-class sRNAs in simr-1 and enri-2 mutants in late embryos. We believe that these data support our model that SIMR-1 and ENRI-2 are required for an efficient switch of NRDE-3-bound small RNAs. Please refer to our response to reviewer 1, point (1), and Figure 6G in the updated manuscript.

      Reviewer #3 (Public review):

      Summary:

      Chen and Phillips present intriguing work that extends our view on the C. elegans small RNA network significantly. While the precise findings are rather C. elegans specific there are also messages for the broader field, most notably the switching of small RNA populations bound to an argonaute, and RNA granules behavior depending on developmental stage. The work also starts to shed more light on the still poorly understood role of the CSR-1 argonaute protein and supports its role in the decay of maternal transcripts. Overall, the work is of excellent quality, and the messages have a significant impact.

      Strengths:

      Compelling evidence for major shift in activities of an argonaute protein during development, and implications for how small RNAs affect early development. Very balanced and thoughtful discussion.

      Weaknesses:

      Claims on co-localization of specific 'granules' are not well supported by quantitative data.

      We have now included zoomed images of individual granules to better show the colocalization in Figure 4 and Figure 4—figure supplement 1, and performed Pearson’s colocalization analysis between different sets of proteins in Figure 4B. 

      Reviewer #2 (Recommendations for the authors):

      - The manuscript is very dense and the gene names are not helpful. For example, the authors mention ERGO-1 without clarifying the type of protein, etc. I suggest the authors include a figure to go with the introduction that describes the different classes of primary and secondary sRNAs, associated Argonautes, and other accessory proteins. Also include a table listing relevant gene names, protein classes, main localizations, and proposed functions for easy reference by the readers.

      We agree that the gene names in different small RNA pathways are easily confused. We added a diagram and table in Figure 1—figure supplement 1 depicting the ERGO/NRDE and CSR pathways and added clarification about the ERGO/NRDE-3 pathway in the text at lines 126-128.

      - Line 424 - the wording here and elsewhere seems to imply that SIMR-1 and ENRI-2, although not essential, contribute to NRDE-3 sRNA loading. The sequencing data, however, do not support this - the authors should be clearer on this. If the authors believe there are subtle but significant differences, they should show them perhaps by adding a panel in Figure 5 that directly compares the NRDE-3 IPs in wildtype versus simr-1 mutants. Figure 5H however does not support such a requirement.

      As brought up by reviewer 1, we do not see a difference in binding of ERGO-dependent sRNAs in the simr-1 mutant in late embryos. We do, however, see a modest but significant increase in CSR-class sRNAs bound by NRDE-3 in simr-1 and enri-2 mutants, which we hypothesize could be due to less efficient loading of ERGO-dependent 22G-RNAs by NRDE-3. The updated data are now in Figure 6G. We have also edited the text and model figure to soften these conclusions.

      - Condensates of PGL proteins appear at a similar time and place (somatic cells of early embryos) as the embryonic SIMR-1 foci. The PGL foci correspond to autophagy bodies that degrade PGL proteins. Is it possible that SIMR-1 foci also correspond to degradative structures? The possibility that SIMR-1 foci are targeted for autophagy and not functional would fit with the finding that simr-1 mutants do not affect NRDE-3 loading in embryos.

      We appreciate reviewer 2’s comments on the possibility that SIMR granules act as sites for degradation of SIMR-1 and NRDE-3. We think this is not the case for the following reasons: 1) if SIMR granules were sites of autophagic degradation, then we would expect embryonic SIMR granules in somatic cells, like PGL granules, to be observed only in autophagy mutants; however, we see them in wild-type embryos; 2) we would not expect a functional Tudor domain to be required for granule localization; however, in Figure 1—figure supplement 2B, we show that a point mutation in the Tudor domain of SIMR-1 abrogates SIMR granule formation; and 3) if NRDE-3(HK-AA) were recruited to SIMR granules for degradation while wild-type NRDE-3 is cytoplasmic, then NRDE-3(HK-AA) should show a significantly reduced protein level compared to wild-type NRDE-3. In the western blot in Figure 2—figure supplement 1B, NRDE-3 and NRDE-3(HK-AA) protein levels are similar, indicating that NRDE-3(HK-AA) is not degraded despite being unloaded. This is in contrast to what we have observed previously for HRDE-1, which is degraded in its unloaded state. If SIMR-1 played a direct role in promoting degradation of NRDE-3(HK-AA), we would similarly expect to see a change in NRDE-3 or NRDE-3(HK-AA) expression in a simr-1 mutant. We performed a western blot and did not observe a significant change in protein expression for NRDE-3 (Figure 3—figure supplement 1A).

      Although SIMR granules do not appear to be sites of autophagic degradation under wild-type conditions, upon treatment with RNAi against lgg-1 (an autophagy gene), we found that SIMR-1, as well as many other germ granule and embryonic granule-localized proteins, increase in abundance in late embryos. These data demonstrate that ZNFX-1, CSR-1, SIMR-1, MUT-2/RDE-3, RRF-1, and unloaded NRDE-3 are removed by autophagic degradation, similar to what has been shown previously for PGL-1 proteins (Zhang et al, 2009, Cell). We added these data to Figure 5. It is important to emphasize, however, that the timing of degradation differs for each granule assayed (lines 447-450), indicating that there must be multiple waves of autophagy to selectively degrade subsets of proteins when they are no longer needed by the embryo.

      - The observation that an NRDE-3 mutant that cannot load sRNAs localizes to SIMR-1 foci does not necessarily imply that wild-type unloaded NRDE-3 would also localize there. Unless the authors have additional data to support this idea, the authors should acknowledge that this hypothesis is speculative. In fact, why does cytoplasmic NRDE-3 not localize to granules in the rde-3;ego-1degron strain shown in Figure 6B?? Is it possible that the NRDE-3 mutant accumulates in SIMR-1 foci because it is unfolded and needs to be degraded?

      We believe that wild-type NRDE-3 also localizes to SIMR foci when unloaded. This is supported by the localization of wild-type NRDE-3 in eri-1 and rde-3 mutants, where a subset of small RNAs is depleted. In these mutants, wild-type NRDE-3 localizes to both somatic SIMR-1 granules and the nucleus, depending on embryo stage (Figure 2A, Figure 2—figure supplement 1C). There are fewer granules in eri-1 and rde-3 mutants than in the nrde-3(HK-AA) mutant, consistent with the imaging data showing that NRDE-3 only partially localizes to somatic granules (Figure 2A – 100-cell stage).

      In the rde-3; ego-1 double mutant, the embryos have severe developmental defects: they cannot divide properly after the 4-8-cell stage and exhibit morphological defects after that stage. In wild-type embryos, SIMR foci do not appear until around the 8-28-cell stage (shown in Figure 1C), so we believe that the failure of cytoplasmic NRDE-3 to localize to foci in the double mutant is due to timing.

      - The authors propose that NRDE-3 functions in nuclei to target mRNAs also targeted in the cytoplasm by CSR-1. If so, how do they propose that NRDE-3 might do this since little transcription occurs in oocytes/early embryos?? Are the authors suggesting that NRDE-3 targets germline genes for silencing specifically at the times that zygotic transcription comes back on, or already in maturing oocytes? Is the transcription of most CSR-1 targets silenced in early embryos??

      We appreciate the suggestion to check the function of NRDE-3 in oocytes. We tested this possibility and found it to be correct: NRDE-3 functions in oocytes for transcriptional repression by inhibiting RNA Pol II elongation. We added these data to Figure 8. We also attempted RT-qPCR, smFISH, and anti-H3K9me3 CUT&Tag-seq on early embryos to further test the hypothesis that NRDE-3 acts with CSR-class 22G-RNAs in early embryos, but we either failed to obtain enough signal or failed to detect any significant difference (data not shown). Therefore, we think that the primary role of NRDE-3 bound to CSR-class 22G-RNAs may be global transcriptional repression in oocytes prior to fertilization.

      - Line 684-686: "In summary, this work investigating the role of SIMR granules in embryos, together with our previous study of SIMR foci in the germline (Chen and Phillips 2024), has identified a new mechanism for small RNA loading of nuclear Argonaute proteins in C. elegans". This statement appears overstated/incorrect since there is no evidence that SIMR-1 foci are required for sRNA loading of NRDE-3. The authors should emphasize other models, as suggested above.

      We have revised the text at lines 869-871 to emphasize that SIMR granules regulate the localization of nuclear Argonaute proteins, rather than suggesting a direct role in controlling small RNA loading. We also edited the title, text, and legend for our model in Figure 9.

      Reviewer #3 (Recommendations for the authors):

      Issues to be addressed:

      - The authors show a switch in 22G RNA binding by NRDE-3 during embryogenesis. While the data is convincing, it would be great if it could be tested if the preferred NRDE-3 replacement model is indeed correct. This could be done relatively easily by giving NRDE-3 a Dendra tag, allowing one to colour-switch the maternal WAGO-3 pool before the zygotic pool comes up. Such data would significantly enhance the manuscript, as this would allow the authors to follow the fate of maternal NRDE-3 more precisely, perhaps identifying a period of sharp decline of maternal NRDE-3.

      We think the NRDE-3 Dendra tag experiment suggested by the reviewer is a clever approach and we will consider generating this strain in the future. However, we feel that optimization of the color-switching tag between the maternal germline and the developing embryos is beyond the scope of this manuscript. To partially address the question about NRDE-3 fate during embryogenesis, we examined single-cell sequencing data of C. elegans embryos from the 1-cell to 16-cell stage (Tintori et al, 2016, Dev Cell; visualization tool from the John I Murray lab). As shown in Author response image 3 panel A below, the NRDE-3 transcript level increases as the embryo develops, indicating that zygotic NRDE-3 is actively expressed starting very early in development. We hypothesize that maternal NRDE-3 will either be diluted as the embryo develops or actively degraded during early embryogenesis.

      Author response image 3.

      - Figure 3A: * should mark PGCs, but this seems incorrect. At the 8-cell stage there still is only one PGC (P4), not two, and at 100 cells there are only two, not three germ cells. Also, the identification of PGCs with a marker (PGL for instance) would be much more convincing.

      We apologize for the confusion in Figure 3A. We changed the figure legend to clarify that the * indicates nuclear NRDE-3 localization in somatic cells for 8- and 100-cell stage embryos rather than the germ cells.

      - Overall, the authors should address colocalization more robustly. In the current manuscript, just one image is provided, and often rather zoomed-out. How robust are the claims on colocalization, or lack thereof? With the current data, this cannot be assessed. Pearson correlation, combined with line-scans through a multitude of granules in different embryos will be required to make strong claims on colocalization. This applies to all figures (main and supplement) where claims on different granules are derived from.

      We thank reviewer 3 for this important suggestion. To better address the colocalization, we included insets of individual granules in Figure 2D and Figure 4. We also performed colocalization analysis by calculating the Pearson’s R value between different groups of proteins in Figure 4B, to highlight that SIMR-1 colocalizes with ENRI-2, NRDE-3(HK-AA), RDE-3, and RRF-1, while CSR-1 colocalizes with EGO-1.
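
      As an illustration of what a Pearson-based colocalization score measures, the sketch below computes Pearson’s R between the pixel intensities of two fluorescence channels within one region of interest. The intensity values are hypothetical, and the pure-Python implementation is an assumption made for clarity; it is not the software or workflow used for Figure 4B.

```python
import math

def pearson_r(channel_a, channel_b):
    """Pearson correlation between paired pixel intensities of two channels."""
    if len(channel_a) != len(channel_b) or len(channel_a) < 2:
        raise ValueError("channels must have the same length (at least 2 pixels)")
    n = len(channel_a)
    mean_a = sum(channel_a) / n
    mean_b = sum(channel_b) / n
    covariance = sum((a - mean_a) * (b - mean_b)
                     for a, b in zip(channel_a, channel_b))
    spread_a = sum((a - mean_a) ** 2 for a in channel_a)
    spread_b = sum((b - mean_b) ** 2 for b in channel_b)
    if spread_a == 0 or spread_b == 0:
        raise ValueError("a channel with constant intensity has no defined correlation")
    return covariance / math.sqrt(spread_a * spread_b)

# Hypothetical flattened pixel intensities along a line through one granule.
simr1 = [10, 40, 90, 200, 180, 60, 15]
enri2 = [12, 35, 100, 190, 170, 70, 20]    # peaks where SIMR-1 peaks
csr1 = [150, 120, 30, 10, 20, 110, 160]    # low where SIMR-1 peaks

print(f"SIMR-1 vs ENRI-2: R = {pearson_r(simr1, enri2):.2f}")
print(f"SIMR-1 vs CSR-1:  R = {pearson_r(simr1, csr1):.2f}")
```

      Values near 1 indicate that the two signals rise and fall together across pixels, whereas values near 0 or below indicate a lack of colocalization.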

      For the proteins that lack colocalization in Figure 4—figure supplement 1, we also added insets of individual granules. Additionally, we included a new set of panels showing SIMR-1 localization compared to tubulin::GFP (Figure 4—figure supplement 1I) in response to a recent preprint (Jin et al, 2024, BioRxiv), which finds NRDE-3 (expressed under a mex-5 promoter) associating with pericentrosomal foci and the spindle in early embryos. We do not see SIMR-1 (or NRDE-3, data not shown) at centrosomes or spindles in wild-type conditions but made a similar observation for SIMR-1 in a mut-16 mutant (Figure 4E). Each localization pattern was examined in at least 5 individual 100-cell-stage embryos, all showing the same pattern.

      - Figure 7: Its title is: Function of cytoplasmic granules. This is a much stronger statement than provided in the nicely balanced discussion. The role of the granules remains unclear, and they may well be just a reflection of activity, not a driver. While this is nicely discussed in the text, figure 7 misses this nuance. For instance, the title suggests function, and also the legend uses phrases like 'recruited to granule X'. If granules are the results of activity, 'recruitment' is really not the right way to express the findings. The nuance that is so nicely worded in the discussion should come out fully in this figure and its legend as well.

      We have changed the title of Figure 7 (now Figure 9) to “Model for temporally- and developmentally-regulated NRDE-3 function” to deemphasize the role of the granules and to highlight the different functions of NRDE-3. Similarly, we have rephrased the text in the figure and legend and added some details about our new results.

      Minor:

      Typo: line 663 Acaris

      We corrected the typo.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary: 

      The authors demonstrated that carbon depletion triggers the autophagy-dependent formation of Rubisco Containing Bodies, which contain chloroplast stroma material, but exclude thylakoids. The authors show that RCBs bud directly from the main body of chloroplasts rather than from stromules and that their formation is not dependent on the chloroplast fission factor DRP5. The authors also observed a transient engulfment of the RCBs by the tonoplast during delivery to the vacuolar lumen.

      Strengths: 

      The authors demonstrate that autophagy-related protein 8 (ATG8) co-localizes to the chloroplast demarking the place for RCB budding. The authors provide good-quality time-lapse images and co-localization of the markers corroborating previous observations that RCBs contain only stroma material and do not include thylakoid. The text is very well written and easy to follow. 

      Weaknesses: 

      A significant portion of the results presented in the study comes across as a corroboration of the previous findings made under different stress conditions: autophagy-dependent formation of RCBs was reported by Ishida et al. in 2009. Furthermore, some included results are not of particular relevance to the study's aim. For example, it is unclear what is the importance of the role of SA in the formation of stromules, which do not serve as an origin for the RCBs. Similarly, the significance of the transient engulfment of RCBs by the tonoplast remained elusive. Although it is indeed a curious observation, previously reported for peroxisomes, its presentation should include an adequate discussion maybe suggesting the involved mechanism. Finally, some conclusions are not fully supported by the data: the suggested timing of events poorly aligns between and even within experiments mostly due to high variation and low number of replicates. Most importantly, the discussion does not place the findings of this study into the context of current knowledge on chlorophagy and does not propose the significance of the piece-meal vs complete organelle sequestration into the vacuole under used conditions, and does not dwell on the early localization of ATG8 to the future budding place on the chloroplast.

      We performed additional experiments with biological replicates that involved quantification. The results of these experiments validate the findings of this study. We also revised the Discussion section, which now includes a discussion of the interplay between piecemeal-type and entire-organelle-type chloroplast autophagy and the relevance of autophagy adaptor and receptor proteins to the localization of ATG8 on the chloroplast surface. Accordingly, the first subheading section in the Discussion became too long. Therefore, we divided it into two subheading sections. We believe that the revisions successfully address the weaknesses pointed out by the reviewer and enhance the importance of the current study. Below is a detailed description of the improvements made to our manuscript in response to the reviewer comments.

      Reviewer #1 (Recommendations For The Authors): 

      It would be great if the authors kindly used numbered lines to facilitate the review process. 

      We have added line numbers to the text of the revised version of the manuscript.  

      The authors use the words "budding", "protrusion" and "stromule formation" interchangeably in some parts of the text. For the sake of clarity, it would be best to be consistent in the terminology and possibly elaborate on the exact differences between these structure types and the criteria by which they were identified. 

      We have checked all of the text and improved the consistency of the terminology. An important finding of this study is that chloroplasts form budding structures at the site associated with ATG8. These structures then divide to become a type of autophagic cargo termed a Rubisco-containing body. We therefore mainly use the terms “bud” and “budding” throughout the text. In the experiments shown in Figure 5, we considered the possibility that chloroplast protrusions accumulate in leaves of atg mutants and do not divide because the mutants cannot create autophagosomes. Therefore, the word “protrusion” was used to describe the results shown in Figure 5 in which the proportion of chloroplasts forming protrusions was scored. In the revised text, the word “protrusion” is only used in descriptions of Figure 5. Previous reports define stromules as thin, tubular, extended structures (less than 1 µm in diameter) of the plastid stroma (Hanson and Sattarzadeh, 2011; Brunkard et al., 2015). In the revised text, the word “stromules” is used to describe the structures defined in these previous reports. We have added definitions of each term to the Introduction, Methods and Results sections where appropriate (lines 57–58, 160–162, 247–249, 313–316, 655–658, 668–670).

      Pages 3-4: the authors observed budding of the chloroplasts within a few minutes - it would be helpful to specify that time was probably counted from the first observation of budding, not from the start of the dark treatment, and also specify the exact treatment duration for each of the experiments. 

      The time scales in the figures do not represent the time from the start of the dark treatment. Instead, they describe the duration from the start of the time-lapse videos that were used to generate the still images. Therefore, the indicated time scales are almost the same as the duration from the start of the observations of each target structure (chloroplast buds or GFP-ATG8a-labeled structures). As described in the Methods section, leaves were incubated in darkness for 5 to 24 h to induce sugar starvation. Such sugar-starved leaves were subjected to live-cell monitoring for the target structures. Since Arabidopsis leaves accumulate starch as a stored sugar source (Smith and Stitt, 2007; Usadel et al., 2008), dark treatment lasting several minutes is not sufficient for the starch to be consumed and sugar starvation to be induced. To avoid confusion, we have added definitions of the time scales to the legends of figures containing the results of time-lapse imaging. We have also specified the durations of dark treatments used to obtain the respective results in the legends.

      Figure 6: the time scale for complete autophagosome formation is in the range of 100-120 sec, how do these results align with the results shown in Figures 3B and C, where complete autophagosomes are suggested to be released into the vacuole after 73.8 sec. Furthermore, another structure is suggested to be formed within 50 sec. Such experiments possibly require a large number of replicates to estimate representative timing. 

      As mentioned in the previous response, the time scales in still frames represent the duration from the start of the corresponding video. Leaves incubated in darkness for 5 to 24 h were subjected to live-cell imaging. When we identified the target structures, e.g., GFP-ATG8a-labeled structures on the surfaces of chloroplasts (Figure 6) or chloroplast budding structures (Figure 3), we began to track these structures. Therefore, the time scales in the figures do not align to a common time axis. We revised the descriptions of Figure 3 and Figure 6 in the Results section to clearly explain that the time points in each experiment merely indicate the time elapsed within that single observation.

      The authors might want to consider using arrows to indicate structures of interest in all movies and figures.

      We have added arrows to indicate the structures of interest in the starting frames of all videos. We hesitate to add arrows to highlight RCBs accumulating in the vacuole (Figure 1-figure supplement 1, Figure 5 and Figure 8) and stromules (Figure 7) because many arrows would be required, which would obscure large portions of the images. We believe that the images without arrows clearly represent the appearance of RCBs or stromules and that their quantification (Figure 1-figure supplement 1C, Figure 5B, Figure 5-figure supplement 1B, Figure 7B, 7D, 7F, and Figure 8B) well supports the results.   

      Figure 7 Supplement 1: do the authors detect complete chloroplasts in the vacuole of atg7 and sid2/atg7? 

      We did not observe the vacuolar transport of whole chloroplasts in atg7 or atg7 sid2 plants under our experimental conditions. The figure below (Figure 1 for Response to reviewers) shows images of mesophyll cells from a leaf (third rosette leaf of a 20-d-old plant) of atg7 accumulating chloroplast stroma–targeted GFP (CT-GFP); this is from the previous version of Figure 7–figure supplement 1. Indeed, some GFP bodies exhibiting strong stromal GFP (CT-GFP) signals appeared in the central area of the cell (arrowheads in A). However, such bodies were chloroplasts in epidermal cells. The 3D images (B) and cross-section image (x to z axis) of the region highlighted by the blue dotted line (C) indicate that such GFP bodies are the edges of chloroplasts that localize on the abaxial side of the observed region. Because CT-GFP expression was driven by the 35S promoter, strong GFP signals appeared in chloroplasts in epidermal cells in addition to chloroplasts in mesophyll cells. Previous studies using the same transgenic lines also showed that chloroplasts in epidermal cells exhibit strong GFP signals (Kohler et al., 1997; Caplan et al., 2015; Lee et al., 2023). RBCS-mRFP or GFP driven by the RBCS2B promoter do not label the chloroplasts in epidermal cells (new Figure 7-figure supplement 1). Additionally, because the borders between the mesophyll cell layer and the epidermal cell layer are not even, chloroplasts in epidermal cells are sometimes visible during observations of mesophyll cells. Such detection more frequently occurs during the acquisition of z-stack images. This point was more precisely demonstrated in our previous study with the aid of Calcofluor white staining of cell walls (Nakamura et al., 2018). Please see Supplemental Figure S3 in our previous report.
      To avoid any misunderstanding, we replaced the image of the leaf from atg7 in the revised figure, which is now Figure 7-figure supplement 2, with an image of another region to more precisely visualize mesophyll cells in this plant line.

      Author response image 1.

      Mesophyll cells in a leaf of atg7 accumulating stromal CT-GFP, reconstructed from the data shown in the previous version of Figure 7–figure supplement 1. (A) Individual channel images (CT-GFP and chlorophyll) from the merged orthogonal projection image shown in the previous version of Figure 7–figure supplement 1. The right panel shows the enhanced chlorophyll signal to clearly visualize the chloroplasts in epidermal cells. Green, CT-GFP; magenta, chlorophyll fluorescence. Scale bar, 20 µm. (B) 3D structure of the merged image shown in (A). (C) Images of the cross section indicated by the blue dotted line (a to b) in B. Arrowheads indicate the edges of chloroplasts in epidermal cells.

      Figure 8: it would be interesting to hear the authors' opinion on why they observed a significant increase in RCBs number in the drp5b mutant background

      We have added a discussion of this issue to the revised manuscript (lines 445–459). We now have two hypotheses to explain this issue. One hypothesis is that the impaired chloroplast division due to the drp5b mutation reduces energy availability and thus activates chloroplast autophagy. The other hypothesis is that the drp5b mutation impairs the type of chlorophagy that degrades whole chloroplasts, and thus piecemeal-type chloroplast autophagy via Rubisco-containing bodies is activated. However, we do not have any experimental evidence supporting either hypothesis.

      Reviewer #2 (Public Review): 

      This manuscript proposed a new link between the formation of chloroplast budding vesicles (Rubisco-containing bodies [RCBs]) and the development of chloroplast-associated autophagosomes. The authors' previous work demonstrated two types of autophagy pathways involved in chloroplast degradation, including piecemeal degradation of partial chloroplast and whole chloroplast degradation. However, the mechanisms underlying piecemeal degradation are largely unknown, particularly regarding the initiation and release of the budding structures. Here, the authors investigated the progression of piecemeal-type chloroplast trafficking by visualizing it with a high-resolution time-lapse microscope. They provide evidence that autophagosome formation is required for the initiation of chloroplast budding, and that stromule formation is not correlated with this process. In addition, the authors also demonstrated that the release of chloroplast-associated autophagosome is independent of a chloroplast division factor, DRP5b. 

      Overall, the findings are interesting, and in general, the experiments are very well executed. Although the mechanism of how Rubisco-containing bodies are processed is still unclear, this study suggests that a novel chloroplast division machinery exists to facilitate chloroplast autophagy, which will be valuable to investigate in the future. 

      Reviewer #2 (Recommendations For The Authors): 

      Below are some specific comments. 

      (1) In Supplement Figure 1B, there is no chloroplast stromule in RBCS-mRFP x atg7-2 plants under dark treatment with ConA, but in Figure 7A, there are stromules in CT-GFP x atg7-2 plants. How to explain such a discrepancy? Did the authors check the chloroplast morphology of RBCS-mRFP x atg7-2 plants in different developmental stages? Will it behave the same as CT-GFP x atg7-2 under the same condition as in Figure 7A?

      As described in the text, the ages and conditions of the leaves shown in Figure 1–figure supplement 1 and Figure 7 are different. In Figure 1–figure supplement 1, second rosette leaves from 21-d-old plants were incubated in the dark with concanamycin A for 1 d. In Figure 7E and 7F, we explored the condition under which mesophyll chloroplasts in atg leaves actively form stromules to assess how a deficiency in autophagy is related to stromule formation. We found that late senescing leaves (third rosette leaves from 36-d-old plants) of atg5 and atg7 plants accumulated many stromules without additional treatment (Figure 7). It is not surprising that the chloroplast morphologies shown in Figures 1 and 7 are different because the leaf ages and conditions are largely different.

      However, we agree that the differences in chloroplast stroma–targeted GFP and RBCS-mRFP might influence the visualization of stromules. For instance, fluorescent protein–labeled RBCS proteins are incorporated into the Rubisco holoenzyme, comprising eight RBCS and eight RBCL proteins (Ishida et al., 2008; Ono et al., 2013). Such a large protein complex might not accumulate in stromules. Therefore, we examined the chloroplast morphology in late senescing leaves (third rosette leaves from 36-d-old plants) from WT, atg5, and atg7 plants harboring ProRBCS:RBCS-mRFP, as you suggested. Mesophyll chloroplasts formed many stromules in atg5 and atg7 leaves but not in WT leaves (Figure 7–figure supplement 1). These results indicate that RBCS-mRFP can be used to visualize stromules and that the differences in chloroplast morphology between Figure 1–figure supplement 1 and Figure 7 cannot be attributed to the different marker proteins used. A previous study also indicated that Rubisco is present in plastid stromules (Kwok and Hanson, 2004).

      (2) In Figure 2, the author showed that the outer envelope marker Toc64 was colocalized with chloroplast buds. How about proteins in the inner envelope membrane of chloroplasts? 

      We generated Arabidopsis plants expressing red fluorescent protein–tagged K+ EFFLUX ANTIPORTER 1 (KEA1), a chloroplast inner envelope membrane protein (Kunz et al., 2014; Boelter et al., 2020). We found that the chloroplast buds visualized by RBCS-GFP were also marked by KEA1-mRFP (Figure 2–figure supplement 1B). We observed the transport of such buds (Figure 2–figure supplement 2). These results strengthen our claim that autophagy degrades chloroplast stroma and envelope components as a type of specific cargo termed a Rubisco-containing body. This additional experiment is described in lines 181–187.

      (3) In Figure 3, how many RCBs were tracked for the trafficking analysis to raise the conclusion that the vesicle was released into the vacuole around 73.8s? 

      We apologize for our confusing explanation in the previous version of the manuscript. The time point “73.8 s” merely indicates the time of one observation, as shown in Figure 3. This time does not represent the common timing of vacuolar release of a Rubisco-containing body. As we explained in the response to the comments from reviewer 1, we subjected leaves that were incubated in the dark for several hours to live-cell imaging assays to observe chloroplast morphology in sugar-starved leaves. The time scales of each still frame represent the time from the start of the corresponding video. Therefore, the time points in the respective figures do not align to a common time axis, and the number “73.8 s” is not important. We attempted to emphasize that the type of movement of Rubisco-containing bodies changes during their tracking shown in Figure 3. Based on this finding, we hypothesized that the Rubisco-containing bodies are released into the vacuolar lumen when they initiate random movement. Therefore, we expected that the interaction between the Rubisco-containing bodies and the vacuolar membrane could be captured, and we therefore turned our attention to the dynamics of the vacuolar membrane in subsequent experiments. Accordingly, our observations of the vacuolar membrane allowed us to visualize the release of the Rubisco-containing body into the vacuole (Figure 4). We rephrased these sentences (lines 212–219) to avoid confusion and to explain this idea accurately. We also performed tracking experiments of Rubisco-containing bodies to strengthen the finding that the type of movement of the bodies changes during tracking (Figure 3-figure supplement 1, Videos 8 and 9).

      (4) I do believe the conclusion that vacuolar membranes incorporate RCBs into the vacuole in Figure 4. However, it will be more convincing if images of higher quality are provided. 

      We tried to acquire images that more clearly show the morphology of the vacuolar membrane during the incorporation of the Rubisco-containing body. We obtained the images in Figure 4A using a standard type of confocal microscope, the LSM 800 (Carl Zeiss), and obtained the images in Figure 4B using the Airyscan Fast acquisition mode, a hyper-resolution microscope mode, in the LSM 880 system (Carl Zeiss). We performed additional experiments with another type of confocal microscope, the SP8 (Leica; Figure 4-figure supplement 1A to 1C, Videos 12–14). The quality of the images from these experiments was as high as possible under the experimental conditions (equipment and plant materials). In general, increasing the image resolution during time-lapse imaging with a confocal microscope requires reducing the time resolution. However, the transport of a Rubisco-containing body occurs relatively quickly: its engulfment by the vacuolar membrane takes just a few seconds (Figure 4, Figure 4-figure supplement 1). We could therefore not reduce the time resolution further to better capture the morphology of the vacuolar membrane.

      (5) In Figure 7G, the authors concluded that SA and ROS might be the cause of the extensive formation of stromules. How about the H2O2 level in NahG and atg5 NahG plants? Compared with sid2, NahG appeared to completely inhibit stromule formation in atg5. Will this be related to ROS levels?

      We measured the hydrogen peroxide (H2O2) contents in NahG atg5 plants and atg5 single mutant plants and found that their leaves accumulate more H2O2 than those of wild-type or NahG plants (Figure 7-figure supplement 3). Since we have only maintained fresh seeds of NahG atg5 plants harboring the 35S promoter–driven chloroplast stroma–targeted GFP (Pro35S:CT-GFP) construct, we first confirmed that CT-GFP accumulation does not affect the measurement of H2O2 content. H2O2 levels were similar between wild-type leaves and CT-GFP-expressing leaves. A comparison among Pro35S:CT-GFP expressing lines in the wild-type, atg5, NahG, and NahG atg5 backgrounds revealed enhanced accumulation of H2O2 in the atg5 and NahG atg5 genotypes compared with the wild-type and NahG genotypes. This finding is consistent with the results of histological staining of H2O2 using 3,3′-diaminobenzidine (DAB) in a previous study (Yoshimoto et al., 2009).

      It is unclear why NahG expression inhibited stromule formation more strongly than the sid2 mutation in the atg5 mutant background, as you pointed out (Figure 7A–D). NahG catabolizes salicylic acid (SA), whereas sid2 mutants are knockout mutants of ISOCHORISMATE SYNTHASE1 (ICS1), a gene required for SA biosynthesis. Plants have two metabolic routes for SA biosynthesis: the isochorismate synthase (ICS) pathway and the phenylalanine ammonia-lyase (PAL) pathway. Furthermore, Arabidopsis plants contain two ICS homologs: ICS1 and ICS2. Previous studies have revealed that ICS1 (SID2) is the main player for SA biosynthesis in response to pathogen infection (Delaney et al., 1994). Another study revealed drastically lower SA contents in the leaves of both sid2 single mutants and NahG-expressing plants compared with those of wild-type plants (Abreu and Munné-Bosch, 2009). Therefore, it is clear that the sid2 single mutation sufficiently inhibits SA accumulation in Arabidopsis leaves. However, low levels of SA biosynthesis through ICS1-independent routes might influence stromule formation in leaves of sid2 atg5 and sid2 atg7. Because a previous study demonstrated that the sid2 single mutation sufficiently suppresses the SA hyperaccumulation–related phenotypes of atg plants (Yoshimoto et al., 2009), we believe that the use of the sid2 mutation was adequate to assess the effects of SA on stromule formation that actively occurs in the atg plants examined in this study.

      (6) In Supplement Figure 7, I have noticed that there are still some CT-GFP signals (green dots) in the vacuoles of the atg7 mutant, are they RCBs? If so, how can this phenomenon be explained? 

      As we explained in the response to the comment from Reviewer 1, CT-GFP-labeled bodies are chloroplasts in the epidermal cell layer. Please see our response to Reviewer 1’s comment about Figure 7 and the associated figure (Figure 1 for Response to reviewers). The CT-GFP-labeled dots (arrowheads) are the edges of chloroplasts and localize on the abaxial side of the observed region. The dots have faint chlorophyll signals. This phenomenon is much clearer in the image with enhanced brightness (right panel in A). Since the bodies are merely the edges of epidermal chloroplasts, their chlorophyll signals are faint. Therefore, these bodies are not Rubisco-containing bodies but are instead simply the edges of chloroplasts in the epidermal cell layer.

      (7) On page 24, the second paragraph, lines 12-14, the authors claim that no receptors similar to those involved in mitophagy that bind to LC3 (ATG8) have been established in chloroplasts. Actually, it has been reported that a homologue of mitophagy receptor, NBR1, acts as an autophagy receptor to regulate chloroplast protein degradation (Lee et al, 2023, Elife; Wan et al, 2023, EMBO Journal). Although I do think NBR1 is not involved in RCBs based on these reports, these findings should be discussed here. 

      Thank you for this good suggestion. We have added a discussion about this important point to the Discussion section, along with the relevant citations (lines 482–502).

      (8) In the figure legend, the details of the experiments will be better provided, such as leaves stages (Figure 1, Figure 5...), the number of chloroplasts analyzed (Figure 7...). This can help the readers to follow. 

      Thank you for highlighting this. We have checked all of the figure legends and added descriptions of the leaf stages and experimental conditions.  

      Reviewer #3 (Public Review):

      Summary: 

      Regulated chloroplast breakdown allows plants to modulate these energy-producing organelles, for example during leaf aging, or during changing light conditions. This manuscript investigates how chloroplasts are broken down during light-limiting conditions. 

      The authors present very nice time-lapse imaging of multiple proteins as buds form on the surface of chloroplasts and pinch away, then associate with the vacuole. They use mutant analysis and autophagy markers to demonstrate that this process requires the ATG machinery, but not dynamin-related proteins that are required for chloroplast division. The manuscript concludes with a discussion of an internally-consistent model that summarizes the results. 

      Strengths: 

      The main strength of the manuscript is the high-quality microscopy data. The authors use multiple markers and high-resolution time-lapse imaging to track chloroplast dynamics under light-limiting conditions. 

      Weaknesses: 

      The main weakness of the manuscript is the lack of quantitative data. Quantification of multiple events is required to support the authors' claims, for example, claims about which parts of the plastid bud, about the dynamics of the events, about the colocalization between ATG8 and the plastid stroma buds, and the dynamics of this association. Without understanding how often these events occur and how frequently events follow the manner observed by the authors (in the 1 or 2 examples presented in each figure) it is difficult to appreciate the significance of these findings. 

      We have performed several additional experiments, including the quantification of multiple chloroplast buds or GFP-ATG8-labeled structures from individual plants. The results strengthen our claims and thus improve the significance of the current study. Please see the responses below for details.

      Reviewer #3 (Recommendations For The Authors):

      Overall, the live-cell imaging in this paper is high quality and rigorously conducted. However, without quantification of these events, it is difficult to judge whether this is an occasional contributor to plastid breakdown, or the primary mechanism for this process. 

      - For Figure 1, the authors could estimate the importance of this mechanism for chloroplast breakdown by calculating the volume change in chloroplasts over time during light-limiting conditions, then comparing this to the volume of the puncta that bud off of plastids and the frequency of these events. That is, what percentage of chloroplast volume loss can be accounted for by puncta that bud from chloroplasts? Are there likely other mechanisms contributing to chloroplast breakdown, or is this the primary mechanism? 

      We measured the volumes of chloroplast stroma when the leaves from wild-type (WT) and atg7 plants accumulating RBCS-mRFP were subjected to extended darkness for 1 d (Figure 1-figure supplement 2). The volume of the chloroplast stroma in dark-treated leaves of WT plants was 70% that in leaves before treatment, whereas the volume of the chloroplast stroma in dark-treated atg7 leaves was 86% that in leaves before treatment. The transport of Rubisco-containing bodies into the vacuole did not occur in atg7 leaves (Figure 1-figure supplement 1). These results suggest that the release of chloroplast buds as Rubisco-containing bodies contributes to the decrease in chloroplast stroma volume during dark treatment. These results also suggest that autophagy-independent systems contribute to the decrease in chloroplast volume. It is difficult to monitor the volume or frequency of budding off of puncta from chloroplasts during dark treatment because the budding and transport of the puncta occur relatively quickly and are completed within minutes, and the puncta frequently move away from the plane of focus. Additionally, continuous monitoring of chloroplast morphology over the dark treatment period requires the long-term exposure of leaves to repeated laser excitation, and such treatment might cause unexpected stress. We believe that the evaluation of chloroplast stroma volume after 1 d of dark treatment is important for estimating the contribution of the mechanism described in this study. This additional experiment is described in lines 163–174.

      - The claim that structures budding from the plastid "specifically contains stroma material...without any chlorophyll signal" (p. 6 and Figure 2) should be supported by quantitative analysis of many such buds in multiple cells from multiple independent plants. 

      We performed additional experiments (Figure 2-figure supplement 1) to measure the fluorescence intensity ratios of the stroma marker RBCS-GFP and chlorophyll between chloroplast budding structures and their neighboring chloroplasts in Arabidopsis plants expressing the stromal marker RBCS-GFP along with TOC64-mRFP (a chloroplast outer envelope membrane protein), KEA1-mRFP (a chloroplast inner envelope membrane protein), or ATPC1-tagRFP (a thylakoid membrane protein). The results indicated that chloroplast buds contain chloroplast stroma without chlorophyll signals. The descriptions of this experiment are in lines 175–199. In these experiments, we observed 30 to 33 chloroplast buds from eight individual plants.  
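
      The ratio measurement described above can be illustrated with a minimal sketch. It is not the analysis pipeline used in the study: the function name `intensity_ratio`, the use of the mean pixel intensity within each region of interest, and all pixel values are assumptions made purely for illustration.

```python
from statistics import mean


def intensity_ratio(bud_pixels, chloroplast_pixels):
    """Ratio of mean fluorescence in a bud region to that in the
    neighboring chloroplast region (hypothetical helper)."""
    return mean(bud_pixels) / mean(chloroplast_pixels)


# Hypothetical pixel intensities (arbitrary units), not measured data.
bud_stroma = [180, 200, 190]
chloroplast_stroma = [200, 210, 190]
bud_chlorophyll = [5, 4, 6]
chloroplast_chlorophyll = [250, 240, 260]

stroma_ratio = intensity_ratio(bud_stroma, chloroplast_stroma)
chlorophyll_ratio = intensity_ratio(bud_chlorophyll, chloroplast_chlorophyll)

print(f"stroma marker ratio: {stroma_ratio:.2f}")
print(f"chlorophyll ratio:   {chlorophyll_ratio:.2f}")
```

      Under these assumptions, a stroma marker ratio close to 1 together with a chlorophyll ratio close to 0 would correspond to a bud that carries stroma but no thylakoid-derived signal.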

      - Claims about the dynamics of these events in Figures 2 & 3 should be supported by quantitative analysis of many buds in multiple cells from multiple independent plants and appropriate summary statistics (e.g. mean, standard deviation), and claims about the coordination of events should be supported by statistical comparison of these measurements between different markers. 

      As mentioned in the response to the above comments, quantification of fluorescent intensities (Figure 2-figure supplement 1) revealed that the chloroplast budding structures produced TOC64-mRFP and KEA1-mRFP signals without ATPC1-tagRFP signal. These results support the claim that chloroplast buds contain chloroplast stroma and envelope components without thylakoid membranes. 

      It is not easy to quantify the dynamics of chloroplast buds since the puncta sometimes move away from the plane of focus. We therefore added data from individual time-lapse observations showing that the type of movement exhibited by the puncta changes during tracking (Figure 3-figure supplement 1A and 1B, Videos 8 and 9) to strengthen the notion that such a phenomenon was observed repeatedly. 

      - Data in Figure 4 should be supported by quantification of the proportion of plastid-derived puncta that end up inside the vacuole (compared to those that do not) in multiple cells from multiple independent plants. 

      Although we performed additional observations of the destinations of chloroplast-derived puncta, we encountered some difficulty in correctly calculating the proportion of plastid-derived puncta that ended up inside the vacuole. This problem is similar to the difficulty in tracking Rubisco-containing bodies mentioned in the response to the previous comments. During time-lapse imaging, puncta sometimes move from the plane of focus toward the deeper side (abaxial side) or near side (adaxial side), causing us to lose track of a number of puncta. Therefore, we could not determine the destinations of all puncta to calculate the proportion of puncta that ended up in the vacuolar lumen.

      Alternatively, we added the results of three experiments (Figure 4-figure supplement 1, Videos 12–14) examining how the vacuolar membrane engulfs the chloroplast-derived puncta to incorporate them inside the vacuole. The data support the notion that such a phenomenon occurs repeatedly in sugar-starved leaves. All results were obtained from individual plants. 

      - Data in Figure 6 should also be supported by quantitative analysis of many buds in multiple cells from multiple independent plants, to determine whether ATG8 associates with all RBCS-containing buds, and vice versa.

      To address this issue, we performed additional experiments on plants expressing GFP-ATG8a and RBCS-mRFP (Figure 6-figure supplements 3 and 4). First, we observed 58 chloroplast buds from eight individual plants and evaluated the proportion of GFP-ATG8a-labeled chloroplast buds. We determined that 64% of chloroplast buds were at least autophagy-associated structures (Figure 6-figure supplement 3A–3C). This result also suggests that chloroplasts can form autophagy-independent budding structures, which might be associated with stromule-related structures or the autophagy-independent vesiculation machinery. We also evaluated the number of GFP-ATG8a-labeled chloroplast buds (Figure 6-figure supplement 3D and 3E). The formation of such structures increased in response to dark treatment (Figure 6-figure supplement 3D), but they did not appear in atg7 plants exposed to the dark (Figure 6-figure supplement 3E). These results support the notion that the formation of chloroplast buds to be released as Rubisco-containing bodies requires the core ATG machinery. 

      Furthermore, we observed 157 GFP-ATG8a-labeled structures from thirteen individual plants and evaluated the proportion of chloroplast-associated isolation membranes (Figure 6-figure supplement 4). We also classified the chloroplast-associated, GFP-ATG8a-labeled structures into two categories: the chloroplast surface type (Figure 7-figure supplement 4A) and the chloroplast bud type (Figure 7-figure supplement 4B). This experiment suggested that 43% of the isolation membranes labeled by GFP-ATG8a were involved in chloroplast degradation during an early phase of sugar starvation (extended darkness for 5 to 9 h from the end of night) in mesophyll cells. We believe that these results indicate that autophagy contributes substantially to chloroplast degradation via the morphological changes observed in this study. These experiments are described in lines 284–300 in the Results section and in lines 426–444 in the Discussion section.

      - Which parts of the plastid bud (Fig 2), about the dynamics of the events (Fig 3), about the colocalization between ATG8 and the plastid stroma buds, and the dynamics of this association (Fig 6). 

      We performed multiple quantitative studies to address the issues listed above. We believe that these additional experiments strengthened our findings.

      - I suggest that the authors avoid using the term "vesicles" to describe the plastid-derived puncta, since it doesn't seem like coat proteins are required for their formation. I suggest "puncta" or similar terms. 

      We replaced the term “vesicles” with “puncta” or other suitable terms, as suggested.

      References for response to reviewers

      Abreu ME, Munné-Bosch S (2009) Salicylic acid deficiency in transgenic lines and mutants increases seed yield in the annual plant. J Exp Bot 60: 1261-1271.

      Boelter B, Mitterreiter MJ, Schwenkert S, Finkemeier I, Kunz HH (2020) The topology of plastid inner envelope potassium cation efflux antiporter KEA1 provides new insights into its regulatory features. Photosynth Res 145: 43-54.

      Brunkard JO, Runkel AM, Zambryski PC (2015) Chloroplasts extend stromules independently and in response to internal redox signals. Proc Natl Acad Sci U S A 112: 10044-10049.

      Caplan JL, Kumar AS, Park E, Padmanabhan MS, Hoban K, Modla S, Czymmek K, Dinesh-Kumar SP (2015) Chloroplast stromules function during innate immunity. Dev Cell 34: 45-57.

      Delaney TP, Uknes S, Vernooij B, Friedrich L, Weymann K, Negrotto D, Gaffney T, Gutrella M, Kessmann H, Ward E, Ryals J (1994) A Central Role of Salicylic-Acid in Plant-Disease Resistance. Science 266: 1247-1250.

      Hanson MR, Sattarzadeh A (2011) Stromules: Recent Insights into a Long Neglected Feature of Plastid Morphology and Function. Plant Physiol 155: 1486-1492.

      Ishida H, Yoshimoto K, Izumi M, Reisen D, Yano Y, Makino A, Ohsumi Y, Hanson MR, Mae T (2008) Mobilization of rubisco and stroma-localized fluorescent proteins of chloroplasts to the vacuole by an ATG gene-dependent autophagic process. Plant Physiol 148: 142-155.

      Kohler RH, Cao J, Zipfel WR, Webb WW, Hanson MR (1997) Exchange of protein molecules through connections between higher plant plastids. Science 276: 2039-2042.

      Kunz HH, Gierth M, Herdean A, Satoh-Cruz M, Kramer DM, Spetea C, Schroeder JI (2014) Plastidial transporters KEA1, -2, and -3 are essential for chloroplast osmoregulation, integrity, and pH regulation in. Proc Natl Acad Sci U S A 111: 7480-7485.

      Lee HN, Chacko JV, Solis AG, Chen KE, Barros JA, Signorelli S, Millar AH, Vierstra RD, Eliceiri KW, Otegui MS, Benitez-Alfonso Y (2023) The autophagy receptor NBR1 directs the clearance of photodamaged chloroplasts. Elife 12: e86030.

      Ono Y, Wada S, Izumi M, Makino A, Ishida H (2013) Evidence for contribution of autophagy to rubisco degradation during leaf senescence in Arabidopsis thaliana. Plant Cell Environ 36: 1147-1159.

      Smith AM, Stitt M (2007) Coordination of carbon supply and plant growth. Plant Cell Environ 30: 1126-1149.

      Usadel B, Blasing OE, Gibon Y, Retzlaff K, Hoehne M, Gunther M, Stitt M (2008) Global transcript levels respond to small changes of the carbon status during progressive exhaustion of carbohydrates in Arabidopsis rosettes. Plant Physiol 146: 1834-1861.

      Yoshimoto K, Jikumaru Y, Kamiya Y, Kusano M, Consonni C, Panstruga R, Ohsumi Y, Shirasu K (2009) Autophagy negatively regulates cell death by controlling NPR1-dependent salicylic acid signaling during senescence and the innate immune response in Arabidopsis. Plant Cell 21: 2914-2927.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this detailed study, Cohen and Ben-Shaul characterized the AOB cell responses to various conspecific urine samples in female mice across the estrous cycle. The authors found that AOB cell responses vary with the strains and sexes of the samples. Between estrous and non-estrous females, no clear or consistent difference in responses was found. The cell response patterns, as measured by the distance between pairs of stimuli, are largely stable. When some changes do occur, they are not consistent across strains or male status. The authors concluded that AOB detects the signals without interpreting them. Overall, this study will provide useful information for scientists in the field of olfaction.

      Strengths:

      The study uses electrophysiological recording to characterize the responses of AOB cells to various urines in female mice. AOB recording is not trivial as it requires activation of VNO pump. The team uses a unique preparation to activate the VNO pump with electric stimulation, allowing them to record AOB cell responses to urines in anesthetized animals. The study comprehensively described the AOB cell responses to social stimuli and how the responses vary (or not) with features of the urine source and the reproductive state of the recording females. The dataset could be a valuable resource for scientists in the field of olfaction.

      Weaknesses:

      (1) The figures could be better labeled.

      We revised all figures (except the model figure, Fig. 8), and among other improvements (many of which were suggested by the reviewers in other comments), added more labelling and annotation within the figures.

      (2) For Figure 2E, please plot the error bar. Are there any statistics performed to compare the mean responses?

      We added error bars (standard errors of the mean). We had not originally performed statistical comparisons between the stimuli, but now we have. The analysis of response strength now appears in a new table (Table 1).

      (3) For Figure 2D, it will be more informative to plot the percentage of responsive units.

      Done.

      (4) Could the similarity in response be explained by the similarity in urine composition? The study will be significantly strengthened by understanding the "distance" of chemical composition in different urine.

      We agree. As we wrote in the Discussion: “Ultimately, lacking knowledge of the chemical space associated with each of the stimuli, this and all the other ideas developed here remain speculative.” We note however, that chemical distance (which in itself is hard to define) will provide only part of the picture. The other part is the “projection” of chemical space on the receptor array. This is an idea that we develop in the Discussion and in Figure 8. Specifically, that it is the combination of stimulus composition, and receptor tuning properties that will determine stimulus distances in neuronal space.

      That said, a better understanding of the chemical distance is an important aspect that we are working to include in our future studies. For this dataset unfortunately, we have no such data.

      (5) If it is not possible for the authors to obtain these data first-hand, published data on MUPs and chemicals found in these urines may provide some clues.

      This comment is directly related to the previous one. Measurements about some classes of molecules may be found for some of the stimuli that we used here, but not for all. We are not aware of any single dataset that contains this information for any type of molecule across the entire stimulus set that we have used and pooling results from different studies has limited validity because of the biological and technical variability across studies. In order to reliably interpret our current recordings, it would be necessary to measure the urinary content of the very same samples that were used for stimulation. Unfortunately, we are not able to conduct this analysis at this stage.

      (6) It is not very clear to me whether the female overrepresentation is because there are truly more AOB cells that respond to females than males or because there are only two female samples but 9 male samples.

      The definitive answer to this comment is given in our response to the next one.

      Nevertheless, we agree that this is an important point. It is true that the number of neurons fulfilling each of the patterns depends on the number of individual stimuli that define it (and on the frequency of neurons that respond to those stimuli). However, our measure of “over representation” was designed to overcome this bias, by using bootstrapping to reveal if the observed number of patterns is larger than expected by chance. The higher frequency of responses to female, as compared to male stimuli, is observed in other studies by others and by us, also when the number of male and female stimuli is matched (e.g., Bansal et al BMC Biol 2021, Ben-Shaul et al, PNAS 2010, Hendrickson et al, JNS, 2008). However, here, by overrepresentation, we do not refer to the higher frequency of female-responding neurons, but rather to the fact that, given the number of responding neurons, the female pattern is more common than expected by chance.

      (7) If the authors only select two male samples, let's say ICR Naïve and ICR DOM, combine them with responses to two female samples, and do the same analysis as in Figure 3, will the female response still be overrepresented?

      Following this suggestion, we have performed this analysis, and we were glad to see that the result is the one we had anticipated. Below, we provide an image of the results, following the same approach that we applied before, and showed in Figure 3C. Here, we defined a female pattern (using the two female samples) and compared it to a male pattern (using the ICR naïve and ICR DOM as suggested). It is as if we had only four stimuli in our set. As in the article, we calculated the expected distribution with 100,000 shuffles. We denoted this pattern as F/M ICR. The results are shown below.

      Under the present conditions, the distribution of the number of female selective patterns is larger (i.e., shifted to the right; compare to the female category in Figure 3C). This is expected, since now the criterion is more permissive. Specifically, now to qualify as a “female pattern”, the two responses to female urine must be stronger only than the responses to the two male stimuli included in this analysis (and to all other responses). Notably, although the null distribution shifted to the right, the actual number of neurons fulfilling this pattern is also larger, so that the actual number remains significantly larger than expected by chance. This is also true for the reverse category (as is the case in the ~female category in Figure 3C). Thus, we conclude that overrepresentation of the female pattern is not a trivial consequence of the number of male and female stimuli.

      Author response image 1.

      (8) In Figure 4B and 4C, the pairwise distance during non-estrus is generally higher than that during estrus, although they are highly correlated. Does it mean that the cells respond to different urines more distinctively during diestrus than in estrus?

      This is an important observation (!) and we had originally overlooked it. It is true that higher distances (as they are in non-estrus) imply more distinct population-level responses and hence better discrimination among stimuli. However, this is inconsistent with all our other analyses, which do not point to enhanced selectivity or discrimination in either state. If anything, we find somewhat higher sparseness in estrus. Yet, there may be technical explanations for the differences.

      For Euclidean distances, the explanation may be trivial. The distance depends on the number of dimensions (i.e., units), and since our sample contains more neurons recorded during non-estrus, the larger distance is expected.
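      As a rough worked illustration of this dependence (the symbols are introduced here for the example, and it assumes that the mean squared per-unit response difference, written $\bar{\delta}^2$, is the same in both states), the expected squared distance grows linearly with the number of units $n$:

```latex
d_n(\mathbf{x}, \mathbf{y}) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2},
\qquad
\mathbb{E}\!\left[d_n^2\right] = n\,\bar{\delta}^2
\;\Rightarrow\;
d_n \propto \sqrt{n},
\qquad
\sqrt{\frac{305}{241}} \approx 1.12 .
```

      Under that assumption, the difference in sample size alone (305 non-estrus versus 241 estrus neurons) would be expected to inflate non-estrus Euclidean distances by roughly 12%.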

      In fact, there is a similar dependence on sample size for the correlation distance. Smaller samples are associated with higher (spurious) correlations, and hence larger samples are associated with larger distances. To demonstrate this, we conducted a simple simulation, in which we calculated the absolute correlation coefficients of random samples from standard normal distributions (using the MATLAB function randn) while changing the size of the population. For each sample size, we conducted 1000 tests. We considered sample sizes from 10 to 100000, including 200 and 300 (which are similar to our sample sizes). The results are shown in the figure below. Note that the absolute value of the correlation coefficient decreases with sample size, while the p-value for the observed correlation is stable at ~0.5.

      While this is not a rigorous analysis of this issue, and while it does not exactly reflect the scenario in our data, where correlations are generally positive, it shows that the observed correlation (and hence correlation distance) is also affected by sample size.
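      A simulation of this kind can be sketched in a few lines. The version below is a Python illustration rather than the MATLAB code used for the figure; the function name, the default seed, and the use of `numpy.corrcoef` are assumptions made for the example, and p-values are omitted to keep it dependency-free.

```python
import numpy as np


def mean_abs_correlation(sample_size, n_tests=1000, seed=0):
    """Mean |Pearson r| between pairs of independent standard-normal samples."""
    rng = np.random.default_rng(seed)
    abs_r = np.empty(n_tests)
    for i in range(n_tests):
        # Two unrelated vectors: any correlation between them is spurious.
        x, y = rng.standard_normal((2, sample_size))
        abs_r[i] = abs(np.corrcoef(x, y)[0, 1])
    return float(abs_r.mean())
```

      Calling `mean_abs_correlation(n)` for increasing `n` (for example 10, 200, 300, and 1000) should show the mean absolute correlation shrinking roughly in proportion to one over the square root of `n`, so that the correlation distance, one minus the correlation, grows with sample size.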

      For these reasons, we focus on comparison of these distances, rather than the absolute values of the correlation distances.

      Author response image 2.

      Following this comment, we now write in the manuscript:

      “We first note that distances are generally larger during non-estrus, suggesting enhanced discrimination during this stage. However, further analyses of sparseness and selectivity do not support this idea (see below). Furthermore, we note that both Euclidean and correlation distances generally depend on sample size. In both cases, distances are expected to increase as a function of sample size, which in our dataset, is larger for the non-estrus (n = 305) as compared to the estrus (n = 241) neurons. Because of this factor, we focus here on the similarity of the relative within-state distances across the states (and not on their absolute magnitudes). Specifically, we find a positive and significant correlation among pairwise population distances under the two states. Thus, at the population level, representational space remains broadly stable across the estrus cycle. Nevertheless, several points in Fig. 4D, E clearly diverge from a linear relationship, implying that representational space differs under the two states. We next examine such state-dependent changes in more detail.”

      (9) The correlation analysis is not entirely intuitive when just looking at the figures. Some sample heatmaps showing the response differences between estrous states will be helpful.

      If we understand correctly, the idea is to show the correlation matrices from which the values in 4B and 4C are taken. The relevant images are now included in Figure 4B, C and are referenced within the main text.

      Reviewer #2 (Public review):

      Summary:

      Many aspects of the study are carefully done, and in the grand scheme this is a solid contribution. I have no "big-picture" concerns about the approach or methodology. However, in numerous places the manuscript is unnecessarily vague, ambiguous, or confusing. Tightening up the presentation will magnify their impact.

      We have reviewed the text and made substantial edits. Along with the changes made in response to other specific comments by both reviewers, we hope that these edits improve the presentation.

      Strengths:

      (1) The study includes urine donors from males of three strains each with three social states, as well as females in two states. This diversity significantly enhances their ability to interpret their results.

      (2) Several distinct analyses are used to explore the question of whether AOB MCs are biased towards specific states or different between estrus and non-estrus females. The results of these different analyses are self-reinforcing about the main conclusions of the study.

      (3) The presentation maintains a neutral perspective throughout while touching on topics of widespread interest.

      Weaknesses:

      (1) Introduction:

      The discussion of the role of the VNS and preferences for different male stimuli should perhaps include Wysocki and Lepri 1991

      We assume that the reviewer is referring to “Consequences of removing the vomeronasal organ” by Wysocki CJ, Lepri JJ, a review article in J Steroid Biochem from 1991. We were not familiar with this specific article and have now read it. The article discusses various male behaviors, and some effects on female behavior and physiology (e.g., puberty acceleration, maternal behaviors, ovulation), but we could not find any mention of the preference of female mice in this article. We also expanded our search to all PubMed articles authored by Wysocki and Lepri and then to all articles by Wysocki (with the keyword Vomeronasal). Despite our best intentions to give due credit, we found nothing that seems directly related to this statement. Please correct us if we have missed anything.

      (2) Results:

      a) Given the 20s gap between them, the distinction between sample application and sympathetic nerve trunk stimulation needs to be made crystal clear; in many places, "stimulus application" is used in places where this reviewer suspects they actually mean sympathetic nerve trunk stimulation.

      We realize that this is confusing, and we also agree that in at least one place we were not sufficiently clear about the distinction. To clarify, we distinguish between stimulus application (physical application of the stimulus to the nostril) and stimulation (which refers to SNT stimulation, which typically induces VNO suction). The general term stimulus presentation refers to the entire process. As explained in the text, in our analysis, we consider the entire window starting at application and ending 40s after stimulation. This is because we sometimes observe immediate responses following application. One such response is seen in Figure 2D, and this is directly related to a detailed comment made below (on Figure 1D, part c). Indeed, for this figure time 0 indicates stimulus application. This was indicated previously, but we have now rearranged the order of the panels to make the distinction between this response and the others clearer. We have also revised the figure caption and the text to clarify this issue.

      b) There appears to be a mismatch between the discussion of Figure 3 and its contents. Specifically, there is an example of an "adjusted" pattern in 3A, not 3B.

      True. We have revised the text to correctly refer to the figure. Thanks.

      c) The discussion of patterns neglects to mention whether it's possible for a neuron to belong to more than one pattern. For example, it would seem possible for a neuron to simultaneously fit the "ICR pattern" and the "dominant adjusted pattern" if, e.g., all ICR responses are stronger than all others, but if simultaneously within each strain the dominant male causes the largest response.

      This is true. In the legend to Figure 3B, we actually wrote: “A neuron may fulfill more than one pattern and thus may appear in more than one row.”, but we now also write in the main text:

      “We note that criteria for adjusted patterns are less stringent than for the standard patterns defined above. Furthermore, some patterns are not mutually exclusive, and thus, a neuron may fulfil more than a single pattern.”

      (3) Discussion:

      a) The discussion of chemical specificity in urine focuses on volatiles and MUPs (citation #47), but many important molecules for the VNS are small, nonvolatile ligands. For such molecules, the corresponding study is Fu et al 2015.

      Agreed. We now cite this work and several others that were not included before in the context of chemical and electrophysiological analyses.

      b) "Following our line of reasoning, this scarcity may represent an optimal allocation of resources to separate dominant from naïve males": 1 unit out of 215 is roughly consistent with a single receptor. Surely little would be lost if there could be more computational capacity devoted to this important axis than that? It seems more likely that dominance is computed from multiple neuronal types with mixed encoding.

      We fully agree, and we are not claiming that dominance, nor any other feature, is derived using dedicated feature selective neurons. Our discussion of resource allocation is inevitably speculative. Our main point in this context is that a lack of overrepresentation does not imply that a feature is not important. As a note, we do not think that there is good reason to suppose that AOB neurons reflect the activity of single receptors.

      To prevent this potential confusion, we have now added the following sentences in the Discussion subsection titled “Response patterns of AOB-MCs”:

      “We stress that we do not suggest that features such as physiological state are encoded by the activity of single neurons. In fact, we believe that most ethologically relevant features are encoded by the activity of multiple neurons. Nevertheless, such population level representations ultimately depend on the response properties of individual neurons, and we thus ask: what can we learn from our analysis of response pattern frequency?”

      (4) Methods:

      a) Male status, "were unambiguous in most cases": is it possible to put numerical estimates on this? 55% and 99% are both "most," yet they differ substantially in interpretive uncertainty.

      Upon reexamination, we realized that this sentence is incorrect. Ambiguous cases were not considered as dominant for urine collection. We only classified mice as dominant if they “won” the tube test and exhibited dominant behavior in the subsequent observation period in the cage. The phrasing has now been corrected in the manuscript (Methods section).

      b) Surgical procedures and electrode positioning: important details of probes are missing (electrode recording area, spacing, etc).

      This information has been added to the Methods subsection “Surgical procedures and electrode positioning”

      c) Stimulus presentation procedure: Are stimuli manually pipetted or delivered by apparatus with precise timing?

      They are delivered manually. This has now been clarified in the text.

      d) Data analysis, "we applied more permissive criteria involving response magnitude": it's not clear whether this is what's spelled out in the next paragraph, or whether that's left unspecified. In either case, the next paragraph appears to be about establishing a noise floor on pattern membership, not a "permissive criterion."

      True, the next paragraph is not the explanation for the more permissive criteria. The more permissive criteria involving response magnitude are actually those described in Figure 3A and 3B. The sentence that was quoted above merely states that before applying those criteria, we had also searched for patterns defined by binary designation of neurons as responsive, or not responsive, to each of the stimuli (this is directly related to the next comment below). Using those binary definitions, we obtained a very small number of neurons for each pattern and thus decided to apply the approach actually used and described in the manuscript.

      To resolve this confusion, we thoroughly revised this paragraph and the beginning of the next one in the Methods section.

      e) Data analysis, method for assessing significance: there's a lot to like about the use of pooling to estimate the baseline and the use of an ANOVA-like test to assess unit responsiveness.

      But:

      i) for a specific stimulus, at 4 trials (the minimum specified in "Stimulus presentation procedure") kruskalwallis is questionable. They state that most trials use 5, however, and that should be okay.

      The exact values are now given in the text. The mean number of repeated presentations per stimulus was 5.1 ± 0.9 (mean ± SD). In 72% of the cases, stimuli were presented 5 or more times; otherwise, they were presented 4 times. In the context of the statistical test, we note that we are not comparing 5 (or 4) values with another set of 5 (or 4) values, but with a much larger sample (~44-55 baseline trials, given 11 stimuli and 4-5 repeats of each). Under this scenario, we think that the statistical approach is sound. However, the more important consideration, in our opinion, is given below.

      ii) the methods statement suggests they are running kruskalwallis individually for each neuron/stimulus, rather than once per neuron across all stimuli. With 11 stimuli, there is a substantial chance of a false-positive if they used p < 0.05 to assess significance. (The actual threshold was unstated.) Were there any multiple comparison corrections performed? Or did they run kruskalwallis on the neuron, and then if significant assess individual stimuli? (Which is a form of multiple-comparisons correction.)

      First, we indeed failed to mention that our significance criterion was 0.05. This has been corrected by adding the information to the Results and Methods sections. We did not apply any multiple comparison corrections. We consider each neuron-stimulus pair as an independent entity, and we are aware that this leads to a higher false positive rate. On the other hand, applying such corrections would be problematic, because the number of stimuli varies across studies, so corrected thresholds would impose different response criteria in different studies. This raises the almost philosophical question regarding the use of multiple comparison corrections (as well as one- and two-tailed tests), but practically, most, if not all, of our conclusions involve comparisons across conditions. For this purpose, we think that our procedure is valid. More generally, while selection of responses according to significance has some obvious advantages, the decision to use any particular criterion is entirely arbitrary. Therefore, we do not attach any special meaning to the significance threshold used here. Rather, we think of it as a simple criterion that allows us to exclude weakly responding or non-responsive neurons, and to compare the frequencies of neurons that fulfill this criterion under different conditions and contexts.
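      For reference, the size of the effect under discussion can be computed directly. The short Python sketch below assumes that the 11 per-stimulus tests for a neuron are independent, which is an idealization, and the function names are introduced only for this example.

```python
def familywise_error_rate(alpha, n_tests):
    """Probability of at least one false positive across independent tests."""
    return 1.0 - (1.0 - alpha) ** n_tests


def bonferroni_threshold(alpha, n_tests):
    """Per-test threshold that keeps the family-wise rate at or below alpha."""
    return alpha / n_tests


# Values from the exchange above: 11 stimuli, per-test criterion of 0.05.
print(round(familywise_error_rate(0.05, 11), 3))  # prints 0.431
print(round(bonferroni_threshold(0.05, 11), 4))   # prints 0.0045
```

      Under this independence assumption, a completely unresponsive neuron would pass the uncorrected criterion for at least one of 11 stimuli about 43% of the time, whereas a Bonferroni-style threshold would change with the size of the stimulus set, which is the study-to-study dependence noted in the response.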

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      Results:

      "are represented more than represented by chance" seems to have a misplaced word

      True. Thanks. Corrected.

      Figure 1D:

      a) Indicate the meaning of the number that appears in the top left for each unit (10, 5, 40, 5, 5) (I'm guessing it's the vertical scale for the PSTH, but best to spell it out explicitly.)

      This information has been added.

      b) "The red vertical line indicates stimulus application": is it the application of the chemical stimulus or SNT shock?

      Please see our answer to c

      c) "For unit 2, time 0 indicate stimulus application, as in this case, responses began after stimulus application, prior to stimulation." First, the meaning of time 0 for the other units is not clearly specified (we infer that unit 2 is an exception, but we don't know what most of them mean). Second, it seems as if the response (?) to ICR naive begins even before stimulus application.

      This issue was also mentioned above as the 2nd weakness raised by this reviewer. To explain the meaning of the red lines and resolve this confusion, we revised the figure caption to indicate that for all units (except the former unit 2) time 0 indicates SNT stimulation. We also changed the order of the unit examples, placing the former unit 2 in the rightmost position. It is true that for this unit, there is a firing rate change prior to stimulus application, which actually appears as rate attenuation following stimulus application. In this specific case, we consider this activity as “noise”, and note that this neuron-stimulus combination would not be classified as a response (since there is no consistent change across stimulus presentations).

      As a note, while reviewing this figure, we noted an error. We have previously written that the ITI was 10 s, whereas it was actually 18 s long. This has been corrected in the Figure and in the text.

      Figure 2B:

      "The mean error due to the reduced 2-D representation is 0.29 (arbitrary units)." This is unclear. MDS is often described in terms of % of variance explained, is that what this means? If so, the units are not arbitrary; otherwise, it's unclear whether specifying a value with arbitrary units adds any value.

      This is a very good point, and we thank the reviewer for identifying this mistake. The units are not arbitrary! They are units of correlation distance. We have now added a scale bar (a square) to panel 2B to indicate a distance of 0.1. Following this comment, we also calculated the mean of the original distances, and noted the ratio between the mean absolute error (due to considering only two dimensions) and the mean original distance. We also now report the values of the first two eigenvalues. Specifically, we now write:

      “Note that like all dimensionally reduced representations, the representation in Fig. 2B is an approximation. Here, the first two eigenvalues account for 44.6% of the variance of the original distances (30.4% and 14.2%, respectively for the first and second dimension). Another way to evaluate the representation is via the mean error due to the reduced 2-D representation. Here, it is 0.29, whereas the mean of the original distances is 0.73.”
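      The two quantities reported in this passage, the share of variance carried by the leading eigenvalues and the mean error of the 2-D embedding, can be illustrated with a short Python sketch. It assumes classical (Torgerson) MDS, which may differ from the MDS variant used for Fig. 2B, and all function names are introduced only for this example.

```python
import numpy as np


def classical_mds(dist, n_components=2):
    """Classical MDS: embed a symmetric distance matrix via eigendecomposition."""
    n = dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (dist ** 2) @ centering  # double-centered matrix
    eigvals, eigvecs = np.linalg.eigh(gram)
    order = np.argsort(eigvals)[::-1]                  # largest eigenvalue first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    positive = np.clip(eigvals, 0.0, None)             # ignore negative eigenvalues
    coords = eigvecs[:, :n_components] * np.sqrt(positive[:n_components])
    explained = positive[:n_components] / positive.sum()
    return coords, explained


def pairwise_euclidean(points):
    """Euclidean distance between every pair of rows."""
    diff = points[:, None, :] - points[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def mean_abs_error(dist, coords):
    """Mean absolute difference between original and embedded distances."""
    upper = np.triu_indices_from(dist, k=1)
    return float(np.mean(np.abs(dist[upper] - pairwise_euclidean(coords)[upper])))
```

      Dividing the value returned by `mean_abs_error` by the mean of the original distances gives a unit-free ratio of the kind described in the response (0.29 against a mean distance of 0.73 in the manuscript).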

      Figure 3A:

      a) There is a truncated label (or something) above the panel letter.

      Thanks. Corrected. This was part of the “Figure” label.

      b) The graphic for the "adjusted pattern" also fits the criterion of the "pattern": for example, in the top row the activity for ICR is still higher than for any other stimulus, thus fulfilling the criterion of a "pattern" and not just an "adjusted pattern."

      That was not our intention. An adjusted pattern does not necessarily fulfill the (non-adjusted) “pattern” (while the opposite is true). We have now revised the rightmost panel in Figure 3A, adding both “&s” to indicate that all three conditions must be fulfilled and, in an attempt at a more intuitive representation, applying a different background to denote stimuli with irrelevant responses. We also changed the terms in the legend within the panel to make them more accurate (thus, “strong activity” was changed to “stronger responses”). In addition, we revised the text and figure legends in an attempt to better clarify these definitions.

      Figure 3B:

      I'm assuming that the columns of the heatmap correspond to different urine stimuli, and that the color is normalized firing rate. But readers should not have to guess.

      True, and agreed. We added legends to clarify this.

      Figure 4B:

      The caption should mention that the pairwise measures are between the stimulus columns of panel A.

      We revised the caption to indicate this. Note that we also added two additional panels to this figure.

      Figure 5A&B:

      Instead of a multiple-comparisons correction, it seems likely to be better to use a 2-way ANOVA. At a minimum, the nature of the multiple-comparisons correction needs to be specified (many are conservative, but they differ in the extent of how conservative they are).

      We now write in the text that we used a Bonferroni correction (this information previously appeared only in the caption). We also found an error in the caption. We previously wrote that we used a binomial exact test for both panels A and B. However, only the data in panel A was calculated with a binomial exact test. The data in panel B was calculated with a one-way ANOVA.

      We now also applied a 2-way ANOVA to response magnitudes (i.e., panel B). We find a main effect of stimulus, but not of state, and no effect of interaction between the two. This is consistent with our previous analyses. This analysis is now included in the text. We thank the reviewer for this suggestion.

      Editor's note:

      Should you choose to revise your manuscript, if you have not already done so, please include full statistical reporting including exact p-values wherever possible alongside the summary statistics (test statistic and df) and, where appropriate, 95% confidence intervals. These should be reported for all key questions and not only when the p-value is less than 0.05 in the main manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews

      We thank the Reviewers for their thorough reading and thoughtful feedback. Below, we address each of the concerns raised in the public reviews, and outline our revisions that aim to further clarify and strengthen the manuscript.

      In our response, we clarify our conceptualization of elasticity as a dimension of controllability, formalizing it within an information-theoretic framework, and demonstrating that controllability and its elasticity are partially dissociable. Furthermore, we provide clarifications and additional modeling results showing that our experimental design and modeling approach are well-suited to dissociating elasticity inference from more general learning processes, and are not inherently biased to find overestimates of elasticity. Finally, we clarify the advantages and disadvantages of our canonical correlation analysis (CCA) approach for identifying latent relationships between multidimensional data sets, and provide additional analyses that strengthen the link between elasticity estimation biases and a specific psychopathology profile. 

      Public Reviews:

      Reviewer 1 (Public review): 

      This research takes a novel theoretical and methodological approach to understanding how people estimate the level of control they have over their environment, and how they adjust their actions accordingly. The task is innovative and both it and the findings are well-described (with excellent visuals). They also offer thorough validation for the particular model they develop. The research has the potential to theoretically inform the understanding of control across domains, which is a topic of great importance.

      We thank the Reviewer for their favorable appraisal and valuable suggestions, which have helped clarify and strengthen the study’s conclusion. 

      An overarching concern is that this paper is framed as addressing resource investments across domains that include time, money, and effort, and the introductory examples focus heavily on effort-based resources (e.g., exercising, studying, practicing). The experiments, though, focus entirely on the equivalent of monetary resources - participants make discrete actions based on the number of points they want to use on a given turn. While the same ideas might generalize to decisions about other kinds of resources (e.g., if participants were having to invest the effort to reach a goal), this seems like the kind of speculation that would be better reserved for the Discussion section rather than using effort investment as a means of introducing a new concept (elasticity of control) that the paper will go on to test.

      We thank the Reviewer for pointing out a lack of clarity regarding the kinds of resources tested in the present experiment. Investing additional resources in the form of extra tickets did not only require participants to pay more money. It also required them to invest additional time – since each additional ticket meant making another attempt to board the vehicle, extending the duration of the trial, and attentional effort – since every attempt required precisely timing a spacebar press as the vehicle crossed the screen. Given this involvement of money, time, and effort resources, we believe it would be imprecise to present the study as concerning monetary resources in particular. That said, we agree with the Reviewer that results might differ depending on the resource type that the experiment or the participant considers most. Thus, we now clarify the kinds of resources the experiment involved (lines 87-97): 

      “To investigate how people learn the elasticity of control, we allowed participants to invest different amounts of resources in attempting to board their preferred vehicle. Participants could purchase one (40 coins), two (60 coins), or three tickets (80 coins) or otherwise walk for free to the nearest location. Participants were informed that a single ticket allowed them to board only if the vehicle stopped at the station, while additional tickets provided extra chances to board even after the vehicle had left the platform. For each additional ticket, the chosen vehicle appeared moving from left to right across the screen, and participants could attempt to board it by pressing the spacebar when it reached the center of the screen. Thus, each additional ticket could increase the chance of boarding but also required a greater investment of resources—decreasing earnings, extending the trial duration, and demanding attentional effort to precisely time a button press when attempting to board.”

      In addition, in the revised discussion, we now highlight the open question of whether inferences concerning the elasticity of control generalize across different resource domains (lines 341-348):

      “Another interesting possibility is that individual elasticity biases vary across different resource types (e.g., money, time, effort). For instance, a given individual may assume that controllability tends to be highly elastic to money but inelastic to effort. Although the task incorporated multiple resource types (money, time, and attentional effort), the results may differ depending on the type of resources on which the participant focuses. Future studies could explore this possibility by developing tasks that separately manipulate elasticity with respect to different resource types. This would clarify whether elasticity biases are domain-specific or domain-general, and thus elucidate their impact on everyday decision-making.”

      Setting aside the framing of the core concepts, my understanding of the task is that it effectively captures people's estimates of the likelihood of achieving their goal (Pr(success)) conditional on a given investment of resources. The ground truth across the different environments varies such that this function is sometimes flat (low controllability), sometimes increases linearly (elastic controllability), and sometimes increases as a step function (inelastic controllability). If this is accurate, then it raises two questions.

      First, on the modeling front, I wonder if a suitable alternative to the current model would be to assume that the participants are simply considering different continuous functions like these and, within a Bayesian framework, evaluating the probabilistic evidence for each function based on each trial's outcome. This would give participants an estimate of the marginal increase in Pr(success) for each ticket, and they could then weigh the expected value of that ticket choice (Pr(success)*150 points) against the marginal increase in point cost for each ticket. This should yield similar predictions for optimal performance (e.g., opt-out for lower controllability environments, i.e., flatter functions), and the continuous nature of this form of function approximation also has the benefit of enabling tests of generalization to predict changes in behavior if there was, for instance, changes in available tickets for purchase (e.g., up to 4 or 5) or changes in ticket prices. Such a model would of course also maintain a critical role for priors based on one's experience within the task as well as over longer timescales, and could be meaningfully interpreted as such (e.g., priors related to the likelihood of success/failure and whether one's actions influence these). It could also potentially reduce the complexity of the model by replacing controllability-specific parameters with multiple candidate functions (presumably learned through past experience, and/or tuned by experience in this task environment), each of which is being updated simultaneously.

      We thank the Reviewer for suggesting this interesting alternative modeling approach. We agree that a Bayesian framework evaluating different continuous functions could offer advantages, particularly in its ability to generalize to other ticket quantities and prices. To test the Reviewer's suggestion, we implemented a Bayesian model where participants continuously estimate both controllability and its elasticity as a mixture of three archetypal functions mapping ticket quantities to success probabilities. The flat function provides no control regardless of how many tickets are purchased (corresponding to low controllability). The step function provides the same level of control as long as at least one ticket is purchased (inelastic controllability). The linear function increases control proportionally with each additional ticket (elastic controllability). The model computes the likelihood that each of the functions produced each new observation, and accordingly updates its beliefs. Using these beliefs, the model estimates the probability of success for purchasing each number of tickets, allowing participants to weigh expected control against increasing ticket costs. Despite its theoretical advantages for generalization to different ticket quantities, this continuous function approximation model performed significantly worse than our elastic controllability model (log Bayes Factor > 4100 on combined datasets). We surmise that the main advantage offered by the elastic controllability model is that it does not assume a linear increase in control as a function of resource investment – even though this linear relationship was actually true in our experiment and is required for generalizing to other ticket quantities, it likely does not match what participants were doing. We present these findings in a new section ‘Testing alternative methods’ (lines 686-701):

      “We next examined whether participant behavior would be better characterized as a continuous function approximation rather than the discrete inferences in our model. To test this, we implemented a Bayesian model where participants continuously estimate both controllability and its elasticity as a mixture of three archetypal functions mapping ticket quantities to success probabilities. The flat function provides no control regardless of how many tickets are purchased (corresponding to low controllability). The step function provides full control as long as at least one ticket is purchased (inelastic controllability). The linear function linearly increases control with the number of extra tickets (i.e., 0%, 50%, and 100% control for 1, 2, and 3 tickets, respectively; elastic controllability). The model computes the likelihood that each of the functions produced each new observation, and accordingly updates its beliefs. Using these beliefs, the model estimates the probability of success for purchasing each number of tickets, allowing participants to weigh expected control against increasing ticket costs. Despite its theoretical advantages for generalization to different ticket quantities, this continuous function approximation model performed significantly worse than the elastic controllability model (log Bayes Factor > 4100 on combined datasets), suggesting that participants did not assume that control increases linearly with resource investment.”
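      The logic of such a function-mixture model can be sketched as follows. This is a deliberately simplified Python illustration and not the model that was fitted: it performs a discrete Bayesian update over the three archetypal functions, the `CHANCE` value and the mapping from control to success probability are hypothetical, and only the control levels, ticket prices (40, 60, 80 coins), and the 150-point reward are taken from the text above.

```python
import numpy as np

HYPOTHESES = ("flat", "step", "linear")
# Control level offered by 1, 2, or 3 tickets under each archetypal function.
CONTROL = {
    "flat": np.array([0.0, 0.0, 0.0]),    # low controllability
    "step": np.array([1.0, 1.0, 1.0]),    # inelastic controllability
    "linear": np.array([0.0, 0.5, 1.0]),  # elastic controllability
}
CHANCE = 0.2                         # hypothetical success rate without control
TICKET_COST = np.array([40, 60, 80])
REWARD = 150


def success_prob(control):
    """Assumed mapping: control guarantees success, otherwise chance applies."""
    return control + (1.0 - control) * CHANCE


def update(posterior, tickets, success):
    """Bayes update of the belief over hypotheses after one trial outcome."""
    lik = np.array([success_prob(CONTROL[h][tickets - 1]) for h in HYPOTHESES])
    if not success:
        lik = 1.0 - lik
    unnormalized = posterior * lik
    return unnormalized / unnormalized.sum()


def predicted_success(posterior):
    """Belief-weighted success probability for buying 1, 2, or 3 tickets."""
    per_hypothesis = np.stack([success_prob(CONTROL[h]) for h in HYPOTHESES])
    return posterior @ per_hypothesis


def expected_net_value(posterior):
    """Expected reward minus ticket cost for each ticket quantity."""
    return predicted_success(posterior) * REWARD - TICKET_COST
```

      Starting from a uniform belief and repeatedly calling `update` with observed outcomes shifts weight toward whichever function best explains them, and `expected_net_value` then shows how that belief would trade expected control against ticket cost.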

      We also refer to this analysis in our updated discussion (326-339): 

      “Second, future models could enable generalization to levels of resource investment not previously experienced. For example, controllability and its elasticity could be jointly estimated via function approximation that considers control as a function of invested resources. Although our implementation of this model did not fit participants’ choices well (see Methods), other modeling assumptions or experimental designs may offer a better test of this idea.”

      Second, if the reframing above is apt (regardless of the best model for implementing it), it seems like the taxonomy being offered by the authors risks a form of "jangle fallacy," in particular by positing distinct constructs (controllability and elasticity) for processes that ultimately comprise aspects of the same process (estimation of the relationship between investment and outcome likelihood). Which of these two frames is used doesn't bear on the rigor of the approach or the strength of the findings, but it does bear on how readers will digest and draw inferences from this work. It is ultimately up to the authors which of these they choose to favor, but I think the paper would benefit from some discussion of a common-process alternative, at least to prevent too strong of inferences about separate processes/modes that may not exist. I personally think the approach and findings in this paper would also be easier to digest under a common-construct approach rather than forcing new terminology but, again, I defer to the authors on this.

      We acknowledge the Reviewer's important point about avoiding a potential "jangle fallacy." We entirely agree with the Reviewer that elasticity and controllability inferences are not distinct processes. Specifically, we view resource elasticity as a dimension of controllability, hence the name of our ‘elastic controllability’ model. In response to this and other Reviewers’ comments, in the revised manuscript, we now offer a formal definition of elasticity as the reduction in uncertainty about controllability due to knowing the amount of resources available to the agent (lines 16-20; see further details in response to Reviewer 3 below).  

      With respect to how this conceptualization is expressed in the modeling, we note that the representation in our model of maximum controllability and its elasticity via different variables is analogous to how a distribution may be represented by separate mean and variance parameters. Even the model suggested by the Reviewer required a dedicated variable representing elastic controllability, namely the probability of the linear controllability function. More generally, a single-process account allows that different aspects of the said process would be differently biased (e.g., one can have an accurate estimate of the mean of a distribution but overestimate its variance). Therefore, our characterization of distinct elasticity and controllability biases (or to put it more accurately, 'elasticity of controllability bias' and 'maximum controllability bias') is consistent with a common construct account.

      To avoid misunderstandings, we have now modified the text to clarify that we view elasticity as a dimension of controllability that can only be estimated in conjunction with controllability. Here are a few examples:

      Lines 21-28: “While only controllable environments can be elastic, the inverse is not necessarily true – controllability can be high, yet inelastic to invested resources – for example, choosing between bus routes affords equal control over commute time to anyone who can afford the basic fare (Figure 1; Supplementary Note 1). That said, since all actions require some resource investment, no controllable environment is completely inelastic when considering the full spectrum of possible agents, including those with insufficient resources to act (e.g., those unable to purchase a bus fare or pay for a fixed-price meal).”

      Lines 45-47: “Experimental paradigms to date have conflated overall controllability and its elasticity, such that controllability was either low or elastic[16-20]. The elasticity of control, however, must be dissociated from overall controllability to accurately diagnose mismanagement of resources.”

      Lines 70-72: “These findings establish elasticity as a crucial dimension of controllability that guides adaptive behavior, and a computational marker of control-related psychopathology.”

      Lines 87-88: “To investigate how people learn the elasticity of control, we allowed participants to invest different amounts of resources in attempting to board their preferred vehicle.”

      Reviewer 2 (Public review):

      This research investigates how people might value different factors that contribute to controllability in a creative and thorough way. The authors use computational modeling to try to dissociate "elasticity" from "overall controllability," and find some differential associations with psychopathology. This was a convincing justification for using modeling above and beyond behavioral output and yielded interesting results. Interestingly, the authors conclude that these findings suggest that biased elasticity could distort agency beliefs via maladaptive resource allocation. Overall, this paper reveals some important findings about how people consider components of controllability.

      We appreciate the Reviewer's positive assessment of our findings and computational approach to dissociating elasticity and overall controllability.

      The primary weakness of this research is that it is not entirely clear what is meant by "elastic" and "inelastic" and how these constructs differ from existing considerations of various factors/calculations that contribute to perceptions of and decisions about controllability. I think this weakness is primarily an issue of framing, where it's not clear whether elasticity is, in fact, theoretically dissociable from controllability. Instead, it seems that the elements that make up "elasticity" are simply some of the many calculations that contribute to controllability. In other words, an "elastic" environment is inherently more controllable than an "inelastic" one, since both environments might have the same level of predictability, but in an "elastic" environment, one can also partake in additional actions to have additional control over achieving the goal (i.e., expend effort, money, time).

      We thank the Reviewer for highlighting the lack of clarity about the concept of elasticity. We first clarify that elasticity cannot be entirely dissociated from controllability because it is a dimension of controllability. If no controllability is afforded, then there cannot be elasticity or inelasticity. This is why in describing the experimental environments, we only label high-controllability, but not low-controllability, environments as ‘elastic’ or ‘inelastic’. For further details on this conceptualization of elasticity, and associated revisions of the text, see our response above to Reviewer 1. 

      Second, we now clarify that controllability can also be computed without knowing the amount of resources the agent is able and willing to invest, for instance by assuming infinite resources available or a particular distribution of resource availabilities. However, knowing the agent’s available resources often reduces uncertainty concerning controllability. This reduction in uncertainty is what we define as elasticity. Since any action requires some resources, this means that no controllable environment is entirely inelastic if we also consider agents that do not have enough resources to commit any action. However, even in this case, environments can differ in the degree to which they are elastic. For further details on this formal definition, and associated revisions of the text, see our response to Reviewer 3.

      Importantly, whether an environment is more or less elastic does not fully determine whether it is more or less controllable. In particular, environments can be more controllable yet less elastic. This is true even if we allow that investing different levels of resources (i.e., purchasing 0, 1, 2, or 3 tickets) constitutes different actions, in conjunction with participants’ vehicle choices. Below, we show this using two existing definitions of controllability. 

      Definition 1, reward-based controllability[1]: If control is defined as the fraction of available reward that is controllably achievable, and we assume all participants are in principle willing and able to invest 3 tickets, controllability can be computed in the present task as:

      where P( S'= goal ∣ 𝑆, 𝐴, 𝐶 ) is the probability of reaching the treasure from present state 𝑆 when taking action A and investing C resources in executing the action. In any of the task environments, the probability of reaching the goal is maximized by purchasing 3 tickets (𝐶 = 3) and choosing the vehicle that leads to the goal (𝐴 = correct vehicle). Conversely, the probability of reaching the goal is minimized by purchasing 3 tickets (𝐶 = 3) and choosing the vehicle that does not lead to the goal (𝐴 = wrong vehicle). This calculation is thus entirely independent of elasticity, since it only considers what would be achieved by maximal resource investment, whereas elasticity consists of the reduction in controllability that would arise if the maximal available 𝐶 is reduced. Consequently, any environment where the maximum available control is higher yet varies less with resource investment would be more controllable and less elastic. 

      Note that if we also account for ticket costs in calculating reward, this will only reduce the fraction of achievable reward and thus the calculated control in elastic environments.   
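As a rough illustration of Definition 1, the sketch below assumes that reward-based control can be read as the gap between the best and worst achievable probability of reaching the goal, given a cap on how many tickets the agent can buy. That max-minus-min reading, the 20% walking success rate applied after a failed boarding, and the names `p_goal` and `controllability` are assumptions made for illustration.

```python
# Hypothetical sketch: reward-based controllability as best-minus-worst
# achievable P(goal). The functional form and all names are assumptions.

WALK_SUCCESS = 0.2  # chance of reaching the goal on foot after failing to board


def p_goal(board_prob, correct_vehicle):
    """P(S' = goal) for one action, given the probability of boarding."""
    ride = 1.0 if correct_vehicle else 0.0
    return board_prob * ride + (1 - board_prob) * WALK_SUCCESS


def controllability(board_probs, max_tickets):
    """Best minus worst achievable P(goal) with at most max_tickets tickets."""
    options = [WALK_SUCCESS]  # buying 0 tickets means walking
    for tickets in range(1, max_tickets + 1):
        for correct in (True, False):
            options.append(p_goal(board_probs[tickets], correct))
    return max(options) - min(options)


# Boarding probability by number of tickets.
inelastic = {1: 1.0, 2: 1.0, 3: 1.0}
elastic = {1: 0.0, 2: 0.5, 3: 1.0}

print(controllability(inelastic, 3), controllability(elastic, 3))
print(controllability(inelastic, 1), controllability(elastic, 1))
```

Under these assumptions both environments afford the same control when 3 tickets are available, and they differ only when the resource cap is lowered. This is the sense in which a calculation based on maximal investment is blind to elasticity.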

      Definition 2, information-theoretic controllability[2]: Here controllability is defined as the reduction in outcome entropy due to knowing which action is taken:

      I(S'; A, C | S) = H(S'|S) – H(S'|S, A, C)

      where H(S'|S) is the conditional entropy of the distribution of outcomes S' given the present state S, and H(S'|S, A, C) is the conditional entropy of the outcome given the present state, action, and resource investment. 

      To compare controllability, we consider two environments with the same maximum control:

      • Inelastic environment: If the correct vehicle is chosen, there is a 100% chance of reaching the goal state with 1, 2, or 3 tickets. Thus, out of 7 possible action-resource investment combinations, three deterministically lead to the goal state (≥1 tickets and correct vehicle choice), three never lead to it (≥1 tickets and wrong vehicle choice), and one (0 tickets) leads to it 20% of the time (since walking leads to the treasure on 20% of trials).

      • Elastic environment: If the correct vehicle is chosen, the probability of boarding it is 0% with 1 ticket, 50% with 2 tickets, and 100% with 3 tickets. Thus, out of 7 possible action-resource investment combinations, one deterministically leads to the goal state (3 tickets and correct vehicle choice), one never leads to it (3 tickets and wrong vehicle choice), one leads to it 60% of the time (2 tickets and correct vehicle choice: 50% boarding + 50% × 20% when failing to board), one leads to it 10% of the time (2 tickets and wrong vehicle choice: 50% × 20% when failing to board), and three lead to it 20% of the time (0-1 tickets).

      Here we assume a uniform prior over actions, which renders the information-theoretic definition of controllability equal to another definition termed ‘instrumental divergence’[3,4]. We note that changing the uniform prior assumption would change the results for the two environments, but that would not change the general conclusion that there can be environments that are more controllable yet less elastic. 

      Step 1: Calculating H(S'|S)

      For the inelastic environment:

      P(goal) = (3 × 100% + 3 × 0% + 1 × 20%)/7 = .46, P(non-goal) = .54

      H(S'|S) = – [.46 × log<sub>2</sub>(.46) + .54 × log<sub>2</sub>(.54)] ≈ 1 bit

      For the elastic environment:

      P(goal) = (1 × 100% + 1 × 0% + 1 × 60% + 1 × 10% + 3 × 20%)/7 = .33, P(non-goal) = .67

      H(S'|S) = – [.33 × log<sub>2</sub>(.33) + .67 × log<sub>2</sub>(.67)] = .91 bits

      Step 2: Calculating H(S'|S, A, C)

      Inelastic environment: Six action-resource investment combinations have deterministic outcomes entailing zero entropy, whereas investing 0 tickets has a probabilistic outcome (20%). The entropy for 0 tickets is: H(S'|C = 0) = – [.2 × log<sub>2</sub> (.2) + .8 × log<sub>2</sub> (.8)] = .72 bits. Since this action-resource investment combination is chosen with probability 1/7, the total conditional entropy is (1/7) × .72 ≈ .10 bits.

      Elastic environment: Two action-resource investment combinations have deterministic outcomes (3 tickets with correct/wrong vehicle), whereas the other five have probabilistic outcomes:

      2 tickets and correct vehicle (60% success):

      H(S'|A = correct, C = 2) = – [.6 × log<sub>2</sub> (.6) + .4 × log<sub>2</sub> (.4)] = .97 bits

      2 tickets and wrong vehicle (10% success):

      H(S'|A = wrong, C = 2) = – [.1 × log<sub>2</sub> (.1) + .9 × log<sub>2</sub> (.9)] = .47 bits

      0-1 tickets (20% success):

      H(S'|C = 0-1) = – [.2 × log<sub>2</sub> (.2) + .8 × log<sub>2</sub> (.8)] = .72 bits

      Thus the total conditional entropy of the elastic environment is: H(S'|S, A, C) = (1/7) × .97 + (1/7) × .47 + (3/7) × .72 = .52 bits

      Step 3: Calculating I(S'; A, C | S)

      Inelastic environment: I(S'; A, C | S) = H(S'|S) – H(S'|S, A, C) = 1 – 0.1 = .9 bits 

      Elastic environment: I(S'; A, C | S) = H(S'|S) – H(S'|S, A, C) = .91 – .52 = .39 bits

      Thus, the inelastic environment offers higher information-theoretic controllability (.9 bits) compared to the elastic environment (.39 bits). 
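The three steps above can be reproduced with a short script. This is a sketch under the same uniform prior over the 7 action-resource investment combinations; the function names `entropy` and `info_controllability` are illustrative, and the success probabilities are those listed for the two environments.

```python
from math import log2


def entropy(p):
    """Binary entropy (in bits) of an outcome with success probability p."""
    if p in (0.0, 1.0):
        return 0.0
    return -(p * log2(p) + (1 - p) * log2(1 - p))


def info_controllability(p_goal_by_option):
    """I(S'; A, C | S) under a uniform prior over action-resource options."""
    n = len(p_goal_by_option)
    marginal = sum(p_goal_by_option) / n                       # P(goal)
    conditional = sum(entropy(p) for p in p_goal_by_option) / n  # H(S'|S, A, C)
    return entropy(marginal) - conditional                     # H(S'|S) minus the above


# P(goal) for each of the 7 action-resource investment combinations.
inelastic = [1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.2]
elastic = [1.0, 0.0, 0.6, 0.1, 0.2, 0.2, 0.2]

print(round(info_controllability(inelastic), 2))
print(round(info_controllability(elastic), 2))
```

Because the script carries unrounded intermediate values, its results (about .89 and .40 bits) differ slightly in the second decimal from the hand calculation with rounded terms, but the ordering of the two environments is unchanged.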

      Of note, even if each combination of cost and success/failure to reach the goal is defined as a distinct outcome, information-theoretic controllability is still higher for the inelastic (2.81 bits) than for the elastic (2.30 bits) environment. These calculations are now included in the Supplementary materials (Supplementary Note 1). 

      In sum, for both definitions of controllability, we see that environments can be more elastic yet less controllable. We have also revised the manuscript to clarify this distinction (lines 21-28):

      “While only controllable environments can be elastic, the inverse is not necessarily true – controllability can be high, yet inelastic to invested resources – for example, choosing between bus routes affords equal control over commute time to anyone who can afford the basic fare (Figure 1; Supplementary Note 1). That said, since all actions require some resource investment, no controllable environment is completely inelastic when considering the full spectrum of possible agents, including those with insufficient resources to act (e.g., those unable to purchase a bus fare or pay for a fixed-price meal).”

      Reviewer 3 (Public review):

      A bias in how people infer the amount of control they have over their environment is widely believed to be a key component of several mental illnesses including depression, anxiety, and addiction. Accordingly, this bias has been a major focus in computational models of those disorders. However, all of these models treat control as a unidimensional property, roughly, how strongly outcomes depend on action. This paper proposes---correctly, I think---that the intuitive notion of "control" captures multiple dimensions in the relationship between action and outcome. In particular, the authors highlight the degree to which outcome depends on how much *effort* we exert, calling this dimension the "elasticity of control". They additionally propose that this dimension (rather than the more holistic notion of controllability) may be specifically impaired in certain types of psychopathology. This idea thus has the potential to change how we think about mental disorders in a substantial way, and could even help us better understand how healthy people navigate challenging decision-making problems.

      Unfortunately, my view is that neither the theoretical nor empirical aspects of the paper really deliver on that promise. In particular, most (perhaps all) of the interesting claims in the paper have weak empirical support.

      We appreciate the Reviewer's thoughtful engagement with our research and recognition of the potential significance of distinguishing between different dimensions of control in understanding psychopathology. We believe that all the Reviewer’s comments can be addressed with clarifications or additional analyses, as detailed below.  

      Starting with theory, the elasticity idea does not truly "extend" the standard control model in the way the authors suggest. The reason is that effort is simply one dimension of action. Thus, the proposed model ultimately grounds out in how strongly our outcomes depend on our actions (as in the standard model). Contrary to the authors' claims, the elasticity of control is still a fixed property of the environment. Consistent with this, the computational model proposed here is a learning model of this fixed environmental property. The idea is still valuable, however, because it identifies a key dimension of action (namely, effort) that is particularly relevant to the notion of perceived control. Expressing the elasticity idea in this way might support a more general theoretical formulation of the idea that could be applied in other contexts. See Huys & Dayan (2009), Zorowitz, Momennejad, & Daw (2018), and Gagne & Dayan (2022) for examples of generalizable formulations of perceived control.

      We thank the Reviewer for the suggestion that we formalize our concept of elasticity to resource investment, which we agree is a dimension of action. We first note that we have not argued against the claim that elasticity is a fixed property of the environment. We surmise the Reviewer might have misread our statement that “controllability is not a fixed property of the environment”. The latter statement is motivated by the observation that controllability is often higher for agents that can invest more resources (e.g., a richer person can buy more things). We clarify this in our revision of the manuscript in lines 8-15 (changes in bold): 

      “The degree of control we possess over our environment, however, may itself depend on the resources we are willing and able to invest. For example, the control a biker has over their commute time depends on the power they are willing and able to invest in pedaling. In this respect, a highly trained biker would typically have more control than a novice. Likewise, the control a diner in a restaurant has over their meal may depend on how much money they have to spend. In such situations, controllability is not fixed but rather elastic to available resources (i.e., in the same sense that supply and demand may be elastic to changing prices[14]).”

      To formalize elasticity, we build on Huys & Dayan’s definition of controllability[1] as the fraction of reward that is controllably achievable, 𝜒 (though using information-theoretic definitions[2,3] would work as well). To the extent that this fraction depends on the amount of resources the agent is able and willing to invest (max 𝐶), this formulation can be probabilistically computed without information about the particular agent involved, specifically, by assuming a certain distribution of agents with different amounts of available resources. This would result in a probability distribution over 𝜒. Elasticity can thus be defined as the amount of information obtained about controllability due to knowing the amount of resources available to the agent: I(𝜒; max 𝐶). We have added this formal definition to the manuscript (lines 15-20): 

      “To formalize how elasticity relates to control, we build on an established definition of controllability as the fraction of reward that is controllably achievable[15], 𝜒. Uncertainty about this fraction could result from uncertainty about the amount of resources that the agent is able and willing to invest, 𝑚𝑎𝑥 𝐶. Elasticity can thus be defined as the amount of information obtained about controllability by knowing the amount of available resources: 𝐼(𝜒; 𝑚𝑎𝑥 𝐶).”
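To make the definition concrete, the sketch below computes I(𝜒; max 𝐶) for two toy environments. It assumes a uniform distribution of agents over resource caps of 0 to 3 tickets, and the controllability values assigned to each cap are illustrative (they follow from treating control as the best-minus-worst achievable probability of reaching the goal, which is itself an assumption). The function name `elasticity` is hypothetical.

```python
from collections import Counter
from math import log2


def elasticity(chi_by_max_c, prior=None):
    """I(chi; max C) in bits, for a deterministic mapping max C -> chi."""
    caps = list(chi_by_max_c)
    if prior is None:
        prior = {c: 1 / len(caps) for c in caps}  # assumed uniform over agents
    p_chi = Counter()
    for c in caps:
        p_chi[chi_by_max_c[c]] += prior[c]
    # chi is fully determined by max C here, so H(chi | max C) = 0 and the
    # mutual information equals the marginal entropy H(chi).
    return -sum(p * log2(p) for p in p_chi.values() if p > 0)


# Illustrative controllability for agents able to buy at most 0..3 tickets.
inelastic = {0: 0.0, 1: 1.0, 2: 1.0, 3: 1.0}
elastic = {0: 0.0, 1: 0.0, 2: 0.5, 3: 1.0}

print(elasticity(inelastic), elasticity(elastic))
```

Under these assumptions the elastic environment yields more information (1.5 bits) than the inelastic one (about 0.81 bits). The inelastic value is not zero because agents who cannot afford any ticket have no control, which matches the point that no controllable environment is completely inelastic across the full spectrum of agents.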

      Turning to experiment, the authors make two key claims: (1) people infer the elasticity of control, and (2) individual differences in how people make this inference are importantly related to psychopathology. Starting with claim 1, there are three sub-claims here; implicitly, the authors make all three. (1A) People's behavior is sensitive to differences in elasticity, (1B) people actually represent/track something like elasticity, and (1C) people do so naturally as they go about their daily lives. The results clearly support 1A. However, 1B and 1C are not supported. Starting with 1B, the experiment cannot support the claim that people represent or track elasticity because the effort is the only dimension over which participants can engage in any meaningful decision-making (the other dimension, selecting which destination to visit, simply amounts to selecting the location where you were just told the treasure lies). Thus, any adaptive behavior will necessarily come out in a sensitivity to how outcomes depend on effort. More concretely, any model that captures the fact that you are more likely to succeed in two attempts than one will produce the observed behavior. The null models do not make this basic assumption and thus do not provide a useful comparison.

      We appreciate the Reviewer's critical analysis of our claims regarding elasticity inference, which, as detailed below, has led to an important new analysis that strengthens the study’s conclusions. However, we respectfully disagree with two of the Reviewer’s arguments. First, resource investment was not the only meaningful decision dimension in our task, since participants also needed to choose the correct vehicle to get to the right destination. That this was not trivial is evidenced by our exclusion of over 8% of participants who made incorrect vehicle choices more than 10% of the time. Included participants also occasionally erred in this choice (mean error rate = 3%, range [0-10%]; now specified in lines 363-366). 

      Second, the experimental task cannot be solved well by a model that simply tracks how outcomes depend on effort because 20% of the time participants reached the treasure despite failing to board their vehicle of choice. In such cases, reward outcomes and control were decoupled. Participants could identify when this was the case by observing the starting location (since depending on the starting location, the treasure location could have been automatically reached by walking), which was revealed together with the outcome. To determine whether participants distinguished between control-related and non-control-related reward, we have now fitted a variant of our model to the data that allows learning from each of these kinds of outcomes by means of a different free parameter. The results show that participants learned considerably more from control-related outcomes. They were thus not merely tracking outcomes, but specifically inferred when outcomes can be attributed to control. We now include this new analysis in the revised manuscript (Methods lines 648-661):

      “To ascertain that participants were truly learning latent estimates of controllability rather than simpler associations, we conducted two complementary analyses.

      First, we implemented a simple Q-learning model that directly maps ticket quantities to expected values based on reward prediction errors, without representing latent controllability. This associative model performed substantially worse than even our simple controllability model (log Bayes Factor ≥ 1854 on the combined datasets). Second, we fitted a variant of the elastic controllability model that compared learning from control-related versus chance outcomes via separate parameters (instead of assuming no learning from chance outcomes). Chance outcomes were observed by participants in the 20% of trials where reward and control were decoupled, in the sense that participants reached the treasure regardless of whether they boarded their vehicle of choice. Results showed that participants learned considerably more from control-related, as compared to chance, outcomes (mean learning ratio=1.90, CI= [1.83, 1.97]). Together, these analyses show that participants were forming latent controllability estimates rather than direct action-outcome associations.”
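For readers unfamiliar with the associative baseline, a generic Q-learning update over ticket quantities might look like the sketch below. It is not the fitted model: the learning rate, the coding of reward as 1 or 0, and the name `q_update` are assumptions, and no latent controllability is represented.

```python
def q_update(q_values, tickets, reward, learning_rate=0.1):
    """Move the value of the chosen ticket quantity toward the obtained reward."""
    prediction_error = reward - q_values[tickets]
    updated = dict(q_values)  # leave the caller's dictionary untouched
    updated[tickets] = q_values[tickets] + learning_rate * prediction_error
    return updated


# One value per ticket quantity; only the chosen quantity is updated.
q = {0: 0.0, 1: 0.0, 2: 0.0, 3: 0.0}
q = q_update(q, tickets=3, reward=1.0)
print(q)
```

Such a learner treats every rewarded trial alike, including trials where the treasure was reached by chance, which is the contrast with a model that attributes outcomes to control before learning from them.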

      Controllability inference by itself, however, still does not suffice to explain the observed behavior. This is shown by our ‘controllability’ model, which learns to invest more resources to improve control, yet still fails to capture key features of participants’ behavior, as detailed in the manuscript. This means that explaining participants’ behavior requires a model that not only infers controllability, beyond merely outcome probability, but also assumes a priori that increased effort could enhance control. Building this a priori assumption into the model amounts to embedding within it an understanding of elasticity – the idea that control over the environment may be increased by greater resource investment. 

      That being said, we acknowledge the value in considering alternative computational formulations of adaptation to elasticity, as now expressed in the revised discussion (lines 326-333; reproduced below in response to the Reviewer’s comment on updating controllability beliefs when losing with less than 3 tickets).

      For 1C, the claim that people infer elasticity outside of the experimental task cannot be supported because the authors explicitly tell people about the two notions of control as part of the training phase: "To reinforce participants' understanding of how elasticity and controllability were manifested in each planet, [participants] were informed of the planet type they had visited after every 15 trips." (line 384).

      We thank the Reviewer for highlighting this point. We agree that our experimental design does not test whether people infer elasticity spontaneously. However, our research question was whether people can distinguish between elastic and inelastic controllability. The results strongly support that they can, and this does have potential implications for behavior outside of the experimental task. Specifically, to the extent that people are aware that in some contexts additional resource investment improves control, whereas in other contexts it does not, then our results indicate that they would be able to distinguish between these two kinds of contexts through trial-and-error learning. That said, we agree that investigating whether and how people spontaneously infer elasticity is an interesting direction for future work. We have now added this to the discussion of future directions (lines 287-295):

      “Additionally, real life typically doesn’t offer the streamlined recurrence of homogenized experiences that makes learning easier in experimental tasks, nor are people systematically instructed and trained about elastic and inelastic control in each environment. These complexities introduce substantial additional uncertainty into inferences of elasticity in naturalistic settings, thus allowing more room for prior biases to exert their influences. The elasticity biases observed in the present studies are therefore likely to be amplified in real-life behavior. Future research should examine how these complexities affect judgments about the elasticity of control to better understand how people allocate resources in real-life.”

      Finally, I turn to claim 2, that individual differences in how people infer elasticity are importantly related to psychopathology. There is much to say about the decision to treat psychopathology as a unidimensional construct. However, I will keep it concrete and simply note that CCA (by design) obscures the relationship between any two variables. Thus, as suggestive as Figure 6B is, we cannot conclude that there is a strong relationship between Sense of Agency and the elasticity bias---this result is consistent with any possible relationship (even a negative one). The fact that the direct relationship between these two variables is not shown or reported leads me to infer that they do not have a significant or strong relationship in the data.

      We agree that CCA is not designed to reveal the relationship between any two variables. However, the advantage of this analysis is that it pulls together information from multiple variables. Doing so does not treat psychopathology as unidimensional. Rather, it seeks a particular dimension that most strongly correlates with different aspects of task performance.

      This is especially useful for multidimensional psychopathology data because such data are often dominated by strong correlations between dimensions, whereas the research seeks to explain the distinctions between the dimensions. Similar considerations apply to the multidimensional task parameters, which although less correlated, may still jointly predict the relevant psychopathological profile better than each parameter does in isolation. Thus, the CCA enabled us to identify a general relationship between task performance and psychopathology that accounts for different symptom measures and aspects of controllability inference. 

      Using CCA can thus reveal relationships that do not readily show up in two-variable analyses. Indeed, the direct correlation between Sense of Agency (SOA) and elasticity bias was not significant – a result that, for completeness, we now report in Supplementary Figure 3 along with all other direct correlations. We note, however, that the CCA analysis was preregistered and its results were replicated. Additionally, participants scoring higher on the psychopathology profile also overinvested resources in inelastic environments but did not futilely invest in uncontrollable environments (Figure 6A), providing external validation to the conclusion that the CCA captured meaningful variance specific to elasticity inference. Most importantly, an auxiliary analysis specifically confirmed the contributions of both elasticity bias (Figure 6D, middle plot) and, although not reported in the original paper, of the Sense of Agency score (SOA; p=.03 permutation test; see updated Figure 6D, bottom plot) to the observed canonical correlation. The results thus enable us to safely conclude that differences in elasticity inferences are significantly associated with a profile of control-related psychopathology to which SOA contributed significantly. We now report this when presenting the CCA results (lines 255-257): 

      “Loadings on the side of psychopathology were dominated by an impaired sense of agency (SOA; contribution to canonical correlation: p=.03, Figure 6D, bottom plot), along with obsessive compulsive symptoms (OCD), and social anxiety (LSAS) – all symptoms that have been linked to an impaired sense of control[22-25].”

      Finally, whereas interpretation of individual CCA loadings that were not specifically tested remains speculative, we note that the pattern of loadings largely replicated across the initial and replication studies (see Figure 6B), and aligns with prior findings. For instance, the positive loadings of SOA and OCD match prior suggestions that a lower sense of control leads to greater compensatory effort[7], whereas the negative loading for depression scores matches prior work showing reduced resource investment in depression[5-6].

      We have now revised the manuscript to clarify the justification for our analytical approach (lines 236-248):

      “To examine whether the individual biases in controllability and elasticity inference have psychopathological ramifications, we assayed participants on a range of self-report measures of psychopathologies previously linked to a distorted sense of control (see Methods, pg. 24). Examining the direct correlations between model parameters and psychopathology measures (reported in Supplementary Figure 3) does not account for the substantial variance that is typically shared among different forms of psychopathology. For this reason, we instead used a canonical correlation analysis (CCA) to identify particular dimensions within the parameter and psychopathology spaces that most strongly correlate with one another.”

      We also now include a cautionary note in the discussion (lines 309-315):

      “Whereas our pre-registered CCA effectively identified associations between task parameters and a psychopathological profile, this analysis method does not directly reveal relationships between individual variables. Auxiliary analyses confirmed significant contributions of both elasticity bias and sense of agency to the observed canonical correlation, but the contribution of other measures remains to be determined by future work. Such work could employ other established measures of agency, including both behavioral indices and subjective self-reports, to better understand how these constructs relate across different contexts and populations.”

      There is also a feature of the task that limits our ability to draw strong conclusions about individual differences in elasticity inference. As the authors clearly acknowledge, the task was designed "to be especially sensitive to overestimation of elasticity" (line 287). A straightforward consequence of this is that the resulting *empirical* estimate of estimation bias (i.e., the gamma_elasticity parameter) is itself biased. This immediately undermines any claim that references the directionality of the elasticity bias (e.g. in the abstract). Concretely, an undirected deficit such as slower learning of elasticity would appear as a directed overestimation bias. When we further consider that elasticity inference is the only meaningful learning/decision-making problem in the task (argued above), the situation becomes much worse. Many general deficits in learning or decision-making would be captured by the elasticity bias parameter. Thus, a conservative interpretation of the results is simply that psychopathology is associated with impaired learning and decision-making.

      We apologize for our imprecise statement that the task was ‘especially sensitive to overestimation of elasticity’, which justifiably led to the Reviewer’s concern that slower elasticity learning could be mistaken for an elasticity bias. To make sure this was not the case, we made use of the fact that our computational model explicitly separates bias direction (λ) from the rate of learning through two distinct parameters, which initialize the prior concentration and mean of the model’s initial beliefs concerning elasticity (see Methods pg. 23). The higher the concentration of the initial beliefs (ϵ), the slower the learning. Parameter recovery tests confirmed that our task enables acceptable recovery of both the bias λ<sub>elasticity</sub> (r=.81) and the concentration ϵ<sub>elasticity</sub> (r=.59) parameters. Importantly, the level of confusion between the parameters was low (0.15 for ϵ<sub>elasticity</sub> → λ<sub>elasticity</sub> and 0.04 for λ<sub>elasticity</sub> → ϵ<sub>elasticity</sub>). This result confirms that our task enables dissociating elasticity biases from the rate of elasticity learning.
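The claim that a higher prior concentration slows learning, independently of the prior mean, can be illustrated with a small sketch. It assumes a Beta prior parameterized by mean and concentration and standard conjugate updating on binary outcomes. Both are assumptions made for illustration: the manuscript's model uses its own update rules, and the function names below are hypothetical.

```python
def beta_from_mean_concentration(mean, concentration):
    """Pseudo-counts (a, b) of a Beta prior with the given mean and a + b = concentration."""
    if not 0.0 < mean < 1.0:
        raise ValueError("mean must lie strictly between 0 and 1")
    if concentration <= 0.0:
        raise ValueError("concentration must be positive")
    return mean * concentration, (1.0 - mean) * concentration


def posterior_mean(a, b, successes, failures):
    """Mean of the Beta belief after adding observed outcomes to the pseudo-counts."""
    return (a + successes) / (a + b + successes + failures)


# Two hypothetical learners share the same prior mean (0.7) but differ in concentration.
weak = beta_from_mean_concentration(0.7, 2.0)
strong = beta_from_mean_concentration(0.7, 20.0)

# After the same five failures, the low-concentration belief moves much further.
print(posterior_mean(*weak, successes=0, failures=5))
print(posterior_mean(*strong, successes=0, failures=5))
```

In this parameterization the mean sets where beliefs start and the concentration sets how quickly evidence moves them, which is the sense in which the two can be dissociated.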

      Moreover, to validate that the minimal level of confusion existing between bias and the rate of learning did not drive our psychopathology results, we re-ran the CCA while separating concentration from bias parameters. The results (figure below) demonstrate that differences in learning rate (𝜖) had virtually no contribution to our CCA results, whereas the contribution of the pure bias (𝜆) was preserved. 

      We now report on this additional analysis in the text (lines 617-627):

      “To capture prior biases that planets are controllable and elastic, we introduced parameters λ<sub>controllability</sub> and λ<sub>elasticity</sub>, each computed by multiplying the direction (λ – 0.5) and strength (ϵ) of individuals’ prior belief. λ<sub>controllability</sub> and λ<sub>elasticity</sub> range between 0 and 1, with values above 0.5 indicating a bias towards high controllability or elasticity, and values below 0.5 indicating a bias towards low controllability or elasticity. ϵ<sub>controllability</sub> and ϵ<sub>elasticity</sub> are positively valued parameters capturing confidence in the bias. Parameter recovery analyses confirmed both good recoverability (see S2 Table) and low confusion between bias direction and strength (ϵ<sub>controllability</sub> → λ<sub>controllability</sub> = −.07, λ<sub>controllability</sub> → ϵ<sub>controllability</sub> = .16, ϵ<sub>elasticity</sub> → λ<sub>elasticity</sub> = .15, λ<sub>elasticity</sub> → ϵ<sub>elasticity</sub> = .04), ensuring that observed biases and their relation to psychopathology do not merely reflect slower learning (Supplementary Figure 4), which can result from changes in bias strength but not direction.”
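As a worked illustration of the product described in this passage, the sketch below computes a signed bias as (direction − 0.5) × strength. It reads the direction as the parameter bounded between 0 and 1 and the strength as the positively valued one; that reading, the function name, and the example values are assumptions rather than the authors' code.

```python
import math


def prior_bias(direction, strength):
    """Signed prior bias: (direction - 0.5) * strength.

    direction: in [0, 1]; above 0.5 leans towards high controllability/elasticity.
    strength:  non-negative confidence in that prior belief.
    """
    if not 0.0 <= direction <= 1.0:
        raise ValueError("direction must lie in [0, 1]")
    if strength < 0.0:
        raise ValueError("strength must be non-negative")
    return (direction - 0.5) * strength


# Hypothetical values: strength scales the bias but cannot flip its sign.
print(prior_bias(0.8, 2.0))   # positive: leans towards high elasticity
print(prior_bias(0.2, 2.0))   # negative: leans towards low elasticity
print(prior_bias(0.5, 10.0))  # zero: no directional bias, however confident
```

The last case shows why a change in strength alone, which affects how quickly beliefs are revised, leaves the direction of the bias untouched.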

      We also more precisely articulate the impact of providing participants with three free tickets at their initial visits to each planet.

      Showing that a model parameter correlates with the data it was fit to does not provide any new information, and cannot support claims like "a prior assumption that control is likely available was reflected in a futile investment of resources in uncontrollable environments." To make that claim, one must collect independent measures of the assumption and the investment.

      We apologize if this and related statements seemed to be describing independent findings. They were meant to describe the relationship between model parameters and model-independent measures of task performance. It is inaccurate, though, to say that they provide no new information, since results could have been otherwise. For instance, whether a higher controllability bias maps onto resource misallocation in uncontrollable environments (as we observed) depends on the range of this parameter in our population sample. Had the range been more negative, a higher controllability bias could have instead manifested as optimal allocation in controllable environments. Additionally, these analyses serve two other purposes: as a validity check, confirming that our computational model effectively captured observed individual differences, and as a help for readers to understand what each parameter in our model represents in terms of observable behavior. We now better clarify the descriptive purposes of these regressions (lines 214-220, 231-235): 

      “To clarify how fitted model parameters related to observable behavior, we regressed participants’ opt-in rates and extra ticket purchases on the parameters (Figure 6A) ...”

      “... In sum, the model parameters captured meaningful individual differences in how participants allocated their resources across environments, with the controllability parameter primarily explaining variance in resource allocation in uncontrollable environments, and the elasticity parameter primarily explaining variance in resource allocation in environments where control was inelastic.”

      Did participants always make two attempts when purchasing tickets? This seems to violate the intuitive model, in which you would sometimes succeed on the first jump. If so, why was this choice made? Relatedly, it is not clear to me after a close reading how the outcome of each trial was actually determined.

      We thank the Reviewer for highlighting the need to clarify these aspects of the task in the revised manuscript. 

      When participants purchased two extra tickets, they attempted both jumps, and were never informed about whether either of them succeeded. Instead, after choosing a vehicle and attempting both jumps, participants were told where they had arrived. This outcome was determined based on the cumulative probability of either of the two jumps succeeding. Success meant that participants arrived at their chosen vehicle’s destination, whereas failure meant they walked to the nearest location (as determined by where they started from).
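One way to picture an outcome rule of this kind is sketched below: the trial succeeds if at least one attempt succeeds, and a single draw against that cumulative probability decides the outcome, so no per-attempt feedback exists. The independence of attempts, the per-attempt probabilities, and the function names are assumptions made for illustration; the task's actual probabilities are not specified here.

```python
import random


def prob_any_success(attempt_probs):
    """Probability that at least one attempt succeeds, assuming independent attempts."""
    p_all_fail = 1.0
    for p in attempt_probs:
        if not 0.0 <= p <= 1.0:
            raise ValueError("each probability must lie in [0, 1]")
        p_all_fail *= 1.0 - p
    return 1.0 - p_all_fail


def resolve_trial(attempt_probs, rng):
    """Draw one outcome for the whole trial from the cumulative probability."""
    if rng.random() < prob_any_success(attempt_probs):
        return "vehicle destination"
    return "nearest location"


# Hypothetical per-attempt probabilities for two jumps.
rng = random.Random(0)
print(prob_any_success([0.5, 0.5]))
print(resolve_trial([0.5, 0.5], rng))
```

Resolving the trial with one draw mirrors the design feature that participants only learn where they ended up, not which attempt worked.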

      Though it is unintuitive to attempt a second jump before seeing whether the first succeeded, this design choice served two key objectives. First, it ensured that participants would consistently need to invest not only more money but also more effort and time in planets with high elastic controllability. Second, it allowed the task to potentially generalize to the many real-world situations where the amount of invested effort has to be determined prior to seeing any outcome, for instance, preparing for an exam or a job interview. We now explicitly state these details when describing the experimental task (lines 393-395):

      “When participants purchased multiple tickets, they made all boarding attempts in sequence without intermediate feedback, only learning whether they successfully boarded upon reaching their final destination. This served two purposes. First, to ensure that participants would consistently need to invest not only more money but also more effort and time in planets with high elastic controllability. Second, to ensure that results could potentially generalize to the many real-world situations where the amount of invested effort has to be determined prior to seeing any outcome (e.g., preparing for an exam or a job interview).”

      It should be noted that the model is heuristically defined and does not reflect Bayesian updating. In particular, it overestimates control by not using losses with less than 3 tickets (intuitively, the inference here depends on your beliefs about elasticity). I wonder if the forced three-ticket trials in the task might be historically related to this modeling choice.

      We apologize for not making this clear, but in fact losing with fewer than 3 tickets does reduce the model’s estimate of available control. It does so by increasing the elasticity estimates (the a<sub>elastic≥1</sub> and a<sub>elastic2</sub> parameters), signifying that more tickets are needed to obtain the maximum available level of control, thereby reducing the average controllability estimate across ticket investment options. We note this now in the presentation of the computational model (caption Figure 4):

      “A failure to board does not change estimated maximum controllability, but rather suggests that 1 ticket might not suffice to obtain control (a<sub>elastic≥1</sub> + 1; light green diminished). As a result, the model’s estimate of average controllability across ticket options is reduced.”

      It would be interesting to further develop the model such that losing with less than 3 tickets would also impact inferences concerning the maximum available control, depending on present beliefs concerning elasticity, but the forced three-ticket purchases already expose participants to the maximum available control, and thus, the present data may not be best suited to test such a model. These trials were implemented to minimize individual differences concerning inferences of maximum available control, thereby focusing differences on elasticity inferences. We now explicitly address these considerations in the revised discussion (lines 326-333) with the following: 

      “Future research could explore alternative models for implementing elasticity inference that extend beyond our current paradigm. First, further investigation is warranted concerning how uncertainty about controllability and its elasticity interact. In the present study, we minimized individual differences in the estimation of maximum available control by providing participants with three free tickets at their initial visits to each planet. We made this design choice to isolate differences in the estimation of elasticity, as opposed to maximum controllability. To study how these two types of estimations interact, future work could benefit from modifying this aspect of our experimental design.”

      Furthermore, we have now tested a Bayesian model suggested by Reviewer 1, but we found that this model fitted participants’ choices worse (see details in the response to Reviewer 1’s comments). 

      Recommendations for the authors:

      Reviewer 1 (Recommendations for the authors):

      In the introduction, the definition of controllability and elasticity, and the scope of "resources" investigated in the current study were unclear. If I understand correctly, controllability is defined as "the degree to which actions influence the probability of obtaining a reward", and elasticity is defined as the change in controllability based on invested resources. This would define the controllability of the environment and the elasticity of controllability of the environment. However, phrases such as "elastic environment" seem to imply that elasticity can directly attach to an environment, instead of attaching to the controllability of the environment.

      We thank the Reviewer for highlighting the need to clarify our conceptualization of elasticity and controllability. We now provide formal definitions of both, with controllability defined as the fraction of controllably achievable reward[1], and elasticity as the reduction in uncertainty about controllability due to knowing the amount of resources the agent is willing and able to invest (see further details in the response to Reviewer 3’s public comments). In the revised manuscript, we now use more precise language to clarify that elasticity is a property of controllability, not of environments themselves. In addition, we now clarify that the current study manipulated monetary, attentional effort, and time costs together (see further details in the response to Reviewer 1’s public comments).   

      (2) Some of the real-world examples were confusing. For example, the authors mention that investing additional effort due to the belief that this leads to better outcomes in OCD patients is overestimated elasticity, but exercising due to the belief that this can make one taller is overestimated controllability. What's the distinction between the examples? The example of the chess expert practicing to win against a novice, because the amount of effort they invest would not change their level of control over the outcome is also unclear. If the control over the outcome depends on their skill set, wouldn't practicing influence the control over the outcome? In the case of the meeting time example, wouldn't the bus routes differ in their time investments even though they are the same price? In addition to focusing the introductory examples around monetary resources, I would also generally recommend tightening the link between those examples and the experimental task.

      We thank the Reviewer for highlighting the need to clarify the examples used to illustrate elasticity and controllability. We have now revised these examples to more clearly distinguish between the concepts and to strengthen their connection to the experimental task.

      Regarding the OCD example, the possibility that OCD patients overestimate elasticity comes from research suggesting they experience low perceived control but nevertheless engage in excessive resource investment[2], reflecting a belief that only through repeated and intense effort can they achieve sufficient control over outcomes. As an example, consider an OCD patient investing unnecessary effort in repeatedly locking their door. This behavior cannot result from an overestimation of controllability because controllability truly is close to maximal. It also cannot result from an underestimation of the maximum attainable control, since in that case investing more effort is futile. Such behavior, however, can result from an overestimation of the degree to which controllability requires effort (i.e., overestimation of elasticity).

      Similarly, with regards to the chess expert, we intended to illustrate a situation where given their current level, the chess expert is already virtually guaranteed to win, such that additional practice time does not improve their chances. Conversely, the height example illustrates overestimated controllability because the outcome (becoming taller through exercise) is in fact not amenable to control through any amount of resource investment.

      Finally, the meeting time example was meant to illustrate that if the desired outcome is reaching a meeting in time, then different bus routes that cost the same provide equal control over this outcome to anyone who can afford the basic fare. This demonstrates inelastic controllability with respect to money, as spending more on transportation doesn't increase the probability of reaching the meeting on time. The Reviewer correctly notes that time investment may differ between routes. However, investing more time does not improve the expected outcome. This illustrates that inelastic controllability does not preclude agents from investing more resources, but such investment does not increase the fraction of controllably achievable reward (i.e., the probability of reaching the meeting in time).

      In the revised manuscript, we’ve refined each of the above examples to better clarify the specific resources being considered, the outcomes they influence, and their precise relationship to both elasticity and controllability: 

      OCD (lines 40-43): Conversely, the repetitive and unusual amount of effort invested by people with obsessive-compulsive disorder in attempts to exert control[23,24] could indicate an overestimation of elasticity, that is, a belief that adequate control can only be achieved through excessive and repeated resource investment[25].  

      Chess expert (lines 54-57): Alternatively, they may do so because they overestimate the elasticity of control – for example, a chess expert practicing unnecessarily hard to win against a novice, when their existing skill level already ensures control over the match's outcome.

      Height (lines 53-54): A given individual, for instance, may tend to overinvest resources because they overestimate controllability – for example, exercising due to a misguided belief that this can make one taller, when in fact height cannot be controlled.

      Meeting time (lines 26-28): Choosing between bus routes affords equal control over commute time to anyone who can afford the basic fare (Figure 1).

      Methods

      (1) In the elastic controllability model definition, controllability is defined as "the belief that boarding is possible" (with any number of tickets). The definition again is different from in the task description where controllability is defined as "the probability of the chosen vehicle stopping at the platform if purchasing a single ticket."

      We clarify that "the probability of the chosen vehicle stopping at the platform if purchasing a single ticket" is our definition for inelastic controllability, as opposed to overall/maximum controllability, as stated here (lines 101-103):

      "We defined inelastic controllability as the probability that even one ticket would lead to successfully boarding the vehicle, and elastic controllability as the degree to which two extra tickets would increase that probability."

      Overall controllability is the sum of the two. This sum is referred to in the elastic controllability model definition as "the belief that boarding is possible". We now clarify this in the caption to Figure 4:

      Elastic Controllability model: Represents beliefs about maximum controllability (black outline) and the degree to which one or two extra tickets are necessary to obtain it. These beliefs are used to calculate the expected control when purchasing 1 ticket (inelastic controllability) and the additional control afforded by 2 and 3 tickets (elastic controllability).    

      We also clarify this in the methods when describing the parameterization of the model (lines 529-531): 

      The expected value of one beta distribution (defined by a<sub>control</sub>, b<sub>control</sub>) represents the belief that boarding is possible (controllability) with any number of tickets.

      (2) The free parameter K is confusing. What is the psychological meaning of this parameter? Is it there just to account for the fact that failure with 3 tickets made participants favor 3 tickets or is there meaning attached to including this parameter?

      This parameter captures how participants update their beliefs about resource requirements after failing to board with maximum resource investment. Our psychological interpretation is that participants who experience failure despite maximum investment (3 tickets) prioritize resolving uncertainty about whether control is fundamentally possible (before exploring whether control is elastic), which can only be determined by continuing to invest maximum resources. 

      We now clarify this in the methods (lines 555-559):

      To account for our finding that failure with 3 tickets made participants favor 3, over 1 and 2, tickets, we introduced a modified elastic controllability* model, wherein purchasing extra tickets is also favored upon receiving evidence of low controllability (loss with 3 tickets). This effect was modulated by a free parameter 𝜅 which reflects a tendency to prioritize resolving uncertainty about whether control is at all possible by investing maximum resources.

      This interpretation is supported by our analysis of 3-ticket choice trajectories (Supplementary Figure 2 presented in response to Reviewer 2). As shown in the figure, participants who win less than 50% of their 3-ticket attempts persistently purchase 3 tickets over the first 10 trials, despite frequent failures. This persistence gradually declines as participants accumulate evidence about their limited control, corresponding with an increase in opt-out rates.

      (3) Some additional details about the task design would be helpful. It seems that participants first completed 90 practice trials and were informed of the planet type every 15 trials (6 times during practice). What message is given to the participants about the planets? Did the authors analyze the last 15 trials of each condition in the regression analysis, and all 30 trials in the modeling analysis? How does the computational model (especially the prior beliefs parameters) reset when the planet changes? How do points accumulate over the session and/or are participants motivated to budget the points? Is it possible for participants to accumulate many points and then switch to a heuristic of purchasing 3 tickets on each trial?

      We apologize for not previously clarifying these details of the experimental design.

      During practice blocks, participants received explicit feedback about each planet's controllability characteristics, to help them understand when additional resources would or would not improve their boarding success. For high inelastic controllability planets, the message read: "Your ride actually would stop for you with 1 ticket! So purchasing extra tickets, since they do cost money, is a WASTE." For low controllability planets: "Doesn't seem like the vehicle stops for you nor does purchasing extra tickets help." Lastly, for high elastic controllability planets: "Hopefully by now it's clear that only by purchasing 3 tickets (LOADING AREA) are you consistently successful in catching your ride." We now include these messages in the methods section describing the task (lines 453-458).

      We indeed analyzed the last 15 trials of each condition in the regression analysis, and all 30 trials in the modeling analysis. Whereas the modeling attempted to explain participants’ learning process, the regression focused on explaining the resultant behavior, which in our pilot data (N=19), manifested fairly stably in the last 15 trials (ticket choices SD = 0.33 compared to .63 in the first 15 trials). The former is already stated in the text (lines 409-415), and we now also clarify the latter when discussing the model fitting procedure (line 695): 

      Reinforcement-learning models were fitted to all choices made by participants via an expectation maximization approach used in previous work.

      The computational model was initialized with the same prior parameters for all planets. When a participant moved to a new planet, the model's beliefs were reset to these prior values, capturing how participants would approach each new environment with their characteristic expectations about controllability and elasticity. We now clarify this in the methods (line 628): 

      For each new planet participants encountered, these parameters were used to initialize the beta distributions representing participants’ beliefs

      Points accumulated across all planets throughout the session, with participants explicitly motivated to maximize their total points as this directly determined their monetary bonus payment. To address the Reviewer's question about changes in ticket purchasing behavior, we conducted a mixed probit regression examining whether accumulated points influenced participants’ decisions to purchase extra tickets. We did not find such an effect (β<sub>coins accumulated</sub> = .01, p = .87), indicating that participants did not switch to simple heuristic strategies after accumulating enough coins. We now report this analysis in the methods (lines 421-427):

      Points accumulated across all planets throughout the session, with participants explicitly motivated to maximize their total points as this directly determined their monetary bonus payment. To ensure that accumulated gains did not lead participants to adopt a simple heuristic strategy of always purchasing 3 tickets, we conducted a mixed probit regression examining whether the number of accumulated coins influenced participants' decisions to purchase extra tickets. We did not find such an effect (β<sub>coins accumulated</sub> = .01, p = .87), ruling out the potential strategy shift.

      Following the modeling section, it may be helpful to have a table of the fitted models, the parameters of each model, and the meaning/interpretation of each parameter.

      We thank the Reviewer for this suggestion. We have now added a table (Supplementary Table 3) that summarizes all fitted models, their parameters, and the meaning/interpretation of each parameter.

      (1) The conclusions from regressing the task choices (opt-in rates and ticket purchases) on the fitted parameters seem confusing given that the model parameters were fitted on the task behavior, and the relationship between these variables seems circular. For example, the authors found that preferences for purchasing 2 or 3 tickets (a2 and a3; computational parameters) were associated with purchasing more tickets (task behavior). But wouldn't this type of task behavior be what the parameters are explaining? It's not clear whether these correlation analyses are about how individuals allocate their resources or about the validity check of the parameters. Perhaps analyses on individual deviation from the optimal strategy and parameter associations with such deviation are better suited for the questions about whether individual biases lead to resource misallocation.

      We thank the Reviewer for highlighting this seeming confusion. These regressions were meant to describe the relationship between model parameters and model-independent measures of task performance. This serves three purposes. First, a validity check, confirming that our computational model effectively captured observed individual differences. Second, to help readers understand what each parameter in our model represents in terms of observable behavior. Third, to examine in greater detail how parameter values specifically mapped onto observable behavior. For instance, whether a higher controllability bias maps onto resource misallocation in uncontrollable environments (as we observed) depends on the range of this parameter in our population sample. Had the range been more negative, a higher controllability bias could have instead manifested as optimal allocation in controllable environments. We now better clarify the descriptive purposes of these regressions (lines 214-220, 231-235): 

      To clarify how fitted model parameters related to observable behavior, we regressed participants’ opt-in rates and extra ticket purchases on the parameters (Figure 6A) ... 

      ... In sum, the model parameters captured meaningful individual differences in how participants allocated their resources across environments, with the controllability parameter primarily explaining variance in resource allocation in uncontrollable environments, and the elasticity parameter primarily explaining variance in resource allocation in environments where control was inelastic.  

      Regarding the suggestion to analyze deviation from optimal strategy, this corresponds with our present approach in that opting in is always optimal in high controllability environments and always non-optimal in low controllability environments, and similarly, purchasing extra tickets is always optimal in elastic controllability environments and always non-optimal elsewhere. Thus, positive or negative coefficients can be directly translated into closer or farther from optimal, depending on the planet type, as indicated in the figure by color. We now clarify this mapping in the figure legend:

      (2) Minor: The legend of Figure 6A is difficult to read. It might be helpful to label the colors as their planet types (low controllability, high elastic controllability, high inelastic controllability).

      We thank the Reviewer for this helpful suggestion. We have revised the figure accordingly.

      Reviewer 2 (Recommendations for the authors):

      As noted above, I'm not sure I agree with (or perhaps don't fully understand) the claims the authors make about the distinctions between their "elastic" and "inelastic" experimental conditions. Let's take the travel example from Figure 1 - is this not just an example of “hierarchical” controllability calculations? In other words, in the elastic example, my choice is between going one speed or another (i.e., exerting more or less effort), and in the inelastic example, my choice is first, which route to take (also a consideration of speed, but with lower effort costs than the elastic scenario), and second, an estimate of the time cost (not within my direct control, but could be estimated). In the elastic scenarios, additional value considerations vary between options, and in others (inelastic), they don't, with control over the first choice point (which bus route to choose, or which lunch option to take), but not over the price. I wonder if the paper would be better framed (or emphasized) as exploring the influences of effort and related "costs" of control. There isn't really such a thing as controllability that does not have any costs associated with it (whether that be action costs, effort, money, or simply scenario complexity).

      We thank the Reviewer for highlighting the need to clarify our distinction between elastic and inelastic controllability as it manifests in our examples. We first clarify that elasticity concerns how controllability varies with resources, not costs. Though resource investment and costs are often tightly linked, that is not always the case, especially not when comparing between agents. For example, it may be equally difficult (i.e., costly) for a professional biker to pedal at a high speed as it is for a novice to pedal at a medium speed, simply because the biker’s muscles are better trained. This resource advantage increases the biker’s control over his commute time without incurring additional costs as compared to the novice. We now clarify this distinction in the text by revising our example to (lines 9-11): 

      “For example, the control a biker has over their commute time depends on the power they are willing and able to invest in pedaling. In this respect, a highly trained biker would typically have more control than a novice.”

      Second, whereas in our examples additional value considerations indeed vary in elastic environments, that does not have to be the case, and indeed, that is not the case in our experiment. In our experimental task, participants are given the option to purchase as many tickets as they wish regardless of whether they are in an elastic or an inelastic environment.  

      We agree that elastic environments often raise considerations regarding the cost of control (for instance, whether it is worth it to pedal harder to get to the destination in time). To consider this cost against potential payoffs, however, the agent must first determine what are the potential payoffs – that is, it must determine the degree to which controllability is elastic to invested resources. It is this antecedent inference that our experiment studies. We uniquely study this inference using environments where control may not only be low or high, but also, where high control may or may not require additional resource investments. We now clarify this point in Figure 1’s caption:

      “In all situations, agents must infer the degree to which controllability is elastic to be able to determine whether the potential gains in control outweigh the costs of investing additional resources (e.g., physical exertion, money spent, time invested).”

      For a formal definition of the elasticity of control, see our response to Reviewer 3’s public comments. 

      Relatedly, another issue I have with the distinctions between inelastic/elastic is that a high/elastic condition has inherently ‘more’ controllability than a high/inelastic condition, no matter what. For example, in the lunch option scenario, I always have more control in the elastic situation because I have two opportunities to exert choice (food option ‘and’ cost). Is there really a significant difference, then, between calling these distinctions "elastic/inelastic" vs. "higher/lower controllability?" Not that it's uninteresting to test behavioral differences between these two types of scenarios, just that it seems unnecessary to refer to these as conceptually distinct.

      As noted in the response above, control over costs may be higher in elastic environments, but it does not have to be so, as exemplified by the elastic environments in our experimental task. For a fuller explanation of why higher elasticity does not imply higher controllability, see our response to Reviewer 2’s public comments. 

      I also wonder whether it's actually the case that people purchased more tickets in the high control elastic condition simply because this is the optimal solution to achieve the desired outcome, not due to a preference for elastic control. To test this, you would need to include a condition in which people opted to spend more money/effort to have high elastic control in an instance where it was not beneficial to do so.

      We appreciate the Reviewer's question about potential preferences for elastic control. We first clarify that participants did not choose which environment type they encountered, so if control was low or inelastic, investing extra resources did not give them more control. Furthermore, our results show that the average participant did not prefer a priori to purchase more tickets. This is evidenced by participants’ successful adaptation to inelastic environments wherein they purchased significantly fewer tickets (see Figure 2B and 2C), and by participants’ parameter fits, which reveal an a priori bias to assume that controllability is inelastic (𝜆<sub>elasticity</sub> = .16 ± .19), as well as a fixed preference against purchasing the full number of tickets (𝛼<sub>3</sub> = −.74 ± .37).

      We now clarify these findings by including a table of all parameter fits in the revised manuscript (see response to Reviewer 1). 

      It was interesting that the authors found that failure with 3 tickets made people more likely to continue to try 3 tickets, however, there is another possible interpretation. Could it be that this is simply evidence of a general controllability bias, where people just think that it is expected that you should be able to exert more money/effort/time to gain control, and if this initially fails, it is an unusual outcome, and they should try again? Did you look at this trajectory over time? i.e., whether repeated tries with 3 tickets immediately followed a failure with 3 tickets? Relatedly, does the perseveration parameter from the model also correlate with psychopathology?

      We thank the Reviewer for this suggestion. Our model accounts for a general controllability bias through the 𝜆<sub>controllability</sub> parameter, which represents a prior belief that planets are controllable. It also accounts, through the 𝜆<sub>elasticity</sub> parameter, for the prior belief that you should be able to exert more money/effort/time to gain control. Now, our addition of 𝜅 to the model captures the observation that failures with 3 tickets made participants more likely to purchase 3 tickets when they opted in. If this observation was due to participants not accepting that the planet is not controllable, then we would expect the increase in 3-ticket purchases when opting in to be coupled with a diminished reduction in opting in. To determine whether this was the case, we tested a variant of our model where 𝜅 not only increases the elasticity estimate but also reduces the controllability update (using 𝛽<sub>control</sub>+(1- 𝜅) instead of 𝛽<sub>control</sub>+1) after failures with 3 tickets. However, implementing this coupling diminished the model's fit to the data, as compared to allowing both effects to occur independently, indicating that the increase in 3 ticket purchases upon failing with 3 tickets did not result from participants not accepting that controllability is in fact low. Thus, we maintain our original interpretation that failure with 3 tickets increases uncertainty about whether control is possible at all, leading participants who continue to opt in to invest maximum resources to resolve this uncertainty. We now report these results in the revised text (lines 662-674). 

      The trajectory over time is consistent with this interpretation (new Supplementary Figure 2 shown below). Specifically, we see that under low controllability (0-50%, orange line), over the first 10 trials participants show higher persistence with 3 tickets after failing, despite experiencing frequent failures, but also a higher opt-out probability. As these participants accumulate evidence about their limited control, we observe a gradual decrease in 3-ticket selections that corresponds directly with a further increase in opting out (right panel, orange line). This pattern qualitatively corresponds with the behavior of our computational model (empty circles). We present the results of the new analysis in lines 180-190:

      “In fact, failure with 3 tickets even made participants favor 3, over 1 and 2, tickets. This favoring of 3 tickets continued until participants accumulated sufficient evidence about their limited control to opt out (Supplementary Figure 2). Presumably, the initial failures with 3 tickets resulted in an increased uncertainty about whether it is at all possible to control one’s destination. Consequently, participants who nevertheless opted in invested maximum resources to resolve this uncertainty before exploring whether control is elastic.”

      Regarding correlations between the perseveration parameter and psychopathology, we have now conducted a comprehensive exploratory analysis of all two-way relationships between parameters and psychopathology scores (new Supplementary Figure 3). Whereas we observed modest correlations with social anxiety (LSAS, r=-0.13), cyclothymic temperament (r=0.13), and alcohol use (AUDIT, r=-0.13), none reached statistical significance after FDR correction for multiple comparisons.
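
As an illustration of what an FDR correction of this kind involves, the sketch below implements the Benjamini-Hochberg step-up procedure. The response does not state which FDR procedure was used, so Benjamini-Hochberg is an assumption here, and the p-values in the example are hypothetical rather than taken from the study.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return one boolean per test: True if rejected at FDR level q.

    Assumption: Benjamini-Hochberg step-up procedure; the source only
    says "FDR correction" without naming the method.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k (1-based) with p_(k) <= (k / m) * q.
    max_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            max_k = rank
    # Reject every hypothesis whose rank is at most max_k.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_k:
            rejected[idx] = True
    return rejected


# Hypothetical p-values, for illustration only.
example_p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]
print(benjamini_hochberg(example_p_values))
```

With many parameter-by-questionnaire pairs tested at once, the threshold for the smallest p-value becomes q divided by the number of tests, which is why modest correlations can fail to survive the correction.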

      Regarding the modeling, I also wondered whether a better alternative model than the controllability model would be a simple associative learning model, where a number of tickets are mapped to outcomes, regardless of elasticity.

      We thank the Reviewer for suggesting this alternative model. Following this suggestion, we implemented a simple associative learning model that directly maps each option to its expected value, without a latent representation of elasticity or controllability. Unlike our controllability model which learns the probability of reaching the goal state for each ticket quantity, this associative learning model simply updates option values based on reward prediction errors.

      We found that this simple Q-learning model performed worse than even the controllability model at explaining participant data (log Bayes Factor ≥ 1854 on the combined datasets), further supporting our hypothesis that participants are learning latent estimates of control rather than simply associating options with outcomes. We present the results of this analysis in lines 662-664:

      “We implemented a simple Q-learning model that directly maps ticket quantities to expected values based on reward prediction errors, without representing latent controllability. This associative model performed substantially worse than even our simple controllability model (log Bayes Factor ≥ 1854 on the combined datasets).”
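
To make the associative alternative concrete, here is a minimal sketch of a Q-learning agent that maps each option (opt out, or 1, 2, or 3 tickets) directly to a value updated by reward prediction errors, with no latent controllability or elasticity. This is not the authors' implementation: the softmax choice rule, the learning rate, the inverse temperature, the reward scheme, and the success probabilities are all assumptions chosen for illustration.

```python
import math
import random


def softmax(values, inverse_temperature):
    """Convert option values into choice probabilities."""
    highest = max(values)
    exps = [math.exp(inverse_temperature * (v - highest)) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def q_update(q_values, action, reward, learning_rate):
    """Move the chosen option's value toward the reward; return the prediction error."""
    prediction_error = reward - q_values[action]
    q_values[action] += learning_rate * prediction_error
    return prediction_error


def simulate(n_trials, success_prob, ticket_cost,
             learning_rate=0.2, inverse_temperature=3.0, seed=0):
    """Simulate choices among: 0 = opt out, 1-3 = number of tickets bought.

    success_prob[a] is the (hypothetical) chance of reaching the goal
    when buying `a` tickets; all parameter values are illustrative.
    """
    rng = random.Random(seed)
    q_values = [0.0, 0.0, 0.0, 0.0]
    for _ in range(n_trials):
        probs = softmax(q_values, inverse_temperature)
        action = rng.choices(range(4), weights=probs)[0]
        if action == 0:
            reward = 0.0  # opting out: no cost, no gain
        else:
            success = rng.random() < success_prob[action]
            reward = (1.0 if success else 0.0) - ticket_cost * action
        q_update(q_values, action, reward, learning_rate)
    return q_values


# Hypothetical elastic environment: only 3 tickets give high control.
final_values = simulate(200, [0.0, 0.2, 0.2, 0.8], ticket_cost=0.1)
print(final_values)
```

Because each option's value is learned separately, such an agent cannot generalize from a failure with 3 tickets to a belief about whether control is possible at all, which is the kind of inference the controllability models are meant to capture.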

      Reviewer 3 (Recommendations for the authors):

      Please make all materials available, including code (analysis and experiment) and data. Please also provide a link to the task or a video of a few trials of the main task.

      We thank the reviewer for this important suggestion. All requested materials are now available at https://github.com/lsolomyak/human_inference_of_elastic_control. This includes all experiment code, analysis code, processed data, and a video showing multiple sample trials of the main task.

      References

      (1)  Huys, Q. J. M., & Dayan, P. (2009). A Bayesian formulation of behavioral control. Cognition, 113(3), 314–328.

      (2)  Ligneul, R. (2021). Prediction or causation? Towards a redefinition of task controllability. Trends in Cognitive Sciences, 25(6), 431–433.

      (3)  Mistry, P., & Liljeholm, M. (2016). Instrumental divergence and the value of control. Scientific Reports, 6, 36295.

      (4)  Lin, J. (1991). Divergence measures based on the Shannon entropy. IEEE Transactions on Information Theory, 37(1), 145–151.

      (5)  Cohen, R. M., Weingartner, H., Smallberg, S. A., Pickar, D., & Murphy, D. L. (1982). Effort and cognition in depression. Archives of General Psychiatry, 39(5), 593–597. doi: 10.1001/archpsyc.1982.04290050061012. PMID: 7092490.

      (6)  Bi, R., Dong, W., Zheng, Z., Li, S., & Zhang, D. (2022). Altered motivation of effortful decision-making for self and others in subthreshold depression. Depression and Anxiety, 39(8-9), 633–645. doi: 10.1002/da.23267. Epub 2022 Jun 3. PMID: 35657301; PMCID: PMC9543190.

      (7)  Tapal, A., Oren, E., Dar, R., & Eitam, B. (2017). The Sense of Agency Scale: A measure of consciously perceived control over one's mind, body, and the immediate environment. Frontiers in Psychology, 8, 1552.

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public review):

      Summary:

      There has been intense controversy over the generality of Hamilton's inclusive fitness rule for how evolution works on social behaviors. All generally agree that relatedness can be a game changer, for example allowing for otherwise unselectable altruistic behaviors when 𝑐 < 𝑟𝑏, where 𝑐 is the fitness cost to the altruism, 𝑏 is the fitness benefit to another, and 𝑟 their relatedness. Many complications have been successfully incorporated into the theory, including different reproductive values and viscous population structures.

      I agree, especially if by incorporating viscous population structures, the reviewer means the discovery of the cancellation effect (Wilson, Pollock, and Dugatkin, 1992, Taylor, 1992).

      The controversy has centered on another dimension; Hamilton's original model was for additive fitness, but how does his result hold when fitnesses are non-additive? One approach has been not to worry about a general result but just find results for particular cases. A consistent finding is that the results depend on the frequency of the social allele - nonadditivity causes frequency dependence that was absent in Hamilton's approach.

      Just to be extra precise: Hamilton’s (1964) original model did not use the Price equation nor the regression approach to define costs and benefits, and it did indeed simply presuppose fixed, additive fitness effects.

      Also for extra precision on terminology: many researchers will describe all fitnesses in social evolution as frequency dependent. The reason they do is that, with or without additivity, both the fitness of cooperators (with the social allele) and the fitness of defectors (without the social allele) typically increase in the frequency of cooperators in the population; the more cooperators there are, the more individuals run into them, which increases average fitness. I take the result depending on the frequency to mean that which of those two fitnesses is larger flips at a certain frequency, which automatically implies that the difference between them depends on the frequency of the social allele. This is indeed the result of non-additivity. We will return to this in more detail in the response to Reviewer #3. Also, at the end of Appendix B, I have added a bit to be extra precise regarding frequency dependence.

      Two other approaches derive from Queller via the Price equation. Queller 1 is to find forms like Hamilton's rule, but with additional terms that deal with non-additive interaction, each with an r-like population structure variable multiplied by a b-like fitness effect (Queller, 1985). Queller 2 redefines the fitness effects c and b as partial regressions of the actor's and recipient's genes on fitness. This leaves Hamilton's rule intact, just with new definitions of c and b that depend on frequency (Queller, 1992a).

      Queller 2 is the version that has been most adopted by the inclusive fitness community along with assertions that Hamilton's rule is completely general. In this paper, van Veelen argues that Queller 1 is the correct approach. He derives a general form that Queller only hinted at. He does so within a more rigorous framework that puts both Price's equation and Hamilton's rule on firmer statistical ground. Within that framework, the Queller 2 approach is seen to be a statistical misspecification - it employs a model without interaction in cases that actually do have interaction. If we accept that this is a fatal flaw, the original version of Hamilton's rule is limited to linear fitness models, which might not be common.

      I totally agree.
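
The misspecification point can be illustrated with a small numerical sketch. It assumes a hypothetical fitness function with an interaction term, w = 2 − C·p + B·q + D·p·q, where p is an individual's own type and q its partner's type, and random matching at cooperator frequency f. The fitness function, the values of C, B, and D, and the helper names are assumptions made for this example and are not taken from the manuscript. Fitting a regression without the interaction term yields "cost" and "benefit" coefficients that change with f, whereas the regression that includes the interaction recovers the same coefficients at every frequency.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(columns, y):
    """Least-squares coefficients of y on the given regressor columns."""
    k = len(columns)
    xtx = [[sum(u * v for u, v in zip(columns[i], columns[j]))
            for j in range(k)] for i in range(k)]
    xty = [sum(u * v for u, v in zip(columns[i], y)) for i in range(k)]
    return solve(xtx, xty)


C, B, D = 1.0, 2.0, 1.0  # hypothetical cost, benefit, and interaction effect


def fit_both(f, n=100):
    """Fit the additive and the interaction model at cooperator frequency f."""
    pairs = []
    for p in (0, 1):
        for q in (0, 1):
            share = (f if p else 1 - f) * (f if q else 1 - f)  # random matching
            pairs += [(p, q)] * round(share * n)
    p_col = [float(p) for p, _ in pairs]
    q_col = [float(q) for _, q in pairs]
    pq_col = [p * q for p, q in zip(p_col, q_col)]
    const = [1.0] * len(pairs)
    w = [2.0 - C * p + B * q + D * p * q for p, q in zip(p_col, q_col)]
    additive = ols([const, p_col, q_col], w)      # model without interaction
    full = ols([const, p_col, q_col, pq_col], w)  # model with interaction
    return additive, full


for f in (0.2, 0.5, 0.8):
    additive, full = fit_both(f)
    print(f, [round(x, 3) for x in additive], [round(x, 3) for x in full])
```

In this setup the additive model's coefficient on q equals B + D·f and its coefficient on p equals −C + D·f, so both vary with the population state even though the underlying fitness effects are fixed.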

      Strengths:

      While the approach is not entirely new, this paper provides a more rigorous approach and a more general result. It shows that both Queller 1 and Queller 2 are identities and give accurate results, because both are derived from the Price equation, which is an identity. So why prefer Queller 1? It identifies the misspecification issue with the Queller 2 approach and points out its consequences. For example, it will not give the minimum squared differences between the model and data. It does not separate the behavioral effects of the individuals from the population state (𝑏 and 𝑐 become dependent on 𝑟 and the population frequency).

      Just to be precise on a detail: in the data domain, as long as the number of parameters in a statistical model is lower than the number of data points, adding parameters typically (generically) lowers the sum of squared errors. That is to say, for an underspecified statistical model, the sum of squared errors goes down if a parameter is added, but for an already overspecified statistical model, the same is still true (although, typically, by how much the sum of squared errors is reduced will differ). The model specification task for a statistician includes knowing when to keep adding parameters, because the data suggest that the model is still underspecified, and when to stop adding parameters, because the model is well-specified, even if adding parameters still reduces the sum of squared errors.

      In a modeling context, on the other hand, one can say that sum of squared differences will stop decreasing at the point where the statistical model is well-specified, that is: when it matches the model we are considering.
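
The contrast between the data domain and the modeling domain can be sketched numerically. In the example below, the true relationship is linear by construction; the sample size, noise level, and coefficient values are hypothetical. With noisy data, adding a quadratic term still lowers the sum of squared errors slightly, even though the linear model is already well-specified. Without noise, the linear model already reaches a sum of squared errors of zero, so there is nothing left to reduce.

```python
import random


def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols_sse(columns, y):
    """Fit y on the regressor columns by least squares; return the sum of squared errors."""
    k = len(columns)
    xtx = [[sum(u * v for u, v in zip(columns[i], columns[j]))
            for j in range(k)] for i in range(k)]
    xty = [sum(u * v for u, v in zip(columns[i], y)) for i in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(beta[j] * columns[j][t] for j in range(k)) for t in range(len(y))]
    return sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))


rng = random.Random(1)
x = [rng.uniform(-1, 1) for _ in range(50)]
const = [1.0] * len(x)
x_squared = [xi ** 2 for xi in x]

# Data domain: linear truth plus noise (hypothetical values).
y_noisy = [1.0 + 2.0 * xi + rng.gauss(0, 0.5) for xi in x]
sse_linear = ols_sse([const, x], y_noisy)
sse_quadratic = ols_sse([const, x, x_squared], y_noisy)

# Modeling domain: the same linear model without noise.
y_exact = [1.0 + 2.0 * xi for xi in x]
sse_exact_linear = ols_sse([const, x], y_exact)
sse_exact_quadratic = ols_sse([const, x, x_squared], y_exact)

print(sse_linear, sse_quadratic)
print(sse_exact_linear, sse_exact_quadratic)
```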

      The paper also shows how the same problems can apply to non-social traits. Epistasis is the non-additivity of effects of two genes within the individual. (So one wonders why have we not had a similarly fierce controversy over how we should treat epistasis?)

      The paper is clearly written. Though somewhat repetitive, particularly in the long supplement, most of that repetition has the purpose of underscoring how the same points apply equally to a variety of different models.

      Finally, this may be a big step towards reconciliation in the inclusive fitness wars. Van Veelen has been one of the harshest critics of inclusive fitness, and now he is proposing a version of it.

      I am very happy to hear this, because I am indeed hopeful for reconciliation. I would like to add a comment, though. The debate on Hamilton’s rule/inclusive fitness is regularly thought of as a battle between two partisan camps, where both sides care at least as much about winning as they do about getting things right. This is totally understandable, because to some degree that is true. Also, I agree that it is fair to position me in the camp that is critical of the inclusive fitness literature. However, I would like to think that I have not been taking random shots at Hamilton’s rule. I have pointed to problems with the typical use of the Price equation and Hamilton’s rule, and I think I did so for very good reasons. I am obviously very happy that finding the Generalized Price equation, and the general version of Hamilton’s rule, allowed me to go beyond this, and (finally) offer a correct alternative, and I totally appreciate that this opens the door for reconciliation, as this reviewer points out. But I would not describe this as a road-to-Damascus moment. In order to illustrate the continuity in my work, I would like to point to three papers.

      In van Veelen (2007), I pointed to the missing link between the central result in Hamilton’s (1964) famous paper (which states that selection dynamics take the population to a state where mean inclusive fitness is maximized), and Hamilton’s actual rule (which states that selection will lead to individuals maximizing their individual inclusive fitness). My repair stated the additional assumptions that were necessary to make the latter follow from the former. I would say that this can hardly be characterized as an attack on Hamilton’s rule. Reading Hamilton (1964) with enough care to notice something is missing, and then repairing it, I think is a sign of respect, and not an attack.

      Van Veelen (2011) is about the replicator dynamics for n-player games, with the possibility of assortment. This puts the paper in a domain that does not assume weak selection, and that is typically not much oriented towards inclusive fitness. I included a theorem that implies that, under the condition of linearity, inclusive fitness not only gets the direction of selection right, but 𝑟𝑏 − 𝑐 becomes a parameter that also determines the speed of selection. This I think is representative, in the sense that in many of my papers, I carefully stake out when the classic version of Hamilton’s rule does work.

      In Akdeniz and van Veelen (2020), we moreover take a totally standard inclusive fitness approach in a model of the cancellation effect at the group level.

      I would say that this does not line up with the image of a harsh critic that takes random shots at Hamilton’s rule or inclusive fitness.

      Weaknesses:

      van Veelen argues that the field essentially abandoned the Queller 1 approach after its publication. I think this is putting it too strongly - there have been a number of theoretical studies that incorporate extra terms with higher-order relatednesses. It is probably accurate to say that there has been relative neglect. But perhaps this is partly due to a perception that this approach is difficult to apply.

      I can imagine that the perceived difficulty in application may have played a role in the neglect of the Queller 1 approach. What for sure has played a role, and I would think a much bigger one, is that the literature has been pretty outspoken that the Queller 1 approach is the wrong way to go. The main text cites a number of papers that hold this position very emphatically (The first one of those was a News and Views by Alan Grafen (1985) that accompanied the paper in which Queller presented his Queller 1 approach. I am very happy that Appendix B shows on how many levels this News and Views was wrong.). There is only a handful of papers that follow the Queller 1 example.

      The model in this paper is quite elegant and helps clarify conceptual issues, but I wonder how practical it will turn out to be. In terms of modeling complicated cases, I suspect most practitioners will continue doing what they have been doing, for example using population genetics or adaptive dynamics, without worrying about neatly separating out a series of terms multiplying fitness coefficients and population structure coefficients.

      I am not sure if I see what the reviewer envisions practitioners that use population genetics will keep on doing. I would think that the Generalized Price equation in regression form is a description of population genetic dynamics, and therefore, if practitioners will not make an effort to “neatly separate out a series of terms multiplying fitness coefficients and population structure coefficients”, then all I can say is that they should. I cannot do more than explain why, if they do not, they are at risk of mischaracterizing what gets selected and why.

      Regarding those that use adaptive dynamics, I would say that this is a whole different approach. Within this approach, one can also apply inclusive fitness; see Section 6 and Appendix D of van Veelen et al. (2017). Appendix D is full of deep technical results and was done by Benjamin Allen.

      For empirical studies, it is going to be hard to even try to estimate all those additional parameters. In reality, even the standard Hamilton's rule is rarely tested by trying to estimate all its parameters. Instead, it is commonly tested more indirectly, for example by comparative tests of the importance of relatedness. That of course would not distinguish between additive and non-additive models that both depend on relatedness, but it does test the core idea of kin selection. It will be interesting to see if van Veelen's approach stimulates new ways of exploring the real world.

      Regarding the impact on empirical studies, there are a few things that I would like to say. The first is that I would just like to repeat, maybe a bit more elaborately, what I wrote at the end of the main text. Given that the generalized version of Hamilton’s rule produces a host of Hamilton-like rules, and given the fact that all of them by construction indicate the direction of selection accurately, the question whether or not Hamilton’s rule holds turns out to be ill-posed. That means that we can stop doing empirical tests of Hamilton’s rule, which are predicated on the idea that Hamilton’s rule, with benefits and costs being determined by the regression method, could be violated – which it cannot (Side note: it is possible to violate Hamilton’s rule, if costs and benefits are defined according to the counterfactual method; see van Veelen et al. (2017) and van Veelen (2018). This way of defining costs and benefits is less common, although there are authors that find this definition natural enough to assume that this is the way in which everybody defines costs and benefits (Karlin and Matessi, 1983, Matessi and Karlin, 1984).). Instead, we should do empirical studies to find out which version of Hamilton’s rule applies to which behaviour in which species.

      I would like to not understate what a step forward this is. The size of the step forward is of course also due to the dismal point of departure. As theorists, we have failed our empiricists, because all 12 studies included in the review by Bourke (2014) of papers that explicitly test Hamilton’s rule are based on the misguided idea that the traditional Hamilton’s rule, with costs and benefits defined according to the regression method, can be violated. While the field does sometimes have disdain for mathematical nit-picking, this is a point where a little more attention to detail would have really helped. If the hypothesis is that Hamilton’s rule holds, and the null is that it does not, then trying to specify how the empirical quantity that reflects inclusive fitness would be distributed under the null hypothesis (in order to do the right statistical tests) would have forced researchers to do something with the information that this quantity is not distributed at all, because Hamilton’s rule is general (in the sense that it holds for any way in which the world works). If one would prefer to reverse the null and the alternative hypothesis, one would run into similar problems. Understanding that the question is ill-posed therefore is a big step forward from the terrible state of statistics and the waste of research time, attention and money on the empirical side of this field (see also Section 8 of van Veelen et al., 2017).

      I would agree that doing comparative statics may not be much affected by this. Section 5 of van Veelen et al. (2017) indicates that there can be a large set of circumstances under which the general idea “relatedness up → cooperation up” still applies. But that may be a bit unambitious, and Section 8 of van Veelen et al. (2017), and the final section of van Veelen (2018) contain some reflections on empirical testing that may allow us to go beyond that. As long as there is change happening in the Generalized Price equation, the population is not in equilibrium. For empirical tests, one can either aim to capture selection as it happens, or assume that what we observe reflects properties of an equilibrium. This leads to interesting reflections on how to do empirics, which may differ between traits that are continuous and traits that are discrete (again: see van Veelen et al. (2017) and van Veelen (2018)).

      Reviewer #2 (Public review):

      Summary:

      This manuscript reconsiders the "general form" of Hamilton's rule, in which "benefit" and "cost" are defined as regression coefficients. It points out that there is no reason to insist on Hamilton's rule of the form −𝑐 + 𝑏𝑟 > 0, and that, in fact, arbitrarily many terms (i.e. higher-order regression coefficients) can be added to Hamilton's rule to reflect nonlinear interactions. Furthermore, it argues that insisting on a rule of the form −𝑐 + 𝑏𝑟 > 0 can result in conditions that are true but meaningless and that statistical considerations should be employed to determine which form of Hamilton's rule is meaningful for a given dataset or model.

      Totally right. I cannot help but want to be extra precise, though, by distinguishing between the data domain and the modelling domain. In the data domain, statistical considerations apply in order to avoid misspecification. In this domain, avoiding misspecification can be complicated, because we do not know the underlying data generating process, and we depend on noisy data to make a best guess. In the modelling domain, however, there is no excuse for misspecification, as the model is postulated by the modeler. I therefore would think that in this domain, it does not really require “statistical considerations” to minimize the probability of misspecification; we can get the probability of misspecification all the way down to 0 by just choosing not to do it.

      Strengths:

      The point is an important one. While it is not entirely novel, since the idea of adding extra terms to Hamilton's rule has arisen sporadically (Queller, 1985, 2011; Fletcher et al., 2006; van Veelen et al., 2017), it is very useful to have a systematic treatment of this point. I think the manuscript can make an important contribution by helping to clarify a number of debates in the literature. I particularly appreciate the heterozygote advantage example in the SI.

      Me too, and I really hope the readers make it this far! I have thought of putting it in the main text, but did not know where that would fit.

      Weaknesses:

      Although the mathematical analysis is rigorously done and I largely agree with the conclusions, I feel there are some issues regarding terminology, some regarding the state of the field, and the practice of statistics that need to be clarified if the manuscript is truly to resolve the outstanding issues of the field. Otherwise, I worry that it will in some ways add to the confusion.

      (1) The "generalized" Price equation: I agree that the equations labeled (PE.C) and (GPE.C) are different in a subtle yet meaningful way. But I do not see any way in which (GPE.C) is more general than (PE.C). That is, I cannot envision any circumstance in which (GPE.C) applies but (PE.C) does not. A term other than "generalized" should be used.

      This is a great point! Just to make sure that those that read the reports online understand this point, let me add some detail. The equation labeled (PE.C) – which is short for Price equation in covariance form – is

      The derivation in Appendix A then assumes that we have a statistical model that includes a constant and a linear term for the p-score. It then defines the model-estimated fitness of individual 𝑖 as 𝑤̂<sub>𝑖</sub> = 𝑤<sub>𝑖</sub> − 𝜀<sub>𝑖</sub>, where 𝑤<sub>𝑖</sub> is the realized number of offspring of individual 𝑖, and 𝜀<sub>𝑖</sub> is the error term – and it is the sum over all individuals of this error term-squared that is minimized. The vector of model-estimated fitnesses will typically be different for different choices of the statistical model. Appendix A then goes on to show that, whatever the statistical model is that is used, for all of them 𝐶𝑜𝑣(𝑤, 𝑝) = 𝐶𝑜𝑣(𝑤̂, 𝑝), as long as the statistical model includes a constant and a linear term for the p-score. That means that we can rewrite (PE.C) as

      The point that the reviewer is making is that this is not really a generalization. For a given dataset (or, more generally, for a given population transition, whether empirical or in a model), 𝐶𝑜𝑣(𝑤, 𝑝) is just a number, and it happens to be the case that 𝐶𝑜𝑣(𝑤̂, 𝑝) returns the same number, whatever statistical model we use for determining what the model-estimated fitnesses 𝑤̂<sub>𝑖</sub> are (as long as the statistical model includes a constant and a linear term for the p-score). In other words, (PE.C) is not really nested in (GPE.C), so (GPE.C) is not a proper generalization of (PE.C).

      This is a totally correct point, and I had actually struggled a bit with the question what terminology to use here. Equation (GPE.C) is definitely general, in the sense that we can change the statistical model, and thereby change the vector of model-estimated fitnesses 𝑤̂, but as long as we keep the constant and the linear term in the statistical model, the equation still applies. But it is not a generalization of (PE.C).
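
The invariance discussed here, that the covariance between model-estimated fitness and the p-score equals the covariance between realized fitness and the p-score for every least-squares model containing a constant and a linear p-score term, can be checked numerically. The sketch below uses arbitrary, hypothetical data (random offspring numbers, a binary p-score, and a binary partner score q) and three nested models; the variable names and the data-generating choices are assumptions made for illustration. The identity holds because least-squares residuals are orthogonal to every included regressor, and so have zero covariance with p.

```python
import random


def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols_fitted(columns, y):
    """Model-estimated values of y from a least-squares fit on the regressor columns."""
    k = len(columns)
    xtx = [[sum(u * v for u, v in zip(columns[i], columns[j]))
            for j in range(k)] for i in range(k)]
    xty = [sum(u * v for u, v in zip(columns[i], y)) for i in range(k)]
    beta = solve(xtx, xty)
    return [sum(beta[j] * columns[j][t] for j in range(k)) for t in range(len(y))]


def cov(a, b):
    """Population covariance of two equally long lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / n


rng = random.Random(7)
n = 200
p = [1.0 if rng.random() < 0.4 else 0.0 for _ in range(n)]  # hypothetical p-scores
q = [1.0 if rng.random() < 0.5 else 0.0 for _ in range(n)]  # hypothetical partner scores
w = [float(rng.randint(0, 4)) for _ in range(n)]            # arbitrary offspring numbers
const = [1.0] * n
pq = [pi * qi for pi, qi in zip(p, q)]

models = {
    "constant + p": [const, p],
    "constant + p + q": [const, p, q],
    "constant + p + q + p*q": [const, p, q, pq],
}

cov_realized = cov(w, p)
cov_estimated = {name: cov(ols_fitted(cols, w), p) for name, cols in models.items()}
print(cov_realized, cov_estimated)
```

The fitted values differ across the three models, but their covariance with p does not, which is why the covariance form carries no information about which statistical model is appropriate.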

      I do however have a hard time coming up with a better label. The General Price equation may be a bit better, but it still suggests generalization. The Statistical Model-based Price equation does not suggest or imply generalization, but it does not convey how general it is, and it suggests that it could be an alternative to the normal Price equation that one may or may not choose to use – while this version really is the one we should use. It may moreover create the impression that this is only for doing statistics, and one might use the traditional Price equation for anything that is not statistics. I cannot really think of other good alternatives, but I am of course open to suggestions.

      So, for lack of a better label, I called this the Generalized Price equation in covariance form. Though clearly imperfect, there are still a few good things about this label. The first is that, as mentioned above, this equation is general, in the sense that it holds, regardless of the statistical model. The second reason is that this is Step 1 in a sequence of three steps, the other two of which do produce proper generalizations. Step 2 goes from this equation in covariance form to the Generalized Price Equation in regression form, which is a proper generalization of the traditional Price equation in regression form. Step 3 goes from the Generalized Price Equation in regression form to the general version of Hamilton’s rule, which is also a proper generalization of the classical Hamilton’s rule. Since I would suggest that Step 1 on its own is kind of useless, and therefore Step 1 and Step 2 will typically come as a package, I would be tempted to think that this justifies the abuse of terminology for the Price Equation in covariance form. I did however add the observation made by the reviewer at the point where the Generalized Price equation (in both forms) is derived, so I hope this at least partly addresses this concern.

      (2) Regression vs covariance forms of the Price equation: I think the author uses "generalized" in reference to what Price called the "regression form" of his equation. But to almost everyone in the field, the "Price Equation" refers to the covariance form. For this reason, it is very confusing when the manuscript refers to the regression form as simply "the Price Equation".

      As an example, in the box on p. 15, the manuscript states "The Price equation can be generalized, in the sense that one can write a variety of Price-like equations for a variety of possible true models, that may have generated the data." But it is not the Price equation (covariance form) that is being generalized here. It is only the regression that Price used that is being generalized.

      To be consistent with the field, I suggest the term "Price Equation" be used only to refer to the covariance form unless it is otherwise specified as in "regression form of the Price equation".

      I am not sure about the level of confusion induced here, but I totally see that it can be helpful to avoid all ambiguity. I therefore went over everything, and whenever I wrote “Price equation”, I tried to make sure it comes either with “in covariance form” or with “in regression form”. At some places, it is a bit over the top to keep repeating “in regression form”, when it is abundantly clear which form is being discussed. Also, I added no qualifiers if a statement is true for both forms of the Price equation, or if the claim refers to the whole package of going through Step 1 and Step 2 mentioned above.

      (3) Sample covariance: The author refers to the covariance in the Price equation as “sample covariance”. This is not correct, since sample covariance has a denominator of N-1 rather than N (Bessel’s correction). The correct term, when summing over an entire population, is “population covariance”. Price (1972) was clear about this: “In this paper we will be concerned with population functions and make no use of sample functions”. This point is elaborated on by Frank (2012), in the subsection “Interpretation of Covariance”.

      I totally agree. On page 418 of van Veelen (2005), I wrote:

      “Another possibility is that we think of 𝑧<sub>i</sub> and 𝑞<sub>i</sub>, 𝑖 = 1,…,𝑁 as realizations of a jointly distributed random variable. […] In that case the expression between square brackets is a good approximation for what statisticians […] call a sample covariance. A sample covariance is defined as (1/(𝑁 − 1)) ∑<sub>𝑖</sub> (𝑧<sub>𝑖</sub> − 𝑧̄)(𝑞<sub>𝑖</sub> − 𝑞̄) but in large samples it is OK to replace 𝑁 − 1 by 𝑁, and then this formula reduces to Price’s 𝐶𝑜𝑣(𝑧, 𝑞).”

      In van Veelen et al. (2012), I slid a little, because in Box 1 on page 66, I wrote that is the sample covariance, and only in footnote 1 on the same page did I include Bessel’s correction, when I wrote:

      “To be perfectly precise, the sample covariance is defined as

      In this manuscript, I slid a little further, and left Bessel’s correction out altogether. I am happy that the reviewer pointed this out, so I can make this maximally precise again.

      The reviewer also quotes Price (1972), page 485:

      “In this paper we will be concerned with population functions and make no use of sample functions”.

      Below, the reviewer will return to the issue of distinguishing between the sample covariance with Bessel’s correction, and the sample covariance without Bessel’s correction, where the latter is regularly also referred to as the population covariance. A natural interpretation of the quote from Price (1972), if we read a bit around this quote in the paper, is that the difference between his “population functions” and his “sample functions” is indeed Bessel’s correction.

      The reviewer also states that Frank (2012) elaborates on this in the subsection “Interpretation of Covariance”. What is interesting, though, is that, when Frank (2012) writes, on page 1017,

      “It is important to distinguish between population measures and sample measures”,

      the difference between those is not that one includes Bessel’s correction and the other does not. The difference between “population measures” and “sample measures” in Frank (2012), page 1017, is that

      “In many statistical applications, one only has data on a subset of the full population, that subset forming a sample.”

      The distinction between a population covariance and a sample covariance in Frank (2012) therefore is that they are “covariances” of different things (where the word covariances is in quotation marks, because, again, they are not really covariances). Besides just making sure that Price (1972) and Frank (2012) are not using these terms in the same way, this also perfectly illustrates the mix-up between statistical populations (or data generating processes) and biological populations that I discuss on pages 8 and 9 of Appendix A. I will return to this below, when I explain why I want to avoid using the word “population covariance” for the sample covariance without Bessel’s correction.

      Of course, the difference is negligible when the population is large. However, the author applies the covariance formula to populations as small as 𝑁 = 2, for which the correction factor is significant.

      Absolutely right.
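
      To make the size of the correction factor concrete for 𝑁 = 2, here is a minimal sketch. The two-individual data are made up for illustration, and the helper `covariance` with its `bessel` flag is a hypothetical name, not notation from the manuscript.

```python
def covariance(x, y, bessel):
    """Covariance of x and y; divide by N - 1 if bessel is True, else by N."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return cross / (n - 1 if bessel else n)


# Hypothetical population of two individuals.
p = [0.0, 1.0]   # p-scores
w = [1.0, 3.0]   # numbers of offspring

with_bessel = covariance(w, p, bessel=True)
without_bessel = covariance(w, p, bessel=False)
ratio = without_bessel / with_bessel   # equals (N - 1) / N
```

      With two individuals the version without Bessel’s correction is exactly half of the version with it, which is the factor (𝑁 − 1)/𝑁 evaluated at 𝑁 = 2.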

      The author objects to using the term "population covariance" (SI, pp. 8-9) on the grounds that it might be misleading if the covariance, regression coefficients, etc. are used for inference because in this case, what is being inferred is not a population statistic but an underlying relationship. However, I am not convinced that statistical inference is or should be the primary use of the Price equation (see next point). At any rate, avoiding potential confusion is not a sufficient reason to use incorrect terminology.

      There are a few related, but separate issues. One is what to call the 𝐶𝑜𝑣(𝑤, 𝑝)-term. Another, somewhat broader, is to avoid mixing up statistical populations and biological populations. A third is what the primary use of the Price equation is. The third issue I will respond to below, where it reappears. Here I will focus on the first two, which can be discussed without addressing the third.

      In a data context, I now call the 𝐶𝑜𝑣(𝑤, 𝑝)-term “(𝑁 − 1)/𝑁 times the sample covariance, or, in other words, the sample covariance without Bessel’s correction”. This should be unambiguous. In a modeling context I refer to the 𝐶𝑜𝑣(𝑤, 𝑝)-term as “the 𝐶𝑜𝑣(𝑤, 𝑝)-term” and describe it as a summary statistic or a notational convention. There are two reasons for this choice.

      The first is that neither of these uses the word “population”. I like this, because there is a persistent scope for confusion between statistical populations and biological populations (as exemplified by Frank, 2012). This leads to an incorrect, but widespread intuition that if we “know the entire (biological) population” in a data context, there is nothing that can be estimated. This is what pages 8 and 9 of Appendix A are all about.

      The second reason is that by using two labels, I also differentiate between the data context and the modeling context. This is important for reasons I will return to later.

      Relatedly, I suggest avoiding using 𝐸 for the second term in the Price equation, since (as the ms points out), it is not the expectation of any random variable. It is a population mean. There is no reason not to use something like Avg or bar notation to indicate population mean. Price (1972) uses "ave" for average.

      I totally agree that the second term in the Price equation is not an expectation. I made this point in van Veelen (2005), and I repeated this in the manuscript. This remark by the reviewer prompted me to spell this out a bit more emphatically in Appendix A. That still leaves me with the choice what notation to use.

      I therefore looked up all contributions to the Theme issue “Fifty years of the Price equation” in the Philosophical Transactions of the Royal Society B, and found that almost all contributions use 𝐸, sometimes saying that this refers to an expectation or an average. Of course, this is wrong. However (and this is another argument), it is just as wrong as using 𝐶𝑜𝑣 or 𝑉𝑎𝑟. The terms abbreviated as 𝐶𝑜𝑣 and 𝑉𝑎𝑟 are no more a covariance and a variance than the term abbreviated as 𝐸 is an expectation. So I would think that there are a few reasons for sticking with 𝐸 here: 1) consistency with the literature; 2) consistency with the treatment of other terms; and 3) the fact that this term is not really of any importance in this manuscript. I do however totally understand the reviewer’s reasons, which I suppose include that for 𝐸 there are relatively unproblematic alternatives (ave or upper bar) that are not available for the other terms. I hope therefore that being a bit more emphatic in the manuscript about 𝐸 not being an expectation at least partly addresses this concern.

      I should add, however, that the distinction between population statistics vs sample statistics goes away for regression coefficients (e.g. b, c, and r in Hamilton's rule) since in this case, Bessel's correction cancels out.

      Totally correct.
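
      The cancellation can be written out for the simplest case, a regression of 𝑤 on 𝑝 alone. This is only an illustrative sketch: the symbol \beta_{wp} is assumed notation for that slope and is not taken from the manuscript.

```latex
\[
\beta_{wp}
= \frac{\frac{1}{N-1}\sum_{i=1}^{N}(w_i-\bar{w})(p_i-\bar{p})}
       {\frac{1}{N-1}\sum_{i=1}^{N}(p_i-\bar{p})^{2}}
= \frac{\frac{1}{N}\sum_{i=1}^{N}(w_i-\bar{w})(p_i-\bar{p})}
       {\frac{1}{N}\sum_{i=1}^{N}(p_i-\bar{p})^{2}}
= \frac{\sum_{i=1}^{N}(w_i-\bar{w})(p_i-\bar{p})}
       {\sum_{i=1}^{N}(p_i-\bar{p})^{2}}
\]
```

      The same factor appears in the numerator and the denominator, so the slope is identical whether or not Bessel’s correction is applied.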

      (4) Descriptive vs. inferential statistics: When discussing the statistical quantities in the Price Equation, the author appears to treat them all as inferential statistics. That is, he takes the position that the population data are all generated by some probabilistic model and that the goal of computing the statistical quantities in the Price Equation is to correctly infer this model.

      Before I respond to this, I would like to point out that this literature has started going off the rails right from the very beginning. One of the initial construction errors was to use the ungeneralized Price equation in regression form. The other one is that the paper in which Price (1970) presented his equation is inconsistent, and suggests that the equation can be used for constructing hypotheses and for testing them at the same time (see van Veelen (2005), page 416). That, of course, is not possible; the first happens in the theory/modeling domain, and the second in the empirical testing/statistics domain, and they are separate exercises.

      These construction errors have warped the literature based on it, and have resulted in a lot of mental gymnastics and esoteric statements, which are needed if we are not willing to consider the possibility that there could be anything amiss with the original paper by Price (1970).

      In this paper, I undo both of these construction errors. Undoing the second one means exploring both domains separately. In Sections 2-4 of Appendix A I explore the possibility that the Price equation is applied to data. In Section 5 of Appendix A I explore the possibility that it is used in a modelling context. The primary effort here is just to do it right, and I have not read anything to suggest that I did not succeed in doing this. Secondarily, of course, I also want to contrast this to what happens in the existing literature. That is what this point by the reviewer is about. It is therefore important to be aware that seeing the contrast accurately is complicated by the apologetic warp in the existing literature.

      As a first effort to unwarp, I would like to point to the fact that I am not taking any position on what the Price equation should be used for. All I do here is explore (and find) possibilities, both in the statistical inference domain and in the modeling domain. I also find that there is scope for misspecification in both, and that, in both domains, we should want to avoid misspecification. The thing that I criticize in the existing literature therefore is not the choice of domain. The thing that I criticize is the insistence on, and celebrating of what is most accurately described as misspecification. This typically happens in the modeling domain.

      It is worth pointing out that those who argue in favor of the Price Equation do not see it this way: "it is a mistake to assume that it must be the evolutionary theorist, writing out covariances, who is performing the equivalent of a statistical analysis." (Gardner, West, and Wild, 2011); "Neither data nor inferences are considered here" (Rousset, 2015). From what I can tell, to the supporters of the Price equation and the regression form of Hamilton's rule, the statistical quantities involved are either population-level *descriptive* statistics (in an empirical context), or else are statistics of random variables (in a stochastic modeling context).

      Again, this description of the friction between my paper and the existing literature is predicated on the suggestion that I have only one domain in mind where the Price equation can be applied. That is not the case; I consider both.

      In the previous paragraph, the reviewer states that I “treat statistical quantities as inferential statistics”, and in this paragraph the reviewer contrasts that with the supporters of the (ungeneralized) Price equation that supposedly treat the same quantities as “descriptive statistics”. This is also beside the point, but it will take some effort to sort out the spaghetti of entangled arguments (where the spaghetti is the result of the history in this field, as indicated earlier).

      First of all, it is not unimportant to point out that the way most people use the terms “inferential statistics” and “descriptive statistics” is that the first refers to an activity, and the second to a function of a bunch of numbers, typically data. Inferential statistics is a combination of parameter estimation and model specification (those are activities). Descriptive statistics are for instance the average values of variables of interest (which makes them a function of a set of numbers). When doing inferential statistics (or statistical inference), looking at the descriptive statistics of the dataset is just a routine before the real work begins. It is important to remember that.

      Now I suppose that this reviewer uses these words a little differently. When he or she writes that I “treat statistical quantities as inferential statistics”, I assume that the reviewer means that I want to use a term like for doing statistical inference, or that, when I want to interpret such a term, I include considerations typical of statistical inference. Within the data domain, that is totally correct. In the paper I argue that there are very good reasons for this. We would like to know what the data can tell us about the actual fitness function, and if we do our statistical inference right, and choose our Price-like equation accordingly, then that means that we would be able to give a meaningful interpretation to a term like . It also means that we then have an equation that describes the genetic population dynamics accurately.

      When the reviewer states that other papers treat them as “population level descriptive statistics” in an empirical context, I have a hard time coming up with papers for which that is the case. Most papers apply the Price equation in the modeling domain (That is to say: this is true in evolution. In ecology the Price equation is often applied to data; see Pillai and Gouhier (2019) and Bourrat et al. (2023)). But even if there are researchers that apply the Price equation to data, then considering these statistical quantities as “descriptive statistics” would not make sense. Looking at the descriptive statistics alone is not an empirical exercise; it is just a routine that happens before the actual statistical inference starts. In a data context, saying that considerations that are standard in statistical inference do not apply, because one is just not doing statistical inference, is the equivalent of an admission of guilt. If you do not consider statistical significance, and never mention that sample size could matter, because you are using these terms as “descriptive statistics, not inferential statistics”, then you’re basically admitting to not doing a serious empirical study.

      Besides treating statistical quantities as descriptive statistics in a data context, the reviewer also states that, in a stochastic modeling context, other researchers treat the same statistical quantities as “statistics of random variables”. This is first of all very generous to the existing literature. I imagine that the reviewer is imagining a modeling exercise where for instance the covariance between two variables is postulated. A theory exercise would then take that as a starting point for the derivation of some theoretical result. This, however, is not what happens in most of the literature.

      There are two things that I would like to point out. First of all, postulating covariances and deriving results from assumptions regarding those covariances is not an activity that requires using the Price equation. There are many stochastic models that function perfectly fine without the Price equation. This is maybe a detail, but it is important to realize that what the reviewer probably thinks of as a legitimate theoretical exercise may be something that can very well be done without the Price equation.

      Secondly, I would like to repeat something that I have pointed out before, which is that the Price equation can be written for any transition, whether this transition is likely or unlikely, given a model, and even for transitions that are impossible. For all of those transitions, one can write the (ungeneralized) Price equation, and for all of those, the Price equation will be an identity, and it will contain the things that the reviewer refers to as “statistical quantities”. It is important to realize that these “statistical quantities”, therefore, are properties of a transition, and that every transition comes with its own “statistical quantity”. That implies that they are not properties of random variables; they reflect something regarding one transition. What one could imagine, though, is the following. To fix ideas, let’s take the Price equation in regression form, and focus on 𝛽. A meaningful modeling exercise starts with assumptions about the likelihood of all different transitions, and therefore the likelihood of different values of 𝛽 materializing – or it starts with assumptions that imply those probabilities. In a theoretical exercise, one could then derive statements about the expectation and variance of those “statistical quantities”. For instance, one can calculate the expected value 𝐸[𝛽] and the variance 𝑉𝑎𝑟[𝛽], where this expectation is a proper expectation (taken over the probabilities with which these transitions materialize) and this variance is a proper variance, for the same reason.
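
      A toy calculation may help separate the two levels. The sketch below rests on assumptions that are not in the source: 𝛽 is taken to be the regression coefficient 𝐶𝑜𝑣(𝑤, 𝑝)/𝑉𝑎𝑟(𝑝) of a single transition, and the three transitions and their probabilities are invented. Each transition has its own 𝛽; the expectation and variance are then taken over the transition probabilities.

```python
def pop_cov(x, y):
    """Covariance with denominator N (no Bessel's correction)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / n


def beta(w, p):
    """Regression coefficient of w on p for one transition (assumed meaning of beta)."""
    return pop_cov(w, p) / pop_cov(p, p)


# Hypothetical model: two parents with fixed p-scores, and three possible
# transitions, each given as (offspring counts, probability).
p = [0.0, 1.0]
transitions = [
    ([1.0, 1.0], 0.50),
    ([0.0, 2.0], 0.25),
    ([2.0, 0.0], 0.25),
]

# One beta per transition: a property of that transition, not a random variable's parameter.
betas = [beta(w, p) for w, _ in transitions]

# Proper expectation and variance, taken over the transition probabilities.
expected_beta = sum(prob * b for b, (_, prob) in zip(betas, transitions))
variance_beta = sum(prob * (b - expected_beta) ** 2
                    for b, (_, prob) in zip(betas, transitions))
```

      In this invented model the per-transition values are 0, 2 and −2, so the expectation is 0 and the variance is 2, even though no single transition has a variance of its own.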

      This is what I do on page 416 of van Veelen (2005) and in Section 5 of Appendix A. I think something like this is what the reviewer may have in mind, but it is worth pointing out that this still does not mean that the 𝛽 from the Price equation for any given transition is now a property of a random variable. Much of the literature, however, is not at the level of sophistication that I imagine the reviewer has in mind – although there are papers that are; see the discussion below of Rousset and Billiard (2000) and Van Cleve (2015).

      In the appendix to this reply, I will address the quotes from Gardner, West, and Wild (2011) and Rousset (2015). This takes up some space, so that is why it is at the end of this reply.

      In short, the manuscript seems to argue that Price equation users are performing statistical inference incorrectly, whereas the users insist that they are not doing statistical inference at all.

      That is not what the manuscript argues, but I am happy to clarify. The manuscript explores both the use of the Price equation when applied to data (and therefore for statistical inference) and when applied to transitions in a model. The criticism on the existing literature is not that it performs statistical inference incorrectly. The criticism is that the literature insists on misspecification, which typically happens in a modelling context.

      The problem (and here I think the author would agree with me) arises when users of the Price equation go on to make predictive or causal claims that would require the kind of statistical analysis they claim not to be doing. Claims of the form "Hamilton's rule predicts.." or use of terms like "benefit" and "cost" suggest that one has inferred a predictive or causal relationship in the given data, while somehow bypassing the entire theory of statistical inference.

      I do not really know how to interpret this paragraph. The use of the word “data” suggests that this pertains to a data context, but I do not know what would qualify as a “predictive claim” in that domain, or how any study would go from data to a claim of the form “Hamilton’s rule predicts …”. Again, I do not really know papers that apply the Price equation to data. None of the empirical papers reviewed in Bourke (2014) for instance do. I would however agree that it is close to obvious that an approach that does indeed bypass the entire theory of statistical inference cannot identify causal relations in datasets. I think the examples in Section 2 of Appendix A also clearly illustrate that a literature in which the word “sample size” is absent, cannot be doing statistical inference.

      There is also a third way to use the Price equation which is entirely unobjectionable: as a way to express the relationship between individual-level fitness and population-level gene frequency change in a form that is convenient for further algebraic manipulation. I suspect that this is actually the most common use of the Price equation in practice.

      I am not sure if I understand what it means for the Price equation to “express the relationship between individual-level fitness and population-level gene frequency change”. That is a bit reminiscent of how John Maynard Smith saw the Price equation (Okasha, 2005), but he also emphasized that he was unable to follow George Price and his equation. For sure, it cannot be that one side of the Price equation reflects something at the individual level and the other something at the population level, because both sides of the Price equation are equally aggregated over the population. Just to be safe, and to avoid unwarranted associative thinking, I would therefore choose to be minimalistic, and say that the Price equation is an identity for a transition between a parent population and an offspring population.
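
      The statement that the Price equation is an identity for a transition can be illustrated with arbitrary numbers. The following is a minimal sketch in which the parent p-scores, offspring counts and offspring p-scores are invented, and `price_terms` is a hypothetical helper; it computes the left-hand side, mean fitness times the change in average p-score, and the two right-hand-side terms, both without Bessel’s correction.

```python
def price_terms(p, w, p_offspring):
    """Return (left-hand side, covariance term, transmission term) for one transition.

    p            parent p-scores
    w            number of offspring of each parent
    p_offspring  average p-score among the offspring of each parent
    """
    n = len(p)
    mean_w = sum(w) / n
    mean_p = sum(p) / n
    # Average p-score in the offspring population, weighting parents by offspring number.
    mean_p_next = sum(wi * qi for wi, qi in zip(w, p_offspring)) / sum(w)
    lhs = mean_w * (mean_p_next - mean_p)
    cov_term = sum((wi - mean_w) * (pi - mean_p) for wi, pi in zip(w, p)) / n
    transmission_term = sum(wi * (qi - pi) for wi, pi, qi in zip(w, p, p_offspring)) / n
    return lhs, cov_term, transmission_term


# Hypothetical transition; the offspring p-score of a parent with no offspring is irrelevant.
p = [0.0, 0.5, 1.0, 1.0]
w = [3.0, 0.0, 1.0, 2.0]
p_offspring = [0.5, 0.0, 0.5, 1.0]

lhs, cov_term, transmission_term = price_terms(p, w, p_offspring)
```

      The two sides agree for these numbers, and would for any other choice with at least one offspring, which is what makes the equation an identity rather than a statement about how the transition came about.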

      Regardless of the words we choose, however, the question how harmless or objectionable the use of the Price equation is in the literature is absolutely relevant. In earlier papers I have tried to cover a spectrum of examples of different ways to use (or misuse) the Price equation. In van Veelen (2005) I cover Grafen (1985a), Taylor (1989), Price (1972), and Sober and Wilson (2007). The main paper that is discussed in van Veelen et al. (2012) is Queller (1992b), but Section 7 of that paper also discusses the way the Price equation is used in Rousset and Billiard (2000), Taylor (1989), Queller (1985), and Page and Nowak (2002). These discussions also come with a description of how much it takes to repair them, and this varies all the way from nothing, or a bit of minor rewording, to being beyond repair.

      What is good to observe, is that the papers in which the use of the Price equation is the least problematic, are also the papers in which, if the reference to the Price equation would be taken out, nothing really changes. These are papers that start with a model, or a collection of models, and that, at some point in the derivation of their results, point to a step that can, but does not have to be described as using the Price equation. An example of this is Rousset and Billiard (2000); see the detailed description in Section 7 of van Veelen et al. (2012).

      I am happy to point to a few more papers on the no harm, no foul end of the spectrum here.

      Allen and Tarnita (2012) discuss properties of the dynamics in a well-defined set of models. Towards the end of the paper, a version of the Price equation more or less naturally appears. This is more of an interesting aside, though, and does not really play a role in the derivation of the core results of the paper. Van Cleve (2015) is similar to Rousset and Billiard (2000), in that the “application of the Price equation” there is a minor ingredient of the derivation of the results. (A detail that this reviewer may find worth mentioning, given earlier comments, is that Van Cleve (2015) writes the left-hand side of the Price equation as 𝐸(𝑤Δ𝑝|𝐩), instead of 𝑤̄Δ𝑝̄. First two very unimportant things. Van Cleve (2015) uses 𝑤 for mean fitness, for which 𝑤̄ is a more common symbol. Another detail of lesser importance is that it includes the vector of parent p-scores in the notation, which in their notation is 𝐩. More important, however, is that Van Cleve (2015) writes 𝐸(Δ𝑝) for Δ𝑝̄, which extends the (mis)use of the symbol 𝐸 for what really is just an average. This is consistent within the Price equation, in the sense that it now denotes the average with 𝐸, both on the right-hand side and on the left-hand side of the Price equation. It can however be a little bit confusing, because when Rousset and Billiard (2000) write the expected change in 𝑝, then this is a proper expectation. In their case, this summarizes all possible transitions out of a given state, and weighs them by their probabilities of happening, given a state summarized by 𝑝.) I am also happy to extend the spectrum a bit here. Some papers on inclusive fitness do not use the Price equation at all, even though one could imagine places where it could be inserted. A nice example of such a paper is Taylor et al. (2007).

      In this paper, I hope I can be excused from taking a complete inventory of this literature, and I hope that I do not have to count how many papers fall into the different categories. This would help assess the veracity of the suspicion the reviewer has, which is that the most common use of the Price equation is entirely unobjectionable, but I just do not have the time. I would however not want to underestimate the aggregate damage done in this field. The spectrum spanned in my earlier papers does include a fair amount of nonsense results. This typically happens in papers that do not study a specific model or set of models, but that take the Price equation as their point of departure for their theorizing. Also there seems to be a positive correlation between how exalted and venerating the language is that is used when describing the wonders and depths of the Price equation, and how little sense the claims make that are “derived” with it.

      We also should not set the bar too low. This is a literature that, at the starting point, has a few construction errors in it, as described in the paper. That is reason for concern. Moreover, one of the main end products of this literature is what we send our empiricists to the field with. As Section 8 of van Veelen et al. (2017) indicates, what we have supplied to our empiricists to work with is nothing short of terrible. I would therefore want to maintain that the damage done is enormous, and if there are also a few papers around that may use the ungeneralized Price equation in an innocuous way, then that is not enough redemption for my taste. We are still facing a literature in which, at every instance where the Price equation is used, we still need to check in which category it falls.

      For a paper that aims to clarify these thorny concepts in the literature, I think it is worth pointing out these different interpretations of statistical quantities in the Price equation (descriptive statistics vs inferential statistics vs algebraic manipulation). One can then critique the conclusions that are inappropriately drawn from the Price equation, which would require rigorous statistical inference to draw. Without these clarifications, supporters of the Price equation will again argue that this manuscript has misunderstood the purpose of the equation and that they never claimed to do inference in the first place.

      I would like to return to the point that I made at the beginning of my response to point (4), which is that the “thorniness” of these concepts is the result of the warp in the literature, resulting from the construction errors in Price (1970). If people want to understand how to apply the Price equation right, I think that reading Appendix A and B would work just fine. Again, I have not read anything that suggests that there is anything incorrect in there, so if the literature contains “thorny” concepts, it might just be that this is the result of the mental gymnastics necessitated by the unwillingness to accept that there might be something not completely right with Price (1970). Moreover, given my experiences in the field, I am not sure that there is anything that I could say that would convince the supporters of the ungeneralized Price equation.

      (5) "True" models: Even if one accepts that the statistical quantities in the Price equation are inferential in nature, the author appears to go a step further by asserting that, even in empirical populations, there is a specific "true" model which it is our goal to infer. This assumption manifests at many points in the SI when the author refers to the "true model" or "true, underlying population structure" in the context of an empirical population.

      Again, in Appendix A I explore both a data context and a modeling context. In the modeling context none of this applies, because in such a context, there is only the model that we postulate. In the part in which I explore what the Price equation can do in a data context, I do indeed use words like “true model” or "true underlying population structure".  

      I do not think it is necessary or appropriate, in empirical contexts, to posit the existence of a Platonic "true" model that is generating the data. Real populations are not governed by mathematical models. Moreover, the goal of statistical inference is not to determine the "true model" for given data but to say whether a given statistical model is justified based on this data. Fitting a linear model, for example, does not rule out the possibility there may be higher-order interactions - it just means we do not have a statistical basis to infer these higher-order interactions from the data (say, because their p-scores are insignificant), and so we leave them out.

      This remark suggests that the statistical approach in Sections 2-4 of Appendix A is more naïve than it should be, and that I would overlook the possibility of, for instance, interaction effects that are really nonzero, but that are statistically not significant. Now first of all, at a superficial level, I would like to say that this strikes me as somewhat inconsistent. In the remarks further back, the reviewer seems to excuse those that use the Price equation on data without any statistical considerations whatsoever. The reason why the reviewer is giving them a pass, is that they are “just not doing statistical inference”. Instead, they are doing this whole other thing with, you know, descriptive statistics. As I indicated above, that is just a fancy way of saying that they are not doing serious statistics – or serious empirics, for that matter.

      In this comment, on the other hand, the reviewer also suggests that the statistics with which I replace the total absence of any statistical considerations are not quite up to snuff. Below, I will indicate why that is not the case at all, but I think it is also worth registering a touch of irony there.

      In order to address this issue, it is worth first observing that the whole of classical statistics is based on probability theory in the following sense. We are always asking ourselves the question: if the data-generating process works like this, what would the likelihood be of certain outcomes (datasets); and if the data-generating process works some other way (sometimes: the complement of whatever “this” is), what would the likelihood then be of the same outcomes. By comparing those, we draw inferences about the underlying data-generating process (which is a term suggestive of a “Platonic” world view that the reviewer seems to reject). Therefore, if one were to impose a ban on using Platonic words like “true data-generating process”, “actual fitness function”, or “the population structure that is out there”, it would be impossible to teach any course in statistics, basic or advanced. It would also be impossible to practice, and talk about, applied statistics.

      Now the reviewer claims that “Real populations are not governed by mathematical models”. I do not really know if I agree or disagree with that statement, but the example that the reviewer gives does not fit that claim. The reviewer suggests that if we find a higher order term not to be statistically significant (and therefore we reject the hypothesis that it is nonzero), then that would not necessarily mean that it is not there. That is totally true, and statisticians tend to be fully aware of that. But that does not imply that there is no true data-generating process; the whole premise of this example is that there is, but that the sample size is not large enough to determine it in a detailed enough way so as to include this interaction effect, that apparently is small relative to the sample size.

      The third thing to reflect on here, is that the reviewer seems to suggest that the Generalized Price equation in regression form, as presented in my paper, comes with a specific statistical approach, that he or she classifies as philosophically naïve or unsophisticated. That, however, is not the case, and I am very grateful that this remark by this reviewer allows me to make a point that I think shines a light on how the Generalized Price equation puts the train that started going off the rails in 1970 back on track, and reconnects it with the statistics it borrows its terminology from. To see that, it is good to be aware that statistics never gives certainty. The whole discipline is built around the awareness that it is possible to draw the wrong inference, and the aim is to determine, minimize, and balance, the likelihoods of making different wrong inferences. So, statistics produces statements about the confidence with which one can say that something works one way or the other. In some instances, the data are not enough to say anything with any confidence. In other cases, the data are rich enough so that it is really unlikely that we incorrectly infer that for instance a certain gene matters for fitness.

      The nice thing about the setup with the Generalized Price equation, is that those statistical considerations translate one-to-one to considerations regarding which Price-like equation to choose. If the data do not allow us to pick any model with confidence, then we should be equally agnostic about which Price-like equation describes the population genetic dynamics accurately. If the statistics gives us high confidence that a certain model matches the data, then we should pick the matching Price-like equation with the same confidence. This also carries over to higher level statistical considerations.

      If we think about terms that, if we would gather a gargantuan amount of data, might be statistically significant, but very small, then economists call those statistically significant, but economically insignificant. When rejecting the statistical significance on the basis of a not gargantuan dataset, statisticians are aware that terms that really have a zero effect, as well as terms, the effect of which is really small, are rejected with the same statistical test – and that we should be fine with that. All such considerations carry over to what we think of regarding the choice of a Price-like equation to describe the population genetic dynamics. Even if people disagree about whether or not to include a term that is statistically significant, but relatively small, such a disagreement can still happen within this setup, and just translates to a disagreement on which Price-like equation to choose.

      Similarly, people could also disagree about whether it is justified to use polynomials to characterize a fitness function. If we decide that we can, because of Taylor expansions, then the core result of the paper implies that the population genetic dynamics can be summarized by a generalized Hamilton’s rule (as long as the fitness function includes a constant and a linear term regarding the p-score). On the other hand, if we do not believe this is justified, and prefer to use an altogether different family of fitness functions, then we can no longer do this. All of this leaves space for all kinds of statistical considerations and disagreements, that just carry over to the choice for one or the other Price-like equation as an accurate description of the population genetic dynamics. Or, if one does not believe polynomials should be used, then this leads to not picking any Price-like equation at all.

      So, this is a long way of saying that the Generalized Price equation creates space for all statistical considerations to regain their place, and does not hinge on one approach to statistics or another.

      What we can say is that if we apply the statistical model to data generated by a probabilistic model, and if these models match, then as the number of observations grows to infinity, the estimators in the statistical model converge to the parameters of the data-generating one.

      But this is a mathematical statement, not a statement about real-world populations.

      Again, I do not know if I agree or disagree with the last sentence. However, that does not really matter, because either option only has implications for how we are to think of the relation between a Price-like equation describing a population genetic dynamics and real-world populations. It is not relevant for the question which Price-like equation to pick, or whether to pick one at all.

      A resolution I suggest to points 3, 4, and 5 above is:

      *A priori, the statistical quantities in the Price Equation are descriptive statistics, pertaining only to the specific population data given.

      *If one wishes to impute any predictive power, generalizability, or causal meaning to these statistics, all the standard considerations of inferential statistics apply. In particular, one must choose a statistical model that is justified based on the given data. In this case, one is not guaranteed to obtain the standard (linear) Hamilton's rule and may obtain any of an infinite family of rules.

      *If one uses a model that is not justified based on the given data, the results will still be correct for the given population data but will lack any meaning or generalizability beyond that.

      *In particular, if one considers data generated by a probabilistic model, and applies a statistical model that does not match the data-generating one, the results will be misleading, and will not generalize beyond the randomly generated realization one uses.

      Of course, the author may propose a different resolution to points 3-5, but they should be resolved somehow. Otherwise, the terminology in the manuscript will be incorrect and the ms will not resolve confusion in the field.

      I have outlined my solutions extensively above. I really appreciate that Reviewers #1 and #2 have spent time and attention on the manuscript and on the long appendices.  

      Appendix to the response to reviewer #2: Some remarks on Gardner, West & Wild (2011), Frank (2012), and Rousset (2015)

      An accurate response to the quote from Gardner, West, and Wild (2011) in the review report takes up space. I therefore wanted to put that in an appendix to the response to reviewer #2. I also include a few paragraphs regarding Frank (2012) and Rousset (2015), both of which are also mentioned by reviewer #2. All of this might also be of interest to people that are curious about how what I find in my paper relates to the existing literature.

      Gardner, West & Wild (2011) The quote I am responding to is “it is a mistake to assume that it must be the evolutionary theorist, writing out covariances, who is performing the equivalent of a statistical analysis”. I want to put that into context, so I will go over the whole paragraph that surrounds the quote. The paragraph is called Statistics and Evolutionary Theory and can be found on page 1038 of the paper. I think that it is worth pointing out that it is not easy to respond to their somewhat impressionistic collages of words and formulas. I will therefore cut the paragraph up into a few smaller bits and try to make sense of it bit by bit. The paragraph begins with:

      “Our account of the general theory of kin selection has been framed in statistical terms.” Based on what they write two sentences down, the best match between those words and what they do in the paper would be: “our account uses words like “covariance”, “variance” and “expectation” for things that are not what “covariance”, “variance” and “expectation” mean in probability theory and statistics.” I would be totally open to an argument why that is nonetheless OK to do, but the way Gardner, West, and Wild (2011) phrase it obscures the fact that this needs any justification or reflection at all. “Framing something in statistical terms” is unspecific enough to sound completely harmless.

      “The use of statistical methods in the mathematical development of Darwinian theory has itself been subjected to recent criticism (van Veelen, 2005; Nowak et al., 2010b), so we address this criticism here.”

      Also here, specifics would be helpful. The “use of statistical methods” sounds like it is more than just using terms from statistics, so this might refer to the minimizing of the sum of squared differences, which is also mentioned a sentence down in Gardner, West, and Wild (2011). If it does, then it is worth observing that in statistics, the minimizing of the sum of squared differences (or residuals, or errors) comes with theorems that point very clearly to what is being achieved by doing this. The Gauss–Markov theorem states that the ordinary least squares (OLS) estimator has the lowest variance within the class of linear unbiased estimators. This implies that minimizing the sum of squared errors helps answer a well-defined question in statistics; under certain conditions, an OLS estimator is our best shot at uncovering an unknown relation between variables. To also minimize a sum of squared differences, but now in the modeling domain, qualifies as “use of statistical methods” only in a very shallow way. It means that a similar minimization is performed. Without an equivalent of the Gauss–Markov theorem that would shine a light on what it is that is being achieved by doing so, that does not carry the same weight as it does in the statistics domain – in that it does not carry any weight at all.

      “The concern is that statistical terms – such as covariances and least-squares regressions – should properly be reserved for conventional statistical analyses, where hypotheses are tested against explicit data, and that they are out of place in the foundations of evolutionary theory (van Veelen, 2005; Nowak et al., 2010b).”

      Again, a few things are a bit vague. What are “explicit data”? Are there data that are not explicit? Why the generic “foundations of evolutionary theory”, instead of a more specific description of what these statistical terms are used for? But either way, this is a misrepresentation of what I wrote in van Veelen (2005). I did not suggest to “reserve statistical terms for conventional statistical analysis” just because. As I do here in the current paper, what I did there was explore the possibilities for the Price equation to help with what I then called Type I and Type II questions. Type I questions find themselves in the modeling domain and Type II questions find themselves in the statistical domain. I was not arguing for a ban on applying statistical concepts outside of the domain of statistical inference. All that I said is that in its current practice, the Price equation does not really help answer questions of either type.

      “However, this concern is misplaced. First, natural selection is a statistical process, and it is therefore natural that this should be defined in terms of aggregate statistics, even if only strictly by analogy (Frank, 1997a, 1998).”

      This is a vague non-argument. Almost nothing is well-defined here. What does it mean for natural selection to be a statistical process? Is that just an unusual term for a random process? If so, then I suppose I agree, but that has nothing to do with what I state or claim. And what does it mean to be defined in terms of aggregate statistics? What is the alternative? I have no idea how any of this relates to anything that I claim or state in my papers.

      “Second, Fisher (1930, p198) coined the term ‘covariance’ in the context of his exposition of the genetical theory of natural selection, so the evolutionary usage of this term has precedent over the way the term is used in other fields.”

      This is what I would call a “historic fallacy”. The fact that Fisher coined the term “covariance” in a book on genetics and natural selection does not mean that any “evolutionary usage” of the term “covariance”, however nonsensical, now has precedent over the way the term is used in other fields. Irrespective of the path that the history of science, genetics, or statistics took, right now we are in a place where about every student at every university anywhere in the world that takes a course in probability theory and/or statistics, learns that covariance is a property of a random variable (see also Wikipedia). And they do for a very good reason; it is essential in recognizing the relation between probability theory on the one hand and statistics on the other. Being curious how this “evolutionary usage” of the term covariance works, if covariance turns out not to be a property of a random variable, is therefore perfectly justified, and “Fisher coined the term” is not a safe word that exempts it from scrutiny. 

      “Third, it is a mistake to assume that it must be the evolutionary theorist, writing out covariances, who is performing the equivalent of a statistical analysis.”

      Again, that is just not what anyone is saying. Nobody is suggesting that an evolutionary theorist should perform the equivalent of statistical analysis. All I did was point to how little is being achieved by transferring formulas from statistics to a modeling context.

      “A better analogy is to regard Mother Nature in the role of statistician, analysing fitness effects of genes by the method of least-squares, and driving genetic change according to the results of her analyses (cf. Crow, 2008).”

      I have no idea what any of this means. Mother Nature is a personification of something that is not a person, and that does not have cognition. Without sentience, “Mother Nature” cannot assume the role of statistician, and cannot analyse fitness effects.

      “More generally, analogy is the basis of all understanding, so when isomorphisms arise unexpectedly between different branches of mathematics (in this case, theoretical population genetics and statistical least-squares analysis) this represents an opportunity for advancing scientific progress and not an anomaly that is to be avoided.”

      This is a strawman argument, puffed up with platitudes. Nobody is arguing against analogies. But what is the analogy supposed to be here? Just taking least squares from statistical inference and performing it in a modeling context does not make it an analogy. The Gauss–Markov theorem, which is the basis for why least squares helps answer questions in statistics, just does not mean anything in a modeling context. OLS in modeling is just willful misspecification, and nothing that it does in statistics translates to anything meaningful in modeling. Again, declaring it an analogy, or an isomorphism, does not make it one.

      Frank (2012) Because the reviewer also mentions Frank (2012), I would like to include a small remark on this paper too. “Natural Selection. IV. The Price equation” by Frank (2012) is partly a response to my earlier criticism of the use of the Price equation. Much like Gardner, West, and Wild (2011), I would describe this paper as what is called a “flight forwards” in Dutch. While the questions I ask are relatively prosaic (such as: how does the Price equation help derive a prediction from model assumptions?), Frank (2012) pivots to suggesting that there is a profound philosophy-of-science disagreement that I am on the wrong side of. It is close to impossible to respond to Frank (2012), because it is a labyrinth of arguments that sound deep and impressive, but that are just not specific enough to know how they relate to points that I made – or even just what they mean in general. Just to pick a random paragraph:

      “Is there some reorientation for the expression of natural selection that may provide subtle perspective, from which we can understand our subject more deeply and analyse our problems with greater ease and greater insight? My answer is, as I have mentioned, that the Price equation provides that sort of reorientation. To argue the point, I will have to keep at the distinction between the concrete and the abstract, and the relative roles of those two endpoints in mature theoretical understanding.”

      For many of those terms, I have no real idea what they mean, and also reading the rest of the paper does not help understanding what this has to do with the more prosaic questions that are waiting for an answer. What is “reorientation”? What does “concrete” versus “abstract” have to do with the question what is being achieved by doing least squares regressions in modeling? What would be an example of a mature and an immature theoretical understanding?

      Rousset (2015) is also mentioned by the reviewer. This paper is not esoteric. It states, as reviewer #2 points out, that "neither data nor inferences are considered". This paper therefore finds itself in the modeling domain, and not in the data domain. It does however still dodge the question what the benefits are of misspecification in the modeling domain. As a matter of fact, it denies that there is misspecification at all.

      “In the presence of synergies, the residuals have zero mean and are uncorrelated to the predictors. No further assumption is made about the distribution of the residuals. Thus, there is no sense in which the regression is misspecified.”

      This is a remarkable quote, and testament to the lasting impact of the construction errors in Price (1970). Misspecification is literally defined as getting the model wrong. In statistics, avoiding misspecification can be complicated, because of the noise in the data. The real data-generating process is unknown, and because of the noise, there is always the possibility that data that are generated by one model look like they could also have been generated by another. The challenge is to reduce the odds of getting the model wrong to acceptable proportions, which is what statistical tests are for. But in modeling, we know what the model is; it is postulated by the modeler. Therefore, misspecification can be avoided by just not replacing it with a different model.

      What is being discussed in this part of Rousset (2015) is replacing what in this manuscript is called Model 3 (𝑤<sub>𝑖</sub> = 𝛼 + 𝛽<sub>1,0</sub>𝑝<sub>𝑖</sub> + 𝛽<sub>0,1</sub>𝑞<sub>𝑖</sub> + 𝛽<sub>1,1</sub>𝑝<sub>𝑖</sub>𝑞<sub>𝑖</sub> + 𝜀<sub>𝑖</sub>) with Model 2 (𝑤<sub>𝑖</sub> = 𝛼 + 𝛽<sub>1,0</sub>𝑝<sub>𝑖</sub> + 𝛽<sub>0,1</sub>𝑞<sub>𝑖</sub> + 𝜀<sub>𝑖</sub>), and choosing the parameters in Model 2 so that it is as close as it can be to Model 3. This is just the definition of misspecification. That is to say: the misspecification part is the choosing of Model 2 as a reference model. The minimizing of the sum of squared residuals one could consider as minimizing the damage.

      While Rousset (2015) finds itself in the modeling domain, it does nonetheless point to the field of statistics here, by stating that “the residuals have zero mean and are uncorrelated to the predictors”. From this, the paper concludes that “there is no sense in which the regression is misspecified”. That is just plain wrong. Minimizing the sum of the squared residuals guarantees that the residuals are uncorrelated with the variables that are included in the reference model, with respect to which the squared sum of residuals is minimized. The criterion that Rousset (2015) uses is that the model is well-specified if there is no correlation between the residuals (here: 𝜀<sub>𝑖</sub>) and the variables included in the reference model (here: 𝑝<sub>𝑖</sub> and 𝑞<sub>𝑖</sub>). But according to this criterion, all models would always be well-specified, and no model could ever be misspecified. The correct criterion, however, also requires that the residuals are not correlated with variables not included in the reference model. And here, the residuals are in fact correlated with 𝑝<sub>𝑖</sub>𝑞<sub>𝑖</sub>, which is the variable that is included in Model 3, but not in Model 2. Therefore, according to the correct version of this criterion, this model is in fact misspecified – as it should be, because getting the model wrong is the definition of misspecification.
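
      The point about the two criteria can be checked numerically. The sketch below is only an illustration under assumed values: the population counts and the parameter values are hypothetical and do not come from the manuscript. It takes a noise-free fitness function with an interaction term (the form of Model 3), fits the reference model without the interaction term (the form of Model 2) by least squares, and then computes the covariance of the residuals with the included variables and with the omitted product term.

```python
# Hypothetical population: number of individuals for each combination of
# own p-score p and partner p-score q (values chosen only for illustration).
counts = {(0, 0): 4, (0, 1): 2, (1, 0): 2, (1, 1): 4}
# Hypothetical parameters of a fitness function with interaction term:
# w = alpha + b10*p + b01*q + b11*p*q, without noise (modeling domain).
alpha, b10, b01, b11 = 1.0, -0.1, 0.3, 0.2

p, q = [], []
for (pi, qi), n in counts.items():
    p += [float(pi)] * n
    q += [float(qi)] * n
pq = [a * b for a, b in zip(p, q)]
w = [alpha + b10 * a + b01 * b + b11 * c for a, b, c in zip(p, q, pq)]


def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(columns, y):
    """Least-squares coefficients of y on the given columns (normal equations)."""
    k = len(columns)
    xtx = [[sum(u * v for u, v in zip(columns[i], columns[j])) for j in range(k)]
           for i in range(k)]
    xty = [sum(u * v for u, v in zip(columns[i], y)) for i in range(k)]
    return solve(xtx, xty)


def cov(x, y):
    """Population covariance (divides by the number of individuals)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return sum((u - mx) * (v - my) for u, v in zip(x, y)) / len(x)


# Fit the reference model without interaction term: w ~ const + p + q.
ones = [1.0] * len(w)
coef = ols([ones, p, q], w)
resid = [wi - (coef[0] + coef[1] * a + coef[2] * b) for wi, a, b in zip(w, p, q)]

cov_e_p = cov(resid, p)    # zero up to rounding: p is in the reference model
cov_e_q = cov(resid, q)    # zero up to rounding: q is in the reference model
cov_e_pq = cov(resid, pq)  # nonzero: p*q is left out of the reference model
print(cov_e_p, cov_e_q, cov_e_pq)
```

      The first two covariances vanish for any choice of counts and parameters, because that is what the least-squares algebra guarantees. Only the third one carries information about whether the reference model leaves something out.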

      In order to make sure that there can be no misunderstanding, I have added subsections at the end of Section 2 and Section 4 of Appendix A, and at the end of Section 2 of Appendix B. These subsections show that the algebra of minimizing the sum of squared errors implies that there is no correlation between the errors, or the residuals, and the variables that are included in the model. This is by no means something new; it is the reason why we do OLS to begin with. For additional details about misspecification, I would refer to Section 1b (viii) in van Veelen (2020).

      Finally, there is a detail worth noticing. In the main text, as well as in Appendix B, I use an analogy (and, unlike what Gardner, West, and Wild, 2011, refer to as an analogy, this actually is one). This is an analogy between two choices. On the one hand, there is the choice between Price-like equation 1 (based on Model 1 as a reference model) and Price-like equation 2 (based on Model 2 as a reference model) both applied to Model 2. On the other hand, there is the choice between Price-like equation 2 (based on Model 2 as a reference model) and Price-like equation 3 (based on Model 3 as a reference model) both applied to Model 3. Model 1 is the non-social model, Model 2 is the social model without interaction term, and Model 3 is the social model with interaction term. That makes the first choice a choice between treating a social model as a social model, or as a non-social model. The second choice is between treating a social model with interaction term as a social model with interaction term, or as a social model without interaction term. The power of this analogy is that every argument against treating the social model as if it is a non-social model is also an argument against treating the social model with interaction term as if it is a social model without interaction term.

      This ties in with the incorrect criterion for when a model is well-specified from Rousset (2015) as follows. His criterion (that there should be no correlation between the residuals and the variables in the model) declares the social model without interaction term well-specified as a reference model, when we are considering a social model with interaction term. According to the same criterion, however, the non-social model would also have to be declared to be well-specified as a reference model, when the model we are considering is a social model. The reason is that also here, there is no correlation between the residuals and the variables that are included in this model. This is clearly not what anyone is advocating for, and for good reasons. The residuals here would, after all, be correlated with the p-score of the partner, which is a variable that is not included in the non-social model. This is a good indication that we should not use the non-social model for a social trait.
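
      The same check can be run one level down. The following sketch, again with hypothetical population counts and parameter values that are not taken from the manuscript, generates fitnesses from a social model without interaction term and fits the non-social model (own p-score only) by least squares. The residuals are uncorrelated with the own p-score, as the algebra guarantees, but they are correlated with the p-score of the partner.

```python
# Hypothetical population: number of individuals for each combination of
# own p-score p and partner p-score q (values chosen only for illustration).
counts = {(0, 0): 4, (0, 1): 2, (1, 0): 2, (1, 1): 4}
# Hypothetical parameters of a social model without interaction term:
# w = alpha + b10*p + b01*q, without noise (modeling domain).
alpha, b10, b01 = 1.0, -0.1, 0.5

p, q = [], []
for (pi, qi), n in counts.items():
    p += [float(pi)] * n
    q += [float(qi)] * n
w = [alpha + b10 * a + b01 * b for a, b in zip(p, q)]


def mean(x):
    return sum(x) / len(x)


def cov(x, y):
    """Population covariance (divides by the number of individuals)."""
    mx, my = mean(x), mean(y)
    return sum((u - mx) * (v - my) for u, v in zip(x, y)) / len(x)


# Fit the non-social reference model w ~ const + p by least squares.
slope = cov(w, p) / cov(p, p)
intercept = mean(w) - slope * mean(p)
resid = [wi - (intercept + slope * a) for wi, a in zip(w, p)]

cov_e_p = cov(resid, p)  # zero up to rounding: p is in the reference model
cov_e_q = cov(resid, q)  # nonzero: the partner's p-score is left out
# The fitted slope absorbs part of the partner effect through Cov(p, q).
slope_check = b10 + b01 * cov(p, q) / cov(p, p)
print(cov_e_p, cov_e_q, slope, slope_check)
```

      By the criterion that only looks at the included variables, the non-social model would pass here too, which is the reason that criterion cannot be the right one.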

      Reviewer #3 (Public review):

      Before responding to this review, I would like to express that I appreciate the fact that the reviews and the responses are public at eLife. Besides just being useful in general, this also allows readers to get a behind-the-scenes glimpse into the state of the field, and the level of the reviewing. While the reports by Reviewers #1 and #2 show openness and an interest in getting things right, the report by Reviewer #3 is representative of the many review reports that I have received from the inclusive fitness community in the past. These reports tend to be rhetorically strong, and to those who do not have the time to dig deeper into the details, these reports are probably also convincing. I will therefore go through this review line by line to show how little there is behind the confident off-hand dismissal.

      There is an interesting mathematical connection - an "isomorphism" - between Price's equation and least-squares linear regression.

      This is esoteric and needlessly vague. Why is the word “isomorphism” used? In mathematics, an isomorphism is a structure-preserving mapping. The Price equation is an equation, or an identity, which makes it a bit difficult to imagine what the set of objects is on one end of the mapping. Least-squares linear regression can perhaps be seen as a function of a dataset, which would make it a single object (one function). This complicates things at the other end of the mapping too, if that set is a singleton set. The only isomorphism that I can think of is a trivial isomorphism where one equation is mapped onto one function and vice versa. It seems unlikely that this is what the reviewer means. The word isomorphism moreover is in quotes, so maybe this is supposed to be figurative. But what would it be that is being suggested here by this figure of speech? Just saying that there is, as the reviewer puts it, an “interesting mathematical connection”, does not make it so. It would already be a start to just specify what the mathematical connection is, because I have a hard time seeing what that would be. Is it just that, if you divide the Cov(𝑤, 𝑝)-term by the Var(𝑝)-term, then you get a regression coefficient? If that is what the reviewer has in mind, that would be a rather shallow observation.

      Some people have misinterpreted this connection as meaning that there is a generality-limiting assumption of linearity within Price's equation, and hence that Hamilton's rule - which is derived from Price's equation - provides only an approximation of the action of natural selection.

      Here, the reviewer pulls a switcheroo. The use of the word “general”, or “generality”, here refers to the fact that the classical Price equation is an identity for all possible transitions between a parent and an offspring population. This is the sense in which the inclusive fitness literature uses the word general, and so do I in the relevant places in the manuscript. When I do, I make sure to add phrases like “in the sense that whatever the true model is, it always gets the direction of selection right”. As a consequence, the classical Hamilton’s rule is also totally general, in the same sense.

      One of the core points of the paper is that this is not unique to the classical Price equation. As a matter of fact, there is a large set of Price-like equations and Hamilton-like rules that are just as much identities, and just as general (in the sense that they get the direction of selection right for all possible transitions). Being an identity and being completely general (in this sense) therefore cannot be a decisive criterion in favour of the classical Price equation and the classical Hamilton’s rule.

      On the other hand, the way in which my Generalized Price equation and my generalized version of Hamilton’s rule are general, is that they do not restrict the statistical model with respect to which errors are squared, summed and minimized to one linear statistical model. This generalization generates the variety of Price-like equations and Hamilton-like rules mentioned above (all of which are general in the sense of always getting the direction of selection right) and it gives us the flexibility to pick one that separates terms that reflect the fitness function from terms that reflect the population state.

      In response to my generalizing the Price equation and Hamilton’s rule in this second sense, the criticism of the reviewer comes down to saying that the Price equation and Hamilton’s rule do not need generalizing, because they already are general – the switcheroo being that this refers to generality in the first sense. That makes it sound like this could be an honest mistake, confusing one way in which these can be described as general with another. However, I really hammered this point home in the manuscript. Even a cursory reading of the manuscript reveals that I am fully aware that the classical Price equation and the classical Hamilton’s rule are general in the first sense.

      It is also not helpful that, as a description of what I supposedly claim, this is impressionistic, and lacks specificity. The Price equation is an equation, or an identity. What does it mean for there to be an “assumption of linearity” within it? For the classical Price equation in covariance form (which Reviewer #2 argues is what most people think of as “the Price equation”) there is no way in which one can transform this into a meaningful statement. There is just nothing in there to which the adjective “linear” can be applied. Linearity only becomes a thing when we ask ourselves how we can interpret the regression coefficient in the classical Price equation in regression form. That would be the linearity of the statistical model with respect to which the differences are squared, summed and minimized in the regression.

      This is in contrast to the majority view that Hamilton's rule is a fully general and exact result.

      Again, in this manuscript, I write, time and again, that the classical Hamilton’s rule is fully general (in the sense that it applies to any transition), and exact (if that means that it always gets the direction of selection right). So, this is clearly not where the contrast with the majority view lies. The contrast with the majority view is that the majority insist on misspecification, and I suggest not to do that.

      To briefly give some mathematical details: Price's equation defines the action of natural selection in relation to a trait of interest as the covariance between fitness 𝑤 and the genetic breeding value 𝑔 for the trait, i.e. Cov(𝑤, 𝑔);

      The Price equation is an identity, not a definition. When deciding on a definition, there is some freedom. We can choose to define ⊂ so that 𝐴 ⊂ 𝐵 means that 𝐴 is a strict subset of 𝐵; or we can choose to define ⊂ so that 𝐴 ⊂ 𝐵 means that 𝐴 is a (not necessarily strict) subset of 𝐵. The Price equation does not “define the action of natural selection”, because it is an identity. There is no freedom to “define” any other way.

      The more serious reason why this is conceptually also a little dangerous, is the following. Imagine a locus with two alleles. Both of them are non-coding bits of DNA. Selection therefore does not act on either of them. Now imagine a parent population with an average p-score of 0.5, or, in other words, the frequency of these alleles in the parent population is 50-50. That makes the expected value of the p-score in the offspring population 0.5 too. In finite populations, however, randomness can make the p-score grow a bit larger or a bit smaller than 0.5. If the parent population is small, the variance (the expected squared deviation from 0.5) can actually be sizeable. If the p-score in the offspring population lands above 0.5, then the Price equation has a change in the average p-score Δ𝑝̄ > 0 and a 𝐶𝑜𝑣(𝑤, 𝑝) > 0. Describing the Price equation as “defining the action of natural selection” now suggests that higher p-scores have been selected for (or, in other words, that “the action of natural selection in relation to a trait of interest” is positive). With equal probability, however, Δ𝑝̄ < 0 and therefore also 𝐶𝑜𝑣(𝑤, 𝑝) < 0, and this would then make us draw the opposite conclusion, that natural selection has acted to lower the p-scores in the population. Both of those would be wrong, because in this situation, it would have been randomness that changed the average p-score.
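      The drift scenario above can be illustrated with a small simulation. This is only a sketch under stated assumptions: a haploid population in which every offspring picks its parent uniformly at random (no selection) and inherits the parental p-score unchanged. The function names, the population size of 20, and the number of replicates are illustrative choices, not taken from the manuscript.

```python
import numpy as np


def neutral_generation(n_parents, rng):
    """Simulate one generation at a neutral locus; return (Cov(w, p), change in mean p)."""
    # Parent p-scores: half carry one allele (p = 1), half the other (p = 0).
    p = np.repeat([1.0, 0.0], n_parents // 2)
    # No selection: each offspring draws its parent uniformly at random.
    w = rng.multinomial(p.size, np.full(p.size, 1.0 / p.size)).astype(float)
    # Population covariance between offspring number and p-score.
    cov_wp = np.mean(w * p) - np.mean(w) * np.mean(p)
    # Offspring inherit the parental p-score, so the new mean is offspring-weighted.
    delta_p_bar = np.sum(w * p) / np.sum(w) - np.mean(p)
    return cov_wp, delta_p_bar


def sign_frequencies(n_parents=20, n_replicates=10_000, seed=0):
    """Fraction of replicates in which Cov(w, p) is positive, and in which it is negative."""
    rng = np.random.default_rng(seed)
    covs = np.array([neutral_generation(n_parents, rng)[0] for _ in range(n_replicates)])
    return float(np.mean(covs > 0)), float(np.mean(covs < 0))


frac_positive, frac_negative = sign_frequencies()
print(f"Cov(w, p) > 0 in {frac_positive:.1%} of runs, < 0 in {frac_negative:.1%} of runs")
```

      In this sketch the covariance is positive and negative about equally often, although no allele has a fitness advantage. The sign of the covariance in a single transition therefore cannot, by itself, be read as the direction of selection.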

      this is a fully general result that applies exactly to any arbitrary set of (𝑔, 𝑤) data; without any loss of generality this covariance can be expressed as the product of genetic variance Var(𝑝) and a coefficient 𝑏(𝑔, 𝑤), the coefficient simply being defined as 𝑏(𝑔, 𝑤) = Cov(𝑤, 𝑔)/Var(𝑝) for all Var(𝑝) > 0; it happens that if one fits a straight line to the same (𝑔, 𝑤) data by means of least-squares regression then the slope of that line is equal to 𝑏(𝑔, 𝑤).

      Why this needs to be explained is a bit of a mystery. These “mathematical details” are in almost all Price equation papers, and they are the point of departure of my Appendix A (it is on page 7 of a more than 90 page long set of appendices). Seeing the need to explain this suggests that the reviewer thinks that there is a chance that I or anyone reading this paper would have missed this. I have not, and, more importantly, none of this invalidates the point I make in the paper.   
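      For readers who want to see the identity the reviewer describes, it can be checked numerically. The sketch below uses made-up data in which fitness depends on the trait in a deliberately non-linear way; the function name `regression_coefficient` and the data-generating choices are illustrative assumptions.

```python
import numpy as np


def regression_coefficient(g, w):
    """Return Cov(w, g) / Var(g), which is defined whenever Var(g) > 0."""
    g = np.asarray(g, dtype=float)
    w = np.asarray(w, dtype=float)
    var_g = np.var(g)
    if var_g == 0:
        raise ValueError("the coefficient is undefined when Var(g) = 0")
    cov_wg = np.mean(w * g) - np.mean(w) * np.mean(g)
    return cov_wg / var_g


rng = np.random.default_rng(0)
g = rng.uniform(size=50)
# Fitness is a non-linear function of g plus noise; the identity still holds.
w = np.exp(3.0 * g) + rng.normal(size=50)

b = regression_coefficient(g, w)
# np.polyfit returns coefficients from the highest degree down: [slope, intercept].
slope, intercept = np.polyfit(g, w, deg=1)
print(f"Cov/Var = {b:.6f}, least-squares slope = {slope:.6f}")
```

      The two numbers agree for any data set with positive variance, which is why the identity by itself says nothing about whether a straight line is the right statistical model for the data.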

      All of this has already been discussed, repeatedly, in the literature.

      All of this has already been discussed, repeatedly, in the literature indeed. It is just that it does not engage with anything I write in the manuscript, or that I wrote in my other papers.

      Now turn to the present paper: the first sentence of the Abstract says "The generality of Hamilton's rule is much debated", and then the next sentence says "In this paper, I show that this debate can be resolved by constructing a general version of Hamilton's rule".

      This is correct.

      But immediately it's clear that this isn't really resolving the debate, what this paper is actually doing is asserting the correctness of the minority view (i.e. that Hamilton's rule as it currently stands is not a general result)

      It seems to me that the reason why this is “immediately clear” to this reviewer is that the reviewer has not processed the contents of the paper. I am not sure if I have to repeat this, but I am not saying that “Hamilton’s rule as it currently stands” is not general (in the sense that it always gets the direction of selection right). It is, and I say that it is a bunch of times. But so are other rules.

      and then attempting to build a more general form of Hamilton's rule upon that shaky foundation.

      I am not just “attempting to build a more general form of Hamilton's rule”. I did in fact build a more general form of Hamilton’s rule (where the generality refers to the richer set of reference statistical models).

      Predictably, the paper erroneously interprets the standard formulation of Hamilton's rule as a linear approximation and develops non-linear extensions to improve the goodness of fit for a result that is already exactly correct.

      Nowhere in the paper or the appendices do I describe the standard formulation of Hamilton’s rule (or, for that matter, any formulation of Hamilton’s rule) as an “approximation”. It is just not a word that has anything to do with this. If we are doing statistical inference, and the sum of squared errors that is minimized decreases by adding a variable in the statistical model with regard to which the sum of squared errors is minimized, then that will typically improve the goodness of fit. In statistics, this is not described as an improvement in how well the statistical model “approximates” the data, or whatever it is that the reviewer would suggest is being approximated here.
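      The statistical point can be made concrete with a small least-squares example. This is a sketch on simulated data, assuming fitness depends on an individual's own p-score, its partner's p-score, and their interaction; the coefficients, noise level, and variable names are illustrative.

```python
import numpy as np


def sum_squared_errors(design, y):
    """Minimized sum of squared errors of an ordinary least-squares fit."""
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ coef
    return float(residuals @ residuals)


rng = np.random.default_rng(1)
n = 200
p = rng.integers(0, 2, size=n).astype(float)  # own p-score (0 or 1)
q = rng.integers(0, 2, size=n).astype(float)  # partner's p-score (0 or 1)
# Simulated fitness with an interaction effect and a little noise.
w = 1.0 - 1.0 * p + 1.0 * q + 1.0 * p * q + rng.normal(scale=0.1, size=n)

without_interaction = np.column_stack([np.ones(n), p, q])
with_interaction = np.column_stack([np.ones(n), p, q, p * q])

sse_small = sum_squared_errors(without_interaction, w)
sse_large = sum_squared_errors(with_interaction, w)
print(f"SSE without interaction term: {sse_small:.3f}")
print(f"SSE with interaction term:    {sse_large:.3f}")
```

      Adding a regressor can never increase the minimized sum of squared errors, and here it lowers it substantially because the simulated data contain an interaction. That is a statement about goodness of fit relative to a reference model, not about approximating anything.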

      This is not a convincing contribution. It will not change minds or improve understanding of the topic.

      There is indeed plenty of scope for this not to change minds or improve understanding of the topic. It will not change the minds or improve the understanding of those that are not really interested in getting this right. Obviously, it will also not convince those that do not read it.

      Nor is it particularly novel. Smith et al (2010, "A generalisation of Hamilton's rule for the evolution of microbial cooperation" Science 328, 1700-1703) similarly interpreted Hamilton's rule as a linear model and provided a corresponding polynomial expansion - usefully fitting the model to microbial data so as to learn something about the costs and benefits of cooperation in an empirical setting. It's odd that this paper isn't cited here.

      Let me begin by pointing to what I agree with. Given that smith et al. (2010) and my manuscript are both in the business of generalizing Hamilton’s rule, it would be helpful to the reader if my paper includes more information about how the two efforts relate. I will discuss the relation below, and I will also include that in Appendix B, and point to it in the main text. Before I do, however, I would like to point to two details in the review report that fit a pattern.

      The first is that the reviewer describes what smith et al. (2010) do as “useful”, and seems to think of fitting polynomial expansions as a legitimate way to “learn something about the costs and benefits of cooperation in an empirical setting”. That sounds quite positive. My paper, in which I supposedly repeat this, however, is characterized as misguided. This fits a pattern; all of the reviews I received from the inclusive fitness community include a “done before”, and regularly the done before is described approvingly, while my paper is described as fundamentally flawed.

      Also customary is the lack of detail. What would be really useful here is something like “equation A.14 in this manuscript is the same as equation 6 in smith et al. (2010) if we choose […]”. This kind of statement would pin down the way in which what I do has been done before. That, however, would require going into detail, at the risk of finding out that what is done in my manuscript is actually quite different from what happens in smith et al. (2010). That is also a recurrent thing. When I look up the done before, I typically find something that is not quite the same.

      Now on to the paper. What smith et al. (2010) try to do is something that I wholeheartedly support. It is an empirical study that tries to capture non-linearity. A first point of order is that it is worth asking ourselves: linear or non-linear in what? For that, I would like to go back to the setup of my manuscript. Model 2 from the Main Text is, leaving aside any noise term,

      𝑤<sub>𝑖</sub> = 𝛼 + 𝛽<sub>1,0</sub>𝑝<sub>𝑖</sub> + 𝛽<sub>0,1</sub>𝑞<sub>𝑖</sub>

      In this fitness function, 𝑝<sub>𝑖</sub> is the p-score of individual 𝑖 and 𝑞<sub>𝑖</sub> is the p-score of the partner that individual 𝑖 is matched with. This is a standard model of social behaviour if 𝛽<sub>1,0</sub> < 0 and 𝛽<sub>0,1</sub> > 0. Such choices for 𝛽<sub>1,0</sub> and 𝛽<sub>0,1</sub> indicate that having a higher p-score decreases the fitness of individual 𝑖 and increases the fitness of its partner. Here we assume that 𝛼 = 1, 𝛽<sub>1,0</sub> = −1, and 𝛽<sub>0,1</sub> = 2. We assume that p-scores can only be 0 or 1, or, in other words, we assume that there are only cooperators and defectors in the population (or, in terms of smith et al., 2010: cooperators and cheaters).

      For a well-mixed population, where the likelihood of being matched with a cooperator is the same for cooperators and defectors (it is equal to the frequency of cooperators for both), we can now plot the fitnesses of cooperators (red) and defectors (blue) as a function of the frequency of cooperators (Appendix 1-figure 6 left).

      We can do the same for a population with relatedness, where the probability of being matched with a cooperator is 𝑟 + (1 − 𝑟)𝑓<sub>c</sub> for cooperators, and (1 − 𝑟)𝑓<sub>c</sub> for defectors, where 𝑓<sub>c</sub> is the frequency of cooperators (Appendix 1-figure 6 right). For relatedness 𝑟 = 0 and for the positive relatedness in the right panel, cooperation is selected against at every frequency.

      Increasing relatedness further, we would find that for 𝑟 = 1/2 the lines coincide, which implies that at every frequency, cooperation is neither selected for nor against. For 𝑟 > 1/2, cooperation will be selected for at every frequency. This pattern implies that, as we have seen in the manuscript, the classical Hamilton’s rule works perfectly fine for Model 2; with 𝑐 = −𝛽<sub>1,0</sub> = 1 and 𝑏 = 𝛽<sub>0,1</sub> = 2, cooperation is selected for if and only if 𝑟𝑏 > 𝑐. The fitnesses of cooperators and defectors as functions of the frequency of cooperators, moreover, are always parallel lines, regardless of relatedness.
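      The parallel-lines pattern for Model 2 can be reproduced in a few lines. The sketch assumes that Model 2 has the linear form suggested by its coefficients, fitness = 𝛼 + 𝛽<sub>1,0</sub> × own p-score + 𝛽<sub>0,1</sub> × partner's p-score, and that cooperators meet a cooperator with probability 𝑟 + (1 − 𝑟)𝑓<sub>c</sub> while defectors do so with probability (1 − 𝑟)𝑓<sub>c</sub>. Both are assumptions made for illustration, as are the function and constant names.

```python
import numpy as np

ALPHA, BETA_10, BETA_01 = 1.0, -1.0, 2.0  # parameter values quoted in the text


def expected_fitness_model2(f_c, r):
    """Expected fitness of cooperators and defectors at cooperator frequency f_c."""
    f_c = np.asarray(f_c, dtype=float)
    # Assumed matching probabilities with a cooperator, given relatedness r.
    q_coop = r + (1.0 - r) * f_c
    q_def = (1.0 - r) * f_c
    w_coop = ALPHA + BETA_10 * 1.0 + BETA_01 * q_coop  # own p-score is 1
    w_def = ALPHA + BETA_10 * 0.0 + BETA_01 * q_def    # own p-score is 0
    return w_coop, w_def


frequencies = np.linspace(0.0, 1.0, 11)
for r in (0.0, 0.25, 0.5, 0.75):
    w_coop, w_def = expected_fitness_model2(frequencies, r)
    gap = w_coop - w_def
    # The gap is the same at every frequency, so the two lines are parallel.
    print(f"r = {r:.2f}: fitness gap = {gap[0]:+.2f} at every frequency")
```

      Under these assumptions the gap equals 𝑟𝑏 − 𝑐 with 𝑏 = 2 and 𝑐 = 1, so it is negative below 𝑟 = 1/2, zero at 𝑟 = 1/2, and positive above it, whatever the frequency of cooperators.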

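      Model 3 in the main text extends Model 2 by adding an interaction term:

      𝑤<sub>𝑖</sub> = 𝛼 + 𝛽<sub>1,0</sub>𝑝<sub>𝑖</sub> + 𝛽<sub>0,1</sub>𝑞<sub>𝑖</sub> + 𝛽<sub>1,1</sub>𝑝<sub>𝑖</sub>𝑞<sub>𝑖</sub>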

      Now we choose 𝛼 = 1, 𝛽<sub>1,0</sub> = −1, 𝛽<sub>0,1</sub> = 1, and 𝛽<sub>1,1</sub> = 1. We again draw the fitnesses of cooperators and defectors, both at relatedness 𝑟 = 0 (Appendix 1-figure 7 left) and at positive relatedness (Appendix 1-figure 7 right). In the manuscript, I argue that the appropriate version of Hamilton’s rule here is Queller’s rule: 𝑟<sub>0,1</sub>𝑏<sub>0,1</sub> + 𝑟<sub>1,1</sub>𝑏<sub>1,1</sub> > 𝑐 with 𝑐 = −𝛽<sub>1,0</sub> = 1, 𝑏<sub>0,1</sub> = 𝛽<sub>0,1</sub> = 1, and 𝑏<sub>1,1</sub> = 𝛽<sub>1,1</sub> = 1. The fitnesses of cooperators and defectors as functions of the frequency of cooperators are still straight lines, but they are no longer parallel.

      The first thing to observe, therefore, is that a model with synergy, in which the classic version of Hamilton’s rule would be misspecified, and Queller’s rule would be well-specified, does not require the fitnesses as functions of the frequencies of cooperators to be non-linear. All that changes with the addition of the interaction term, is that they stop being parallel.
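      The same kind of sketch shows what the interaction term does. It assumes that Model 3 is the linear model in own and partner p-scores plus a term 𝛽<sub>1,1</sub> × own p-score × partner's p-score, and it reuses the assumed matching probabilities 𝑟 + (1 − 𝑟)𝑓<sub>c</sub> and (1 − 𝑟)𝑓<sub>c</sub>. The function name and default arguments are illustrative.

```python
import numpy as np


def expected_fitness_model3(f_c, r, alpha=1.0, beta_10=-1.0, beta_01=1.0, beta_11=1.0):
    """Expected fitness of cooperators and defectors with an interaction term."""
    f_c = np.asarray(f_c, dtype=float)
    q_coop = r + (1.0 - r) * f_c  # assumed chance that a cooperator meets a cooperator
    q_def = (1.0 - r) * f_c       # assumed chance that a defector meets a cooperator
    # For cooperators the own p-score is 1, so the interaction term adds beta_11 * q.
    w_coop = alpha + beta_10 + (beta_01 + beta_11) * q_coop
    # For defectors the own p-score is 0, so the interaction term vanishes.
    w_def = alpha + beta_01 * q_def
    return w_coop, w_def


frequencies = np.linspace(0.0, 1.0, 11)
for r in (0.0, 0.5):
    w_coop, w_def = expected_fitness_model3(frequencies, r)
    slope_coop = (w_coop[-1] - w_coop[0]) / (frequencies[-1] - frequencies[0])
    slope_def = (w_def[-1] - w_def[0]) / (frequencies[-1] - frequencies[0])
    print(f"r = {r:.1f}: slope for cooperators = {slope_coop:.2f}, for defectors = {slope_def:.2f}")
```

      Both fitness functions remain straight lines in the frequency of cooperators, but their slopes differ, so the fitness gap now depends on the frequency. No curvature is needed for the classical rule to be misspecified.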

      The paper by smith et al. (2010) is an effort to capture non-linearities in the way fitnesses depend on the frequency of cooperators. That, therefore, goes beyond the step from Model 2 to Model 3. Whether it uses the right method to capture those non-linearities, we will come back to in a second, but it is important to realize that also without these non-linearities, the classic version of Hamilton’s rule can be too limiting to accurately describe selection. (Here, I should add that this implies that we were wrong in Wu et al. (2013), when we suggested that “for this experiment, it seems unnecessary to use the generalized Hamilton’s rule, if instead the Malthusian fitness is adopted. In other words, the Wrightian fitness approach calls for a generalization of Hamilton’s rule, whereas the Malthusian fitness approach does not (or at least not in a drastic way, as Malthusian fitnesses are almost linear in the frequency of cooperators).” Using Malthusian fitnesses, the functions were close to linear, but not close to parallel, and therefore also here, Hamilton’s rule needs generalizing - albeit in a different way than smith et al. (2010) did).

      The cooperation that is observed in the Myxococcus xanthus studied by smith et al. (2010) is not a good match with a model where individuals are matched in pairs for an interaction that determines their fitnesses. These microbes cooperate in large groups, and a better match would therefore be the n-player public goods games studied in van Veelen (2018). There, we see that simple, straightforward ways to describe synergies (or anti-synergies) can easily lead to fitnesses not being linear in the frequency of cooperators.

      The way smith et al. (2010) try to capture those non-linearities, however, is not free of complications. We addressed those in Wu et al. (2013), and I summarized them, briefly, in van Veelen (2018). One of the issues is that most of the non-linearity smith et al. (2010) pick up is the result of considering Wrightian fitness rather than Malthusian fitness. In a continuous time model with a constant growth rate, the population size at time 𝑡 is 𝑁(𝑡) = 𝑒<sup>mt</sup>𝑁(0), where 𝑚 is the Malthusian fitness. In a discrete time model with a constant average number of offspring per individual, the population at time 𝑡 is 𝑁(𝑡) = 𝑤<sup>t</sup>𝑁(0), where 𝑤 is the Wrightian fitness. If we take 𝑚 = ln 𝑤, these are the same, and if 𝑤 is close to 1, then 𝑚 can be approximated by 𝑤 − 1. That also implies that if 𝑤 is close to 1 (or, equivalently, if 𝑚 is close to 0) one is locally linear if the other is too. However, in the experiment by smith et al. (2010) the aggregate fitness effects are not small, and what is highly nonlinear in terms of Wrightian fitness is close to linear in Malthusian fitness.
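      The relation between the two fitness measures is easy to check numerically. The sketch below is a generic illustration with made-up fitness values, not a reanalysis of the data in smith et al. (2010); the function names are illustrative.

```python
import math


def malthusian_from_wrightian(w):
    """Malthusian fitness m that gives the same growth as Wrightian fitness w."""
    return math.log(w)


def population_discrete(w, t, n0=1.0):
    """Population size after t generations with Wrightian fitness w."""
    return n0 * w ** t


def population_continuous(m, t, n0=1.0):
    """Population size at time t with Malthusian fitness m."""
    return n0 * math.exp(m * t)


for w in (1.01, 1.5, 5.0):
    m = malthusian_from_wrightian(w)
    same = math.isclose(population_discrete(w, 7), population_continuous(m, 7))
    print(f"w = {w}: m = {m:.4f}, w - 1 = {w - 1:.4f}, same growth: {same}")
```

      For 𝑤 close to 1 the approximation 𝑚 ≈ 𝑤 − 1 is accurate, but for large fitness effects the two measures diverge strongly, so a relation that is close to linear in 𝑚 can look highly non-linear in 𝑤.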

      Another complication is that the Taylor coefficients that smith et al. (2010) find are the result of a combination of the data and the functional form they choose to first apply to their data. That means that a different choice of functional form would have given different Taylor coefficients, while the in-between transformation can also be skipped. Also, the number of Taylor coefficients is larger than the dimensionality of the data, which are based on averages for 6 frequencies. For more details on these complications, I would like to refer to Wu et al. (2013) and van Veelen (2018). A nice detail is that if we consider the way the fitnesses of cooperators and defectors compare when using Malthusian fitnesses, then a comparison of the slopes actually suggests anti-synergies, which leads to a stable mix of cooperators and cheaters, already in the absence of population structure. This matches what is suggested by Archetti and Scheuring (2011, 2012) and Archetti (2018).
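      The dimensionality point can be illustrated generically. In the sketch below, six made-up data points are fitted exactly by polynomials with seven coefficients, and the fit is not unique. The frequencies and values are hypothetical and are not the data of smith et al. (2010); the names are illustrative.

```python
import numpy as np

# Six hypothetical frequencies and fitness values (made up for illustration).
freqs = np.linspace(0.0, 1.0, 6)
values = np.array([1.0, 1.3, 1.9, 2.2, 2.4, 2.5])

# A degree-5 polynomial has six coefficients and is pinned down by six points.
base = np.poly1d(np.polyfit(freqs, values, deg=5))

# A degree-6 polynomial that is zero at all six frequencies.
vanishing = np.poly1d(freqs, r=True)


def alternative_fit(scale):
    """Degree-6 polynomial that passes through the same six points as `base`."""
    return base + scale * vanishing


for scale in (0.0, 10.0, -10.0):
    fit = alternative_fit(scale)
    exact = np.allclose(fit(freqs), values)
    first_order = fit.deriv()(0.0)  # first-order Taylor coefficient around 0
    print(f"scale = {scale:+.0f}: fits all points: {exact}, first-order coefficient = {first_order:.3f}")
```

      Every choice of `scale` reproduces the six points exactly while giving different Taylor coefficients, which is what happens whenever there are more coefficients than data points.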

      Besides these technical complications, smith et al. (2010) is also different, in the sense that it is an empirical paper. It does not contain the Generalized Price equation, it contains no insights regarding how to derive population genetic dynamics from the Generalized Price equation, or how to derive the appropriate rules from those, and it has a very different approach to separating fitness effects and population structure.

      To end on a positive note, I would like to quote a bit out of Wu et al. (2013):

      “While we criticise these mathematical issues, we are convinced that smith et al. (2010) aim into the right direction: to incorporate the nonlinearities characteristic of biology into social evolution, we may have to extend and generalize the approach of inclusive fitness. It would be beautiful if such a generalization would ultimately include Hamilton’s original rule as a special case […].”

      I like to think that this is exactly what I have done in this paper.

      References

      Akdeniz, A., & van Veelen, M. (2020). The cancellation effect at the group level. Evolution, 74(7), 1246–1254. doi: 10.1111/evo.13995

      Allen, B., & Tarnita, C. E. (2012). Measures of success in a class of evolutionary models with fixed population size and structure. Journal of Mathematical Biology, 68, 109–143. doi: 10.1007/s00285-012-0622-x

      Archetti, M. (2018). How to Analyze Models of Nonlinear Public Goods. Games, 9(2), 17. doi: 10.3390/g9020017

      Archetti, M., & Scheuring, I. (2011). Coexistence of cooperation and defection in public goods games. Evolution, 65(4), 1140–1148. doi: 10.1111/j.1558-5646.2010.01185.x

      Archetti, M., & Scheuring, I. (2012). Review: Game theory of public goods in one-shot social dilemmas without assortment. Journal of Theoretical Biology, 299, 9–20. doi: 10.1016/j.jtbi.2011.06.018

      Bourke, A. F. G. (2014). Hamilton’s rule and the causes of social evolution. Philosophical Transactions of the Royal Society B: Biological Sciences, 369(1642), 20130362. doi: 10.1098/rstb.2013.0362

      Bourrat, P., Godsoe, W., Pillai, P., Gouhier, T. C., Ulrich, W., Gotelli, N. J., & van Veelen, M. (2023). What is the price of using the Price equation in ecology? Oikos, 2023(8). doi: 10.1111/oik.10024

      Crow, J. F. (2008). Commentary: Haldane and beanbag genetics. International Journal of Epidemiology, 37(3), 442–445. doi: 10.1093/ije/dyn048

      Fisher, R. (1930). The genetical theory of natural selection. Retrieved from https://www.cabidigitallibrary.org/doi/full/10.5555/19601600934

      Fletcher, J. A., & Zwick, M. (2006). Unifying the theories of inclusive fitness and reciprocal altruism. American Naturalist, 168(2), 252–262. doi: 10.1086/506529

      Frank, S. A. (1997). The Price equation, Fisher’s fundamental theorem, kin selection, and causal analysis. Evolution, 51(6), 1712–1729. doi: 10.1111/j.1558-5646.1997.tb05096.x

      Frank, S. A. (1998). Foundations of social evolution. Princeton: Princeton University Press.

      Frank, S. A. (2012). Natural selection. IV. The Price equation. Journal of Evolutionary Biology, 25(6), 1002–1019. doi: 10.1111/j.1420-9101.2012.02498.x

      Gardner, A., West, S. A., & Wild, G. (2011). The genetical theory of kin selection. Journal of Evolutionary Biology, 24(5), 1020–1043. doi: 10.1111/j.1420-9101.2011.02236.x

      Grafen, A. (1985a). A geometric view of relatedness. Oxford Surveys in Evolutionary Biology, 2(2), 28-89.

      Grafen, A. (1985b). News and Views. Evolutionary theory: Hamilton’s rule OK. Nature, 318(6044), 310–311. doi: 10.1038/318310a0

      Hamilton, W. D. (1964). The genetical evolution of social behaviour. I. Journal of Theoretical Biology, 7(1), 1–16. doi: 10.1016/0022-5193(64)90038-4

      Karlin, S., & Matessi, C. (1983). The eleventh R. A. Fisher Memorial Lecture - Kin selection and altruism. Proceedings of the Royal Society of London. Series B. Biological Sciences, 219(1216), 327–353. doi: 10.1098/rspb.1983.0077

      Matessi, C., & Karlin, S. (1984). On the evolution of altruism by kin selection. Proceedings of the National Academy of Sciences, 81(6), 1754–1758. doi: 10.1073/pnas.81.6.1754

      Nowak, M. A., Tarnita, C. E., & Wilson, E. O. (2010). The evolution of eusociality. Nature, 466(7310), 1057–1062. doi: 10.1038/nature09205

      Okasha, S. (2005). Maynard Smith on the levels of selection question. Biology and Philosophy, 20(5), 989–1010. doi: 10.1007/s10539-005-9019-1

      Page, K. M., & Nowak, M. A. (2002). Unifying evolutionary dynamics. Journal of Theoretical Biology, 219(1). doi: 10.1016/S0022-5193(02)93112-7

      Pillai, P., & Gouhier, T. C. (2019). Not even wrong: the spurious measurement of biodiversity’s effects on ecosystem functioning. Ecology, 100(7), e02645. doi: 10.1002/ecy.2645

      Price, G. R. (1970). Selection and Covariance. Nature, 227(5257), 520–521. doi: 10.1038/227520a0

      Price, G. R. (1972). Extension of covariance selection mathematics. Annals of Human Genetics, 35(4), 485-490.

      Queller, D. C. (1985). Kinship, reciprocity and synergism in the evolution of social behaviour. Nature, 318(6044), 366–367. doi: 10.1038/318366a0

      Queller, D. C. (1992a). A general model for kin selection. Evolution, 46(2), 376–380. doi: 10.1111/j.1558-5646.1992.tb02045.x

      Queller, D. C. (1992b). Quantitative Genetics, Inclusive Fitness, and Group Selection. The American Naturalist, 139(3), 540–558. doi: 10.1086/285343

      Queller, D. C. (2011). Expanded social fitness and Hamilton’s rule for kin, kith, and kind. Proceedings of the National Academy of Sciences, 108(supplement_2), 10792–10799. doi: 10.1073/pnas.1100298108

      Rousset, F., & Billiard, S. (2000). A theoretical basis for measures of kin selection in subdivided populations: Finite populations and localized dispersal. Journal of Evolutionary Biology, 13(5). doi: 10.1046/j.1420-9101.2000.00219.x

      Rousset, F. (2015). Regression, least squares, and the general version of inclusive fitness. Evolution, 69(11), 2963–2970. doi: 10.1111/evo.12791

      Smith, J., Van Dyken, J. D., & Zee, P. C. (2010). A generalization of Hamilton’s rule for the evolution of microbial cooperation. Science, 328(5986), 1700–1703. doi: 10.1126/science.1189675

      Sober, E., & Wilson, D. S. (2007). Unto others: The evolution and psychology of unselfish behavior. Retrieved from https://www.hup.harvard.edu/books/9780674930476

      Taylor, P. D. (1992). Altruism in viscous populations - an inclusive fitness model. Evolutionary Ecology, 6(4), 352–356. doi: 10.1007/bf02270971

      Taylor, P. D. (1989). Evolutionary stability in one-parameter models under weak selection. Theoretical Population Biology, 36(2), 125–143. doi: 10.1016/0040-5809(89)90025-7

      Taylor, P. D., Day, T., & Wild, G. (2007). Evolution of cooperation in a finite homogeneous graph. Nature, 447(7143), 469–472. doi: 10.1038/nature05784

      Van Cleve, J. (2015). Social evolution and genetic interactions in the short and long term. Theoretical Population Biology, 103. doi: 10.1016/j.tpb.2015.05.002

      van Veelen, M. (2005). On the use of the Price equation. Journal of Theoretical Biology, 237(4). doi: 10.1016/j.jtbi.2005.04.026

      van Veelen, M. (2007). Hamilton’s missing link. Journal of Theoretical Biology, 246(3). doi: 10.1016/j.jtbi.2007.01.001

      van Veelen, M. (2011). The replicator dynamics with n players and population structure. Journal of Theoretical Biology, 276(1). doi: 10.1016/j.jtbi.2011.01.044

      van Veelen, M. (2018). Can Hamilton’s rule be violated? eLife, 7. doi: 10.7554/eLife.41901

      van Veelen, M. (2020). The problem with the Price equation. Philosophical Transactions of the Royal Society B: Biological Sciences, 375(1797), 20190355. doi: 10.1098/rstb.2019.0355

      van Veelen, M., Allen, B., Hoffman, M., Simon, B., & Veller, C. (2017). Hamilton’s rule. Journal of Theoretical Biology, 414. doi: 10.1016/j.jtbi.2016.08.019

      van Veelen, M., García, J., Sabelis, M. W., & Egas, M. (2012). Group selection and inclusive fitness are not equivalent; the Price equation vs. models and statistics. Journal of Theoretical Biology, 299. doi: 10.1016/j.jtbi.2011.07.025

      Wilson, D. S., Pollock, G. B., & Dugatkin, L. A. (1992). Can altruism evolve in purely viscous populations? Evolutionary Ecology, 6(4), 331–341. doi: 10.1007/bf02270969

      Wu, B., Gokhale, C. S., van Veelen, M., Wang, L., & Traulsen, A. (2013). Interpretations arising from Wrightian and Malthusian fitness under strong frequency dependent selection. Ecology and Evolution, 3(5). doi: 10.1002/ece3.500

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Weakness:

      Although a familiarity preference is not found, it is possible that this is related to the nature of the stimuli and the amount of learning that they offer. While infants here are exposed to the same perceptual stimulus repeatedly, infants can also be familiarised to more complex stimuli or scenarios. Classical statistical learning studies for example expose infants to specific pseudo-words during habituation/familiarisation, and then test their preference for familiar vs novel streams of pseudo-words. The amount of learning progress in these probabilistic learning studies is greater than in perceptual studies, and familiarity preferences may thus be more likely to emerge there. For these reasons, I think it is important to frame this as a model of perceptual habituation. This would also fit well with the neural net that was used, which is processing visual stimuli rather than probabilistic structures. If statements in the discussion are limited to perceptual paradigms, they would make the arguments more compelling. 

      Thank you for your thoughtful feedback. We have now qualified our claims more explicitly throughout the manuscript to clarify the scope of our study. Specifically, we have made the following revisions:

      (1) Title Update: We have modified the title to “A stimulus-computable rational model of visual habituation in infants and adults” to explicitly specify the domain of our model.

      (2) Qualifying Language Throughout Introduction: We have refined our language throughout the introduction to ensure the scope of our claims is clear. Specifically, we have emphasized that our model applies to visual habituation paradigms by incorporating qualifying language where relevant. At the end of Section 1, we have revised the statement to: "Habituation and dishabituation to sequential visual stimuli are well described by a rational analysis of looking time." This clarification makes sure that our model is framed within the context of visual habituation paradigms, particularly those involving structured sequences of stimuli, while acknowledging that habituation extends beyond the specific cases we study.

      (3) New Paragraph on Scope in the Introduction: We have added language in the Introduction acknowledging that while visual habituation is a fundamental mechanism for learning, it is not the only form of habituation. Specifically, we highlight that: “While habituation is a broadly studied phenomenon across cognitive domains—including language acquisition, probabilistic learning, and concept formation—our focus here is on visual habituation, where infants adjust their attention based on repeated exposure to a visual stimulus.”

      (4) New Paragraph on Scope in the General Discussion: We have also revisited this issue in the General Discussion. We added a dedicated paragraph discussing the scope: “This current work focuses on visual habituation, a fundamental but specific form of habituation that applies to sequential visual stimuli. While habituation has been studied across various domains, our model is specifically designed to account for looking time changes in response to repeated visual exposure. This focus aligns with our choice of perceptual representations derived from CNNs, which process visual inputs rather than abstract probabilistic structures. Visual habituation plays a foundational role in infant cognition, as it provides a mechanism for concept learning based on visual experience. However, it does not encompass all forms of habituation, particularly those involving complex rule learning or linguistic structures. Future work should investigate whether models like RANCH can be extended to capture habituation mechanisms in other learning contexts.”

      Reviewer #2 (Public review):

      There are no formal tests of the predictions of RANCH against other leading hypotheses or models of habituation. This makes it difficult to evaluate the degree to which RANCH provides an alternative account that makes distinct predictions from other accounts. I appreciate that because other theoretical descriptions haven't been instantiated in formal models this might be difficult, but some way of formalising them to enable comparison would be useful. 

      We appreciate the reviewer's concern regarding formal comparisons between RANCH and other leading hypotheses of habituation. A key strength of RANCH is that it provides quantitative, stimulus-computable predictions of looking behavior, something that existing theoretical accounts do not offer. Because previous models cannot generate quantitative predictions about looking behavior, we cannot directly compare them with RANCH.

      The one formal model that the reviewer might be referring to is the Goldilocks model, discussed in the introduction and shown in Figure 1. We did in fact spend considerable time in an attempt to implement a version of the Goldilocks model as a stimulus-computable framework for comparison. However, we found that it required too many free parameters, such as the precise shape of the inverted U-shape that the Goldilocks model postulates, making it difficult to generate robust predictions that we would feel confident attributing to this model specifically. This assertion may come as a surprise to a reader who expects that formal models should be able to make predictions across many situations, but prior models 1) cannot be applied to specific stimuli, and 2) do not generate dynamics of looking time within each trial. These are both innovations of our work. Instead, even prior formal proposals derive metrics (e.g., surprisal) that can only be correlated with aggregate looking time. And prior, non-formalized theories, such as the Hunter and Ames model, are simply not explicit enough to implement. 

      To clarify this point, we have now explicitly stated in the Introduction that existing models are not stimulus-computable and do not generate predictions for looking behavior at the level of individual trials: 

      “Crucially, RANCH is the first stimulus-computable model of habituation, allowing us to derive quantitative predictions from raw visual stimuli. Previous theoretical accounts have described broad principles of habituation, but they do not generate testable, trial-by-trial predictions of looking behavior. As a result, direct comparisons between RANCH and these models remain challenging: existing models do not specify how an agent decides when to continue looking or disengage, nor do they provide a mechanistic link between stimulus properties and looking time. By explicitly modeling these decision processes, RANCH moves beyond post-hoc explanations and offers a computational framework that can be empirically validated and generalized to new contexts.” 

      We also highlight that our empirical comparisons in Figure 1 evaluate theoretical predictions based on existing conceptual models using behavioral data, rather than direct model-to-model comparisons: 

      “Addressing these three challenges allowed us to empirically test competing hypotheses about habituation and dishabituation using our experimental data (Figure 1). However, because existing models do not generate quantitative predictions, we could not directly compare RANCH to alternative computational models. Instead, we evaluated whether RANCH accurately captured key behavioral patterns in looking time.”

      The justification for using the RMSEA fitting approach could also be stronger - why is this the best way to compare the predictions of the formal model to the empirical data? Are there others? As always, the main issue with formal models is determining the degree to which they just match surface features of empirical data versus providing mechanistic insights, so some discussion of the level of fit necessary for strong inference would be useful. 

      Thank you for recommending additional clarity on our choice of evaluation metrics. RMSE is a very standard measure (for example, it’s the error metric used in fitting standard linear regression!). On the other hand, it captures absolute rather than relative errors. Correlation-based measures (e.g., r and r²-type measures) instead quantify how closely the predictions track the data in relative terms. In our manuscript we reported both RMSE and R². In the revised manuscript, we have now:

      (1) Added a paragraph in the main text explaining that RMSE captures the absolute error in the same units as looking time, whereas r² reflects the relative proportion of variance explained by the model: 

      “RANCH predictions qualitatively matched habituation and dishabituation in both infants and adults. To quantitatively evaluate these predictions, we fit a linear model (adjusting model‐generated samples by an intercept and scaling factor) and then assessed two complementary metrics. First, the root mean squared error (RMSE) captures the absolute error in the same units as looking time. Second, the coefficient of determination ($R^2$) measures the relative variation in looking time that is explained by the scaled model predictions. Since each metric relies on different assumptions and highlights distinct aspects of predictive accuracy, they together provide a more robust assessment of model performance. We minimized overfitting by employing cross‐validation—using a split‐half design for infant data and ten‐fold for adult data—to compute both RMSE and $R^2$ on held‐out samples.”

      (2) Updated Table 1 to include both RMSE and R² for each model variant and linking hypothesis, so that both metrics are now reported across the two experiments. 

      We hope these revisions address your concerns by offering a more comprehensive and transparent assessment of our model’s predictive accuracy.
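
      A minimal sketch of the evaluation described in the quoted paragraph: model samples are rescaled by an intercept and a slope fitted on the training folds, and RMSE and R² are then computed on the held-out fold. This is not the analysis code used for the manuscript; the function names, the fold assignment, and the seed are hypothetical. Setting k=2 would correspond to a split-half design and k=10 to ten-fold cross-validation.

```python
import math
import random


def fit_linear(x, y):
    """Least-squares intercept and slope mapping model samples x to looking times y."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def rmse(y, yhat):
    """Root mean squared error, in the same units as y."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, yhat)) / len(y))


def r_squared(y, yhat):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    my = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, yhat))
    ss_tot = sum((a - my) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot


def cross_validate(x, y, k, seed=0):
    """Mean held-out RMSE and R^2 over k folds (hypothetical helper)."""
    idx = list(range(len(x)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    scores = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        # Fit the scaling on training data only, then score on held-out data.
        intercept, slope = fit_linear([x[i] for i in train], [y[i] for i in train])
        y_test = [y[i] for i in held_out]
        y_pred = [intercept + slope * x[i] for i in held_out]
        scores.append((rmse(y_test, y_pred), r_squared(y_test, y_pred)))
    mean_rmse = sum(s[0] for s in scores) / k
    mean_r2 = sum(s[1] for s in scores) / k
    return mean_rmse, mean_r2
```

      Because the scaling parameters never see the held-out fold, both numbers can be read as out-of-sample estimates and compared across model variants.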

      Regarding your final question, the desired level of fit for insight, our view is that – at least in theory development – measures of fit should always be compared between alternatives (rather than striving for some absolute level of prediction). We have attempted to do this by comparing fit within- and across-samples and via various ablation studies. We now make this point explicit in the General Discussion:

      More generally, while there is no single threshold for what constitutes a “good” model fit, the strength of our approach lies in the relative comparisons across model variants, linking hypotheses, and ablation studies. In this way, we treat model fit not as an absolute benchmark, but as an empirical tool to adjudicate among alternative explanations and assess the mechanistic plausibility of the model’s components.

      The difference in model predictions for identity vs number relative to the empirical data seems important but isn't given sufficient weight in terms of evaluating whether the model is or is not providing a good explanation of infant behavior. What would falsification look like in this context? 

      We appreciate the reviewer’s observation regarding the discrepancy between model predictions and the empirical data for identity vs. number violations. We were also very interested in this particular deviation and we discuss it in detail in the General Discussion, noting that RANCH is currently a purely perceptual model, whereas infants’ behavior on number violations may reflect additional conceptual factors. Moreover, because this analysis reflects an out-of-sample prediction, we emphasize the overall match between RANCH and the data (see our global fit metrics) rather than focusing on a single data point. Infant looking time data also exhibit considerable noise, so we caution against over-interpreting small discrepancies in any one condition. In principle, a more thorough “falsification” would involve systematically testing whether larger deviations persist across multiple studies or stimulus sets, which is beyond the scope of the current work. 

      For the novel image similarity analysis, it is difficult to determine whether any differences are due to differences in the way the CNN encodes images vs in the habituation model itself - there are perhaps too many free parameters to pinpoint the nature of any disparities. Would there be another way to test the model without the CNN introducing additional unknowns? 

      Thank you for raising this concern. In our framework, the CNN and the habituation model operate jointly to generate predictions, so it can be challenging to parse out whether any mismatches arise specifically from one component or the other. However, we are not worried that the specifics of our CNN procedure introduce free parameters because:

      (1) The CNN introduces no additional free parameters in our analyses, because it is a pre‐trained model not fitted to our data. 

      (2) We tested multiple CNN embeddings and observed similar outcomes, indicating that the details of the CNN are unlikely to be driving performance (Figure 12).

      Moreover, the key contribution of our second study is precisely that the model can generalize to entirely novel stimuli without any parameter adjustments. By combining a stable, off‐the‐shelf CNN with our habituation model, we can make out‐of‐sample predictions—an achievement that, to our knowledge, no previous habituation model has demonstrated.

      Related to that, the model contains lots of parts - the CNN, the EIG approach, and the parameters, all of which may or may not match how the infant's brain operates. EIG is systematically compared to two other algorithms, with KL working similarly - does this then imply we can't tell the difference between an explanation based on those two mechanisms? Are there situations in which they would make distinct predictions where they could be pulled apart? Also in this section, there doesn't appear to be any formal testing of the fits, so it is hard to determine whether this is a meaningful difference. However, other parts of the model don't seem to be systematically varied, so it isn't always clear what the precise question addressed in the manuscript is (e.g. is it about the algorithm controlling learning? or just that this model in general when fitted in a certain way resembles the empirical data?) 

      Thank you for highlighting these points about the model’s components and the comparison of EIG- vs. KL-based mechanisms. Regarding the linking hypotheses (EIG, KL, and surprisal), our primary goal was to assess whether rational exploration via noisy perceptual sampling could account for habituation and dishabituation phenomena in a stimulus-computable fashion. Although RANCH contains multiple elements—including the CNN for perceptual embedding, the learning model, and the action policy (EIG or KL)—we did systematically vary the “linking hypothesis” (i.e., whether sampling is driven by EIG, KL, or surprisal). We found that EIG and KL gave very similar fits, while surprisal systematically underperformed.

      We agree that future experiments could be designed to produce diverging predictions between EIG and KL, but examining these subtle differences is beyond the scope of our current work. Here, we sought to establish that a rational model of habituation, driven by noisy perceptual sampling, can deliver strong quantitative predictions—even for out-of-sample stimuli—rather than to fully disentangle forward- vs. backward-looking information metrics.
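
      To make the three linking hypotheses concrete, the sketch below computes surprisal, the KL divergence between posterior and prior, and expected information gain (EIG) for a deliberately simple learner. It assumes a binary observation and a discrete grid of hypotheses about its probability; this toy learner is an illustrative stand-in, not RANCH's perceptual model, and all function names are hypothetical.

```python
import math


def normalize(weights):
    total = sum(weights)
    return [w / total for w in weights]


def update(belief, thetas, x):
    """Posterior over thetas after observing a binary sample x (0 or 1)."""
    likelihood = [t if x == 1 else 1.0 - t for t in thetas]
    return normalize([b * l for b, l in zip(belief, likelihood)])


def predictive(belief, thetas):
    """Probability that the next sample equals 1 under the current belief."""
    return sum(b * t for b, t in zip(belief, thetas))


def kl(p, q):
    """KL divergence KL(p || q) between two discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def surprisal(belief, thetas, x):
    """Backward-looking: how unexpected was the sample just observed?"""
    p1 = predictive(belief, thetas)
    return -math.log(p1 if x == 1 else 1.0 - p1)


def kl_update(belief, thetas, x):
    """Backward-looking: how far did the sample just observed move the belief?"""
    return kl(update(belief, thetas, x), belief)


def eig(belief, thetas):
    """Forward-looking: expected KL from the next, not yet observed, sample."""
    p1 = predictive(belief, thetas)
    return (p1 * kl_update(belief, thetas, 1)
            + (1.0 - p1) * kl_update(belief, thetas, 0))


# Hypothesis grid that avoids 0 and 1 so that no hypothesis is ever ruled out.
THETAS = [0.05 + 0.1 * i for i in range(10)]
UNIFORM = normalize([1.0] * len(THETAS))
```

      In this toy setting, EIG shrinks as the same observation repeats, and an unexpected observation after a long run yields larger surprisal and a larger KL update than one more repetition. EIG is an expectation over the next sample while KL is evaluated after the sample arrives, which is the forward- versus backward-looking distinction mentioned above.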

      We disagree, however, that we did not evaluate or formally compare other aspects of the model. In Table 1 we report ablation studies of different aspects of the model architecture (e.g., removal of learning and noise components). Further, the RMSE and R² values reported in Table 1 and Section 4.2.3 can be treated as out-of-sample estimates of performance and used for direct comparison (because Table 1 uses cross-validation and Section 4.2.3 reports out of sample predictions). 

      Perhaps the reviewer is interested in statistical hypothesis tests, but we do not believe these are appropriate here. Cross-validation provides a metric of out-of-sample generalization and model selection based on the resulting numerical estimates. Significance testing is not typically recommended, except in a limited subset of cases (see e.g. Vanwinckelen & Blockeel, 2012 and Raschka, 2018).

      Reviewer #1 (Recommendations for the authors):

      "We treat the number of samples for each stimulus as being linearly related to looking time duration." Looking times were not log transformed? 

      Thank you for your question. The assumption of a linear relationship between the model’s predicted number of samples and looking time duration is intended as a measurement transformation, not a strict assumption about the underlying distribution of looking times. This linear mapping is used simply to establish a direct proportionality between model-generated samples and observed looking durations.

      However, in our statistical analyses, we do log-transform the empirical looking times to account for skewness and stabilize variance. This transformation is standard practice when analyzing infant looking time data but is independent of how we map model predictions to observed times. Since there is no a priori reason to assume that the number of model samples must relate to looking time in a strictly log-linear way, we retained a simple linear mapping while still applying a log transformation in our analytic models where appropriate.

      It would be nice to have figures showing the results of the grid search over the parameter values. For example, a heatmap with sigma on x and eta on y, and goodness of fit indicated by colour, would show the quality of the model fit as a function of the parameters' values, but also if the parameters estimates are correlated (they shouldn't be). 

      Thank you for the suggestion. We agree that visualizing the grid search results can provide a clearer picture of how different parameter values affect model fit. In the supplementary materials, we already present analyses where we systematically search over one parameter at a time to find the best-fitting values.

      We also explored alternative visualizations, including heatmaps where sigma and eta are mapped on the x and y axes, with goodness-of-fit indicated by color. However, we found that the goodness of fit was very similar across parameter settings, making the heatmaps difficult to interpret due to minimal variation in color. This lack of variation in fit reflects the observation that our model predictions are robust to changes in parameter settings, which allows us to report strong out of sample predictions in Section 4. Instead, we opted to use histograms to illustrate general trends, which provide a clearer and more interpretable summary of the model fit across different parameter settings. Please see the heatmaps below, if you are interested. 

      Author response image 1.

      Model fit (measured by RMSE) across a grid of prior values for Alpha, Beta, and V shows minimal variation. This indicates that the model’s performance is robust to changes in prior assumptions.
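
      A grid search of this kind, and the flatness of the resulting fit surface, can be sketched as follows. The error function here is a hypothetical stand-in with an intentionally shallow surface; it is not RANCH, and the parameter names sigma and eta are taken from the reviewer's example rather than from the code used for the figure.

```python
import itertools


def grid_search(fit_error, grid):
    """Evaluate fit_error at every combination of the parameter values in grid.

    grid maps parameter names to lists of candidate values. Returns the sorted
    parameter names and a dict from value tuples (in that order) to the error.
    """
    names = sorted(grid)
    results = {}
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        results[values] = fit_error(**params)
    return names, results


def fit_spread(results):
    """Range of the error over the grid; a small value means a flat surface."""
    errors = list(results.values())
    return max(errors) - min(errors)


def toy_fit_error(sigma, eta):
    """Hypothetical stand-in for 'run the model and return RMSE'."""
    return 1.0 + 0.01 * (sigma - 1.0) ** 2 + 0.01 * (eta - 0.5) ** 2
```

      Reporting the spread alongside the best value gives a compact summary when a heatmap shows too little color variation to be informative.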

      Regarding section 5.4, paragraph 2: It might be interesting to notice that a potential way to decorrelate these factors is to look at finer timescales (see Poli et al., 2024, Trends in Cognitive Sciences), which the current combination of neural nets and Bayesian inference could potentially be adapted to do. 

      Thank you for this insightful suggestion. We agree that examining finer timescales of looking behavior could provide valuable insights into the dynamics of attention and learning. In response, we have incorporated language in Section 5.4 to highlight this as a potential future direction: 

      Another promising direction is to explore RANCH’s applicability to finer timescales of looking behavior, enabling a more detailed examination of within-trial fluctuations in attention. Recent work suggests that analyzing moment-by-moment dynamics can help disentangle distinct learning mechanisms \autocite{poli2024individual}. Since RANCH models decision-making at the level of individual perceptual samples, it is well-suited to capture these fine-grained attentional shifts.

      Previous work integrating neural networks with Bayesian (like) models could be better acknowledged: Blakeman, S., & Mareschal, D. (2022). Selective particle attention: Rapidly and flexibly selecting features for deep reinforcement learning. Neural Networks, 150, 408-421. 

      Thank you for this feedback. We have now incorporated this citation into our discussion section: 

      RANCH integrates structured perceptual representations with Bayesian inference, allowing for stimulus-computable predictions of looking behavior and interpretable parameters at the same time. This integrated approach has been used to study selective attention \autocite{blakeman2022selective}.

      Unless I missed it, I could not find an OSF repository (although the authors refer to an OSF repository for a previous study that has not been included). In general, sharing the code would greatly help with reproducibility. 

      Thanks for this comment. We apologize that, although all of our code and data were available through GitHub, we did not provide links in the manuscript. We have now added these links at the end of the Introduction. 

      Reviewer #2 (Recommendations for the authors):

      Page 7 "infants clearly dishabituated on trials with longer exposures" - what are these stats comparing? Novel presentation to last familiar? 

      Thank you for pointing out this slightly confusing passage. The statistics reported compare looking time between the novel and familiar test trials after longer exposures. We have now added the following language: 

      Infants clearly dishabituated on trials with longer exposures, looking longer at the novel stimulus than the familiar stimulus after long exposure.

      Order effects were covaried in the model - does the RANCH model predict similar order effects to those observed in the empirical data, ie can it model more generic changes in attention as well as the stimulus-specific ones? 

      Thank you for this question. If we understand correctly, you are asking whether RANCH can capture order effects over the course of the experiment, such as general decreases in attention across blocks. Currently, RANCH does not model these block-level effects—it is designed to predict stimulus-driven looking behavior rather than more general attentional changes that occur over time such as fatigue. In our empirical analysis, block number was included as a covariate to account for these effects statistically, but RANCH itself does not have a mechanism to model block-to-block attentional drift independent of stimulus properties. This is an interesting direction for future work, where a model could integrate global attentional dynamics alongside stimulus-specific learning. To address this, we have added a sentence in the General Discussion saying:

      Similarly, RANCH does not capture more global attention dynamics, such as block-to-block attentional drift independent of stimulus properties.

      "We then computed the root mean squared error (RMSE) between the scaled model results and the looking time data." Why is this the most appropriate approach to considering model fit? Would be useful to have a brief explanation. 

      Thank you for pointing this out. We believe that we have now addressed this issue in Response to Comment #2 from Reviewer 1. 

      The title of subsection 3.3 made me think that you would be comparing RANCH to alternate hypotheses or models but this seems to be a comparison of ways of fitting parameters within RANCH - I think worth explaining that. 

      We have now added a sentence in the subsection to make the content of the comparison more explicit: 

      Here we evaluated different ways of specifying RANCH's decision-making mechanism (i.e., different "linking hypotheses" within RANCH).

      3.5 would be useful to have some statistics here - does performance significantly improve? 

      As discussed above, we systematically compared model variants using cross-validated RMSE and R² values, which provide quantitative evidence of improved performance. While these differences are substantial, we do not report statistical hypothesis tests, as significance testing is not typically appropriate for model comparison based on cross-validation (see Vanwinckelen & Blockeel, 2012; Raschka, 2018). Instead, we rely on out-of-sample predictive performance as a principled basis for evaluating model variants.

      It would be very helpful to have a formal comparison of RANCH and other models - this seems to be largely descriptive at the moment (3.6).

      We believe that we have now addressed this issue in our response to the first comment.

      Does individual infant data show any nonlinearities? Sometimes the position of the peak look is very heterogenous and so overall there appears to be no increase but on an individual level there is. 

      Thank you for your question. Given our experimental design, each exposure duration appears in separate blocks rather than in a continuous sequence for each infant. Because of this, the concept of an individual-level nonlinear trajectory over exposure durations does not directly apply. Instead, each infant contributes looking time data to multiple distinct conditions, rather than following a single increasing-exposure sequence. Any observed nonlinear trend across exposure durations would therefore be a group-level effect rather than a within-subject pattern.

      In 4.1, why 8 or 9 exposures rather than a fixed number? 

      We used slightly variable exposure durations to reduce the risk that infants develop fixed expectations about when a novel stimulus will appear. We have now clarified this point in the text.

      Why do results differ for the model vs empirical data for identity? Is this to do with semantic processing in infants that isn't embedded in the model? 

      Thank you for your comment. The discrepancy between the model and empirical data for identity violations is related to the discrepancy we discussed for number violations in the General Discussion. As noted there, RANCH relies on perceptual similarity derived from CNN embeddings, which may not fully capture distinctions that infants make.

      The model suggests the learner’s prior on noise is higher in infants than adults, so produces potentially mechanistic insights. 

      We agree! One of the key strengths of RANCH is its ability to provide mechanistic insights through interpretable parameters. The finding that infants have a higher prior on perceptual noise than adults aligns with previous research suggesting that early visual processing in infants is more variable and less precise.

    1. Author response:

      The following is the authors’ response to the original reviews.

      In this letter, we respond to each of the reviewers’ comments. We support responses by referring to the revised manuscript and, where necessary, by including additional descriptions and analyses that we consider extrinsic to the manuscript itself. In this letter, all changes to the manuscript are shown in blue. As noted, the displayed figures have been added to the manuscript or the SI. We believe that we have successfully addressed all comments and that the quality of our paper has improved significantly.

      Comment 1: In addition to the technical comments by the reviewers, I would encourage the authors to discuss the dependency of their observations, e.g. emergence of microphase separation, not only on the sequence of the polypeptides, but also on the solution conditions. Similarly, the distributions of ions in the condensate bulk, interphase, and diluted phase, and hence the interfacial free energy, are significantly affected both by the chemical composition of the condensate and the salt concentration itself, see: https://pubs.acs.org/doi/10.1021/acs.nanolett.1c03138

      We thank the editor for this suggestion. Here, we have focused on the effect of sequence on condensate organization. We agree that how changes in solution conditions affect condensates, including the microphase separation of ELPs, is potentially interesting as well. We note this as a possible future direction at multiple places in the revised Conclusions and Discussion:

      “The simulations successfully reproduced condensate stability variation upon amino acid substitution. While our study is performed at set salt concentration and temperature to isolate the contributions of amino acid hydrophobicity to condensate organization, future studies may consider implementing temperature [cite] or salt [cite] dependent models to explore how solution conditions affect the organization of ELP condensates.”

      “Such a microenvironment arises from the collective behavior of many proteins, can deviate from that of individual chains, and is likely sensitive to the solution conditions,[cite] which are held constant in our study. Future work on systems with double amino acid substitutions or changes to salt concentration or temperature could elucidate the generality of the mean field interpretation and the additivity of individual contributions.”

      Response to referee 1

      Comment 0: This is an interesting, informative, and well-designed study that combines theoretical and experimental methodologies to tackle the phenomenon of higher-resolution structures/substructures in model biomolecular condensates. The results should be published. However, there is significant room for improvement in the presentation and interpretation of the results. As it stands, the precise definition of “frustration,” which is a main theme of this manuscript (as emphasized in the title), is not sufficiently well articulated. This situation should be rectified to avoid “frustration” becoming a “catch-all” term without a clear perimeter of applicability rather than a precise, informative description of the physical state of affairs. There are also a few other concerns, e.g., regarding interpretation of correlation of phase-separation critical temperature and transfer free energy of amino acid residues as well as the difference between critical temperature and onset temperature, and the way the simulated configurations are similar to that of gyroids.

      We want to thank the reviewers for their insightful comments. We revised the manuscript extensively to improve its clarity and to address the reviewers’ concerns. In the following, we provide point-by-point responses to all the comments.

      Comment 1: It is accurately pointed out on p.4 that elastin-like polypeptides (ELPs) undergo heat-induced phase separation and therefore exhibit lower critical solution temperatures (LCSTs). But it is not entirely clear how this feature is reproduced by the authors’ simulation. A relationship between simulated surface tension and “transition temperature” is provided in Fig.1C; but is the “transition temperature” (authors cited ref.41 by Urry) the same as critical temperature? Apparently, Urry’s Tt is the “critical onset temperature”, the temperature at which phase separation happens at a given polymer concentration. This is different from the (global) critical temperature LCST, though the two may be correlated, or not, depending on the shape of the phase boundary. Moreover, is the MOFF coarse-grained forcefield (first step in the multi-scale simulation), by itself, capable of reproducing heat-induced phase separation in a way similar to the forcefield of Dignon et al., ACS Cent Sci 5, 821-830 (2019)? Or is this temperature-dependent effect appearing only subsequently, after the implementation of the MARTINI and/or all-atom steps? Clarification is needed. To afford a more informative context for the authors’ introductory discussion, the aforementioned Dignon et al. work and the review by Cinar et al. [Chem Eur J 25, 13049-13069 (2019)], both touching upon the physical underpinning of the LCST feature of elastin, should also be cited along with refs.41-43.

      We thank the reviewer for their comment. First, we apologize for the lack of clarity between the global lower critical solution temperature, Tc, and the transition temperature, Tt. We have modified the manuscript to be more explicit that the transition temperature we utilize is dependent on the solution conditions, instead of the global lower critical solution temperature.

      Author response image 1.

      Tt as a function of concentration for ELP[V5A2G3] constructs of different chain lengths. Logarithmic fits to the data for each construct using Eq. 1 are also shown. It is evident that the different curves converge to the critical temperature Tc at the critical concentration Cc. Figure reproduced from ref.[2] CC BY 4.0.

      However, as shown by Chilkoti and coworkers [1, 2] and in Author response image 1, the critical temperature of ELPs Tc is indeed linearly related to Tt with the following relationship

      $$T_t = T_c + \frac{k}{\mathrm{length}} \ln\left(\frac{C_c}{\mathrm{conc}}\right) \qquad (1)$$

      The above equation highlights the dependence of Tt on the chain length (length) and polymer concentration (conc). The parameter Cc is the corresponding theoretical polypeptide concentration that would be required to achieve Tc, and k is the proportionality constant. Instead of making computationally expensive predictions of condensate critical temperatures, we focused on the surface tension, which can be more readily determined from single constant-temperature simulations, as detailed in the Methods section. This choice made it computationally feasible to systematically probe the properties of all 20 amino acids in diblock ELPs in our multiscale model. Furthermore, an expected relationship between the critical temperature and the surface tension can be inferred from Flory-Huggins theory. In particular, relationships between the Flory-Huggins parameter, χ, and interfacial tension (τ) have been investigated, and the relationship can be approximated as

      where α is a positive constant, whose exact value depends on the proximity of χ to the critical value of χ necessary for phase separation (χC).[3, 4] As detailed in the new Supplemental Theory section of the Supporting Information, for systems undergoing LCST,

      with Therefore, we have

      Several conclusions can be drawn from Eq. 4. First, for α = 1, τ is linearly proportional to Tc. Secondly, τ decreases at larger values of Tc, a trend that is consistent with the results presented in Figure 1 of the main text. Finally, as detailed in the Supplemental Theory, the inverse relationship between τ and Tc is only expected for systems exhibiting LCSTs. For systems with UCST, τ increases at larger Tc. Therefore, reproducing the correct trend supports the model’s ability to capture the temperature-dependent effect specific to the ELP system.

      We modified the text to define the physical meaning of Tt more explicitly. Furthermore, we added a new section in the Supporting Information titled Supplemental Theory to detail the relationship between Tt, Tc, the Flory-Huggins parameter χ, and the surface tension τ. The updated text now reads:

      “Utilizing the simulated condensate conformations, we computed various quantities to benchmark against experimental measurements. While the critical temperature has been widely used as a measure for condensate stability, determining it computationally is expensive. As an alternative, we computed the surface tension, τ, using 100-µs-long MARTINI simulations performed with the NPNAT ensemble.[cite] As detailed in the Supplemental Theory in the Supporting Information, an inverse relationship is expected between τ and the critical temperature, Tc, for systems exhibiting LCSTs. We further approximate Tc with the transition temperatures (Tt) of ELP sequences,[cite] which are the temperatures at which ELPs undergo an LCST transition at a specified solution condition. Tt was shown to be linearly proportional to Tc.[cite] As expected, a negative correlation can be readily seen between computed surface tension and experimental Tt (Fig. 1C). This observed negative correlation between Tt and τ supports the simulation approach’s accuracy in reproducing the sequence-dependent changes in ELP phase behavior.”
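
      For orientation, surface tension in slab simulations is commonly estimated from the anisotropy of the time-averaged pressure tensor. The sketch below assumes that standard estimator and a slab with two interfaces normal to z; the procedure actually used is defined in the Methods section, which is not reproduced here, and the function name and unit choices are illustrative.

```python
def surface_tension_mN_per_m(lz_nm, pxx_bar, pyy_bar, pzz_bar, n_interfaces=2):
    """Surface tension from time-averaged diagonal pressure components.

    lz_nm        : box length along the interface normal (nm)
    pxx_bar etc. : time-averaged pressure tensor components (bar)
    n_interfaces : number of interfaces in the box (2 for a slab geometry)
    """
    p_normal = pzz_bar
    p_tangential = 0.5 * (pxx_bar + pyy_bar)
    gamma_bar_nm = lz_nm * (p_normal - p_tangential) / n_interfaces
    # Unit conversion: 1 bar * nm = 1e5 Pa * 1e-9 m = 1e-4 N/m = 0.1 mN/m.
    return 0.1 * gamma_bar_nm
```

      A single constant-temperature trajectory suffices for this estimate, which is why it is cheaper than locating a critical temperature.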

      The reviewer is correct that MOFF does not explicitly account for temperature-dependent effects in its interaction parameters. But as mentioned above and indicated by the reviewer, the following steps with explicit solvent simulations in the multiscale strategy succeed in capturing sequence-dependent differences in ELP systems, which are evident in both transition temperature and surface tension.

      We cited the two references suggested by the reviewer in the introduction. We further added the following text in the discussion section to suggest explicitly exploring temperature-dependent effects as an interesting future direction.

      “While our study is performed at set salt concentration and temperature to isolate the contributions of amino acid hydrophobicity to condensate organization, future studies may consider implementing temperature[cite] or salt[cite] dependent models to explore how solution conditions affect the organization of ELP condensates.”

      Comment 2: “Frustration” and “frustrated” are used prominently in the manuscript to characterize certain observed molecular configurations (11 times total, in both the title and in the abstract). Apparently, it is the most significant conceptual pronouncement of this work, hence its precise meaning is of central importance to the authors’ thesis. Whereas one should recognize that the theoretical and experimental observations are striking without invocation of the “frustration” terminology, usage of the term can be useful if it offers a unifying conceptual framework. However, as it stands, a clear definition of the term “frustration” is lacking, leaving readers to wonder what molecular configurations are considered “frustrated” and what are not (i.e., is the claim of observation of frustration falsifiable?). For instance, “frustrated microphase separation” appears in both the title and abstract. A logical question one may ask is: “Are all microphase separations frustrated?” If the answer is in the affirmative, does invocation of the term “frustration” add anything to our physical insight? If the answer is not in the affirmative, then how does one distinguish microphase separations that are frustrated from those that are not frustrated? Presumably all simulated and experimental molecular configurations in the present study are those of lowest free energy for the given temperature. In other words, they are what they are. In the discussion about frustrated phase separation on p.13, for example, the authors appear to refer to the fact that chain connectivity is preventing hydrophobic residues from coming together in a way to achieve the most favorable interactions as if there were no chain connectivity (one may imagine in that case all the hydrophobic residues will form a large cluster without microphase separation). Is this what the authors mean by “frustration”? If that’s true, isn’t that merely stating the obvious, at least for the observed microphase separation? In general, does “frustration” always mean deviation of actual, physical molecular configurations from certain imagined/hypothetical/reference molecular configurations, and therefore dependent upon the choice of the imagined reference configuration? If this is how the authors apply the term “frustration” in the present work, what is the zero-frustration reference state/configuration for microphase separation? And, similarly, what is the zero-frustration reference state/configuration when frustrated ELP-water interactions are discussed (p.14-p.15, Fig.5)? What do non-frustrated water-protein interactions look like? Is the classic clathrate-like organization of water hydrogen bonds around small nonpolar solute “frustrated”?

      We thank the reviewer for their insightful comment, and agree that the concept of “frustration” is important to our conclusions and, upon review, was too vague in our previous draft of the manuscript.

      For conceptual simplicity and to maximize transferability to real biological systems, we will focus our discussion of frustration on one specific type, which we term “chain frustration.” Chain frustration occurs in states where tertiary interactions between chemically distinct polymer blocks favor phase separation, while chain connectivity prevents macroscopic phase separation from occurring.[5] This frustration leads to microphase separation with microdomains of different monomers.

      We agree with the reviewer that “all microphase separations” are frustrated, and have revised the title to

      “Microphase Separation Produces Interfacial Environment within Diblock Biomolecular Condensates”

      Furthermore, we also removed frustration from the abstract to read

      “The interspersion of hydrophilic and hydrophobic residues and a lack of secondary structure formation result in an interfacial environment, which explains both the strong correlation between ELP condensate stability and interfacial hydrophobicity scales, as well as the prevalence of protein-water hydrogen bonds.”

      We have limited our discussion of frustration to the incomplete separation of hydrophobic and hydrophilic groups. As pointed out by the reviewer, in this case, frustration refers to the fact that chain connectivity is preventing hydrophobic residues from coming together in a way to achieve the most favorable interactions as if there were no chain connectivity. The reference would be a perfect macroscopic phase separation that partitions hydrophobic from hydrophilic groups.

      While the frustration from chain connectivity is well understood for block copolymers[5], its effect on producing the interfacial solvation environment, to the best of our knowledge, has not been emphasized before. We have revised the text at the point where we mention frustration to clearly define its meaning.

      “Therefore, while microphase separation occurs in ELP condensates, frustration remains in the system. Hydrophilic residues cannot completely separate from hydrophobic ones due to constraints imposed by the amino acid sequence, creating unique microenvironments.”

      When discussing the interactions between ELP and water, we used the hydrogen bond analysis to emphasize the interfacial environment. For example, the hydrophobic residues tend to “repel” water molecules, reducing the hydrogen bond density; on the other hand, hydrophilic residues and backbone retain water molecules. This difference resulted in the positive and negative correlation with Tt shown in Fig 5C. The behavior of water molecules is, therefore, inhomogeneous inside the condensate. We expect water molecules to become frustrated due to the simultaneous contact with both hydrophobic and hydrophilic chemical groups, and a perfect reference state would be the pure water environment. However, since this point is not central to our study, to avoid confusion, we have avoided mentioning frustration and revised the text to read

      “The water hydrogen bond density also highlights an interfacial environment of blended hydrophobic and hydrophilic regions.”

      After revising the text, frustration only appears three times in the manuscript.

      Comment 3: In the discussion about the correlation of various transfer free energy scales for amino acids and Urry’s critical onset temperature (ref.41) on p.11 and Fig.4, is there any theoretical relationship to be expected between the interactions among amino acids of ELPs and their critical onset temperatures? While a certain correlation may be intuitively expected if the free energy scale ”is working”, is there any theoretical insight into the mathematical form of this relationship? A clarifying discussion is needed because it bears logically on whether the observed correlation or lack thereof for different transfer energy scales is a good indication of the adequacy of the energy scales in describing the actual physical interactions at play. This question requires some prior knowledge of the expected mathematical relationship between interaction parameters and onset temperature.

      We thank the reviewer for their comment. The exact relationship between the interactions between amino acids and their transition temperature can be understood in terms of the Flory-Huggins theory, which describes the thermodynamics of polymer mixtures using a lattice model. The chemical composition of the mixture is built into the polymer-solvent interaction parameter

      where z is the coordination number, T is the temperature, kB is the Boltzmann constant, and {ϵpp, ϵss, ϵps} are the strengths of polymer-polymer, solvent-solvent, and polymer-solvent interactions, respectively.[6]

      From the original derivation of Flory-Huggins theory, it can be shown that phase separation occurs when χ is greater than its critical value, χC. From this condition, we can derive the critical temperature as

      Δϵ can indeed be interpreted as the free energy cost of transferring a polymer bead from a solution phase to a polymer phase. It corresponds to the change of energy from a mixed state, with contacts between polymer and solvent (ϵps), to the demixed state with only polymer-polymer (ϵpp) and solvent-solvent (ϵss) contacts.

      Therefore, the transfer free energy, and the interactions among amino acids of ELPs, are expected to correlate with the critical temperature. The above discussion has been incorporated into the new section Supplemental Theory in the Supporting Information. There, we also discuss the more general scenario where Δϵ is temperature dependent, which is essential for giving rise to LCST.
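      For reference, the relationships described above can be written out in the standard textbook Flory-Huggins form. This is a sketch under stated assumptions rather than a reproduction of the expressions in the Supplemental Theory: the symbol z for the coordination number, the sign convention chosen for Δϵ, and the expression for χC (a polymer of N segments in a monomeric solvent) are the usual textbook choices and may differ from the authors' notation.

```latex
% Assumed textbook convention: \Delta\epsilon > 0 disfavors polymer-solvent contacts.
\begin{align}
\chi &= \frac{z\,\Delta\epsilon}{k_B T},
&
\Delta\epsilon &= \epsilon_{ps} - \tfrac{1}{2}\left(\epsilon_{pp} + \epsilon_{ss}\right), \\
\chi_C &= \tfrac{1}{2}\left(1 + \frac{1}{\sqrt{N}}\right)^{2},
&
T_C &= \frac{z\,\Delta\epsilon}{k_B\,\chi_C}.
\end{align}
```

      In this form, a temperature-independent Δϵ > 0 gives χ > χC only for T < TC, that is, upper-critical behavior, which is consistent with the statement above that a temperature-dependent Δϵ is needed to obtain an LCST.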

      We have modified the main text in the discussions of Figure 4 to better explain these mathematical relationships and their necessary assumptions in order to help interpret our simulations. Here is an excerpt from where we discuss Figure 4:

      “The strong dependence of molecular organization on amino acid hydrophobicity suggests that the solvation environment of individual residues might be a determining factor for condensate stability. Indeed, as shown in the Supplemental Theory of the Supporting Information, the critical temperature is closely related to the free energy cost of transferring polymer beads from a solution state to a polymer-only environment. This transfer free energy is often used to quantify the hydrophobicity of amino acids [cite]. To explore their relationship more quantitatively, we compared the transition temperature for ELP condensates measured by Urry [cite] to several hydrophobicity scales.”

      Comment 4: To provide a more comprehensive context for the present study, it is useful to compare the microphase separation seen in the authors’ simulation with the micelle-like structures observed in recent simulated condensed/aggregated states of hydrophobic-polar (HP) model sequences in Statt et al., J Chem Phys 152, 075101 (2020) [see esp. Fig.6] and Wessén et al., J Phys Chem B 126, 9222-9245 (2022) [see, e.g., Fig.10].

      We thank the reviewer for this suggestion. The results of Statt et al. and Wessén et al. indeed provide a nice comparison to our results. While we capture some of the same behavior they observe, the full array of chemical space in our model seems to give some additional morphologies as well.

      First, as predicted by self-consistent field theory, block copolymers are expected to form primarily lamellar-like micelles that clearly separate the dense and dilute phases when the volume fraction, f, is 0.5 (Response to Comment 5). This prediction is indeed consistent with results from simulations with the HP model, and is consistent with our simulations when the substituted amino acid, X, is sufficiently polar.

      However, this observation is only one of several behaviors we observe. In particular, our simulations also produce gyroid-like structures, which are predicted to emerge at small volume differences, i.e. f ≈ 0.4 or f ≈ 0.6. These different configurations likely emerge due to the more realistic representation of amino acids in our model, which presents more frustration than the HP model. In particular, the backbone atoms are inherently hydrophilic and cannot separate from the hydrophobic side chains. Therefore, under microphase separation, it is inherently difficult to separate the different chemical groups to form lamellar or micelle-like structures. This produces a condensate interior with interfacial properties that may not be captured by the HP model.

      We make note of the micelle-like topologies predicted by HP models in the revised text, citing both Statt et al. and Wessén et al.:

      “Surprisingly, microphase separation did not produce lamellar morphology as expected for block copolymers with equal volume fraction of the two blocks (Fig. S3 in the Supporting Information) [cite]. In particular, the condensates appear to form gyroid-like structures (Fig. S4 in the Supporting Information), in which the V and X blocks form two interpenetrating networks. This morphology also differs from micelle-like structures seen in simplified hydrophobic-polar (HP) polymers [cite]. It promotes interfacial contacts while maintaining substantial self-interactions as well. Weak interfacial tension between different ELP blocks has also been noted by Hassouneh et al.[cite]”

      Comment 5: ”Gyroid-like morphology” is mentioned several times in the manuscript (p.4, p.8, p.17, Fig.S3). This is apparently an interesting observation, but a clear explanation is lacking. A more detailed and specific discussion, perhaps with additional graphical presentations, should be provided to demonstrate why the simulated condensed-phase ELP configurations are similar to the classical description of gyroid as in, e.g., Terrones & Mackay, Chem Phys Lett 207, 45-50 (1993) and Lambert et al., Phil Trans R Soc A 354, 2009-2023 (1996).

      We thank the reviewer for their comment. Gyroids are canonical structures for diblock copolymers.[5, 7, 8, 9] Their stability is predicted using self-consistent field theory (SCFT) and is determined by the balance among the volume fraction of polymer block A (fA), the length of the polymer (N), and the Flory-Huggins interaction parameter (χ).[8, 9] SCFT predicts that gyroids occur at smaller values of χN and at values of fA near, but not equal to, 0.5 (Author response image 2).[10] We hypothesize that these configurations emerge at equal molar fractions of V and X amino acids due to small differences in solvation volume between the two halves of the polymer chain.

      Our support for gyroid-like structures is mainly from observations of two interpenetrating networks formed by the two ELP blocks. We have revised Figure S4 to clearly highlight the two networks as shown in Author response image 3.

      We have revised the main text to clearly define the gyroid-like structures as interpenetrating networks, and added the theoretical phase diagram of diblock copolymers predicted by SCFT as Figure S3 in the Supporting Information.

      “In particular, the condensates appear to form gyroid-like structures (Fig. S4 in the Supporting Information), in which the V and X blocks form two interpenetrating networks. This morphology also differs from micelle-like structures seen in simplified hydrophobic-polar (HP) polymers [cite]. It promotes interfacial contacts while maintaining substantial self-interactions as well. Weak interfacial tension between different ELP blocks has also been noted by Hassouneh et al.[cite]”

      We note, however, that proving that our observations are indeed gyroid structures requires more sophisticated mathematical analysis that is beyond the scope of the study. It is also possible that these structures are metastable in our simulations. We emphasize these caveats in the updated Discussion Section.

      “Further studies on the thermodynamic stability of these morphologies and comparing them with predictions from the self-consistent field theory shall provide more insights into the driving forces for their emergence [cite].”

      Author response image 2.

      Theoretical phase diagram[8] and corresponding morphologies for diblock copolymers. The phases are labeled as: body centered cubic (BCC), hexagonal cylinders (HEX), gyroid (GYR), and lamellar (LAM). fA is the volume fraction of a single polymer block, denoted A, χ is the Flory-Huggins interaction parameter, and N is the total degree of polymerisation. Figure reproduced from ref.[10] CC BY 4.0.

      Author response image 3.

      Representative configurations of (A) V5F5 and (B) V5L5 condensates from MARTINI simulations. The valine-substituted half of the chain is colored blue (V5) and the X-substituted half of the chain is colored red (X5). To highlight the interpenetrating networks formed by the two halves, only the X-substituted half of the chain is shown on the left. Simulation interfaces are repeated once periodically in the positive x and positive y dimensions for clarity. High-density regions formed by multiple X-substituted halves of the chains are highlighted with yellow circles, with one of the chains shown in green.

      Response to referee 2

      Comment 1: The experimental characterization relies on BODIPY and SBD reporting, respectively, on viscosity and polarity. The fluorescent signal of these dyes can possibly depend on many other factors, including quenching. Additional controls are required, or a more extensive discussion with additional references, and a mention to potential limitations of this approach.

      We agree with the reviewer that the fluorescence lifetime signal can be affected by many factors. Compared with fluorescence intensity, the fluorescence lifetime depends mainly on the intrinsic properties of the dye and on environmental factors. BODIPY and SBD have been used in biological systems to detect the microviscosity and micropolarity of condensates. Our group has used the same SBD and BODIPY fluorophores in previous work to quantify the microenvironment of protein aggregates and condensates. The extended data in those studies (ChemBioChem 20:1078–1087. doi: 10.1002/cbic.201800782; Aggregate 4:e301. doi:10.1002/agt2.301; Nat Chem Biol 1–9. doi:10.1038/s41589-023-01477-1) provide evidence that BODIPY is sensitive only to viscosity and SBD only to polarity, and that both are insensitive to other environmental factors. As for quenching, fluorophores with extended pi-rich structures display an aggregation-caused quenching (ACQ) effect at high probe concentration, which lowers the fluorescence lifetime and intensity. We typically labeled the ELPs with NHS-ester fluorophores at a 20% molar ratio to prepare stock solutions. Because labeling is incomplete, the actual labeling ratio is much lower than 20%. The labeled ELP stock solution was then mixed with unlabeled ELP to obtain ELP solutions with low labeling fractions. We measured ELPs labeled with different fractions of dye. The results show that only BODIPY exhibits a slight ACQ effect at a high

      Author response image 4.

      FLIM images of ELP condensates labeled with different fractions of dyes. A) FLIM images of V30A30 condensates with 5%, 2.5%, and 1% BODIPY labels. B) FLIM images of V30A30 condensates with 5%, 2.5%, and 1% SBD labels. Droplets were formed with a final concentration of 70 µM ELP labeled with different fractions of BODIPY or SBD in 2 M NaCl solution. Scale bar: 5 µm.

      To minimize the potential ACQ effect while retaining sufficient fluorescence signal, we ultimately used ELPs labeled with lower fractions of dye, 1% BODIPY and 2.5% SBD, to perform the FLIM experiments. The data in Figure 3 will be corrected with the following data.
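      The dilution step described above can be illustrated with a short calculation. This is a minimal sketch, assuming that the labeled stock and the unlabeled ELP solution have the same ELP concentration and using the nominal 20% labeling ratio; because the actual labeling ratio is stated to be lower than 20%, the resulting numbers are nominal only. The function name `stock_volume_fraction` is hypothetical.

```python
def stock_volume_fraction(target_fraction, stock_fraction=0.20):
    """Fraction of the final volume taken from the labeled stock.

    Assumes the labeled stock and the unlabeled solution have the same
    ELP concentration, so the labeling fraction scales linearly with the
    mixing ratio. stock_fraction is the nominal labeling ratio.
    """
    if not 0 < target_fraction <= stock_fraction <= 1:
        raise ValueError("require 0 < target <= stock <= 1")
    return target_fraction / stock_fraction


# Nominal mixing ratios for the labeling fractions used in the FLIM experiments.
for probe, target in (("BODIPY", 0.01), ("SBD", 0.025)):
    fraction = stock_volume_fraction(target)
    print(f"{probe}: {fraction:.3f} of the final volume from the labeled stock")
```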

      Author response image 5.

      Structures of NHS-BODIPY and NHS-SBD, and representative FLIM images of V30A30, A30V30, V30G30 and G30V30 labeled with respective fluorophores. The fluorescence lifetime of each image is the average acquired from three independent experiments. Scale bar: 5 µm.

      We revised the text in the section Microphase separation of ELP condensates as follows “To experimentally test the microphase separation behavior uncovered in simulations, we studied the micro-physicochemical properties of the V-end and X-end of the peptides. We constructed diblock peptides with the combination of 30 pentameric repeats of V block and X (A or G) block, namely V30A30 and V30G30 (Experimental Sequences Section in the Supporting Information). The amino-termini of V30A30 and V30G30 sequences were subsequently labeled with environmentally sensitive BODIPY or SBD fluorophores [cite], whose lifetime could be measured to quantify the viscosity or polarity of the V-end (Fig. 3A, left panel) [cite]. These probes have been reported to be only sensitive to single physicochemical properties.[cite] To avoid artifacts induced by fluorophore labeling, we usually used ELPs labeled with a low fraction of dyes. We also constructed A30V30 and G30V30 diblock peptides, wherein the viscosity or polarity of the A-end or the G-end could be measured by fluorophores that are attached at the amino-terminus (Fig. 3A, right panel). Using FLIM, we found that the lifetime of BODIPY for the V-end (5.43 ns) was longer than that for the A-end (4.35 ns), suggesting that the V-end indeed has a higher microviscosity than the A-end (ηV= 2233.54 cp vs ηA= 969.57 cp). Accordingly, the lifetime of SBD was longer for the V-end (8.75 ns) than the A-end (7.00 ns), indicating that the micropolarity of the V-end was lower than the A-end (ϵV= 13.25 vs ϵA = 18.97). These observations could be largely attributed to the greater extent of dehydration at the V-end due to its higher local peptide density. We further showed that the observed differences are not results of possible artifacts arising from any subtle distinctions between the two sequences V30A30 and A30V30 (Experimental Characterization of ELP Condensates Section in the Supporting Information, Fig. 
S8-S9 in the Supporting Information). Similar results were observed using the V-G sequences. FLIM experiments revealed that the V-end was more viscous than the G-end (ηV= 2972.72 cp vs ηG= 1958.60 cp) and the V-end was less polar than the G-end (ϵV= 9.14 vs ϵG = 27.50). These experimental observations provided the first line of evidence to support the microphase separation, as suggested by the simulation results.”

      We revised the text in the section Experimental methods as follows

      “The proteins of interest were labeled with NHS ester fluorophore. We used ELPs with 1% BODIPY labels or 2.5% SBD labels to form condensates, which avoids artifacts induced by the fluorophores. Droplets were formed with a final concentration of 70 µM ELP in 2 M NaCl for the V-A diblock and 1.5 M (NH4)2SO4 for the V-G diblock. A drop of the droplet-containing solution was placed on a 0.17 mm coverslip with a 500 µm spacer. Images were acquired by Leica Falcon Fluorescence Microscope equipped with Wil pulse laser and 63X/0.12 oil-immersion objective. The BODIPY was excited at 488 nm and the SBD was excited at 448 nm. The fluorescence lifetime fitting and image analysis were performed in LAS X and ImageJ.”

      We also used a lower concentration of free dyes to remeasure the properties of the ELP condensates. The Figure S9 data are corrected as follows. The slight differences between the results are attributable to experimental error and do not affect the conclusion.

      Author response image 6.

      FLIM images of unlabeled ELP condensates. A) Chemical structure of free fluorophore, which can measure the physicochemical properties of condensates without labeling. B) Representative FLIM images of V30A30 and A30V30. The mix is the mixture of V30A30 (35 µM) and A30V30 (35 µM). Droplets were formed with a final concentration of 70 µM ELP in 2 M NaCl solution with 1 µM fluorophore. C) Representative FLIM images of V30G30 and G30V30. Droplets were formed with a final concentration of 70 µM ELP in 1.5 M (NH4)2SO4 solution with 1 µM fluorophore. The mix is the mixture of V30G30 (35 µM) and G30V30 (35 µM). Scale bar: 5 µm. The fluorescence lifetime of each image is the average from three independent measurements.

      We also revised the Sequence dependence of micro-viscosity and polarity section of the Supporting Information as follows

      “Since we used V30X30 and X30V30 to quantify the V- and X-end of the V-X blocks, it is possible that the observed differences arose from the innate property of the V30X30 and X30V30 sequences. To rule out this artifact, we formed the ELP condensates with sequences of V30X30, X30V30, or the V30X30 and X30V30 mixture. The condensates were subsequently treated with the aldehyde-BODIPY and methyl-ester SBD fluorophores without the NHS ester reactive warhead (Fig. S9A in the Supporting Information). After brief incubation, aldehyde-BODIPY and methyl-ester SBD fluorophores were recruited into and homogeneously distributed in the ELP condensates. The fluorescence lifetime of aldehyde-BODIPY was the same for V30A30 (4.96 ns), A30V30 (4.99 ns), and their mixture (4.98 ns) (Fig. S9B in the Supporting Information, upper panel). Interestingly, this value is around the average (4.89 ns) of the A-end (4.35 ns) and the V-end (5.43 ns) labeled with NHS-BODIPY. For the SBD measurement, methyl-ester SBD resulted in almost identical lifetime values of V30A30 (8.25 ns), A30V30 (8.27 ns), and their mixture (8.28 ns) (Fig. S9B in the Supporting Information, lower panel), again around the average value (7.88 ns) of the A-end (7.00 ns) and the V-end (8.75 ns) labeled with NHS-SBD. In addition to the V-A blocks, similar observations were made for the V-G blocks as V30G30 and G30V30 sequences (Fig. S9C in the Supporting Information). The slight difference between the results is attributed to experimental error. Because the fluorophores did not covalently label the amino-terminus of the ELP peptides, their lifetimes report values closer to the averaged property of the condensates instead of the microscopic property of the V-end or the X-end when the number of molecules is sufficient and the molecular distribution has no preference.

      Our results reveal that the V30X30 and X30V30 condensates exhibited similar macroscopic viscosity or polarity, suggesting that the previously observed different viscosity or polarity of V30X30 and X30V30 could be attributed to the microscopic property of the V-end or X-end.”
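      The comparison made in the passage above can be checked with a few lines of arithmetic. The sketch below uses only the lifetimes quoted in the text for the V-A system; the dictionary layout and the helper names `mean` and `compare` are hypothetical and are not part of the authors' analysis pipeline.

```python
# Fluorescence lifetimes (ns) quoted in the text for the V-A system.
end_labeled = {
    "BODIPY": {"V_end": 5.43, "A_end": 4.35},
    "SBD": {"V_end": 8.75, "A_end": 7.00},
}
free_dye = {
    "BODIPY": {"V30A30": 4.96, "A30V30": 4.99, "mix": 4.98},
    "SBD": {"V30A30": 8.25, "A30V30": 8.27, "mix": 8.28},
}


def mean(values):
    """Arithmetic mean of an iterable of numbers."""
    values = list(values)
    return sum(values) / len(values)


def compare(probe):
    """Return (average of the two end-labeled lifetimes,
    mean free-dye lifetime, spread of the free-dye lifetimes)."""
    end_avg = mean(end_labeled[probe].values())
    free_vals = list(free_dye[probe].values())
    return end_avg, mean(free_vals), max(free_vals) - min(free_vals)


for probe in end_labeled:
    end_avg, free_avg, spread = compare(probe)
    print(
        f"{probe}: end-labeled average = {end_avg:.3f} ns, "
        f"free-dye mean = {free_avg:.3f} ns, free-dye spread = {spread:.2f} ns"
    )
```

      The small spread among the three free-dye values for each probe is what supports the statement that V30A30, A30V30, and their mixture share similar averaged properties.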

      The FLIM technique combined with environment-sensitive fluorophores is a powerful tool for investigating the physicochemical properties of the microenvironment within the condensates. However, there are some limitations to this method. Because the fluorophore is attached to the protein, we can only detect the microenvironment immediately surrounding the probe (the distance may be at the angstrom level). The fluorescence signal values we obtained are the statistical average of the fluorescence signals from the complex microenvironments. The signal from the probes is determined by the sampling position, orientation, and number of fluorescent probes. The quantified values can therefore be compared in relative terms, but they cannot accurately describe the physical or chemical states in different systems. In addition, the resolution of FLIM experiments is not sufficient to directly distinguish the microstructure in condensates.

      Comment 2: It is unclear if, after the application of stretching, the micro-structure will eventually return to the original configuration or not. Overall, the point of this experiment remains somewhat unclear.

      We thank the reviewer for this comment. The ELP condensates are viscous fluids that can coalesce into larger droplets within seconds. Owing to their high viscosity, ELP condensates show slow fluorescence recovery after photobleaching. When the condensates are stretched, their microstructure changes in response to the external force, and the fluorophores may be pulled out of their microenvironment. For such a dynamic system, we speculate that the microstructure returns to its original state once the condensate re-equilibrates, which may be a slow process. However, it is hard to determine whether the microstructure has completely returned to its original configuration. The purpose of this experiment is to probe the microenvironment of each terminus from another perspective. The experiment also provides evidence that the microenvironment around the V terminus is denser than that around the A terminus.

      Comment 3: The title is too generic and does not reflect the content of the work. There is no analysis of biological condensates. The results are specific to di-block polypetides with specific sequences. This should be clearly specified in text and title.

      We have revised the title to “Microphase Separation Produces Interfacial Environment within Diblock Biomolecular Condensates”

      Comment 4: MD is out of the expertise of this reviewer. However, when looking at the density profiles (Figure S2), the simulation does not seem to be fully converged. The densities fluctuate inconsistently along the Z direction. The authors should comment on assessing simulation convergence. In many cases, the section used for the density values in the plot (i.e., below 0.06 box lengths away from the condensate center) does not seem representative of the dense phase. It should be justified, why these simulations can still be used for density/hydrogen bonding analysis.

      We thank the reviewer for their comment, and agree that convergence of MD simulations is simultaneously important and difficult to control for. To demonstrate the convergence of our simulations, we have taken an example system (V5F5) and reproduced the density profile in 4 unique time windows of 50 ns each (Author response image 7A-D). We find that all distributions are nearly identical, indicating that further extending these simulations is unlikely to change our findings.
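      The time-window comparison described above can be sketched as follows. This is an illustrative outline only, not the authors' analysis code: it assumes the trajectory has already been reduced to one density profile per frame (a list of bin values along Z), and the function names and toy data are hypothetical.

```python
from statistics import fmean


def window_profiles(frames, n_windows=4):
    """Split per-frame density profiles into consecutive time windows and
    return the time-averaged profile of each window."""
    if len(frames) < n_windows:
        raise ValueError("need at least one frame per window")
    size = len(frames) // n_windows  # leftover frames at the end are ignored
    profiles = []
    for w in range(n_windows):
        chunk = frames[w * size:(w + 1) * size]
        # Average each spatial bin over the frames in this window.
        profiles.append([fmean(bin_values) for bin_values in zip(*chunk)])
    return profiles


def max_window_deviation(profiles):
    """Largest absolute difference, over all bins, between any window
    profile and the mean profile across windows."""
    mean_profile = [fmean(values) for values in zip(*profiles)]
    return max(
        abs(p - m) for profile in profiles for p, m in zip(profile, mean_profile)
    )


# Hypothetical toy data: 8 frames, 3 bins along Z, small frame-to-frame noise.
toy_frames = [[1.0 + 0.01 * (i % 2), 0.5, 0.1] for i in range(8)]
toy_profiles = window_profiles(toy_frames, n_windows=4)
print(max_window_deviation(toy_profiles))
```

      A deviation that is small relative to the profile values indicates that the windows agree; a systematic drift from one window to the next would instead suggest that the simulation has not yet converged.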

      While we agree that the choice of 0.06 box lengths is arbitrary, it was chosen as an approximation for the interior of the condensate, where the more hydrophobic half of the protein chain tends to be at higher concentration. However, this choice is not important to our overall conclusion. Halving (Author response image 7E) or doubling (Author response image 7F) the cutoff maintains the inverse correlation between the protein density of the X5 half of the condensate and experimental transition temperature.
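      The cutoff-sensitivity check described above can be sketched as a scan over cutoffs. The code below is a minimal illustration under stated assumptions: each system is represented by a time-averaged density profile of the X5 half on a common Z grid (in box lengths), and the toy profiles and transition temperatures are hypothetical placeholders, not data from the simulations or from Urry's measurements.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def interior_mean(profile, z, cutoff):
    """Mean of the profile values whose |z| (in box lengths) is within cutoff."""
    inside = [p for p, zi in zip(profile, z) if abs(zi) <= cutoff]
    if not inside:
        raise ValueError("no bins inside cutoff")
    return sum(inside) / len(inside)


def cutoff_scan(profiles, z, tt, cutoffs=(0.03, 0.06, 0.12)):
    """Correlation between interior density and transition temperature
    for each choice of cutoff."""
    return {
        c: pearson([interior_mean(p, z, c) for p in profiles], tt)
        for c in cutoffs
    }


# Hypothetical toy inputs: three systems on a shared Z grid.
toy_z = [0.0, 0.02, 0.05, 0.10, 0.20]
toy_density = [
    [0.9, 0.9, 0.8, 0.7, 0.1],
    [0.6, 0.6, 0.5, 0.4, 0.1],
    [0.3, 0.3, 0.3, 0.2, 0.1],
]
toy_tt = [10.0, 30.0, 50.0]  # placeholder transition temperatures

toy_scan = cutoff_scan(toy_density, toy_z, toy_tt)
for cutoff, r in toy_scan.items():
    print(f"cutoff = {cutoff:.2f} box lengths, Pearson r = {r:.3f}")
```

      If the sign and approximate magnitude of the correlation are stable across the scanned cutoffs, the conclusion does not hinge on the particular value of 0.06 box lengths.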

      Finally, in our multiscale simulation approach, the all-atom portion of the simulation is mostly used to examine water structure and protein solvation. We can see that dividing the simulation into four independent time estimates does not substantially change these properties, resulting in low standard deviations in Figure 5 and Figure 6. Similarly, our previous work on the dielectric of ELP condensates has shown that choosing different starting structures from MARTINI simulations is unlikely to affect the estimate of similar quantities.[11]

      Author response image 7.

      Checking convergence of all-atom simulations of ELP condensates. (A-D) The relative mass density along the Z-distance from the condensate center is shown for the V-substituted and X-substituted halves of V5F5 in four independent time windows of 50 ns each. The Z-axis is defined as the direction perpendicular to the condensate-water interface. The dashed line represents a Z-distance of 0.06 box lengths away from the condensate center, which was the original cutoff for correlation analysis. (E-F) Correlation between the mass fraction of the X5 half of the condensate and transition temperature (Tt) from Urry.[12] The condensate is defined as having a Z-distance of 0.03 box lengths (E) or 0.12 box lengths (F) away from the condensate center. ρ is the Pearson correlation coefficient between the two data sets, and the dashed diagonal line is the best fit line. Error bars represent standard deviations of the mean taken over box length intervals of 0.01.

      References

      (1) McDaniel JR, Radford DC, Chilkoti A (2013) A unified model for de novo design of elastin-like polypeptides with tunable inverse transition temperatures. Biomacromolecules 14:2866–2872.

      (2) Meyer DE, Chilkoti A (2004) Quantification of the effects of chain length and concentration on the thermal behavior of elastin-like polypeptides. Biomacromolecules 5:846–851.

      (3) Helfand E, Tagami Y (1972) Theory of the interface between immiscible polymers. J. Chem. Phys. 56:3592.

      (4) Roe RJ (1975) Theory of the interface between polymers or polymer solutions. I. Two components system. J. Chem. Phys. 62:490–499.

      (5) Shi AC (2021) Frustration in block copolymer assemblies. J. Phys. Condens. Matter 33.

      (6) Flory PJ (1942) Thermodynamics of high polymer solutions. J. Chem. Phys. 10:51.

      (7) Grason GM (2006) The packing of soft materials: Molecular asymmetry, geometric frustration and optimal lattices in block copolymer melts. Phys. Rep. 433:1–64.

      (8) Matsen MW, Bates FS (1996) Unifying weak- and strong-segregation block copolymer theories. Macromolecules 29:1091–1098.

      (9) Matsen MW, Schick M (1994) Stable and unstable phases of a diblock copolymer melt. Phys. Rev. Lett. 72:2660–2663.

      (10) Swann JM, Topham PD (2010) Design and application of nanoscale actuators using block-copolymers. Polymers 2:454–469.

      (11) Ye S et al. (2023) Micropolarity governs the structural organization of biomolecular condensates. Nat. Chem. Biol. pp 1–9.

      (12) Urry DW (1997) Physical chemistry of biological free energy transduction as demonstrated by elastic protein-based polymers. J. Phys. Chem. B 101:11007–11028.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review): 

      Summary: 

      LRRK2 protein is familially linked to Parkinson's disease by the presence of several gene variants that all confer a gain-of-function effect on LRRK2 kinase activity. 

      The authors examine the effects of BDNF stimulation in immortalized neuron-like cells, cultured mouse primary neurons, hIPSC-derived neurons, and synaptosome preparations from the brain. They examine an LRRK2 regulatory phosphorylation residue, LRRK2 binding relationships, and measures of synaptic structure and function. 

      Strengths: 

      The study addresses an important research question: how does a PD-linked protein interact with other proteins, and contribute to responses to a well-characterized neuronal signalling pathway involved in the regulation of synaptic function and cell health? 

      They employ a range of good models and techniques to fairly convincingly demonstrate that BDNF stimulation alters LRRK2 phosphorylation and binding to many proteins. Some effects of BDNF stimulation appear impaired in (some of the) LRRK2 knock-out scenarios (but not all). A phosphoproteomic analysis of PD mutant Knock-in mouse brain synaptosomes is included. 

      We thank this Reviewer for pointing out the strengths of our work. 

      Weaknesses: 

      The data sets are disjointed, conclusions are sweeping, and not always in line with what the data is showing. Validation of 'omics' data is very light. Some inconsistencies with the major conclusions are ignored. Several of the assays employed (western blotting especially) are likely underpowered, findings key to their interpretation are addressed in only one or other of the several models employed, and supporting observations are lacking. 

      We appreciate the Reviewer’s overall evaluation. In this revised version, we have provided several novel results that strengthen the omics data and the mechanistic experiments and make the conclusions in line with the data.

      As examples to aid reader interpretation: (a) pS935 LRRK2 seems to go up at 5 minutes but goes down below pre-stimulation levels after (at times when BDNF-induced phosphorylation of other known targets remains very high). This is ignored in favour of discussion/investigation of initial increases, and the fact that BDNF does many things (which might indirectly contribute to initial but unsustained changes to pLRRK2) is not addressed.  

      We thank the Reviewer for raising this important point, which we agree deserves additional investigation. Although phosphorylation does decrease below pre-stimulation levels, a reduction is also observed for ERK/AKT upon sustained exposure to BDNF in our experimental paradigm (Figure 1F-G). This phenomenon is well known in response to a number of extracellular stimuli and can be explained by mechanisms related to cellular negative feedback regulation, receptor desensitization (e.g. phosphorylation or internalization), or cellular adaptation. The effect on pSer935, however, is peculiar as phosphorylation goes below the unstimulated level, as pointed out by the Reviewer. In contrast to ERK and AKT, whose phosphorylation is almost absent under unstimulated conditions (Figure 1F-G), the stoichiometry of Ser935 phosphorylation under unstimulated conditions is high. This observation is consistent with MS determination of the relative abundance of pSer935 (e.g. in whole brain LRRK2 is nearly 100% phosphorylated at Ser935, see Nirujogi et al., Biochem J 2021). Thus, we hypothesized that the modest increase in phosphorylation driven by BDNF likely reflects a saturation or ceiling effect, indicating that the phosphorylation level is already near its maximum under resting conditions. Prolonged BDNF stimulation would bring phosphorylation down below pre-stimulation levels through the negative feedback mechanisms (e.g. phosphatase activity) explained above. To test this hypothesis, we conducted an experiment in which cells were pretreated for 90 minutes with the LRRK2 inhibitor MLi-2 to reduce basal phosphorylation of S935. After MLi-2 washout, we stimulated with BDNF for different times. We used GFP-LRRK2 stable lines for this experiment, since the ceiling effect was particularly evident (Figure S1A) and this model has been used for the interactomic study. As shown below (and incorporated in Fig. S1B in the manuscript), LRRK2 responds robustly to BDNF stimulation both in terms of pSer935 and pRABs. Phosphorylation peaks at 5-15 mins, while it decreases to unstimulated levels at 60 and 180 minutes. Notably, while the peak of pSer935 at 5-15 mins is similar to the untreated condition (supporting that Ser935 is nearly saturated in unstimulated conditions), the phosphorylation of RABs during this time period exceeds unstimulated levels. These findings support the notion that, under basal conditions, RAB phosphorylation is far from saturation. The antibodies used to detect RAB phosphorylation are the following: RAB10 Abcam # ab230261 and RAB8 (pan RABs) Abcam # ab230260.

      Given the robust response of RAB10 phosphorylation upon BDNF stimulation, we further investigated RAB10 phosphorylation during BDNF stimulation in naïve SH-SY5Y cells. We confirmed that the increase in pSer935 is coupled to an increase in pT73-RAB10. Also in this case, RAB10 phosphorylation does not go below the unstimulated level, which aligns with the low pRAB10 stoichiometry in brain (Nirujogi et al., Biochem J 2021). This experiment adds the novel and exciting finding that BDNF stimulation increases LRRK2 kinase activity (RAB phosphorylation) in neuronal cells. 

      Note that the new supplemental figure 1 now includes: A) a comparison of LRRK2 pS935 and total protein levels before and after RA differentiation; B) differentiated GFP-LRRK2 SH-SY5Y (unstimulated, BDNF, MLi-2, BDNF+MLi-2); C) the kinetics of the BDNF response in differentiated GFP-LRRK2 SH-SY5Y.

      (b) Drebrin coIP itself looks like a very strong result, as does the increase after BDNF, but this was only demonstrated with a GFP over-expression construct despite several mouse and neuron models being employed elsewhere and available for coIP of endogenous LRRK2. Also, the coIP is only demonstrated in one direction. Similarly, the decrease in drebrin levels in mice is not assessed in the other model systems, coIP wasn't done, and mRNA transcripts are not quantified (even though others were). Drebrin phosphorylation state is not examined.  

      We appreciate the Reviewer's suggestions and have provided additional experimental evidence supporting the functional relevance of the LRRK2-drebrin interaction.

      (1) As suggested, we performed qPCR and observed that 1 month-old KO midbrain and cortex express lower levels of Dbn1 as compared to WT brains (Figure 5G). This result is in agreement with the western blot data (Figure 5H). 

      (2) To further validate the physiological relevance of the LRRK2-drebrin interaction we performed two experiments:

      i) Western blots looking at pSer935 and pRab8 (pan Rab) in Dbn1 WT and knockout brains. As reported and quantified in Figure 2I, we observed a significant decrease in pSer935 and a trend decrease in pRab8 in Dbn1 KO brains. This finding supports the notion that Drebrin forms a complex with LRRK2 that is important for its activity, e.g. upon BDNF stimulation. 

      ii) Reverse co-immunoprecipitation of YFP-drebrin full-length, N-terminal domain (1-256 aa) and C-terminal domain (256-649 aa) (plasmids kindly received from Professor Phillip R. Gordon-Weeks, Worth et al., J Cell Biol, 2013) with Flag-LRRK2 co-expressed in HEK293T cells. As shown in supplementary Fig. S2C, we confirm that YFP-drebrin binds LRRK2, with the N-terminal region of drebrin appearing to be the major contributor to this interaction. This result is important as the N-terminal region contains the ADF-H (actin-depolymerising factor homology) domain and a coiled-coil region known to directly bind actin (Shirao et al., J Neurochem 2017; Koganezawa et al., Mol Cell Neurosci. 2017). Interestingly, both full-length Drebrin and its truncated C-terminal construct cause the same morphological changes in F-actin, indicating that Drebrin-induced morphological changes in F-actin are mediated by its N-terminal domains rather than its intrinsically disordered C-terminal region (Shirao et al., J Neurochem, 2017; Koganezawa et al., Mol Cell Neurosci. 2017). Given the role of LRRK2 in actin-cytoskeletal dynamics and its binding to multiple actin-related proteins (Fig. 2 and Meixner et al., Mol Cell Proteomics. 2011; Parisiadou and Cai, Commun Integr Biol 2010), these results suggest the possibility that LRRK2 controls actin dynamics by competing with drebrin binding to actin and open new avenues for future studies.

      (3) To address the request for examining drebrin phosphorylation state, we decided to perform another phosphoproteomic experiment, leveraging a parallel analysis incorporated in our latest manuscript (Chen et al., Mol Therapy 2025). In this experiment, we isolated total striatal proteins from WT and G2019S KI mice and enriched the phospho-peptides. Unlike the experiment presented in Fig. 7, phosphopeptides were enriched from total striatal lysates rather than synaptosomal fractions, and phosphorylation levels were normalized to the corresponding total protein abundance. This approach was intended to avoid bias toward synaptic proteins, allowing for the analysis of a broader pool of proteins derived from a heterogeneous ensemble of cell types (neurons, glia, endothelial cells, pericytes etc.). We were pleased to find that this new experiment confirmed drebrin S339 as a differentially phosphorylated site, with a 3.7-fold higher abundance in G2019S Lrrk2 KI mice. The fact that this experiment showed an increased phosphorylation stoichiometry in G2019S mice rather than a decrease is likely due to the normalization of each peptide by its corresponding total protein. Gene ontology analysis of differentially phosphorylated proteins using a stringent term size (<200 genes) showed post-synaptic spines and presynaptic active zones as enriched categories (Fig. 3F). A SynGO analysis confirms both pre- and postsynaptic categories, with high significance for terms related to the postsynaptic cytoskeleton (Fig. 3G). As noted, this is particularly interesting as the starting material was whole striatal tissue – not synaptosomes as previously – indicating that the most significant phosphorylation differences occur in synaptic compartments. This once again reinforces our hypothesis that LRRK2 has a prominent role in the synapse. Overall, we confirmed with an independent phosphoproteomic analysis that LRRK2 kinase activity influences the phosphorylation state of proteins related to synaptic function, particularly the postsynaptic cytoskeleton. For clarity in data presentation, as mentioned by the Reviewers, we removed Figure 7 and incorporated this new analysis in figure 3, alongside the synaptic cluster analysis. 
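The effect of normalizing each phosphopeptide to its total protein can be illustrated with a minimal sketch. The function name and all intensity values below are hypothetical and are not taken from the study; the sketch only shows how a raw phosphopeptide signal that is lower in the mutant can correspond to a higher phosphorylation stoichiometry once a lower total protein level is taken into account.

```python
def normalized_phospho_fold_change(phospho_mut, phospho_wt, protein_mut, protein_wt):
    """Mutant/WT phosphopeptide ratio divided by the mutant/WT total protein ratio."""
    if min(phospho_mut, phospho_wt, protein_mut, protein_wt) <= 0:
        raise ValueError("intensities must be positive")
    raw_ratio = phospho_mut / phospho_wt        # change in phosphopeptide signal
    protein_ratio = protein_mut / protein_wt    # change in total protein abundance
    return raw_ratio / protein_ratio


# Hypothetical intensities (arbitrary units), chosen only for illustration.
raw_fold_change = 80.0 / 100.0                  # raw phosphopeptide signal is lower in the mutant
normalized_fold_change = normalized_phospho_fold_change(
    phospho_mut=80.0, phospho_wt=100.0,
    protein_mut=40.0, protein_wt=100.0,         # total protein is lower still in the mutant
)

print(f"raw fold change: {raw_fold_change:.2f}")
print(f"normalized fold change: {normalized_fold_change:.2f}")
```

In this toy case the raw signal decreases while the normalized value increases, which is the kind of sign change that normalization to total protein can produce.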

      Altogether, three independent OMICs approaches – (i) experimental LRRK2 interactomics in neuronal cells, (ii) a literature-based LRRK2 synaptic/cytoskeletal interactor cluster, and (iii) a phospho-proteomic analysis of striatal proteins from G2019S KI mice (to model LRRK2 hyperactivity) – converge to synaptic actin-cytoskeleton as a key hub of LRRK2 neuronal function.

      (c) The large differences in the CRISPR KO cells in terms of BDNF responses are not seen in the primary neurons of KO mice, suggesting that other differences between the two might be responsible, rather than the lack of LRRK2 protein. 

      Considering that some variability is expected for these types of cultures and across different species, any difference in response magnitude and kinetics could be attributed to the levels of TrkB and downstream components expressed by the two cell types. 

      We are confident that differentiated SH-SY5Y cells provide a reliable model for our study, as we could translate the results obtained in SH-SY5Y cells to other models. However, to rule out the possibility that the more pronounced effect observed in SH-SY5Y KO cells compared with Lrrk2 KO primary neurons was due to CRISPR off-target effects, we performed an off-target analysis. Specifically, we selected the first 8 putative off-targets exhibiting a CFD (Cutting Frequency Determination) off-target score >0.2. 

      As shown in supplemental file 1, sequence disruption was observed only in the LRRK2 on-target site in LRRK2 KO SH-SY5Y cells, while the 8 off-target regions remained unchanged across the genotypes and relative to the reference sequence. 
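The selection step described above (a score threshold followed by a cap on the number of sites) can be sketched as follows. This is only an illustration under stated assumptions: the site names and scores are invented, the `OffTarget` type and `select_off_targets` function are hypothetical, and ranking the candidates by descending score is an assumption about what "first 8" means rather than a description of the tool actually used.

```python
from typing import NamedTuple


class OffTarget(NamedTuple):
    locus: str        # hypothetical identifier of the predicted off-target site
    cfd_score: float  # predicted off-target score in the range 0..1


def select_off_targets(candidates, min_score=0.2, max_sites=8):
    """Keep sites scoring above min_score, highest score first, capped at max_sites."""
    passing = [c for c in candidates if c.cfd_score > min_score]
    passing.sort(key=lambda c: c.cfd_score, reverse=True)  # assumed ranking order
    return passing[:max_sites]


# Invented scores for twelve predicted sites; not data from the study.
scores = [0.91, 0.15, 0.44, 0.20, 0.63, 0.05, 0.31, 0.27, 0.58, 0.22, 0.39, 0.75]
candidates = [OffTarget(f"site_{i}", s) for i, s in enumerate(scores)]

selected = select_off_targets(candidates)
for site in selected:
    print(site.locus, site.cfd_score)
```

Each selected region would then be sequenced in both genotypes and compared with the reference sequence, as described in the response.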

      (d) No validation of hits in the G2019S mutant phosphoproteomics, and no other assays related to the rest of the paper/conclusions. Drebrin phosphorylation is different but unvalidated, or related to previous data sets beyond some discussion. The fact that LRRK2 binding occurs, and increases with BDNF stimulation, should be compared to its phosphorylation status and the effects of the G2019S mutation. 

      As illustrated in the response to point (b), we performed a new phosphoproteomics investigation – with total striatal lysates instead of striatal synaptosomes and normalization of phospho-peptides over total proteins – and found that S339 phosphorylation increases when LRRK2 kinase activity increases (G2019S). Regarding the request to validate drebrin phosphorylation, the main limitation is that there are no available antibodies against phosphorylated Ser339. While we tried Phos-tag gels with striatal lysates, we could not detect any reliable and specific signal with the same drebrin antibody used for western blot (Thermo Fisher Scientific: MA120377) due to technical limitations of the Phos-tag method. We are confident that phosphorylation at S339 has physiological relevance, as it was identified 67 times across multiple proteomic discovery studies and is among the most frequently phosphorylated sites in drebrin (https://www.phosphosite.org/proteinAction.action?id=2675&showAllSites=true).

      To infer a possible role of this phosphorylation, we looked at the predicted pathogenicity of substitutions at this site using AlphaMissense (Cheng et al., Science 2023). As shown in the supplementary figure (Fig. S3), amino acid substitutions within this site are predicted not to be pathogenic, also due to the low confidence of the AlphaFold structure. 

      Ser339 in human drebrin is located just before the proline-rich region (PP domain) of the protein. This region is situated between the actin-binding domains and the C-terminal Homer-binding sequences and plays a role in protein-protein interactions and cytoskeletal regulation (Worth et al., J Cell Biol, 2013). Of interest, this region was previously shown to be the interaction site of afadin (AFDN), a protein involved in multiple cytoskeletal-related processes, including synapse formation and function by regulating puncta adherentia junctions, presynaptic differentiation, and cadherin complex assembly, which are essential for hippocampal excitatory synapses, spine formation, and learning and memory processes (Beaudoin, G. M., 3rd et al., J Neurosci, 2013). Of note, afadin is in the list of LRRK2 interacting proteins (https://www.ebi.ac.uk/intact/home), supporting a possible functional relevance of LRRK2-mediated drebrin phosphorylation in afadin-drebrin complex formation. This is now addressed in the discussion section.

      The aim of this MS analysis in G2019S KI mice – now included in figure 3 – was to further validate the crucial role of LRRK2 kinase activity in the context of synaptic regulation, rather than to discover and characterize novel substrates. Consequently, Figure 7 has been eliminated. 

      Reviewer #2 (Public Review):  

      Taken as a whole, the data in the manuscript show that BDNF can regulate PD-associated kinase LRRK2 and that LRRK2 modifies the BDNF response. The chief strength is that the data provide a potential focal point for multiple observations across many labs. Since LRRK2 has emerged as a protein that is likely to be part of the pathology in both sporadic and LRRK2 PD, the findings will be of broad interest. At the same time, the data used to imply a causal throughline from BDNF to LRRK2 to synaptic function and actin cytoskeleton (as in the title) are mostly correlative and the presentation often extends beyond the data. This introduces unnecessary confusion. There are also many methodological details that are lacking or difficult to find. These issues can be addressed. 

      We appreciate the Reviewer’s positive feedback on our study. We also value the suggestion to present the data in a more streamlined and coherent way. In response, we have updated the title to better reflect our overall findings: “LRRK2 Regulates Synaptic Function through Modulation of Actin Cytoskeletal Dynamics.” Additionally, we have included several experiments that we believe enhance and unify the study.

      (1) The writing/interpretation gets ahead of the data in places and this was confusing. For example, the abstract highlights prior work showing that Ser935 LRRK2 phosphorylation changes LRRK2 localization, and Figure 1 shows that BDNF rapidly increases LRRK2 phosphorylation at this site. Subsequent figures highlight effects at synapses or with synaptic proteins. So is the assumption that LRRK2 is recruited to (or away from) synapses in response to BDNF? Figure 2H shows that LRRK2-drebrin interactions are enhanced in response to BDNF in retinoic acid-treated SH-SY5Y cells, but are synapses generated in these preps? How similar are these preps to the mouse and human cortical or mouse striatal neurons discussed in other parts of the paper (would it be anticipated that BDNF act similarly?) and how valid are SHSY5Y cells as a model for identifying synaptic proteins? Is drebrin localization to synapses (or its presence in synaptosomes) modified by BDNF treatment +/- LRRK2? Or do LRRK2 levels in synaptosomes change in response to BDNF? The presentation requires re-writing to stay within the constraints of the data or additional data should be added to more completely back up the logic. 

      We thank the Reviewer for the thorough suggestions and comments. We have extensively revised the text to accurately reflect our findings without overinterpreting. In particular, we agree with the Reviewer that differentiated SH-SY5Y cells are not identical to primary mouse or human neurons; however, both neuronal models respond to BDNF. Supporting our observations, it is known that SH-SY5Y cells respond to BDNF. In fact, a common protocol for differentiating SH-SY5Y cells involves BDNF in combination with retinoic acid (Martin et al., Front Pharmacol, 2022; Kovalevich et al., Methods in mol bio, 2013). Additionally, it has been reported that SH-SY5Y cells can form functional synapses (Martin et al., Front Pharmacol, 2022). While we are aware that BDNF, drebrin or LRRK2 can also affect non-synaptic pathways, we focused on synapses when we moved to mouse models since: (i) MS and phosphoMS identified several cytoskeletal proteins enriched at the synapse; (ii) we and others have previously reported a role for LRRK2 in governing synaptic and cytoskeletal related processes; (iii) the synapse is a critical site that becomes dysfunctional in the early stages of PD. We have now clarified and adjusted the text as needed. We have also performed additional experiments to address the Reviewer’s concern:

      (1) “Is the assumption that LRRK2 is recruited to (or away from) synapses in response to BDNF”? This is a very important point. There is consensus in the field that detecting endogenous LRRK2 in brain slices or in primary neurons via immunofluorescence is very challenging with the commercially available antibodies (Fernandez et al., J Parkinsons Dis, 2022). We established a method in our previous studies to detect LRRK2 biochemically in synaptosomes (Cirnaru et al., Front Mol Neurosci, 2014; Belluzzi et al., Mol Neurodegener., 2016). While these data indicate LRRK2 is present in the synaptic compartments, it would be quite challenging to apply this method to the present study. In fact, applying acute BDNF stimulation in vivo and then isolating synaptosomes is a complex experiment beyond the timeframe of the revision due to the need for mouse ethical approvals. However, this is definitely an intriguing angle to explore in the future.

      (2) “Is drebrin localization to synapses (or its presence in synaptosomes) modified by BDNF treatment +/- LRRK2?” To try and address this question, we adapted a previously published assay to measure drebrin exodus from dendritic spines. During calcium entry and LTP, drebrin exits dendritic spines and accumulates in the dendritic shafts and cell body (Koganezawa et al., 2017). This facilitates the reorganization of the actin cytoskeleton (Shirao et al., 2017). Given the known role of drebrin and its interaction with LRRK2, we hypothesized that LRRK2 loss might affect drebrin relocalization during spine maturation.

      To test this, we treated DIV14 primary cortical neurons from Lrrk2 WT and KO mice with BDNF for 5 minutes, 15 minutes, and 24 hours, then performed confocal imaging of drebrin localization (Author response image 1). Neurons were transfected at DIV4 with GFP (cell filler) and PSD95 (dendritic spines) for visualization, and endogenous drebrin was stained with an anti-drebrin antibody. We then measured drebrin's overlap with PSD95-positive puncta to track its localization at the spine.

      In Lrrk2 WT neurons, drebrin relocalized from spines after BDNF stimulation, peaking at 15 minutes and showing higher co-localization with PSD95 at 24 hours, indicating that spine remodeling occurred. In contrast, Lrrk2 KO neurons showed no drebrin exodus. These findings support the notion that LRRK2's interaction with drebrin is important for spine remodeling via BDNF. However, additional experiments with larger sample sizes are needed, which were not feasible within the revision timeframe (here n=2 experiments with independent neuronal preparations, n=4-7 neurons analyzed per experiment). Thus, we included the relevant figure as Author response image 1 but chose not to add it to the manuscript (figure 3).

      Author response image 1.

      Lrrk2 affects drebrin exodus from dendritic spines. Primary neurons from Lrrk2 WT and KO mice were transfected with GFP and PSD95 at DIV4, exposed to BDNF for different times (5 minutes, 15 minutes and 24 hours) at DIV14, and stained for endogenous drebrin. The amount of drebrin localizing in dendritic spines outlined by PSD95 was then assessed. The graph shows a pronounced decrease in drebrin content in WT neurons during short treatments and an increase after 24 hours. KO neurons present no evident variations in drebrin localization upon BDNF stimulation. Scale bar: 4 μm.

      (2) The experiments make use of multiple different kinds of preps. This makes it difficult at times to follow and interpret some of the experiments, and it would be of great benefit to more assertively insert "mouse" or "human" and cell type (cortical, glutamatergic, striatal, GABAergic) etc. 

      We thank the Reviewer for pointing this out. We have now more clearly specified the cell type and species identity throughout the text to improve clarity and interpretation.

      (3) Although BDNF induces quantitatively lower levels of ERK or Akt phosphorylation in LRRK2KO preps based on the graphs (Figure 4B, D), the western blot data in Figure 4C make clear that BDNF does not need LRRK2 to mediate either ERK or Akt activation in mouse cortical neurons and in 4A, ERK in SH-SY5Y cells. The presentation of the data in the results (and echoed in the discussion) writes of a "remarkably weaker response". The data in the blots demand more nuance. It seems that LRRK2 may potentiate a response to BDNF that in neurons is independent of LRRK2 kinase activity (as noted). This is more of a point of interpretation, but the words do not match the images.  

      We thank the Reviewer for pointing this out. We have rephrased our data presentation to better convey our findings. We were not surprised to find that loss of LRRK2 causes only a reduction of ERK and AKT activation upon BDNF rather than a complete loss. This is because these pathways are complex and redundant and are activated by a number of cellular effectors. The fact that LRRK2 is one among many players whose function can be compensated by other signaling molecules is also supported by the phenotype of Lrrk2 KO mice, which is measurable at 1 month but disappears in adulthood (4 and 18 months) (figure 5).

      Moreover, we removed the sentence “Of note, 90 mins of Lrrk2 inhibition (MLi-2) prior to BDNF stimulation did not prevent phosphorylation of Akt and Erk1/2, suggesting that LRRK2 participates in BDNF-induced phosphorylation of Akt and Erk1/2 independently from its kinase activity but dependently from its ability to be phosphorylated at Ser935 (Fig. 4C-D and Fig. 1B-C)” since the MLi-2 treatment prior to BDNF stimulation was not quantified and our new data point to an involvement of LRRK2 kinase activity upon BDNF stimulation.

      (4) Figure 4F/G shows an increase in PSD95 puncta per unit length in response to BDNF in mouse cortical neurons. The data do not show spine induction/dendritic spine density/or spine morphogenesis as suggested in the accompanying text (page 8). Since the neurons are filled/express gfp, spine density could be added or spines having PSD95 puncta. However, the data as reported would be expected to reflect spine and shaft PSDs and could also include some nonsynaptic sites. 

      The Reviewer is right. We have rephrased the text to reflect an increase in postsynaptic density (PSD) sites, which may include both spine and shaft PSDs, as well as potential nonsynaptic sites.

      (5) Experimental details are missing that are needed to fully interpret the data. There are no electron microscopy methods outside of the figure legend. And for this and most other microscopy-based data, there are few to no descriptions of what cells/sites were sampled, how many sites were sampled, and how regions/cells were chosen. For some experiments (like Figure 5D), some detail is provided in the legend (20 segments from each mouse), but it is not clear how many neurons this represents, where in the striatum these neurons reside, etc. For confocal z-stacks, how thick are the optical sections and how thick is the stack? The methods suggest that data were analyzed as collapsed projections, but they cite Imaris, which usually uses volumes, so this is confusing. The guide (sgRNA) sequences that were used should be included. There is no mention of sex as a biological variable. 

      We thank the Reviewer for pointing out this missing information. We have now included:

      (1) EM methods (page 24)

      (2) Methods for ICC and confocal microscopy now incorporate the Z-stack thickness (0.5 μm x 6 = 3 μm) on page 23.

      (3) Methods for Golgi-Cox staining now incorporate the Z-stack thickness and the number of neurons and segments per neuron analyzed. 

      (4) The sex of mice is mentioned in the material and methods (page 17): “Approximately equal numbers of males and females were used for every experiment”.

      (6) For Figures 1F, G, and E, how many experimental replicates are represented by blots that are shown? Graphs/statistics could be added to the supplement. For 1C and 1I, the ANOVA p-value should be added in the legend (in addition to the post hoc value provided). 

      The blots in figure 1F, G and E are representative of several blots (at least n=5). The same readouts are part of figure 4, where quantifications are provided. We added the ANOVA p-value in the legend for figure 1C, 1I and 1K.

      (7) Why choose 15 minutes of BDNF exposure for the mass spec experiments when the kinetics in Figure 1 show a peak at 5 mins?  

      This is an important point. We repeated the experiment in GFP-LRRK2 SH-SY5Y cells (figure S1C) and included the 15 min time point. In addition to confirming that pSer935 increases similarly at 5 and 15 minutes, we also observed an increase in RAB phosphorylation at these time points. As mentioned in our response to Reviewer 1, we pretreated with MLi-2 for 90 minutes in this experiment to reduce the high basal phosphorylation stoichiometry of pSer935. 

      (8) The schematic in Figure 6A suggests that iPSCs were plated, differentiated, and cultured until about day 70 when they were used for recordings. But the methods suggest they were differentiated and then cryopreserved at day 30, and then replated and cultured for 40 more days. Please clarify if day 70 reflects time after re-plating (30+70) or total time in culture (70). If the latter, please add some notes about re-differentiation, etc. 

      We thank the Reviewer for the opportunity to clarify the iPSC methodology. In the submitted manuscript, 70DIV represents the total time in vitro, and the process involved a cryostorage event at 30DIV, with a thaw of the cells and a further 40 days of maturation before measurement. We have adjusted the methods in both the text and figure (new schematic) to clarify this. The cryopreservation step has been used in other iPSC methods to great effect (Drummond et al., Front Cell Dev Biol, 2020). Due to the complexity and length of the iPSC neuronal differentiation process, cryopreservation represents a useful method with which to shorten and enhance the ability to repeat experiments and reduce considerable variation between differentiations. User-defined differences in culture conditions for each batch of neurons thawed can usefully be treated as a new and separate N compared to the next batch of neurons.

      (9) When Figures 6B and 6C are compared it appears that mEPSC frequency may increase earlier in the LRRK2KO preps than in the WT preps since the values appear to be similar to WT + BDNF. In this light, BDNF treatment may have reached a ceiling in the LRRK2KO neurons.

      We thank the Reviewer for this comment and the observation about ceiling effects. It is indeed possible that the loss of LRRK2 and the application of BDNF could cause the same elevation in synaptic neurotransmission. In such a situation, the increased activity resulting from BDNF treatment would be masked by the increased activity resulting from LRRK2 KO. To better visualize the difference between WT and KO cultures and the possible ceiling effect, we merged the data into a single graph.  

      (10) Schematic data in Figures 5A and C and Figures 5B and E are too small to read/see the data. 

      We thank the Reviewer for this suggestion. We have now enlarged figure 5A and moved the graph of figure 5D in supplemental figure S5, since this analysis of spine morphology is secondary to the one shown in figure 5C.

      Reviewer #1 (Recommendations For The Authors): 

      Please forgive any redundancy in the comments, I wanted to provide the authors with as much information as I had to explain my opinion. 

      Primary mouse cortical neurons at div14, 20% transient increase in S935 pLRRK2 5min after BDNF, which then declines by 30 minutes (below pre-stim levels, and maybe LRRK2 protein levels do also). 

      In differentiated SHSY5Y cells there is a large expected increase in pERK and pAKT that is sustained way above pre-stim for 60 minutes. There is a 50% initial increase in pLRRK2 (but the blot is not very clear and no double band in these cells), which then looks like reduced well below pre-stim by 30 & 60 minutes. 

      We thank the Reviewer for bringing up this important point. We have extensively addressed this issue in the public review rebuttal. In essence, the phosphorylation of Ser935 is near saturation under unstimulated conditions, as evidenced by its high basal stoichiometry, whereas Rab phosphorylation is far from saturation, showing an increase upon BDNF stimulation before returning to baseline levels. This distinction highlights that while pSer935 exhibits a ceiling effect due to its near-maximal phosphorylation at rest, pRab responds dynamically to BDNF, indicating low basal phosphorylation and a significant capacity for increase. Figure 1 in the rebuttal summarizes the new data collected. 

      GFP-fused overexpressed LRRK2 coIPs with drebrin, and this is double following 15 min BDNF. Strong result.

      We thank the Reviewer.

      BDNF-induced pAKT signaling is greatly impaired, and pERK is somewhat impaired, in CRISPR LKO SHSY5Y cells. In mouse primaries, both AKT and Erk phosph is robustly increased and sustained over 60 minutes in WT and LKO. This might be initially less in LKO for Akt (hard to argue on a WB n of 3 with huge WT variability), regardless they are all roughly the same by 60 minutes and even look higher in LKO at 60. This seems like a big disconnect and suggests the impairment in the SHSy5Y cells might have more to do with the CRISPR process than the LRRK2. Were the cells sequenced for off-target CRISPR-induced modifications?  

      Following the Reviewer's suggestion – and as discussed in the public review section – we performed an off-target analysis. Specifically, we selected the first 8 putative off-targets exhibiting a CFD (Cutting Frequency Determination) off-target score >0.2. As shown in supplemental file 1, sequence disruption was observed only in the LRRK2 on-target site in LRRK2 KO SH-SY5Y cells, while the 8 off-target regions remained unchanged across the genotypes and relative to the reference sequence.  

      No difference in the density of large PSD-95 puncta in dendrites of LKO primary relative to WT, and the small (10%) increase seen in WT after BDNF might be absent in LKO (it is not clear to me that this is absent in every culture rep, and the data is not highly convincing). This is also referred to as spinogenesis, which has not been quantified. Why not is confusing as they did use a GFP fill... 

      The Reviewer is right that spinogenesis is not the appropriate term for the process analyzed. We replaced “spinogenesis” with “morphological alteration of dendritic protrusions” or “synapse maturation”, which is correlated with the number of PSD95-positive puncta (El-Husseini et al., Science, 2000).

      There is a difference in the percentage of dendritic protrusions classified as filopodia to more being classified as thin spines in LKO striatal neurons at 1 month, which is not seen at any other age, The WT filopodia seems to drop and thin spine percent rise to be similar to LKO at 4 months. This is taken as evidence for delayed maturation in LKO, but the data suggest the opposite. These authors previously published decreased spine and increased filopodia density at P15 in LKO. Now they show that filopodia density is decreased and thin spine density increased at one month. How is that shift from increased to decreased filopodia density in LKO (faster than WT from a larger initial point) evidence of impaired maturation? Again this seems accelerated? 

      We agree with the Reviewer that the initial interpretation was indeed confusing. To adhere closely to our data and avoid overinterpretation, as also suggested by Reviewer 2, we revised the text and moved figure 5D to supplementary materials. In essence, our data point to alterations in the structural properties of dendritic protrusions in young KO mice, specifically a reduction in their size (head width and neck height) and a decrease in postsynaptic density (PSD) length, as observed with TEM. These findings suggest that LRRK2 is involved in morphological processes during spine development.

      Shank3 and PSD95 mRNA transcript levels were reduced in the LKO midbrain, only shank3 was reduced in the striatum and only PSD was reduced in the cortex. No changes to mRNA of BDNF-related transcripts. None of these mRNA changes protein-validated. Drebrin protein (where is drebrin mRNA?) levels are reduced in LKO at 1&4 but not clearly at 18 months (seems the most robust result but doesn't correlate with other measures, which here is basically a transient increase (1m) in thin striatal spines).  

      As illustrated before, we performed qPCR for Dbn1 and found that its expression is significantly reduced in the cortex and midbrain and non-significantly reduced in the striatum (1-month-old mice, a different cohort from the one used for the other analyses in figure 5).

      24h BDNF increases the frequency of mEPSCs on hIPSC-derived cortical-like neurons, but not LKO, which is already high. There are no details of synapse number or anything for these cultures and compares 24h treatment. BDNF increases mEPSC frequency within minutes PMC3397209, and acute application while recording on cells may be much more informative (effects of BDNF directly, and no issues with cell-cell / culture variability). Calling mEPSC "spontaneous electrical activity" is not standard.  

      We thank the reviewer for this point. We provided information about synapse number (Bassoon/Homer colocalization) in supplementary figure S7. The lack of response of LRRK2 KO cultures in terms of mEPSC frequency is likely due to increased release probability, as the number of synapses does not change between the two genotypes.

      The pattern of LRRK2 activation is very disconnected from that of BDNF signalling onto other kinases. Regarding pLRRK2, s935 is a non-autophosph site said to be required for LRRK2 enzymatic activity, that is mostly used in the field as a readout of successful LRRK2 inhibition, with some evidence that this site regulates LRRK2 subcellular localization (which might be more to do with whether or not it is p at 935 and therefor able to act as a kinase). 

      The authors imply BDNF is activating LRRK2, but really should have looked at other sites, such as the autophospho site 1292 and 'known' LRRK2 substrates like T73 pRab10 (or other e.g., pRab12) as evidence of LRRK2 activation. One can easily argue that the initial increase in pLRRK2 at this site is less consequential than the observation that BDNF silences LRRK2 activity based on p935 being sustained to being reduced after 5 minutes, and well below the prestim levels... not that BDNF activates LRRK2. 

      As described above, we have collected new data showing that BDNF stimulation increases LRRK2 kinase activity toward its physiological substrates Rab10 and Rab8 (using a pan-phospho-Rab antibody) (Figure 1 and Figure S1). Additionally, we have extensively commented on the ceiling effect of pS935.

      BDNF does a LOT. What happens to network activity in the neural cultures with BDNF application? Should go up immediately. Would increasing neural activity (i.e., through depolarization, forskolin, disinhibition, or something else without BDNF) give a similar 20% increase in pS935 LRRK2? Can this be additive, or occluded? This would have major implications for the conclusions that BDNF and pLRRK2 are tightly linked (as the title suggests).  

      These are very valuable observations; however, they fall outside the scope and timeframe of this study. We agree that future research should focus on gaining a deeper mechanistic understanding of how LRRK2 regulates synaptic activity, including vesicle release probability and postsynaptic spine maturation, independently of BDNF.

      Figures 1A & H "Western blot analysis revealed a rapid (5 mins) and transient increase of Ser935 phosphorylation after BDNF treatment (Fig. 1B and 1C). Of interest, BDNF failed to stimulate Ser935 phosphorylation when neurons were pretreated with the LRRK2 inhibitor MLi-2" . The first thing that stands out is that the pLRRK2 in WB is not very clear at all (although we appreciate it is 'a pig' to work with, I'd hope some replicates are clearer); besides that, the 20% increase only at 5min post-BDNF stimulation seems like a much less profound change than the reduction from base at 60 and more at 180 minutes (where total LRRK2 protein is also going down?). That the blot at 60 minutes in H is representative of a 30% reduction seems off... makes me wonder about the background subtraction in quantification (for this there is much less pLRRK2 and more total LRRK2 than at 0 or 5). LRRK2 (especially) and pLRRK2 seem very sketchy in H. Also, total LRRK2 appears to increase in the SHSY5Y cell not the neurons, and this seems even clearer in 2 H. 

      To better visualize the dynamics of pS935 variation relative to time=0, we presented the data as the difference between t=0 and t=x. This clearly shows that pS935 goes below prestimulation levels, whereas pRab10 does not. The large difference in the initial stoichiometry of these two phosphorylations is extensively discussed above.
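
      The baseline-difference presentation described above can be sketched as follows. This is a minimal illustration under stated assumptions: the function name is hypothetical, the input is assumed to be a phospho/total ratio per time point, and the numbers are invented placeholders, not measurements from the study.

```python
def baseline_difference(series):
    """Subtract the t=0 value from every time point (minutes -> signal)."""
    baseline = series[0]
    return {t: value - baseline for t, value in series.items()}


# Hypothetical phospho/total ratios, for illustration only.
ps935 = {0: 1.0, 5: 1.2, 30: 0.9, 60: 0.7}
prab10 = {0: 1.0, 5: 1.5, 30: 1.3, 60: 1.0}

ps935_delta = baseline_difference(ps935)
prab10_delta = baseline_difference(prab10)

# A negative minimum means the signal dropped below prestimulation levels.
ps935_below_baseline = min(ps935_delta.values()) < 0
prab10_below_baseline = min(prab10_delta.values()) < 0
```

      With these placeholder values the first series dips below zero while the second only returns to it, which is the pattern the difference plot is meant to make visible.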

      That MLi2 eliminates pLRRK2 (and seems to reduce LRRK2 protein?) isn't surprising, but a 90min pretreatment with MLi-2 should be compared to MLi-2's vehicle alone (MLi-2 is notoriously insoluble and the majority of diluents have bioactive effects like changing activity)... especially if concluding increased pLRRK2 in response to BDNF is a crucial point (when comparing against effects on other protein modifications such as pAKT). This highlights a second point... the changes to pERK and pAKT are huge following BDNF (nothing to massive quantities), whereas pLRRK2 increases are 20-50% at best. This suggests a very modest effect of BDNF on LRRK in neurons, compared to the other kinases. I worry this might be less consequential than claimed. Change in S1 is also unlikely to be significant... 

      These comments have been thoroughly addressed in the previous responses. Regarding fig. S1, we added an additional experiment (Figure S1C) in GFP-LRRK2 cells showing robust activation of LRRK2 (pS935, pRabs) at the timepoint of MS (15 min).

      "As the yields of endogenous LRRK2 purification were insufficient for AP-MS/MS analysis, we generated polyclonal SH-SY5Y cells stably expressing GFP-LRRK2 wild-type or GFP control (Supplementary Fig. 1)" . I am concerned that much is being assumed regarding 'synaptic function' from SHSY5Y cells... also overexpressing GFP-LRRK2 and looking at its binding after BDNF isn't synaptic function.  

      We appreciate the reviewer’s comment. We would like to clarify that the interactors enriched upon BDNF stimulation predominantly fall into semantic categories related to the synapse and actin cytoskeleton. While this does not imply that these interactors are exclusively synaptic, it suggests that this tightly interconnected network likely plays a role in synaptic function. This interpretation is supported by several lines of evidence: (1) previous studies have demonstrated the relevance of this compartment to LRRK2 function; (2) our new phosphoproteomics data from striatal lysate highlight enrichment of synaptic categories; and (3) analysis of the latest GWAS gene list (134 genes) also indicates significant enrichment of synapse-related categories. Taken together, these findings justify further investigation into the role of LRRK2 in synaptic biology, as discussed extensively in the manuscript’s discussion section.

      Figure 2A isn't alluded to in text and supplemental table 1 isn't about LRRK2 binding, but mEPSCs. 

      We have added Figure 2A and supplementary .xls table 1, which refers to the Excel list of genes with modulated interaction upon BDNF (uploaded in the supplemental material).

      We also added the .xls extension for supplementary tables 2 and 3.

      Figure 2A is useless without some hits being named, and the donut plots in B add nothing beyond a statement that "35% of 'genes' (shouldn't this be proteins?) among the total 207 LRRK2 interactors were SynGO annotated" might as well [just] be the sentence in the text. 

      We have now included the names of the most significant hits, including cytoskeletal and translation-related proteins, as well as known LRRK2 interactors. We decided to retain the donut plots, as we believe they simplify data interpretation for the reader, reducing the need to jump back and forth between the figures and the text.

      Validation of drebrin binding in 2H is great... although only one of 8 named hits; could be increased to include some of the others. A concern alludes to my previous point... there is no appreciable LRRK2 in these cells until GFP-LRRK2 is overexpressed; is this addressed in the MS? Conclusions would be much stronger if bidirectional coIP of these binding candidates were shown with endogenous (GFP-ve) LRRK2 (primaries or hIPSCs, brain tissue?) 

      To address the Reviewer’s concerns to the best of our abilities, we have added a blot in supplemental figure S1A showing how the expression levels of LRRK2 increase after RA differentiation. Moreover, we have included several new datasets further strengthening the functional link between LRRK2 and drebrin, including qPCR of Dbn1 in one-month-old Lrrk2 KO brains, western blots of Lrrk2 and Rab in Dbn1 KO brains, and co-IP with the drebrin N- and C-terminal domains.

      Figures 3 A-C are not informative beyond the text and D could be useful if proteins were annotated. 

      To avoid overcrowding, proteins were annotated in A, and the same network structure is reported for synaptic and actin-related interactors.

      Figure 4. Is this now endogenous LRRK2 in the SHSY5Y cells? Again not much LRRK2 though, and no pLRRK shown. 

      We confirm that these are naïve SH-SY5Y cells differentiated with RA and LRRK2 is endogenous. We did not assess pS935 in this experiment, as the primary goal was to evaluate pAKT and pERK1/2 levels. To avoid signal saturation, we loaded less total protein (30 µg instead of the 80 µg typically required to detect pS935). pS935 levels were extensively assessed in Figure 1. This experimental detail has now been added in the material and methods section (page 18).

      In C (primary neurons) There is very little increase in pLRRK2 / LRRK2 at 5 mins, and any is much less profound a change than the reduction at 30 & 60 mins. I think this is interesting and may be a more substantial consequence of BDNF treatment than the small early increase. Any 5 min increase is gone by 30 and pLRRK2 is reduced after. This is a disconnect from the timing of all the other pProteins in this assay, yet pLRRK2 is supposed to be regulating the 'synaptic effects'? 

      The first part of the question has already been extensively addressed. Regarding the timing, one possibility is that LRRK2 is activated upstream of AKT and ERK1/2, a hypothesis supported by the reduced activation of AKT and ERK1/2 observed in LRRK2 KO cells, as discussed in the manuscript, and in MLi-2 treated cells (Author response image 2). Concerning the synaptic effects, it is well established that synaptic structural and functional plasticity occurs downstream of receptor activation and kinase signaling cascades. These changes can be mediated by both rapid mechanisms (e.g., mobilization of receptor-containing endosomes via the actin cytoskeleton) and slower processes involving gene transcription of immediate early genes (IEGs). Since structural and functional changes at the synapse generally manifest several hours after stimulation, we typically assessed synaptic activity and structure 24 hours post-stimulation.

      Akt Erk1&2 both go up rapidly after BDNF in WT, although Akt seems to come down with pLRRK2. If they aren't all the same Akt is probably the most different between LKO and WT but I am very concerned about an n=3 for wb, wb is semi-quantitative at best, and many more than three replicates should be assessed, especially if the argument is that the increases are quantitively different between WT v KO (huge variability in WT makes me think if this were done 10x it would all look same). Moreover, this isn't similar to the LKO primaries  "pulled pups" pooled presumably. 

      Despite some variability in the magnitude of the pAKT/pERK response in naïve SH-SY5Y cells, all three independent replicates consistently showed a reduced response in LRRK2 KO cells, yielding a highly significant result in the two-way ANOVA test. In contrast, the difference in response magnitude between WT and LRRK2 KO primary cultures was less pronounced, which justified repeating the experiments with n=9 replicates. We hope the Reviewer acknowledges the inherent variability often observed in western blot experiments, particularly when performed in a fully independent manner (different cultures and stimulations, independent blots).

      To further strengthen the conclusion that this effect is reproducible and dependent on LRRK2 kinase activity upstream of AKT and ERK, we probed the membranes in figure 1H with pAKT/total AKT and pERK/total ERK. All things considered and consistent with our hypothesis, MLi-2 significantly reduced BDNF-mediated AKT and ERK1/2 phosphorylation levels (Author response image 2). 

      Author response image 2.

      Western blot (same experiments as in figure 1) was performed using antibodies against phospho-Thr202/185 ERK1/2, total ERK1/2, phospho-Ser473 AKT, and total AKT in retinoic acid-differentiated SH-SY5Y cells stimulated with 100 ng/mL BDNF for 0, 5, 30, 60 mins. MLi-2 was used at 500 nM for 90 mins to inhibit LRRK2 kinase activity.

      G lack of KO effect seems to be skewed from one culture in the plot (grey). The scatter makes it hard to read, perhaps display the culture mean +/- BDNF with paired bars. The fact that one replicate may be changing things is suggested by the weirdly significant treatment effect and no genotype effect. Also, these are GFP-filled cells, the dendritic masks should be shown/explained, and I'm very surprised no one counted the number (or type?) of protrusions, especially as the text describes this assay (incorrectly) as spinogenesis... 

      As suggested by the Reviewer we have replotted the results as bar graphs. Regarding the number of protrusions, we initially counted the number of GFP+ puncta in the WT and did not find any difference (Author response image 3). Due to our imaging setup (confocal microscopy rather than super-resolution imaging and Imaris 3D reconstruction), we were unable to perform a fine morphometric analysis. However, this was not entirely unexpected, as BDNF is known to promote both the formation and maturation of dendritic spines. Therefore, we focused on quantifying PSD95+ puncta as a readout of mature postsynaptic compartments. While we acknowledge that we cannot definitively conclude that each PSD95+ punctum is synaptically connected to a presynaptic terminal, the data do indicate an increase in the number of PSD95+ structures following BDNF stimulation.

      Author response image 3.

      GFP+ puncta per unit of neurite length (µm) in DIV14 WT primary neurons untreated or after 24 hours of BDNF treatment (100 ng/ml). No significant differences were observed (n=3).

      Figure 5. "Dendritic spine maturation is delayed in Lrrk2 knockout mice". The only significant change is at 1 month in KO which shows fewer filopodia and increased thin spines (50% vs wt). At 4 months the % of thin spines is increased to 60% in both... Filopodia also look like 4m in KO at 1m... How is that evidence for delayed maturation? If anything it suggests the KO spines are maturing faster. "the average neck height was 15% shorter and the average head width was 27% smaller, meaning that spines are smaller in Lrrk2 KO brains" - it seems odd to say this before saying that actually there are just MORE thin spines, the number of mature "mushroom' is same throughout, and the different percentage of thin comes from fewer filopodia. This central argument that maturation is delayed is not supported and could be backwards, at least according to this data. Similarly, the average PSD length is likely impacted by a preponderance of thin spines in KO... which if mature were fewer would make sense to say delayed KO maturation, but this isn't the case, it is the fewer filopodia (with no PSD) that change the numbers. See previous comments of the preceding manuscript. 

      We agree that thin spines, while often considered more immature, represent an intermediate stage in spine development. The data showing an increase in thin spines at 1 month in the KO mice, along with fewer filopodia, could suggest a faster stabilization of these spines, which might indeed be indicative of premature maturation rather than delayed maturation. This change in spine morphology may indicate that the dynamics of synaptic plasticity are affected. Regarding the PSD length, as the Reviewer pointed out, the increased presence of thin spines in KO might account for the observed changes in PSD measurements, as thin spines typically have smaller PSDs. This further reinforces the idea that the overall maturation process may be altered in the KO, but not necessarily delayed. 

      We rephrased the interpretation of these data and moved figure 5D to supplemental figure S4.

      "To establish whether loss of Lrrk2 in young mice causes a reduction in dendritic spines size by influencing BDNF-TrkB expression" - there is no evidence of this.  

      We agree and reorganized the text, removing this sentence.  

      Shank and PSD95 mRNA changes being shown without protein adds very little. Why is drebrin RNA not shown? Also should be several housekeeping RNAs, not one (RPL27)? 

      We measured Dbn1 mRNA, which shows a significant reduction in midbrain and cortex. Moreover, we have now normalized the transcript levels against the geometric mean of the relative abundance of three housekeeping genes (RPL27, actin, and GAPDH).
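
      Normalization against the geometric mean of several reference genes can be sketched as follows. This is a minimal illustration, not the authors' analysis script: it assumes relative quantities are computed as 2^(-Ct) with 100% amplification efficiency, the function names are hypothetical, and all Ct values are invented placeholders.

```python
from statistics import geometric_mean


def relative_quantity(ct, efficiency=2.0):
    """Convert a Ct value to a relative quantity (assumes ideal doubling per cycle)."""
    return efficiency ** (-ct)


def normalized_expression(target_ct, reference_cts):
    """Target quantity divided by the geometric mean of the reference quantities."""
    reference = geometric_mean([relative_quantity(ct) for ct in reference_cts])
    return relative_quantity(target_ct) / reference


# Hypothetical Ct values for the three housekeeping genes (RPL27, actin, GAPDH).
reference_cts = [18.0, 20.0, 22.0]

wt = normalized_expression(24.0, reference_cts)  # hypothetical WT target Ct
ko = normalized_expression(25.0, reference_cts)  # hypothetical KO target Ct
fold_change = ko / wt
```

      Because the quantities are exponential in Ct, the geometric mean of the reference quantities corresponds to the arithmetic mean of their Ct values, so a single outlying reference gene shifts the normalizer less than it would with an arithmetic mean of quantities.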

      Drebrin levels being lower in KO seems to be the strongest result of the paper so far (shame no pLRRK2 or coIP of drebrin to back up the argument). DrebrinA KO mice have normal spines, what about haploinsufficient drebrin mice (LKO seem to have half drebrin, but only as youngsters?)

      As extensively explained in the public review, we used Dbn1 KO mouse brains and were able to show reduced Lrrk2 activity.

      Figure 6. hIPSC-derived cortical neurons. The WT 'cortical' neurons have a very low mEPSC frequency at 0.2Hz relative to KO. Is this because they are more or less mature? What is the EPSC frequency of these cells at 30 and 90 days for comparison? Also, it is very very hard to infer anything about mEPSC frequency in the absence of estimates of cell number and more importantly synapse number. Furthermore, where are the details of cell measures such as capacitance, resistance, and quality control e.g., Ra? Table s1 seems redundant here, besides suggesting that the amplitude is higher in KO at base. 

      We agree that the developmental trajectory of iPSC-derived neurons is critical to accurately interpreting synaptic function and plasticity. In response, we have included additional data now presented in the supplementary figure S7 and summarize key findings below:

      At DIV50, both WT and LRRK2 KO neurons exhibit low basal mEPSC activity (~0.5 Hz) and no response to 24 h BDNF stimulation (50 ng/mL).

      At DIV70 WT neurons show very low basal activity (~0.2 Hz), which increases ~7.5-fold upon BDNF treatment (1.5 Hz; p < 0.001), and no change in synapse number. KO neurons display elevated basal activity (~1 Hz) similar to BDNF-treated WT neurons, with no further increase upon BDNF exposure (~1.3 Hz) and no change in synapse number.

      At DIV90, BDNF has no significant effect in either WT or KO neurons, indicating a possible saturation of plastic responses. The lack of BDNF response at DIV90 may be due to endogenous BDNF production or culture-based saturation effects. While these factors warrant further investigation (e.g., ELISA, co-culture systems), they do not confound the key conclusions regarding the role of LRRK2 in synaptic development and plasticity:

      LRRK2 Enables BDNF-Responsive Synaptic Plasticity. In WT neurons, BDNF induces a significant increase in neurotransmitter release (mEPSC frequency) with no reduction in synapse number. This dissociation suggests BDNF promotes presynaptic functional potentiation. KO neurons fail to show changes in either synaptic function or structure in response to BDNF, indicating that LRRK2 is required for activity-dependent remodeling.

      LRRK2 Loss Accelerates Synaptic Maturation. At DIV70, KO neurons already exhibit high spontaneous synaptic activity equivalent to BDNF-stimulated WT neurons. This suggests that LRRK2 may act to suppress premature maturation and temporally gate BDNF responsiveness, aligning with the differences in maturation dynamics observed in KO mice (Figure 5).  

      As suggested by the reviewer, we reported the measurement of resistance and capacitance for all DIV (Table 1, supplemental material). A reduction in capacitance was observed in WT neurons at DIV90, which may reflect changes in membrane complexity. However, this did not correlate with differences in synapse number and is unlikely to account for the observed differences in mEPSC frequency. To control for cell number between groups, a cell count was performed on the non-dividing cells prior to plating (80k/cm²; see also methods) to keep cell number consistent.

      The presence of BDNF in WT seems to make them look like LKO, in the rest of the paper the suggestion is that the LKO lack a response to BDNF. Here it looks like it could be that BDNF signalling is saturated in LKO, or they are just very different at base and lack a response.

      Knowing which is important to the conclusions, and acute application (recording and BDNF wash-in) would be much more convincing.

      We agree with the Reviewer’s point that saturation of BDNF could influence the interpretation of the data if it were to occur. However, it is important to note that no BDNF exists in the media in base control and KO neuronal culture conditions. This is different from other culture conditions and allows us to investigate the effects of BDNF treatment. Thus, the increased mEPSC frequency observed in KO neurons compared to WT neurons is defined only by the deletion of the gene and not by other extrinsic factors, which were kept consistent between the groups. The lack of response or change in mEPSC frequency in KO is proposed to be a compensatory mechanism due to the loss of LRRK2. Of note, LRRK2 as a “synaptic brake” has already been described (Beccano-Kelly et al., Hum Mol Gen, 2015). However, a comprehensive analysis of the underlying molecular mechanisms will require future studies beyond the scope of this paper.

      "The LRRK2 kinase substrates Rabs are not present in the list of significant phosphopeptides, likely due to the low stoichiometry and/or abundance" Likely due to the fact mass spec does not get anywhere near everything. 

      We removed this sentence in light of the new phosphoproteomic analysis.

      Figure 7 is pretty stand-alone, and not validated in any way, hard to justify its inclusion?  

      As extensively explained, we removed figure 7 and included the new phospho-MS as part of figure 3.

      Writing throughout shows a very selective and shallow use of the literature.  

      We extensively reviewed the citations.

      "while Lrrk1 transcript in this region is relatively stable during development" The authors reference a very old paper that barely shows any LRRK1 mRNA, and no protein. Others have shown that LRRK1 is essentially not present postnatally PMC2233633. This isn't even an argument the authors need to make. 

      We thank the reviewer and have included this more appropriate citation.

      Reviewer #2 (Recommendations For The Authors): 

      Cyfip1 (Fig 3A) is part of the WAVE complex (page 13). 

      We thank the reviewer and specified it.

      The discussion could be more focused. 

      We extensively revised the discussion to keep it more focused.

      Note that we updated the GO ontology analyses to reflect the updated information present in g:Profiler.

      References.

      Nirujogi, R. S., Tonelli, F., Taylor, M., Lis, P., Zimprich, A., Sammler, E., & Alessi, D. R. (2021). Development of a multiplexed targeted mass spectrometry assay for LRRK2-phosphorylated Rabs and Ser910/Ser935 biomarker sites. The Biochemical journal, 478(2), 299–326. https://doi.org/10.1042/BCJ20200930

      Worth, D. C., Daly, C. N., Geraldo, S., Oozeer, F., & Gordon-Weeks, P. R. (2013). Drebrin contains a cryptic F-actin-bundling activity regulated by Cdk5 phosphorylation. The Journal of cell biology, 202(5), 793–806. https://doi.org/10.1083/jcb.201303005

      Shirao, T., Hanamura, K., Koganezawa, N., Ishizuka, Y., Yamazaki, H., & Sekino, Y. (2017). The role of drebrin in neurons. Journal of neurochemistry, 141(6), 819–834. https://doi.org/10.1111/jnc.13988

      Koganezawa, N., Hanamura, K., Sekino, Y., & Shirao, T. (2017). The role of drebrin in dendritic spines. Molecular and cellular neurosciences, 84, 85–92. https://doi.org/10.1016/j.mcn.2017.01.004

      Meixner, A., Boldt, K., Van Troys, M., Askenazi, M., Gloeckner, C. J., Bauer, M., Marto, J. A., Ampe, C., Kinkl, N., & Ueffing, M. (2011). A QUICK screen for Lrrk2 interaction partners--leucine-rich repeat kinase 2 is involved in actin cytoskeleton dynamics. Molecular & cellular proteomics: MCP, 10(1), M110.001172. https://doi.org/10.1074/mcp.M110.001172

      Parisiadou, L., & Cai, H. (2010). LRRK2 function on actin and microtubule dynamics in Parkinson disease. Communicative & integrative biology, 3(5), 396–400. https://doi.org/10.4161/cib.3.5.12286

      Chen, C., Masotti, M., Shepard, N., Promes, V., Tombesi, G., Arango, D., Manzoni, C., Greggio, E., Hilfiker, S., Kozorovitskiy, Y., & Parisiadou, L. (2024). LRRK2 mediates haloperidol-induced changes in indirect pathway striatal projection neurons. bioRxiv : the preprint server for biology, 2024.06.06.597594. https://doi.org/10.1101/2024.06.06.597594

      Cheng, J., Novati, G., Pan, J., Bycroft, C., Žemgulytė, A., Applebaum, T., Pritzel, A., Wong, L. H., Zielinski, M., Sargeant, T., Schneider, R. G., Senior, A. W., Jumper, J., Hassabis, D., Kohli, P., & Avsec, Ž. (2023). Accurate proteome-wide missense variant effect prediction with AlphaMissense. Science (New York, N.Y.), 381(6664), eadg7492. https://doi.org/10.1126/science.adg7492

      Beaudoin, G. M., 3rd, Schofield, C. M., Nuwal, T., Zang, K., Ullian, E. M., Huang, B., & Reichardt, L. F. (2012). Afadin, a Ras/Rap effector that controls cadherin function, promotes spine and excitatory synapse density in the hippocampus. The Journal of neuroscience : the official journal of the Society for Neuroscience, 32(1), 99–110. https://doi.org/10.1523/JNEUROSCI.4565-11.2012

      Fernández, B., Chittoor-Vinod, V. G., Kluss, J. H., Kelly, K., Bryant, N., Nguyen, A. P. T., Bukhari, S. A., Smith, N., Lara Ordóñez, A. J., Fdez, E., Chartier-Harlin, M. C., Montine, T. J., Wilson, M. A., Moore, D. J., West, A. B., Cookson, M. R., Nichols, R. J., & Hilfiker, S. (2022). Evaluation of Current Methods to Detect Cellular Leucine-Rich Repeat Kinase 2 (LRRK2) Kinase Activity. Journal of Parkinson's disease, 12(5), 1423–1447. https://doi.org/10.3233/JPD-213128

      Cirnaru, M. D., Marte, A., Belluzzi, E., Russo, I., Gabrielli, M., Longo, F., Arcuri, L., Murru, L., Bubacco, L., Matteoli, M., Fedele, E., Sala, C., Passafaro, M., Morari, M., Greggio, E., Onofri, F., & Piccoli, G. (2014). LRRK2 kinase activity regulates synaptic vesicle trafficking and neurotransmitter release through modulation of LRRK2 macromolecular complex. Frontiers in molecular neuroscience, 7, 49. https://doi.org/10.3389/fnmol.2014.00049

      Belluzzi, E., Gonnelli, A., Cirnaru, M. D., Marte, A., Plotegher, N., Russo, I., Civiero, L., Cogo, S., Carrion, M. P., Franchin, C., Arrigoni, G., Beltramini, M., Bubacco, L., Onofri, F., Piccoli, G., & Greggio, E. (2016). LRRK2 phosphorylates pre-synaptic N-ethylmaleimide sensitive fusion (NSF) protein enhancing its ATPase activity and SNARE complex disassembling rate. Molecular neurodegeneration, 11, 1. https://doi.org/10.1186/s13024-015-0066-z

      Martin, E. R., Gandawijaya, J., & Oguro-Ando, A. (2022). A novel method for generating glutamatergic SH-SY5Y neuron-like cells utilizing B-27 supplement. Frontiers in pharmacology, 13, 943627. https://doi.org/10.3389/fphar.2022.943627

      Kovalevich, J., & Langford, D. (2013). Considerations for the use of SH-SY5Y neuroblastoma cells in neurobiology. Methods in molecular biology (Clifton, N.J.), 1078, 9–21. https://doi.org/10.1007/978-1-62703-640-5_2

      Drummond, N. J., Singh Dolt, K., Canham, M. A., Kilbride, P., Morris, G. J., & Kunath, T. (2020). Cryopreservation of Human Midbrain Dopaminergic Neural Progenitor Cells Poised for Neuronal Differentiation. Frontiers in cell and developmental biology, 8, 578907. https://doi.org/10.3389/fcell.2020.578907

      Tao, X., Finkbeiner, S., Arnold, D. B., Shaywitz, A. J., & Greenberg, M. E. (1998). Ca2+ influx regulates BDNF transcription by a CREB family transcription factor-dependent mechanism. Neuron, 20(4), 709–726. https://doi.org/10.1016/s0896-6273(00)81010-7

      El-Husseini, A. E., Schnell, E., Chetkovich, D. M., Nicoll, R. A., & Bredt, D. S. (2000). PSD95 involvement in maturation of excitatory synapses. Science (New York, N.Y.), 290(5495), 1364–1368.

      Glebov OO, Cox S, Humphreys L, Burrone J. Neuronal activity controls transsynaptic geometry. Sci Rep. 2016 Mar 8;6:22703. doi: 10.1038/srep22703. Erratum in: Sci Rep. 2016 May 31;6:26422. doi: 10.1038/srep26422. PMID: 26951792; PMCID: PMC4782104.

      Beccano-Kelly DA, Volta M, Munsie LN, Paschall SA, Tatarnikov I, Co K, Chou P, Cao LP, Bergeron S, Mitchell E, Han H, Melrose HL, Tapia L, Raymond LA, Farrer MJ, Milnerwood AJ. LRRK2 overexpression alters glutamatergic presynaptic plasticity, striatal dopamine tone, postsynaptic signal transduction, motor activity and memory. Hum Mol Genet. 2015 Mar 1;24(5):1336-49. doi: 10.1093/hmg/ddu543. Epub 2014 Oct 24. PMID: 25343991.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      In this manuscript, the authors use anatomical tracing and slice physiology to investigate the integration of thalamic (ATN) and retrosplenial cortical (RSC) signals in the dorsal presubiculum (PrS). This work will be of interest to the field, as the postsubiculum is thought to be a key region for integrating internal head direction representations with external landmarks. The main result is that ATN and RSC inputs drive the same L3 PrS neurons, which exhibit superlinear summation to near-coincident inputs. Moreover, this activity can induce bursting in L4 PrS neurons, which can pass the signals to LMN (perhaps gated by cholinergic input).

      Strengths:

      The slice physiology experiments are carefully done. The analyses are clear and convincing, and the figures and results are well-composed. Overall, these results will be a welcome addition to the field.

      We thank this reviewer for the positive comment on our work.

      Weaknesses:

      The conclusions about the circuit-level function of L3 PrS neurons sometimes outstrip the data, and their model of the integration of these inputs is unclear. I would recommend some revision of the introduction and discussion. I also had some minor comments about the experimental details and analysis.

      Specific major comments:

      (1) I found that the authors' claims sometimes outstrip their data, given that there were no in vivo recordings during behavior. For example, in the abstract, their results indicate "that layer 3 neurons can transmit a visually matched HD signal to medial entorhinal cortex", and in the conclusion they state "[...] cortical RSC projections that carry visual landmark information converge on layer 3 pyramidal cells of the dorsal presubiculum". However, they never measured the nature of the signals coming from ATN and RSC to L3 PrS (or signals sent to downstream regions). Their claim is somewhat reasonable with respect to ATN, where the majority of neurons encode HD, but neurons in RSC encode a vast array of spatial and non-spatial variables other than landmark information (e.g., head direction, egocentric boundaries, allocentric position, spatial context, task history to name a few), so making strong claims about the nature of the incoming signals is unwarranted.

      We agree, of course, that RSC does not only encode landmark information. We have clarified this point in the introduction (line 69-70) and phrased our statements more carefully in the abstract (removed the word ‘landmark’ in line 17) and in the introduction (line 82-83). In the discussion we explicitly state that ‘In our slice work we are blind to the exact nature of the signal that is carried by ATN and RSC axons’ (line 522-523).

      (2) Related to the first point, the authors hint at, but never explain, how coincident firing of ATN and RSC inputs would help anchor HD signals to visual landmarks. Although the lesion data (Yoder et al. 2011 and 2015) support their claims, it would be helpful if the proposed circuit mechanism was stated explicitly (a schematic of their model would be helpful in understanding the logic). For example, how do neurons integrate the "right" sets of landmarks and HD signals to ensure stable anchoring? Moreover, it would be helpful to discuss alternative models of HD-to-landmark anchoring, including several studies that have proposed that the integration may (also?) occur in RSC (Page & Jeffrey, 2018; Yan, Burgess, Bicanski, 2021; Sit & Goard, 2023). Currently, much of the Discussion simply summarizes the results of the study, this space could be better used in mapping the findings to the existing literature on the overarching question of how HD signals are anchored to landmarks.

      We agree with the reviewer on the importance of the question, how do neurons integrate the “right” sets of landmarks and HD signals to ensure stable anchoring? Based on our results we provide a schematic to illustrate possible scenarios, and we include it as a supplementary figure (Figure 1, to be included in the ms as Figure 7—figure supplement 2), as well as a new paragraph in the discussion section (line 516-531).  We point out that critical information on the convergence and divergence of functionally defined inputs is still lacking, both for principal cells and interneurons

      Interestingly, recent evidence from functional ultrasound imaging and electrical single cell recording demonstrated that visual objects may refine head direction coding, specifically in the dorsal presubiculum (Siegenthaler et al. bioRxiv 2024.10.21.619417; doi: https://doi.org/10.1101/2024.10.21.619417). The increase in firing rate for HD cells whose preferred firing direction corresponds to a visual landmark could be supported by the supralinear summation of thalamic HD signals and retrosplenial input described in our study. We include this point in the discussion (line 460-462), and hope that our work will spur further investigations.

      Reviewer #2 (Public Review):

      Richevaux et al investigate how anterior thalamic (AD) and retrosplenial (RSC) inputs are integrated by single presubicular (PrS) layer 3 neurons. They show that these two inputs converge onto single PrS layer 3 principal cells. By performing dual-wavelength photostimulation of these two inputs in horizontal slices, the authors show that in most layer 3 cells, these inputs summate supra-linearly. They extend the experiments by focusing on putative layer 4 PrS neurons, and show that they do not receive direct anterior thalamic nor retrosplenial inputs; rather, they are (indirectly) driven to burst firing in response to strong activation of the PrS network.

      This is a valuable study, that investigates an important question - how visual landmark information (possibly mediated by retrosplenial inputs) converges and integrates with HD information (conveyed by the AD nucleus of the thalamus) within PrS circuitry. The data indicate that near-coincident activation of retrosplenial and thalamic inputs leads to non-linear integration in target layer 3 neurons, thereby offering a potential biological basis for landmark + HD binding.

      The main limitations relate to the anatomical annotation of 'putative' PrS L4 neurons, and to the presentation of retrosplenial/thalamic input modularity. Specifically, more evidence should be provided to convincingly demonstrate that the 'putative L4 neurons' of the PrS are not distal subicular neurons (as the authors' anatomy and physiology experiments seem to indicate). The modularity of thalamic and retrosplenial inputs could be better clarified in relation to the known PrS modularity.

      We thank the reviewer for their important feedback. We discuss what defines presubicular layer 4 in horizontal slices, cite relevant literature, and provide new and higher resolution images. See below for detailed responses to the reviewer’s comments, in the section ‘recommendations to authors’.

      Reviewer #3 (Public Review):

      Summary:

      The authors sought to determine, at the level of individual presubiculum pyramidal cells, how allocentric spatial information from the retrosplenial cortex was integrated with egocentric information from the anterior thalamic nuclei. Employing a dual opsin optogenetic approach with patch clamp electrophysiology, Richevaux and colleagues found that around three-quarters of layer 3 pyramidal cells in the presubiculum receive monosynaptic input from both brain regions. While some interesting questions remain (e.g. the role of inhibitory interneurons in gating the flow of information through different layers of presubiculum), this paper provides valuable insights into the microcircuitry of this brain region and the role that it may play in spatial navigation.

      Strengths:

      One of the main strengths of this manuscript was that the dual opsin approach allowed the direct comparison of different inputs within an individual neuron, helping to control for what might otherwise have been an important source of variation. The experiments were well-executed and the data was rigorously analysed. The conclusions were appropriate to the experimental questions and were well-supported by the results. These data will help to inform in vivo experiments aimed at understanding the contribution of different brain regions in spatial navigation and could be valuable for computational modelling.

      Weaknesses:

      Some attempts were made to gain mechanistic insights into how inhibitory neurotransmission may affect processing in the presubiculum (e.g. Figure 5) but these experiments were a little underpowered and the analysis carried out could have been more comprehensively undertaken, as was done for other experiments in the manuscript.

      We agree that the role of interneurons for landmark anchoring through convergence in Presubiculum requires further investigation. In our latest work on the recruitment of VIP interneurons we begin to address this point in slices (Nassar et al., 2024 Neuroscience. doi: 10.1016/j.neuroscience.2024.09.032.); more work in behaving animals will be needed.

      Reviewer #1 (Recommendations For The Authors):

      Full comments below. Beyond the (mostly minor) issues noted below, this is a very well-written paper and I look forward to seeing it in print.

      Major comments:

      (1) I found that the authors' claims sometimes outstrip their data, given that there were no in vivo recordings during behavior. For example, in the abstract, their results indicate "that layer 3 neurons can transmit a visually matched HD signal to medial entorhinal cortex", and in the conclusion they state "[...] cortical RSC projections that carry visual landmark information converge on layer 3 pyramidal cells of the dorsal presubiculum". However, they never measured the nature of the signals coming from ATN and RSC to L3 PrS (or signals sent to downstream regions). Their claim is somewhat reasonable with respect to ATN, where the majority of neurons encode HD, but neurons in RSC encode a vast array of spatial and non-spatial variables other than landmark information (e.g., head direction, egocentric boundaries, allocentric position, spatial context, task history to name a few), so making strong claims about the nature of the incoming signals is unwarranted.

      Our study was motivated by the seminal work from Yoder et al., 2011 and 2015, indicating that visual landmark information is processed in PoS and from there transmitted to the LMN. Based on that, and in the interest of readability, we may have used an oversimplified shorthand for the type of signal carried by RSC axons. Numerous studies indicate a role for RSC in encoding visual landmark information (Auger et al., 2012; Jacob et al., 2017; Lozano et al., 2017; Fischer et al., 2020; Keshavarzi et al., 2022; Sit and Goard, 2023); we agree, of course, that this is not the only variable that is represented. We have therefore changed the text to make this point clear:

      Abstract, line 17: removed the word ‘landmark’

      Introduction, line 69: added “...and supports an array of cognitive functions including memory, spatial and non-spatial context and navigation (Vann et al., 2009; Vedder et al., 2017). ”

      Introduction, line 82: changed “...designed to examine the convergence of visual landmark information, that is possibly integrated in the RSC, and vestibular based thalamic head direction signals”.

      Discussion, line 522-523: added “In our slice work we are blind to the exact nature of the signal that is carried by ATN and RSC axons.”

      (2) Related to the first point, the authors hint at, but never explain, how coincident firing of ATN and RSC inputs would help anchor HD signals to visual landmarks. Although the lesion data (Yoder et al., 2011 and 2015) support their claims, it would be helpful if the proposed circuit mechanism was stated explicitly (a schematic of their model would be helpful in understanding the logic). For example, how do neurons integrate the "right" sets of landmarks and HD signals to ensure stable anchoring? Moreover, it would be helpful to discuss alternative models of HD-to-landmark anchoring, including several studies that have proposed that the integration may (also?) occur in RSC (Page & Jeffrey, 2018; Yan, Burgess, Bicanski, 2021; Sit & Goard, 2023). Currently, much of the Discussion simply summarizes the results of the study, this space could be better used in mapping the findings to the existing literature on the overarching question of how HD signals are anchored to landmarks.

      We suggest a physiological mechanism for inputs to be selectively integrated and amplified, based on temporal coincidence. Of course there are still many unknowns, including the divergence of connections from a single thalamic or retrosplenial input neuron. The anatomical connectivity of inputs will be critical, as well as the subcellular arrangement of synaptic contacts. Neuromodulation and changes in the balance of excitation and inhibition will need to be factored in. While it is premature to provide a comprehensive explanation for landmark anchoring of HD signals in PrS, our results have led us to include a schematic, to illustrate our thinking (Figure 1, see below).

      Do HD tuned inputs from thalamus converge on similarly tuned HD neurons only? Is divergence greater for the retrosplenial inputs? If so, thalamic input might pre-select a range of HD neurons, and converging RSC input might narrow down the precise HD neurons that become active (Figure 1). In the future, the use of activity dependent labeling strategies might help to tie together information on the tuning of pre-synaptic neurons, and their convergence or divergence onto functionally defined postsynaptic target cells. This critical information is still lacking, for principal cells, and also for interneurons. 

      Interneurons may have a key role in HD-to-landmark anchoring. SST interneurons support stability of HD signals (Simonnet et al., 2017) and VIP interneurons flexibly disinhibit the system (Nassar et al., 2024). Could disinhibition be a necessary condition to create a window of opportunity for updating the landmark anchoring of the attractor? Single PV interneurons might receive thalamic and retrosplenial inputs non-specifically. We need to distinguish the conditions for when the excitation-inhibition balance in pyramidal cells may become tipped towards excitation, and the case of coincident, co-tuned thalamic and retrosplenial input may be such a condition. Elucidating the principles of hardwiring of inputs, as for example, selective convergence, will be necessary. Moreover, neuromodulation and oscillations may be critical for temporal coordination and precise temporal matching of HD-to-landmark signals.

      We note that matching directional with visual landmark information based on temporal coincidence as described here does not require synaptic plasticity. Algorithms for dynamic control of cognitive maps without synaptic plasticity have been proposed (Whittington et al., 2025, Neuron): information may be stored in neural attractor activity, and the idea that working memory may rely on recurrent updates of neural activity might generalize to the HD system. We include these considerations in the discussion (line 497-501; 521-531) and hope that our work will spur further experimental investigations and modeling work.

      While the focus of our work has been on PrS, we agree that RSC also processes HD and landmark signals. Possibly the RSC registers a direction to a landmark rather than comparing it with the current HD (Sit & Goard, 2023). We suggest that this integrated information then reaches PrS. In contrast to RSC, PrS is uniquely positioned to update the signal in the LMN (Yoder et al., 2011), cf. discussion (line 516-520).

      Minor comments:

      (1) Fig 1 - Supp 1: It appears there is a lot of input to PrS from higher visual regions, could this be a source of landmark signals?

      Yes, higher visual regions projecting to PrS may also be a source of landmark information, even if the visual signal is not integrated with HD at that stage (Sit & Goard 2023). The anatomical projection from the visual cortex was first described by Vogt & Miller (1983), but not studied on a functional level so far.

      (2) Fig 2F, G: Although the ATN and RSC measurements look quite similar, there are no stats included. The authors should use an explicit hypothesis test.

      We now compare the distributions of amplitudes and of latencies, using the Mann-Whitney U test. No significant difference between the two groups was found. Added in the figure legend: 2F, “Mann-Whitney U test revealed no significant difference (p = 0.95)”. 2G, “Mann-Whitney U test revealed no significant difference (p = 0.13)”.

      (3) Fig 2 - Supp 2A, C: Again, no statistical tests. This is particularly important for panel A, where the authors state that the latencies are similar but the populations appear to be different.

      Inputs from ATN and RSC have a similar ‘jitter’ (latency standard deviation) and ‘tau decay’. We added in the Fig 2 - Supp 2 figure legend: A, “Mann-Whitney U test revealed no significant difference (p = 0.26)”. C, “Mann-Whitney U test revealed no significant difference (p = 0.87)”.

      As a complementary measure for the reviewer, we performed the Kolmogorov-Smirnov test which confirmed that the populations’ distributions for ‘jitter’ were not significantly different, p = 0.1533.
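
      For readers who wish to run this type of two-group comparison on their own measurements, the following is a minimal sketch of a two-sided Mann-Whitney U test. It is an illustration only and not the analysis code used for the manuscript: the function names, the use of a normal approximation with tie correction (without continuity correction), and the example values are all assumptions. For small samples, an exact test from an established statistics package would normally be preferred.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (hypothetical helper).

    Returns (U, p), where U is the smaller of the two U statistics and p
    comes from a normal approximation with tie correction.
    """
    n1, n2 = len(x), len(y)
    combined = list(x) + list(y)
    ranks = average_ranks(combined)

    rank_sum_x = sum(ranks[:n1])
    u1 = rank_sum_x - n1 * (n1 + 1) / 2
    u2 = n1 * n2 - u1
    u = min(u1, u2)

    # Variance of U under the null hypothesis, corrected for ties.
    n = n1 + n2
    counts = {}
    for v in combined:
        counts[v] = counts.get(v, 0) + 1
    tie_term = sum(c ** 3 - c for c in counts.values())
    sigma = math.sqrt(n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1))))
    if sigma == 0:
        return u, 1.0

    z = (u - n1 * n2 / 2) / sigma
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail probability
    return u, p


if __name__ == "__main__":
    # Hypothetical latencies (ms) for two input pathways; not real data.
    atn_latency = [2.1, 2.4, 2.8, 3.0, 3.3, 2.6]
    rsc_latency = [2.3, 2.9, 3.1, 3.4, 2.7, 3.6]
    u_stat, p_value = mann_whitney_u(atn_latency, rsc_latency)
    print(f"U = {u_stat:.1f}, p = {p_value:.3f}")
```

      The test is rank-based, so it makes no assumption that amplitudes or latencies are normally distributed, which is why it is a common choice for small electrophysiological samples.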

      (4) Fig 4E, F: The statistics reporting is confusing, why are asterisks above the plots and hashmarks to the side?

      Asterisks refer to a comparison between ‘dual’ and ‘sum’ for each of the 5 stimulations in a Sidak multiple comparison test. Hashmarks refer to comparison of the nth stimulation to the 1st one within dual stimulation events (Friedman + Dunn’s multiple comparison test). We mention the two-way ANOVA p-value in the legend (Sum v Dual, for both Amplitude and Surface).

      (5) Fig 5C: I was confused by the 2*RSC manipulation. How do we know if there is amplification unless we know what the 2*RSC stim alone looks like?

      We now label the right panel in Fig 5C as “high light intensity” or “HLI”. Increasing the activation of Chrimson increases the amplitude of the summed EPSP that now exceeds the threshold for amplification of synaptic events. Amplification refers to the shape of the plateau-like prolongation of the peak, most pronounced on the second EPSP, now indicated with an arrow.  We clarify this also in the text (line 309-310).

      (6) Fig 6D (supplement 1): Typo, "though" should be "through"

      Yes, corrected (line 1015).

      (7) Fig 6G (supplement 1): Typo, I believe this refers to the dotted area in panel F, not panel A.

      Yes, corrected (line 1021).

      (8) Fig 7: The effect of muscarine was qualitatively described in the Results, but there is no quantification and it is not shown in the Figure. The results should either be reported properly or removed from the Results.

      We remove the last sentence in the Results.

      (9) Methods: The age and sex of the mice should be reported. Transgenic mouse line should be reported (along with stock number if applicable).

      We used C57BL6 mice with transgenic background (Ai14 mice, Jax no. 007914 reporter line) or C57BL6 wild type mice. This is now indicated in the Methods (lines 566-567).

      (10) Methods: If the viruses are only referred to with their plasmid number, then the capsid used for the viruses should be specified. For example, I believe the AAV-CAG-tomato virus used the retroAAV capsid, which is important to the experiment.

      Thank you for pointing this out. Indeed, the AAV-CAG-tdTom virus used the retroAAV capsid (line 575).

      (11) Data/code availability: I didn't see any sort of data/code availability statement, will the data and code be made publicly available?

      Data are stored on local servers at the SPPIN, Université Paris Cité, and are made available upon reasonable request. Code for intrinsic properties analysis is available on GitHub (https://github.com/schoki0710/Intrinsic_Properties). This information is now included (line 717-720).

      (12) Very minor (and these might be a matter of opinion), but I believe "records" should be "recordings", and "viral constructions" should be "viral constructs".

      The text had benefited from proofreading by Richard Miles, who always preferred “records” to “recordings” in his writings. We choose to keep the current wording.

      Reviewer #2 (Recommendations For The Authors):

      Below are two major points that require clarification.

      (1) In the last set of experiments presented by the authors (Figs 6 onwards) they focus on 'putative L4' PrS cells. For several lines of evidence (outlined below), I am convinced that these neurons are not presubicular, but belong to the subiculum. I think this is a major point that requires substantial clarification, in order to avoid confusion in the field (see also suggestions on how to address this comment at the end of this section).

      Several lines of evidence support the interpretation that, what the authors call 'L4 PrS neurons', are distal subicular cells:

      (1.1) The anatomical location of the retrogradely-labelled cells (from mammillary bodies injections), as shown in Figs 6B, C, and Fig. 6_1B, very clearly indicates that they belong to the distal subiculum. The subicular-to-PrS boundary is a sharp anatomical boundary that follows exactly the curvature highlighted by the authors' red stainings. The authors could also use specific subicular/PrS markers to visualize this border more clearly - e.g. calbindin, Wfs-1, Zinc (though I believe this is not strictly necessary, since from the pattern of AD fibers, one can already draw very clear conclusions, see point 1.3 below).

      Our criteria to delimit the presubiculum are the following: First and foremost, we rely on the defining presence of antero-dorsal thalamic fibers that target specifically the presubiculum and not the neighbouring subiculum (Simonnet et al., 2017, Nassar et al., 2018, Simonnet and Fricker, 2018; Jiayan Liu et al., 2021). This provides the precise outline of the presubicular superficial layers 1 to 3. It may have been confusing to the reviewer that our slicing angle gives horizontal sections. In fact, horizontal sections are favourable to identify the layer structure of the PrS, based on DAPI staining and the variations in cell body size. The work by Ishihara and Fukuda (2016) illustrates in their Figure 12 that the presubicular layer 4 lies below the presubicular layer 3, and forms a continuation with the subiculum (Sub1). Their Figure 4 indicates with a dotted line the “generally accepted border between the (distal) subiculum and PreS”, and it runs from the proximal tip of superficial cells of the PrS toward the white matter, along the radial direction of the cortical tissue. We agree with this definition. Others have sliced coronally (Cembrowski et al., 2018) which renders a different visualization of the border region with the subiculum.

      Second, let me explain the procedure for positioning the patch electrode in electrophysiological experiments on horizontal presubicular slices. Louis Richevaux, the first author, who carried out the layer 4 cell recordings, took great care to stay very close (<50 µm) to the lower limit of the zone where the GFP labeled thalamic axons can be seen. He was extremely meticulous about the visualization under the microscope, using LED illumination, for targeting. The electrophysiological signature of layer 4 neurons with initial bursts (but not repeated bursting, in mice) is another criterion to confirm their identity (Huang et al., 2017). Post-hoc morphological revelation showed their apical dendrites, running toward the pia, sometimes crossing through the layer 3, sometimes going around the proximal tip, avoiding the thalamic axons (Figure 6D). For example the cell in Figure 6, suppl. 1 panel D, has an apical dendrite that runs through layer 3 and layer 1. 

      Third, retrograde labeling following stereotaxic injection into the LMN is another criterion to define PrS layer 4. This approach is helpful for visualization, and is based on the defining axonal projection of layer 4 neurons (Yoder and Taube, 2011; Huang et al., 2017). Due to the technical challenge to stereotaxically inject only into LMN, the resultant labeling may not be limited to PrS layer 4. We cannot entirely exclude some overflow of retrograde tracers (B) or retrograde virus (C) to the neighboring MMN. This would then lead to co-labeling of the subiculum. In the main Figure 6, panels B and C, we agree that for this reason the red labelled cell bodies likely include also subicular neurons, on the proximal side, in addition to L4 presubicular neurons. We now point out this caveat in the main text (line 324-326) and in the methods (line 591-592).

      (1.2) Consistent with their subicular location, neuronal morphologies of the 'putative L4 cells' are selectively constrained within the subicular boundaries, i.e. they do not cross to the neighboring PrS (maybe a minor exception in Figs. 6_1D2,3). By definition, a neuron whose morphology is contained within a structure belongs to that structure.

      From a functional point of view, for the HD system, the most important criterion for defining presubicular layer 4 neurons is their axonal projection to the LMN (Yoder and Taube 2011). From an electrophysiological standpoint, it is the capacity of layer 4 neurons to fire initial bursts (Simonnet et al., 2013; Huang et al., 2017). Anatomically, we note that the expectation that the apical dendrite should go straight up into layer 3 might not be a defining criterion in this curved and transitional periarchicortex. Presubicular layer 4 apical dendrites may cross through layer 3 and exit to the side, towards the subiculum (this is the red dendritic staining at the proximal end of the presubiculum, at the frontier with the subiculum, Figure 6C).

      (1.3) As acknowledged by the authors in the discussion (line 408): the PrS is classically defined by the innervation domain of AD fibers. As Figure 6B clearly indicates, the retrogradely-labelled cells ('putative L4') are convincingly outside the input domain of the AD; hence, they do not belong to the PrS.

      The reviewer is mistaken here: the deep layers 4 and 5/6 indeed do not lie in the zone innervated by the thalamic fibers (Simonnet et al., 2017; Nassar et al., 2018; Simonnet and Fricker, 2018) but still belong to the presubiculum. The presubicular deep layers are located below the superficial layers, next to, and in continuation of the subiculum. This is in agreement with work by Yoder and Taube 2011; Ishihara and Fukuda 2016; Boccara, … Witter, 2015; Peng et al., 2017 (Fig 2D); Honda et al., 2022 (marmoset, Fig 2A); Balsamo et al., 2022 (Figure 2B).

      (1.4) Along with the above comment: in my view, the optogenetic stimulation experiments are an additional confirmation that the 'putative L4 cells' are subicular neurons, since they do not receive AD inputs at all (hence, they are outside of the PrS); they are instead only indirectly driven upon strong excitation of the PrS. This indirect activation is likely to occur via PrS-to-Subiculum 'back-projections', the existence of which is documented in the literature and also nicely shown by the authors (see Figure 1_1 and line 109).

      See above. Only superficial layers 1-3 of the presubiculum receive direct AD input.

      (1.5) The electrophysiological properties of the 'putative L4 cells' are consistent with their subicular identity, i.e. they show a sag current and they are intrinsically bursty.

      Presubicular layer 4 cells also show bursting behaviour and a sag current (Simonnet et al., 2013; Huang et al., 2017).

      From the above considerations, and the data provided by the authors, I believe that the most parsimonious explanation is that these retrogradely-labelled neurons (from mammillary body injections), referred to by the authors as 'L4 PrS cells', are indeed pyramidal neurons from the distal subiculum.

      We agree that the retrograde labeling is likely not limited to the presubicular layer 4 cells, and we now indicate this in the text (line 324-326). However, the portion of retrogradely labeled neurons that is directly below the layer 3 should be considered as part of the presubiculum.

      I believe this is a fundamental issue that deserves clarification, in order to avoid confusion/misunderstandings in the field. Given the evidence provided, I believe that it would be inaccurate to call these cells 'L4 PrS neurons'. However, I acknowledge the fact that it might be difficult to convincingly and satisfactorily address this issue within the framework of a revision. For example, it is possible that these 'putative L4 cells' might be retrogradely-labelled from the Medial Mammillary Body (a major subicular target) since it is difficult to selectively restrict the injection to the LMN, unless a suitable driver line is used (if available). The authors should also consider the possibility of removing this subset of data (referring to putative L4), and instead focus on the rest of the story (referring to L3)- which I think by itself, still provides sufficient advance.

      We agree with the reviewer that it is difficult to provide a satisfactory answer. To some extent, the reviewer’s comments target the nomenclature of the subicular region. This transitional region between the hippocampus and the entorhinal cortex has been notoriously ill defined, and the criteria are somewhat arbitrary for determining exactly where to draw the line. Based on the thalamic projection, presubicular layers 1-3 can now be precisely outlined, thanks to the use of viral labeling. But the presubicular layer 4 had been considered to be cell-free in early works, and termed ‘lamina dissecans’ (Boccara 2010), as the limit between the superficial and deep layers. Then it became of great interest to us and to the field, when the PrS layer 4 cells were first identified as LMN projecting neurons (Yoder and Taube 2011). This unique back-projection to the upstream region of the HD system is functionally very important, closing the loop of the Papez circuit (mammillary bodies - thalamus - hippocampal structures).

      We note that the reviewer does not doubt our results, rather questions the naming conventions. We therefore maintain our data. We agree that in the future a genetically defined mouse line would help to better pin down this specific neuronal population.

      We thank the reviewer for sharing their concerns and giving us the opportunity to clarify our experimental approach to target the presubicular layer 4. We hope that these explanations will be helpful to the readers of eLife as well.

      (2) The PrS anatomy could be better clarified, especially in relation to its modular organization (see e.g. Preston-Ferrer et al., 2016; Ray et al., 2017; Balsamo et al., 2022). The authors present horizontal slices, where cortical modularity is difficult to visualize and assess (tangential sections are typically used for this purpose, as in classical work from e.g. barrel cortex). I am not asking the authors to validate their observations in tangential sections, but just to be aware that cortical modules might not be immediately (or clearly) apparent, depending on the section orientation and thickness. The authors state that AD fibers were 'not homogeneously distributed' in L3 (line 135) and refer to 'patches of higher density in deep L3' (line 136). These statements are difficult to support unless more convincing anatomy and  . I see some L3 inhomogeneity in the green channel in Fig. 1G (last two panels) and also in Fig. 1K, but this seems to be rather upper L3. I wonder how consistent the pattern is across different injections and at what dorsoventral levels this L3 modularity is observed (I think sagittal sections might be helpful). If validated, these observations could point to the existence of non-homogeneous AD innervation domains in L3 - hinting at possible heterogeneity among the L3 pyramidal cell targets. Notably, modularity in L2 and L1 is not referred to. The authors state that AD inputs 'avoid L2' (line 131) but this statement is not in line with recent work (cited above) and is also not in line with their anatomy data in Fig. 1G, where modularity is already quite apparent in L2 (i.e. there are territories avoided by the AD fibers in L2) and in L1 (see for example the last image in Fig. 1G). This is the case also for the RSC axons (Fig. 1H) where a patchy pattern is quite clear in L1 (see the last image in panel H). Higher-mag pictures might be helpful here. 
      These qualitative observations imply that AD and RSC axons probably bear a precise structural relationship relative to each other, and relative to the calbindin patch/matrix PrS organization that has been previously described. I am not asking the authors to address these aspects experimentally, since the main focus of their study is on L3, where RSC/AD inputs largely converge. Better anatomy pictures would be helpful, or at least a better integration of the authors' (qualitative) observations within the existing literature. Moreover, the authors' calbindin staining in Fig. 1K is not particularly informative. Subicular, PaS, MEC, and PrS borders should be annotated, and higher-resolution images could be provided. The authors should also check the staining: MEC appears to be blank but is known to strongly express calb1 in L2 (see 'island' by Kitamura et al., Ray et al., Science 2014; Ray et al., Frontiers 2017). As additional validation for the staining: I would expect that the empty L2 patches in Figs. 1G (last two panels) would stain positive for Calbindin, as in previous work (Balsamo et al. 2022).

      We now provide a new figure showing the pattern of AD innervation in PrS superficial layers 1 to 3, with different dorso-ventral levels and higher magnification (Figure 2). Because our work was aimed at identifying connectivity between long-range inputs and presubicular neurons, we chose to work with horizontal sections that preserve well the majority of the apical dendrites of presubicular pyramidal neurons. We feel it is enriching for the presubicular literature to show the cytoarchitecture from different angles and to show patchiness in horizontal sections. The non-homogeneous AD innervation domains (‘microdomains’) in L3 were consistently observed across different injections in different animals.

      Author response image 1.

      Thalamic fiber innervation pattern. A, ventral, and B, dorsal horizontal section of the Presubiculum containing ATN axons expressing GFP. Patches of high density of ATN axonal ramifications in L3 are indicated as “ATN microdomains”. Layers 1, 2, 3, 4, 5/6 are indicated. C, High magnification image (63x optical section; different animal).

      We also provide a supplementary figure with images of horizontal sections of calbindin staining in PrS, with a larger crop, for the reviewer to check (Figure 3, see below). We thank the reviewer for pointing out recent studies using tangential sections. Our results agree with the previous observation that AD axons are found in calbindin negative territories (cf Fig 1K). Calbindin+ labeling is visible in the PrS layer 2 as well as in some patches in the MEC (Figure 3 panel A). Calbindin staining tends to not overlap with the territories of ATN axonal ramification. We indicate the inhomogeneities of anterior thalamic innervation that form “microdomains” of high density of green labeled fibers, located in layer 1 and layer 3 (Figure 3, Panel A, middle). Panel B shows another view of a more dorsal horizontal section of the PrS, with higher magnification, with a big Calbindin+ patch near the parasubiculum.

      The “ATN+ microdomains” possess a high density of axonal ramifications from ATN, and have been previously documented in the literature. They are consistently present. Our group showed them in the article by Nassar et al., 2018, at different dorsoventral levels (Fig 1C (dorsal) and 1D (ventral) PrS). See also Simonnet et al., 2017, Fig 2B, for an illustration of the typical variations in densities of thalamic fibers, and supplementary Figure 1D. Jiayan Liu et al., 2021 (Figure 2 and Fig 5) also show these characteristic microzones of dense thalamic axonal ramifications, with more or less intense signals across layers 1, 2, and 3. While it is correct that thalamic axons can be seen to cross layer 2 to ramify in layer 1, we maintain that AD axons typically do not ramify in layer 2. We modify the text to say, “mostly” avoiding L2 (line 130).

      The reviewer is correct in pointing out that the 'patches of higher density in deep L3' are not only in the deep L3, as in the first panel in Fig 1G, but in the more dorsal sections they are also found in the upper L3. We change the text accordingly (line 135-136) and we provide the layer annotation in Figure 1G. We further agree with the reviewer that RSC axons also present a patchy innervation pattern. We add this observation in the text (line 144).

      It is as yet unclear whether anatomical microzones of dense ATN axon ramifications in L3 might fulfill the criteria of a functional modularity, as is the case for the calbindin patch/matrix PrS organization (Balsamo et al., 2022). As the reviewer points out, this will require more information on the precise structural relationship of AD and RSC axons relative to each other, as well as functional studies. Interestingly, we note a degree of variation in the amplitudes of oEPSC from different L3 neurons (Fig. 2F, discussion line 420; 428), which might be a reflection of the local anatomo-functional micro-organization.

      Minor points:

      (1) The pattern of retrograde labelling, or at least the way it is referred to in the results (lines 104ff), seems to imply some topography of AD-to-PrS projections. Is this the case? How consistent are these patterns across experiments and individual injections? Was there variability in injection sites along the dorso-ventral and possibly antero-posterior PrS axes, which could account for a possibly topographical AD-to-PrS input pattern? It would be nice to see a DAPI signal in Fig. 1B since the AD stands out quite clearly in DAPI (Nissl) alone.

      Yes, we find a consistent topography for the AD-to-PrS projection, for similar injection sites in the presubiculum. The coordinates for retrograde labeling were fixed, as indicated, at -4.06 (AP), 2.00 (ML) and -2.15 mm (DV), so we cannot report on possible variations for different injection sites.

      (2) Fig. 2_2KM: this figure seems to show the only difference the authors found between AD and RS input properties. The authors could consider moving these data into main Fig. 2 (or exchanging them with some of the panels in F-O, which instead show no difference between AD and RSC). Asterisks/stats significance is not visible in M.

      For space reasons we leave the panels of Fig. 2_2KM in the supplementary section. We increased the size of the asterisk in M.

      (3) The data in Fig. 1_1 are quite interesting, since some of the PrS projection targets are 'non-canonical'. Maybe the authors could consider showing some injection sites, and some fluorescence images, in addition to the schematics. Maybe the authors could acknowledge that some of these projection targets are 'putative' unless independently verified by e.g. retrograde labeling. Unspecific white matter labelling and/or spillover is always a potential concern.

      We now include the image of the injection site for data in Fig. 1_1 as a supplementary Fig. 1_2. The Figure 1_1 shows the retrogradely labeled upstream areas of Presubiculum.

      Author response image 2.

      Retrobeads were injected in the right Presubiculum.

      (4) The authors speculate that the near-coincident summation of RS + AD inputs in L3 cells could be a potential mechanism for the binding of visual + HD information in PrS. However, landmarks are learned, and learning typically implies long-term plasticity. As the authors acknowledge in the discussion (lines 493ff) GluR1 is not expressed in PrS cells. What alternative mechanisms could the authors envision? How could the landmark-update process occur in PrS, if it is not locally stored? RSC could also be involved (Jakob et al) as acknowledged in the introduction - the authors should keep this possibility open also in the discussion.

      A similar point has been raised by Reviewer 1; please see our answer to their point 2. Briefly, our results indicate that HD-to-landmark updating is a multi-step process. RSC may be one of the places where landmarks are learned. The subsequent temporal mapping of HD to landmark signals in PrS might be plasticity-free, as matching directional with visual landmark information based on temporal coincidence does not necessarily require synaptic plasticity. It seems likely that there is no local storage and no change in synaptic weights in PrS. The landmark-anchored HD signals reach LMN via L4 neurons, sculpting network dynamics across the Papez circuit. One possibility is that the trace of a landmark that matches HD may be stored as patterns of neural activity that could guide navigation (cf. El-Gaby et al., 2024, Nature). Clearly, more work is needed to understand how the HD attractor is updated on a mechanistic level. Recent work in prefrontal cortex mentions “activity slots” and delineates algorithms for dynamic control of cognitive maps without synaptic plasticity (Whittington et al., 2025, Neuron): information may be stored in neural attractor activity, and the idea that working memory may rely on recurrent updates of neural activity might generalize to the HD system. We include these considerations in the discussion (line 499-503; 523-533) and also point to alternative models (line 518-522) including modeling work in the retrosplenial cortex.

      (5) The authors state that (lines 210ff) their cluster analysis 'provided no evidence for subpopulations of layer 3 cells (but see Balsamo et al., 2022)' implying an inconsistency; however, Balsamo et al also showed that the (in vivo) ephys properties of the two HD cell 'types' are virtually identical, which is in line with the 'homogeneity' of L3 ephys properties (in slice) in the authors' data. Regarding the possible heterogeneity of L3 cells: the authors report inhomogeneous AD innervation domains in L3 (see also main comment 2) and differences in input summation (some L3 cells integrate linearly, some supra-linearly; lines 272) which by itself might already imply some heterogeneity. I would therefore suggest rewording the statements to clarify what the lack of heterogeneity refers to.

      We agree. In line 212 we now state “cluster analysis (Figure 2D) provided no evidence for subpopulations of layer 3 cells in terms of intrinsic electrophysiological properties (see also Balsamo et al., 2022).”

      (6) n=6 co-recorded pairs are mentioned at line 348, but n=9 at line 366. Are these numbers referring to the same dataset? Please correct or clarify.

      Line 349 refers to a set of 6 co-recorded pairs (n=12 neurons) in double injected mice with Chronos injected in ATN and Chrimson in RSC (cf. Fig. 7E). The 9 pairs mentioned in line 367 refer to another type of experiment where we stimulated layer 3 neurons by depolarizing them to induce action potential firing while recording neighboring layer 4 neurons to assess connectivity. Line 367 now reads: “In n = 9 paired recordings, we did not detect functional synapses between layer 3 and layer 4 neurons.”

      Reviewer #3 (Recommendations For The Authors):

      Questions for the authors/points for addressing:

      I found that the slice electrophysiology experiments were not reported with sufficient detail. For example, in Figure 2, I am assuming that the voltage clamp experiments were carried out using the Cs-based recording solution, while the current clamp experiments were carried out using the K-Gluc intracellular solution. However, this is not explicitly stated and it is possible that all of these experiments were performed using the K-Gluc solution, which would give slightly odd EPSCs due to incomplete space/voltage clamp. Furthermore, the method states that gabazine was used to block GABA(A) receptor-mediated currents, but not when this occurred. Was GABAergic neurotransmission blocked for all measurements of EPSC magnitude/dynamics? If so, why not block GABA(B) receptors? If not blocking GABAergic transmission for measuring EPSCs, why not? This should be stated explicitly either way.

      The addition of drugs or a difference of solution is indicated in the figure legend and/or in the figure itself, as well as in the methods. We now state explicitly: “In a subset of experiments, the following drugs were used to modulate the responses to optogenetic stimulations; the presence of these drugs is indicated in the figure and figure legend, whenever applicable.” (line 632). A Cs-based internal solution and gabazine were used in Figure 5; this is now indicated in the Methods section (line 626). All other experiments were performed using K-Gluc as an internal solution and ACSF.

      Methods: The experiments involving animals are incompletely reported. For example, were both sexes used? The methods state "Experiments were performed on wild‐type and transgenic C57Bl6 mice" - what transgenic mice were used and why is this not reported in detail (strain, etc)? I would refer the authors to the ARRIVE guidelines for reporting in vivo experiments in a reproducible manner (https://arriveguidelines.org/).

      We have now added this information in the methods section, subsection “Animals” (line 566-567). Animals of both sexes were used. The only transgenic mouse line used was the Ai14 reporter line (no phenotype), depending on the availability in our animal facility.

      For experiments comparing ATN and RSC inputs onto the same neuron (e.g. Figure 2 supplement 2 G - J), are the authors certain that the observed differences (e.g. rise time and paired-pulse facilitation on the ATN input) are due to differences in the synapses and not a result of different responses of the opsins? Refer to https://pubmed.ncbi.nlm.nih.gov/31822522/ from Jess Cardin's lab. This could easily be tested by switching which opsin is injected into which nucleus (a fair amount of extra work) or comparing the Chrimson synaptic responses with those evoked using Chronos on the same projection, as used in Figure 2 (quite easy as authors should already have the data).

      We actually did switch the opsins across the two injection sites. In Figure 2 - supplement 2G-J, the values linked by a dashed line result from recordings in the switched configuration with respect to the original configuration (in full lines, Chronos injected in RSC and Chrimson in ATN). The values from the switched configuration followed the trend of the main configuration and were not statistically different (Mann-Whitney U test).

      Statistical reporting: While the number of cells is generally reported for experiments, the number of slices and animals is not. While slice ephys often treat cells as individual biological replicates, this is not entirely appropriate as it could be argued that multiple cells from a single animal are not independent samples (some sort of mixed effects model that accounts for animals as a random effect would be better). For the experiments in the manuscript, I don't think this is necessary, but it would certainly reassure the reader to report how many animals/slices each dataset came from. At a bare minimum, one would want any dataset to be taken from at least 3 animals from 2 different litters, regardless of how many cells are in there.

      Our slice electrophysiology experiments include data from 38 successfully injected animals: 14 animals injected in ATN, 20 animals injected in RSC, and 4 double injected animals. Typically, we recorded 1 to 3 cells per slice. We now include this information in the text or in the figure legends (line 159, 160, 297, 767, 826, 831, 832, 839, 845, 901, 941).

      For the optogenetic experiments looking at the summation of EPSPs (e.g. figure 4), I have two questions: why were EPSPs measured and not EPSCs? The latter would be expected to give a better readout of AMPA receptor-mediated synaptic currents. And secondly, why was 20 Hz stimulation used for these experiments? One might expect theta stimulation to be a more physiologically-relevant frequency of stimulation for comparing ATN and RSC inputs to single neurons, given the relevance with spatial navigation and that the paper's conclusions were based around the head direction system. Similarly, gamma stimulation may also have been informative. Did the authors try different frequencies of stimulation?

      Question 1. The current clamp configuration allows us to measure EPSP amplification/prolongation by NMDA or persistent Na currents (cf. Fricker and Miles 2000), which might contribute to supralinearity.

      Question 2. In a previous study from our group about the AD to PrS connection (Nassar et al., 2018), no significant difference was observed in the dynamics of EPSCs between stimulations at 10 Hz versus 30 Hz. Therefore we chose 20 Hz. This value is in the range of HD cell firing (Taube 1995, 1998: peak firing rates of 18 to 24 spikes/sec in RSC and 41 spikes/sec in AD, though mean firing rates might be lower; Blair and Sharp 1995). In hindsight, we agree that it would have been useful to include 8 Hz or 40 Hz stimulations.

      The GABA(A) antagonist experiments in Figure 5 are interesting but I have concerns about the statistical power of these experiments - n of 3 is absolutely borderline for being able to draw meaningful conclusions, especially if this small sample of cells came from just 1 or 2 animals. The number of animals used should be stated and/or caution should be applied when considering the potential mechanisms of supralinear summation of EPSPs. It looks like the slight delay in RSC input EPSP relative to ATN that was in earlier figures is not present here - could this be the loss of feedforward inhibition?

      The current clamp experiments in the presence of QX314 and a Cs gluconate based internal solution were preceded by initial experiments using puff applications of glutamate to the recorded neurons (not shown). Results from those experiments had pointed towards a role for TTX resistant sodium currents and for NMDA receptor activation as a factor favoring the amplification and prolongation of glutamate induced events. They inspired the design of the dual wavelength stimulation experiments shown in Figure 5, and oriented our discussion of the results. We agree of course that more work is required to dissect the role of disinhibition for EPSP amplification. This is however beyond the present study.

      Concerning the EPSP onset delays following RSC input stimulation: in this set of experiments, we compensated for the notoriously longer delay to EPSP onset, following RSC axon stimulation, by shifting the photostimulation (red) of RSC fibers to -2 ms, relative to the onset of photostimulation of ATN fibers (blue). This experimental trick led to an improved alignment of the onset of the postsynaptic response, as shown in the figure below for the reviewer.

      Author response image 3.

      In these experiments, the onset of RSC photostimulation was shifted forward in time by -2 ms, in an attempt to better align the EPSP onset to the one evoked by ATN stimulation.

      We insert in the results a sentence to indicate that the experiments illustrated in Figure 5 were performed in only a small sample of 3 cells that came from 2 mice (line 297), so caution should be applied. In the discussion we formulate this more carefully: “From a small sample of cells it appears that EPSP amplification may be facilitated by a reduction in synaptic inhibition (n = 3; Figure 5)” (line 487).

      Figure 7: I appreciate the difficulties in making dual recordings from older animals, but no conclusion about the RSC input can legitimately be made with n=1.

      Agreed. We want to avoid any overinterpretation, and point out in the results section that the RSC stimulation data is from a single cell pair. The sentence now reads: “... layer 4 neurons occurred after firing in the layer 3 neuron, following ATN afferent stimuli, in 4 out of 5 cell pairs. We also observed this sequence when RSC input was activated, in one tested pair.” (line 347-349)

      Minor points:

      Line 104: 'within the two subnuclei that form the anterior thalamus' - the ATN actually has three subdivisions (AD, AV, AM) so this should state 'two of the three nuclei that form the anterior thalamus...'

      Corrected, line 103

      Line 125: should read "figure 1F" and not "figure 2F".

      Corrected, line 124

      Line 277-280: Why were two different posthoc tests used on the same data in Figures 3E & F?

      We used Sidak’s multicomparison test to compare each event Sum vs. Dual (two different configurations at each time point - asterisks) and Friedman’s and Dunn’s to compare the nth EPSP amplitude to the first one for Dual events (same configuration between time points - hashmarks). We give two-way ANOVA results in the legend.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment:

      The manuscript establishes a sophisticated mouse model for acute retinal artery occlusion (RAO) by combining unilateral pterygopalatine ophthalmic artery occlusion (UPOAO) with a silicone wire embolus and carotid artery ligation, generating ischemia-reperfusion injury upon removal of the embolus. This clinically relevant model is useful for studying the cellular and molecular mechanisms of RAO. The data overall are solid, presenting a novel tool for screening pathogenic genes and promoting further therapeutic research in RAO.

      Thank you for recognizing the sophistication and clinical relevance of our mouse model for acute retinal artery occlusion. We are grateful for your supportive feedback.

      Public reviews:

      (1) Response to Reviewer #1: 

      Summary:

      Wang, Y. et al. used a silicone wire embolus to definitively and acutely clot the pterygopalatine ophthalmic artery in addition to carotid artery ligation to completely block the blood supply to the mouse inner retina, which mimics clinical acute retinal artery occlusion. A detailed characterization of this mouse model determined the time course of inner retina degeneration and associated functional deficits, which closely mimic human patients. Whole retina transcriptome profiling and comparison revealed distinct features associated with ischemia, reperfusion, and different model mechanisms. Interestingly and importantly, this team found a sequential event including reperfusion-induced leukocyte infiltration from blood vessels, residual microglial activation, and neuroinflammation that may lead to neuronal cell death.

      Strengths:

      Clear demonstration of the surgery procedure with informative illustrations, images, and superb surgical videos.

      Two-time points of ischemia and reperfusion were studied with convincing histological and in vivo data to demonstrate the time course of various changes in retinal neuronal cell survivals, ERG functions, and inner/outer retina thickness.

      The transcriptome comparison among different retinal artery occlusion models provides informative evidence to differentiate these models.

      The potential applications of the in vivo retinal ischemia-reperfusion model and relevant readouts demonstrated by this study will certainly inspire further investigation of the dynamic morphological and functional changes of retinal neurons and glial cell responses during disease progression and before and after treatments.

      We sincerely appreciate your detailed and positive feedback. These evaluations are invaluable in highlighting the significance and impact of our work. Thank you for your thoughtful and supportive review.

      Weaknesses:

      It would be beneficial to the manuscript and the readers if the authors could improve the English of this manuscript by correcting obvious grammar errors, eliminating many of the acronyms that are not commonly used by the field, and providing a reason why this complicated but clever surgery procedure was designed and a summary table with the time course of all the morphological, functional, cellular, and transcriptome changes associated with this model.

      Thank you for your thorough review of the manuscript. We sincerely apologize for any grammatical errors resulting from our English language proficiency and have taken the necessary steps to polish the article. Additionally, we have heeded your advice and reduced the use of field-specific acronyms to enhance readability for both the manuscript and its readers.

      Regarding the rationale behind the design of the UPOAO model, we have provided a description in the Introduction section. Our group focuses on research into the pathogenesis and clinical treatment of RAO. The absence of an accurate mouse model simulating the retinal ischemic process has hampered progress in developing neuroprotective agents for RAO. To better simulate the retinal ischemic process and possible ischemia-reperfusion injury following RAO, we developed a novel vascular-associated mouse model called the unilateral pterygopalatine ophthalmic artery occlusion (UPOAO) model. We drew inspiration from the widely employed middle cerebral artery occlusion (MCAO) model, commonly used in cerebral ischemic injury research, which guided the development of the UPOAO model.

      We appreciate your valuable suggestion regarding the inclusion of a summary table outlining the time course of morphological, functional, cellular, and transcriptome changes associated with this model. To address this, we intend to include a supplementary table at the end of the article (Table S2, Summary Table), which will offer a comprehensive overview of the experimental results, thereby aiding in clarity and interpretation.

      Once again, we thank you for your insightful comments and suggestions, which have greatly contributed to the improvement of our manuscript.

      (2) Response to Reviewer #2: 

      Summary:

      The authors of this manuscript aim to develop a novel animal model to accurately simulate the retinal ischemic process in retinal artery occlusion (RAO). A unilateral pterygopalatine ophthalmic artery occlusion (UPOAO) mouse model was established using silicone wire embolization combined with carotid artery ligation. This manuscript provided data to show the changes in major classes of retinal neural cells and visual dysfunction following various durations of ischemia (30 minutes and 60 minutes) and reperfusion (3 days and 7 days) after UPOAO. Additionally, transcriptomics was utilized to investigate the transcriptional changes and elucidate changes in the pathophysiological process in the UPOAO model post-ischemia and reperfusion. Furthermore, the authors compared transcriptomic differences between the UPOAO model and other retinal ischemic-reperfusion models, including HIOP and UCCAO, and revealed unique pathological processes.

      Strengths:

      The UPOAO model represents a novel approach to studying retinal artery occlusion. The study is very comprehensive.

      We greatly appreciate your positive assessment of our work and are encouraged by your recognition of its significance.

      Weaknesses:

      Some statements are incorrect and confusing. It would be helpful to review and clarify these to ensure accuracy and improve readability.

      We sincerely appreciate your meticulous review of the manuscript. Taking into account your valuable feedback, we will thoroughly address the inaccuracies identified in the revised version. Additionally, we will commit to polishing the article to ensure improved readability. We apologize for any confusion caused by these inaccuracies and genuinely thank you for bringing them to our attention.

      Recommendations For The Authors:

      Reviewer #1:

      (1) Response to comment:

      The conclusions of this paper are mostly well supported by clear images and convincing data analysis, but some aspects of image presentation and additional data analysis may be needed to strengthen the manuscript.

      We sincerely appreciate your positive assessment of our work and your recognition of the clear images and convincing data analysis supporting our conclusions. Your constructive feedback on enhancing the clarity of our manuscript's image presentation and additional data analysis is highly valued. In response to your suggestions, we have taken steps to improve readability by removing or correcting uncommon acronyms from certain images. We have also conducted further data analysis to provide more comprehensive insights. Thank you for your guidance in improving the quality of our manuscript.

      (2) Response to recommendation (1):

      In Results 3.1 or in Method 2.2: please explain why this combination of silicone wire embolization and carotid artery ligation was chosen to replace previous models such as UCCAO. What are the advantages? And why was the silicone wire embolus inserted through the ECA instead of directly into the CCA? The cleverly designed surgical procedure is very impressive but the reasoning behind it is not obvious and needs more explanation.

      Thank you for your valuable feedback.

      In the introduction, we briefly describe the rationale for developing the UPOAO model to simulate acute ischemia-reperfusion of retinal artery occlusion (RAO). Previous common retinal ischemia models had certain shortcomings. For example, in the HIOP model, which is often used for simulating glaucoma, the ischemic factor of interrupted retinal blood flow may be amplified due to the dual effects of IOP-induced mechanical stress [1, 2] and vascular ischemia due to normal saline perfusion in the anterior chamber. In the UCCAO model, recanalization is performed after ligation of the carotid blood vessels, and the retina communicates with the blood vessels in the brain, resulting in retinal hypoperfusion. Retinal ischemia in UCCAO is a chronic process (for example, the retina became thinner at week 10 and week 15 [3]), whereas RAO is an acute total retinal ischemic disease. Therefore, it is critically important to develop a simple mouse model that can simulate acute retinal ischemia and reperfusion injury in RAO patients.

      Various models have been developed for ischemic stroke research, with the endoluminal suture model being the most commonly employed method for middle cerebral artery occlusion (MCAO). In this model, filaments are introduced through either the external or internal carotid artery and advanced into the middle cerebral artery, causing temporary blood flow blockage for a specific duration. This method has been extensively employed in studies involving transient occlusion [4]. Among the MCAO models, the Koizumi method (occlusion from the common carotid artery (CCA) to the middle cerebral artery (MCA)) and the Longa method (occlusion from the external carotid artery (ECA) to the MCA) are frequently used. Of these two methods, the Longa method is more widely utilized in research studies and has a much lower post-surgery mortality rate (26%) than the Koizumi method (44%) [5]. The MCAO model induces substantial infarct areas and has contributed significantly to advancements in stroke research, including investigations into blood-brain barrier disruption and inflammatory responses to ischemia.

      RAO is considered a form of ocular stroke. Inspired by the MCAO model, we have employed a silicone wire embolus to induce acute interruption of blood flow to the retina. This approach enables the investigation of pathophysiological processes associated with RAO, providing valuable insights into the understanding of this condition. We have clarified these points in the revised manuscript (line 129).

      The reasoning behind inserting the silicone wire embolus through the ECA instead of directly into the CCA is twofold:

      (1) Convenience and avoidance of heavy bleeding and mortality. Inserting the silicone wire embolus requires creating an opening in the artery, which then needs to be ligated at both ends after the silicone wire embolus is removed to prevent excessive bleeding. Because the ECA forms a straight line with the ICA after folding, entry and removal of the silicone wire embolus are more convenient through the ECA. Blood flow through the CCA can be restored after the embolus is removed from the ECA, ensuring that the blood supply to the brain through the CCA is not affected.

      (2) Preservation of reperfusion process. If the silicone wire embolus were inserted directly into the CCA, the ends of the CCA opening would need to be ligated after the silicone wire embolus is removed. This would result in a lack of reperfusion process after retinal ischemia. To enable the reperfusion process, the decision was made to open the ECA instead.

      We have clarified these points in the revised manuscript to better explain the rationale behind our methodology (line 139). Thank you for prompting this important clarification, which we believe will enhance the understanding of our readers.

      (3) Response to recommendation (2):

      Did the UPOPA actually block OA, including both the retinal (CRA) and choroidal (SPCA and LPCA) blood supply? If so, why does it seem only the inner retina was affected but not the outer retina?

      Thank you for your question. We agree that the UPOAO model blocks the OA, including both the retinal and choroidal blood supply. Our experimental results primarily indicate damage to the inner retinal layer within 7 days of reperfusion. For example, OCT and HE staining showed significant thinning of the inner retina after 60 minutes of ischemia followed by 7 days of reperfusion (Figure 4). At the same time, the b-wave amplitudes were decreased, which usually indicates damage to the inner layer of the retina. However, the outer retina did not appear to be affected by 60 minutes of ischemia, based on the OCT, HE, and immunofluorescence results.

      The inner layer of the retina is known to show the highest sensitivity to hypoxic challenges [6], whereas the outer retinal layer is more resistant to hypoxic stress [7]. A possible explanation for our results is that outer-layer cells such as photoreceptors tolerate ischemia better than the inner layer of the retina. Previous studies of retinal ischemia-reperfusion models support this assumption. In the UCCAO model, the b-wave was more affected than the a-wave. Decreases in the amplitudes of OPs, the scotopic b-wave, and the photopic b-wave were consistently observed at week 4 after UCCAO, while the amplitude of the scotopic a-wave did not change dramatically [8]. Prolonged ischemia, such as permanent ischemia, led to photoreceptor cell degradation, as seen in Stevens et al.'s report of photoreceptor loss 3 months after permanent ligation of both common carotid arteries in bilateral common carotid artery occlusion (BCCAO) [9]. In the HIOP model, the GCL and INL reacted sensitively to ischemic processes. Significant thinning of the GCL was observed as early as 6 hours after 60 minutes of ischemia [10]. Horizontal cells and photoreceptors remained mostly unaffected, while most RGCs and several amacrine cell subtypes disappeared [11, 12].

      Our study revealed the changes that occurred within 60 minutes of ischemia and the first 7 days of reperfusion in the UPOAO model. One possibility is that the ischemia duration in our model was not long enough to affect the outer retinal cells. Furthermore, the reperfusion observation period may not have been long enough to reveal structural damage and visual dysfunction in the outer retinal layer. As we have explained in the manuscript, further exploration is needed to understand the changes induced by longer ischemia durations and reperfusion periods. Characterizing the damage to retinal structure and function after longer ischemia times will be a focus of our future research.

      (4) Response to recommendation (3):

      Better to only use well-accepted acronyms and remove those that are rarely seen in other publications, such as IMRL, MRL, HIOP, TRT, etc.

      Thank you for your valuable feedback. In our manuscript, we used the Spectralis HRA+OCT device (Heidelberg) to capture the retinal images. However, the resulting images did not clearly distinguish each retinal layer. To address this limitation, we referred to a clinical OCT stratification approach used in RVO and divided the retina into inner, middle, and outer layers [16]. We acknowledge that this hierarchical description is not commonly used and have therefore followed your recommendation to remove these rare acronyms and instead use the standard layer abbreviations joined by plus signs. The methods and results have been revised accordingly (line 213, line 368, Figure 4 and Figure S2).

      In addition, the HIOP model is also known as the IR or RIRI model [17-19], and the name of the pathophysiological process, retinal ischemia-reperfusion injury (IRI), is usually used to represent this type of anterior chamber perfusion model. To avoid confusion between the pathophysiological process of ischemia-reperfusion studied in this paper and the common model of high intraocular pressure, we have consistently referred to it as the HIOP model, an abbreviation that is used in many references [20-22].

      Thanks again for the suggestion. We apologize for any confusion caused by the use of abbreviations and have made the necessary corrections in the manuscript. We have also strengthened the details of OCT layering in the images to enhance readability for our audience.

      (5) Response to recommendation (4):

      Figure 3F, G: What do the OP changes mean? What retina cell dysfunction leads to OP changes? Is there RGC-relevant visual function readout to correlate with RGC death?

      Oscillatory potentials (OPs) are important components of the electroretinogram (ERG). While the precise origin of OPs remains unclear, they are generally believed to be generated from the inner retinal layer, specifically involving bipolar cells, amacrine cells and ganglion cells [23]. OPs are sensitive indicators of retinal ischemic effects and can detect dysfunction before alterations in the b-waves occur [24-26] (We have added these statements at line 358). In this research, the reduction of OPs indicated dysfunction in the inner retinal layer and retinal ischemia.

      The function of RGCs can be assessed non-invasively using various ERG techniques that emphasize the activity of inner retinal neurons, including the OPs of multifocal ERG (mfERG), the photopic negative response (PhNR) in mfERG, the pattern electroretinogram (PERG), and the negative scotopic threshold response (nSTR) [27]. Among these indicators, the PERG appears to be more specifically related to the presence of functional RGCs. However, the complexity of electrophysiological sources and species-specific differences in RGC characteristics should also be considered. In addition, visual evoked potentials (VEP) can assess the function of visual signaling along the whole visual pathway, from RGC axons to the visual cortex of the brain [28, 29]. Unfortunately, because the specific equipment required for evaluating RGC function was unavailable, we could not conduct a comprehensive assessment in this study. This limitation underlines the importance of future studies that incorporate RGC evaluation, using indicators such as PERG and PhNR, to provide a more comprehensive understanding of visual pathway function.

      Thank you for your careful review and insightful questions.

      (6) Response to recommendation (5):

      Figure 4B: RNFL/GCL/IPL normally called GCC (ganglion cell complex).

      We appreciate your helpful recommendation regarding the abbreviation GCC (ganglion cell complex) for the combination of RNFL, GCL, and IPL. We have updated this terminology in the revised manuscript (line 213 and Figure 4).

      (7) Response to recommendation (6):

      Figure 4 A-F: Normally a circular OCT image surrounding the optic nerve head is preferred to measure retina thickness. If in these figures, all the OCT images are from the same location, it may be acceptable, but need to provide imaging details on how these OCT planes are selected and what has been done to make sure the same locations were selected for comparison.

      We agree with your comment that OCT images are usually captured around the optic nerve head. In this study, our goal was to assess the thickness of both the peripheral retina and the retina near the optic nerve head. To achieve this, we treated the optic nerve head as the apex of the selected field of view (upper left region of panel A in Figure 4). For each mouse, we obtained OCT images of the superior nasal (SN), superior temporal (ST), inferior nasal (IN), and inferior temporal (IT) fields of the optic nerve. We then averaged the thicknesses from these four fields. In each field, we measured and statistically evaluated the retinal thickness at distances of 1.5, 3, and 4.5 papilla diameters (PD) from the optic nerve head.

      This approach allowed us to ensure that the same locations were selected for comparison and provided a comprehensive assessment of retinal thickness across different regions. We have detailed this methodology in the revised manuscript to clarify the imaging process and the consistency of the selected locations.
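      The averaging scheme described above can be sketched as follows. This is only a minimal illustration, not the analysis code used in the study: the function name `mean_thickness_by_distance`, the data layout, and all thickness values are hypothetical, and the sketch assumes that the four fields are averaged separately at each distance from the optic nerve head.

```python
from statistics import mean

FIELDS = ("SN", "ST", "IN", "IT")   # four fields around the optic nerve head
DISTANCES_PD = (1.5, 3.0, 4.5)      # distances in papilla diameters (PD)


def mean_thickness_by_distance(measurements):
    """Average thickness across the four fields at each distance.

    `measurements` maps field name -> {distance_pd: thickness_um}.
    The layout is an assumption made for this illustration.
    """
    return {
        d: mean(measurements[field][d] for field in FIELDS)
        for d in DISTANCES_PD
    }


# Hypothetical thickness values (micrometres) for one eye.
example_eye = {
    "SN": {1.5: 100.0, 3.0: 90.0, 4.5: 80.0},
    "ST": {1.5: 102.0, 3.0: 92.0, 4.5: 82.0},
    "IN": {1.5: 98.0, 3.0: 88.0, 4.5: 78.0},
    "IT": {1.5: 100.0, 3.0: 90.0, 4.5: 80.0},
}

averaged = mean_thickness_by_distance(example_eye)
```

      Keeping the field and distance labels explicit makes it easy to check that the same locations are compared between eyes.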

      Thank you for your insightful feedback.

      Reviewer #2:

      Addressing the following concerns is necessary to improve the manuscript.

      (1) Response to recommendation (1):

      The manuscript contains many grammatical errors and should be carefully reviewed for corrections. For example: In the title, "Silicone Wire Embolization-induced Acute Retinal Artery Ischemia and Reperfusion Model in Mouse: Gene Expression Provide Insight into Pathological Processes". It should be "Provides" instead of "Provide". In the Abstract, "The resident microglia within the retina and peripheral leukocytes which access to the retina were pronounced increased on reperfusion periods." It should be "pronouncedly" or "markedly" instead of " pronounced".

      Thank you for your careful reading and pointing out the grammatical errors in the manuscript. We apologize for these mistakes and have since revised and polished the article with the assistance of native English speakers. Ensuring accurate and clear language usage in scientific writing is crucial, and we appreciate your help in improving the quality of our manuscript. Thank you for bringing these errors to our attention.

      (2) Response to recommendation (2):

      Video 2: the video content from "30s-47s" and "50s-67s" is repeatedly shown.

      Thank you for your careful review of the video. In the process of preparing the external carotid artery for silicone wire embolus insertion, we first ligated the distal end with a square knot and then tied a loose knot at the proximal end. In the video content from "30s-47s" and "50s-67s", we are tying a square knot. We apologize for any confusion caused by these repeated video clips.

      (3) Response to recommendation (3):

      Figure 1: The ConA staining (H-I) and FFA (J-K) were performed before the removal of silicone wire embolus. It would be beneficial to clarify this in the figure legend too. Additionally, the label 'Post. Sup. Alveolar art.: Posterior superior alveolar artery' is not present in Figure 1L.

      Thank you for your thorough review of the manuscript and the valuable suggestions regarding Figure 1. We have updated the figure legend of Figure 1 to clarify that ConA staining (H-I) and FFA (J-K) were performed before the removal of the silicone wire embolus (line 868 and line 873). Additionally, we have included the label 'Post. Sup. Alveolar art' in Figure 1L as you pointed out. We appreciate your careful attention to detail, and we have ensured that these omissions have been rectified in the revised version of the manuscript.

      (4) Response to recommendation (4):

      Figure 2: only representative images of RGCs at the peripheral retina were shown. It is not clear if only RGCs in the peripheral retina were quantified. Is there RGC loss in the central and middle retina in the UPOAO model as well? How many fields of RGCs were quantified for each retina?

      Thank you for your meticulous review of the manuscript. The quantification method of RGCs is described in detail as follows:

      Four radial incisions were made in the retina, which was then flattened on a glass slide to create a "four-leaf clover" shape. The retina was photographed using a fluorescence microscope (BX63, Olympus, Japan). We captured images from three different regions of each retinal quadrant: 0.1 mm-0.5 mm (central region, field numbers: 1, 4, 7, 10), 0.9 mm-1.3 mm (middle region, field numbers: 2, 5, 8, 11), and 1.7 mm-2.1 mm (peripheral region, field numbers: 3, 6, 9, 12) from the optic nerve head, respectively, as shown in Author response image 1.

      Of these, the changes in the peripheral fields were the most noticeable, so we used a Leica SP8 confocal microscope (20X) to capture peripheral-field RGCs for demonstration (Figure 2A, C, E, G). RGC counts in twelve fields of each retina were quantified, and the average RGC density across the twelve fields per retina is shown in Figure 2B, D, F, K. RGC counts in the central (field numbers: 1, 4, 7, 10), middle (field numbers: 2, 5, 8, 11), and peripheral (field numbers: 3, 6, 9, 12) visual fields are shown in Author response tables 1-4. We have included this detailed methodology in the revised manuscript to clarify the quantification process and to address the presence of RGC loss in both the central and middle retina in the UPOAO model. Thank you for pointing out the need for this clarification.
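      The field numbering described above can be sketched in code as follows. This is a minimal illustration under stated assumptions: the function name `summarize_retina` and the example counts are hypothetical, and the sketch works with raw counts because converting to density would require the field area, which is not given here.

```python
from statistics import mean

# Field numbers for each region, as listed in the response.
REGION_FIELDS = {
    "central": (1, 4, 7, 10),
    "middle": (2, 5, 8, 11),
    "peripheral": (3, 6, 9, 12),
}


def summarize_retina(counts_by_field):
    """Return the mean RGC count over all twelve fields and per region.

    `counts_by_field` maps field number (1-12) -> RGC count.
    """
    if sorted(counts_by_field) != list(range(1, 13)):
        raise ValueError("expected counts for fields 1-12")
    summary = {"all": mean(counts_by_field.values())}
    for region, fields in REGION_FIELDS.items():
        summary[region] = mean(counts_by_field[f] for f in fields)
    return summary


# Hypothetical counts for one retina (not data from the study).
example_counts = {
    1: 300, 2: 280, 3: 200,
    4: 310, 5: 270, 6: 210,
    7: 290, 8: 290, 9: 190,
    10: 300, 11: 280, 12: 200,
}

summary = summarize_retina(example_counts)
```

      Each retina contributes one overall value and one value per region, which matches the per-retina summaries reported in the figure and tables.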

      Author response image 1.

      Schematic diagram of field selection. Scale bar=1.4 mm. Each retinal petal has three distinct visual fields (the areas circled by green lines) that radiate from the optic nerve head to the periphery: in order, the central, middle, and peripheral visual fields.

      Author response table 1.

      RGC counts in each field of each retina (30-minute ischemia and 3-day reperfusion)

      Author response table 2.

      RGC counts in each field of each retina (30-minute ischemia and 7-day reperfusion)

      Author response table 3.

      RGC counts in each field of each retina (60-minute ischemia and 3-day reperfusion)

      Author response table 4.

      RGC counts in each field of each retina (60-minute ischemia and 7-day reperfusion)

      (5) Response to recommendation (5):

      Figure 3: The representative wave lines in panels A (60min_3d, 60min_7d) and F do not reflect the statistical analysis presented in panels D, E, and G, especially for the amplitudes of b waves and OPs.

      Thank you for your careful review of the manuscript. We have added labels for the a-waves and b-waves and improved the presentation of the OPs to make the amplitude details more visible (Figure 3). In the previous version, because of incorrect settings, we did not adjust the ordinate spacing when fitting the curves of the representative waveforms for the four groups, so the curves were compressed vertically to the same height. We have now adjusted the curves to fit the same scale bar (shown in the bottom right corner of Figure 3A). In addition, we removed the baseline wave from the OP traces and adjusted the abscissa scale to highlight the N waves and P waves for easier reading (Figure 3F).

      (6) Response to recommendation (6):

      There are two different Supplementary Figure 1 and no Supplementary Figure 3, resulting in misaligned references to Supplementary Figures 1, 2, and 3 in the text.

      Thank you for your careful review of the manuscript. We have reviewed the manuscript again and identified errors in uploading the supplementary figures, which resulted in duplicate Supplementary Figure 1 and the absence of Supplementary Figure 3. We have corrected these issues and realigned the references to Supplementary Figures 1, 2, and 3 in the text to ensure consistency. We appreciate your attention to detail and your reminder to address this issue.

      (7) Response to recommendation (7):

      There is confusion about the definition of ORL (outer retina layer). In Lines 208-209, ORL was defined as the combined thickness of the rest to the retinal pigment epithelium (RPE). It seems the ONL is included in ORL. But in lines 358-359, 907-908, "the ORL encompassed the region from the inner segment/outer segment (IS/OS) to the RPE". Please make the definition consistent. In addition, it is hard to distinguish the regions marked by the green lines in Fig. 4A (sham image) after Line 902.

      Thank you for your careful review of the manuscript. We have addressed the confusion regarding the definition of the outer retinal layer (ORL). The Heidelberg OCT device does not distinguish the layers of the mouse retina well, so we divided it into three broader layers:

      (1) Ganglion Cell Complex (GCC) layer, which encompasses RNFL+GCL+IPL.

      (2) Middle Retinal Layer, which includes INL+OPL.

      (3) Outer Retinal Layer (ORL), which includes ONL+IS/OS+RPE.

      We apologize for the inconsistency and have revised the errors in the manuscript and figure legends accordingly. Additionally, we have removed rare domain-specific acronyms and replaced them with more commonly understood abbreviations, as suggested, to avoid confusion.

      Furthermore, we have enlarged parts of the OCT images to better display the layers, hoping to meet the readers' requirements and improve clarity. Thank you for your valuable feedback.

      (8) Response to recommendation (8):

      Figure 4 (Panels H-J, L-M) incorporated with the text (Line 902) differs from the high-resolution version of Figure 4 included later in the manuscript. In Figure 4 (Panels H-J, L-M) merged with the text (Line 902), the quantification of the IPL and INL thickness is incorrect, and the scale bar is inaccurate. However in the high-resolution version of Figure 4 provided later, the thickness of the RNFL+GCL is incorrect.

      Thank you for your careful review of the manuscript. The quantification of the IPL and INL thickness in Figure 4 (Panels H-J, L-M) incorporated with the text has been revised to ensure accurate measurements and scale bars (Figure 4 and line 924). The high-resolution version of Figure 4 provided later has been updated to correct the thickness measurements of the RNFL+GCL. We have ensured that the ordinate in the high-resolution version of Figure 4 now correctly represents length units, consistent with the equal proportional conversion used in the integrated text figures.

      Thank you for your valuable feedback and for pointing out these errors. We have made the necessary corrections to align the figures accurately with the manuscript.

      (9) Response to recommendation (9):

      Line 384-386: the statement "Notably, a-waves in ERG and the thickness of the outer retinal layers in both OCT and HE remained unchanged." is not accurate, since a-waves in ERG is not changed in 3 days but changed in 7 days, and the thickness of the outer retinal layers in HE is either not measured or not shown in Figure 4.

      Thank you for your careful review of the manuscript. We apologize for this error and have revised it.

      We aimed to convey that the amplitude of the a-waves, which represent photoreceptor function, did not show substantial variation, which is consistent with the thickness of the outer retinal layer observed in the OCT and HE images. Our results indicated that at 7 days post-injury, the amplitude of the a-waves in ERG was statistically different only at stimulus light intensities of 0.3, 3.0, and 10.0 cd·s/m². In contrast, the b-wave amplitude was reduced by half compared to sham eyes at almost all stimulus light intensities. At the same time, the immunofluorescence staining of photoreceptor cells showed no significant change at 7 days. Therefore, we consider the change in a-wave amplitude to be minor compared with the significant decrease in b-wave amplitude. We have clarified this in the revised manuscript.

      We also analyzed the thickness of the outer retinal layers in HE and found it to be consistent with the OCT results, showing no significant changes (shown in Author response image 2 below).

      Thank you for your valuable feedback, which has helped improve the accuracy and clarity of our manuscript.

      Author response image 2.

      Thickness of OPL, ONL, IS/OS+RPE in HE staining. n=3; ns: no significance (p>0.05).

      (10) Response to recommendation (10):

      Figure 5 and Figure S3: Quantification data from different sections of the same retina should be averaged to represent one single sample (one data point) for statistical analysis. * in images of Fig. 5E, F, I, J is not defined in the figure legend. It would be easier for readers to follow if the GCL, IPL, INL, and OPL were labeled in retinal sections.

      Thank you for your careful review of the manuscript and your recommendation. We have repeated the statistical analysis and updated the results in Figure 5 and Figure S3. In the UPOAO experimental eyes, no significant change in the number of HCs (Calbindin) was observed during the 3-day reperfusion period, while a notable reduction was observed after 7 days (Figure 5). Additionally, we have added the definition of the asterisks (*) to the figure legend to clarify their meaning. We have also labeled the retinal layers, including the GCL, IPL, INL, OPL, and ONL, in the images to make it easier for readers to follow and understand the data.
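      The reviewer's request, that sections from the same retina be averaged into a single data point before statistical testing, can be sketched as follows. This is a hedged illustration rather than the analysis actually performed: the function name `average_per_retina`, the retina identifiers, and the counts are all hypothetical.

```python
from collections import defaultdict
from statistics import mean


def average_per_retina(section_counts):
    """Collapse per-section counts into one value per retina.

    `section_counts` is an iterable of (retina_id, count) pairs, where
    several pairs may share the same retina_id (one pair per section).
    """
    grouped = defaultdict(list)
    for retina_id, count in section_counts:
        grouped[retina_id].append(count)
    # One averaged data point per retina, suitable as the unit of analysis.
    return {retina_id: mean(counts) for retina_id, counts in grouped.items()}


# Hypothetical cell counts from multiple sections of two retinas.
sections = [
    ("sham_1", 10), ("sham_1", 12), ("sham_1", 14),
    ("upoao_1", 6), ("upoao_1", 8),
]

per_retina = average_per_retina(sections)
```

      Averaging first means the sample size in the statistical test equals the number of retinas, not the number of sections.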

      Thank you for helping us improve the clarity and accuracy of our manuscript.

      (11) Response to recommendation (11):

      Lines 407-409, the statement "which aligns with the a-waves observed in ERG (Figure 3D, E) and the changes seen in the outer retinal layers in OCT (Fig S2C, D)" is confusing. No changes were observed by OCT in Fig S2D.

      Thank you for your review, and we apologize for the confusion. The overall trend of the a-wave amplitude in ERG at 7 days did not change markedly, which is consistent with the immunofluorescence staining results for the photoreceptor cells. Based on these observations, we considered the change in a-wave amplitude to be minor. However, as you pointed out in recommendation 9, the a-waves in ERG did change at 7 days at stimulus light intensities of 0.3, 3.0, and 10.0 cd·s/m², so our description of the a-waves at 7 days was not accurate. We have clarified this point in the revised manuscript to ensure it accurately reflects the data presented.

      (12) Response to recommendation (12):

      In Figure S4, panel C shows lymphocyte-mediated immunity, and panel D shows leukocyte-mediated immunity. Please adjust the figure legend accordingly to reflect the figures.

      Thank you for your careful review of the manuscript. We have modified the figure legend of Figure S4.

      (13) Response to recommendation (13):

      Lines 440-442 state "These results suggested early ischemic processions such as cell migration and potential collateral vessel formation." It is not clear why and how "potential collateral vessel formation" is suggested by Figure 6 and Figure S4. Please clarify this in the text.

      Thank you for your careful review of the manuscript. We have deleted this sentence because of insufficient evidence and replaced it with the following: "These results suggested that in the early stage of retinal ischemic injury, leukocytes from the microvasculature may infiltrate retinal tissue. More experimental validation will be performed to confirm this hypothesis." (line 448). We will be more cautious in drawing conclusions in the future. Thank you for your reminder.

      (14) Response to recommendation (14):

      For the figure legend of Figure 6 "In each heatmap, upper box showed the top 10 up-regulated genes, and the below one showed the top 10 down-regulated genes." Is this correct? It appears that the upper box shows the top 10 down-regulated genes, and the lower box shows the top 10 up-regulated genes.

      Thank you for your careful review of the manuscript. We have modified the figure legend of Figure 6. In the heatmaps, the upper box shows the top 10 down-regulated genes, and the lower box shows the top 10 up-regulated genes (line 977).

      (15) Response to recommendation (15):

      For the figure legend of Figure 7, the statement 'Data points are from retinal sections of four animals' is incorrect, as these data were obtained from whole retinas instead of retinal sections. Please revise the legend to reflect this accurately. The scale bar was absent in the images of Figure 7. Asterisk in Figure 7H and 7I was not defined.

      Thank you for your careful review of the manuscript. We have corrected these errors and added the scale bar (Figure 7D). The white asterisks in Figure 7H and 7I indicate activated microglial cells, and we have added this definition to the legend of Figure 7 (line 981).

      (16) Response to recommendation (16):

      It would be better to switch the order of Figure S7 and Figure S8 to align with their descriptions in the text.

      Thank you for your recommendation and we have switched the order of Figure S7 and Figure S8.

      (17) Response to recommendation (17):

      The gene names in Figure S8 should be written consistently with those listed in Table S1.

      Thank you for your recommendation and we have corrected the gene names.

      (18) Response to recommendation (18):

      In Figure 9, it is not clear why amacrine cells were not included in the UPOAO model, as amacrine cells were also injured as shown in Figure 5I-L.

      Thank you for your careful review of the manuscript and we have added amacrine cells in Figure 9.

      References

      (1) Yang, H., et al., The connective tissue phenotype of glaucomatous cupping in the monkey eye - Clinical and research implications. Prog Retin Eye Res, 2017. 59: p. 1-52.

      (2) Pavlatos, E., et al., Regional Deformation of the Optic Nerve Head and Peripapillary Sclera During IOP Elevation. Invest Ophthalmol Vis Sci, 2018. 59(8): p. 3779-3788.

      (3) Lee, D., et al., A mouse model of retinal hypoperfusion injury induced by unilateral common carotid artery occlusion. Experimental Eye Research, 2020. 201: p. 108275.

      (4) Barthels, D. and H. Das, Current advances in ischemic stroke research and therapies. Biochim Biophys Acta Mol Basis Dis, 2020. 1866(4): p. 165260.

      (5) Smith, H.K., et al., Critical differences between two classical surgical approaches for middle cerebral artery occlusion-induced stroke in mice. J Neurosci Methods, 2015. 249: p. 99-105.

      (6) Janáky, M., et al., Hypobaric hypoxia reduces the amplitude of oscillatory potentials in the human ERG. Doc Ophthalmol, 2007. 114(1): p. 45-51.

      (7) Tinjust, D., H. Kergoat, and J.V. Lovasik, Neuroretinal function during mild systemic hypoxia. Aviat Space Environ Med, 2002. 73(12): p. 1189-94.

      (8) Lee, D., et al., Retinal Degeneration in a Murine Model of Retinal Ischemia by Unilateral Common Carotid Artery Occlusion. Biomed Res Int, 2021. 2021: p. 7727648.

      (9) Yamamoto, H., et al., Complex neurodegeneration in retina following moderate ischemia induced by bilateral common carotid artery occlusion in Wistar rats. Exp Eye Res, 2006. 82(5): p. 767-79.

      (10) Palmhof, M., et al., From Ganglion Cell to Photoreceptor Layer: Timeline of Deterioration in a Rat Ischemia/Reperfusion Model. Front Cell Neurosci, 2019. 13: p. 174.

      (11) Adachi, M., et al., High intraocular pressure-induced ischemia and reperfusion injury in the optic nerve and retina in rats. Graefes Arch Clin Exp Ophthalmol, 1996. 234(7): p. 445-51.

      (12) Jehle, T., et al., Quantification of ischemic damage in the rat retina: a comparative study using evoked potentials, electroretinography, and histology. Invest Ophthalmol Vis Sci, 2008. 49(3): p. 1056-64.

      (13) Hayreh, S.S., H.E. Kolder, and T.A. Weingeist, Central retinal artery occlusion and retinal tolerance time. Ophthalmology, 1980. 87(1): p. 75-8.

      (14) Luo, X., et al., Hypoglycemia induces general neuronal death, whereas hypoxia and glutamate transport blockade lead to selective retinal ganglion cell death in vitro. Invest Ophthalmol Vis Sci, 2001. 42(11): p. 2695-705.

      (15) Schmid, H., et al., Loss of inner retinal neurons after retinal ischemia in rats. Invest Ophthalmol Vis Sci, 2014. 55(4): p. 2777-87.

      (16) Furashova, O. and E. Matthè, Hyperreflectivity of Inner Retinal Layers as a Quantitative Parameter of Ischemic Damage in Acute Retinal Vein Occlusion (RVO): An Optical Coherence Tomography Study. Clin Ophthalmol, 2020. 14: p. 2453-2462.

      (17) Pang, Y., et al., CD38 Deficiency Protects Mouse Retinal Ganglion Cells Through Activating the NAD+/Sirt1 Pathway in Ischemia-Reperfusion and Optic Nerve Crush Models. Invest Ophthalmol Vis Sci, 2024. 65(5): p. 36.

      (18) Feng, Y., et al., GSK840 Alleviates Retinal Neuronal Injury by Inhibiting RIPK3/MLKL-Mediated RGC Necroptosis After Ischemia/Reperfusion. Invest Ophthalmol Vis Sci, 2023. 64(14): p. 42.

      (19) Zeng, S., et al., CREG Protects Retinal Ganglion Cells loss and Retinal Function Impairment Against ischemia-reperfusion Injury in mice via Akt Signaling Pathway. Mol Neurobiol, 2023. 60(10): p. 6018-6028.

      (20) Rosenbaum, D.M., et al., The role of the p53 protein in the selective vulnerability of the inner retina to transient ischemia. Invest Ophthalmol Vis Sci, 1998. 39(11): p. 2132-9.

      (21) Zhang, Y., et al., Melatonin Alleviates Pyroptosis of Retinal Neurons Following Acute Intraocular Hypertension. CNS Neurol Disord Drug Targets, 2021. 20(3): p. 285-297.

      (22) Zhu, J., et al., Protective effects of Erigeron breviscapus Hand.- Mazz. (EBHM) extract in retinal neurodegeneration models. Mol Vis, 2018. 24: p. 315-325.

      (23) Wachtmeister, L., Oscillatory potentials in the retina: what do they reveal. Prog Retin Eye Res, 1998. 17(4): p. 485-521.

      (24) Cao, W., et al., Dextromethorphan attenuates the effects of ischemia on rabbit electroretinographic oscillatory potentials. Documenta Ophthalmologica, 1993. 84(3): p. 247-256.

      (25) Xu, J., et al., Pregabalin Mediates Retinal Ganglion Cell Survival From Retinal Ischemia/Reperfusion Injury Via the Akt/GSK3β/β-Catenin Signaling Pathway. Invest Ophthalmol Vis Sci, 2022. 63(12): p. 7.

      (26) Takács, B., et al., Electroretinographical Analysis of the Effect of BGP-15 in Eyedrops for Compensating Global Ischemia-Reperfusion in the Eyes of Sprague Dawley Rats. Biomedicines, 2024. 12(3).

      (27) Porciatti, V., Electrophysiological assessment of retinal ganglion cell function. Exp Eye Res, 2015. 141: p. 164-70.

      (28) Ridder, W.H. and S. Nusinowitz, The visual evoked potential in the mouse—Origins and response characteristics. Vision Research, 2006. 46(6): p. 902-913.

      (29) Liu, S., et al., An optimized procedure to record visual evoked potential in mice. Exp Eye Res, 2022. 218: p. 109011.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this paper, Song, Shi, and Lin use an existing deep learning-based sequence model to derive a score for each haplotype within a genomic region, and then perform association tests between these scores and phenotypes of interest. The authors then perform some downstream analyses (fine-mapping, various enrichment analyses, and building polygenic scores) to ensure that these associations are meaningful. The authors find that their approach allows them to find additional associations, the associations have biologically interpretable enrichments in terms of tissues and pathways, and can slightly improve polygenic scores when combined with standard SNP-based PRS.

      Strengths:

      • I found the central idea of the paper to be conceptually straightforward and an appealing way to use the power of sequence models in an association testing framework.

      • The findings are largely biologically interpretable, and it seems like this could be a promising approach to boost power for some downstream applications.

      Weaknesses:

      • The methods used to generate polygenic scores were difficult to follow. In particular, a fully connected neural network with linear activations predicting a single output should be equivalent to linear regression (all intermediate layers of the network can be collapsed using matrix-multiplication, so the output is just the inner product of the input with some vector). Using the last hidden layer of such a network for downstream tasks should also be equivalent to projecting the input down to a lower dimensional space with some essentially randomly chosen projection. As such, I am surprised that the neural network approach performs so well, and it would be nice if the authors could compare it to other linear approaches (e.g., LASSO or ridge regression for prediction; PCA or an auto-encoder for converting the input to a lower dimensional representation).

      Response: We thank the reviewer for the recognition and the valuable suggestion. As the reviewer noted, our polygenic prediction procedure is equivalent to a linear transformation, and in this revision we found that a neural network framework was unnecessary as a replacement for a linear model. Both our results and previous work indicate that linear models fit polygenic traits better than non-linear ones, which is also why we chose linear activations for the neural network in the original manuscript.
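
      The equivalence the reviewer describes can be checked numerically. The following is a minimal sketch, not the network from the original manuscript: the layer sizes, random weights and toy inputs are all hypothetical, and biases are omitted (with biases the collapsed map is affine rather than purely linear).

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def random_matrix(rows, cols, rng):
    """Toy weight or input matrix with entries in [-1, 1]."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
n_features, hidden1, hidden2 = 6, 4, 3  # hypothetical layer sizes

# Weights of a bias-free network whose activations are all linear.
w1 = random_matrix(n_features, hidden1, rng)
w2 = random_matrix(hidden1, hidden2, rng)
w3 = random_matrix(hidden2, 1, rng)

# Collapsing the layers gives a single coefficient vector (n_features x 1).
beta = matmul(matmul(w1, w2), w3)

x = random_matrix(5, n_features, rng)  # five toy individuals

layered = matmul(matmul(matmul(x, w1), w2), w3)  # layer-by-layer forward pass
collapsed = matmul(x, beta)                      # one inner product per individual

max_diff = max(abs(a[0] - b[0]) for a, b in zip(layered, collapsed))
```

      Here `max_diff` differs from zero only by floating-point rounding, which is why the stacked linear layers add no expressive power over a single regression vector.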

      In this revision, we followed the reviewer’s suggestion and applied a more straightforward linear framework for polygenic prediction. We first calculated a weighted sum of HFS for each block (1,361 independent blocks in total); then, in each target ancestry, we used LASSO regression to integrate these block scores with the SNP PRS into one final score. We also conducted a comparative analysis in the British European test set and found that LASSO, ridge and elastic net gave similar results, with LASSO performing slightly better. By applying this straightforward framework together with the sliding window strategy, we moderately improved the prediction performance.

      Line 349: “Using height as a representative trait, we first estimated the proportion of variance captured by top loci, and found that HFS of loci with PIP>0.4 (n=5,101) captured roughly 80% of the variance explained by all genome-wide loci (n=1,200,024, corresponding to the sliding-window strategy; Figure 5A). We then calculated HFS+LDAK in the non-British European (NBE), South Asian (SAS), East Asian (EAS) and African (AFR) populations in UK Biobank, and observed 17.5%, 16.1%, 17.2% and 39.8% improvement over LDAK alone (p=3.21×10^-16, 0.0001, 0.002 and 0.001, respectively; Figure 5C).”

      Author response image 1.

      • A very interesting point of the paper was the low R^2 between the HFS scores in adjacent windows, but the explanation of this was unclear to me. Since the HFS scores are just deterministic functions of the SNPs, it feels like if the SNPs are in LD then the HFS scores should be and vice versa. It would be nice to compare the LD between adjacent windows to the average LD of pairs of SNPs from the two windows to see if this is driven by the fact that SNPs are being separated into windows, or if sei is somehow upweighting the importance of SNPs that are less linked to other SNPs (e.g., rare variants).

      Response: We thank the reviewer for the suggestion on understanding the LD mechanism. In this revision, we used chromosome 1 as an example and calculated the pairwise LD among all SNPs within two adjacent loci. As shown in Figure S1 (below), although HFS-based LD is still significantly lower than the median SNP-based LD (paired Wilcoxon test p=1.76e-5), we found that the median SNP LD between loci was still lower than what is typically observed between adjacent SNPs in GWAS (histogram of x axis; median=0.06). We reasoned that dividing SNPs into blocks is one of the reasons that HFS shows less LD than standard GWAS, but not the whole story.
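
      The comparison described above can be sketched as follows. This is an illustrative outline only, assuming that LD is measured as the squared Pearson correlation (r^2) between dosage vectors; the function names and toy genotypes are hypothetical and do not reproduce the pipeline used for Figure S1.

```python
from statistics import mean, median


def r_squared(x, y):
    """Squared Pearson correlation between two equal-length vectors."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a monomorphic vector carries no LD information
    return sxy * sxy / (sxx * syy)


def median_cross_window_ld(window_a, window_b):
    """Median r^2 over all SNP pairs with one SNP taken from each window."""
    return median(r_squared(a, b) for a in window_a for b in window_b)


# Toy dosage vectors (one list per SNP, one entry per individual).
window_a = [[0, 1, 2, 1, 0, 2], [0, 1, 2, 1, 0, 2]]
window_b = [[2, 1, 0, 1, 2, 0]]

snp_ld = median_cross_window_ld(window_a, window_b)

# The HFS-based counterpart is the r^2 between the two per-locus score vectors.
hfs_a = [0.00, 0.12, 0.00, 0.03, 0.00, 0.00]
hfs_b = [0.00, 0.00, 0.07, 0.00, 0.00, 0.01]
hfs_ld = r_squared(hfs_a, hfs_b)
```

      Comparing `hfs_ld` with `snp_ld` for each pair of adjacent loci gives the paired values that a paired Wilcoxon test would then take as input.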

      Author response image 2.

      We agree with the reviewer that the effect of rare variants could also play an important role. In fact, the sei authors have also found that rare variants tend to have larger sei-predicted effects. We conducted an approximate analysis that removed all rare variants and repeated the HFS calculation. Here the HFS LD rose markedly, to median=0.14, indicating that including rare variants was vital for the low LD.

      Author response image 3.

      Line 123: “Further evaluation indicated that this low LD was driven by two factors: integration of rare variant impacts and segmentation. Firstly, excluding rare variants from HFS raised the LD to median=0.14 (Method; Figure S2C). Secondly, the median LD of SNPs from adjacent loci was 0.06, which was significantly higher than HFS LD (paired Wilcoxon p=1.76×10^-5) but significantly lower than HFS LD without rare variants (paired Wilcoxon p<2.2×10^-16).”

      • There were also a number of robustness checks that would have been good to include in the paper. For instance, do the findings change if the windows are shifted? Do the findings change if the sequence is reverse-complemented?

      Response: Following the reviewer’s suggestion, we conducted a sliding window analysis in which all loci were shifted by 2,048 bp, thereby doubling the total number of loci. In the fine-mapping analysis, more than 90% of the causal loci were reproduced in the sliding window analysis, either by themselves or by an overlapping locus:

      Line 207: “29.4% of causal loci (PIP>0.95) in the original analysis were still causal in the sliding window analysis. For 31.1% and 29.3% of causal loci, the 5’ and 3’ overlapping locus, respectively, had PIP>0.95 in the sliding window analysis, while the loci themselves were no longer causal.”
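
      The shifted tiling can be illustrated with a short sketch. It assumes 0-based half-open coordinates and drops any incomplete window at the chromosome end; both conventions, and the function names, are hypothetical choices for illustration rather than details taken from the Methods.

```python
def tile_loci(chrom_length, window=4096, offset=0):
    """Return (start, end) half-open intervals for full-length windows."""
    return [(s, s + window) for s in range(offset, chrom_length - window + 1, window)]


def overlapping_neighbors(locus, shifted):
    """Shifted windows that overlap a given original locus."""
    start, end = locus
    return [(s, e) for s, e in shifted if s < end and e > start]


original = tile_loci(20000)              # non-overlapping tiling
shifted = tile_loci(20000, offset=2048)  # same tiling moved by half a window

# Each interior original locus is covered by one 5' and one 3' shifted locus.
neighbors = overlapping_neighbors(original[1], shifted)
```

      For the locus spanning 4096 to 8192, `neighbors` holds the shifted windows starting at 2048 and 6144, which are the 5’ and 3’ overlapping loci referred to in the quoted text.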

      In polygenic prediction analysis, sliding window strategy significantly improved prediction accuracy, as we discussed in question 1.

      As for the issue of the reverse complement, the sei input layer by design encodes both strands in a symmetric manner, such that the output for both strands is the same. We also ran sei on the reverse complement (generated by seqkit seq -r -p) to verify that the original sequence and its reverse complement give the same output.
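
      The check described above can be sketched in a few lines. This is an illustration only: `gc_fraction` is a hypothetical stand-in for a strand-symmetric model and is not sei, and the tolerance value is an assumption.

```python
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence (A/C/G/T/N, either case)."""
    return seq.translate(_COMPLEMENT)[::-1]


def gc_fraction(seq):
    """Toy strand-symmetric 'model': fraction of G/C bases."""
    return sum(base in "GCgc" for base in seq) / len(seq)


def is_strand_symmetric(model, seq, tol=1e-9):
    """True if the model output is the same for a sequence and its reverse complement."""
    return abs(model(seq) - model(reverse_complement(seq))) <= tol


example = "ACGTTN"
rc = reverse_complement(example)
symmetric = is_strand_symmetric(gc_fraction, "ACGGTCATTG")
```

      Replacing `gc_fraction` with a function that returns a single score from a sequence model would give the same kind of verification as the one reported above.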

      Response: Following the reviewer’s suggestion, we added a new discussion paragraph on the performance of sequence models on interindividual variation. In brief, we suggest that although the lack of cross-individual training sets is a drawback and future improvement is necessary, chromatin changes can be predicted better than gene expression. This is because the latter task requires information on long-range interactions, which vary among genes and are difficult to capture when the reference genome is used as the training set. We made a schematic to clarify this:

      Author response image 4.

      We also noticed a few recent studies that directly validated sei predictions experimentally and showed significant accuracy, such as https://doi.org/10.1016/j.neuron.2022.12.026. Taken together, while we agree that it is necessary to improve sequence models by adding more cross-individual training samples, the current state-of-the-art model, sei, can still provide unique value to our study.

      Line 423: “The challenge of using sequence-based deep learning (DL) models in HFS applications is further compounded by their difficulty in predicting variations between individuals. Recent studies(Huang et al., 2023; Sasse et al., 2023) indicate that DL models, trained on the reference human genome, demonstrate limited accuracy in predicting gene expression levels across different individuals. This limitation is likely due to the models' inability to account for long-range regulatory patterns, which are crucial for understanding the impact of variants on gene expression and vary across genes. In contrast, our study leveraged sequence-determined functional genomic profiles in association studies, which mitigates this issue to an extent. For instance, although sei cannot identify the specific gene regulated by a given input sequence, it can predict changes in the sequence's functional activity. Future improvements in DL models' ability to predict interindividual differences could be achieved by incorporating cross-individual data in the training process. An example of such data is the EN-TEX(Rozowsky et al., 2023) dataset, which aligns functional genomic peaks with the specific individuals and haplotypes they correspond to.”

      Reviewer #2 (Public Review):

      Summary:

      In this work, Song et al. propose a locus-based framework for performing GWAS and related downstream analyses including finemapping and polygenic risk score (PRS) estimation. GWAS are not sufficiently powered to detect phenotype associations with low-frequency variants. To overcome this limitation, the manuscript proposes a method to aggregate variant impacts on chromatin and transcription across a 4096 base pair (bp) locus in the form of a haplotype function score (HFS). At each locus, an association is computed between the HFS and trait. Computing associations at the level of imputed functional genomic scores should enable the integration of information across variants spanning the allele frequency spectrum and bolster the power of GWAS.

      The HFS for each locus is derived from a sequence-based predictive model, Sei. Sei predicts 21,907 chromatin and TF binding tracks, which can be projected onto 40 pre-defined sequence classes (representing promoters, enhancers, etc.). For each 4096 bp haplotype in their UKB cohort, the proposed method uses the Sei sequence class scores to derive the haplotype function score (HFS). The authors apply their method to 14 polygenic traits, identifying ~16,500 HFS-trait associations. They finemap these trait-associated loci with SuSie, as well as perform target gene/pathway discovery and PRS estimation.

      Strengths:

      Sequence-based deep learning predictors of chromatin status and TF binding have become increasingly accurate over the past few years. Imputing aggregated variant impact using Sei, and then performing an HFS-trait association is, therefore, an interesting approach to bolster power in GWAS discovery. The manuscript demonstrates that associations can be identified at the level of an aggregated functional score. The finemapping and pathway identification analyses suggest that HFS-based associations identify relevant causal pathways and genes from an association study. Identifying associations at the level of functional genomics increases the portability of PRSs across populations. Imputing functional genomic predictions using a sequence-based deep learning model does not suffer from the limitation of TWAS where gene expression is imputed from a limited-size reference panel such as GTEx.

      However, there are several major limitations that need to be addressed.

      Major concerns/weaknesses:

      (1) There is limited characterization of the locus-level associations to SNP-level associations. How does the set of HFS-based associations differ from SNP-level associations?

      Response: We thank the reviewer for the recognition and the valuable suggestion on our manuscript. Following the reviewer’s suggestion, in this revision we added a paragraph to compare the basic characteristics between HFS-based and SNP-based association study. These comparisons suggested that HFS had no advantage in testing marginal association, but performed better in detecting causal associations.

      Line 144: “When comparing HFS association with the standard SNP-based GWAS on the same data, we found that 98% of significant HFS loci also harbored a significant SNP. There were a few cases (n=0~5) where significant HFS loci did not harbor even a marginal SNP association (GWAS p>0.01), which was due to the lack of common SNPs in these loci. The HFS association p value was higher than the GWAS p value in 95% of significant loci, suggesting that HFS did not improve power to detect marginal effects. The genomic control inflation factor (λGC) for the HFS association test varied between 0.99 for asthma and 1.50 for height, closely resembling the SNP GWAS (Pearson Correlation Coefficient [PCC]=0.91, paired t-test p=0.16; Method and Figure S3). We concluded that HFS-based association tests had adequate power and did not introduce additional p-value inflation.”
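
      For reference, λGC is commonly computed as the median of the observed 1-df chi-square statistics divided by the median of the null distribution (about 0.455). The sketch below follows that common definition and is not taken from the Methods; the function name and toy p-values are hypothetical.

```python
from statistics import NormalDist, median

CHI2_1DF_MEDIAN = 0.4549364  # median of a chi-square distribution with 1 df


def genomic_inflation(p_values):
    """Genomic control inflation factor from two-sided p-values in (0, 1]."""
    nd = NormalDist()
    # A two-sided p-value maps to a 1-df chi-square statistic via z = Phi^-1(p / 2).
    chi2 = [nd.inv_cdf(p / 2.0) ** 2 for p in p_values]
    return median(chi2) / CHI2_1DF_MEDIAN


# Toy inputs: evenly spaced (null-like) p-values, and a deflated-p version of them.
null_p = [(i + 0.5) / 1000 for i in range(1000)]
inflated_p = [p * p for p in null_p]

lambda_null = genomic_inflation(null_p)
lambda_inflated = genomic_inflation(inflated_p)
```

      With null-like p-values the result is close to 1, while systematically smaller p-values push it above 1, which is the behavior the quoted range (0.99 to 1.50) is describing.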

      (2) A clear advantage of performing HFS-trait associations is that the HFS score is imputed by considering variants across the allele frequency spectrum. However, no evidence is provided demonstrating that rare variants contribute to associations derived by the model. Similarly, do the authors find evidence that allelic heterogeneity is leveraged by the HFS-based association model? It would be useful to do simulations here to characterize the model behavior in the presence of trait-associated rare variants.

      Response: Following the reviewer’s suggestion, we conducted a sensitivity analysis that removed all rare (MAF<0.01) variants and repeated the HFS analysis (HFScommon) on chromosome 1. In the linear association analysis, we found that 10.6% of HFS signals (p<5×10^-8) were missed by HFScommon. In fine-mapping, 55.3% of HFS causal signals (PIP>0.95) were missed by HFScommon. We concluded that rare variants play an important role in the performance of HFS, especially its advantages in fine-mapping.

      Line 175: “We also found that rare variants played an important role in the good fine-mapping performance of HFS: when variants with MAF<0.01 were removed, 55.3% of the causal signals would be missed in HFS+SUSIE analysis.”

      We then attempted a simulation analysis in which rare variants were causal for the phenotype and the association statistics matched those of the real GWAS of height. However, this simulation did not seem to reflect the real scenario: no matter how we changed the association between rare variants and the phenotype, the HFS association p-value could hardly reach the significance level of the SNP association. We propose that this is because simulation cannot properly reflect how variants impact functional genomics: when a rare variant is randomly selected as causal, there is a high probability that it has no impact on functional genomics, so its HFS would be close to zero. When such a variant is set as causal (which is unlikely in a real scenario), HFS would not properly capture the association. We reasoned that it may be difficult to evaluate HFS by simulation, since the nonlinear relations between SNPs and HFS, as well as among SNPs, are difficult to simulate properly.

      Author response image 5.

      (3) Sei predicts chromatin status / ChIP-seq peaks in the center of a 4kb region. It would therefore be more relevant to predict HFS using overlapping sequence windows that tile the genome as opposed to using non-overlapping windows for computing HFS scores. Specifically, in line 482, the authors state that "the HFS score represents overall activity of the entire sequence, not only the few bp at the center", but this would not hold given that Sei is predicting activity at the center for any sequence.

      Response: We thank the reviewer for the suggestion on the sliding window design. In this revision, we shifted all loci by 2,048 bp to double the number of loci and repeated the fine-mapping and polygenic prediction analyses. For fine-mapping, we found that the result was generally robust to the sliding window procedure, and the majority of the causal associations were retained:

      Line 207: “29.4% of causal loci (PIP>0.95) in the original analysis were still causal in the sliding window analysis. For 31.1% and 29.3% of causal loci, the 5’ and 3’ overlapping locus, respectively, had PIP>0.95 in the sliding window analysis, while the loci themselves were no longer causal.”

      In polygenic prediction, sliding window analysis provided a significantly improved performance compared with previous analysis on non-overlapping loci:

      However, since this revision includes several updates to the polygenic prediction procedure, it was difficult to quantify how much of the improvement was due to the sliding window design. Thus, we show the new result directly in Figure 5 and do not compare it with the original result.

      We also modified the previously imprecise statement to:

      Line 490: “…it integrated information of the entire sequence, not only the few bp at the center.”

      (4) Is the HFS-based association going to miss coding variation and several regulatory variants such as splicing variants? There are also going to be cases where there's an association driven by a variant that is correlated with a Sei prediction in a neighboring window. These would represent false positives for the method, it would be useful to identify or characterize these cases.

      Response: As the reviewer suggested, sei captures only functional genomic features and is, by nature, unlikely to perform well when the causal variants affect protein sequences. In this revision, we characterized this by focusing on causal exonic variants (SNP PIP>0.95):

      Line 322: “On the other hand, HFS performed worse than SNP-based fine-mapping in exonic regions. Taking height as an example, PolyFun detected 125 causal SNPs (PIP>0.95) in the exonic regions, but only 16% (20) of the loci that harbored them also reached PIP>0.5 (11 reached PIP>0.95) in HFS+SUSIE analysis. Among the 105 loci that missed such signals (HFS PIP<0.5), 12 had a nearby locus (within 10kb) showing HFS PIP>0.95, which likely reflected false positives driven by LD. Thus, SNP-based analysis should be prioritized over HFS in coding regions.”

      Additional minor concerns:

      (1) It's not clear whether SuSie-based finemapping is appropriate at the locus level, when there is limited LD between neighboring HFS bins. How does the choice of the number of causal loci and the size of the segment being finemapped affect the results and is SuSie a good fit in this scenario?

      Response: Following the reviewer’s suggestion, we reran SUSIE with different predefined numbers of causal loci (from 2 to 10), and found that the identified causal loci were consistent.

      Author response image 6.

      Line 211: “Besides, HFS+SUSIE was also robust when the predefined number of causal loci (L=2 to 10) was changed, and the number of detected loci was unchanged.”

      As for the size of the segments, we divided the predefined segments (independent blocks detected by LDetect) into two halves and reran SUSIE, and found that three additional causal loci emerged in one half. This suggests that using segments that are too small might increase the false positive rate. However, since there is no LD between independent blocks (which is guaranteed by LDetect), it is not necessary to use even longer blocks.

      Author response image 7.

      Line 133: “Simulation analysis revealed that when a non-reference sequence class score was associated with the trait, the reference class score could still capture a median of 70% of the HFS-trait association R^2.”

      (2) It is not clear how a single score is chosen from the 117 values predicted by Sei for each locus. SuSie is run assuming a single causal signal per locus, an assumption which may not hold at ~4kb resolution (several classes could be associated with the trait of interest). It's not clear whether SuSie, run in this parameter setting, is a good choice for variable selection here.

      Response: As discussed below (question 3), in this revision we no longer apply SUSIE to select one sequence class score for each locus, because of the risk of overfitting, and instead use the reference sequence class uniformly for all loci. As the reviewer suggested, we used simulation to evaluate how this procedure influences HFS performance, especially when multiple sequence classes of the same locus are causal for the phenotype. We found that the reference sequence class score captured a median of 69.1% of phenotypic R^2 when the causal sequence class was not the reference, and a median of 59.2% of R^2 when there were 2 to 5 non-reference causal classes. We concluded that the loss caused by skipping sequence class selection is mild, and that skipping it is necessary given the risk of overfitting.

      Author response image 8.

      (3) A single HFS score is being chosen from amongst multiple tracks at each locus independently. Does this require additional multiple-hypothesis correction?

      Response: We agree with the reviewer that choosing the sequence class for each locus amounts to multiple testing, and in additional experiments we indeed observed some evidence of overfitting from this procedure. Thus, in this revision, we no longer apply the per-locus feature selection procedure, but instead use the sequence class corresponding to the reference (hg38) sequence. Consequently, no additional multiple-testing correction is needed. We acknowledge that this simplification misses certain information, but as mentioned above, the loss is moderate and is necessary to ensure statistical robustness and reduce false positives. In fact, with this simplification we better controlled the inflation factor of the HFS GWAS and obtained better portability in polygenic prediction.

      (4) The results show that a larger number of loci are identified with HFS-based finemapping & that causal loci are enriched for causal SNPs. However, it is not clear how the number of causal loci should relate to the number of SNPs. It would be really nice to see examples of cases where a previously unresolved association is resolved when using HFS-based GWAS + finemapping.

      Response: In this revision, we did not observe a clear relation between causal loci number and causal gene number. The only trend is that SNP-based fine-mapping seemed to perform better in coding regions, in accordance with the fact that HFS captures functional genomic signals. We also added new interpretations to highlight some examples where HFS resolves previously unresolved association signals. For example,

      Line 287: “Specifically, in 1q32.1 region, HFS+SUSIE identified two loci with PIP>0.9 (Figure 4B). SNP-based association also found significant association in this region, but SNP fine-mapping(Weissbrod et al., 2020) could not resolve this signal and only found seven signals between PIP=0.1 to 0.5.”

      (5) Sequence-based deep learning model predictions can be miscalibrated for insertions and deletions (INDELs) as compared to SNPs. Scaling INDEL predictions would likely improve the downstream modeling.

      Response: Following the reviewer’s suggestion, we conducted a sensitivity analysis that removed all indels on chromosome 1 and repeated the HFS analysis. Removing indels increased the number of significant (p<5e-8) associations by 9%, but also slightly increased the inflation factor (paired Wilcoxon test p=0.0001). In the fine-mapping analysis, removing indels caused a 4.7% decrease in the number of detected causal associations (PIP>0.95). We reasoned that potential miscalibration on indels does affect the statistical power of HFS, but the proper approach to control this effect may not be straightforward and still awaits optimization. In this revision, we kept all indels in the analysis, since we consider the power of fine-mapping more important than the power of marginal association.

      Line 213: “Lastly, removing insertions and deletions would reveal 9% more significant associations (p<5×10^-8) but 4.7% fewer causal associations (PIP>0.95), and slightly increased the inflation factor (Wilcoxon p=0.0001, Figure S4).”

      Author response image 9.

      Reviewer #1 (Recommendations For The Authors):

      It was unclear to me why the sei output was rounded to two decimal places to "avoid influence of sei prediction noise". Wouldn't rounding introduce additional noise?

      Response: We thank the reviewer for pointing out our inadequate description. The rounding procedure is used to mask low values that likely do not reflect any real change. The idea is that, even if a variant does not bring about any functional change, sei would still output a very low HFS value that is close to, but not equal to, zero. With rounding, such low values are set to zero, which avoids noise. We have added this rationale to the Methods section:

      Line 529: “This is due to the fact that even if a variant actually makes no impact on functional genomics, sei would still output a value that is close to, but not equal to, the reference sequence class score. The rounding procedure would set such HFS to zero and remove the random value from sei.”
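
      The effect of rounding can be illustrated with a small sketch. It assumes, for illustration only, that the HFS is expressed as the difference between the haplotype score and the reference sequence class score; the function name and the example scores are hypothetical.

```python
def rounded_hfs(haplotype_score, reference_score, decimals=2):
    """Difference from the reference score, rounded so that tiny deviations become zero."""
    delta = round(haplotype_score - reference_score, decimals)
    return delta + 0.0  # turns a possible -0.0 into 0.0


# A deviation in the third decimal place is treated as prediction noise.
noise_only = rounded_hfs(1.2034, 1.2001)

# A larger deviation survives rounding and is kept as a real change.
real_change = rounded_hfs(1.35, 1.20)
```

      Under this assumption, haplotypes whose predicted score differs from the reference by less than 0.005 all receive an HFS of exactly zero.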

      Minor comments / typos:

      • There are many typos in the abstract.

      Response: We have corrected the typos and grammar issues in the abstract in this revision.

      • I believe "Arachnoid acid-intelligence" should be "Arachidonic acid-intelligence".

      • Consistently there is no space between text and parenthetical citations. For example, "sei(Chen et al., 2022)" should be "sei (Chen et al., 2022)".

      • Line 110: "at least one non-reference haplotypes" --> "at least one non-reference haplotype".

      • Line 155: "data-based method" --> "data-based methods".

      • Lines 165-166: "functionally importance" --> "functional importance".

      Response: We have made these revisions accordingly.

      • Line 210: the sentence containing "this annotation on conditioned of a set of baseline annotations" is unclear.

      Response: We have revised this sentence as “…regressed the PIP against this annotation, with a set of baseline annotations included as covariates, similar to the LDSC framework.”

      • Line 213: "association" --> "associations".

      • Line 219: "association" --> "associations".

      • Line 251: "result" --> "results".

      • Line 269: "result" --> "results".

      • Line 289: "known to involved" --> "known to be involved".

      • Line 356: "LDAK along" --> "LDAK alone".

      • Line 362: "BOLT-LMM along" --> "BOLT-LMM alone".

      • Supplement: "Hihglighted" --> "Highlighted".

      Response: We have made these revisions accordingly.

      • Line 444: Were "British ancestry Caucasians" defined as individuals that self-identified as "white British"? If so, then they should be described as "self-identified "white British"".

      Response: As the reviewer pointed out, we have changed the description to self-identified British ancestry Caucasians.

      Reviewer #2 (Recommendations For The Authors):

      (1) A 2022 cistrome-wide association study (CWAS) computed associations between genetically-predicted chromatin activity and phenotypes. Adding a reference to this paper would be helpful. https://pubmed.ncbi.nlm.nih.gov/36071171/

      Response: Following the reviewer’s suggestion, we discussed the similarity between CWAS and our study:

      Line 89: “In line with this notion, a recent similar strategy called cistrome-wide association study (CWAS) integrated variant-chromatin activity and variant-phenotype association to boost the power of genetic studies of cancer (Baca et al., 2022).”

      (2) Line 487 states: "We applied sei to predict 21,906 functional genomic tracks for each sequence, without normalizing for histone mark." It's not clear what normalization is being referred to here.

      Response: We have revised the sentence to:

      Line 495: “We applied sei to predict 21,906 functional genomic tracks for each sequence, without normalizing for histone mark (divided each track score by the sum of histone mark score) as suggested by the sei author.”

      (3) The figures are extremely low resolution, they need to be updated.

      Response: In this revision, we uploaded a separate PDF file for each figure to provide high-resolution graphs.

      (4) The results section was difficult to follow and would benefit from being written more clearly.

      Response: In this revision, we rearranged parts of the results section to better clarify the main idea. We moved all statistical results into parentheses and focused the main text on interpretation. For example,

      Line 123: “Further evaluation indicated that this low LD was driven by two factors: integration of rare variant impacts and segmentation. Firstly, excluding rare variants from HFS raised the LD to median=0.14 (Method; Figure S2C). Secondly, the median LD of SNPs from adjacent loci was 0.06, which was significantly higher than HFS LD (paired Wilcoxon p=1.76×10^-5) but significantly lower than HFS LD without rare variants (paired Wilcoxon p<2.2×10^-16).”

      (5) "Along" is used several times in the final results section (PRS estimation), this should be "alone".

      Response: We have replaced every misused “along” with “alone” in this revision.

      (6) Instead of using notation identifying genomic location, it might be clearer to provide gene names when illustrating examples of trait-associated promoters.

      Response: In this revision, we added the gene names of the corresponding promoters to the main text to better clarify the findings.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      This paper describes technically-impressive measurements of calcium signals near synaptic ribbons in goldfish bipolar cells. The data presented provides high spatial and temporal resolution information about calcium concentrations along the ribbon at various distances from the site of entry at the plasma membrane. This is important information. Important gaps in the data presented mean that the evidence for the main conclusions is currently inadequate.

      Thank you very much for this positive evaluation of our work. We would like to respectfully point out to the Reviewer that our current study was conducted using zebrafish as a model and not goldfish. We have revised the paper to eliminate any gaps in the data presentation.

      Strengths

      (1) The technical aspects of the measurements are impressive. The authors use calcium indicators bound to the ribbon and high-speed line scans to resolve changes with a spatial resolution of ~250 nm and a temporal resolution of less than 10 ms. These spatial and temporal scales are much closer to those relevant for vesicle release than previous measurements.

      (2) The use of calcium indicators with very different affinities and different intracellular calcium buffers helps provide confirmation of key results.

      Thank you very much for this positive evaluation of our work.

      Weaknesses

      (1) Multiple key points of the paper lack statistical tests or summary data from populations of cells. For example, the text states that the proximal and distal calcium kinetics in Figure 2A differ. This is not clear from the inset to Figure 2A - where the traces look like scaled versions of each other. Values for time to half-maximal peak fluorescence are given for one example cell but no statistics or summary are provided. Figure 8 shows examples from one cell with no summary data. This issue comes up in other places as well.

      Thank you for this feedback. We have addressed this in our revised manuscript where possible. We now include the results of paired t-tests to compare the amplitudes of proximal vs. distal calcium signals shown in Fig. 2A & C, Fig. 3C & D, Fig. 4C & D, Fig. 5A-D, and Fig. 8E & F. Because proximal and distal calcium signals were obtained from the same ribbons within 500-nm distances, as the Reviewer pointed out, “the traces look like scaled versions of each other”. For experiments where we make comparisons across cells or different calcium indicators, as shown in Fig. 3E & F, Fig. 5E, and Fig. 8B & C, we now include the results of an unpaired t-test. The t-test statistics are now included in the respective figure legends in the revised version.

      Regarding the Reviewer’s concern that “values for time to half-maximal peak fluorescence are given for one example cell, but no statistics or summary are provided,” we estimated the fluorescence rise times by fitting only the average traces, in order to compare the overall qualitative behavior of the corresponding calcium indicator fluorescence. We did attempt to analyze the uncertainty of the rise-time estimates, but the simultaneous fitting of the rise and decay behavior of time traces is notoriously sensitive to noise; a much higher signal-to-noise ratio would therefore be required to provide reliable uncertainty estimates for the corresponding rise-time and decay-time characteristics. This is now explicitly explained in the corresponding Methods subsection.
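
      As a purely illustrative sketch of what a half-maximal rise-time estimate from an averaged trace involves (this is not the fitting procedure used in the manuscript), the Python snippet below averages several baseline-subtracted traces and reports the time at which the average first reaches half of its peak, using linear interpolation between samples. The function names, the sampling interval `dt_ms`, and the toy traces are assumptions made for illustration only.

```python
def average_traces(traces):
    """Sample-by-sample mean of equally long, baseline-subtracted traces."""
    n = len(traces)
    return [sum(samples) / n for samples in zip(*traces)]


def time_to_half_max(trace, dt_ms):
    """Time (ms) at which the trace first reaches half of its peak value.

    Assumes the baseline is zero and uses linear interpolation between
    the two samples that bracket the half-maximal level.
    """
    half = max(trace) / 2.0
    for i, value in enumerate(trace):
        if value >= half:
            if i == 0:
                return 0.0
            prev = trace[i - 1]
            frac = (half - prev) / (value - prev)
            return (i - 1 + frac) * dt_ms
    raise ValueError("trace never reaches its half-maximal level")


# Hypothetical traces sampled every 5 ms (arbitrary fluorescence units).
toy_traces = [[0, 1, 2, 3, 4], [0, 3, 6, 9, 12]]
mean_trace = average_traces(toy_traces)
print(time_to_half_max(mean_trace, dt_ms=5.0))
```

      Averaging before estimating the crossing time reduces the influence of noise on individual samples, which is the motivation for working with average traces; it does not by itself provide an uncertainty estimate.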

      In Figure 8, we now show example fluorescence traces from one cell at the bottom of the A and D panels, and the summary data is described in B-C and E-F, with statistics provided in the figure legends.

      (2) Figure 5 is confusing. The figure caption describes red, green, and blue traces, but the figure itself has only two traces in each panel and none are red, green, or blue. It's not possible currently to evaluate this figure.

      Thank you for pointing out this oversight. The figure shows the proximal and distal calcium signals, not the cytoplasmic ones. The figure caption was adjusted to correctly reflect what is shown in the figure.

      (3) The rise time measurements in Figure 2 are very different for low and high-affinity indicators, but no explanation is given for this difference. Similarly, the measurements of peak calcium concentration in Figure 4 are very different from the two indicators. That might suggest that the high-affinity indicator is strongly saturated, which raises concerns about whether that is impacting the kinetic measurements.

      We agree with the Reviewer and had mentioned in the text that we do believe that the high-affinity version of the dye is at least partially saturated. This will be especially a problem for strong depolarizations and signals near the membrane. We slightly changed the corresponding description of results on page 6 to acknowledge this point: “However, it should be noted that Cal520HA will be at least partially saturated at the Ca2+ levels expected in Ca2+ microdomains relevant for vesicle exocytosis, affecting both the amplitude and the kinetics of the fluorescence signal”. 

      Recommendations:

      (1) It would be good to describe the location of calcium channels relative to the ribbon in the introduction.

      We have provided this information in the discussion (please see p. 19: “The faster, smaller, and more spatially confined Ca<sup>2+</sup> signals that are insensitive to the application of high concentrations of exogenous Ca<sup>2+</sup> buffers, referred to here as ribbon proximal Ca<sup>2+</sup> signals, could be due to Ca<sup>2+</sup> influx through Cav channel clusters beneath the synaptic ribbon”). We have now provided this information in the last paragraph of the introduction as well. 

      (2) The introduction is quite technical and would benefit from a more complete description of the findings of the paper (e.g. expanding the last sentence to a full paragraph).

      We have updated the last paragraph of the introduction as per the reviewer’s advice.

      (3) It is not clear that the capacitance measurements in Figure 1 are needed (I did not see them used anywhere else in the paper).

      We have removed the capacitance measurements from the figure.

      (4) Please add legends in the figures themselves defining different line colors and weights so that a reader does not need to search for them in the figure caption.

      We agree that such figure improvements facilitate reading. We have added legends in the figures themselves, where appropriate.

      (5) The insets with the expanded traces in many cases are too small - e.g. Figure 1F.

      We have enlarged the insets in applicable figures as much as possible to facilitate visualization. These changes can be seen in Figures 1, 2, 3, 4, 5, and 8, as well as Supplementary Figure 3.

      (6) Page 5, statistics for amplitude of calcium changes. Is p < 0.001 really correct here? The SEMs indicate an overlap of the two distributions of mean amplitudes - and later data for which you give p = 0.001 has much less overlap.

      Since the two data sets in question come from paired recordings, with a high Pearson correlation coefficient of 0.93, the p-values are in fact correct despite this substantial overlap. We conducted paired t-tests to compare proximal vs. distal calcium signals obtained from a single calcium indicator, as shown in Fig. 2A & C, Fig. 3C & D, Fig. 4C & D, Fig. 5A-D, and Fig. 8E & F. For experiments where we make comparisons across cells or across different calcium indicators, as shown in Fig. 3E & F, Fig. 5E, and Fig. 8B & C, we performed an unpaired t-test. In response to the Reviewer’s comment, we now provide details on t-statistics in the respective figure legends in the revised version.
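
      The effect of pairing can be illustrated with a minimal sketch. The Python snippet below uses made-up amplitude values (not data from the study) in which the two groups overlap heavily but are strongly correlated within pairs; it compares the paired t-statistic, computed on the within-pair differences, with the unpaired (Welch) t-statistic computed from the group means and variances. All function and variable names are assumptions chosen for illustration.

```python
import math
import statistics


def paired_t(x, y):
    """t-statistic of the within-pair differences (paired t-test)."""
    diffs = [a - b for a, b in zip(x, y)]
    return statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(len(diffs)))


def welch_t(x, y):
    """t-statistic for two independent samples (Welch's unpaired t-test)."""
    se = math.sqrt(statistics.variance(x) / len(x) + statistics.variance(y) / len(y))
    return (statistics.mean(x) - statistics.mean(y)) / se


def pearson_r(x, y):
    """Pearson correlation coefficient between paired samples."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical paired amplitudes: each distal value is slightly below its
# proximal partner, while the spread across ribbons is large.
proximal = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
distal = [0.7, 1.5, 2.6, 3.2, 4.4, 5.1, 6.3, 6.8]

print("Pearson r:", round(pearson_r(proximal, distal), 3))
print("paired t:", round(paired_t(proximal, distal), 2))
print("Welch t:", round(welch_t(proximal, distal), 2))
```

      In this toy example the group means differ by much less than the spread across ribbons, so the unpaired statistic is small, whereas the paired statistic is large because the between-ribbon variability cancels in the differences.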

      (7) The text on page 6 describing Figure 3 appears to repeat several technical aspects of the measurements that have already been described in Figure 1. I would reduce that overlap as it is confusing for a reader.

      Since Fig.1 describes calcium measurements with free calcium indicator, whereas Fig.3 describes bound calcium indicator, we would prefer to keep the information for the sake of completeness, despite some small amount of repetition.

      (8) Figure 4A needs to be described in more detail.

      We have provided the vesicle pool details in Supplementary Fig. 1.

      (9) The text in Figure 7 is too small.

      We have redone Fig. 7 and Supplemental Fig. 4 to ensure that the tick labels and other text are sufficiently large.

      (10) Are the units (nM) in Figure 8 correct?

      Thank you for pointing that out. The units were supposed to be µM and have been corrected in the figure.

      Reviewer #2 (Public review):

      Summary:

      The study introduces new tools for measuring intracellular Ca2+ concentration gradients around retinal rod bipolar cell (rbc) synaptic ribbons. This is done by comparing the Ca2+ profiles measured with mobile Ca2+ indicator dyes versus ribbon-tethered (immobile) Ca2+ indicator dyes. The Ca2+ imaging results provide a straightforward demonstration of Ca2+ gradients around the ribbon and validate their experimental strategy. This experimental work is complemented by a coherent, open-source, computational model that successfully describes changes in Ca2+ domains as a function of Ca2+ buffering. In addition, the authors try to demonstrate that there is heterogeneity among synaptic ribbons within an individual rbc terminal.

      Strengths:

      The study introduces a new set of tools for estimating Ca2+ concentration gradients at ribbon AZs, and the experimental results are accompanied by an open-source, computational model that nicely describes Ca2+ buffering at the rbc synaptic ribbon. In addition, the dissociated retinal preparation remains a valuable approach for studying ribbon synapses. Lastly, excellent EM.

      Thank you very much for this appreciation of our work.

      Weaknesses:

      Heterogeneity in the spatiotemporal dynamics of Ca2+ influx was not convincingly related to ribbon size, nor was the functional relevance of Ca2+ dynamics to rod bipolars demonstrated (e.g., exocytosis to different postsynaptic targets). In addition, the study would benefit from the inclusion of the Ca2+ currents that were recorded in parallel with the Ca2+ imaging.

      Thank you for this critique. We agree that our data do not establish the relationship between ribbon size and Ca<sup>2+</sup> signal. By analogy to the hair cell literature, we believe that it is a reasonable hypothesis, but more studies will be necessary to definitively determine whether the signal relates to ribbon size or synaptic signaling. This will be addressed in future experiments.

      We have included the calcium current recorded in parallel with calcium imaging in Fig. 1, where we show a single example. We now do the same for the individual examples shown in Fig. 8A and D, bottom. The calcium imaging data shown in Figs. 2-5 and Supp. Fig. 3 are average traces; we have therefore provided the averages of the peak calcium current and the corresponding statistics. Since some ribbons in Figure 8D-F have only one reading, we have not conducted statistical analysis in this case.

      Recommendations:

      The major conclusion of the work is that within bipolar cells, heterogeneity exists between Ca2+ microdomains formed at synaptic ribbons, which is supported by the results; however, what causes this is not clear. Most of the comments below are suggestions that hopefully help the authors strengthen the association of Ca2+ domain heterogeneity with features of ribbon AZs or at least offer additional options for the authors to communicate their work.

      (1) In the current study, anatomical segregation of SRs by size does not appear to exist across the ZF rod bipolar terminal, nor has this been reported for mouse rod bipolars. In the absence of this, the current study lacks the fortuitous attributes, and thus reasoning, utilized in the hair cell (HC) studies (those cited in the current MS). Namely, the HC studies utilized the following anatomical features to compare EM, IF, and physio results: a) identified differences in ribbon synapses along a tonotopic gradient (basal to apical cochlea), b) compared ribbons on different sides of an inner HC (pillar vs. modiolar), or c) examined age-dependent changes in HC ribbons.

      Thank you for this comment. We agree that we do not show any interesting systematic relationships between ribbon size and cell position or other large-scale morphological features. We added text on page 19 to stress this (“However, in comparing our findings with studies of ribbon size heterogeneity in hair cell…”). However, to our knowledge, diversity in ribbon size has never been reported in bipolar cells. 

      (2) In the absence of intrinsic topographical segregation in ribbon size within rod bipolars, then a) the imaging data attained from dissoc cells needs to be internally as sound as possible, and b) the parameters used to define ribbon dimensions in light (LM) and electron microscopy should be as communicative/interchangeable as possible.

      Thank you for this comment. Our confocal images show a moderate correlation between ribbon size, measured as fluorescence of the ribeye binding peptide, and calcium hot spots. Similarly, SBF-SEM images demonstrate that ribbon active zone length and width show a moderate correlation. We have summarized these findings in Figure 11. Thus, as the Reviewer pointed out, our confocal and SBF-SEM findings support each other.

      (3) It is not entirely clear how the authors distinguish rod bipolars (a subset of On-bipolars) from all other ON-bipolars? The two different preparations: dissoc or intact retina, present distinct challenges. In the example presented in Supplementary Figure 2B, the PKCalpha stained bipolar has an axon that is approx. 25 um long, but the expected length should be approx. 50um based on ZF retinal anatomy and recent study on rbc1/2 (Hellevik et al BioRxiv 2023). One could argue rather that the enzymatic treatment or mechanical shear forces caused the axon to shrink. If that is the line of reasoning, then present a low mag field of view with an assortment of dissoc bipolars stained for PKCalpha, zoom in, and describe cell morphologies and their assignment as PKCa + or -. Then you can summarize how axon terminal size, axon length, and PKC staining are or aren't correlated. Based on the results, one might have to perform IF on each dissoc cell that was assayed under LM (Ca2+ imaging) and ephys to verify it's a rod bipolar. In the case of the EM, the authors refer to the terminals analyzed as rbcs because they have larger terminals and less branching than the cbs. Since these are really nice EM images, data-rich, with better resolution than I have ever seen for retinal SBF-EM, do due diligence by tracing the terminals of neighboring bcs (ignoring details within terminals just outline terminals) and make a visual presentation that illustrates that those you selected as rbs have larger terminals than cbs (this can also give of sense of the density distribution of terminal types). Is there a published ephysio on the ZF rbcs which has been correlated with morphology? The Hellevik et al BioRxiv 2023 study shows light responses but not necessary rbcs distinguished from other On-bcs.

      We have quantified the number of rod bipolar cells obtained from our isolation procedure using two approaches: (1) fixing the isolated bipolar cells and performing immunofluorescence with PKC alpha, and (2) isolating bipolar cells from Tg(vsx1: memCerulean)<sup>q19</sup> transgenic zebrafish, which label rod bipolar cell type 1 (RBC1) and which we recently obtained from Dr. Yoshimatsu (Hellevik et al., 2024). Of note, the circuitry of RBC1 has been shown to be similar to the mammalian rod bipolar cell pathway (Hellevik et al., 2024). Below, we list our findings:

      The average terminal size of fixed bipolar cells labeled with PKC alpha was 5.9 ± 0.2 µm, whereas the freshly isolated living bipolar cells used for our physiology experiments had an average terminal size of 6.3 ± 0.2 µm, and the rod bipolar cells from the Tg(vsx1: memCerulean)<sup>q19</sup> line had an average terminal size of 6.9 ± 0.2 µm. We also measured terminal size for fixed bipolar cells unlabeled with PKC alpha (3.3 ± 0.2 µm) and for unlabeled cells from the Tg(vsx1: memCerulean)<sup>q19</sup> line (4.0 ± 0.2 µm).

      In addition, we pay attention to the soma shape and dendrites, as the primary dendrite of the RBC is thick and short. Connaughton and Nelson have done a thorough analysis of morphological classification, although no measurements were given (https://onlinelibrary.wiley.com/doi/10.1002/cne.20261). Since the axon length is not retained during the isolation procedure, we do not use it as an identification marker for rod bipolar cells in our experiments.

      We re-imaged vsx1 with the DIC channel to compare the terminal sizes of fluorescently labeled RBC1 terminals with those of other BPCs in the DIC channel. Below are the images that can give a sense of the density distribution of terminal types and measurements.

      Author response image 1.

      Tracing all neighboring terminals in SBF-SEM is laborious and beyond the scope of this manuscript, but we will do full reconstructions in a future publication.

      (4) How to strengthen the description of heterogeneity within the dissoc measurements? There are two places in the LM data where heterogeneity may be relevant. The first point here is that Ribbon size (TAMRA- Ribeye binding peptide) and active zone size (Cal520HA/LA-RBP) measurements depend on labelling the ribbon/Ribeye; thus, Ribbon size and AZ size should be correlated on this basis alone. I would expect Pearson's r value to show a stronger association (r > 0.7) than what is reported in Figure 11B/C (r: 0.52 or 0.32). I would interpret a moderate to weak correlation (r < 0.5 to 0.3) as an indication that ribbons are heterogeneous (variability in Ca influx per unit ribbon size). Now to the second point, in Figure 8 and Supplementary Figure 5 there is time-signal amplitude heterogeneity. >>> My curiosity is whether signal amplitude is heterogeneous in space (ribbon size, my speculation) and in time (complex, but compare ribeye bound and free Ca2+ indicator)? It seems like the data in Figure 8 and 11 should cross over and possibly offer the authors more to say.

      We appreciate the Reviewer’s insightful observation and added a sentence at the very end of the Results section reflecting the Reviewer’s argument (“we note that a large correlation between the inferred ribbon size and active zone size…”)

      The Reviewer’s second point about the connection between heterogeneity of signal amplitude in space and in time is an interesting one as well and could be grounds for an additional investigation in the future.
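
      To put the correlation coefficients under discussion on a common scale, the short Python sketch below converts a Pearson r into the fraction of variance accounted for by a linear relationship (r squared). The r values are those quoted by the Reviewer for Figure 11B/C, plus the Reviewer's suggested threshold of 0.7; the function name is an assumption for illustration.

```python
def variance_explained(r):
    """Fraction of variance in one variable accounted for by a linear fit to the other."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation coefficient must lie in [-1, 1]")
    return r * r


for r in (0.52, 0.32, 0.7):
    print(f"r = {r:.2f} -> r^2 = {variance_explained(r):.2f}")
```

      With r = 0.52 or r = 0.32, roughly a quarter or a tenth of the variance, respectively, is accounted for by the linear relationship, which leaves most of the variability attributable to other factors.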

      (5) As the authors know, a very powerful tool for exploring Ca microdomain dynamics is to exploit the Voltage dependence of Cavs (as exemplified in the numerous HC studies that are cited). An I-V protocol would provide a valuable means to illustrate different rates of saturating the LA and HA Ca indicators. More generally, the Ca currents and associated patch clamp parameters (Gm, leak...) can tell us much about the health of the cell and provide an added metric to assess normal variability between cells. A few places in the MS currents are mentioned yet this data is missing (Figure S5 , last line: Amplitude variability between two cells with similar Ca currents.).

      Thank you for the valuable suggestion. We will include an I-V protocol across several ribbons in future experiments. We have included the calcium currents for all the calcium transient traces. We have also included the statistics to compare those currents across conditions.

      Technical comments

      (6) Since the Ribeye-Ca2+ indicator covers the entire ribbon, it will contribute to a signal gradient. The proximal signal is assumed to be closest to the base of the ribbon where presumably the Cav channels are located, and the distal signal will originate from the top (apex) of the ribbon some 200 nm from the base of the ribbon. Have you tried to measure "ribbon lengths and widths" with the HA and LA Ca indicators? My guess would be that the LA will show a gradient, and give you a better indication of the base of the ribbon; whereas the HA signal will have dimensions similar to the TAMRA-peptide.

      Due to the point spread function limitation of light microscopy, we obtained all ribbon measurements from the SBF-SEM images only.

      As a surrogate for size in light microscopy, we used ribbon fluorescence, which we expect to scale with the number of ribeye molecules in the ribbon (Figure 11B).

      (7) Normalize proximal and distal LM data to highlight kinetic differences (Fig 2-5, 8), and when describing temporal heterogeneity please use a better description that includes time, such as time-to-pk, and decay1, decay 2....

      In the current manuscript, we focus only on the amplitude, as it provides information about the number of calcium channels. We used the rise-time measurements to compare the time to reach the peak amplitude at the proximal vs. distal locations, demonstrating that proximal calcium signals reach the peak faster since the calcium channels are located beneath the ribbon.

      We tried to perform fittings to the individual traces. Since they are too noisy to pick out true kinetic differences between ribbons, we would need to average several traces from each ribbon. We plan to apply our high-resolution approach established in this paper to a longer stimulus and perform the fittings as per the Reviewer’s advice for a future paper.

      We now describe on pages 6-7 the two decay components for data in Figs. 2 and 3.

      (8) Why not measure ribbon length in EM as done in confocal and then compare lengths from LM and EM. In Figure S8, you have made a nice presentation of AZ Area from EM. Make similar plots for EM ribbon length (and width?), and compare the distributions to Figure 11 LM data. Maybe use other statistical descriptions like Coeff of Var or look for different populations by using multi-distribution fits. If the differences in length or area (EM data) can be segregated into short and long distances, then a similar feature might arise from the LM data. If no such morphological segregation exists, then the heterogeneity in Ca microdomains may arise from variable Cav channel density or gating, Ca buffer, etc.

      Due to the point spread function limitation of light microscopy, ribbon dimensions cannot be reliably measured in light microscopy. As a surrogate, we used the total fluorescence of the ribbon, which should correlate with the number of ribeye molecules in the ribbon. To obtain ribbon dimensions, we used measurements from the SBF-SEM images only. We summarized the distribution of ribbon width and length in Figures 11C and 11D. The distribution of the active zone size is summarized in Supplementary Figure 8. Pearson’s correlation coefficients are positive but weak, suggesting that multiple mechanisms are likely to contribute to heterogeneity in the local calcium signals, as the Reviewer pointed out.
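
      The coefficient of variation suggested by the Reviewer is a simple, unit-free way to compare the spread of two distributions measured on different scales. The Python sketch below shows the calculation on hypothetical values; the numbers and names are placeholders, not measurements from the study.

```python
import statistics


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean (dimensionless)."""
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("coefficient of variation is undefined for a zero mean")
    return statistics.stdev(values) / mean


# Hypothetical ribbon lengths (arbitrary units) and fluorescence intensities.
toy_lengths = [180.0, 220.0, 250.0, 310.0, 400.0]
toy_fluorescence = [0.8, 1.1, 1.0, 1.6, 2.1]

print("CV of lengths:", round(coefficient_of_variation(toy_lengths), 2))
print("CV of fluorescence:", round(coefficient_of_variation(toy_fluorescence), 2))
```

      Because the measure is normalized by the mean, it allows the relative variability of an EM-derived dimension and a fluorescence-derived surrogate to be compared directly, even though their units differ.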

      (9) Again, the quality of the EM data is great, and sufficient to make the assignment of SVs to different pools, as you have done in Fig S1. My only complaint is that the Ultrafast pool as indicated in the schematic of S1A seems to have a misassignment with respect to the green SV that is 15 nm from the PM. In the original Mennerick and Matthews 1996 study, the UF pool emptied in ~1msec. The morphological correlate for the UF has been assumed to be SVs touching the plasma membrane. 15 nm away is about 14 nm too far to be in the UF.

      Thank you for pointing that out. We have updated the vesicle labeling in Supplementary Figure 1 and Main Figure 4.

      Reviewer #3 (Public review):

      Summary:

      In this study, the authors have developed a new Ca indicator conjugated to the peptide, which likely recognizes synaptic ribbons, and have measured microdomain Ca near synaptic ribbons at retinal bipolar cells. This interesting approach allows one to measure Ca close to transmitter release sites, which may be relevant for synaptic vesicle fusion and replenishment. Though microdomain Ca at the active zone of ribbon synapses has been measured by Hudspeth and Moser, the new study uses the peptide recognizing synaptic ribbons, potentially measuring the Ca concentration relatively proximal to the release sites.

      Thank you very much for this positive evaluation of our work.

      Strengths:

      The study is in principle technically well done, and the peptide approach is technically interesting, which allows one to image Ca near the particular protein complexes. The approach is potentially applicable to other types of imaging.

      Thank you very much for this appreciation.

      Weaknesses:

      Peptides may not be entirely specific, and the genetic approach tagging particular active zone proteins with fluorescent Ca indicator proteins may well be more specific. I also feel that "Nano-physiology" is overselling, because the measured Ca is most likely the local average surrounding synaptic ribbons. With this approach, nobody knows about the real release site Ca or the Ca relevant for synaptic vesicle replenishment. It is rather "microdomain physiology" which measures the local Ca near synaptic ribbons, relatively large structures responsible for fusion, replenishment, and recycling of synaptic vesicles.

      The peptide approach has been used fairly extensively in the ribbon synapse field, and the evidence that it efficiently labels the ribbon is well established. However, we do acknowledge that the peptide is in equilibrium with a cytoplasmic pool, so some of the signal arises from this cytoplasmic pool. The alternative of a genetically encoded Ca-indicator concatenated to a ribbon protein would not have this problem, but would offer less flexibility in changing calcium indicators. We believe both approaches have their merits, each with separate advantages and disadvantages.

      As for the nano vs. micro argument, we certainly do not want to suggest that we are measuring the same nano-domains, on the spatial scale of 10s of nanometers, that drive neurotransmitter release, but we do believe we are in the sub-micrometer -- 100s of nm -- range. We chose the term based on the usage by other authors to describe similar measurements (Neef et al., 2018; https://doi.org/10.1038/s41467-017-02612-y), but we see the reviewer’s point.

      Recommendations:

      I have no recommendation for additional experiments. However, the statement of "nanophysiology" is too much, and the authors should tone down the ms recognizing some caveats.

      As we mention above, we chose the term based on the usage by other authors to describe similar measurements, and we do believe that we achieve resolution of a few hundred nanometers, and therefore would prefer to keep the current title of the manuscript. For example, Figure 5E shows that, with ribeye-bound low-affinity calcium indicator, the proximal calcium signals were preserved in the presence of BAPTA, rising and decaying abruptly, as expected for a nanodomain Ca<sup>2+</sup> elevation. Thus, we believe that this measurement in particular describes a nanodomain-scale signal. However, we acknowledge that we are not currently able to resolve the spatial distribution of Ca<sup>2+</sup> signals with a spatial resolution of 10s of nanometers.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews: 

      Reviewer #1 (Public Review):

      This study delineates an important set of uninjured and injured periosteal snRNAseq data that provides an overview of periosteal cell responses to fracture healing. The authors also took additional steps to validate some of the findings using immunohistochemistry and transplantation assays. This study will provide a valuable publicly accessible dataset to reexamine the expression of the reported periosteal stem and progenitor cell markers.

      Strengths: 

      (1) This is the first single-nuclei atlas of periosteal cells that are obtained without enzymatic cell dissociation or targeted cell purification by FACS. This integrated snRNAseq dataset will provide additional opportunities for the community to revisit the expression of many periosteal cell markers that have been reported to date.

      (2) The authors delved further into the dataset using cutting-edge algorithms, including CytoTrace, SCENIC, Monocle, STRING, and CellChat, to define the potential roles of identified cell populations in the context of fracture healing. These additional computation analyses generate many new hypotheses regarding periosteal cell reactions.

      (3) The authors also sought to validate some of the computational findings using immunohistochemistry and transplantation assays to support the conclusion.

      Weaknesses: 

      (1) The current snRNAseq datasets contain only a small number of nuclei (1,189 nuclei at day 0, 6,213 nuclei on day 0-7 combined). It is unclear if the number is sufficient to discern subtle biological processes such as stem cell differentiation. 

      We analyzed a total of 6,213 nuclei from uninjured periosteum and fracture calluses at 3 stages of bone healing. We were able to describe 11 distinct cell populations, revealing the diversity of cell populations in uninjured periosteum and post-injury, including rare cell types in the fracture environment such as Schwann cells, adipocytes and pericytes. The number of nuclei was sufficient to perform extensive analysis using a combination of cutting-edge algorithms. We agree that more nuclei would allow more in-depth analyses of cell fate transitions and rare populations, such as pericytes and Schwann cells. However, we concentrated here on SSPC/fibrogenic cells that are well represented in our dataset. The robustness of our study is also reinforced by the analysis of 4 successive time points to define the SSPC/fibrogenic cell trajectories. Our validations using immunohistochemistry and transplantation assays also confirmed that our dataset is sufficient to define cell trajectories. There is no clear consensus on the number of cells needed to perform sc/snRNAseq analyses, as it depends on the cell types analyzed and the fold changes in gene expression. Previously reported single-cell datasets containing a lower number of cells reached major conclusions including SSPC identification, cell differentiation trajectories and differential gene expression (658 cells in Debnath et al. 2018, 300 in Ambrosi et al. 2021, and around 175 in Remark et al. 2023).

      (2) The authors' designation of Sca1+CD34+ cells as SSPCs is not sufficiently supported by experimental evidence. It will be essential to demonstrate stem/progenitor properties of Sca1+CD34+ cells using independent biological approaches such as CFU-F assays. In addition, the putative lineage trajectory of SSPCs toward IIFCs, osteoblasts, and chondrocytes remains highly speculative without concrete supporting data. 

      We performed additional analyses to further support that Sca1+ SSPCs display stem/progenitor properties. We performed CFU assays with Prx1-GFP+ SCA1+ and Prx1-GFP+ SCA1- periosteal cells (Figure 2F-G). We showed that Prx1-GFP+ SCA1+ cells display significantly increased CFU potential compared to Prx1-GFP+ SCA1- cells. In addition, we isolated and transplanted Prx1-GFP+ Sca1+ and Prx1-GFP+ Sca1- periosteal cells at the fracture site of wild-type mice (Figure 2H). Only Sca1+ cells contributed to callus formation, reinforcing that Sca1+ cells are the SSPC population mediating bone repair.

      The differentiation trajectory of SSPCs presented in our study is supported by a combination of bioinformatic analyses and in vivo validation:

      - snRNAseq allowed us to identify the different populations in the uninjured periosteum. In silico, in vitro and in vivo analyses all point to Sca1+ cells as the SSPC population (Fig 2E-G).

      - At day 3 post-fracture, we did not detect Sca1+ cells in the callus (Fig 4 – Supplementary figure 2). Instead, we observed the appearance of a new population, IIFCs. This population clustered along SSPCs and pseudotime analyses indicate that SSPCs can differentiate into IIFCs (Fig 5B). We confirmed the ability of Sca1+ pSSPCs to form IIFCs, by grafting them in the fracture callus and assessing their fibrogenic fate at day 5 post-fracture (Fig 6B).

      - In silico, we observed that IIFCs clustered along osteogenic and chondrogenic cells. The pseudotime trajectory suggests that IIFCs can differentiate into both lineages (Fig 5B-C). This is consistent with the progressive expression of osteochondrogenic genes observed in IIFCs (Fig 5C, Fig 8A, C, E). In vivo, we observed the progressive expression of Runx2 and Sox9 by IIFCs undergoing differentiation (Fig 6A). We now show that IIFCs are not undergoing apoptosis, indicating that these cells further differentiate (Fig 7 – Supplementary figure 2). To functionally assess the osteochondrogenic potential of IIFCs, we used a transplantation assay and showed that Prx1-GFP+ IIFCs isolated at day 3 post-fracture form cartilage and bone when transplanted at the fracture site of wild-type mice (Fig 6C).

      We would like to emphasize the robustness of the bioinformatic analyses performed in our study. First, we used datasets from different time points post-fracture to capture the true temporal progression of cell populations in the fracture callus. We used a large combination of tools shown to be reliable in many studies (Julien et al. 2021; Matsushita et al. 2020; Debnath et al. 2018; Baccin et al. 2020; Junyue Cao et al. 2019; Zhong et al. 2020), and all tools converge on the same trajectory. To further show the relevance of pseudotime in our model, we illustrated the distribution of the cell populations by time point (Fig. 5D). We observe a parallel between the time points and the pseudotime, reinforcing that the pseudotime trajectory reflects the timing of SSPC differentiation. Overall, the combined in silico, in vitro and in vivo analyses support that Sca1+ Pi16+ cells are the periosteal SSPC population, specifically represented in the uninjured dataset. In response to bone fracture, these SSPCs give rise to IIFCs that are specifically represented in the intermediate stages (days 3 and 5) prior to osteochondrogenic differentiation.

      (3) The designation of POSTN+ clusters as injury-induced fibrogenic cells (IIFCs) is not fully supported by the presented data. The authors' snRNAseq datasets (Figure 1d) demonstrate that there are many POSTN+ cells prior to injury, indicating that POSTN+ cells are not specifically induced in response to injury. It has been widely recognized that POSTN is expressed in the periosteum without fracture. This raises a possibility that the main responder of fracture healing is POSTN+ cells, not SSPCs as they postulate. The authors cannot exclude the possibility that Sca1+CD34+ cells are mere bystanders and do not participate in fracture healing. 

      IIFCs are a population of cells that express high levels of ECM related genes, including Postn, Aspn and collagens. We did not claim that Postn expression is specific to IIFCs. While Postn is detected in the uninjured periosteum, snRNAseq analyses and RNAscope experiments showed that the expression of Postn is limited to a small number of cells in the cambium layer of the periosteum (Fig 4B, Figure 4 – Supplementary figure 1B). These Postn-expressing cells in the uninjured periosteum are not SSPCs, as they do not co-express/co-localize with Pi16+ and Sca1+ cells detected in the fibrous layer (Fig 4, Figure 4 – Supplementary figure 1A, Figure 6 – Supplementary figure 1). These Postn-expressing cells are undergoing osteogenic differentiation as shown by the correlation between Runx2 and Postn expression (Fig. 4 – Supplementary Figure 1C). After fracture, we observed a strong increase in ECM-related gene expression and specifically in the IIFC population. We now show the strong increase of Postn expression after injury (Fig. 4 – Supplementary Figure 1D-E, Figure 6 – Supplementary figure 1E).

      As mentioned in our response above, we now show that SCA1+ cells form cartilage and bone after fracture, while SCA1- cells (including the POSTN+ population) from the uninjured periosteum did not contribute. These data reveal that Sca1+ CD34+ cells are the main SSPC population mediating bone healing and that POSTN+ IIFCs are a transient stage of SSPC differentiation. We added the following text to the result section: “Pi16-expressing SSPCs are located within the fibrous layer, while we observed few POSTN+ cells in the cambium layer (Fig. 4 – Supplementary Fig. 1A). Postn expression is weak in uninjured periosteum and is limited to differentiating cells. Postn expression is strongly increased in response to fracture, specifically in IIFCs (Fig. 4 – Supplementary Fig. 1B-E).”

      (4) Detailed spatial organization of Sca1+CD34+ cells and POSTN+ cells in the uninjured periosteum with respect to the cambium layer and the fibrous layer is not demonstrated. 

      We performed RNAscope experiments to locate Pi16-expressing and Postn-expressing cells in the uninjured periosteum. We observed that Pi16-expressing cells are in the external fibrous layer of the periosteum while Postn-expressing cells are located along the cortex in the cambium layer. The data are added in Fig 4B and Fig. 4- Supplementary Figure 1 and mentioned in the result section “Pi16-expressing SSPCs were located within the fibrous layer, while Postn-expressing cells were found in the cambium layer and corresponded to Runx2-expressing osteogenic cells (Fig. 4 – Supplementary Fig. 1A-C).”.

      (5) Interpretation of transplantation experiments in Figure 5 is not straightforward, as the authors did not demonstrate the purity of Prx1Cre-GFP+SCA1+ cells and Prx1Cre-GFP+CD146- cells to pSSPCs and IIFCs, respectively. It is possible that these populations contain much broader cell types beyond SSPCs or IIFCs.  

      We agree with the reviewer that our methodology for cell transplantation required more justification and validation. We decided to use a transgenic mouse line to be able to trace the cells in vivo after grafting. Prx1 marks limb mesenchyme during development and the Prx1Cre mouse model allows labeling of all SSPCs contributing to callus formation. Therefore, we used Prx1Cre; R26mTmG mice as donors for SSPC and IIFC isolation (Duchamp de Lageneste et al. 2018; Logan et al. 2002). Prx1 does not mark immune and endothelial cells but can label pericytes and fibroblastic populations (Duchamp de Lageneste et al. 2018; Logan et al. 2002; Julien et al. 2021). In the uninjured periosteum, Sca1 (Ly6a) is only expressed by SSPCs and endothelial cells (Fig 3-Supplementary figure 2, Fig 6-Supplementary figure 1). We sorted GFP+ Sca1+ cells from the uninjured periosteum of Prx1Cre; R26mTmG mice to isolate only SSPCs and exclude endothelial cells and pericytes. For IIFCs, we isolated cells at day 3 post-fracture, as in our snRNAseq data, we detected IIFCs but no SSPCs, chondrocytes or osteoblasts at this stage of repair. To eliminate Prx1-derived pericytes, we sorted GFP+CD146- cells, as CD146 is specifically expressed by pericytes. We added Figure 6-supplementary Figure 1 to better illustrate the expression of Prx1, SCA1 (Ly6a) and CD146 (Mcam) in the uninjured and day 3 post-fracture datasets. We further demonstrate the purity of SSPC and IIFC isolation by qPCR on sorted GFP+ Sca1+ cells from uninjured periosteum and GFP+ CD146- cells from day 3 post-fracture periosteum and hematoma and confirmed the absence of contamination by other cell populations (Figure 6-Supplementary figure 1E).

      We made the following changes in the text: “To functionally validate the steps of pSSPC activation, we isolated SCA1+ GFP+ pSSPCs from Prx1Cre; R26mTmG mice, excluding endothelial cells, and grafted them at the fracture site of wild-type hosts” and “we isolated GFP+ CD146- from the fracture callus of Prx1Cre; R26mTmG mice at day 3 post fracture, that correspond to IIFCs without contamination by pericytes (CD146+ cells) (Fig. 6C, Figure 6 – Supplementary Fig.1)”.

      Reviewer #2 (Public Review):

      Summary: 

      The authors described cell type mapping was conducted for both WT and fracture types. Through this, unique cell populations specific to fracture conditions were identified. To determine these, the most undifferentiated cells were initially targeted using stemness-related markers and CytoTrace scoring. This led to the identification of SSPC differentiating into fibroblasts. It was observed that the fibroblast cell type significantly increased under fracture conditions, followed by subsequent increases in chondrocytes and osteoblasts.

      Strengths: 

      This study presented the injury-induced fibrogenic cell (IIFC) as a characteristic cell type appearing in the bone regeneration process and proposed that the IIFC is a progenitor undergoing osteochondrogenic differentiation. 

      Weaknesses: 

      This study endeavored to elucidate the role of IIFC through snRNAseq analysis and in vivo observation. However, such validation alone is insufficient to confirm that IIFC is an osteochondrogenic progenitor, and additional data presentation is required.  

      As mentioned in the response to Reviewer 1, the differentiation trajectory of SSPCs presented in our study is supported by a combination of bioinformatic analyses and in vivo validation:

      - snRNAseq allowed us to identify the different populations in the uninjured periosteum. In silico, in vitro and in vivo analyses altogether showed that Sca1+ cells are the SSPC population (Fig 2E-G).

      - At day 3 post-fracture, we did not detect Sca1+ cells in the callus (Fig 4 – Supplementary figure 2). Instead, we observed the appearance of a new population, IIFCs. This population clustered along SSPCs and pseudotime analyses indicate that SSPCs can differentiate into IIFCs (Fig 5B). We confirmed the ability of Sca1+ SSPCs to form IIFCs, by grafting them in the fracture callus and assessing their fate at day 5 post-fracture (Fig 6B).

      - In silico, we observed that IIFCs clustered along osteogenic and chondrogenic cells. The pseudotime trajectory suggests that IIFCs can differentiate into both lineages (Fig 5B-C). This is coherent with the progressive expression of osteochondrogenic genes observed in IIFCs (Fig 5C, Fig 8A, C, E). In vivo, we observed the progressive expression of Runx2 and Sox9 by IIFCs undergoing differentiation (Fig 6A). We now show that IIFCs are not undergoing apoptosis, indicating that these cells further differentiate (Fig 7 – Supp 2). To functionally assess the osteochondrogenic potential of IIFCs, we used a transplantation assay and showed that Prx1-GFP+ IIFCs from day 3 post-fracture form cartilage and bone when transplanted at the fracture site of wild-type mice (Fig 6C).

      We would like to emphasize the robustness of the bioinformatic analyses performed in our study. First, we used datasets from different time points post-fracture to capture the true temporal progression of cell populations in the fracture callus. We used a large combination of tools shown to be reliable in many studies (Julien et al. 2021; Matsushita et al. 2020; Debnath et al. 2018; Baccin et al. 2020; Junyue Cao et al. 2019; Zhong et al. 2020), and all tools converge on the same trajectory. To further show the relevance of pseudotime in our model, we illustrate the distribution of the cell populations by time point (Fig. 5D). We can observe a parallel between the time points and the pseudotime, reinforcing that the pseudotime trajectory reflects the timing of SSPC differentiation. Overall, the combined in silico, in vitro and in vivo analyses strongly support that Sca1+ Pi16+ cells are the periosteal SSPC population, specifically represented in the uninjured dataset. In response to bone fracture, these SSPCs give rise to IIFCs that are specifically represented in the intermediate stages (days 3 and 5) prior to osteochondrogenic differentiation.

      We made the following changes in the text:

      - Line 81-87: “We performed in vitro CFU assays with sorted GFP+SCA1+ and GFP+SCA1- cells isolated from the periosteum of Prx1Cre; R26mTmG mice, as Prx1 labels all SSPCs contributing to the callus formation1. Prx1-GFP+ SCA1+ showed increased CFU potential, confirming their stem/progenitor property (Fig 2F-G). Then, we grafted Prx1-GFP+ SCA1+ and Prx1-GFP+ SCA1- periosteal cells at the fracture site of wild-type mice. Only SCA1+ cells formed cartilage and bone after fracture indicating that SCA1+ cells correspond to periosteal SSPCs with osteochondrogenic potential (Fig 2H).”

      - Line 120-122: “We did not detect Pi16-expressing SSPCs, consistent with the absence of cells expressing SSPC markers in day 3 snRNAseq dataset compared to uninjured periosteum (Fig. 4 – Supplementary Figure 2).”

      - Line 170-172: “Only a small subset of IIFCs undergo apoptosis, further supporting that IIFCs are maintained in the fracture environment giving rise to osteoblasts and chondrocytes (Fig. 7 – Supplementary Figure 2).”

      - Line 277-278: “Following this unique fibrogenic step, IIFCs do not undergo cell death but undergo either osteogenesis or chondrogenesis”

      - Line 281-283: “During bone repair, this initial fibrogenic process is an integral part of the SSPC differentiation process, and a transitional step prior to osteogenesis and chondrogenesis.”

      Reviewer #3 (Public Review): 

      In this manuscript, the authors explored the transcriptional heterogeneity of the periosteum with single nuclei RNA sequencing. Without prior enrichment of specific populations, this dataset serves as an unbiased representation of the cellular components potentially relevant to bone regeneration. By describing single-cell cluster profiles, the authors characterized over 10 different populations in combined steady state and post-fracture periosteum, including stem cells (SSPC), fibroblast, osteoblast, chondrocyte, immune cells, and so on. Specifically, a developmental trajectory was computationally inferred using the continuum of gene expression to connect SSPC, injury-induced fibrogenic cells (IIFC), chondrocyte, and osteoblast, showcasing the bipotentials of periosteal SSPCs during injury repair. Additional computational pipelines were performed to describe the possible gene regulatory network and the expected pathways involved in bone regeneration. Overall, the authors provided valuable insights into the cell state transitions during bone repair and proposed sets of genes with possible involvements in injury response. 

      While the highlights of the manuscript are the unbiased characterization of periosteal composition, and the trajectory of SSPC response in bone fracture response, many of the conclusions can be more strongly supported with additional clarifications or extensions of the analysis.  

      (1) As described in the method section, both the steady-state data and full dataset underwent integration before dimensional reduction and clustering. It would be appreciated if the authors could compare the post-integration landscapes of uninjured cells between steady state and full dataset analysis. Specifically, fibroblasts were shown in Figure 1C and 1E, and such annotations did not exist in Figure 2B. Will it be possible that the original 'fibroblasts' were part of the IIFC population? 

      As suggested, we now identified the fibroblast population from the uninjured periosteum in the integration of datasets from all time points (Figure 5B and Fig. 5 – Supplementary Figure 2). We identified 4 fibroblast populations in the uninjured periosteum: Luzp2+, Cldn1+, Hsd11b1+ and Csmd1+ fibroblasts. Luzp2+ and Cldn1+ fibroblasts are clustering distinctly from the other populations in the integrated dataset. Hsd11b1+ fibroblasts blend with SSPCs and IIFCs in the integrated dataset probably due to the low cell number. Finally, Csmd1+ fibroblasts are clustering at the interface between SSPCs and IIFCs likely because they correspond to differentiating cells both in the uninjured periosteum and in response to fracture. We modified the resolution of clustering in our subset dataset, in order to represent Luzp2+ and Cldn1+ fibroblasts as an isolated cluster (Figure 5B, cluster 10). In addition, both pseudotime (Fig. 5B) and gene regulatory network analyses (Fig. 7D), show that the fibroblast populations are distinct from the activation trajectory of SSPCs. We added the following sentence to the text “Fibroblasts from uninjured periosteum (Hsd11b1+, Cldn1+ and Luzp2+ cells corresponding to cluster 10 of Fig. 5B) clustered separately from the other populations, suggesting the absence of their contribution to bone healing.”

      (2) According to Figure 2, immune cells were taking a significant abundance within the dataset, specifically during days 3 & 5 post-fracture. It will be interesting to see the potential roles that immune cells play during bone repair. For example, what are the biological annotations of the immune clusters (B, T, NK, myeloid cells)? Are there any inflammatory genes or related signals unregulated in these immune cells? Do they interact with SSPC or IIFC during the transition?   

      In this manuscript, we report the overall dataset and focus our analyses on the response of SSPCs to injury and their differentiation trajectories. We did not include detailed analyses of the immune cell populations, which are beyond the scope of this manuscript and are part of another study (Hachemi et al., bioRxiv, 2024).

      (3) The conclusion of Notch and Wnt signaling in IIFC transition was not sufficiently supported by the analysis presented in the manuscript, which was based on computational inferences. It will be great to add in references supporting these claims or provide experimental validations examining selected members of these pathways.

      The role of Wnt and Notch in bone repair has been widely studied and both signaling pathways are known to be regulators of SSPC differentiation (Lee et al. 2021; Matthews et al. 2014; Novak et al. 2020; Wang et al. 2016; Kraus et al. 2022; Dishowitz et al. 2012; Junjie Cao et al. 2017; Matsushita et al. 2020; Steven Minear et al. 2010; Steve Minear et al. 2010; Kang et al. 2007; Komatsu et al. 2010). It was previously shown that Notch inactivation at early stages of repair leads to bone non-union while Notch inactivation in chondrocytes and osteoblasts does not significantly affect healing, confirming its role in SSPC differentiation before osteochondral commitment (Wang et al. 2016). Wnt was shown to be a critical driver of osteogenesis (Matsushita et al. 2020; Steve Minear et al. 2010; Steven Minear et al. 2010; Kang et al. 2007; Komatsu et al. 2010), as Wnt inhibition alters bone formation and Wnt overactivation increases bone formation (Pinzone et al. 2009; Balemans and Van Hul 2007). The role of Wnt is specific to osteogenic engagement as Wnt inhibition promotes chondrogenesis (Hsieh et al. 2023; C.-L. Wu et al. 2021; Ruscitto et al. 2023). A study by Lee et al. recently confirmed the successive activation and crosstalk of Notch and Wnt pathways during osteogenic differentiation of SSPCs during bone healing (Lee et al. 2021). They showed a peak of Notch activation at day 3 post-injury followed by a progressive decrease that parallels an increase of Wnt signaling inducing osteogenic differentiation. These studies correlate with the sequential activation of Notch and Wnt observed in our snRNAseq analyses. Our analyses now reveal how this sequential activation of Notch and Wnt relates to the fibrogenic and osteogenic phases of SSPC differentiation, respectively. We clarified this in the discussion and added the references above to support our claims.

      Recommendations for the authors: 

      Reviewer #1 (Recommendations For The Authors): 

      (1) The manuscript is well-written overall. However, the authors often oversimplify outcomes and overstate the results. Some of the statements (delineated below) need to be recalibrated to be in line with the presented data. 

      In addition to the suggested conclusions, we also toned down the following ones to avoid overstating our results:

      Line 24: suggesting a crucial paracrine role of this transient IIFC population

      Line 227: suggesting their central role in mediating cell interactions after fracture

      Line 243: IIFCs produce paracrine factors that can regulate SSPCs

      - Line 77 (86): The authors should add "might" before "correspond to". 

      We provided new sets of data, including CFU experiments and a transplantation assay, to reinforce our conclusion. We replaced “correspond to” with “encompass”.

      - Line 102: SSPCs are obviously not "absent" in day 3 snRNAseq (Figure 2d). The percentage dropped (only) 75%, according to Figure 2e, which is far from disappearance. Overall, immunohistochemical staining is often dichotomous with snRNAseq designations. The authors should more carefully describe the results. 

      We agree that our original statement may not reflect the data shown, as we observe a strong decrease in the percentage of cells in SSPC clusters but still detect a few cells in these clusters. However, when we looked at the presence of Sca1+ Pi16+ cells at different time points, we confirmed the absence of cells expressing SSPC signature genes (Sca1, Pi16, Cd34) at day 3 post-injury. Due to the clustering resolution of the combined integration, some cells in the SSPC clusters might not be Sca1+ Pi16+. We now show these results in Fig. 4 – Supplementary Figure 2. We changed the text accordingly (line 120): “We did not detect Pi16-expressing SSPCs, consistent with the absence of cells expressing SSPC markers in the day 3 snRNAseq dataset compared to uninjured periosteum (Fig. 4 – Supplementary Figure 2)”.

      - Line 134: The authors need to clearly state that GFP+IIFCs were isolated based on Prx1CreGFP+CD146-. The authors did not clearly demonstrate the relationship between POSTN+ cells and CD146- cells, which poses concerns about the interpretation of transplantation experiments. 

      As mentioned above in response to reviewer 1-public review, we have clarified and provided additional information on our strategy to isolate SSPCs and IIFCs. We used the Prx1Cre; R26mTmG mice to mark all SSPCs and their derivatives with the GFP reporter in order to trace these populations after cell grafting. In the uninjured periosteum, Sca1 (Ly6a) is only expressed by SSPCs and endothelial cells. We sorted GFP+Sca1+ cells to exclude endothelial cells. For IIFCs, we isolated cells at day 3 post-fracture, as in our snRNAseq data, we detect IIFCs but no SSPCs, chondrocytes or osteoblasts at this time point. However, we also detected pericytes that can be Prx1-derived. To eliminate potential pericyte contamination, we sorted GFP+ CD146- cells, as CD146 is specifically expressed by pericytes. We added Figure 6-supplementary Figure 1 to better illustrate the expression of Prx1, SCA1 (Ly6a) and CD146 (Mcam) in the uninjured and day 3 post-fracture datasets. We further demonstrate the purity of SSPC and IIFC isolation by qPCR on sorted GFP+ Sca1+ cells from uninjured periosteum and GFP+ CD146- cells from day 3 post-fracture periosteum and hematoma and confirmed the absence of contamination by other cell populations (Figure 6-Supplementary figure 1E). We made the following changes in the text (line 153): “To functionally validate the steps of pSSPC activation, we isolated SCA1+ GFP+ pSSPCs from Prx1Cre; R26mTmG mice, excluding endothelial cells, and grafted them at the fracture site of wild-type hosts” and “we isolated GFP+ CD146- from the fracture callus of Prx1Cre; R26mTmG mice at day 3 post fracture, that correspond to IIFCs without contamination by pericytes (CD146+ cells) (Fig. 6C, Figure 6 – Supplementary Fig.1)”.

      - Line 211: It is obvious from Figure 8F that ligand expression was not "specific" to the IIFC phase.

      The data only shows a slight enrichment of ligand score. 

      We corrected the text to “ligand expression was increased during the IIFC phase”.

      (2) Some of the computational predictions are incongruent with the known lineage trajectory. For example, in vivo lineage tracing experiments, including but not limited to, PLoS Genet. 2014. 10:e1004820, demonstrate that some of the chondrocytes within fracture callus can differentiate into osteoblasts. This is incompatible with the authors' conclusion that osteoblasts and chondrocytes represent two different terminal stages of cell differentiation in fracture healing. How do the authors reconcile this apparent inconsistency? 

      In this manuscript, we generated datasets corresponding to the initial stages of bone repair until day 7 post-injury. Therefore, our analyses encompass SSPC activation stages and engagement into osteogenesis and chondrogenesis. The results show that a portion of osteoblasts in the fracture callus differentiate directly from IIFCs via intramembranous ossification. The reviewer is correct to mention that osteoblasts have also been shown to derive from transdifferentiation of chondrocytes, which occurs at later stages of repair during the active phase of endochondral ossification (Julien et al. 2020; Aghajanian and Mohan 2018; Zhou et al. 2014; Hu et al. 2017). This process of chondrocyte to osteoblast transdifferentiation is not represented in our integrated dataset and may require adding later time points. However, when we analyzed the days 5 and 7 datasets independently of days 0 and 3, we were able to identify a cluster of hypertrophic chondrocytes (expressing Col10a1) connecting the clusters of chondrocytes and osteoblasts. This suggests that in this cluster, hypertrophic chondrocytes are undergoing transdifferentiation into osteoblasts, as shown in Author response image 1. Additional time points are needed in a future study to perform in-depth analyses of chondrocyte transdifferentiation.

      Author response image 1.

      Periosteum-derived chondrocytes undergo cartilage to bone transformation. A. UMAP projection of the subset of SSPCs, IIFCs, osteoblasts and chondrocytes in the integration of days 5 and 7 post-fracture datasets. B. Feature plots of Acan, Col10a1 and Ibsp expression.  C. UMAP projection separated by time points. D. Percentage of cells in the hypertrophic/differentiating chondrocyte cluster.

      (3) The authors did not cite some of the studies that described the roles of Notch signaling in fracture healing, for example, J Bone Miner Res. 2014. 29:1283-94. The authors should test the specificity of Notch signaling activities to IIFCs (POSTN+ cells) in vivo. 

      The role of Notch in the activation of SSPCs during bone repair has been investigated in several studies (Lee et al. 2021; Matthews et al. 2014; Novak et al. 2020; Wang et al. 2016; Kraus et al. 2022; Dishowitz et al. 2012; Junjie Cao et al. 2017). Notch dynamics were previously described, with a peak at day 3 post-injury before a reduction when cells engage in osteogenesis and chondrogenesis (Lee et al. 2021; Dishowitz et al. 2012; Matthews et al. 2014). Notch plays a role in the early steps of SSPC activation prior to osteochondral differentiation, as Notch inactivation in chondrocytes and osteoblasts does not affect bone repair (Wang et al. 2016). We added the references listed above to emphasize the correlation between our results and previous reports on the role of Notch and made changes in the discussion.

      Reviewer #2 (Recommendations For The Authors): 

      Suggestions 

      (1) This research utilized snRNA seq for the basic hypothesis formation; however, the number of nuclei acquired was quite limited. Therefore, please explain the rationale for employing snRNA seq instead of scRNA seq, which includes cytoplasm, and additionally provide the markers used for cell type mapping in the scRNA analysis.  

      As mentioned in our response to reviewer #1 above, we analyzed a total of 6,213 nuclei from uninjured periosteum and fracture calluses at 3 stages of bone healing. We were able to describe 11 distinct cell populations including rare cell types in the fracture environment such as Schwann cells, adipocytes and pericytes. The number of nuclei was sufficient to perform extensive analysis using a combination of cutting-edge algorithms. We agree that more nuclei would allow more in-depth analyses of cell fate transitions and rare populations, such as pericytes and Schwann cells. However, we concentrated here on SSPC/fibrogenic cells that are well represented in our dataset. The robustness of our study is also reinforced by the analysis of 4 successive time points to define the SSPC/fibrogenic cell trajectories. Our validations using immunohistochemistry and transplantation assays also confirmed that our dataset is sufficient to define cell trajectories. There is no clear consensus on the number of cells needed to perform scRNAseq analyses, as it depends on the cell types analyzed and the fold changes in gene expression. Previously reported single cell datasets containing a lower number of cells reached major conclusions including SSPC identification, cell differentiation trajectories and differential gene expression (658 cells in Debnath et al. 2018, 300 in Ambrosi et al. 2021, around 175 in Remark et al. 2023).

      Several studies have shown that snRNAseq provides data quality equivalent to scRNAseq in terms of cell type identification, number of detected genes and downstream analyses (Selewa et al. 2020; Wen et al. 2022; Ding et al. 2020; H. Wu et al. 2019; Machado et al. 2021). While snRNAseq does not allow the detection of cytoplasmic RNA, there are several advantages to using this technique:

      (1) better representation of the cell types. To perform scRNAseq, a step of enzymatic digestion is needed. This usually leads to an overrepresentation of some cell types loosely attached to the ECM (immune cells, endothelial cells) and a reduced representation of cell types strongly attached to the ECM, such as chondrocytes and osteoblasts. In addition, large or multinucleated cells like hypertrophic chondrocytes and osteoclasts are too big to be sorted and encapsulated using 10X technology. Here, we optimized a protocol to mechanically isolate nuclei from dissected tissues that allows us to capture the diversity of cell types in periosteum and fracture callus.

      (2) higher recovery of nuclei. We performed both isolation of cells and nuclei from periosteum in our study and observed that nuclei extraction is the most efficient way to isolate cells from the periosteum and the fracture callus.

      (3) reduction of isolation time and cell stress. Previous studies showed that enzymatic digestion causes cell stress and induces stem cell activation (Machado et al. 2021; van den Brink et al. 2017). Therefore, we decided to perform snRNAseq to analyze the transcriptome of the intact periosteum without digestion-induced bias.

      We added this sentence in the result section: “Single nuclei transcriptomics was shown to provide results equivalent to single cell transcriptomics, but with better cell type representation and reduced digestion-induced stress response (Selewa et al. 2020; Wen et al. 2022; Ding et al. 2020; H. Wu et al. 2019; Machado et al. 2021)”.

      The list of genes used for cell type mapping is presented in Figure 3 – Supplementary figure 1. We added a detailed dot plot as Figure 3 – Supplementary figure 2.

      (2) During the fracture healing process of long bones, the influx of fibroblasts is a relatively common occurrence, and the fibrous callus that forms during bone repair and regeneration is reported to disappear over time. Therefore, inferring that IIFC differentiates into osteo- and chondrogenic cells based solely on their simultaneous appearance in the same time and space is challenging. More detailed validation is necessary, beyond what is supported by bioinformatics analysis. 

      The first step of bone repair is the formation of a fibrous callus, before cartilage and bone formation. There are no data in the literature demonstrating that an influx of fibroblasts occurs at the fracture site. Several studies now show that cells involved in callus formation are recruited locally (i.e. from the bone marrow, the periosteum and the skeletal muscle surrounding the fracture site) (Duchamp de Lageneste et al. 2018; Julien et al. 2021; Colnot 2009; Jeffery et al. 2022; Debnath et al. 2018; Matsushita et al. 2020; Julien et al. 2022; Matthews et al. 2021). The contribution of locally activated SSPCs to the fibrous callus is less well understood. Lineage tracing shows that GFP+ cell populations traced in Prx1Cre-GFP mice include SSPCs, IIFCs, chondrocytes and osteoblasts.

      The timing of the cell trajectories observed in our dataset correlates with the timing of callus formation previously described in the literature, as the day 3 post-fracture dataset mostly contains IIFCs while chondrocytes and osteoblasts appear from day 5 post-fracture. We conclude that IIFCs differentiate into osteochondrogenic cells based on multiple lines of evidence besides their simultaneous appearance in time and space:

      - In silico trajectory analyses identify a trajectory from SSPCs to osteochondrogenic cells via IIFCs. We added an analysis to show that our pseudotime trajectory parallels the timepoints of the dataset, confirming that the differentiation trajectory follows the timing of cell differentiation (Figure 5D).

      - We show that IIFCs start to express chondrogenic and osteogenic genes prior to engaging into chondrogenesis and osteogenesis. In addition, we detected activation of osteo- and chondrogenic specific transcription factors in IIFCs. This shows a differentiation continuum between SSPCs, IIFCs, and osteochondrogenic cells (Figures 6-8).

      - Using a transplantation assay, we showed that IIFCs form cartilage and bone, therefore reinforcing the osteochondrogenic potential of this population (Figure 6B).

      - IIFCs do not undergo apoptosis. We assessed the expression of apoptosis-related genes by IIFCs and did not detect expression. This was confirmed by cleaved caspase 3 immunostaining showing that a very low percentage of cells in the early fibrotic tissue undergo apoptosis. 

      Therefore, the idea that the initial fibrous callus is replaced by a new influx of SSPCs or committed progenitors is not supported by recent literature and is not observed in our dataset containing all cell types from the periosteum and fracture site. Overall, our bioinformatic analyses combined with our in vivo validation strongly support that IIFCs are differentiating into chondrocytes and osteoblasts during bone repair. Additional in vivo functional studies will aim to further validate the trajectory and investigate the critical factors regulating this process.

      (3) The influx of most osteogenic progenitors to the bone fracture site typically appears after postfracture day 7. It's essential to ascertain whether the osteogenic cells observed at the time of this study differentiated from IIFC or migrated from surrounding mesenchymal stem cells. 

      As mentioned above, there is no clear evidence in the literature indicating an influx of osteoprogenitors. Cells involved in callus formation are recruited locally and predominantly from the periosteum (Duchamp de Lageneste et al. 2018; Julien et al. 2021; Colnot 2009; Jeffery et al. 2022; Debnath et al. 2018; Matsushita et al. 2020; Matthews et al. 2021; Julien et al. 2022). Our datasets therefore include all cell populations that form the callus. Other sources of SSPCs include the surrounding muscle, which contributes mostly to cartilage, and the bone marrow, which contributes to a low percentage of the callus osteoblasts in the medullary cavity (Julien et al. 2021; Jeffery et al. 2022). We provide evidence that IIFCs give rise to osteogenic cells using our bioinformatic analyses and in vivo transplantation assay (listed in the response above). As indicated in our response to reviewer #1, the steps leading to osteogenic differentiation observed in our dataset reflect the first step of callus ossification and correspond to the process of intramembranous ossification (up to day 7 post-injury). Endochondral ossification also contributes to osteoblasts, including through the transdifferentiation of chondrocytes into osteoblasts (Julien et al. 2020; Zhou et al. 2014; Hu et al. 2017). While this process mostly occurs around day 14 post-fracture, we begin to detect this transition in our integrated day 5-day 7 dataset, as shown in Author response image 1.

      (4) It's crucial to determine whether the IIFC appearing at the fracture site contributes to the formation of the callus matrix or undergoes apoptosis during the fracture healing process.

      In the early steps of bone repair, the callus is mostly composed of an extracellular matrix (ECM). IIFCs are expressing high levels of ECM genes, including Postn, Aspn and collagens (Col3a1, Col5a1, Col8a1, Col12a1) (Figure 3 – Supplementary Figures 1-2 and Fig. 7 – Supplementary Figure 1B). IIFCs are the cells expressing the highest levels of matrix-related genes compared to the other cell types in the fracture environment (i.e. immune cells, endothelial cells, Schwann cells, pericytes, …) as shown now in Fig. 7 – Supplementary Figure 1A. Therefore, IIFCs are the main contributors to the callus matrix.

      We investigated if IIFCs undergo apoptosis. We observed that only a low percentage of IIFCs express apoptosis-related genes and are positive for cleaved caspase 3 immunostaining at days 3, 5 and 7 of bone repair. This shows that IIFCs do not undergo apoptosis and reinforces our model in which IIFCs further differentiate into osteoblasts and chondrocytes. We added these data in Fig. 7 – Supplementary Figure 2 and added the sentence in the results section “Only a small subset of IIFCs undergo apoptosis, further supporting that IIFCs are maintained in the fracture environment giving rise to osteoblasts and chondrocytes (Fig. 7 – Supplementary Figure 2).” 

      (5) Results from the snRNA seq highlight the paracrine role of IIFC, and verification is needed to ensure that the effect this has on surrounding osteogenic lineages is not misinterpreted.  

      To assess cell-cell interactions, we used tools such as Connectome and CellChat to infer and quantify intercellular communication networks between cell types. Studies showed the robustness of these tools combined with in vivo validation (Sinha et al. 2022; Alečković et al. 2022; Li et al. 2023). Here we used these tools to illustrate the paracrine profile of IIFCs, but in vivo validation would be required using gene inactivation to assess the requirement of individual paracrine factors. We performed extensive analyses of the crosstalk between immune cells and SSPCs using our dataset in another study combined with in vivo validation, showing the robustness of the tool and the dataset (Hachemi et al. 2024). We adjusted our conclusions to reflect our analyses: “suggesting a crucial paracrine role of this transient IIFC population during fracture healing”, “suggesting their central role in mediating cell interactions after fracture”, “suggesting that SSPCs can receive signals from IIFC”. 

      References

      Aghajanian, Patrick, et Subburaman Mohan. 2018. “The Art of Building Bone: Emerging Role of Chondrocyte-to-Osteoblast Transdifferentiation in Endochondral Ossification”. Bone Research 6 (1): 19. https://doi.org/10.1038/s41413-018-0021-z.

      Alečković, Maša, Simona Cristea, Carlos R. Gil Del Alcazar, Pengze Yan, Lina Ding, Ethan D. Krop, Nicholas W. Harper, et al. 2022. “Breast Cancer Prevention by Short-Term Inhibition of TGFβ Signaling”. Nature Communications 13 (1): 7558. https://doi.org/10.1038/s41467-022-35043-5.

      Ambrosi, Thomas H., Owen Marecic, Adrian McArdle, Rahul Sinha, Gunsagar S. Gulati, Xinming Tong, Yuting Wang, et al. 2021. “Aged Skeletal Stem Cells Generate an Inflammatory Degenerative Niche”. Nature 597 (7875): 256‑62. https://doi.org/10.1038/s41586-021-03795-7.

      Baccin, Chiara, Jude Al-Sabah, Lars Velten, Patrick M. Helbling, Florian Grünschläger, Pablo Hernández-Malmierca, César Nombela-Arrieta, Lars M. Steinmetz, Andreas Trumpp, et Simon Haas. 2020. “Combined Single-Cell and Spatial Transcriptomics Reveal the Molecular, Cellular and Spatial Bone Marrow Niche Organization”. Nature Cell Biology 22 (1): 38‑48. https://doi.org/10.1038/s41556-019-0439-6.

      Balemans, Wendy, et Wim Van Hul. 2007. “The Genetics of Low-Density Lipoprotein Receptor-Related Protein 5 in Bone: A Story of Extremes”. Endocrinology 148 (6): 2622‑29. https://doi.org/10.1210/en.2006-1352.

      Brink, Susanne C van den, Fanny Sage, Ábel Vértesy, Bastiaan Spanjaard, Josi Peterson-Maduro, Chloé S Baron, Catherine Robin, et Alexander van Oudenaarden. 2017. “Single-Cell Sequencing Reveals Dissociation-Induced Gene Expression in Tissue Subpopulations”. Nature Methods 14 (10): 935‑36. https://doi.org/10.1038/nmeth.4437.

      Cao, Junjie, Yalin Wei, Jing Lian, Lunyun Yang, Xiaoyan Zhang, Jiaying Xie, Qiang Liu, Jinyong Luo, Baicheng He, et Min Tang. 2017. “Notch Signaling Pathway Promotes Osteogenic Differentiation of Mesenchymal Stem Cells by Enhancing BMP9/Smad Signaling”. International Journal of Molecular Medicine 40 (2): 378‑88. https://doi.org/10.3892/ijmm.2017.3037.

      Cao, Junyue, Malte Spielmann, Xiaojie Qiu, Xingfan Huang, Daniel M. Ibrahim, Andrew J. Hill, Fan Zhang, et al. 2019. “The Single-Cell Transcriptional Landscape of Mammalian Organogenesis”. Nature 566 (7745): 496‑502. https://doi.org/10.1038/s41586-019-0969-x.

      Colnot, Céline. 2009. “Skeletal Cell Fate Decisions Within Periosteum and Bone Marrow During Bone Regeneration”. Journal of Bone and Mineral Research 24 (2): 274‑82. https://doi.org/10.1359/jbmr.081003.

      Debnath, Shawon, Alisha R. Yallowitz, Jason McCormick, Sarfaraz Lalani, Tuo Zhang, Ren Xu, Na Li, et al. 2018. “Discovery of a Periosteal Stem Cell Mediating Intramembranous Bone Formation”. Nature 562 (7725): 133‑39. https://doi.org/10.1038/s41586-018-0554-8.

      Ding, Jiarui, Xian Adiconis, Sean K. Simmons, Monika S. Kowalczyk, Cynthia C. Hession, Nemanja D. Marjanovic, Travis K. Hughes, et al. 2020. “Systematic Comparison of Single-Cell and Single-Nucleus RNA-Sequencing Methods”. Nature Biotechnology 38 (6): 737‑46. https://doi.org/10.1038/s41587-020-0465-8.

      Dishowitz, Michael I., Shawn P. Terkhorn, Sandra A. Bostic, et Kurt D. Hankenson. 2012. “Notch Signaling Components Are Upregulated during Both Endochondral and Intramembranous Bone Regeneration”. Journal of Orthopaedic Research 30 (2): 296‑303. https://doi.org/10.1002/jor.21518.

      Duchamp de Lageneste, Oriane, Anaïs Julien, Rana Abou-Khalil, Giulia Frangi, Caroline Carvalho, Nicolas Cagnard, Corinne Cordier, Simon J. Conway, et Céline Colnot. 2018. “Periosteum Contains Skeletal Stem Cells with High Bone Regenerative Potential Controlled by Periostin”. Nature Communications 9 (1): 773. https://doi.org/10.1038/s41467-018-03124-z.

      Hsieh, Chen-Chan, B. Linju Yen, Chia-Chi Chang, Pei-Ju Hsu, Yu-Wei Lee, Men-Luh Yen, Shaw-Fang Yet, et Linyi Chen. 2023. “Wnt Antagonism without TGFβ Induces Rapid MSC Chondrogenesis via Increasing AJ Interactions and Restricting Lineage Commitment”. iScience 26 (1): 105713. https://doi.org/10.1016/j.isci.2022.105713.

      Hu, Diane P., Federico Ferro, Frank Yang, Aaron J. Taylor, Wenhan Chang, Theodore Miclau, Ralph S. Marcucio, et Chelsea S. Bahney. 2017. “Cartilage to Bone Transformation during Fracture Healing Is Coordinated by the Invading Vasculature and Induction of the Core Pluripotency Genes”. Development 144 (2): 221‑34. https://doi.org/10.1242/dev.130807.

      Jeffery, Elise C., Terry L.A. Mann, Jade A. Pool, Zhiyu Zhao, et Sean J. Morrison. 2022. “Bone Marrow and Periosteal Skeletal Stem/Progenitor Cells Make Distinct Contributions to Bone Maintenance and Repair”. Cell Stem Cell 29 (11): 1547-1561.e6. https://doi.org/10.1016/j.stem.2022.10.002.

      Julien, Anais, Anuya Kanagalingam, Ester Martínez-Sarrà, Jérome Megret, Marine Luka, Mickaël Ménager, Frédéric Relaix, et Céline Colnot. 2021. “Direct contribution of skeletal muscle mesenchymal progenitors to bone repair”. Nature Communications 12 (1): 2860. https://doi.org/10.1038/s41467-021-22842-5.

      Julien, Anais, Simon Perrin, Oriane Duchamp de Lageneste, Caroline Carvalho, Morad Bensidhoum, Laurence Legeai-Mallet, et Céline Colnot. 2020. “FGFR3 in Periosteal Cells Drives Cartilage-to-Bone Transformation in Bone Repair”. Stem Cell Reports 15 (4): 955‑67. https://doi.org/10.1016/j.stemcr.2020.08.005.

      Julien, Anais, Simon Perrin, Ester Martínez-Sarrà, Anuya Kanagalingam, Caroline Carvalho, Marine Luka, Mickaël Ménager, et Céline Colnot. 2022. “Skeletal Stem/Progenitor Cells in Periosteum and Skeletal Muscle Share a Common Molecular Response to Bone Injury”. Journal of Bone and Mineral Research, juin, jbmr.4616. https://doi.org/10.1002/jbmr.4616.

      Kang, Sona, Christina N. Bennett, Isabelle Gerin, Lauren A. Rapp, Kurt D. Hankenson, et Ormond A. MacDougald. 2007. “Wnt Signaling Stimulates Osteoblastogenesis of Mesenchymal Precursors by Suppressing CCAAT/Enhancer-Binding Protein α and Peroxisome Proliferator-Activated Receptor γ”. Journal of Biological Chemistry 282 (19): 14515‑24. https://doi.org/10.1074/jbc.M700030200.

      Komatsu, David E., Michelle N. Mary, Robert Jason Schroeder, Alex G. Robling, Charles H. Turner, et Stuart J. Warden. 2010. “Modulation of Wnt Signaling Influences Fracture Repair”. Journal of Orthopaedic Research 28 (7): 928‑36. https://doi.org/10.1002/jor.21078.

      Hachemi, Yasmine, Simon Perrin, Maria Ethel, Anais Julien, Julia Vettese, Blandine Geisler, Christian Göritz, et Céline Colnot. 2024. “Multimodal Analyses of Immune Cells during Bone Repair Identify Macrophages as a Therapeutic Target in Musculoskeletal Trauma”. https://doi.org/10.1101/2024.04.29.591608.

      Kraus, Jessica M., Dion Giovannone, Renata Rydzik, Jeremy L. Balsbaugh, Isaac L. Moss, Jennifer L. Schwedler, Julien Y. Bertrand, et al. 2022. “Notch Signaling Enhances Bone Regeneration in the Zebrafish Mandible”. Development 149 (5): dev199995. https://doi.org/10.1242/dev.199995.

      Lee, S., L. H. Remark, A. M. Josephson, K. Leclerc, E. Muiños Lopez, D. J. Kirby, Devan Mehta, et al. 2021. “Notch-Wnt Signal Crosstalk Regulates Proliferation and Differentiation of Osteoprogenitor Cells during Intramembranous Bone Healing”. Npj Regenerative Medicine 6 (1): 29. https://doi.org/10.1038/s41536-021-00139-x.

      Li, Jiaoduan, Dongyan Cao, Lixin Jiang, Yiwen Zheng, Siyuan Shao, Ai Zhuang, et Dongxi Xiang. 2023. “ITGB2-ICAM1 Axis Promotes Liver Metastasis in BAP1-Mutated Uveal Melanoma with Retained Hypoxia and ECM Signatures”. Cellular Oncology (Dordrecht), décembre. https://doi.org/10.1007/s13402-023-00908-4.

      Logan, Malcolm, James F. Martin, Andras Nagy, Corrinne Lobe, Eric N. Olson, et Clifford J. Tabin. 2002. “Expression of Cre Recombinase in the Developing Mouse Limb Bud Driven by a Prxl Enhancer”. Genesis 33 (2): 77‑80. https://doi.org/10.1002/gene.10092.

      Machado, Léo, Perla Geara, Jordi Camps, Matthieu Dos Santos, Fatima Teixeira-Clerc, Jens Van Herck, Hugo Varet, et al. 2021. “Tissue Damage Induces a Conserved Stress Response That Initiates Quiescent Muscle Stem Cell Activation”. Cell Stem Cell 28 (6): 1125-1135.e7. https://doi.org/10.1016/j.stem.2021.01.017.

      Matsushita, Yuki, Mizuki Nagata, Kenneth M. Kozloff, Joshua D. Welch, Koji Mizuhashi, Nicha Tokavanich, Shawn A. Hallett, et al. 2020. “A Wnt-Mediated Transformation of the Bone Marrow Stromal Cell Identity Orchestrates Skeletal Regeneration”. Nature Communications 11 (1): 332. https://doi.org/10.1038/s41467-019-14029-w.

      Matthews, Brya G, Danka Grcevic, Liping Wang, Yusuke Hagiwara, Hrvoje Roguljic, Pujan Joshi, Dong-Guk Shin, Douglas J Adams, et Ivo Kalajzic. 2014. “Analysis of αSMA-Labeled Progenitor Cell Commitment Identifies Notch Signaling as an Important Pathway in Fracture Healing”. Journal of Bone and Mineral Research 29 (5): 1283‑94. https://doi.org/10.1002/jbmr.2140.

      Matthews, Brya G, Sanja Novak, Francesca V Sbrana, Jessica L Funnell, Ye Cao, Emma J Buckels, Danka Grcevic, et Ivo Kalajzic. 2021. “Heterogeneity of Murine Periosteum Progenitors Involved in Fracture Healing”. eLife 10 (février):e58534. https://doi.org/10.7554/eLife.58534.

      Minear, Steve, Philipp Leucht, Samara Miller, et Jill A Helms. 2010. “rBMP Represses Wnt Signaling and Influences Skeletal Progenitor Cell Fate Specification during Bone Repair”. Journal of Bone and Mineral Research 25 (6): 1196‑1207. https://doi.org/10.1002/jbmr.29.

      Minear, Steven, Philipp Leucht, Jie Jiang, Bo Liu, Arial Zeng, Christophe Fuerer, Roel Nusse, et Jill A. Helms. 2010. “Wnt Proteins Promote Bone Regeneration”. Science Translational Medicine 2 (29). https://doi.org/10.1126/scitranslmed.3000231.

      Novak, Sanja, Emilie Roeder, Benjamin P. Sinder, Douglas J. Adams, Chris W. Siebel, Danka Grcevic, Kurt D. Hankenson, Brya G. Matthews, et Ivo Kalajzic. 2020. “Modulation of Notch1 Signaling Regulates Bone Fracture Healing”. Journal of Orthopaedic Research 38 (11): 2350‑61. https://doi.org/10.1002/jor.24650.

      Pinzone, Joseph J., Brett M. Hall, Nanda K. Thudi, Martin Vonau, Ya-Wei Qiang, Thomas J. Rosol, et John D. Shaughnessy. 2009. “The Role of Dickkopf-1 in Bone Development, Homeostasis, and Disease”. Blood 113 (3): 517‑25. https://doi.org/10.1182/blood-2008-03-145169.

      Remark, Lindsey H., Kevin Leclerc, Malissa Ramsukh, Ziyan Lin, Sooyeon Lee, Backialakshmi Dharmalingam, Lauren Gillinov, et al. 2023. “Loss of Notch Signaling in Skeletal Stem Cells Enhances Bone Formation with Aging”. Bone Research 11 (1): 50. https://doi.org/10.1038/s41413-023-00283-8.

      Ruscitto, Angela, Peng Chen, Ikue Tosa, Ziyi Wang, Gan Zhou, Ingrid Safina, Ran Wei, et al. 2023. “Lgr5-Expressing Secretory Cells Form a Wnt Inhibitory Niche in Cartilage Critical for Chondrocyte Identity”. Cell Stem Cell 30 (9): 1179-1198.e7. https://doi.org/10.1016/j.stem.2023.08.004.

      Selewa, Alan, Ryan Dohn, Heather Eckart, Stephanie Lozano, Bingqing Xie, Eric Gauchat, Reem Elorbany, et al. 2020. “Systematic Comparison of High-Throughput Single-Cell and Single-Nucleus Transcriptomes during Cardiomyocyte Differentiation”. Scientific Reports 10 (1): 1535. https://doi.org/10.1038/s41598-020-58327-6.

      Sinha, Sarthak, Holly D. Sparks, Elodie Labit, Hayley N. Robbins, Kevin Gowing, Arzina Jaffer, Eren Kutluberk, et al. 2022. “Fibroblast Inflammatory Priming Determines Regenerative versus Fibrotic Skin Repair in Reindeer”. Cell 185 (25): 4717-4736.e25. https://doi.org/10.1016/j.cell.2022.11.004.

      Wang, Cuicui, Jason A. Inzana, Anthony J. Mirando, Yinshi Ren, Zhaoyang Liu, Jie Shen, Regis J. O’Keefe, Hani A. Awad, et Matthew J. Hilton. 2016. “NOTCH Signaling in Skeletal Progenitors Is Critical for Fracture Repair”. The Journal of Clinical Investigation 126 (4): 1471‑81. https://doi.org/10.1172/JCI80672.

      Wen, Fei, Xiaojie Tang, Lin Xu, et Haixia Qu. 2022. “Comparison of Single‑nucleus and Single‑cell Transcriptomes in Hepatocellular Carcinoma Tissue”. Molecular Medicine Reports 26 (5): 339. https://doi.org/10.3892/mmr.2022.12855.

      Wu, Chia-Lung, Amanda Dicks, Nancy Steward, Ruhang Tang, Dakota B. Katz, Yun-Rak Choi, et Farshid Guilak. 2021. “Single Cell Transcriptomic Analysis of Human Pluripotent Stem Cell Chondrogenesis”. Nature Communications 12 (1): 362. https://doi.org/10.1038/s41467-020-20598-y.

      Wu, Haojia, Yuhei Kirita, Erinn L. Donnelly, et Benjamin D. Humphreys. 2019. “Advantages of Single-Nucleus over Single-Cell RNA Sequencing of Adult Kidney: Rare Cell Types and Novel Cell States Revealed in Fibrosis”. Journal of the American Society of Nephrology 30 (1): 23‑32. https://doi.org/10.1681/ASN.2018090912.

      Zhong, Leilei, Lutian Yao, Robert J. Tower, Yulong Wei, Zhen Miao, Jihwan Park, Rojesh Shrestha, et al. 2020. “Single Cell Transcriptomics Identifies a Unique Adipose Lineage Cell Population That Regulates Bone Marrow Environment”. eLife 9 (avril):e54695. https://doi.org/10.7554/eLife.54695.

      Zhou, Xin, Klaus von der Mark, Stephen Henry, William Norton, Henry Adams, et Benoit de Crombrugghe. 2014. “Chondrocytes Transdifferentiate into Osteoblasts in Endochondral Bone during Development, Postnatal Growth and Fracture Healing in Mice”. Édité par Matthew L. Warman. PLoS Genetics 10 (12): e1004820. https://doi.org/10.1371/journal.pgen.1004820.

    1. Author response:

      The following is the authors’ response to the original reviews.

      We thank the reviewers for their constructive comments on our manuscript and their appreciation of the results. We provide point-by-point responses below. For your convenience, we highlight here the main changes to the manuscript.

      - More descriptive terminology for the contextual cues (Ctx.A / Ctx.noA is now referred to as LIGHT / DARK).

      - Schematic of the experiment timeline highlighting the exclusion of non-discriminators following the initial acquisition period. This explains the absence of baseline sex differences post acquisition and clears up some misconceptions about lack of replicability.

      - New data (time in port preCS) showing that a prior reward does not cause continued presence in port.

      - Several text edits to address all the points raised by the reviewers.

      We hope that the editors and reviewers will be satisfied with this revised version and find the strength of the evidence more convincing.

      Reviewer #1 (Recommendations For The Authors):

      In relation to weaknesses points 1-4 in the public review:

      (1) With regards to the claim (page 4 of pdf), I think I can see what the authors are getting at when they claim "Only Ctx-dep. O1 engages context-gated reward predictions", because the same reward is available in each context, and the animal must use contextual information to determine which cue will be rewarded. In other words, it has a discriminative purpose. In Ctx-dep. O1/O2, however, although the context doesn't serve a discriminative purpose in the sense that one cue will always earn a unique outcome, regardless of context, the fact that these cues are differentially rewarded in the different contexts means that animals may well form context-gated cue-outcome associations (e.g. CtxA-(CS1-O1), CtxnoA-(CS2-O2)). Moreover, the context is informative in this group in telling the animal which cue will be rewarded, even prior to outcome delivery, such that I don't think contextual information will fade to the background of the association and attention be lost to it in the way, say Mackintosh (1975) might predict. Therefore, I don't think this statement is correct.

      I suggest that the authors refine the statement to be more accurate.

      We agree with the reviewer: the context is absolutely relevant for rats trained in the Ctx-dep. O1/O2 task. We have edited the text in several places to make this clear. The question is how (by what mechanism) the context participates in the control of behavior in this group. The reviewer correctly points out that, just like rats trained in the Ctx-dep. O1 task, rats trained in the Ctx-dep. O1/O2 task might have formed context-gated cue-outcome associations. We now clearly acknowledge that in the text.

      However, because in this group the two outcomes are always encountered in different contexts, we argue that these rats could also have formed a direct association between the two contexts and the two outcomes. In other words, each context might directly evoke the expectation of a distinct reward outcome (prepare to drink, or prepare to eat). On a given trial, if the cue and context both tend to activate the same outcome representation, the converging cue+context excitation can add up. This would produce a context-sensitive response, but not via a hierarchical modulation process (unlike Ctx-dep. O1). Arguably, this last associative mechanism is much simpler and might explain why almost all rats in the Ctx-dep. O1/O2 group learned the discrimination, and at a much faster rate.

      Therefore, while rats trained in Ctx-dep. O1/O2 might engage a combination of associative processes to achieve context-sensitive behavior (including hierarchical associations), only rats in the Ctx-dep. O1 group critically and unambiguously rely on hierarchical associations to achieve context-sensitive behavior.
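
      The direct context→outcome account described above can be illustrated with a minimal sketch. Everything numeric here is an assumption made for illustration (unit association weights, a response threshold of 1.5), as is the specific cue-to-context assignment (X rewarded with O1 in LIGHT, Y rewarded with O2 in DARK); none of it is estimated from the data. The point is only that converging excitation on a shared outcome representation is enough to produce context-sensitive responding without any hierarchical gating.

```python
# Illustrative sketch only: weights, threshold, and cue assignment are assumptions.
THRESHOLD = 1.5  # hypothetical response threshold on each outcome unit

# Direct (non-hierarchical) associations assumed for the Ctx-dep. O1/O2 task:
# each context and each cue excites exactly one outcome representation.
context_to_outcome = {"LIGHT": "O1", "DARK": "O2"}
cue_to_outcome = {"X": "O1", "Y": "O2"}


def responds(context: str, cue: str) -> bool:
    """Return True when summed excitation on any outcome unit exceeds threshold."""
    activation = {"O1": 0.0, "O2": 0.0}
    activation[context_to_outcome[context]] += 1.0  # context -> outcome
    activation[cue_to_outcome[cue]] += 1.0          # cue -> outcome
    return max(activation.values()) > THRESHOLD


trials = [(ctx, cue) for ctx in ("LIGHT", "DARK") for cue in ("X", "Y")]
pattern = {trial: responds(*trial) for trial in trials}

for (ctx, cue), response in pattern.items():
    print(f"{ctx}:{cue} -> {'respond' if response else 'withhold'}")
```

      Under these assumptions, responding occurs only when the cue and the context point to the same outcome, which reproduces the discrimination by simple summation.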

      (2) I think the results shown in Figure 1 are very interesting, and well supported by the statistics. It's so nice to see a significant interaction, as so many papers try to report these types of effects without it. However, I do wonder how specific the results are to contextual modulation. That is, should a discriminative discrete cue be used instead of each context (e.g. CS1 indicates CS2 earns O1, CS3 indicates CS4 earns O1), would female rats still be as slow to learn the discrimination?

      I am just curious as to whether the authors have thoughts on this.

      We have not tested this and are not aware of a paper that examined this question specifically.

      However, we would like to point out that in the suggested design (CS1→[CS2→O1]; CS3→[CS4→O1]) the discriminative cues (CS1 and CS3) would almost certainly also acquire substantial reward-predictive value, either because of their direct association with the reward, or via second-order conditioning. This would complicate the interpretation of the results in terms of hierarchical associations. Incorporating non-rewarded presentations of CS1 and CS3 alone (i.e. extinguishing those cues, as is sometimes done in occasion setting experiments) would be one way to reduce the reward expectation evoked by those cues, but this approach has some limitations. Indeed, as mentioned by Rescorla (2006) “During extinction, the net associative strength of a stimulus declines to the level of [a response] threshold, but further decrement stops at that point”. So while extinguished CS1 and CS3 might no longer evoke overt behavioral responses, these cues could retain a non-negligible subthreshold excitatory connection with the US. Individually, these cues might fail to evoke responding but could nonetheless increase responding during the CS1→CS2 trials (or CS3→CS4 trials), via simple summation. (Rescorla, 2006: “the compound of two [extinguished] stimuli has a strength that exceeds the threshold and so evokes responding”).

      This type of consideration is precisely why we opted for the behavioral task used in the study. In Ctx-dep. O1, the discriminative stimuli exert opposite effects on the two target cues, which rules out summation effects as a mechanism for context-sensitive behavior.
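
      The two arguments above can be sketched numerically. The values are hypothetical (a threshold of 1.0, residual strengths of 0.6, a coarse grid of candidate weights), and the Ctx-dep. O1 contingencies are written as LIGHT:X+, LIGHT:Y-, DARK:X-, DARK:Y+, with the last one inferred from the "opposite effects" description rather than quoted. The first part shows two subthreshold cues summing above threshold; the second searches for purely additive context and cue weights that would solve Ctx-dep. O1 and finds none on the grid, as expected from the contradictory inequalities the design imposes.

```python
from itertools import product

THRESHOLD = 1.0  # hypothetical response threshold


def responds(*strengths: float) -> bool:
    """Simple summation: respond when total associative strength exceeds threshold."""
    return sum(strengths) > THRESHOLD


# Part 1: two extinguished cues, each assumed to retain subthreshold strength.
cs1, cs2 = 0.6, 0.6
alone = (responds(cs1), responds(cs2))  # neither cue evokes responding alone
compound = responds(cs1, cs2)           # the compound crosses threshold

# Part 2: can additive context + cue weights solve Ctx-dep. O1?
target = {
    ("LIGHT", "X"): True, ("LIGHT", "Y"): False,
    ("DARK", "X"): False, ("DARK", "Y"): True,
}
grid = [i / 4 for i in range(-8, 9)]  # candidate weights from -2.0 to 2.0
solutions = []
for light, dark, x, y in product(grid, repeat=4):
    ctx_w = {"LIGHT": light, "DARK": dark}
    cue_w = {"X": x, "Y": y}
    if all(responds(ctx_w[c], cue_w[q]) == want for (c, q), want in target.items()):
        solutions.append((light, dark, x, y))

print(alone, compound, len(solutions))
```

      The empty result is not specific to the grid: the two rewarded trials require the four weights to sum to more than twice the threshold, while the two non-rewarded trials require the same sum to be at most twice the threshold.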

      (3) Pages 8-9 of pdf, where the biological basis or the delayed acquisition of contextual control in females is considered, I find this to be written from a place of assuming that what is observed in the males is the default behaviour. That is, although the estrous cycle and its effects on synaptic plasticity/physiology may well account for the results, is there not a similar argument to be made for androgens in males? Perhaps the androgens also somehow alter synaptic plasticity/physiology, leading to their faster speed, reduced performance stability, and increased susceptibility to stress.

      I would like the argument that female behaviour might be the default, and male behaviour the deviation to be considered in the discussion in addition to those already stated.

      We regret if we gave the impression that male behavior was the default. The paper is intended to report sex differences but we don’t view either sex as the default. To correct this impression, we have added a few sentences in the discussion to highlight male-hormonal factors as well as non-gonadal genetic factors that might have contributed to the observed sex differences.

      (4) In addition, the OFC - which is the brain region found to have differential expression of c-fos in males and females in Figure 5 - is not explicitly discussed with regard to the biological mechanisms of differences, which seems odd.

      I suggest OFC be discussed with regard to biological mechanisms of differences.

      We added a few sentences in the discussion to i) highlight the parallel between our study and human fMRI studies showing superior OFC activation in females during the regulation of emotional responses, ii) suggest a potential relationship between the reported sex differences (speed of acquisition, robustness of performance, and OFC activation in context-gated reward prediction), and iii) acknowledge our ignorance of the root causes of these sex differences.

      We wish we could offer a better answer. We have attempted to offer possible proximal explanations for the observed sex differences, but ultimately our work did not address the root causes of these behavioral and neural sex differences. Therefore we feel that further attempts to explain these differences would be too speculative.

      (5) I did wonder if the authors were aware that in the Rescorla-Wagner model, contextual stimuli are thought to summate with discrete cues to enter into the association with the outcome (i.e., the error term is between lambda and sigmaV, with sigmaV the 'summation' of all stimuli present on a trial, including contextual stimuli). Typically, this is not considered much, because the cue itself is so salient and more consistently paired with reward (whereas the ever-present context is often paired with no reward), but nevertheless, it is a part of the association. I'm not sure it's wrong to say that the background circumstances under which events occur are thought to play little role (as in the second sentence of the introduction), but I was wondering if the authors were aware of this fact when they wrote that.

      This sentence in the introduction was meant to introduce the distinction between eliciting stimuli and modulating contexts. Admittedly, this paints a naive picture, which we now acknowledge (we hope that the rest of the paper provides more nuance). As pointed out by this reviewer, the context is also a stimulus, and, just like any other stimulus, it is eligible for direct association with an outcome. The possibility of a direct context→outcome association is precisely the rationale for the Ctx-dep. O1/O2 group.
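
      The reviewer's point about the Rescorla-Wagner model can be made concrete with a short simulation. This is a simplified sketch, not a model fitted to our data: the saliences (0.5 for the cue, 0.1 for the context), the single learning-rate parameter `beta` shared by rewarded and non-rewarded trials, and the trial schedule (one rewarded cue+context trial followed by three non-rewarded context-alone periods) are all assumptions. The prediction error on each trial is computed against the summed strength of every stimulus present, context included.

```python
def rescorla_wagner(trials, alpha, beta=0.5, lam=1.0):
    """Update associative strengths V using a shared error term (lambda - sum of V)."""
    V = {stimulus: 0.0 for stimulus in alpha}
    for present, rewarded in trials:
        asymptote = lam if rewarded else 0.0
        # The error is computed over ALL stimuli present, including the context.
        error = asymptote - sum(V[s] for s in present)
        for s in present:
            V[s] += alpha[s] * beta * error
    return V


# Hypothetical saliences: the discrete cue is more salient than the context.
alpha = {"cue": 0.5, "context": 0.1}

# One block: a rewarded cue+context trial, then three non-rewarded
# context-alone periods (standing in for the inter-trial interval).
block = [(("cue", "context"), True)] + [(("context",), False)] * 3

early = rescorla_wagner(block[:1], alpha)   # after the very first rewarded trial
late = rescorla_wagner(block * 200, alpha)  # after extended training

print(early, late)
```

      Under these assumptions the context does acquire some strength on rewarded trials, but it loses it during the non-rewarded periods, so the cue ends up carrying almost all of the prediction.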

      (6) Context-noA - Seems a little confusing for a name, why not just call it context B? NoA appears to imply that nothing happens in A or no outcome is available, whereas this is not always the case.

      We debated which terminology to use. We felt that “Context A vs. Context B” should perhaps be reserved for situations where the global context changes (e.g. two different conditioning boxes with different odors, floor texture etc., with proper counterbalancing procedures). We felt that “Context A vs noA” might be more appropriate here, as we are manipulating the local context by introducing (or removing) one single stimulus (the houselight). In this revised version we followed this reviewer’s advice and adopted a more descriptive, and hopefully less confusing, terminology: “Light vs Dark”.

      (7) Why is it that in the text the Ctx-dep O1/O2 is explained before simple and no discrimination, but in the Figure Ctx-dep O1/O2 is shown last? These should be consistent.

      Thanks for pointing that out. We have switched the order of task description to be consistent with the figures.

      (8) Page 6 (of pdf) - could the authors elaborate a little on why or how (or both) the delivery of reward can interfere with the expression of context-dependent discrimination? Do they just mean the performance of discrimination (e.g., animals will sit at the food port longer if there is food there because they are sitting there and eating it, which does not necessarily reflect the expectation of food based on cue presentations?), in which case it is not the discrimination itself that is being interfered with, just the measure of it. Perhaps the authors could elaborate by just inserting a sentence.

      We have added a few sentences to discuss this effect.

      The first clarification that we can make is that the reduced discrimination performance following reward is not simply due to animals’ continued presence in the reward port. We have added the time pre-cue to Fig. 3 B-F. This measure is not affected by previous reward history, showing that rats are leaving the port between trials.

      So what is driving this effect? At this stage, we are agnostic about the mechanism(s) for this effect. Kuchibhotla et al. (2019), who first reported a similar effect, proposed a model in which recent rewards modify the threshold for behavioral responses (i.e. performance). In this model, a cue might evoke a weak reward prediction but evoke a strong behavioral response if presented after a reward. Additionally, we believe that learning factors might also contribute to the effect reported here. Indeed, the behavioral response on a given trial likely reflects the balance of hierarchical (context-dependent) associations vs. direct associations (Bradfield and Balleine, 2013). Naturally, this balance is dynamic and influenced by trial history. For instance, a Light:X+ trial might increase the value of cue X and promote responding during the following Dark:X- trial. The same logic could be applied to the influence of the context (e.g., a Light:X+ trial might promote responding to a subsequent Light:Y- trial). We are currently working on a computational model that captures the dynamic interplay between hierarchical associations and direct associations. We hope that this model will provide some insight into the learning/performance mechanism for the effects reported here. However, this computational work is still in the early stages and beyond the scope of the present study.

      (9) The lack of effect in the Ctx-dep O1/O2 groups in Figure 4 could be due to a lack of power - the group sizes are a lot smaller for this group than for Ctx-dep O1 where an interaction was detected. I think this should be at least addressed in the discussion (i.e., that this lack of effect is possibly due to less power here, as the effects are in the same direction).

      Good point. We now acknowledge this limitation in the text.

      Reviewer #2 (Recommendations For The Authors):

      (1) Please comment on the failure to replicate the sex differences across experiments. Perhaps this is due to some change in the training procedure that is briefly mentioned in the methods (a reduction in the number of rewarded trials) but it is unclear.

      The reviewer correctly observed that Fig. 3-5 do not show sex differences in the baseline condition. This is not because of a replication failure, but because non-discriminating subjects were excluded from the experiment at the end of the acquisition period (after 72 training sessions). We now clarify this in the Method and Results sections. We also added a schematic of the experiment timeline that highlights the exclusion of non-discriminators at the end of the acquisition period (Fig 1).

      On the topic of replicability, the data for Ctx-dep O1 were collected over 3 cohorts (over the course of 2 years) and the sex difference pattern was consistent. For instance, the proportion of discriminators vs. non-discriminators for males and females trained in Ctx-dep O1 showed similar patterns across cohorts (see below).

      Author response table 1.

      (2) The design of this experiment makes it possible to analyse whether there is a differential outcome effect (DOE). The DOE would indeed predict better discrimination in group cxt-dep O1/O2 versus cxt-dep O1, which seems to be exactly what the authors observe although between-group statistics are not reported. Inspection of Figure 1 suggests that there may be a DOE in females but not in males. I wonder if the authors might consider reanalysing the data to check this.

      Indeed, there is clearly a differential outcome effect. We now point out this DOE in relation to the latency to achieve discrimination criterion (Fig. 2 C-D). Rats in the Ctx-dep. O1/O2 group acquired discrimination (reached criterion) much faster than rats in the Ctx-dep. O1 group.

      Following the reviewer’s suggestion, we provide here the results of targeted ANOVAs (focusing exclusively on Ctx-dep. O1 and Ctx-dep. O1/O2) to investigate a potential sex-dependent effect of DOE (i.e. Sex x Task interactions), see figure below. A three-way ANOVA (Sex x Task x Session) conducted on the discrimination index revealed a main effect of Task (F(1, 86) = 173.560, P < 0.001), a main effect of Session (F(2.678, 230.329) = 140.479, P < 0.001) and a marginal effect of Sex (F(1, 86) = 3.929, P = 0.051), but critically no Task x Sex or Task x Sex x Session interaction (P ≥ 0.504). A two-way ANOVA (Sex x Task) conducted on the sessions to criterion revealed a main effect of both factors (Sex: F(1, 63) = 9.52, P = 0.003; Task: F(1, 62) = 184.143, P < 0.001) but critically, no Sex x Task interaction (P = 0.233). These results indicate that the use of two different outcomes clearly facilitated the acquisition of context-dependent discrimination (DOE), but this effect benefited both sexes equally. We thank the reviewer for recommending this analysis.

      Author response image 1.

      Differential outcome effect (DOE) affects males and females equally. A. Discrimination ratio over the acquisition period. B. Trials to criterion. Compared to animals trained with a single outcome (Ctx-dep. O1), introducing dissociable outcomes for the two types of rewarded trials (Ctx-dep. O1/O2) profoundly facilitated the acquisition of discriminated behavior. This effect benefited both sexes equally.

      (3) Some minor points for clarification that the authors may also wish to address:

      - Figure 3: is data presented from sessions 71-80 only or for all sessions? I didn't fully follow the explanation offered in the results section.

      That’s right. The data presented in Fig. 3 consider only sessions 71-80, in discriminator rats, when performance is globally stable. We have edited the text to make this clearer. These 10 sessions represent a total of 800 trials (10 sessions x 80 trials). The first trial of each session was not included in the analysis since it was not preceded by any trial. For the remaining 790 trials (10 sessions x 79 trials), we examined how the outcome of the previous trial (rewarded or nonrewarded) influenced responding on the next trial. This large sample size (790 trials / rat) was required to ensure that enough data were collected for each possible trial history scenario.
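
      The trial bookkeeping described above can be sketched in a few lines. This is only an illustration under stated assumptions: the reward schedule used here (every fourth trial rewarded) and the names `history_pairs`, `sessions`, and `previous_outcome_counts` are hypothetical and are not taken from the study; only the session and trial counts come from the text.

```python
from collections import Counter

SESSIONS = 10            # sessions 71-80, as stated in the response
TRIALS_PER_SESSION = 80  # trials per session, as stated in the response


def history_pairs(session):
    """Pair each trial (from the second onward) with the outcome of the trial before it.

    `session` is a list of booleans: True if the trial was rewarded.
    The first trial has no predecessor, so it contributes no pair.
    """
    return list(zip(session[:-1], session[1:]))


# Hypothetical reward schedule (every fourth trial rewarded), used only to
# make the example runnable; the real trial order is not given in the text.
session = [i % 4 == 0 for i in range(TRIALS_PER_SESSION)]
sessions = [session] * SESSIONS

total_trials = sum(len(s) for s in sessions)
pairs = [p for s in sessions for p in history_pairs(s)]

# How many analyzed trials followed a rewarded vs. a nonrewarded trial.
previous_outcome_counts = Counter(prev for prev, _ in pairs)

print(total_trials)  # 800
print(len(pairs))    # 790
print(previous_outcome_counts)
```

      Dropping the first trial of each session removes exactly one trial per session, which is how 800 trials become 790 analyzed trials per rat.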

      - The authors argue that females are protected from the disrupting effect of stress. It might be useful if the authors offer further explanation as to what they mean by "protected".

      By “protected”, we simply mean “less sensitive”. We have reworded this sentence in that way. We do not claim to have an understanding of the precise mechanism for this sex-dependent effect (although our data point to a possible role of the OFC).

      - The authors state that "delivery of reward, while critical for learning, can also interfere with the expression of context-dependent discrimination". This statement should be explained in further detail. For instance, why should reward delivery specifically impair context-dependent discrimination but not other forms of discrimination?

      We have reworded this sentence to be more inclusive. Indeed, delivery of reward also interferes with other forms of discrimination, particularly when discrimination performance is not yet optimal. We have also added a paragraph to discuss the possible mechanisms by which reward might interfere with discrimination performance in our task.   

      Reviewer #3 (Recommendations For The Authors):

      I do not suggest additional experiments, but I do hope you continue the behavioral work to characterize what is being learned in the task. I think the approach is promising. I would suggest reporting the % time in port and port entries for the entire CS. There is no justification for only analyzing the response in the last 5s.

      We thank the reviewer for the encouragement.

      We opted to focus on the time in port for two main reasons:

      (1) This measure is relatively consistent across the two different reward outcomes (unlike the rate of port entries). Indeed, consistent with prior studies (Delamater et al., 2017), we observed that the type of reward (solid or liquid) influences the topography of the anticipatory magazine-directed behavior. Specifically, cues paired with pellets elicited significantly more port entries than cues paired with chocolate milk. The opposite pattern was observed for time in port: cues paired with chocolate milk elicited more sustained time in port compared to cues paired with pellets (see figure below). While these measures (port entries and time in port) show opposite biases for the two possible outcomes, the size of this bias is much smaller for the time in port (Cohen’s d effect size: port entries: 1.41; time in port: 0.62). As a result, the discrimination ratio calculated from time in port is consistent across the two outcomes (P = 0.078; effect size: 0.07), which is not the case for the discrimination ratio calculated from port entries (P = 0.007; effect size: 0.32; see figure below).

      (2) Unlike the rate of port entries, the time in port shows monotonic increase during training in these tasks. Indeed, we observed here and in past work (Keiflin et al., 2019), that the rate of port entries initially increases with training, but then slightly decreases; particularly for cues paired with liquid reward. In contrast, the time in port continues to increase, or remains high, with extended training. This is easy to understand if we consider the extreme case of a hypothetical rat that might enter the port once upon cue presentation and maintain continued presence in port for the whole cue duration. This rat would have a relatively low rate of port entry (a single port entry per trial) but a high time in port.

      This is not to say that the rate of port entries is not a valid measure overall (we have used, and continue to use, this metric in other preparations). However, for the reasons explained above, we believe that the time in port is a better metric for reward anticipation in this specific study.
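
      The difference between the two measures can be made concrete with a minimal sketch. It assumes that each port visit is recorded as an (entry time, exit time) pair in seconds from cue onset, and it uses a hypothetical 10 s cue; the function name `port_measures` and both example rats are illustrative, not data from the study.

```python
CUE_DURATION = 10.0  # hypothetical cue length in seconds (assumption)


def port_measures(visits, window_start, window_end):
    """Return (number of port entries, time in port) within a time window.

    `visits` is a list of (entry_time, exit_time) pairs in seconds from cue
    onset. An entry is counted only if it starts inside the window; time in
    port counts any overlap between a visit and the window.
    """
    entries = sum(1 for entry, _ in visits if window_start <= entry < window_end)
    time_in_port = sum(
        max(0.0, min(exit_time, window_end) - max(entry, window_start))
        for entry, exit_time in visits
    )
    return entries, time_in_port


# Hypothetical rat A: enters once shortly after cue onset and stays in the port.
rat_a = [(0.5, 10.0)]
# Hypothetical rat B: makes five brief visits of 0.5 s each.
rat_b = [(1.0, 1.5), (3.0, 3.5), (5.0, 5.5), (7.0, 7.5), (9.0, 9.5)]

print(port_measures(rat_a, 0.0, CUE_DURATION))  # (1, 9.5)
print(port_measures(rat_b, 0.0, CUE_DURATION))  # (5, 2.5)
print(port_measures(rat_a, 5.0, CUE_DURATION))  # last 5 s only: (0, 5.0)
```

      Rat A has the lower entry count but by far the higher time in port, which is the extreme case described above.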

      Moreover, we chose to focus our analysis on the last 5s of the cue because that’s when anticipatory food cup behavior is more reliably observed (in our preparation >2/3 of the total time in port occurs during the last 5s of the cue) and less contaminated by orienting behaviors (Holland, 1977, 1980, 2000). For these reasons, analysis of the last portion of the cue is relatively common in Pavlovian anticipatory approach preparations (El-Amamy and Holland, 2007; Olshavsky et al., 2013; Esber et al., 2015; Holland, 2016a, 2016b; Schiffino and Holland, 2016; Gardner et al., 2017; Sharpe et al., 2021; Maes et al., 2020; Sharpe et al., 2020; Siemian et al., 2021; Kang et al., 2021). Reporting time in port during the same cue epoch facilitates comparisons between these studies.

      We have edited the text in the Method section to provide a brief justification for focusing our analyses on this cue epoch.

      Author response image 2.

      Outcome identity influences the topography of the conditioned response. A-B: Conditioned responding expressed as the number of port entries per trial (A) or time in port per trial (B) for rats trained in the simple discrimination task with a chocolate milk reward (n = 19) or a sucrose pellet (n = 16). Data show the average of the last three sessions. Compared to chocolate milk, pellets tend to produce more port entries. Conversely, chocolate milk tends to produce more time in port. However, the magnitude of this bias is smaller for the time in port. C-D: Discrimination ratio calculated from the number of port entries (C) or the time in port (D); the latter is not affected by the outcome identity. *P<0.05; **P<0.01; ***P<0.001, t tests.
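
      For readers comparing the effect sizes quoted in this response, a minimal sketch of Cohen's d for two independent groups follows. It assumes the pooled-standard-deviation form of the statistic; the response does not state which variant was used, and the function name `cohens_d` and the numbers passed to it are purely illustrative.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Cohen's d for two independent groups, using the pooled standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    # statistics.variance is the sample variance (n - 1 in the denominator).
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)


# Illustrative values only (not data from the study).
print(cohens_d([2, 4, 6], [1, 3, 5]))  # 0.5
```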

      The inconsistent use of terms is distracting throughout the paper. Is it discriminated or context-gated? Please provide a definition of your terms and then use them consistently. Is it a discriminative stimulus, a context, or an occasion setter? These all imply slightly different things and it would help the reader if you just used one term throughout the paper.

      Thanks for pointing that out. We have added a definition for “context-gated” and edited the text to keep the terminology consistent when appropriate. The words “discrimination”/“discriminated” still appear in the manuscript but without implying a mechanism (all tasks are variations of Pavlovian discrimination; the rats discriminate between rewarded and non-rewarded trials).

      As mentioned by this reviewer, the terms “context” and “occasion setter” are not synonymous. Therefore these terms still appear in the manuscript to refer to different concepts (e.g. in our task the visual stimulus is a context for all rats; this context acts as an occasion setter only for some rats).

      Minor:

      Intro, 2nd PP: "autism". This is abbreviated in the abstract but spelled out here. I suggest not abbreviating in the abstract and introducing abbreviations here, as you do with PTSD.

      Fixed as suggested.

      Have deficits in contextual modulation been distinguished from potential deficits in binary associative learning in autism, PTSD, and substance use disorders? This is implied, but there are no citations provided.

      We provide a list of references showing deficits in contextual modulation in these disorders.

      This does not mean that these disorders are reducible to deficits in contextual modulation and it does not exclude other forms of deficits in those disorders --including alterations in certain aspects of binary associative learning.

      "In positive occasion-setting, animals learn that a target cue (X) results in a reward outcome (+) only when that cue is accompanied by a contextual feature (A); the same cue presented in absence of this contextual feature remains without consequence (A:X+ / X-)." - there are words missing in this sentence.

      We apologize, but we fail to identify the missing word(s). Perhaps the reviewer could be more specific and we will be happy to edit the sentence as needed.

      What is a contextual feature, is this redundant or can you provide a specific definition?

      We use the terminology “feature” and “target” as these are the standard terms in the description of occasion setting preparations (one stimulus, “the feature”, sets the occasion for responding –or not responding- to the “target” cue). By contextual feature, we meant that in this specific example the context was the feature. We have clarified this in the text. We believe that these terms are not redundant. Indeed, the context is not always a feature, and a feature is not necessarily a context (phasic cues can serve as “features”).

      Can you provide some background on studies of sex differences in simple associative learning? You imply these have been much more thoroughly studied than conditional discriminations.

      We added a few references as suggested.

      What is the rationale for studying stress?

      Stressful life events exacerbate several mental illnesses, potentially by impacting cognitive functions.

      Although the (sex-dependent) effects of stress on some cognitive functions are well established (e.g. working memory, selective attention, spatial navigation), the effect of stress on contextual modulation (a core dysfunction in certain mental illnesses), and the possible sex differences in this effect, had not been formally tested. We added a few sentences in the Results section (at the beginning of the stress section) to remind the reader why we tested the effect of stress in this task.

      Method/Results:

      Cues are not counterbalanced; the feature is visual and targets are auditory - this should be noted as a limitation in the discussion section.

      We now acknowledge this limitation in the discussion. Moreover, we believe that the new terminology for the context (Light vs. Dark, instead of A vs. noA in the original version) makes it abundantly clear that the “context” in this study was always visual.

      Summation is invoked to describe the discrimination with different outcomes, how is summation happening? This is not described. Perhaps incorporate the literature on conditional discriminations with differential outcomes (the "differential outcomes effect").

      We have edited the Results and Discussion sections to clarify how summation might contribute to discrimination with different outcomes. We have also added references for the DOE in this task.

      The stress effect is confounded with test order; comparing stress vs. baseline.

      Sorry we don’t understand this point. The “baseline” refers to the animal’s performance on the last training session before the acute stress manipulation (we have edited the text to make this clear). Animals are first trained in the task and then we examine how stress alters their performance in this learned task. We don’t see how this could induce a test order confound.

      Throughout the results section, it would be helpful to have the number of animals reported for each analysis.

      The number of animals for each part of the experiment is now reported in the text, as well as in the figures.

      Discussion:

      "For Ctx-dep. O1, context is an occasion-setter, i.e. a stimulus that hierarchically modulates the associative strength between a target cue and its outcome." This is inaccurate. Occasion setters do not change or modulate the associative strength of a target cue. They modulate whether excitation or inhibition is expressed.

      We reworded the sentence as suggested: “For Ctx-dep. O1, context is an occasion-setter, i.e. a stimulus that modulates the response to a target cue”.

      "Together, these results indicate that the sex differences observed here are not attributable to simple associative, motivational, working-memory, or attentional processes, but are specific to the neurocomputational operations required for the hierarchical, contextual control of behavior." It should be noted here that the difference is one of degree, a quantitative difference, but not a difference in the qualitative features of the process.

      "Regardless of the precise mechanism, our results indicate that, compared to male rats, females ultimately achieved more stable contextual control over cued reward-seeking; their behavior remained context-regulated under stress or after recent rewards." Again this is a matter of degree.

      We absolutely agree. All the sex differences reported here are a matter of degree. In the framework of McCarthy et al. (2012), the reported effects are type 2 or type 3 sex differences, not type 1 sexual dimorphism. We made a few edits in the Discussion to clarify this point.

      Procedure:

      Please clarify the percentage of trials that were reinforced in the No Discrimination group.

      From session 1-32 (acquisition period), 50% of the trials were reinforced. Following this acquisition period, only 25% of the trials were reinforced to match all the other groups. We have edited the method section to clarify this point.

      Please provide the dimensions of the restraint tubes and the model number if available.

      This information is now included.

      References

      Bradfield LA, Balleine BW (2013) Hierarchical and binary associations compete for behavioral control during instrumental biconditional discrimination. J Exp Psychol Anim Behav Process 39:2–13.

      Delamater AR, Garr E, Lawrence S, Whitlow JW (2017) Elemental, configural, and occasion setting mechanisms in biconditional and patterning discriminations. Behav Processes 137:40–52.

      El-Amamy H, Holland PC (2007) Dissociable effects of disconnecting amygdala central nucleus from the ventral tegmental area or substantia nigra on learned orienting and incentive motivation. Eur J Neurosci 25:1557–1567.

      Esber GR, Torres-Tristani K, Holland PC (2015) Amygdalo-striatal interaction in the enhancement of stimulus salience in associative learning. Behav Neurosci 129:87–95.

      Gardner MPH, Conroy JS, Shaham MH, Styer CV, Schoenbaum G (2017) Lateral Orbitofrontal Inactivation Dissociates Devaluation-Sensitive Behavior and Economic Choice. Neuron 96:1192–1203.e4.

      Holland PC (1977) Conditioned stimulus as a determinant of the form of the Pavlovian conditioned response. J Exp Psychol Anim Behav Process 3:77–104.

      Holland PC (1980) CS-US interval as a determinant of the form of Pavlovian appetitive conditioned responses. J Exp Psychol Anim Behav Process 6:155–174.

      Holland PC (2000) Trial and intertrial durations in appetitive conditioning in rats. Anim Learn Behav 28:121–135.

      Holland PC (2016a) Enhancing second-order conditioning with lesions of the basolateral amygdala. Behav Neurosci 130:176–181.

      Holland PC (2016b) Effects of amygdala lesions on overexpectation phenomena in food cup approach and autoshaping procedures. Behav Neurosci 130:357–375.

      Kang M, Reverte I, Volz S, Kaufman K, Fevola S, Matarazzo A, Alhazmi FH, Marquez I, Iordanova MD, Esber GR (2021) Agency rescues competition for credit assignment among predictive cues from adverse learning conditions. Sci Rep 11:16187.

      Keiflin R, Pribut HJ, Shah NB, Janak PH (2019) Ventral tegmental dopamine neurons participate in reward identity predictions. Curr Biol 29:93–103.e3.

      Kuchibhotla KV, Hindmarsh Sten T, Papadoyannis ES, Elnozahy S, Fogelson KA, Kumar R, Boubenec Y, Holland PC, Ostojic S, Froemke RC (2019) Dissociating task acquisition from expression during learning reveals latent knowledge. Nat Commun 10:2151.

      Maes EJP, Sharpe MJ, Usypchuk AA, Lozzi M, Chang CY, Gardner MPH, Schoenbaum G, Iordanova MD (2020) Causal evidence supporting the proposal that dopamine transients function as temporal difference prediction errors. Nat Neurosci 23:176–178.

      McCarthy MM, Arnold AP, Ball GF, Blaustein JD, De Vries GJ (2012) Sex differences in the brain: the not so inconvenient truth. J Neurosci 32:2241–2247.

      Olshavsky ME, Song BJ, Powell DJ, Jones CE, Monfils M-H, Lee HJ (2013) Updating appetitive memory during reconsolidation window: critical role of cue-directed behavior and amygdala central nucleus. Front Behav Neurosci 7:186.

      Rescorla RA (2006) Deepened extinction from compound stimulus presentation. J Exp Psychol Anim Behav Process 32:135–144.

      Schiffino FL, Holland PC (2016) Secondary visual cortex is critical to the expression of surprise-induced enhancements in cue associability in rats. Eur J Neurosci 44:1870–1877.

      Sharpe MJ, Batchelor HM, Mueller LE, Gardner MPH, Schoenbaum G (2021) Past experience shapes the neural circuits recruited for future learning. Nat Neurosci 24:391–400.

      Sharpe MJ, Batchelor HM, Mueller LE, Yun Chang C, Maes EJP, Niv Y, Schoenbaum G (2020) Dopamine transients do not act as model-free prediction errors during associative learning. Nat Commun 11:106.

      Siemian JN, Arenivar MA, Sarsfield S, Borja CB, Russell CN, Aponte Y (2021) Lateral hypothalamic LEPR neurons drive appetitive but not consummatory behaviors. Cell Rep 36:109615.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Major concerns:

      (1) Is the direct binding of MCAK to the microtubule cap important for its in vivo function?

      a. The authors claim that their "study provides mechanistic insights into understanding the end-binding mechanism of MCAK". I respectfully disagree. My concern is that the paper offers limited insights into the physiological significance of direct end-binding for MCAK activity, even in vitro. The authors estimate that in the absence of other proteins in vitro, ~95% of MCAK molecules arrive at the tip by direct binding in the presence of ~ physiological ATP concentration (1 mM). In cells, however, the major end-binding pathway may be mediated by EB, with the direct binding pathway contributing little to none. This is a reasonable concern because the apparent dissociation constant measured by the authors shows that MCAK binding to microtubules in the presence of ATP is very weak (69 µM). This concern should be addressed by 1) calculating relative contributions of direct and EB-dependent pathways based on the affinities measured in this and other published papers and estimated intracellular concentrations. Although there are many unknowns about these interactions in cells, a modeling-based analysis may be revealing. 2) the recapitulation of these pathways using purified proteins in vitro is also feasible. Ideally, some direct evidence should be provided, e.g. based on MCAK function-separating mutants (GDP-Pi tubulin binding vs. catalytic activity at the curled protofilaments) that contribution from the direct binding of MCAK to microtubule cap in EB presence is significant.

      We thank the reviewer for the thoughtful comments.

      (1) We think that the end-binding affinity of MCAK makes a significant contribution to its cellular functions. To elucidate this concept, we now use a simple model shown in Supplementary Appendix-2 (see pages 49-51, lines 1246-1316). In this model, we simplified MCAK and EB1 binding to microtubule ends by considering only these two proteins while neglecting other factors (e.g. XMAP215). Specifically, we considered two scenarios: one in which both proteins freely diffuse in the cytoplasm and another where MCAK is localized to specific cellular structures, such as the centrosome or centromere. Based on the modeling results, we argue that MCAK's functional impact at microtubule ends derives both from its intrinsic end-binding capacity and its ability to strengthen the EB1-mediated end association pathway.

      (2) We agree with the reviewer that MCAK exhibiting a lower end-binding affinity (69 µM) is indeed intriguing, as one might intuitively expect a stronger affinity, e.g. in the nanomolar range. Several factors may contribute to this observation. First, this could be partly due to the in vitro system employed, which may not perfectly replicate in vivo conditions, especially when considering cellular processes quantitatively. Variations in medium composition can significantly influence the binding state. For example, reducing salt concentration leads to a marked increase in MCAK’s binding affinity (Helenius et al., 2006; Maurer et al., 2011; McHugh et al., 2019). Additionally, while numerous binding events with short durations were detected, we excluded transient interactions from our analysis to facilitate quantification. This likely leads to an underestimation of the on-rate and, consequently, the binding affinity. Moreover, to minimize the interference of purification tags (His-tag), we ensured their complete removal during protein sample preparation. Previous studies reported that retaining the His-tag of MAPs affects the binding affinity to microtubules (Maurer et al., 2011; Zhu et al., 2009). Finally, a low affinity is not necessarily unexpected. Considering the microtubule end as a receptor with multiple binding sites for MCAK, the overall binding affinity is in the nanomolar range (260 nM). This does not necessarily contradict MCAK being a microtubule dynamics regulator as only a few MCAK molecules may suffice to induce microtubule catastrophe (as discussed on page 13, lines 408-441).

      (3) Ideally, we would search for mutants that specifically interfere with the binding of GDP-Pi-tubulin or the curled protofilaments. However, the mutant we tested significantly impacts the overall affinity of MCAK to microtubules (both end and lattice), making it challenging to isolate and discuss the function of MCAK with respect to the binding to GDP-Pi-tubulin alone. Additionally, we also think that the GDP-Pi-tubulin in the EB cap and the tubulin in the curved protofilaments may share structural similarities. For instance, the tubulin dimers in both states may be less compact compared to those in the lattice, which could explain why MCAK recognizes both simultaneously (Manka and Moores, 2018). However, this remains a conjecture, as there is currently no direct evidence to support it.
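
      To give a feel for the two affinities quoted in point (2) above (69 µM and 260 nM), the sketch below evaluates a simple one-site binding isotherm, occupancy = [L] / ([L] + Kd). This is a textbook simplification and not the model in Supplementary Appendix-2; the function name `fraction_bound` and the 10 nM MCAK concentration used as input are assumptions chosen for illustration.

```python
def fraction_bound(ligand_nM, kd_nM):
    """Equilibrium occupancy of a single site: [L] / ([L] + Kd)."""
    return ligand_nM / (ligand_nM + kd_nM)


KD_SITE_NM = 69_000.0  # 69 µM, the end-binding affinity quoted in the response
KD_END_NM = 260.0      # 260 nM, the overall affinity when the whole end is treated as one receptor
MCAK_NM = 10.0         # illustrative MCAK concentration (assumption)

per_site = fraction_bound(MCAK_NM, KD_SITE_NM)
whole_end = fraction_bound(MCAK_NM, KD_END_NM)

print(f"occupancy with Kd = 69 µM:  {per_site:.2e}")
print(f"occupancy with Kd = 260 nM: {whole_end:.2e}")
```

      Under this simplification, the same protein concentration gives a much higher occupancy when the end is treated as a single multi-site receptor than when each site is considered alone, which is the sense in which a weak per-site affinity can still yield frequent end-binding events.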

      b. As mentioned in the Discussion, preferential MCAK binding to tubulins near the MT tip may enhance MCAK targeting of terminal tubulins AFTER the MCAK has been "delivered" to the distal cap via the EB-dependent mechanism. This is a different targeting mechanism than the direct MCAK-binding. However, the measured binding affinity between MCAK and GMPCPP tubulins is so weak (69 uM), that this effect is also unlikely to have any impact because the binding events between MCAK and microtubule should be extremely rare. Without hard evidence, the arguments for this enhancement are very speculative.

      Please see our response to the comment No. 1. Additionally, we have revised our discussion to discuss the end-binding affinity of MCAK as well as its physiological relevance (please see page 13, lines 408-441; and see Supplementary Appendix-2 in pages 49-51, lines 1246-1316).

      (2) The authors do not provide sufficient justification and explanation for their investigation of the effects of different nucleotides in MCAK binding affinity. A clear summary of the nucleotide-dependent function of MCAK (introduction with references to prior affinity measurements and corresponding MCAK affinities), the justifications for this investigation, and what has been learned from using different nucleotides (discussion) should be provided. My take on these results is that by far the strongest effect on microtubule wall and tip binding is achieved by adding any adenosine, whereas differences between different nucleotides are relatively minor. Was this expected? What can be learned from the apparent similarity between ATP and AMPPNP effects in some assays (Fig 1E, 4C, etc) but not others (Fig 1D,F, etc)?

      We thank the reviewer for this suggestion. We have revised the manuscript accordingly, and below are the main points of our response:

      (1) The experiment investigating the effects of different nucleotides on MCAK binding affinity was inspired by the previous studies demonstrating that kinesin-13 interactions with microtubules are highly dependent on their adenosine-bound states. For example, kinesin-13s tightly bind microtubules and prefer to form protofilament curls or rings with tubulin in the AMPPNP state, whereas kinesin-13s are considered to move along the microtubule lattice via one-dimensional diffusion in the ADP·Pi state (Asenjo et al., 2013; Benoit et al., 2018; Friel and Howard, 2011; Helenius et al., 2006). Based on these observations, we wondered whether MCAK's adenosine-bound states might similarly affect its binding preference for growing microtubule ends. We have made the motivation clear in the revised manuscript (please see page 7, lines 199-209).

      (2) Our main finding regarding the effects of nucleotides is that MCAK shows differential end-binding affinity and preference based on its nucleotide state. First, MCAK shows the greatest preference for growing microtubule ends in the ATP state, supporting the idea that diffusive MCAK (MCAK·ATP) can directly bind to growing microtubule ends. Second, MCAK·ATP also demonstrates a binding preference for GTPγS microtubules and the ends of GMPCPP microtubules. The similar trends in binding preference suggest that the affinity for GDP·Pi-tubulin and GTP-tubulin likely underpins MCAK’s preference for growing microtubule ends. To clarify these points, we have added further discussion in the manuscript (please see page 8, lines 230-233; page 9, lines 258-270; and pages 13-14, lines 443-458).

      (3) It is not clear why the authors decided to use these specific mutant MCAK proteins to advance their arguments about the importance of direct tip binding. Both mutants are enzymatically inactive. Both show roughly similar tip interactions, with some (minor) differences. Without a clear understanding of what these mutants represent, the provided interpretations of the corresponding results are not convincing.

      We thank the reviewer for this comment. In the revised manuscript, we no longer draw conclusions about the importance of end-binding based on the mutant data. Instead, we think that the mutant data provide insights into the structural basis of the end-binding preference. Therefore, we have rewritten the results in this section to more accurately reflect these findings (please see page 10, lines 295-327).

      (4) GMPCPP microtubules are used in the current study to represent normal dynamic microtubule ends, based on some published studies. However, there is no consensus in the field regarding the structure of growing vs. GMPCPP-stabilized microtubule ends, which additionally may be sensitive to specific experimental conditions (buffers, temperature, age of microtubules, etc). To strengthen the authors' argument, Taxol-stabilized microtubules should be used as a control to test if the effects are specific. Additionally, the authors should consider the possibility that stronger MCAK binding to the ends of different types of microtubules may reflect MCAK-dependent depolymerization events on a very small scale (several tubulin rows). These nano-scale changes to tubulins and the microtubule end may lead to the accumulation of small tubulin-MCAK aggregates, as is seen with other MAPs and slowly depolymerizing microtubules. These effects for MCAK may also depend on specific nucleotides, further complicating the interpretation. This possibility should be addressed because it provides a different interpretation than presented in the manuscript.

      Regarding the two points raised here, our thoughts are as follows:

      (1) The end of GMPCPP-stabilized microtubules differs from that of growing microtubules, with the most obvious known difference being the absence of the region enriched in GDP-Pi-tubulin. We consider the end of GMPCPP microtubules as an analogue of the distal tip of growing microtubules, based on two key features: (1) curled protofilaments and (2) GMPCPP-tubulin, a close analogue of GTP-tubulin. Notably, both features are present at the ends of both GMPCPP-stabilized and growing microtubules. Moreover, we agree with the suggestion to use taxol-stabilized microtubules as a control. This would eliminate the second feature (absence of GTP-tubulin), allowing us to isolate the effect of the first feature. Therefore, we conducted this experiment, and our data showed that MCAK exhibits only a mild binding preference for the ends of taxol-stabilized microtubules, which is much less pronounced than for the ends of GMPCPP microtubules. This observation supports the idea that GMPCPP-stabilized ends closely resemble the growing ends of microtubules.

      (2) The reviewer suggested that stronger MCAK binding to the ends of different types of microtubules might reflect MCAK-dependent depolymerization events on a very small scale. This is an insightful possibility, which we had overlooked in the original manuscript. Fortunately, we performed the experiments at single-molecule concentrations. Upon reviewing the raw data, we found that under ATP conditions, the binding events of MCAK were not cumulative (see Author response image 1 below) and showed no evidence of local accumulation of MCAK-tubulin aggregates.

      Author response image 1.

      Representative kymograph showing GFP-MCAK binding at the ends and lattice of GMPCPP microtubules in the presence of 1 mM ATP (10 nM GFP-MCAK), corresponding to Fig. 5A. Arrow: end-binding of MCAK. Vertical bar: 1 s; horizontal bar: 2 μm.

      (5) It would be helpful if the authors provided microtubule polymerization rates and catastrophe frequencies for assays with dynamic microtubules and MCAK in the presence of different nucleotides. The video recordings of microtubules under these conditions are already available to the authors, so it should not be difficult to provide these quantifications. They may reveal that microtubule ends are different (or not) under the examined conditions. It would also help to increase the overall credibility of this study by providing data that are easy to compare between different labs.

      We thank the reviewer for this suggestion. In the revised manuscript, we have provided data on the growth rates, which are similar across the different nucleotide states (Fig. s1). However, due to the short duration of our recordings (usually 5 minutes, but with a high frame rate, 10 fps), we did not observe many catastrophe events, which prevented us from quantifying catastrophe frequency using the current dataset. Since we measured the binding kinetics of MCAK during the growing phase of microtubules, the similar growth rates and microtubule end morphologies suggest that the microtubule ends are comparable across the different conditions.

      Reviewer #1 (Recommendations For The Authors):

      a. Please provide more details about how the microtubule-bound molecules were selected for analysis (include a description of scripts, selection criteria, and filters, if any). Fig 1A arrows do not provide sufficient information.

      We first measured the fluorescence intensity of each binding event. A probability distribution of these intensities was then constructed and fitted with a Gaussian function. A binding event was considered to correspond to a single molecule if its intensity fell within μ±2σ of the distribution. The details of the single-molecule screening process are now provided in the revised manuscript (see page 17, lines 574-583).
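
      To make the μ±2σ selection rule above concrete, the following is a minimal sketch, not the authors' analysis script. It assumes the Gaussian parameters can be approximated by the sample mean and standard deviation of the measured intensities (a stand-in for the histogram fit described above); the function name `select_single_molecules` and the example intensities are hypothetical.

```python
import statistics


def select_single_molecules(intensities, n_sigma=2.0):
    """Keep binding events whose intensity lies within mu +/- n_sigma * sigma.

    Assumption: mu and sigma are estimated from the sample mean and sample
    standard deviation, as a simple stand-in for fitting a Gaussian to the
    intensity histogram.
    """
    mu = statistics.fmean(intensities)
    sigma = statistics.stdev(intensities)
    lower = mu - n_sigma * sigma
    upper = mu + n_sigma * sigma
    kept = [x for x in intensities if lower <= x <= upper]
    return kept, mu, sigma


# Hypothetical intensities (A.U.): eight similar events and one bright outlier
# that could correspond to an aggregate or two overlapping molecules.
example = [100, 102, 98, 101, 99, 100, 103, 97, 300]
kept, mu, sigma = select_single_molecules(example)
print(len(kept), "of", len(example), "events retained")
```

      Note that a single bright outlier inflates the moment-based σ, so a fit to the main histogram peak, as the response describes, would normally give a tighter window than this sketch.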

      b. Evidence that MCAK is dimeric in solution should be provided (gel filtration results, controls for Figs1A - bleaching, or comparison with single GFP fluorophore).

      In the revised manuscript, we provide the gel filtration results of purified MCAK and other proteins used in this study. The elution volume of the peak for GFP-MCAK corresponded to a molecular weight range between 120 kDa (EB1-GFP dimer) and 260 kDa (XMAP215-GFP-his6), suggesting that GFP-MCAK exists as a dimer (~220 kDa) under our experimental conditions (please see Fig. s1 and page 5, lines 104-105). In addition, we also measured the fluorescence intensity of both MCAK<sup>sN+M</sup> and MCAK. MCAK<sup>sN+M</sup> is a monomeric mutant that contains the neck domain and motor domain (Wang et al., 2012). The average intensity of MCAK<sup>sN+M</sup> is 196 A.U., about 65% of that of MCAK (300 A.U.). These two measurements suggest that the purified MCAK used in this study exists as a dimer (see Fig. s1).

      c. Evidence that MCAK on microtubules represents single molecules should be provided (distribution of GFP brightness with controls - GFP imaged under identical conditions). Since assay buffers include detergent, which is not desirable, all controls should be done using the same assay conditions. The authors should rule out that their main results are detergent-sensitive.

      (1) Regarding whether MCAK on microtubules represents single molecules: please refer to our responses to the two points above.

      (2) To rule out an effect of Tween-20 (0.0001%, v/v), we performed additional control experiments. The results showed that it has no significant effect on the microtubule-binding affinity of MCAK (see Author response image 2 below).

      Author response image 2.

      Tween-20 (0.0001%, v/v) has no significant effect on the microtubule-binding affinity of MCAK. (A) Representative projection images of GFP-MCAK (5 nM) binding to taxol-stabilized GDP microtubules in the presence of 1 mM AMPPNP with or without Tween-20. The upper panel shows the results of the control experiments performed without MCAK. Scale bar: 5 μm. (B) Statistical quantification of the binding intensity of GFP-MCAK on GDP microtubules with or without Tween-20 (53 microtubules from 3 assays and 70 microtubules from 3 assays, respectively). Data are presented as mean ± SEM. Statistical comparisons were performed using the two-tailed Mann-Whitney U test with Bonferroni correction; n.s., not significant.

      d. How did the authors plot single-molecule intensity distributions? I am confused as to why the intensity distribution for single molecules in Fig 1D and 2A looks so perfectly smooth, non-pixelated, and broader than expected for GFP wavelength. Please provide unprocessed original distributions, pixel size, and more details about how the distributions were processed.

      In the revised manuscript, we provided unprocessed original data in Fig. 1B and Fig. 2A. We thank the reviewer for pointing out this problem.

      e. Many quantifications are based on a limited number of microtubules and the number of molecules is not provided, starting from Fig 1D and down. Please provide detailed statistics and explain what is plotted (mean with SEM?) on each graph.

      We performed a thorough inspection of the manuscript and corrected the identified issues.

      f. Plots with averaged data should be supplemented with error bars and N should be provided in the legend. E.g. Fig 1C - average position of MT and peak positions.

      We agree with the reviewer. In the revised manuscript, we have made the changes accordingly (e.g. Fig. 2C).

      g. Detailed information should be provided about protein constructs used in this work including all tags. The use of truncated proteins or charged/bulky tags can modify protein-microtubule interactions.

      We agree with the reviewer. In the revised manuscript, we provide the information of all constructs (see Fig. s1 and the related descriptions in Methods, pages 15-16, lines 476-534).

      h. Line 515: We estimated that the accuracy of microtubule end tracking was ~6 nm by measuring the standard error of the distribution of the estimated error in the microtubule end position. - evidence should be provided using the conditions of this study, not the reference to the prior work by others.

      i. Line 520: We estimated that the accuracy of the measured position was ~2 nm by measuring the standard error of the fitting peak location". Please provide evidence.

      Point h-i: we now provide detailed descriptions of how to estimate tracking and measurement accuracy and error in our work. Please see pages 18-19, lines 626-645.

      j. Kymographs in Fig 5G are barely visible. Please provide single-channel greyscale images. What are the dim molecules diffusing on this microtubule?

      We have incorporated the changes suggested by the reviewer. We think that some of the dim signals may result from stochastic background noise, while others likely represent transient bindings of MCAK. The exposure time in our experiments was approximately 0.05 seconds; if the binding duration were shorter than this, the signal would be lower (i.e. the “dim” signals). It is important to note that in this study, we selected binding events lasting at least 2 consecutive frames, meaning transient binding events were not included. This point has been clarified in the Methods section (see page 17, lines 573-583).
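
      The two-consecutive-frame criterion described above can be illustrated with a short sketch. This is an assumption-laden example rather than the authors' code: it takes a per-frame list of detection flags for one position on the microtubule (hypothetical input) and returns only the runs that last at least `min_frames` frames.

```python
def find_binding_events(detected, min_frames=2):
    """Return (start_frame, n_frames) for each run of consecutive detections
    that lasts at least min_frames frames; shorter runs are discarded."""
    events = []
    start = None
    for i, flag in enumerate(detected):
        if flag and start is None:
            start = i  # a new run of detections begins
        elif not flag and start is not None:
            if i - start >= min_frames:
                events.append((start, i - start))
            start = None  # the run has ended
    # Handle a run that continues to the last frame.
    if start is not None and len(detected) - start >= min_frames:
        events.append((start, len(detected) - start))
    return events


# Hypothetical detection trace: a one-frame blip, then a 2-frame and a 3-frame event.
trace = [0, 1, 0, 1, 1, 0, 1, 1, 1]
print(find_binding_events(trace))
```

      In this example the isolated one-frame detection is dropped, which mirrors how sub-exposure "dim" signals would be excluded from the dwell-time analysis.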

      k. Please provide a methods description for Fig 6. Did the buffer include 1 mM ATP? The presence of ATP would make these conditions more physiological. ATP concentration should be stated clearly in the main text or figure legend.

      The buffer contains ATP. In the revised manuscript, we have provided the methods for the experiments of microtubule dynamics assay, as well as the analysis of microtubule lifetimes and catastrophe frequency (see page 17, lines 561-572 and page 20, lines 685-690).

      l. Line 104: experiment was performed in BRB80 supplemented with 50 mM KCl and 1 mM ATP, providing a nearly physiological ion strength. Please provide a reference or add your calculations in Methods.

      We have provided references on page 5, lines 101-104 of our manuscript.

      m. What was the MCAK concentration in Figure 4? Did the microtubule shorten under any of these conditions?

      In these experiments, we used a very low concentration of MCAK and taxol-stabilized microtubules, so there’s no microtubule shortening observed here. ATP: 10 nM GFP-MCAK; AMPPNP: 1 nM GFP-MCAK; ADP: 10 nM GFP-MCAK; APO state: 0.1 nM GFP-MCAK.

      Other criticism:

      Text improvements are recommended in the Discussion. For example, line 348: Fourth, the loss of the binding preference.. suggests that the binding preference .. is required for the optimal .. preference.

      We thank the reviewer for pointing this out. In the revised manuscript, we conducted a thorough revision and review of the text.

      Reviewer #2 (Public Review):

      Summary:

      In this manuscript, Chen et al. investigate the localization of microtubule kinesin-13 MCAK to the microtubule ends. MCAK is a prominent microtubule depolymerase whose molecular mechanisms of action have been extensively studied by a number of labs over the last ~twenty years. Here, the authors use single-molecule approaches to investigate the precise localization of MCAK on growing microtubules and conclude that MCAK preferentially binds to a GDP-Pi-tubulin portion of the microtubule end. The conclusions are speculative and not well substantiated by the data, making the impact of the study in its current form rather limited. Specifically, greater effort should be made to define the region of MCAK binding on microtubule ends, as well as its structural characteristics. Given that MCAK has been previously shown to effectively tip-track growing microtubule ends through an established interaction with EB proteins, the physiological relevance of the present study is unclear. Finally, the manuscript does not cite or properly discuss a number of relevant literature references, the results of which should be directly compared and contrasted to those presented here.

      We thank the reviewer for the comments. As these suggestions are more thoroughly expressed in the following comments for authors, we will provide the responses in the corresponding sections, as shown below.

      Reviewer #2 (Recommendations For The Authors):

      Significant concerns:

      (1) Establishing the precise localization of MCAK wrt microtubule end is highly non-trivial. More details should be provided, including substantial supplementary data. In particular, the authors claim ~6 nm accuracy in microtubule end positioning - this should be substantiated by data showing individual overlaid microtubule end intensity profiles as well as fits with standard deviations etc. Furthermore, to conclude that MCAK binds behind XMAP215, the authors should look at the localization of the two proteins simultaneously, on the same microtubule end. Notably, EB binding profiles are well known to exponentially decay along the microtubule lattice - this is not very apparent from the presented data. If MCAK's autonomous binding pattern matches that of EB, we should be seeing an exponentially-decaying localization for MCAK as well? However, averaged MCAK signals seem to only be fitted to Gaussian. Note that the EB binding region (i.e. position and size of the EB comet) can be substantially modulated by increasing the microtubule growth rate - this can be easily accomplished by increasing tubulin concentrations or the addition of XMAP215 (e.g. see Maurer et al. Cur Bio 2014). Thus to establish that MCAK on its own binds the same region as EB, experiments that directly modulate the size and the position of this region should be added.

      (1) We thank the reviewer for this comment. Regarding the accuracy in microtubule end positioning, we now provide more details, and please see pages 18-19, lines 625-645 in the revised manuscript.

      (2) Regarding the relative localization of XMAP215 and MCAK, we performed additional experiments to record their colocalizations simultaneously, on the same microtubule end. Our results showed that MCAK predominantly binds behind XMAP215, with 14.5% appearing within the XMAP215’s binding region. Please see Fig. 2.D-E and lines 184-197 in the revised manuscript.

      (3) Regarding the exponential decay of the EB1 signal along microtubules, we observed that the position probability distribution measured in the present study follows a Gaussian distribution, and the expected exponential decay was not apparent. Since the exponential decay is thought to result from the time delay between tubulin polymerization and GTP hydrolysis, slower polymerization is expected to reduce this latency (Maurer et al., 2014). In our experiments, the growth rate was relatively low (~0.7 μm/min), much slower than the rate observed in cells, where the comet-shaped EB1 signal is most pronounced. A previous study has shown that the exponential decay of EB1 is more pronounced at growth rates exceeding 3 μm/min in vitro (Maurer et al., 2014). Therefore, we think that the relatively slow growth may account for the observed non-exponential decay distribution of the EB1 signals. The same reason may also explain the distribution of MCAK.

      (4) We agree with the reviewer’s suggestion that altering microtubule growth rate is a valid and effective approach to regulate the EB cap length. However, the conclusion that MCAK binds to the EB region is supported by three lines of evidence: (1) the localization of MCAK at the ends of microtubules, (2) new experimental data showing that MCAK binds to the proximal end of the XMAP215 site, and (3) the tendency of MCAK to bind GTPγS microtubules, similar to EB1. Based on these findings, we did not pursue additional experiments to modify the length of the EB cap.

      (2) Even if MCAK indeed binds behind XMAP215, there is no evidence that this region is defined by the GDP-Pi nucleotide state; it could still be curved protofilaments. GTPyS is an analogue of GTP - to what extent GTPyS microtubules exactly mimic the GDP-Pi-tubulin state remains controversial. Furthermore, nucleotide sensing for EB is thought to be achieved through its binding at the interface of four tubulin dimers. However MCAK's binding site is distinct, and it has been shown to recognize intradimer tubulin curvature. Thus it is not clear how MCAK would sense the nucleotide state. On the other hand, there is mounting evidence that the morphology of the growing microtubule end can be highly variable, and that curved protofilaments may be protruding off the growing ends for tens of nanometers or more, previously observed both by EM as well as by fluorescence (e.g. Mcintosh, Moores, Chretien, Odde, Gardner, Akhmanova, Hancock, Zanic labs). Thus, to establish that MCAK indeed localizes along the closed lattice, EM approaches should be used.

      First, we conducted additional experiments that demonstrate MCAK indeed binds behind XMAP215, supporting the conclusion that MCAK interacts with the EB cap (please see Fig. 2 in the revised manuscript). Second, our argument that MCAK preferentially binds to GDP-Pi tubulin is based on two observations: (1) the binding regions of MCAK overlap with those of EB1, and (2) MCAK preferentially binds to GTPγS microtubules, which are considered a close analogue of GDP-Pi tubulin. Third, understanding the structural basis of how MCAK senses the nucleotide state of tubulin is beyond the scope of the present study. However, inspired by the reviewer’s suggestion, we looked into the structure of the MCAK-tubulin complex. The L2 loop of MCAK makes direct contact with the interdimer interface (Trofimova et al., 2018; Wang et al., 2017), which could provide a structural basis for recognizing the changes induced by GTP hydrolysis. While this remains a hypothesis, it is certainly a promising direction for future research. Fourth, we agree with the reviewer that an EM approach would be ideal for establishing that MCAK localizes along the closed lattice. However, this is not the focus of the current study. Instead, we argue that MCAK binds to the EB cap, where at least some lateral interactions are likely to have formed.

      (3) The physiological relevance of the study is rather questionable: MCAK has been previously established to be able to both diffuse along the microtubule lattice (e.g. Helenius et al.) as well as hitchhike on EBs (Gouveia et al.). Given the established localization of EBs to growing microtubule ends in cells, and apparently higher affinity of MCAK for EB vs. the microtubule end itself (although direct comparisons with the literature have not been reported here), the relevance of MCAK's autonomous binding to dynamic microtubule ends is dubious.

      We thank the reviewer for raising the importance of physiological relevance. Please refer to our response to the comment No.1 of reviewer 1. Briefly, we think that the end-binding affinity of MCAK makes a significant contribution to its cellular functions. To elucidate this concept, we now use a simple model shown in Supplementary Appendix-2 (see pages 49-51, lines 1246-1316). In this model, we simplified MCAK and EB1 binding to microtubule ends by considering only these two proteins while neglecting other factors (e.g. XMAP215). Specifically, we considered two scenarios: one in which both proteins freely diffuse in the cytoplasm and another where MCAK is localized to specific cellular structures, such as the centrosome or centromere. Based on the modeling results, we argue that MCAK's functional impact at microtubule ends derives both from its intrinsic end-binding capacity and its ability to strengthen the EB1-mediated end association pathway.

      (4) Finally, the study seriously lacks discussion of and comparison with the existing literature on this topic. There are major omissions in citing relevant literature, such as e.g. landmark study by Kinoshita et al. Science 2001. Several findings reported here directly contradict previous findings in the literature. Direct comparison with e.g. Gouveia et al findings, Helenius et al. findings, and others need to be included. For example, Gouveia et al reported that EB is necessary for MCAK plus-end-tracking in vitro (please see Figure 1 of their manuscript). The authors should discuss how they reconcile the differences in their findings when compared to this earlier study.

      We thank the reviewer for this helpful suggestion. In the revised manuscript, we have updated the text description and included comparative discussions with other relevant studies in the Discussion section. Specifically, we added comparisons with the research on XMAP215 on page 14, lines 459-472 (Barr and Gergely, 2008; Kinoshita et al., 2001; Tournebize et al., 2000). Additionally, we have compared our findings with those of Gouveia et al. and Helenius et al. regarding MCAK's preference for binding microtubule ends on page 6, lines 145-157 and page 13, lines 408-441, respectively (Gouveia et al., 2010; Helenius et al., 2006).

      Additional specific comments:

      Figure 1

      Gouveia et al. (Figure 1) reported that MCAK does not autonomously preferentially localize to growing tips. Specifically, Gouveia et al. found equal association rates of MCAK to both the lattice and the tip in the presence of EB3delT, an EB3 construct that does not directly interact with MCAK. How can these findings be reconciled with the results presented here?

      We are uncertain why there was no observed difference in the on-rates to the lattice and the end in the study by Gouveia et al. Even when considering only the known affinity of MCAK for curved protofilaments at the distal tip of growing microtubules, we would still expect to observe an end-binding preference. After carefully comparing the experimental conditions, we nevertheless identified some differences. First, we used a 160 nm tip size to calculate the on-rate (k<sub>on</sub>), whereas Gouveia et al. used a 450 nm tip. Using a longer tip size would naturally lead to a smaller k<sub>on</sub> value. Note that we chose 160 nm for several reasons: (i) a previous cryo-electron tomography study has shown that the sheet structures of dynamic microtubule ends have an average length of around 180 nm (Guesdon et al., 2016); (ii) analysis of fluorescence signals at dynamic microtubule ends has demonstrated that the taper length at the microtubule end is less than 180 nm (Maurer et al., 2014); (iii) in the present study, we estimated that the length of MCAK's end-binding region is approximately 160 nm. Second, in Gouveia et al., single-molecule binding events were recorded in the presence of 75 nM EB3ΔT, which could potentially create a crowded environment at the tip, reducing MCAK binding. Third, as mentioned in our response to Reviewer 1, we took great care to minimize the interference from purification tags (e.g., His-tag) by ensuring their complete removal during protein preparation. Previous studies reported that retaining the His-tag of MAPs led to a significant increase in binding to microtubules (Maurer et al., 2011; Zhu et al., 2009). We believe that some of the factors mentioned above, or their combined effects, may account for the differences between these two observations.

      1C shows the decay of tubulin signal over several hundred nm - should show individual traces? How aligned? Doesn't this long decay suggest protruding protofilaments? (E.g. Odde/Gardner work).

      (1) In the revised manuscript, we now show individual traces (e.g. in Fig. 1B and Fig. 2A). The average trace for tubulin signal with standard deviation was shown in Fig. 2C.

      (2) The microtubule lattice was modeled as a Gaussian wall and its end as a half-Gaussian in every frame. We used the peak position of the half-Gaussian in each frame to align and average the microtubule end signals over the dwell time. The peak of the averaged half-Gaussian was then used as the reference for measuring the intensity profile of each single-molecule binding event in every frame (see page 18, lines 607-624).

      (3) We think that the decay of tubulin signal results from the convolution of the tapered end structure and the point spread function. In the revised manuscript, we have updated the Figures to provide unprocessed original data in Fig. 1B and Fig. 2A.

      Please show absolute numbers of measurements in 1C (rather than normalized distribution only).

      In the revised manuscript, we have included the raw data for both tubulin and MCAK signals as part of the methods description. In Fig. 1, using normalized values allows for the simultaneous representation of microtubule and protein signals on a unified graph.

      How do the results in 1D-G compare with the previous literature? Particularly comparison of on-rates between this study and the Gouveia et al? Assuming 1 um = 1625 dimers, it appears that in the presence of EB3, the on-rate of MCAK to the tips reported in Gouveia et al. is an order of magnitude higher than reported here in the absence of EB3 (4.3 x 10E-4 vs. 2 x 10E-5). If so, and given the robust presence of EB proteins at growing microtubule ends in cells, this would invalidate the potential physiological relevance of the current study. Note that the dwell times measured in Gouveia et al. are also longer than those measured here.

      Note that in Gouveia et al., the concentration of mCherry-EB3 was 75 nM, about 187.5 times higher than that of MCAK (0.4 nM). This ratio of the two proteins does not necessarily hold in cells. Regarding the physiological relevance of the end-binding affinity of MCAK itself, please refer to our response to the point No.1 of Reviewer 1.
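
      For readers who want to check the figures in this exchange, the arithmetic can be reproduced in a few lines. This is only a worked calculation using the numbers quoted above (75 nM EB3, 0.4 nM MCAK, and the reviewer's on-rate estimates of 4.3 x 10E-4 and 2 x 10E-5 in the reviewer's per-dimer units); the variable names are illustrative.

```python
# Concentrations quoted for the Gouveia et al. assay (nM).
eb3_conc_nM = 75.0
mcak_conc_nM = 0.4
fold_excess_eb3 = eb3_conc_nM / mcak_conc_nM  # EB3 excess over MCAK

# On-rates as estimated by the reviewer (same units for both values).
on_rate_with_eb3 = 4.3e-4     # Gouveia et al., in the presence of EB3
on_rate_without_eb3 = 2.0e-5  # present study, MCAK alone
on_rate_fold = on_rate_with_eb3 / on_rate_without_eb3

print(f"EB3 is in {fold_excess_eb3:.1f}-fold excess over MCAK")
print(f"On-rate with EB3 is {on_rate_fold:.1f}-fold higher")
```

      The roughly 20-fold on-rate difference noted by the reviewer was therefore measured with EB3 in a nearly 190-fold molar excess over MCAK.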

      Notably, Helenius et al reported a diffusion constant for MCAK of 0.38 um^2/s, which is more than an order of magnitude higher than reported here. The authors should comment on this!

      In the revised manuscript, we have provided an explanation for the difference in diffusion coefficient. Please see page 6, lines 142-157. In short, low-salt conditions facilitate rapid diffusion of MCAK.

      Figure 2:

      This figure is critical and really depends on the analysis of the tubulin signal. Note significant variability in tubulin signal between presented examples in 2A. Also, while 2C looks qualitatively similar, there appears to be significant variability over the several hundred nm from the tip along the lattice. This is the crucial region; statistical significance testing should be presented. More detailed info, including SDs etc. is necessary.

      In the revised manuscript, we have provided raw data in Fig. 1B and Fig. 2A. Additionally, we have provided statistical analysis of the tubulin signals (Fig. 2C) and performed a significance test. Please see page 5, lines 111-116 and page 7, lines 179-183 for detailed descriptions.

      Insights into the morphology of microtubule ends based on TIRF imaging have been previously gained in the literature, with reports of extended tip structures/protruding protofilaments (see e.g. Coombes et al. Cur Bio 2013, based on the methods of Demchouk et al. 2011). Such analysis should be performed here as well, if we are to conclude that nucleotide state alone, as opposed to the end morphology, specifies MCAK's tip localization.

      We appreciate the reviewer’s suggestion and agree that it provides a valid optical microscopy-based approach for estimating microtubule end morphology. However, this method did not establish a direct correlation between microtubule end morphology and tubulin nucleotide status. Therefore, we think that refining the measurement of microtubule end morphology will not necessarily provide more information to the understanding of tubulin nucleotide status at MCAK binding sites. Based on the available data in the present study, there are two main pieces of evidence supporting the idea that MCAK can sense tubulin nucleotide status: (1) the binding regions of MCAK and EB overlap significantly, and (2) MCAK shows a clear preference for binding to GTPγS microtubules, similar to EB1 (we provide a new control to support this, Fig. s4). Of course, we do not consider this to be a perfect set of evidence. As the reviewer has pointed out here and in other suggestions, future work should aim to further distinguish the nucleotide status of tubulin in the dynamic versus non-dynamic regions at the ends of microtubules, and to investigate the structural basis by which MCAK recognizes tubulin nucleotide status.

      EB comet profile should be clearly reproduced. MCAK should follow the comet profile.

      Please see our 3<sup>rd</sup> response to the point 1 of this reviewer.

      The conclusion that the MCAK binding region is larger than XMAP215 is not firm, based on the data presented. The authors state that 'the binding region of MCAK was longer than that of XMAP215'. What is the exact width of the region of the XMAP215 localization and how much longer is the MCAK end-binding region? Is this statistically significant?

      We have revised this part in the revised manuscript (page 6, lines 167-172). The position probability distributions of MCAK and XMAP215 were significantly different (K-S test, p< 10<sup>-5</sup>), and the binding region of MCAK (FWHM=185 nm) was significantly longer than that of XMAP215 (FWHM=123 nm).
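
      As a reading aid for the two quantities quoted above, the sketch below shows (i) how an FWHM converts to a Gaussian standard deviation and (ii) how a two-sample Kolmogorov-Smirnov statistic is computed. It assumes Gaussian-shaped position distributions for the conversion and does not reproduce the authors' analysis or p-value; the function names are hypothetical.

```python
import math

# For a Gaussian, FWHM = 2 * sqrt(2 * ln 2) * sigma (about 2.355 * sigma).
FWHM_FACTOR = 2.0 * math.sqrt(2.0 * math.log(2.0))


def fwhm_to_sigma(fwhm_nm):
    """Convert a Gaussian full width at half maximum to its standard deviation."""
    return fwhm_nm / FWHM_FACTOR


def ks_statistic(sample_a, sample_b):
    """Two-sample K-S statistic: the largest gap between the two empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Advance both empirical CDFs past every value <= x.
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d


print(f"MCAK:    sigma = {fwhm_to_sigma(185.0):.1f} nm")
print(f"XMAP215: sigma = {fwhm_to_sigma(123.0):.1f} nm")
```

      Under the Gaussian assumption, the reported FWHM values of 185 nm and 123 nm correspond to standard deviations of roughly 78.6 nm and 52.2 nm, so the MCAK region is about 1.5 times wider.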

      MCAK localization with AMPPNP should also be performed here. Even low concentrations of MCAK have been shown to induce microtubule catastrophe/end depolymerization. This will dramatically affect microtubule end morphology, and thus apparent positioning of MCAK at the end.

      In the end positioning experiment, we used a low concentration of MCAK (1 nM). Under this condition, microtubule dynamics remained unchanged, and the morphology of the microtubule ends was comparable across different conditions (with EB1, MCAK or XMAP215). Additionally, in the revised manuscript, we present a new experiment in which we recorded the localization of both MCAK and XMAP215 on the same microtubule. The results support the conclusion regarding their relative localization: most MCAK is found at the proximal end of the XMAP215 binding region, while approximately 15% of MCAK is located within the XMAP215 binding region. Please see Fig. 2D-E and page 7, lines 184-197 for the corresponding descriptions.

      Figure 3:

      For clearer presentation, projections showing two microtubule lattice types on the same image (in e.g. two different colors) should be shown first without MCAK, and then with MCAK.

      We thank the reviewer for this suggestion. We have adjusted the figure accordingly. Please see Fig. 4 in the revised manuscript.

      Please comment on absolute intensity values - scales seem to be incredibly variable.

      The fluorescence value presented here is the result of multiple images being summed. Therefore, the difference in absolute values is influenced not only by the binding affinity of MCAK in different states to microtubules, but also by the number of images used. In this analysis, we are not comparing MCAK in different states, but rather evaluating the binding ability of MCAK in the same state on different types of microtubules.

      Given that the authors conclude that MCAK binding mimics that of EB, EB intensity measurements and ratios on different lattice substrates should be performed as a positive control.

      We performed additional experiments with EB1, in the revised manuscript, we provide the data as a positive control (please see Fig. s4).

      Figure 4:

      MCAK-nucleotide dependence of GMPCPP microtubule-end binding has been previously established (see e.g. Helenius et al, others?) - what is new here? Need to discuss the literature. This would be more appropriate as a supplemental figure?

      In the present study, we reproduced the GMPCPP microtubule-end binding of MCAK in the AMPPNP state, as shown in several previous reports (Desai et al., 1999; Hertzer et al., 2006). Here, we also quantified the end to lattice binding preference, and our results showed that the nucleotide state-dependence shows the same trend as the binding preference of MCAK to the growing microtubule ends. Therefore, we prefer to keep this figure in the main text (Fig. 5).

      Figure 5:

      Please note that both MCAK mutants show an additional two orders of magnitude lower microtubule binding on-rates when compared to wt MCAK. This makes the analysis of preferential binding substrate for these mutants dubious.

      We agree with this point and have rewritten this part. Please see page 10, lines 295-327, in the revised manuscript.

      Figure 6:

      Combined effects of XMAP215 and XKCM1 (MCAK) have been previously explored in the landmark study by Kinoshita et al. Science 2001, which should be cited and discussed. Also note that Moriwaki et al. JCB 2016 explored the combined effects of XMAP215 and MCAK - which should be discussed here and compared to the current results.

      We agree with the reviewer. We have revised the discussion on this part. Please see page 11, lines 329-342 and page 14, lines 459-472 in the revised manuscript.

      Please report quantification for growth rate and lifetime.

      In the revised manuscript, we provide all these data. Please see pages 11-12, lines 343-374.

      To obtain any new quantitative information on the combined effects of the two proteins, at the very minimum, the authors should perform a titration in protein concentration.

      We agree with the reviewer on this point. In our pilot experiments, we performed titration experiments to determine the appropriate concentrations of MCAK and XMAP215, respectively. We selected 50 nM for XMAP215, as it clearly enhances the growth rate and exhibits a mild promoting effect on catastrophe, two key effects of XMAP215 reported in previous studies (Brouhard et al., 2008; Farmer et al., 2021). Reducing the XMAP215 concentration eliminates the catastrophe-promoting effect, while increasing it does not enhance the growth rate much further. For MCAK, we chose 20 nM, as it effectively promotes catastrophe; increasing the concentration beyond this point leads to no microtubule growth, at least in the MCAK-only condition. Without microtubule growth, it would be difficult to quantify the parameters of microtubule dynamics, hindering a clear comparison of the combined versus individual effects. Therefore, we think that the concentrations used in this study are appropriate and representative. In the revised manuscript, we make this point clearer (see page 11, lines 329-342).

      Finally, the writing could be improved for overall clarity.

      We thank the reviewer for pointing this out. In the revised manuscript, we conducted a thorough revision and review of the text.

      Reviewer #3 (Public Review):

      The authors revisit an old question of how MCAK goes to microtubule ends, partially answered by many groups over the years. The authors seem to have omitted the literature on MCAK in the past 10-15 years. The novelty is limited due to what has previously been done on the question. Previous work showed MCAK targets to microtubule plus-ends in cells through association with EB proteins and Kif18b (work from Wordeman, Medema, Walczak, Welburn, Akhmanova) but none of their work is cited.

      We thank the reviewer for the suggestion. Some of the referenced work has already been cited in our manuscript, such as studies on the interaction between MCAK and EB1. However, other relevant literature had not been properly cited. In the revised manuscript, we have added further discussion on this topic in the context of existing findings. Please refer to pages 3-4, lines 68-85, and page 13, lines 425-441.

      It is not obvious in the paper that these in vitro studies only reveal microtubule end targeting, rather than plus end targeting. MCAK diffuses on the lattice to both ends and its conformation and association with the lattice and ends has also been addressed by other groups-not cited here. I want to particularly highlight the work from Friel's lab where they identified a CDK phosphomimetic mutant close to helix4 which reduces the end preference of MCAK. This residue is very close to the one mutated in this study and is highly relevant because it is a site that is phosphorylated in vivo. This study and the mutant produced here suggest a charge-based recognition of the end of microtubules.

      Here the authors analyze this MCAK recognition of the lattice and microtubule ends, with different nucleotide states of MCAK and in the presence of different nucleotide states for the microtubule lattice. The main conclusion is that MCAK affinity for microtubules varies in the presence of different nucleotides (ATP and analogs) which was partially known already. How different nucleotide states of the microtubule lattice influence MCAK binding is novel. This information will be interesting to researchers working on the mechanism of motors and microtubules. However, there are some issues with some experiments. In the paper, the authors say they measure MCAK residency of growing end microtubules, but in the kymographs, the microtubules don't appear dynamic - in addition, in Figure 1A, MCAK is at microtubule ends and does not cause depolymerization. I would have expected to see depolymerization of the microtubule after MCAK targeting. The MCAK mutants are not well characterized. Do they still have ATPase activity? Are they folded? Can the authors also highlight T537 and discuss this?

      Finally, a few experiments are done with MCAK and XMAP215, after the authors say they have demonstrated the binding sites overlap. The data supporting this statement were not obvious and the conclusions that the effect of the two molecules are additive would argue against competing binding sites. Overall, while there are some interesting quantitative measurements of MCAK on microtubules - in particular in relation to the nucleotide state of the microtubule lattice - the insights into end-recognition are modest and do not address or discuss how it might happen in cells. Often the number of events is not recorded. Histograms with large SEM bars are presented, so it is hard to get a good idea of data distribution and robustness. Figures lack annotations. This compromises therefore their quantifications and conclusions. The discussion was hard to follow and needs streamlining, as well as putting their work in the context of what is known from other groups who produced work on this in the past few years.

      We thank the reviewer for the comments. Regarding the physiological relevance of the end-binding of MCAK itself, please refer to our response to the point No.1 of reviewer 1. Moreover, as we feel that other suggestions are more thoroughly expressed in the following comments for authors, we will provide the responses in the corresponding sections, as shown below.

      Reviewer #3 (Recommendations For The Authors):

      Why, on dynamic microtubules, is MCAK at microtubule plus ends and does not cause a catastrophe?

      At these concentrations (10 nM MCAK with 16 μM tubulin in Fig. 1; 1 nM MCAK with 12 μM tubulin in Fig. 2), MCAK has little effect on microtubule dynamics in our experiments. Using TIRFM, we were able to observe individual MCAK binding events. Based on these observations, we think that under the current experimental conditions, a single binding event of MCAK is insufficient to induce microtubule catastrophe; rather, catastrophe likely requires cumulative changes resulting from multiple binding events.

      Do the MCAK mutants still have ATPase activity?

      The ATPase activities of MCAK<sup>K525A</sup> and MCAK<sup>V298S</sup> are both reduced to about 1/3 of the wild-type (Fig. s6).

      The intensities of GFP are not all the same on the microtubule lattice (eg 1A). See blue and white arrowheads. The authors could be looking at multiple molecules of GFP-MCAK instead of single dimers. How do they account for this possibility?

      In the revised manuscript, we provide the gel filtration result of the purified MCAK, and the position of the peak corresponds to ~220 kDa, demonstrating that the purified MCAK in solution is dimeric (please see Fig. s1 and page 5, lines 101-103). We measured the fluorescence intensity of each binding event. A probability distribution of these intensities was then constructed and fitted with a Gaussian function. A binding event was considered to correspond to a single molecule if its intensity fell within μ±2σ of the distribution. The details of the single-molecule screening process are provided in the revised manuscript (see page 17, lines 574-583).
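
      The μ±2σ screening step described above can be sketched as follows. This is a minimal illustration and not the authors' analysis code: it assumes the Gaussian parameters are estimated from the sample mean and standard deviation rather than from a fit to a histogram, and the function name `select_single_molecules` and the intensity values are hypothetical.

```python
from statistics import mean, stdev

def select_single_molecules(intensities, n_sigma=2.0):
    """Keep binding events whose intensity lies within mu +/- n_sigma * sigma.

    mu and sigma are estimated from the sample itself (an assumption made
    for this sketch; the response describes fitting a Gaussian function to
    the intensity distribution).
    """
    mu = mean(intensities)
    sigma = stdev(intensities)
    lower, upper = mu - n_sigma * sigma, mu + n_sigma * sigma
    return [x for x in intensities if lower <= x <= upper]

# Hypothetical intensities (A.U.); the last value mimics a bright aggregate.
events = [290, 300, 310, 295, 305, 300, 298, 302, 900]
singles = select_single_molecules(events)
```

      In this toy data set the bright outlier falls outside the μ±2σ window and is discarded, while the remaining events are kept as putative single molecules. A single extreme value inflates σ, so fitting the main peak of the histogram, as the response describes, is the more robust choice on real data.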

      In addition, we measured the fluorescence intensity of both MCAK<sup>sN+M</sup> and MCAK. MCAK<sup>sN+M</sup> is a monomeric mutant that contains the neck domain and motor domain (Wang et al., 2012). The average intensity of MCAK<sup>sN+M</sup> is 196 A.U., about 65% of that of MCAK (300 A.U.), suggesting that MCAK is a dimer (see Fig. s1). Moreover, we think that some of the dim signals may result from stochastic background noise, while others likely represent transient binding of MCAK. The exposure time in our experiments was approximately 0.05 seconds; if the binding duration were shorter than this, the signal would be lower. It is important to note that in this study, we specifically selected binding events lasting at least 2 consecutive frames, meaning transient binding events were not included. This point has been clarified in the Methods section (see page 17, lines 568-569 and lines 574-583).
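
      The duration criterion mentioned above (events lasting at least 2 consecutive frames) can be illustrated with a short sketch. It is hypothetical: the representation of an event as a list of frame indices and the names `longest_consecutive_run` and `keep_event` are assumptions for illustration, not part of the authors' pipeline.

```python
MIN_FRAMES = 2  # minimum number of consecutive frames stated in the response

def longest_consecutive_run(frames):
    """Return the length of the longest run of consecutive frame indices."""
    best = run = 0
    prev = None
    for f in sorted(set(frames)):
        # Extend the run if this frame directly follows the previous one.
        run = run + 1 if prev is not None and f == prev + 1 else 1
        best = max(best, run)
        prev = f
    return best

def keep_event(frames, min_frames=MIN_FRAMES):
    """Keep an event only if it persists for at least min_frames in a row."""
    return longest_consecutive_run(frames) >= min_frames

# Hypothetical events: frame indices at which a spot was detected.
candidates = {"a": [10], "b": [10, 11], "c": [3, 5, 7], "d": [4, 5, 6]}
kept = [name for name, frames in candidates.items() if keep_event(frames)]
```

      With the approximately 0.05 s exposure quoted above, a two-frame minimum corresponds to a dwell of roughly 0.1 s, so single-frame detections such as events "a" and "c" are excluded as transient or noise.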

      Could the authors provide a kymograph of an MT growing, in the presence of MCAK+AMPPNP? Can MCAK track the cap?

      Under single-molecule conditions, we observed a single MCAK molecule briefly binding to the end of the microtubule. However, we did not record if MCAK at high concentrations could track microtubule ends under AMPPNP conditions.

      In the experiments in Figure 6, the authors should also show the localization of MCAK and XMAP215 at microtubule plus ends in their kymographs to show the two molecules overlap.

      Regarding the relative localization of XMAP215 and MCAK, we conducted additional experiments to record their localization simultaneously at the same microtubule end. Our results show that MCAK predominantly binds behind XMAP215, with 14.5% of MCAK binding within the XMAP215 binding region. Please see Fig. 2D-E and page 7, lines 184-197 in the revised manuscript. However, we argue that the effects of XMAP215 and MCAK are additive, and their binding sites do not necessarily need to overlap for these effects to occur.

      The authors do not report what statistical tests are done in their graphs, and one concern is over error propagation of their data. Instead of bar graphs, showing the data points would be helpful.

      We have now shown all data points in the revised manuscript.

      MCAK+AMPPNP accumulates at microtubule ends. Appropriate quotes from previous work should be provided.

      We have made the revisions accordingly. Please see page 9, lines 273-276.

      Controls are missing. An SEC profile for all purified proteins should be presented. Also, the authors need to explain if they report the dimeric or monomeric concentration of MCAK, XMAP215, etc...

      We have provided the gel filtration results for all purified proteins in the revised manuscript (Fig. s1). Moreover, we now make it clear that the concentrations of MCAK and EB1 are monomeric concentrations. Please see the legend for Fig. 1, line 893 in the revised manuscript.

      Figure 1: the microtubules don't look dynamic at all. This is also why the authors can record MCAK at microtubule ends, because their structure is not changing.

      The microtubules are dynamic, but they may appear non-dynamic due to the relatively slow growth rate and the high frame rate at which we are recording. We propose that individual binding events of MCAK induce structural changes at the nanoscopic or molecular scale, which are not detectable using TIRFM.

      I recommend the authors measure the Kon and Koff for single GFP-MCAK mutant molecules and provide the information alongside their normalized and averaged binding intensities of GFP-MCAK in Fig 5. Showing data points instead of bar graphs would be better.

      (1) We measured k<sub>on</sub> and dwell time for the mutants at growing microtubule ends. However, we did not perform single-molecule tracking of MCAK binding on stabilized microtubules. This is mainly because the superimposed signal on the stable microtubule already indicates the changes in the mutants' binding affinity for different microtubule structures; moreover, the binding of the mutants is highly transient, making accurate single-molecule tracking and calculations difficult.

      (2) In the revised figure, we have included the data points in all plots.

      When discussing how Kinesin-13 interacts with the lattice, the authors should quote the papers that report the organization of full-length Kinesin-13 on tubulin heterodimers: Trofimova et al, 2018; McHugh et al 2019; Benoit et al, 2018. It would reinforce their model and account for the full-length protein, rather than just the motor domain.

      We thank the reviewer for the suggestion. In our manuscript, we have cited papers on full-length Kinesin-13 to discuss the interaction between MCAK and the curved structure at microtubule ends. Additionally, we have used the MCAK-tubulin crystal structure (PDB ID: 5MIO) in Fig. 6, as it depicts human MCAK, which is consistent with the protein used in our study. This structure illustrates the interaction sites between MCAK and the tubulin dimer, guiding our mutation studies on specific residues. Thus, we prefer to use this structure (PDB ID: 5MIO) in Fig. 6.

      Figure 5A. What type of model is this? A PDB code is mentioned. Is this from an X-ray structure? If so, mention it.

      We have now included the structural information in the figure legend (see page 37, line 1045).

      Figure 5B. It is not possible to distinguish the different microtubule lattices (GTPyS, GDP, and GMPCPP). The experiment needs to be better labelled.

      We thank the reviewer for this comment. We have now rearranged the figure for better clarity (see Fig. 6).

      Figure 5D: what are the statistical tests? I don't understand "The statistical comparisons were made versus the corresponding value of GFP-MCAK".

      We have made this point clearer in the revised manuscript (see page 38, lines 1078-1080).

      What is the "EB cap"? This needs explaining.

      We now provide an explanation; please see page 4, lines 87-89 in the revised manuscript.

      Work from Friel and co-workers showed MCAK T537E did not have depolymerizing activity and a reduced affinity for microtubule ends. The work of the authors should be discussed with respect to this previously published work.

      We thank the reviewer for this suggestion. In the revised manuscript, we have added discussions on this (see page 10, lines 303-307).

      The concentration of protein used in the assays is not always described.

      We have checked throughout the manuscript and made revisions accordingly.

      "Having revealed the novel binding sites of MCAK in dynamic microtubule ends " should be on "we wondered how MCAK may work ..with EB1". This is not addressed so should be removed. Instead, they can quote the work from Akhmanova's lab. Realistically this section should be rephrased as there are other plus-end targeting molecules that compete with MCAK, not just XMAP215 and EB1.

      We have rephrased this section as suggested by this reviewer to be more specific. Please see page 11, lines 329-342.

      What is AMPCPP?

      It should be “AMPPNP”. We have corrected this.

      Typos in Figure 5.

      Corrected

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review): 

      Summary: 

      This paper described the dynamics of the nuclear substructure called PML Nucleolar Association (PNA) in response to DNA damage on ribosomal DNA (rDNA) repeats. The authors showed that the PNA with rDNA repeats is induced by the inhibition of topoisomerases and RNA polymerase I and that the PNA formation is modulated by RAD51, thus homologous recombination. Artificially induced DNA double-strand breaks (DSBs) in rDNA repeats stimulate the formation of PNA with DSB markers. This DSB-triggered PNA formation is regulated by DSB repair pathways.

      Strengths: 

      This paper illustrates a unique DNA damage-induced sub-nuclear structure containing the PML body, which is specifically associated with the nucleolus. Moreover, the dynamics of this PML Nucleolar Association (PNA) require topoisomerases and RNA polymerase I and are modulated by RAD51-mediated homologous recombination and non-homologous end-joining. This study provides a unique regulation of DSB repair at rDNA repeats associated with the unique membrane-less subnuclear structure.

      Weaknesses: 

      Although the PNA formation on rDNA repeat is nicely shown by cytological analysis, the biological significance of PNA in DSB repair is not fully addressed.

      We appreciate the succinct summary, and thank you for pointing out this insightful comment. Our data show that the dynamic interaction of PML with nucleolar caps can recognize and sequester damaged rDNA from the reactivated nucleolus. We propose that through this process, the actively transcribed intact rDNA is protected from possible detrimental interaction with the defective, PNAs-sequestered rDNA, most likely to avoid the harmful intra- and inter-chromosomal recombination events that would otherwise likely occur during recombinational repair of the damaged rDNA, as the rDNA repeats present on five chromosomes are highly repetitive. Thus, this novel sorting mechanism might help sustain the integrity of repetitive rDNA loci.

      Our data also indicate that the emergence of PNAs coincided with cell cycle arrest and preceded the establishment of cellular senescence. The senescent response to rDNA damage can primarily protect the genome from the instability of rDNA loci in a manner broadly analogous to that described for protecting the telomeric loci. This notion is supported by the lack of PNA formation in most cancer cells. In the broader context of the biological significance of cellular senescence at the organismal level, such robust response to hazardous rDNA damage in the individual affected cells may limit/prevent the sporadic occurrence of early cancerous lesions, at the expense of potential tissue adverse effects accumulating over time and thereby eventually contributing to organismal aging.

      Reviewer #2 (Public Review): 

      In this manuscript, the authors aim to study the PML-nucleoli association (PNAs) by different genotoxic stress and to determine the underlying molecular mechanisms. 

      First, from a diverse set of genotoxic stress conditions (topoisomerases, RNA Pol I, rRNA processing, and DNA replication stress), the authors have found that the inhibition of topoisomerases and RNA Polymerase I has the highest PNA formation associated with p53 stabilization, gamma-H2AX, and PAF49 segregation. It was further demonstrated that the Rad51-mediated HR pathway but not the NHEJ pathway is associated with the PNA formation. Immuno-FISH assays show that doxorubicin induces DSBs (53BP1 foci) in rDNA and PNA interactions with rDNA/DJ regions. Furthermore, endonuclease I-PpoI induced DSB at a defined location in rDNA and led to PNAs.

      Most claims by the authors are supported by the data provided. However, below weaknesses/concerns may need to be addressed to improve the quality of the study. 

      (1) Top2B toxin doxorubicin had the highest degree of elevating PNAs; however, Top2B-knockdown had almost no noticeable effects on PNAs. How to reconcile the different phenotypes targeting Top2B? 

      We thank the reviewer for this comment and believe we can reconcile the results from doxorubicin treatments and the downregulation of TOP2A and B. 

      The different phenotypes can reflect the fact that doxorubicin targets both human TOP2 isoforms: TOP2A and TOP2B. Hence this treatment can limit any potential redundant roles of the individual topoisomerase subtypes, which, on the other hand, can be manifested under conditions when only one specific member is depleted genetically. On the other hand, it is also crucial to note that these isoforms are not fully functionally redundant. Each isoform reveals a characteristic expression pattern and distinct yet overlapping function (e.g. Nitiss J 2009, doi.org/10.1038/nrc2608, or Uusküla-Reimand 10.1126/sciadv.add4920). Thus, doxorubicin treatment or TOP2A KD can, contrary to TOP2B KD, trigger the formation of PNAs.   

      Additionally, besides topoisomerase inhibition and poisoning, doxorubicin intercalates DNA and elevates oxidative stress. Therefore, the observed effect of doxorubicin may also reflect, to some extent, its broader damaging impact on (r)DNA. On the other hand, the downregulation of individual topoisomerase isoforms shows how the restriction of their respective specific function/s may evoke (r)DNA damage.

      (2) To test the role of Rad51 and DNA-PKcs in the PNA formation, Rad51 inhibitor B02 and DNA-PKcs inhibitor NU-7441 were chosen to use in the study. To further exclude the possible off-target of B02 and NU-7441, siRNA-mediated knockdown of Rad51 and DNA-PKcs would be an appropriate complementary approach to the pharmaceutical inhibitor approach. 

      We followed this stimulating suggestion, and in the revised manuscript, we used pools of siRNAs (esiRNA) to target the mRNA of RAD51 or ligase IV (LIG4), to mimic the RAD51 chemical inhibitor B02 and the NHEJ (DNA-PK) inhibitor NU-7441, respectively. The relevant new data are presented in Figures 5F-I, 6E and F, Supplementary Figure 5D, E, F-H, and Supplementary Figure 6C-E. Notably, the results on rDNA damage-triggered PNA formation obtained using chemical inhibition of the repair pathways and the genetic approach (knockdown) were largely consistent, thereby supporting our original conclusions. There was one interesting partial difference when the B02 RAD51 inhibitor was compared with RAD51 knockdown, which we comment on below, and for which we suggest a plausible explanation reflecting the fact (known for other DDR proteins such as PARP1) that the functional inhibition of an expressed protein (here RAD51, by B02) may not necessarily phenotypically recapitulate the absence of such a protein (here RAD51 knockdown). Overall, we agree that this was a very important set of control experiments, which we additionally extended to cell cycle phase analysis.

      First, the LIG4 knockdown impacted the I-PpoI-induced PNAs formation in a way that followed the same trend as the effects caused by the NHEJ pathway inhibitor NU-7441, namely increased frequency of PNAs formation when NHEJ was impaired (Figure 5E and 5I). This was expected based on what we know about PNA formation: the NHEJ pathway is active throughout the cell cycle, and when this repair mode is not available in the nucleolus, more rDNA breaks remain unrepaired and must be transported to the nucleolar caps to be processed by the HR pathway, thereby also leading to more PNA structures formed under such conditions. In terms of cell cycle phases, the observed increase of I-PpoI-induced PNAs in cells with depleted LIG4 was more pronounced in S/G2 cells, when the PNAs-promoting, cap-associated HR pathway is more active. Furthermore, the enhanced occurrence of I-PpoI-induced PNAs in cells depleted of LIG4 was counteracted (partly ‘rescued/prevented’) by the concomitant treatment with the RAD51 inhibitor B02 (Figure 5E and I; compare cells with esiLIG4 alone versus esiLIG4 + B02), overall consistent with the notion that the cap-associated HR pathway facilitates PNAs formation.

      Second, in the analogous scenario of comparing the impact of the RAD51 chemical inhibitor (B02) with the siRNA-mediated knockdown of RAD51, the observed trends in terms of the resulting frequencies of I-PpoI-induced PNAs were also largely consistent, in that both strategies of interfering with RAD51 resulted in fewer PNAs formed than in cells deficient in NHEJ. On the other hand, we must stress that after RAD51 knockdown, we did not observe a decline of PNAs compared to control cells, which was detected after B02 treatment (Figure 5E and I). However, when specifically considering the cell cycle position of the individual cells, these new analyses again revealed important similarities between the knockdown and chemical inhibition of RAD51 (Figure 6E, Supplementary Figure 6E).

      Before discussing the partial, cell-cycle-related difference between the impact of RAD51 chemical inhibition vs. knockdown, it is important to consider the PNAs patterns seen in cells with activated I-PpoI and proficient in both NHEJ and HR. The overall frequency of I-PpoI-induced PNAs formation was higher in G1 than in S/G2 cells. Considering that persistent rDNA DSBs trigger the formation of PNAs, this result may reflect the very limited HDR during G1 phase, in contrast to more efficient repair of I-PpoI-induced rDNA DSBs in S/G2, the cell cycle phase in which both NHEJ and HDR operate in parallel, the latter pathway offering a safer, error-free mechanism of DSB repair.

      Notably, when comparing the PNAs formation frequency in cells treated with either chemical inhibition of RAD51 (with B02) or upon knockdown of RAD51, we strikingly observed that the decrease of I-PpoI-induced PNAs formation upon RAD51 knockdown was apparent only for cells in G1 (Figure 6E, and Supplementary Figure 6E). We believe that the distinct impact of RAD51 knockdown compared with that of RAD51 inhibitor (mainly seen when S/G2 cells were analyzed separately) might reflect one or a combination of several factors, including e.g. the following:

      i) The knock-down-induced absence of RAD51 protein may allow access to the persistent DSB lesions by other alternative repair proteins (such as the RAD52-mediated repair reported in diverse pathophysiological circumstances including in cells undergoing senescence, a scenario very relevant for our present study). Such altered stoichiometry of proteins interacting with the persistent rDNA DSBs may contribute to the pattern of PNAs formation that is then distinct from the pattern seen in the presence of  Rad51; 

      ii) Another difference that we observe is the somewhat enhanced frequency of ‘spontaneous’ (i.e., even without activating the I-PpoI) PNAs formation when RAD51 is depleted, a phenomenon not seen when control non-targeting siRNA is transfected or when RAD51 is acutely inhibited by B02 (Figure 5H). Such spontaneous baseline PNA formation likely reflects the enhanced persistence of unrepaired endogenously occurring DNA lesions that are already suboptimally processed during the period following the esiRNA transfection, i.e., under stepwise depletion of the RAD51 protein which is normally required to deal with such omnipresent endogenous lesions occurring during e.g. DNA replication or some oxidative/metabolic processes; 

      iii) The knockdown approach, while clearly robustly depleting RAD51 protein levels (see Supplementary Figure 5D) may nevertheless leave a small residual fraction of the RAD51 protein present in the cells, thereby possibly inhibiting the HDR pathway to a slightly lesser degree than the B02 inhibitor;

      iv) Additionally, it should be noted that the baseline levels of I-PpoI-induced PNAs formation are somewhat higher in the transfection experiments (i.e. when using any siRNA, even the non-targeting control siRNA), compared with the less ‘invasive’ experiments of simply adding a drug/solvent to the cell culture medium. This phenomenon adds to the above-baseline transient stress commonly seen (over decades, by us and many others) in cells exposed to transfections, often causing even a moderate transient DNA damage response. Specifically, in control experiments, the level of I-PpoI-induced PNAs was around 15% in cells transfected with non-targeting siRNA, while in the comparable experiment of only I-PpoI induction under non-transfection conditions it was around 10%. In other words, the somewhat enhanced baseline counts of I-PpoI-induced PNAs seen in the knockdown experiments compared with chemical inhibitor experiments partly reflect the shift of the total readout counts due to the different baseline counts. This, however, does not alter the observed overall trends, which are consistent in both types of experiments.

      While the potential interpretation(s) of the above results are presented in the Discussion section of the revised manuscript, the full mechanistic elucidation of the impact of various experimental manipulations on the PNA formation during the cell cycle would require a dedicated follow-up study.

      (3) Several previous studies have shown the activation of the nucleolar ATM-mediated DNA damage response pathway by I-PpoI-induced DSBs in rDNA. What is the role of nucleolar ATM in the regulation of PNAs?

      We agree this is an important issue, the solution of which (explained below) strengthens the mechanistic insights provided in our revised manuscript, and we are grateful to the reviewer for raising this question. To address this important point and extend the scope from ATM to ATR, we employed two small-molecule inhibitors of ATM (KU-60019 and KU-55933) and one inhibitor of ATR (VE-822), at concentrations commonly used in analogous studies in the DNA damage response field, to examine their impact on rDNA damage/PNA formation induced by I-PpoI. The new data are shown in Figures 5A and B. We found that the inhibition of either of the two kinases alone robustly reduced the number of nuclei with PNAs, indicating that the activity of each of these two DNA damage signaling kinases is required for the formation of I-PpoI-induced PNAs in response to rDNA damage. Future experiments should elucidate precisely which of the very wide range of ATM/ATR substrates and/or specific protein domains and amino acid residues are instrumental in this rDNA damage signaling pathway to induce the formation of PNAs.

      Reviewer #3 (Public Review): 

      Summary: 

      Hornofova et al. examined interactions between the nucleolus and promyelocytic leukemia nuclear bodies (PML-NBs) termed PML-nucleolar associations (PNAs). PNAs are found in a minor subset of cells, exist within distinct morphological subcategories, and are induced by cellular stressors including genotoxic damage. A systematic pharmacological investigation identified that compounds that inhibit RNA Polymerase 1 (RNAPI) and/or topoisomerase 1 or 2A caused the greatest proportion of cells with PNA. A specific RAD51 inhibitor (B02) impacted the number of cells exhibiting PNAs and PNA morphology. Genetic double-strand break (DSB) induction within the rDNA locus also induced PNA structures that were more prevalent when non-homologous end joining (NHEJ) was inhibited.

      Strengths: 

      PNA are morphologically distinct and readily visualized. The imaging data are high quality, and rDNA is amenable to studying nuclear dynamics. Specific induction of rDNA damage is a strong addition to the non-specific pharmacological damage characterized early in the manuscript. These data nicely demonstrate that rDNA double-strand breaks undermine PNA formation. Figure 1 is a comprehensive examination and presents a compelling argument that RNAPI and/or TOP1, TOP2A inhibition promote PNA structures. 

      Weaknesses: 

      (1) The data are limited to fixed fluorescent microscopy of structures present in a minority of cells. Data are occasionally qualitative and/or based upon interpretation of dynamic events extrapolated from fixed imaging. This study would benefit from live imaging that captures PNA dynamics. 

      We fully agree with the reviewer that live-cell imaging is critical to adequately capture PNA formation and evolution dynamics. While the data presented in this manuscript are based on quantifications of fixed cell images, all these analyses build on a detailed live-cell imaging examination of the dynamic behavior of PNAs that we reported in our original study on PNAs formation as a biological phenomenon (Imrichova et al., doi: 10.18632/aging.102248, Epub 2019 Sep 7).

      In the revised version of our present manuscript, we better highlight the live-cell imaging study in the Introduction section and further point out that the previous dynamic study was based on imaging of human cells ectopically expressing PML-EGFP and B23-RFP. Last but not least, to help the readers of this manuscript understand the dynamics of PNA evolution, we have now also added an improved schematic figure that better illustrates the temporal dynamics of PNA stage transitions (Figure 1A).

      (2) Cell cycle and cell division are not considered. Double-strand break repair is cell cycle dependent, and most experiments occur over days of treatment and recovery. It is unclear if the cultures are proliferating, or which cell cycle phase the cells are in at the time of analysis. It is also unclear if PNAs are repeatedly dissociating and reforming each cell division. 

      We agree that this is an important point. We previously published (Imrichova et al., doi: 10.18632/aging.102248) that exposure of RPE-1hTERT cells to doxorubicin caused cell cycle arrest and cellular senescence. In the revised manuscript, we added the analysis of how the I-PpoI-induced rDNA DSB affects the cell’s fate (Supplementary Figure 4J-N). Importantly, we found that most of the cells after I-PpoI-induced rDNA DSB also developed cellular senescence, and only 1–3% of cells eventually recovered from such rDNA stress to the extent that they were able to form colonies in a colony-forming assay. Thus, at the time of analysis, most of the cells were non-proliferating. 

      Additionally, in the revised manuscript, we included an analysis of the dependence of PNA formation on specific cell cycle phases (see Figures 6E–I and Supplementary Figure 6C–E). Generally, we found that PNAs can be present in G1/S/G2. Nevertheless, the probability of occurrence in a particular cell cycle phase is affected by the type of treatment. For example, after I-PpoI-induced rDNA damage, the PNAs are primarily present in G1. In contrast, after the sole knockdown of RAD51 or TOP2A, the PNAs are present in S/G2 with higher probability. 

      (3) The relationship of PNA morphologies (bowl, funnel, balloon, and PML-NDS) also remains unclear. It is possible that PNAs mature/progress through the distinct morphologies, and that morphological presentation is a readout of repair or damage in the rDNA locus. However, this is not formally addressed.  

      The reviewer is indeed correct in his/her interpretation of the PNA morphologies as a readout of the dynamic fate of the rDNA lesion. As mentioned in our response to the previous point no. 2 raised by this reviewer (see above), we described the dynamic structural PNA transitions in our previous article (Imrichova et al., doi: 10.18632/aging.102248).

      PNA progresses through distinct structures. Our results indicate that individual PNA subtypes are tied to specific processes. The PNA bowl-type is linked to the recognition of rDNA damage on the nucleolar periphery. The PNA funnel-type clusters several damaged rDNA loci from the nucleolus into PML-NDS, which is the ultimate structure that sequesters unrepaired rDNA away from the reactivated nucleolus.

      The formation of bowls, funnels, and balloons is linked to the inhibition of RNA polymerase I during the formation of nucleolar caps. In contrast, the later stage of PML-NDS is linked to RNA polymerase I reactivation. 

      We should mention that after the I-PpoI treatment, the ‘bowls’ and ‘funnels’ (observed originally in response to topoisomerase inhibitory drugs) are missing, and only PML-NDSs are formed. The apparent absence of the preceding stages of PNAs may reflect the lower extent of rDNA damage induced by I-PpoI treatment, without causing the pan-nucleolar RNA polymerase I inhibition that was observed for other treatments, such as doxorubicin.  

      (4) An I-PpoI targeted sequence within the rDNA locus suggests 3D structural rearrangement following damage. An orthogonal approach measuring rDNA 3D architecture would benefit comprehension.

      This is a very inspiring idea. Given the demanding nature of the required 3D analyses and the fact that this aspect is somewhat outside the scope of the present study, we plan to follow this issue up in our future work, along with our efforts to localize the individual NORs using immune-FISH after introducing the rDNA damage by I-PpoI.

      (5) Following I-PpoI induction, it is possible that cells arrest in a G1 state. This may explain why targeting NHEJ has a greater impact on the number of 53BP1 foci and should be investigated.

      We fully agree with the Reviewer. Indeed, our results showed that after a 24-hour period of I-PpoI induction, most cells (about 90%) are in the G1 phase of the cell cycle, consistent with the activation of the ATM/ATR checkpoint signaling and p53 activation that we observed. Therefore, this cell cycle effect can indeed explain why targeting NHEJ has a greater impact and causes the higher numbers of 53BP1 foci (and also γH2AX foci).

      (6) Conclusions: PNAs are a phenomenon of biological significance and understanding that significance is of value. More work is required to advance knowledge in this area. The authors may wish to examine the literature on APBs (Alt-associated PML-NBs), which are similar structures where telomeres associate with PML-NBs in a specific subset of cancers. It is possible that APBs and PNAs share similar biology, and prior efforts on APBs may help guide future PNA studies.  

      We are very grateful for this stimulating suggestion. In the Discussion of the revised manuscript, we now address the possible analogy between the APBs under ALT on the one hand, and the PNA formation on rDNA damage studied here, on the other. The following is the quote of the relevant paragraph of the revised Discussion: 

      “There are several similarities between PNAs and APBs. The interaction partner of PML located on both the telomeres and rDNA must be sumoylated, as the PML-SIM domain is essential for the formation of both APBs and PNAs (37,93). The PML IV isoform most efficiently forms APBs and also PNAs (16,37). PML clusters damaged telomeres into APBs, and we observe that several NORs converge in one PNA structure; thus, the PML-dependent clustering of damaged NORs is plausible. On the other hand, there is one critical difference between the otherwise broadly analogous APBs and PNAs. The process of ALT operates in transformed cancer cells that do not express the telomerase, thus enabling telomere maintenance, cell proliferation, and immortalization (94,95). The PNAs, on the other hand, were primarily detected in non-transformed cells, and their formation is linked to cell cycle arrest and establishment of senescence (31,36). It remains to be determined whether the formation of PNAs is positively involved in rDNA repair, resulting in a return of at least some PNA-forming cells to the cell cycle, or if they play a role in blocking the repair of DNA double-stranded breaks on rDNA, broadly analogous to the shelterin complex on telomeres during replicative senescence (96). We propose that the pro-senescent role of PNAs may contribute to the maintenance of rDNA stability, thereby limiting the potential of hazardous genomic instability and, hence, the risk of cellular transformation. Analogous to checkpoint responses and oncogene-induced senescence (97,98) the PNA-associated senescence might provide one aspect of the multifaceted cell-autonomous anti-cancer barrier, in this case guarding the integrity of the most vulnerable repetitive rDNA loci, possibly at the expense of accumulated cellular senescence-associated decline of functional tissues during aging.”

      Our responses to recommendations from the Editors:

      (1) Since this paper does not provide a mechanistic insight into how the different PNA forms after DNA damage and PolI inhibition such as doxorubicin (DOXO) treatment and how HR modulates the PNA formation, it is very important to provide some experimental data for those. For example, as the #3 reviewer suggested, the time-lapse analysis of PML and a rDNA marker after DOXO treatment and recovery, together with morphological analysis, would be beneficial.

      We fully agree that live-cell imaging is essential for a better understanding of the evolution and function of PNAs. The requested time-lapse analysis on the dynamics of the PNA morphological stages after DOXO treatment and recovery is available to the Reviewers and readers in our previously published article that reported the PNA phenomenon and the basic live-cell imaging data after doxorubicin treatment using the ectopically expressed PML-GFP and B23-RFP (Imrichova et al.; doi: 10.18632/aging.102248). In our present revised manuscript, we now refer to this work in the Introduction and further stress that those data were based on live-cell imaging, to better highlight this point along the line recommended by the Reviewers. We have now also added an improved scheme that better explains the temporal dynamics of PNA transitions (Figure 1A).

      (2) In the same line as point #1, it is very important to show what kind of signaling pathway is necessary for PNA formation upon DSB formation with PolI inhibition. For example, as the #2 reviewer advised, the role of ATM or ATR could be tested by adding their inhibitor during the PNA formation. 

      Again, we fully agree that clarification of the signaling pathway required for PNA formation is crucial, and we are grateful for this stimulating recommendation. While the mentioned Reviewer no. 2 (in his/her Public comments) asked only about the role of ATM, the Editors rightly requested that we should use distinct inhibitors to test the respective roles of not only ATM but also ATR. As recommended, we have tested the importance of ATM and ATR kinase activities by inhibiting them during PNA formation. These newly generated data clearly showed that the activity of either kinase is essential for the efficient formation of PNA, thereby providing a significant new mechanistic insight in the revised dataset. In the manuscript, these new results are now shown in Figures 5A and B. We also addressed this issue in the Public Review (Reviewer #2 point 3).

      (3) Given that the association of the PML body with telomeres in ALT cells (ALT-associated PML Body, APB) has been well established in the field, the authors need to mention this in the Introduction and also clearly compare how PNA is similar to or different from APB in the Discussion.

      We have followed this conceptually important recommendation exactly as suggested: i) We now mention the ALT-associated PML Body (APB) in the Introduction section (end of the second paragraph) and ii) In much more detail, we now compare the conceptual analogy in terms of similarities and differences between PNA and APB in the revised Discussion. We also address this issue in the document Response to Public Review (Reviewer #3 point 6). Indeed, we agree that this comparison is very fitting in the context of our dataset and informative for the broad audience.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors): 

      Major points. 

      (1) Any treatments shown in Figure 1B and 1C did not induce PNA in most of the cells with around 20% for a maximum value. What time point(s) the authors checked should be stated in the main text or the legend clearly. The authors need to mention the kinetics of different PNA classes and/or dose-response effects at least for doxorubicin and BMH-21. Or a cell-cycle stage effect should be analyzed and/or discussed given that HR is mainly operating in S and G2 phases.

      Thank you for pointing this out. We have now clarified the dose effects and also both analyzed and discussed the PNA formation vis a vis cell cycle stages, as recommended by this insightful reviewer.

      First, we have now added an experimental scheme to the Figures for better clarity regarding the time points examined, as suggested.

      Second, our results show that drug doses indeed affect the number and subtype of PNAs that form after such treatments. We show PNAs (types and number) after 0.5 – 5 – 50 µM camptothecin, topotecan, and etoposide (Supplementary Figure 1G and H) and after 0.375 – 0.56 – 0.75 µM doxorubicin (Figure 2A-D and Supplementary Figure 2E-G).  

      The very first detailed analysis of PNA evolution was presented in Imrichova et al. (doi: 10.18632/aging.102248), where we described, using live-cell imaging, the relationship between the individual doxorubicin-induced PNA types, their transitions, and dynamics. We found that the highest number of nuclei with PNAs was present between 24 and 48 h after treatment initiation. Thus, we selected this time point for PNA detection after the treatments presented in Figure 1B.

      We have now also added the distribution of nuclei based on the presence of specific PNA types into Supplementary Figure 1F.

      We included the analysis of the dependence of PNA formation on specific cell cycle phases (see Figures 6E–I). A very detailed explanation of the observed cell cycle effects is presented in the document Responses to Public Review, re. Reviewer nr. 2, point 2, so please kindly read our response there.

      (2) Although the induction of PNA by DSBs at rDNA repeats is clearly shown in the paper and modulated by DSB repair pathways, the biological significance of this sub-nuclear structure has not been addressed at all. Is the PNA required for efficient DSB repair per se or pathway choice? Moreover, the PNA kinetics are peculiar. Once formed, the PNA did not show any turnover even after the DNA-damaging agents were washed away (Figure 4H). This structure is passed on to the next generation after cell division. Such dynamics of PNA should be carefully addressed.

      The reviewer is correct in that the fate of the PNA and the potential biological significance of this phenomenon required a better explanation. The majority (≈97%) of cells after I-PpoI induction undergo cellular senescence, and therefore, we suppose that the PNA structures are not passed into the next cell cycle, as the bulk of the cells do not proliferate/cycle after such treatments. In this regard, it should be noted that PNAs (PML-NDS) are associated with replicative senescence of human mesenchymal stem cells (our old publication: Janderova-Rossmeislova 2007; doi: 10.1016/j.jsb.2007.02.008). To answer the comment of this reviewer, we have actually never observed that the cells with PNA present would be able to enter mitosis. Based on these findings, we suggest that damage to the repetitive rDNA loci, such as in our experiments in the form of DSBs, could commonly result in unsuccessful repair attempts leading to cellular senescence due to rDNA damage signaling, consistent with our new experiments highlighting the key role of the signaling mediated by the major DNA damage response kinases ATM and ATR, including the role of PNAs formation. For more details, please see also our response to Point 2 raised by the editors, on page 1 of this document, as well as our Public review response to Referee nr. 2, his/her points 2 and 3.

      From a broader perspective, relevant to the biological function of PNAs in this unorthodox cellular stress response, we showed that doxorubicin-induced PML-NDSs separate/sequester persistent rDNA DSBs from the regions of active pre-rRNA transcription. Again, the purpose of this process is not entirely clear at present. However, such separation of unrepaired rDNA from the rest of the genome could have a protective function, thereby limiting the risk of aberrant homologous recombination among hundreds of the repetitive, recombination-prone rDNA copies spread across five chromosomes. It should be stressed that PNAs are rarely seen in cancer cells, and their absence might be linked to the rDNA instability commonly seen in transformed cells. 

      As published in our previous study (Imrichova et al.; doi: 10.18632/aging.102248), we followed the fate of individual PML-NDS (the last stage of PNA) after the recovery from doxorubicin treatment using live-cell imaging. We observed that the fate of this structure could be diverse. Some of them persisted in the nucleus for many hours, but a portion of them disappeared. Their disappearance may be a manifestation of successful rDNA repair. However, what remains unresolved is why these cells do not reenter the cell cycle and instead develop a senescent phenotype, possibly reflecting some paracrine effects of a cocktail of diverse cytokines and chemokines secreted by the neighboring cells, a phenomenon well established in the senescence field as SASP (senescence-associated secretory phenotype).

      Notably, during the recovery phase from I-PpoI insult, some of the PML-NDS, in fact, increase in size over time (please refer to the graph in Author response image 1). This enlargement suggests ongoing processes within these structures. Additionally, the sequential accumulation of DHX9 (a multifunctional DNA/RNA helicase) in PNAs during recovery from the I-PpoI insult (as shown in Figure 4G and Supplementary Figure 4H in the revised manuscript) supports the hypothesis that PNAs are associated with as-yet poorly understood process(es). 

      Author response image 1.

      A scatter plot shows the changes in PNA diameters during the recovery phase from a 24-hour-long expression of I-PpoI.

      Last but not least, again relevant for the potential biological role of PNAs, we now also discuss the partial analogy of these structures with the PML-association with telomeres in cells that maintain their telomeres by the ALT recombinational process, as suggested by Referee no. 3 in the public review process. As this consideration addresses also the biological significance of the diverse PML associations and particularly our thoughts about the PNA, we copy/paste this paragraph from the Discussion section of our revised manuscript here, for the convenience of the Reviewer:

      “There are several similarities between PNAs and APBs. The interaction partner of PML located on both the telomeres and rDNA must be sumoylated, as the PML-SIM domain is essential for the formation of both APBs and PNAs (37,93). The PML IV isoform most efficiently forms APBs and also PNAs (16,37). PML clusters damaged telomeres into APBs, and we observe that several NORs converge in one PNA structure; thus, the PML-dependent clustering of damaged NORs is plausible. On the other hand, there is one critical difference between the otherwise broadly analogous APBs and PNAs. The process of ALT operates in transformed cancer cells that do not express the telomerase, thus enabling telomere maintenance, cell proliferation, and immortalization (94,95). The PNAs, on the other hand, were primarily detected in non-transformed cells, and their formation is linked to cell cycle arrest and establishment of senescence (31,36). It remains to be determined whether the formation of PNAs is positively involved in rDNA repair, resulting in a return of at least some PNA-forming cells to the cell cycle, or if they play a role in blocking the repair of DNA double-stranded breaks on rDNA, broadly analogous to the shelterin complex on telomeres during replicative senescence (96). We propose that the pro-senescent role of PNAs may contribute to the maintenance of rDNA stability, thereby limiting the potential of hazardous genomic instability and, hence, the risk of cellular transformation. Analogous to checkpoint responses and oncogene-induced senescence (97,98) the PNA-associated senescence might provide one aspect of the multifaceted cell-autonomous anti-cancer barrier, in this case guarding the integrity of the most vulnerable repetitive rDNA loci, possibly at the expense of accumulated cellular senescence-associated decline of functional tissues during aging.”

      (3) The association of PNA with DSB repair is shown by the colocalization with 53BP1 (Figures 3-5) and the kinetics of DSB repair were assessed by 53BP1 kinetics (Figure 5B). The authors need to check the colocalization of other DSB repair factors in homologous recombination (RPA and RAD51) and nonhomologous end joining (KU) and the kinetics of these DSB repair foci. 

      We are grateful for this very relevant suggestion. In response to this recommendation, we have examined additional markers, linked to homologous recombination. In Figures 6A—D and Supplementary Figures 6A and B, we now show also the localization of RAD51 and RPA32 (pS33), along the lines recommended by this Reviewer.

      (4) In Figure 5B, 53BP1 foci in the "nucleolus" should be shown with that in the nucleus. 

      In the revised manuscript, we show histograms with a count of 53BP1 foci per nucleus.

      (5) The authors often used the words, "difficult-to-repair" and "easy-to-repair" DNA lesions. However, without the nature of these DNA lesions, it is early to distinguish the lesions. So, the authors should avoid them in the title, abstract, results, and figure legends. In Discussion, it is free to use them with a logical explanation. 

      Thank you for the recommendation. We have now changed the term “difficult-to-repair” to “persistent rDNA damage”, as this term better describes at face value the scenario encountered in these experiments. In the new version of the manuscript, we have now emphasized that PNAs are formed as a late response to rDNA damage. We added the observation that PNAs colocalized with rDNA lesions accumulated in the nucleolar cap (periphery of the nucleolus), which are probably incompatible with NHEJ-mediated repair that otherwise occurs within the nucleolus. These persistent lesions contained phospho-RPA, a marker of resected DNA. However, RAD51 was not detected in such late lesions, indicating that the canonical RAD51-dependent HDR pathway is also restricted. Finally, we included a section defining such persistent DNA damage in the revised Discussion.

      Minor points: 

      (1) Page 5, second paragraph, line 6: "expression of PML". 

      (2) Page 5, line 6 from the bottom and Figure 1B: Actinomycin D is not a "specific" RNA polymerase I inhibitor. 

      (3) Page 6, first paragraph, last line: "DNA DSB" should be "DSB". 

      (4) Page 6, second paragraph, lines 6-7: What is the evidence of RNA polymerase I is active (need to explain to the readers)? 

      (5)  Figure 1D and main text: Please mention DOXO is the abbreviation of doxorubicin. 

      We are grateful for these points, which have now all been corrected in the revised version of the manuscript.

      (6) Page 6, third paragraph, line 4 and Figure 1D: What is "esi" not "si"TOP1. 

      In the revised manuscript, we explained what ‘esiRNA’ means; in fact, it is the pool of biologically prepared siRNAs targeting the mRNA of the protein being knocked down.

      (7) Figures 2A and 2B: The effect of B02 alone on PNA should be shown as a control.

      As recommended, the effect of B02 alone is now presented in Supplementary Figures 2A and B. 

      (8) Page 7, first paragraph, last three lines: It is hard to catch how the authors suggested the inhibition of RAD51 suppressed RNAPI activity. If so, please check the incorporation of 5FU.

      Thank you for pointing out this confusing formulation. We have now removed from the revised manuscript the part of that original sentence: “which are predominantly associated with RNAPI inhibition”. 

      We observed that PML ‘balloons’ wrapped the nucleolus with the concomitantly observed complete inhibition of RNAPI in the nucleolus (Imrichova et al.; doi: 10.18632/aging.102248.). Nevertheless, we removed the original phrase from the revised version of the manuscript, as we agree with the reviewer that the causative relationship is so far lacking.

      (9) Page 7, second paragraph: It is critical to clarify what time B02 was added after DOXO removal or during DOXO treatment, or both.  

      We agree. In response, we have now added the experimental scheme showing all these temporal details.

      (10) Figure 2H: The experiment lacks control with siTDP2 without etoposide treatment. 

      We did not include this control, unfortunately.

      (11) Page 8, third paragraph, line 3 from the bottom; "besides of rDNA probe, we also utilized probes" is better. 

      We changed this sentence in the revised manuscript, as recommended. 

      (12) Figure 3B: In these multi-color images, it is hard to see blue and gray in merged ones. It is better to show images with a single color. 

      We agree that grayscale is better to follow. However, this type of presentation would significantly increase the number of images, a circumstance we wished to avoid in this already rather image-heavy dataset. Instead, when it was possible, we elevated the intensity of fluorescence in colored images. The list of images with this adjustment is present in the public review. 

      We also inserted the example of the image in greyscale here as Author response image 2. 

      Author response image 2.

      The representative images of nucleoli show the localization of 53BP1 (red; a marker of DNA DSB), PML (green; a marker of PML-NB or PNAs), rDNA (blue), and DJ (white; a marker of the acrocentric chromosome) after doxorubicin treatment (2 days) or in the recovery phase (1 and 4 days). The merge of all channels is shown together with the presentation of individual images in greyscale. Scale, 5 µm.

      (13) Figure 4E: Please add values at D0. 

      We did not analyze the 53BP1 foci before adding Shield1 and doxycycline to induce the expression of I-PpoI (D0). However, as a control, we analyzed the 53BP1 foci in the cells treated for 24 h with the corresponding amount of DMSO as a mock treatment scenario (black line; NT).

      Reviewer #2 (Recommendations For The Authors): 

      (1) The data provided in this manuscript did not explicitly compare the easy-to-repair vs difficult-to-repair DNA lesions in rDNA, or at least lack quantitative measures with statistical analysis. Therefore, the title may need to be revised accordingly.

      We agree, and the title has now been revised to better capture the persistent nature of the rDNA damage that evokes the PNA formation. Please see the response to Reviewer #1, Major point 5, presented above in this document.

      Reviewer #3 (Recommendations For The Authors): 

      (1) Live imaging is paramount to understanding the dynamic nature of PNAs.  

      We agree that live-cell imaging is important. We have addressed this issue in detail in Response to Public review comments, of this Reviewer, as well as in the first point of this document in response to the Editors. In short, although the data presented in this manuscript are based on quantifications of fixed cell images, all these analyses benefit from our previous detailed live-cell imaging data that we reported – describing a careful examination of the dynamic behavior of PNAs in the study by Imrichova et al. (doi: 10.18632/aging.102248). To better illustrate the dynamic behavior of PNAs for the convenience of this reviewer, we include some data from our original article on this topic (referred to above): please see Author response image 3.

      Author response image 3.

      This figure shows data published in Imrichova et al. (doi: 10.18632/aging.102248). PML IV-EGFP was ectopically expressed in RPE-1hTERT cells. The localization of PML was followed using live-cell imaging. (A) The bowl (in this work named cap) originates from the accumulation of diffuse PML. (B) The transition between bowl (named cap), funnel (named fork), and balloon (named circle). (C + D) PML IV-EGFP (green) and B23-RFP (red) were ectopically expressed in RPE-1hTERT cells. The localization of both proteins was followed by live-cell imaging. (C) The formation of PML-NDS from the funnel is shown. (D) The entire PNA cycle is shown: the PML-bowl formed on the border of the nucleolus, then transformed into the PML-funnel, and finally into PML-NDS.

      (2) The authors should consider cell cycle and cell proliferation in their analyses. 

      We are grateful for this recommendation, which echoes your own comment nr. 2 in the Public reviews document. In short, as we explained in the response to the Public review, proliferation of PNA-containing cells is severely limited, as the vast majority of such cells enter a long-term arrest and cellular senescence. Furthermore, inspired by this comment, we have newly performed a series of experiments to address the frequencies of PNA formation vis a vis cell cycle phase position of the individual cells with rDNA damage. In the revised manuscript, we now include the data from these analyses: see Figures 6E–I and Supplementary Figures 6C–E. Our response in the Public Review provides a detailed description of these results.

      (3) Merged fluorescent micrographs in red and green are potentially not discernible to individuals with colour-vision deficiencies. Consider re-colouring into schemes that are more accessible. 

      We agree that some readers may have different preferences about fluorescence micrographs. Here, we used the classical combination of green and red, commonly employed in the field.

      (4) Single-colour fluorescent micrographs are easier to visualize in grey-scale. Whenever a single colour is shown, it will help reader comprehension if the images are shown in this manner. 

      As recommended, we have changed Figures 4C, F, and G from a single-color presentation to a greyscale. 

      (5) There are many long paragraphs that are difficult to digest. I suggest where possible breaking this text into smaller portions (e.g. Page 10, pages 13-14, page 16-17). 

      Thank you for pointing this out. We have now broken the text into smaller portions (in several places), as recommended.

      (6) The B02 and NU7441 data would be bolstered by genetic confirmation (depleting RAD51, BRCA2 or PALB2 for HR, DNA-PK or LIG4 for NHEJ).

      As recommended, we downregulated Rad51 and LIG4 by RNA interference. New data are presented in Figures 5F–I, 6E, and F, Supplementary Figures 5D, E, F–H, and Supplementary Figures 6C–E. The Public Review provides a detailed description of these results and the ensuing conclusions.

      (7) Microscopy results are often qualitative (Fig S1I, S2L, S3A) and need to be bolstered with quantitative data. 

      We appreciate this recommendation and have implemented quantifications in several important microscopy results, as follows:

      S1I: The quantification of the number of cells with types of PNAs after esiTOP1 is present in Supplementary Figure 1L

      S2L: The quantification (% of nuclei with PNAs) is in Figure 2H

      S3A: In this immuno-FISH figure, we captured nuclei with and w/o PNAs. Using the SQUASSH analysis, we identified size-based colocalization between rDNA–PML and DJ–PML presented in Supplementary Figure 3C.

      (8) Stats or error bars are missing (Fig 1D, 2H, S1C-E, S1F, S2A S2D-G, S3E, S4E).

      We apologize for those omissions and we have amended this aspect of the study in the revised manuscript as much as possible:

      Figure 1D: For AMD and doxorubicin and CX-5461 and doxorubicin treatments, three and two biological replicates are shown separately in the same graph, respectively. For AMD and the knockdown of TOP1, the mean from three biological replicates is shown. All these results indicate an elevated number of PNAs when RNAPI is inhibited.

      Figure 2H: The error bars are present. For siTDP2, the proportion of cells was the same (4%) in all replicates, so the error bar is not visible.

      Supplementary Figure 1C-E: Unfortunately, only one replicate (for all treatments) was analyzed by western blotting.

      Supplementary Figure 1F (in revised manuscript SF1G): The error bars are present. By this graph, we mainly wanted to present the variation in PNAs types. 

      Supplementary Figure 2A (in revised manuscript SF2C): We now include whiskers (10th-90th percentile) and a t-test.

      Supplementary Figure 2D-G (in revised manuscript SF2F-I): The error bars are present in all graphs. The changes in SF2F and G are not significant.

      Supplementary Figure 3E: This scheme shows the overlaps between rDNA and PML and rDNA and 53BP1. The column graph based on these data is shown in Figure 3F.

      Supplementary Figure 4E: The plot profiles representing the mean fluorescence of PML and B23 are shown for different time points. 

      (9) PNA characteristics remind this reviewer of the well-described ALT-associated PML nuclear bodies (APBs) found in immortalized cells lacking telomerase (i.e. Alternative lengthening of telomeres). I recommend the authors look to published data on APBs to help guide how to approach their research within a framework of the cell cycle.

      We fully agree with this insightful comment, and have addressed this point in the Discussion section of the revised manuscript, quoted the relevant studies also in the Introduction, and indeed explained the parallels and also differences of PNA versus APB (see also our response to point 3 highlighted also by the Editors, early in this rebuttal document).  We have also addressed this issue in the Public Review (Reviewer #3 point 6). We agree with the reviewer that this comparison will be of wide interest to readers, given the potential insights into the biological roles of APBs and PNAs.

      For convenience, we copy/paste the relevant new paragraph of the Discussion here:

      “There are several similarities between PNAs and APBs. The interaction partner of PML located on both the telomeres and rDNA must be sumoylated, as the PML-SIM domain is essential for the formation of both APBs and PNAs (37,93). The PML IV isoform most efficiently forms APBs and also PNAs (16,37). PML clusters damaged telomeres into APBs, and we observe that several NORs converge in one PNA structure; thus, the PML-dependent clustering of damaged NORs is plausible. On the other hand, there is one critical difference between the otherwise broadly analogous APBs and PNAs. The process of ALT operates in transformed cancer cells that do not express the telomerase, thus enabling telomere maintenance, cell proliferation, and immortalization (94,95). The PNAs, on the other hand, were primarily detected in non-transformed cells, and their formation is linked to cell cycle arrest and establishment of senescence (31,36). It remains to be determined whether the formation of PNAs is positively involved in rDNA repair, resulting in a return of at least some PNA-forming cells to the cell cycle, or if they play a role in blocking the repair of DNA double-stranded breaks on rDNA, broadly analogous to the shelterin complex on telomeres during replicative senescence (96). We propose that the pro-senescent role of PNAs may contribute to the maintenance of rDNA stability, thereby limiting the potential of hazardous genomic instability and, hence, the risk of cellular transformation. Analogous to checkpoint responses and oncogene-induced senescence (97,98) the PNA-associated senescence might provide one aspect of the multifaceted cell-autonomous anti-cancer barrier, in this case guarding the integrity of the most vulnerable repetitive rDNA loci, possibly at the expense of accumulated cellular senescence-associated decline of functional tissues during aging.” 

      (10) Do PNAs mature/progress through the four distinct structures: bowl, to funnel, to balloon, and finally to PML-NDS? If true, this serves as a phenotypic read-out of damage induction (bowl) and repair (PML-NDS). It would suggest persistent unrepairable damage (0.56 or 0.75 uM doxorubicin) prevents repair, leading to the formation of all the PNA structures except PML-NDS, while lower-dose doxorubicin (0.375 uM) allows repair to occur, facilitating progression to the PML-NDS state, which is then inhibited with B02.

      Again, this is a very insightful comment. Indeed, as the Reviewer suggests and as we explained e.g., in our response to point 1 raised by this reviewer, PNA progresses through four distinct structures/maturation stages. Our results indicate that individual PNA subtypes are tied to specific processes. PNA bowl-type is linked to the recognition of rDNA damage on the nucleolar surface. The PNA of the funnel-type clusters several rDNA loci from the nucleolus into PML-NDS, which is the ultimate structure sequestering unrepaired rDNA away from the reactivated nucleolus.

      There is a negative correlation between doxorubicin dose and the occurrence of PML-NDS, and, indeed, blocking HDR with B02 combined with a lower doxorubicin dose results in a higher occurrence of all PNAs, including PML-NDS, emerging in the recovery phase. These findings indicate that a more severe extent of rDNA damage, which is associated with inhibition of RNAPI activity, is linked to the PNA types associated with RNAPI inhibition (originally published in Imrichova et al., doi: 10.18632/aging.102248). In contrast, a milder degree of rDNA damage induces the formation of PML-NDS.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Participants in this study completed three visits. In the first, participants received experimental thermal stimulations which were calibrated to elicit three specific pain responses (30, 50, 70) on a 0-100 visual analogue scale (VAS). Experimental pressure stimulations were also calibrated to elicit the same three pain intensity responses. In the subsequent two visits, participants completed another pre-calibration check (Visit 2 of 3 only). Then, prior to the exercise, NALOXONE or a SALINE placebo-control was administered intravenously. Participants then completed 1 of 4 blocks of HIGH (100%) or LOW (55%) intensity cycling which was tailored according to a functional threshold power (FTP) test completed in Visit 1. After each block of cycling lasting 10 minutes, participants entered an MRI scanner and were stimulated with the same thermal and pressure stimulations that corresponded to 30, 50, and 70 pain intensity ratings from the calibration stage. Therefore, this study ultimately sought to investigate whether aerobic exercise does indeed incur a hypoalgesia effect. More specifically, researchers tested the validity of the proposed endogenous pain modulation mechanism. Whether the intensity of exercise had an effect on pain and on the neurological activation of pain-related brain centres was also explored.

      Results show that in the experimental visits (Visits 2 and 3), participants exercised at two distinct intensities as intended. Power output, heart rate, and perceived effort ratings were higher during the HIGH versus LOW-intensity cycling. In particular, HIGH intensity exercise was perceived as "hard" / ~15 on the Borg (1974, 1998) scale, whereas LOW intensity exercise was perceived as "very light" / ~9 on the same scale.

      The fMRI data from Figure 1 indicates that the anterior insula, dorsal posterior insula, and middle cingulate cortex show pronounced activation as stimulation intensity and subsequent pain responses increased, thus linking these brain regions with pain intensity and corroborating what many studies have shown before.

      Results also showed that participants rated a higher pain intensity in the NALOXONE condition at all three stimulation intensities compared to the SALINE condition. Therefore, the expected effect of NALOXONE in this study seemed to occur whereby opioid receptors were "blocked" and thus resulted in higher pain ratings compared to a SALINE condition where opioid receptors were "not blocked". When accounting for participant sex, NALOXONE had negligible effects at lower experimental nociceptive stimulations for females compared to males who showed a hyperalgesia effect to NALOXONE at all stimulation intensities (peak effect at 50 VAS). Females did show a hyperalgesia effect at stimulation intensities corresponding to 50 and 70 VAS pain ratings. The fMRI data showed that the periaqueductal gray (PAG) showed increased activation in the NALOXONE versus SALINE condition at higher thermal stimulation intensities. The PAG is well-linked to endogenous pain modulation.

      When assessing the effects of NALOXONE and SALINE after exercise, results showed no significant differences in subsequent pain intensity ratings.

      When assessing the effect of aerobic exercise intensity on subsequent pain intensity ratings, authors suggested that aerobic exercise in the form of a continuous cycling exercise tailored to an individual's FTP is not effective at eliciting an exercise-induced hypoalgesia response irrespective of exercise intensity. This is because results showed that pain responses did not differ significantly between HIGH and LOW intensity exercise with (NALOXONE) and without (SALINE) an opioid antagonist. Therefore, authors have also questioned the mechanisms (endogenous opioids) behind this effect.

      Strengths:

      Altogether, the paper is a great piece of work that has provided some truly useful insight into the neurological and perceptual mechanisms associated with pain and exercise-induced hypoalgesia. The authors have gone to great lengths to delve into their research question(s) and their methodological approach is relatively sound. The study has incorporated effective pseudo-randomisation and conducted a rigorous set of statistical analyses to account for as many confounds as possible. I will particularly credit the authors on their analysis which explores the impact of sex and female participants' stage of menses on the study outcomes. It would be particularly interesting for future work to pursue some of these lines of research which investigate the differences in the endogenous opioid mechanism between sexes and the added interaction of stage of menses or training status.

      There are certainly many other areas that this article contributes to the literature due to the depth of methods the research team has used. For example, the authors provide much insight into: the impact of exercise intensity on the exercise-induced hypoalgesia effect; the impact of sex on the endogenous opioid modulation mechanism; and the impact of exercise intensity on the neurological indices associated with endogenous pain modulation and pain processing. All of which, the researchers should be credited for due to the time and effort they have spent completing this study. Indeed, their in-depth analysis of many of these areas provides ample support for the claims they make in relation to these specific questions. As such, I consider their evidence concerning the fMRI data to be very convincing (and interesting).

      Weaknesses:

      Although the authors have their own view of their results, I do however, have a slightly different take on what the post-exercise pain ratings seem to show and its implications for judging whether an exercise-induced hypoalgesia effect is present or not. From what I have read, I cannot seem to find whether the authors have compared the post-exercise pain ratings against any data that was collected pre-exercise/at rest or as part of the calibration. Instead, I believe the authors have only compared post-exercise pain ratings against one another (i.e., HIGH versus LOW, NALOXONE versus SALINE). In doing so, I think the authors cannot fully assume that there is no exercise-induced hypoalgesia effect as there is no true control comparison (a no-exercise condition).

      In more detail, Figure 6A appears to show an average of all pain ratings combined per participant (is this correct?). As participants were exposed to stimulations expected to elicit a 30, 50, or 70 VAS rating based on pre-calibration values, therefore the average rating would be expected to be around 50. What Figure 6A shows is that in the SALINE condition, average pain ratings are in fact ~10-15 units lower (~35) and then in the NALOXONE condition, average pain ratings are ~5 units lower (~45) for both exercise intensities. From this, I would surmise the following:

      It appears there is an exercise-induced hypoalgesia effect as average pain ratings are ~30% lower than pre-calibrated/resting pain ratings within the SALINE condition at the same temperature of stimulation (it would also be interesting to see if this effect occurred for the pressure pain).

      It appears there is evidence for the endogenous opioid mechanism as the NALOXONE condition demonstrates a minimal hypoalgesia effect after exercise. I.e., NALOXONE indeed blocked the opioid receptors, and such inhibition prevented the endogenous opioid system from taking effect.

      It appears there is no effect of exercise intensity on the exercise-induced hypoalgesia effect.

      That is, participants can cycle at a moderate intensity (55% FTP) and incur the same hypoalgesia benefits as cycling at an intensity that demarcates the boundary between heavy and severe intensity exercise (100%FTP). This is a great finding in my mind as anyone wishing to reduce pain can do so without having to engage in exercise that is too effortful/intense and therefore aversive - great news! This likely has many applications within the field of public health.

      I will very slightly caveat my summaries with the fact that a more ideal comparison here would be a control condition whereby participants did the same experimental visit but without any exercise prior to entering the MRI scanner. I consider the overall strength of the evidence to be solid, with the answer to the primary research question still a little ambiguous.
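      As a rough check on the percentages quoted above, the arithmetic can be sketched as follows. This is only an illustration: the values 35 and 45 are the approximate mean ratings read from Figure 6A as described in the comment, not exact data, and the function name is hypothetical.

```python
# Illustrative arithmetic only; the observed means are approximate values
# quoted in the review text, not the study's raw data.
calibrated_targets = [30, 50, 70]  # VAS targets used during calibration
expected_mean = sum(calibrated_targets) / len(calibrated_targets)


def relative_reduction(observed_mean, expected=expected_mean):
    """Fractional drop of an observed mean rating relative to the calibrated mean."""
    return (expected - observed_mean) / expected


saline_drop = relative_reduction(35)    # approximate SALINE mean (assumed from the text)
naloxone_drop = relative_reduction(45)  # approximate NALOXONE mean (assumed from the text)

print(f"Expected mean rating: {expected_mean:.0f}")
print(f"SALINE reduction: {saline_drop:.0%}")
print(f"NALOXONE reduction: {naloxone_drop:.0%}")
```

      Under these assumed values, the SALINE reduction corresponds to the "~30%" figure cited above, while the NALOXONE reduction is considerably smaller. Without a no-exercise control, such differences cannot be attributed to exercise alone.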

      Reviewer #2 (Public review):

      Summary:

      This interesting study compared two different intensities of aerobic exercise (low-intensity, high-intensity) and their efficacy in inducing a hypoalgesic reaction (i.e. exercise-induced hypoalgesia; EIH). fMRI was used to identify signal changes in the brain, with the infusion of naloxone used to identify hypoalgesia mechanisms. No differences were found in post-exercise pain perception between the high-intensity and low-intensity conditions, with naloxone infusion causing increased pain perception across both conditions which was mirrored by activation in the medial frontal cortex (identified by fMRI). However, the primary conclusion made in this manuscript (i.e. that aerobic exercise has no overall effect on pain in a mixed population sample) cannot be supported by this study design, because the methodology did not include a baseline (i.e. pain perception following no exercise) to compare high/low-intensity exercise against. Therefore, some of the statements/implications of the findings made in this manuscript need to be very carefully assessed.

      Strengths:

      (1) The use of fMRI and naloxone provides a strong approach by which to identify possible mechanisms of EIH.

      (2) The infusion of naloxone to maintain a stable concentration helps to ensure a consistent effect and that the time course of the protocol won't affect the consistency of changes in pain perception.

      (3) The manipulation checks (differences in intensity of exercise, appropriate pain induction) are approached in a systematic way.

      (4) Whilst the exploratory analyses relating to the interactions for fitness level and sex were not reported in the study pre-registration, they do provide some interesting findings which should be explored further.

      Weaknesses:

      (1) Given that there is no baseline/control condition, it cannot be concluded that aerobic exercise has no effect on pain modulation because that comparison has not been made (i.e. pain perception at 'baseline' has not been compared with pain perception after high/low-intensity exercise). Some of the primary findings/conclusions throughout the manuscript state that there is 'No overall effect of aerobic exercise on pain modulation', but this cannot be concluded.

      (2) Across the manuscript, a number of terms are used interchangeably (and applied, it seems, incorrectly) which makes the interpretation of the manuscript difficult (e.g. how the authors use the term 'exercise-induced pain').

      (3) There is a lack of clarity on the interventions used in the methods, for example, it is not exactly clear the time and order in which the exercise tasks were implemented.

      (4) The exercise test (functional threshold power) used to set the intensity of the low/high exercise bouts is not an accurate means of demarcating steady state and non-steady state exercise. As a result, at the intensity selected for the high-intensity exercise in this study, it is likely that the challenge presented for the high-intensity exercise would have been very different between participants (e.g. some would have been in the 'heavy' domain, whereas others would be in the 'severe' domain).

      (5) It is likely that participants did not properly understand how to use the 6-20 Borg scale to rate their perceived effort, and so caution must be taken in how this RPE data is used/interpreted.

      (6) Although interesting, the secondary analyses (relating to the interaction effects of fitness level and sex) were not included in the study pre-registration, and so the study was not designed to undertake this analysis. These findings should be taken with caution.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Participants in this study completed three visits. In the first one, participants received experimental thermal stimulations which were calibrated to elicit three specific pain responses (30, 50, 70) on a visual analogue scale (VAS). Experimental pressure stimulations were also calibrated to elicit the same three pain intensity responses. In the subsequent two visits, participants completed another pre-calibration check (Visit 2 of 3 only). Then, prior to the exercise, NALOXONE or a SALINE placebo-control was administered intravenously. Participants then completed 1 of 4 blocks of HIGH (100%) or LOW (55%) intensity cycling which was tailored according to a functional threshold power (FTP) test completed in Visit 1. After each block of cycling lasting 10 minutes, participants entered an MRI scanner and were stimulated with the same thermal and pressure stimulations that corresponded to 30, 50, and 70 pain intensity ratings from the calibration stage. Therefore, this study ultimately sought to investigate whether aerobic exercise does indeed incur a hypoalgesia effect. More specifically, researchers tested the validity of the proposed endogenous pain modulation mechanism.

      Further investigation into whether the intensity of exercise had an effect on pain and the neurological activation of pain-related brain centres was also explored.

      Results show that in the experimental visits (Visits 2 and 3), participants exercised at two distinct intensities as intended. Power output, heart rate, and perceived effort ratings were higher during the HIGH versus LOW-intensity cycling. In particular, HIGH intensity exercise was perceived as "hard" / ~15 on the Borg (1974) scale, whereas LOW intensity exercise was perceived as "very light" / ~9 on the Borg (1974) scale.

      The fMRI data from Figure 1 indicates that the anterior insula, dorsal posterior insula, and middle cingulate cortex show pronounced activation as stimulation intensity and subsequent pain responses increase, thus linking these brain regions with the percept of pain intensity and corroborating what many studies have shown before.

      Results also showed that participants rated a higher pain intensity in the NALOXONE condition at all three stimulation intensities compared to the SALINE condition. Therefore, the expected effect of NALOXONE in this study seemed to occur whereby opioid receptors were "blocked" and thus resulted in higher pain ratings compared to a SALINE condition where opioid receptors were "not blocked". When accounting for participant sex, NALOXONE had negligible effects at lower experimental nociceptive stimulations for females compared to males who showed a hyperalgesia effect to NALOXONE at all stimulation intensities (peak effect at 50 VAS). Females did show a hyperalgesia effect at stimulation intensities corresponding to 50 and 70 VAS pain ratings. The fMRI data showed that the periaqueductal gray (PAG) showed increased activation in the NALOXONE versus SALINE condition at higher thermal stimulation intensities. The PAG is well-linked to endogenous pain modulation.

      When assessing the effects of NALOXONE and SALINE after exercise, results showed no significant differences in subsequent pain intensity ratings.

      When assessing the effect of aerobic exercise intensity on subsequent pain intensity ratings, authors suggested that aerobic exercise in the form of a continuous cycling exercise tailored to an individual's FTP is not effective at eliciting an exercise-induced hypoalgesia response irrespective of exercise intensity. This is because results showed that pain responses did not differ significantly between HIGH and LOW-intensity exercise with (NALOXONE) and without (SALINE) an opioid antagonist. Therefore, authors have also questioned the mechanisms (endogenous opioids) behind this effect.

      Altogether, the paper is a great piece of work that has provided some truly useful insight into the neurological and perceptual mechanisms associated with pain and exercise-induced hypoalgesia. The authors have gone to great lengths to delve into their research question(s) and their methodological approach is relatively sound. Although the authors have their own view of their results, I do, however, have a slightly different take on what the post-exercise pain rating seems to show and its implications for judging whether an exercise-induced hypoalgesia effect is present or not. From what I have read, I cannot seem to find whether the authors have compared the post-exercise pain ratings against any data that was collected pre-exercise/at rest or as part of the calibration. Instead, I believe the authors have only compared post-exercise pain ratings against one another (i.e., HIGH versus LOW, NALOXONE versus SALINE). In doing so, I think the authors cannot fully question whether there is an exercise-induced hypoalgesia effect as there is no true control comparison (a no-exercise condition). Nevertheless, there are certainly many other areas that this article contributes to the literature due to the depth of methods the research team has used. For example, the authors provide much insight into: the impact of exercise intensity on the exercise-induced hypoalgesia effect; the impact of sex on the endogenous opioid modulation mechanism; and the impact of exercise intensity on the neurological indices associated with endogenous pain modulation and pain processing. All of which, the researchers should be credited for due to the time and effort they have spent completing this study.

      I have provided some specific comments for the authors to consider. They are organised to correspond to each section as it is presented, and I have denoted the line I am referring to each time.

      To conclude, thank you to the authors for their work, and thank you to the editor for the opportunity to contribute to the review of this paper. I hope my comments are seen as useful and I look forward to seeing the authors' responses.

      We sincerely appreciate the reviewer's insightful comments, which highlight the strengths of our study. In response to the concerns raised, we have made several key revisions to the original manuscript to address the reviewers’ comments. As for the lack of a resting control condition, we acknowledge that our study was not designed to test the overall effect of exercise versus no exercise. However, our primary objective was to compare different exercise intensities, hypothesising that low-intensity (LI) exercise would induce less pain modulation as compared to high-intensity (HI) exercise. By exploring this, we aimed to enhance understanding of the dose-response relationship between exercise and pain modulation. To better reflect this focus, we have revised the misleading phrasing regarding the ‘overall’ effect of exercise to clearly emphasize our primary aim: comparing HI and LI exercise.

      The reviewer offers an interesting interpretation of the data, namely that exercise-induced hypoalgesia might have occurred at both exercise intensities, since the pain ratings provided were lower than the anticipated intensities as determined by the calibration. The fact that this difference is smaller in the naloxone (NLX) condition could provide evidence of opioidergic mechanisms underlying this effect. Unfortunately, the current study is not designed to comprehensively answer this question since there was no resting control condition. In particular, the lower pain ratings under SAL (Figure 6) could be due to exercise triggering the descending pain modulatory system (DPMS), but equally due to the default activation of the DPMS. Only an additional “no exercise” condition could disentangle this. Furthermore, habituation to noxious stimuli can influence pain ratings, resulting in lower pain ratings during the experiment as compared to the calibration. We have now provided a more detailed overview of the pain ratings at different stimulus intensities after HI and LI exercise in both drug treatment conditions for heat and pressure pain ratings. We elaborate on the specific comments raised in more detail in the following sections.

      Specific Comments

      (1) Abstract

      Line 25 - "we were unable to"... personal preference but this wording is a little 'weighted' in my view. I personally do not think researchers search to prove hypotheses correct, rather we search to prove hypotheses wrong, and therefore only through repeated attempts of falsification can we surmise that something holds true.

      We agree with the reviewer that the chosen wording can be perceived as weighted and have rephrased the sentence.

      Line 33 to 35 - the "...but individual factors... might play a role" is a crucial caveat to this sentence for me. Whilst I can understand that the results of the authors' study indicate that prior assumptions about exercise-induced hypoalgesia and its opioidergic mechanisms may be questioned, I think a little more evidence is needed to finally decide whether aerobic exercise has no overall effect on experimental pain responses. (see more in the Results comments below).

      We thank the reviewer for their comment. We agree that no claims can be made regarding the effect of aerobic exercise per se on pain modulation compared to no exercise based on the current data. Furthermore, we agree that more research is needed to further advance our understanding of (non-)opioidergic mechanisms in exercise-induced pain modulation. However, based on the data presented in this study we propose that the involvement of endogenous opioids in exercise-induced hypoalgesia could be influenced by sex and fitness levels since we could show differences in opioidergic involvement between males and females of different fitness levels. Future studies should account for the fitness levels and sex of the sample investigated.

      (2) Introduction

      Line 48 - please predefine anterior cingulate cortex here.

      We thank the reviewer for detecting this and have introduced the abbreviation for the anterior cingulate cortex in the referenced line.

      Line 49 - please predefine periaqueductal gray here instead of line 52.

      We have introduced the abbreviation for periaqueductal grey in the referenced line.

      Line 47 to 54 - when discussing the descending pain modulatory systems, authors seem to be relating specifically to the intensity/magnitude of pain experiences. However, the different brain regions that are mentioned may have varying "roles" according to which dimension of pain is of focus.

      Hofbauer et al. (2001) - https://doi.org/10.1152/jn.2001.86.1.402

      Rainville et al. (1997) - https://doi.org/10.1126/science.277.5328.968

      The two above studies provide some nice earlier findings on the brain regions - some of which are mentioned by the authors in this section - associated with the processing of pain quality in addition to the intensity of pain... simply attach here if they are of interest to the authors.

      The studies by Hofbauer et al. (2001) and Rainville et al. (1997) provide interesting findings on the effect of hypnotic suggestions on pain affect and the perceived intensity of a painful stimulus. However, these studies did not investigate exercise-induced changes in brain regions of the DPMS. The studies referenced in the relevant section of the manuscript are among the few imaging studies that have investigated brain structures of the DPMS in the context of exercise and pain modulation. They were therefore included in this paragraph both to focus on their findings and to emphasise the scarcity of imaging studies investigating exercise-induced pain modulation. Given the divergent research topics of the proposed studies, we suggest not including them in this paragraph to maintain a clearer line of argument and focus on exercise-induced pain modulation in brain regions of the DPMS.

      L59 to 61 - a minor comment about the phrasing within this sentence and a recommended change is provided below for the flow of the sentence/paragraph.

      "...there are instances where administration of µ-opioid antagonists has decreased exercise-induced pain modulation (Droste et al. 1988; etc.) whereas in others there has been little effect (Droste et al. 1988; etc.)."

      We have altered the sentence based on the reviewer's suggestions to improve its flow and coherence.

      L56 to 72 - Whilst the current version of this paragraph scans well enough, I find that the narrative flits between the mechanisms being discussed and the rationale/shortcomings of current research. I think that the original content of this paragraph can be structured into:

      A- The endogenous opioid system is a likely candidate to explain how exercise elicits a hypoalgesia response.

      B- Citation(s) of the imaging studies (Boecker et al., 2008, etc.) and earlier literature which support A (e.g., Janal et al. 1984).

      C- Further support of this theory as µ-opioid antagonists like naloxone seem to counteract the endogenous opioid effect (Haier et al., 1981).

      D- Introduction of the caveats of previous research such as the studies that observed that µ-opioids did not impact the endogenous pain modulation system during exercise (e.g., Droste et al., 1991, etc.) and the range of different interventions and exercise modalities which make it difficult to draw clear conclusions of the pain modulation effect.

      To me, this structure would set out the details you have already put together in a more orderly and systematic way and also will lead nicely into your ensuing paragraph (Line 74 onwards).

      We appreciate the reviewer's constructive comments on structuring this paragraph. We agree that the proposed version eases the readability and comprehension of the paragraph and have, thus, restructured the paragraph according to the reviewer’s suggestion.

      L75 - Why are single-arm pre-post measures and designs an issue? If you can elaborate a little more this would be very insightful for a reader.

      Single-arm pre-post measurement studies involve participants being assigned to a single experimental condition, with pain assessments conducted only once before and once following an intervention. This study design presents some limitations, particularly in the context of examining exercise-induced modulation of pain (Vaegter and Jones, 2020). Such designs are potentially confounded by the effects of habituation to noxious stimuli, as highlighted by Vaegter and Jones (2020). Incorporating randomised controlled trials with multiple measurement blocks not only mitigates these limitations but also provides a clearer understanding of how individual bouts of exercise influence pain perception. We have now added this to the paper.

      L80 - The reference for the functional threshold power assessment is provided as a number. Please could the authors change to reflect which study/studies they are referring to here (I presume it is the Borszcz and/or the McGrath studies?).

      We apologise for this oversight and have now updated the reference to be displayed correctly. The reviewer is correct in assuming that Borszcz et al. (2018) is the referenced study here.

      L88 - Did participants also receive pressure pain stimulations in addition to the thermal stimuli, as the figure suggests?

      Note Since read on to L102-104 and understood why pressure pain was included but not mentioned due to results. However, I would still recommend including pressure pain stimulations in this line, if possible, to be consistent with what Figure 1 shows and later text in the Methods section also shows.

      We thank the reviewer for their suggestion to mention pressure pain at the referenced line to increase the clarity and consistency of the experimental paradigm. Pressure and heat pain were applied in alternating fashion during scanning. Whilst the results of pressure pain are not included in this study we agree with the reviewer that it should be mentioned again as part of the methods and have added this.

      L94 - I really like Figure 1. Great job.

      Could the authors please define the inter-trial interval (ITI) in the legend? And please could the authors clarify what unit the 30, 50, and 70 figures in the "18 trials per block" section refer to.

      We thank the reviewer for their positive feedback. We have now included a definition of inter-trial-interval (ITI) in the figure legend. Furthermore, we adapted Figure 1 so that the units of the stimulus intensities (30, 50, 70) on the Visual Analog Scale (VAS) are included in the figure allowing for a clearer identification.

      (3) Results

      General comment for figures ... is there a specific reason the authors chose for error bars to be represented by an SE value as opposed to an SD value?

      The reason I ask is that participant responses seem to vary (See Figure 2A and 2E-G as an example). Error bars showing SD values would perhaps do justice to the variability in participant response(s), whereas the SE may be a better representation of the variability in responses due to the assessor's methods of collection. Whilst the SE error bars are narrow (great job on that!), the individual responses are clearly varied which I speculate could be because of the interventions that have been implemented (i.e., exercise intensity).

      The use of Standard Error (SE) is more common in the cognitive neuroscience literature.

      However, as this reviewer noted, we have also included individual data points alongside the SE, thereby providing a comprehensive view that allows for a thorough interpretation of the data distribution.
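
      To make the SD versus SE distinction raised by the reviewer concrete, the following is a minimal sketch, assuming the standard definitions (sample SD with an n - 1 denominator, SE = SD / sqrt(n)). The ratings are hypothetical values chosen for illustration and are not data from the study.

```python
import math
import statistics


def sd_and_se(values):
    """Return (sample SD, standard error of the mean) for a list of numbers."""
    sd = statistics.stdev(values)        # sample SD, n - 1 denominator
    se = sd / math.sqrt(len(values))     # SE of the mean shrinks with sqrt(n)
    return sd, se


# Hypothetical subject-level mean pain ratings (VAS units), not study data.
ratings = [20.0, 30.0, 40.0, 50.0, 60.0]
sd, se = sd_and_se(ratings)
print(f"SD = {sd:.2f}, SE = {se:.2f}")
```

      The SD describes the spread of individual responses, whereas the SE describes the uncertainty of the mean, which is why SE bars look narrow even when individual data points vary widely.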

      L102 to 104 - In fact, it is interesting that exercise did not impact the pressure pain ratings whereas the same cannot be said for thermal pain. In line with some of my comments below about the impact of exercise on pain intensity responses, I would be intrigued to see the results of the pressure pain ratings in more detail.

      Another note on this... Whilst the results for the pressure pain may be beyond the scope of this paper and will be reported separately, knowing of this data is tantalising for a reader. I would suggest to: A) either mention the pressure pain and include the analysis of the data; or B) not mention the pressure pain altogether and save it for the subsequent paper. Either way, I look forward to seeing further discussion on this in future work.

      We have now summarised the behavioural results of exercise on pressure pain ratings below in Supplemental Figure S1.

      There was no hypoalgesic effect evident in the behavioural pain ratings comparing HI to LI exercise in the saline (SAL) condition (β = 0.57, CI [-1.73, 2.86], SE = 1.17, t(1354) = 0.48, P = 0.63; Supplemental Figure S1A, blue bars) as well as no interaction of drug treatment and exercise intensity on pressure pain ratings (β = -1.43, CI [-4.87, 2.01], SE = 1.75, t(2756.02) = -0.82, P = 0.42; Supplemental Figure S1). Post-hoc paired t-tests (Bonferroni-corrected) confirmed there to be no significant differences between the drug treatment conditions at LI (P = 0.18) or HI (P = 0.85) and no significant difference between the exercise intensities in the SAL (P = 0.65) and NLX (P = 0.48) conditions, confirming no significant differences in drug treatment between the exercise intensities.

      Furthermore, there was no significant effect of fitness level on differences in pain ratings (LI – HI exercise) in the SAL condition (β = 3.16, CI [-1.64, 7.97], SE = 2.37, t(38) = 1.34, P = 0.19; Supplemental Figure S1B) and no significant correlation between fitness level and difference pain ratings (r = 0.25, P = 0.13). Finally, there was no significant interaction of drug treatment, exercise intensity, and sex on difference pain ratings (β =-7.97, CI [-18.67, 2.73], SE = 5.51, t(190) = -1.45, P = 0.15; Supplemental Figure S1C-D).

      Exercise did not appear to affect pressure pain ratings and we have now added this to the discussion and in the methods section. However, we think that the figure should be part of the supplements.

      L112 to 113 - Fantastic work for including this analysis in your study. Great job.

      We appreciate the reviewer’s positive feedback on conducting these crucial analyses when investigating sex and gender differences in pain.

      L186 to 189 - It is fascinating that there appears to be no effect of NALOXONE on pain ratings within female participants at a VAS rating of 30 for thermal pain as well as a much diminished hyperalgesia effect at a VAS rating of 50 compared to males. Meanwhile, at higher intensity stimulations corresponding to a VAS rating of 70, females in fact demonstrate a more pronounced hyperalgesia effect compared to males. In addition, the hyperalgesia effect of NALOXONE for males seems to "peak" at a VAS rating of 50. The mechanisms behind these findings alone would be incredibly exciting to explore... but maybe in another study.

      We agree with the reviewer that the differences in males and females are fascinating results and concur that this may hint at varying degrees of opioidergic involvement at different stimulus intensities. This finding is intriguing and potentially clinically relevant, warranting further investigation in future research, although it lies beyond the scope of the current paper.

      L189 - To double check... Figures 4A and 4B refer to the entire cohort (male and female responses combined) whereas C-E are separated by sex?

      In addition, as there are no annotations to the top of Figures 4C-E were no significant differences observed between saline and naloxone conditions per each stimulus intensity? i.e., similar tests to what are shown in Table S6 but separated for each sex.

      Without getting too carried away, there may be something here that indicates a difference between sexes concerning the opioid-driven pain modulation response on a neurological level (i.e., brain region activation).

      The reviewer is correct in assuming that Figures 4A and 4B refer to the entire cohort whilst Fig. 4C – 4E are split for males and females. The full output of the analyses for Fig. 4A and 4B are reported in Supplemental Tables S5 – S7. Furthermore, the full output of the LMER analyses for Fig. 4E is reported in Supplemental Table S10. We agree with the reviewer that additional annotations in Fig. 4C – Fig. 4E ease interpretation and have, thus, added them to the respective figures, denoting the significance of the interaction term stimulus intensity and drug treatment for females (Fig. 4C) and males (Fig. 4D), respectively. For completeness, we now report the post-hoc paired samples t-tests for females and males in the Supplemental Tables S8 and S9, respectively.

      L254 to 258 - "we could not establish an overall hypoalgesia effect of exercise...". Do the results of the exercise intensity x drug treatment provide an answer for this exact hypothesis? After checking the methods section, I cannot seem to find whether the statistical analysis has involved a comparison of the pain ratings after the high (alone), low (alone), or high and low (combined) exercise compared to ratings during control or pre-calibration as part of precalibration (i.e., pain ratings in a rested state without any exercise yet completed).

      We concur with the reviewer's assessment that the study design and statistical analyses cannot address the ‘overall’ effect of exercise compared to no exercise. Please refer back to our general response before comment 1, where we have addressed this point.

      As it seems that the analysis assesses the differences between high and low-intensity exercise, to me, the results of the exercise intensity x drug treatment analysis do not assess whether there is an exercise-induced hypoalgesia effect or not. Instead, it seems to assess whether the intensity of exercise is a differentiating factor in the expected exercise-induced hypoalgesia effect to subsequent pain intensity ratings to experimental pain stimulation. For the authors to judge whether aerobic exercise does or does not have a hypoalgesia effect, then the exercise conditions (either combined or standalone) would have to be compared to a control condition or a data set that involved pain ratings from a pre-exercise timepoint.

      We thank the reviewer for their comment. We would like to point out that we concluded there to be no hypoalgesic effect between the LI and HI exercise based on the LMER model comparing the behavioural pain ratings between the exercise conditions in the SAL condition (β = 1.19, CI [-1.85, 4.22], SE = 1.55, t(1354) = 0.77, P = 0.44; Figure 6A, blue bars and Table S9). The statistical model investigating the interaction of exercise intensity and drug treatment served to show that NLX did not modulate pain differently between the LI and HI exercise conditions.

      Given that our experiment involved different exercise levels in a randomized order, a simple pre vs post analysis is not straightforward. Nevertheless, we have set up a model where we take into account the rating time point (pain ratings provided before each exercise block (pre-pain ratings) and following each exercise block (post-pain ratings)) at each stimulus intensity (VAS 30, 50, 70) and exercise intensity (LI and HI). The model also takes into account the exercise intensity performed in the previous block, the overall block number as well as the varying subject intercepts. The analysis was completed for heat (Author response image 1A) and pressure (Author response image 1B) pain ratings in the SAL condition to establish whether there was a significant effect of exercise intensity on the changes from pre to post-pain ratings. The model for heat pain yielded a significant main effect for stimulus intensity (β = 1.43, CI [1.34, 1.52], SE = 0.05, t(2054.95) = 31.61, P < 0.001) but no significant interaction of exercise intensity, rating time point, and stimulus intensity (P = 0.14). The model for pressure pain in the SAL condition yielded a significant main effect of stimulus intensity (β = 1.00, CI [0.92, 1.08], SE = 0.04, t(2054.99) = 24.68, P < 0.001) and block number (β = 1.14, CI [0.35, 1.94], SE = 0.41, t(2055.98) = 2.80, P = 0.005) but no interaction of exercise intensity, rating time point, and stimulus intensity (P = 0.38).

      Author response image 1.

      Heat (A) and Pressure (B) pain ratings in the saline (SAL) condition for pre (purple) and post (turquoise) exercise pain ratings at LI and HI exercise and all stimulus intensities (VAS 30, 50, 70). The bars depict the mean pain rating pre and post-exercise and the dots depict the subject-specific mean ratings. The error bars depict the SEM.
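
      The pre/post model described above could be written roughly as follows. This is a sketch under stated assumptions: the response names only the predictors (rating time point, exercise intensity, stimulus intensity, previous-block exercise intensity, block number), the random subject intercepts, and the three-way interaction, so the full factorial structure of the first three terms and all symbols below are illustrative choices rather than the authors' exact specification.

```latex
% y_{ij}: pain rating on trial i of subject j (SAL condition)
% T: rating time point (pre/post), E: exercise intensity (LI/HI),
% S: stimulus intensity (VAS 30/50/70),
% P: exercise intensity in the previous block, B: overall block number
\begin{aligned}
y_{ij} ={}& \beta_0 + \beta_1 T_{ij} + \beta_2 E_{ij} + \beta_3 S_{ij} \\
          &+ \beta_4 T_{ij} E_{ij} + \beta_5 T_{ij} S_{ij} + \beta_6 E_{ij} S_{ij}
           + \beta_7 T_{ij} E_{ij} S_{ij} \\
          &+ \beta_8 P_{ij} + \beta_9 B_{ij} + u_j + \varepsilon_{ij}, \\
u_j \sim{}& \mathcal{N}(0, \sigma_u^2), \qquad
\varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)
\end{aligned}
```

      In this notation, the reported non-significant three-way interaction corresponds to the coefficient on T × E × S, and the subject-specific intercept is the term u_j.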

      Another point of consideration is that Figure 6A appears to show an average of all pain ratings combined per participant (is this correct?). As participants were exposed to stimulations expected to elicit a 30, 50, or 70 VAS rating based on pre-calibration values, therefore the average rating would be expected to be around 50. What Figure 6A shows is that in the SALINE condition, average pain ratings are in fact ~10-15 units lower (~35) and then in the NALOXONE condition, average pain ratings are ~5 units lower (~45) for both exercise intensities. From this, I would surmise the following:

      • It appears there is an exercise-induced hypoalgesia effect as average pain ratings are ~30% lower than pre-calibrated/resting pain ratings within the SALINE condition at the same temperature of stimulation (it would also be interesting to see if this effect occurred for the pressure pain).

      • It appears there is evidence for the endogenous opioid mechanism as the NALOXONE condition demonstrates a minimal hypoalgesia effect after exercise. I.e., NALOXONE indeed blocked the opioid receptors, and such inhibition prevented the endogenous opioid system from taking effect.

      • It appears there is no effect of exercise intensity on the exercise-induced hypoalgesia effect. That is, participants can cycle at a moderate intensity (55% FTP) and incur the same hypoalgesia benefits as cycling at an intensity that demarcates the boundary between heavy and severe intensity exercise (100%FTP). This is a winner in my mind. Anyone wishing to reduce pain can do so without having to engage in exercise that is too effortful and therefore aversive - great news!

      I will very slightly caveat my summaries with the fact that a more ideal comparison here would be a control condition whereby participants did the same experimental visit but without any exercise prior to entering the MRI scanner.

      As a result of this interpretation of your findings, I do not think that aerobic exercise as a means to cause subsequent hypoalgesia to experimental thermal nociception can be fully discounted. On the contrary, I think your results showed in Figure 6A are evidence for it.

      The reviewer is correct in assuming that Figure 6A shows the averaged pain ratings across all stimulus intensities (VAS 30, 50, and 70) for each subject. To provide more details, we have split Figure 6A by stimulus intensity, now depicting the pain ratings for LI and HI exercise and treatment condition (SAL and NLX) at VAS 30, 50, and 70 (Supplemental Fig. S8). The LMER was extended to include the stimulus intensity and yielded a significant main effect of stimulus intensity (β = 1.39, CI [1.31, 1.47], SE = 0.04, t(2753.12) = -34.082, P < 0.001) and a significant interaction of stimulus intensity and drug treatment (β = 0.12, CI [0.01, 0.24], SE = 0.06, t(2751) = 2.13, P = 0.03) but no significant interaction of exercise intensity, drug treatment, and stimulus intensity (β = -0.05, CI [-0.20, 0.11], SE = 0.08, t(2751) = -0.56, P = 0.58).

      The reviewer further suggests that the average pain ratings in the SAL condition are lower than the anticipated stimulus intensity, thus, indicating exercise-induced hypoalgesia. While this interpretation is one possibility, there is an alternative explanation: the lower pain ratings may stem from habituation to heat pain (Greffrath et al., 2007; Jepma et al., 2014; May et al., 2012). To support this perspective, we have visualised data from other studies in our lab that have been conducted with the same thermode head and device (TSA-2), using the same calibration procedure and aiming for the same stimulus intensities (VAS 30, 50, and 70). In both studies (Author response image 2A: Study 1: Behavioural sample; Author response image 2B: Study 2: fMRI sample; Author response image 2C: Original Exercise Study), participants did not engage in an exercise task and the pain ratings at VAS 30 and VAS 50 were lower than the anticipated intensities (VAS 30: 11.1/13.4; VAS 50: 35.0/35.9). Furthermore, in a previous study, Wittkamp et al. (2024) showed that, despite calibrating the heat stimuli at VAS 60, participants rated the pain stimuli with M = 48.58 (SD = 13.79).

      The discrepancy observed between calibrated intensities and the ratings provided could be attributable to habituation effects, especially at low stimulus intensities. Moreover, we would like to point the reviewer to the highest stimulus intensity at VAS 70 (Author response image 2C), where no habituation is evident in any of the three data sets (including the current study). This consistency suggests that exercise-induced hypoalgesia may not be present in our findings or may be confounded by habituation effects.

      Author response image 2.

      Heat pain ratings at different intensities (30, 50, and 70 VAS) in different study samples. Bars depict the mean ratings in the saline (SAL) condition. Individual data points depict subject-specific mean pain ratings. Error bars depict the SEM.

      The reviewer further suggests that there is evidence for endogenous opioidergic modulation since the pain ratings in the NLX condition are lower than the anticipated intensities. We fully agree but, again, would argue that the DPMS can exert its effects on painful stimuli in a default manner, i.e. irrespective of any exercise effect.

      We concur with the reviewer’s interpretation that there is no effect of exercise intensity on exercise-induced hypoalgesia since the ratings between both exercise intensities are not significantly different.

      Finally, we agree that our data does not allow for the interpretation of an ‘overall’ effect of exercise-induced hypoalgesia and would like to point out that we did not aim to claim this. Rather, the data suggests there to be no effect of LI vs. HI aerobic exercise on pain modulation. We acknowledge, however, that the phrasing involving ‘overall’ can be misleading and have revised this to focus on the comparison between LI and HI exercise, thereby enhancing precision and clarity.

      Note This is also where it would be really interesting to see the pain pressure data if it were to be included. Mainly to see whether it coheres with what the thermal stimulation stuff shows.

      We have provided the ratings for the pressure pain ratings in the SAL condition below (Author response image 3).

      Author response image 3.

      Pressure pain ratings in the SAL condition at stimulus intensity (VAS 30, 50, and 70). Bars depict the mean ratings in the saline (SAL) condition. Individual data points depict subject-specific mean pain ratings. Error bars depict the SEM.

      L259 - As mentioned in the comment above. Could the authors distinguish what is being shown in Figure 6A? Are the data presented as the pooled mean for all stimulation intensities? If not, what data is displayed per bar/column?

      We thank the reviewer for their comment. The reviewer is correct in assuming that the bars in Figure 6A depict the pooled means across all stimulus intensities (VAS 30, 50, 70) for each drug treatment condition and exercise intensity. To allow for a more detailed comprehension of the data, we have split Figure 6A by stimulus intensity, now depicting the pain ratings for LI and HI exercise and treatment condition (SAL and NLX) at VAS 30, 50, and 70 (Supplemental Figure S8). The LMER was extended to include the stimulus intensity and yielded a significant main effect of stimulus intensity (β = 1.39, CI [1.31, 1.47], SE = 0.04, t(2753.12) = -34.082, P < 0.001) and a significant interaction of stimulus intensity and drug treatment (β = 0.12, CI [0.01, 0.24], SE = 0.06, t(2751) = 2.13, P = 0.03) but no significant interaction of exercise intensity, drug treatment, and stimulus intensity (β = -0.05, CI [-0.20, 0.11], SE = 0.08, t(2751) = -0.56, P = 0.58).

      L278 - Can the authors please provide a reference that explains how W.kg-1 at FTP is a measure of fitness level?

      We thank the reviewer for their comment. The obtained FTP value was corrected for the weight of each participant (Watt/kg), yielding a weight-corrected fitness measure that allows for better comparison between subjects. We denoted this in the figures as W*kg-1, which is an equivalent notation.
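
      As an illustration of the weight correction, the following minimal sketch computes a W/kg value and the absolute power targets for sessions prescribed as a fraction of FTP. The function names and the participant values are hypothetical; the 55% and 100% FTP fractions are taken from the reviewer's description of the two exercise intensities.

```python
def relative_ftp(ftp_watts: float, body_mass_kg: float) -> float:
    """Weight-corrected FTP in W/kg."""
    if body_mass_kg <= 0:
        raise ValueError("body mass must be positive")
    return ftp_watts / body_mass_kg


def target_power(ftp_watts: float, fraction_of_ftp: float) -> float:
    """Absolute power (W) for a session prescribed as a fraction of FTP."""
    return ftp_watts * fraction_of_ftp


# Hypothetical participant, not study data.
ftp_watts, body_mass_kg = 200.0, 80.0
fitness = relative_ftp(ftp_watts, body_mass_kg)   # 200 W / 80 kg
li_watts = target_power(ftp_watts, 0.55)          # low-intensity target
hi_watts = target_power(ftp_watts, 1.00)          # high-intensity target
print(f"{fitness:.2f} W/kg, LI {li_watts:.0f} W, HI {hi_watts:.0f} W")
```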

      L296 - Take the line away from Figure 7A... Does the individual data show a positive relation between pain rating changes and W.kg-1? Besides the three data points (1 on the far right of the figure and the two on the far left), I find it hard to see any real trend.

      We acknowledge the reviewer’s concern regarding the regression line and the visual clarity of the individual data points. However, it is important to note that the significant main effect of fitness level on differences in pain ratings in the SAL condition (β = 6.45, CI [1.25, 11.65], SE = 2.56, t(38) = 2.52, P = 0.02) supports the assertion that higher fitness levels are associated with greater hypoalgesia following HI exercise compared to LI exercise. While the trend may not be visible for all data points, the statistical analysis provides a robust basis for the observed relationship (r = 0.33, P = 0.038).

      We have conducted an additional LMER model where we have excluded the subjects with the highest and lowest FTP values (sub-28 with 3.19 W/kg and sub-06 with 0.76 W/kg, respectively). The LMER still yields a significant main effect of fitness level (β = 6.82, CI [1.25, 11.65], SE = 3.18, t(34) = 2.14, P = 0.039; Author response image 4) and a positive correlation between the difference ratings and fitness level approaching significance (r = 0.32, P = 0.057).

      Author response image 4.

      Fitness level on difference pain ratings (LI-HI exercise) without subjects with highest and lowest FTP (N = 37). (A) Subject-specific differences in heat pain ratings (dots) between LI and HI exercise conditions (LI – HI exercise pain ratings) and corresponding regression line pooled across all stimulus intensities in the SAL condition. Fitness level (FTP) showed a significant positive relation to heat pain ratings with a significant main effect of FTP (P = 0.039) on difference ratings.
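
      The sensitivity check described above (recomputing the fitness relation after removing the subjects with the lowest and highest FTP) can be sketched as follows. This is an illustrative stand-in only: it uses a plain Pearson correlation rather than the LMER reported in the response, the helper names are hypothetical, and the numbers are invented for demonstration.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def drop_extremes(fitness, diff):
    """Remove the subjects with the lowest and the highest fitness value."""
    pairs = sorted(zip(fitness, diff))   # sort subjects by fitness
    trimmed = pairs[1:-1]                # drop first (lowest) and last (highest)
    return [p[0] for p in trimmed], [p[1] for p in trimmed]


# Hypothetical values: fitness in W/kg, LI - HI difference ratings in VAS units.
fitness = [0.8, 1.5, 1.9, 2.2, 2.6, 3.2]
diff = [-4.0, 1.0, -1.0, 3.0, 2.0, 8.0]

r_all = pearson_r(fitness, diff)
fit_trim, diff_trim = drop_extremes(fitness, diff)
r_trim = pearson_r(fit_trim, diff_trim)
print(f"r (all) = {r_all:.2f}, r (extremes removed) = {r_trim:.2f}")
```

      Comparing the two coefficients shows how much of an apparent trend is carried by the extreme subjects, which is the question the reviewer raised about Figure 7A.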

      (4) Discussion

      L356 to 358 - Exactly. What you write here, I agree with. Your testing allowed you to judge whether there is an effect of aerobic exercise intensity on pain modulation. However, I think this has been a little conflated with the idea that there is "no overall effect of aerobic exercise on pain modulation" in other areas of the article (L358-361, Results, and Abstract). As per my previous comment, I am not sure this (no overall effect) is true.

      We agree with the reviewer and have adapted the manuscript so that the misleading phrase including ‘overall’ is removed.

      L358 to 365 - One addition to this debate about whether this is a hypoalgesia effect of aerobic exercise. In 358 - 361 (particularly the end of 361) there is a strong conclusion that there is no direct involvement of the endogenous opioid system. Then glance onto L364 to 365 and there is then an almost conflicting summary that a hypoalgesia effect driven by opioidergic regions of the brain (and ergo endogenous opioids) is in effect. If there were no direct endogenous opioid involvement, then differences between NALOXONE (blockade of the opioid mechanism) and SALINE conditions would not exist.

      We thank the reviewer for their comment. The structure of this paragraph aimed to guide the reader towards a more nuanced understanding of the possible mechanisms and caveats in exercise-induced pain modulation. Whilst our data suggest an effect of NLX on pain ratings, with significantly higher pain ratings in the NLX condition compared to the SAL condition, we could not identify an interaction between treatment and exercise intensity. This suggests that there is no significant difference in opioidergic involvement between HI and LI exercise. Our exploratory analyses, however, show that the involvement of endogenous opioids as an underlying mechanism is dependent on sex and fitness level.

      My perspective is that an exercise-induced hypoalgesia effect has occurred (based on the data in Figure 6A) but that this effect is certainly caveated by the sex and fitness levels that this study has observed (and kudos for it).

      As mentioned above, based on the current data we cannot untangle whether the reduced pain ratings in the SAL condition are due to habituation to noxious stimuli or an actual hypoalgesic effect of exercise (or potentially a mix of both). However, we fully agree with the reviewer that exercise-induced pain modulation is influenced by fitness level and sex.

      L390 - "endogenous pain modulation through μ-opioid receptors increases with increasing pain intensity". Aside from the general discussion about whether aerobic exercise causes a post-exercise hypoalgesia effect. This finding is also interesting for the pain incurred during exercise in the form of naturally occurring muscle pain and may also be clinically relevant as it could be that the endogenous pain modulation "system" could be primed through repeated exercise as your results show that the fitness level (i.e., a close correlate of how much someone has engaged in exercise and therefore 'activated' the endogenous pain modulation system) is associated with a more pronounced post-exercise hypoalgesia effect.

      This is an interesting aspect. With regards to the pain induced by exercise itself (i.e. muscle pain) we did not gather any data on this type of pain and interpreting this would be mere speculation. However, it is an interesting hypothesis to investigate in future studies whether the pain induced by exercise is potentially influenced by the endogenous opioid system. We agree with the reviewer’s interpretation that repeated exercise might prime the endogenous opioid system, especially in fitter individuals who engage more frequently in exercise and, thus, ‘train’ the endogenous opioid system. We have included this line of interpretation in the original manuscript, where we suggest that the mFC, a brain region with high µ-opioid receptor density, might be ‘trained’ by repeated exercise and, therefore, shows increased activation in fitter individuals after short bouts of exercise.

      L404 to 405 - "a resting baseline does not control for unspecific factors such as attentional load or distraction (Brooks et al., 2017; Sprenger et al., 2012) through exercise." I am not sure I agree. A control condition allows one to truly deduce whether exercise causes a hypoalgesia effect or not. The attentional load may be a factor, but I would argue this is distinct from endogenous pain modulation - unless there is a study that shows cognitive load alone can elicit endogenous opioids like exercise. About distraction, this would be the case if the pain measures were taken during the exercise. However, as the pain measures taken in the MRI were post-exercise and there was no added distraction related to the exercise present anymore, then I do not think any added effect of distraction due to the exercise and its effect on postexercise pain measure is relevant any longer.

      We agree with the reviewer that a resting baseline condition in the context of exercise-induced pain modulation would allow for the investigation of a potential hypoalgesic effect of exercise compared to no exercise. It is important to note that both studies (Brooks et al., 2017; Sprenger et al., 2012) have indeed shown that the effect of cognitive pain modulation is mediated by endogenous opioids.

      L406 - I do not think a low-intensity exercise is a true "control" condition. It certainly does allow the study to compare the dose-response relationship but as the individual is exercising (even at a moderate physiological intensity) then comparison of HIGH vs LOW does not tell us whether exercise does or does not cause hypoalgesia. In contrast, the results from Figure 6A seem to show that even LOW intensity exercise has a hypoalgesia effect and this is a good thing for those who cannot exercise at high intensities (e.g., chronic populations).

      Please refer back to our general response before comment 1, where we have addressed this point.

      L410 - A small digression in relation to the exercise intensities:

      The intensity domains (moderate - heavy - severe) are not truly controlled within this study (mainly for the LOW condition), and therefore some participants could have exercised within different exercise intensity domains than others. To explain, the exercise intensity domains are distinguishable by the physiological responses associated with the boundaries of each of these domains. The FTP is believed to be a demarcation point between heavy and severe intensity domains (though kinesiologists debate the validity of this). Other concepts similar to FTP are Critical Power or the Respiratory Compensation Point. Ultimately, the boundary between heavy and severe intensity domains is characterised by the highest possible intensity by which a steady-state in oxygen kinetics (V̇ O2) occurs (Burnley & Jones, 2018). If this is expressed as a power output (Watts) and then a percentage of this power output is used to prescribe exercise intensity, then the physiological response is not always as expected. The reason is that for some people the gaseous exchange threshold (the demarcation point between the moderate and heavy intensity domains) is not always the same percentage between resting and FTP/Critical Power/Respiratory Compensation Point for each person. As a result, some individuals who are prescribed an intensity of 55% FTP/Critical Power/Respiratory Compensation Point may subsequently exercise within the moderate intensity domain (most people did based on the heart rate and RPE responses) whilst some others might actually exercise more within the heavy intensity domain. A quick check of Figures 3B-C could indicate that this might have been the case for two or three participants, but that is inference and speculation as we cannot truly know unless gas parameters were taken (which is perfectly understandable that they have not been taken because this study has done so much else). 
      However, the importance of this for this study is that if some participants did indeed exercise at a slightly higher physiological intensity, this undermines the LOW condition as a "control" as the physiological stimulus between conditions (Brownstein et al., 2023). It means that the proposed differences in endogenous opioids (Vaegter et al., 2015; 2019) between exercise intensities may not have been present and therefore summarising a lack of an exercise induced hypoalgesia effect is slightly confounded. This is one factor contributing to my scepticism about the conclusion that there is a lack of an exercise-induced hypoalgesia response.

      We thank the reviewer for their comment as it touches upon the challenges of estimating exercise intensities precisely. It is, indeed, crucial to consider the boundaries between moderate, heavy, and severe intensity domains, as delineated by physiological markers such as the Functional Threshold Power (FTP), Critical Power, and the Respiratory Compensation Point (VO2max) (Burnley & Jones, 2018). Previous research has shown that the FTP and FTP20 tests are reliable and convenient methods to estimate approximate measures of VO2max (Denham et al., 2020) and that the FTP test is a useful test for performance prediction in moderately trained cyclists (Sørensen et al., 2019).

      We acknowledge that without direct measurements of VO2max, it is challenging to determine the precise intensity domain in which each participant was operating. While the RPE and HR might suggest that some participants performed in the moderate intensity domain in the LI exercise condition, we could still ascertain there to be a significant difference in the relative power (%FTP), heart rate (HR), and rating of perceived exertion (RPE) between the LI and HI exercise conditions. In the overall sample, the consistency in relative power, heart rate, and RPE responses among participants suggests that the exercise doses were effectively communicated and adhered to; therefore, the validity of the LI exercise condition remains robust.

      While we did not include metabolic assessments in our protocol, our study focused on providing a comprehensive analysis of the exercise-induced hypoalgesia phenomenon across two distinct exercise intensities. Additionally, the rationale for selecting specific exercise intensities was grounded in the existing literature, which indicates significant differences in the hypoalgesic response between exercise intensity levels (Jones et al., 2019; Vaegter et al., 2014).

      According to the reviewer, the potential lack of difference between the exercise conditions might contribute to the fact that there was no difference in endogenous opioid release and, thus, no difference in pain ratings between the exercise conditions. However, our data still suggest that there is an influence of endogenous opioids in the HI exercise condition in males with higher fitness levels. Together with recent findings on the association of µ-opioid receptor activation and fitness levels in men (Saanijoki et al., 2022), as well as the difference in µ-opioid receptor availability between high and moderate aerobic exercise (Saanijoki et al., 2018), we would hypothesise that the release of endogenous opioids after short HI bouts of exercise depends on fitness levels (and potentially sex).

      Finally, we propose that discussing exercise intensity domains within the context of our study enriches the understanding of exercise-induced hypoalgesia without undermining the integrity of our findings. We have, therefore, included this in the discussion of the manuscript.

      L417 - For some reason I am doubting this value (r = 0.61). Could this be checked? I think it is higher in their study. r = 0.88?

      Also, as someone with a kinesiology background, I would argue this is a given anyway. The maximum power one can cycle for 20 minutes is related to the maximum power one can cycle for 60 minutes, this is expected. (That is no slight on the authors of this study, more a remark that readers could look and figure that for themselves if they needed to know).

      We thank the reviewer for their comment. We have carefully re-checked the correlation coefficient between the FTP20 and FTP60 tests in the study by Borszcz et al. (2018) and have corrected the correlation coefficient to r = 0.88. We thank the reviewer for detecting this. Whilst we agree that it seems somewhat intuitive that the FTP20 and FTP60 should correlate highly, we wanted to provide the reader with a better understanding of where the FTP20 test originated and how it is suitable to assess aerobic fitness levels without having to maintain a steady power output for 60 minutes.

      L428 - Kudos to the authors for taking a standardised approach to this. Hopefully, my comment earlier might provide some extra food for thought about exercise intensity. I think there are several other ways future research could prescribe exercise without the need for expensive and cumbersome bits of equipment to know how hard people are exercising.

      We strongly agree with the reviewer and hope that our study can inspire future research to implement more convenient and inexpensive ways to establish aerobic (and anaerobic) fitness levels.

      L456 to 458 - Would it be possible to revisit this and check whether the pooled mean of all stimulation intensities for pain intensity ratings after pressure pain is lower than 50? If so, I think it can also be assumed that there is a slight hypoalgesia effect occurring for pressure pain too.

      We have revisited the pressure pain ratings pooled across all stimulus intensities (VAS 30, 50, and 70). Indeed, the ratings are below 50 VAS (Supplemental Figure S1A) in the SAL and NLX conditions. As mentioned before, lower pain ratings after LI exercise cannot be taken as evidence for exercise-induced analgesia.

      L495 to L499 - I find this fascinating. Great finding.

      We thank the reviewer for their positive feedback.

      (5) Methods

      L650 - "Watts"

      We have changed the sentence accordingly.

      L651 - beats per minute can also be represented as b.min-1 and cadence as revolutions.min-1.

      To allow for easier interpretation of the results by a broader readership, we would like to propose keeping the original abbreviations.

      L678 - Just to check what the authors mean by "on the second experimental day", they are actually referring to Visit 2 of 3 (first experimental visit of 2) as it is shown in Figure 1?

      We apologise for the lack of clarity. Indeed, the second experimental day refers to the third visit in the study. We have added this to the sentence to increase clarity.

      L708 - would change the end of the sentence to "and remained blinded throughout the study"

      We have changed the sentence accordingly.

      L742 - comma after "in one participant".

      We have added the missing comma.

      L746 - slight mistype... RPE in brackets instead of PRE

      We have changed the abbreviation to RPE.

      L747 - In case the authors are interested in affective measures in future studies... Hardy and Rejeski (1989) have a 9-point Likert scale rating affective valence which might be useful to check out.

      Thank you. The scale by Hardy and Rejeski (1989) is a very relevant measure of affective valence during exercise, and we will consider this in future studies.

      L755 - Four squares for the thermode to be applied were drawn on the arm but through the methods I can only seem to see that the thermode was applied to the second square during calibration. During the MRI scan, did someone move the thermode to different squares for different stimulations?

      We appreciate the reviewer's question. Indeed, the heat calibration and recalibration on the first and second day, respectively, were always completed on the same skin patch (patch 2) to allow for comparability of calibration across days. During the experimental sessions, the thermode head was repositioned in a randomised order across participants (e.g., skin patch 1-4-3-2) before each block. This was done manually before the MRI block commenced. The order of thermode head position was kept constant within participants across experimental days (day 2 and day 3).

      L764 - ITI predefined?

      We thank the reviewer for their comment and would like to point to line 130 in the revised manuscript where the abbreviation for inter-trial-interval (ITI) was first introduced.

      (6) Other Sections + Supplementary Materials

      L891 - I apologise in advance for this comment as it is the most trivial comment you will ever receive, but there is an extra "." On this line after J.N. initials for methodology.

      We have changed the punctuation accordingly.

      Table S1 - Strictly speaking, some of the intensity denominations in this table are not exactly an "intensity".

      Iannetta et al. (2020) - https://doi.org/10.1249/mss.0000000000002147 provides a commentary on intensity domains as well as Burnley and Jones (2018) - https://doi.org/10.1080/17461391.2016.1249524

      Likewise in this table - the term "without fatigue" in the description column is not strictly true as participants will naturally fatigue but authors are referring more to a "steady state".

      We have changed the name of the column to ‘Description’ to describe the test phase as proposed by Allen and Coggan (2012) and previously implemented by McGrath et al. (2019) and not the ‘intensity domains’ (as specified by Iannetta et al. (2020)). Further, we have refined the wording in Table S1 and replaced the term ‘without fatigue’ with ‘steady state’.

      Once again, thank you to the authors for their great work on this project and to the editor for the chance to review this paper.

      We would like to thank this reviewer for their very insightful and important comments and for pointing out the strengths of the manuscript. We believe the suggestions will help to improve the quality of the manuscript.

      Reviewer #2 (Recommendations for the authors):

      Summary:

      This interesting study compared two different intensities of aerobic exercise (low-intensity, high-intensity) and their efficacy in inducing a hypoalgesic reaction (i.e. exercise-induced hypoalgesia; EIH). fMRI was used to identify signal changes in the brain, with the infusion of naloxone used to identify hypoalgesia mechanisms. No differences were found in postexercise pain perception between the high-intensity and low-intensity conditions, with naloxone infusion causing increased pain perception across both conditions which was mirrored by activation in the medial frontal cortex (identified by fMRI). However, the primary conclusion made in this manuscript (i.e. that aerobic exercise has no overall effect on pain in a mixed population sample) cannot be supported by this study design, because the methodology did not include a baseline (i.e. pain perception following no exercise) to compare high/low-intensity exercise against. Therefore, some of the statements/implications of the findings made in this manuscript need to be very carefully assessed.

      Strengths:

      (1) The use of fMRI and naloxone provides a strong approach by which to identify possible mechanisms of EIH.

      (2) The infusion of naloxone to maintain a stable concentration helps to ensure a consistent effect and that the time course of the protocol won't affect the consistency of changes in pain perception.

      (3) The manipulation checks (differences in intensity of exercise, appropriate pain induction) are approached in a systematic way.

      (4) Whilst the exploratory analyses relating to the interactions for fitness level and sex were not reported in the study pre-registration, they do provide some interesting findings which should be explored further.

      Weaknesses:

      (1) Given that there is no baseline/control condition, it cannot be concluded that aerobic exercise has no effect on pain modulation because that comparison has not been made (i.e. pain perception at 'baseline' has not been compared with pain perception after high/low intensity exercise). Some of the primary findings/conclusions throughout the manuscript state that there is 'No overall effect of aerobic exercise on pain modulation', but this cannot be concluded.

      (2) Across the manuscript, a number of terms are used interchangeably (and applied, it seems, incorrectly) which makes the interpretation of the manuscript difficult (e.g. how the authors use the term 'exercise-induced pain').

      (3) There is a lack of clarity on the interventions used in the methods, for example, it is not exactly clear the time and order in which the exercise tasks were implemented.

      (4) The exercise test (functional threshold power) used to set the intensity of the low/high exercise bouts is not an accurate means of demarcating steady state and non-steady state exercise. As a result, at the intensity selected for the high-intensity exercise in this study, it is likely that the challenge presented for the high-intensity exercise would have been very different between participants (e.g. some would have been in the 'heavy' domain, whereas others would be in the 'severe' domain).

      (5) It is likely that participants did not properly understand how to use the 6-20 Borg scale to rate their perceived effort, and so caution must be taken in how this RPE data is used/interpreted.

      (6) Although interesting, the secondary analyses (relating to the interaction effects of fitness level and sex) were not included in the study pre-registration, and so the study was not designed to undertake this analysis. These findings should be taken with caution.

      We thank the reviewer for their insightful comments that contribute to improving the quality of the manuscript. In response to the identified weaknesses, we have made key revisions to enhance clarity and rigor. Regarding the lack of a resting control condition, we acknowledge that our study does not assess the overall effect of exercise versus no exercise. Our primary objective was to compare high- (HI) and low-intensity (LI) exercise on pain modulation, hypothesizing that lower intensities would have minimal effects. We revised the manuscript to eliminate misleading phrases about an "overall" effect, clearly emphasizing our aim to investigate the comparative effects of different exercise intensities. To address terminology inconsistencies, we have adopted "exercise-induced pain modulation," reflecting existing literature that recognizes both hypoalgesia and hyperalgesia associated with exercise (Vaegter and Jones, 2020). We clarified this terminology in the introduction and specified the pain modalities used in our study. We also improved methodological transparency by better describing the timing and order of exercise and drug treatment interventions. Concerning exercise intensity estimation, we acknowledge the complexities in classifying moderate, heavy, and severe domains. We added the study by Wong et al. (2023) to discuss the potential limitations of the FTP estimation protocol. Although direct measures of VO2max or blood lactate are absent in our study, our findings, including perceived exertion (RPE) scores and relative power data, support that participants were primarily in the heavy-intensity domain during HI exercise. To clarify RPE ratings, we adjusted the presentation to align with the Borg scale's intended anchor points, ensuring greater accuracy in reported exertion levels. Statistical analyses confirm significant differences in RPE between exercise intensities. These revisions aim to clarify our intent and methodologies, ultimately strengthening the contribution of our research to understanding exercise-induced pain modulation.

      (1) Lines 27-33 - please present some data and accompanying statistical output in the results section of the abstract.

      We thank the reviewer for their comment. In the results section of the abstract, we report whether the findings are (not) significant using the general threshold of P < 0.05. However, we prefer not to include more detailed data and statistical outputs here, as these are thoroughly presented in the results section and do not contribute to the abstract’s primary purpose of providing a concise summary.

      (2) Line 29 - please indicate how fitness level was quantified.

      The functional threshold power (FTP) adjusted for weight served as an indication of cardiovascular fitness level. We have now included this in the abstract.
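      As a minimal illustration of this fitness indicator, the sketch below assumes that "adjusted for weight" means FTP divided by body mass (W/kg); the function name and the example values are hypothetical and not taken from the study.

```python
def ftp_per_kg(ftp_watts: float, body_mass_kg: float) -> float:
    """Return weight-adjusted FTP in W/kg (assumed definition: FTP / body mass)."""
    if body_mass_kg <= 0:
        raise ValueError("body mass must be positive")
    return ftp_watts / body_mass_kg


# Hypothetical participant: FTP of 240 W at a body mass of 75 kg.
example_ftp_per_kg = ftp_per_kg(240.0, 75.0)
print(f"Weight-adjusted FTP: {example_ftp_per_kg:.2f} W/kg")
```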

      (3) Line 35 - please include a sentence detailing the implications of your findings.

      We have now included a sentence on the implications of our findings in the abstract.

      (4) Introduction general - I appreciate that it was an exploratory analysis, however, the introduction does not particularly lay the groundwork for this (e.g., the influence of fitness level, sex, etc) - please include some background within the introduction to establish the role level of fitness/exercise/training/physical activity on pain modulation.

      A paragraph detailing the role of fitness level and sex in the context of exercise-induced pain modulation and endogenous opioid release was part of the introduction of our manuscript but has been removed as per the reviewing editor’s request (as the inclusion of sex and fitness level was not part of the preregistration). We have now re-included a shortened version of this paragraph to provide some background on these potentially crucial factors in exercise-induced pain modulation.

      (5) Lines 40-41 - reference needed.

      We thank the reviewer for detecting this and have now included references concerning the release of endogenous opioids and the term exercise-induced hypoalgesia.

      (6) Lines 48-49 - please provide the full terms for ACC and PAG (PAG has been provided on line 52, but should be presented earlier).

      We thank the reviewer for detecting this. We now introduce the abbreviations for the periaqueductal grey (PAG) and anterior cingulate cortex (ACC) in the correct lines.

      (7) Line 49 - the term exercise-induced pain is often used interchangeably (incorrectly) with many different types of pain experienced during/after exercise (e.g. muscle burn/ache, DOMS, injury etc.). Please see O'Malley et al 2024 (doi: 10.1113/EP091687).

      We thank the reviewer for their comment. Despite the distinction between different types of pain induced by exercise being important, this is less relevant for the current study. We would like to point out that the full term used is exercise-induced pain modulation, referring to the modulation of (experimental) pain through exercise. We have deliberately chosen this term as it summarises exercise-induced hypoalgesia as well as hyperalgesia. Therefore, we did not refer to pain induced by exercise and would disagree that this term has been used interchangeably with different types of pain in the current manuscript.

      (8) Line 57 - neither of these studies looked at exercise-induced pain, rather they examined experimentally induced pain (e.g. cold pressor test) or chronic pain and how exercise might exacerbate it. This leads back to the previous comment - it is important to define what is meant by exercise-induced pain (EIP) from the offset, and then remain consistent in the reference to this.

      We agree with the reviewer and have cited the studies accordingly. We would like to point out that the current study does not investigate exercise-induced pain but the modulation of experimental pain through exercise and have used the term exercise-induced pain modulation consistently in the manuscript to describe this.

      (9) Line 61 - Droste et al and Olausson et al are missing from the reference list.

      We apologise for this oversight and have now updated the reference list to include the studies by Droste et al. (1991) and Olausson et al. (1986).

      (10) Line 61 - Do you mean exercise-induced hypoalgesia, or modulation of exercise-induced pain - it is not clear? EIH is introduced in Line 40 and is consistent with what the Koltyn study explored. Conversely, Koltyn induced pain using heat and pressure, rather than exercise.

      In this manuscript, we have opted for the term ‘exercise-induced pain modulation’ since previous research has shown that exercise can elicit hypoalgesia as well as hyperalgesia (for review see Vaegter and Jones (2020)). Thus, the term refers to the modulation of pain through exercise. We have now included a sentence detailing the use of the term ‘exercise-induced pain modulation’ in the first passage of the introduction. Corresponding to Koltyn et al. (2014), we have used heat and pressure stimuli to induce pain and investigate the modulating effect of different exercise intensities on these pain modalities.

      (11) Line 62 and 64 - Both the Janal study and Haier study are missing from the reference list.

      We apologise for this oversight and have now updated the reference list to include the studies by Janal et al. (1984) and Haier et al. (1981).

      (12) Line 62 and 64 - define long/short distance/duration.

      We have revised the terminology from "short-duration" to "short-distance" to facilitate a more precise comparison of the exercise protocols employed in the studies by Janal et al. (1984) and Haier et al. (1981). Specifically, the long-distance run conducted by Janal et al. (1984) spanned 6.3 miles (10.3 km), while the short-distance run executed by Haier et al. (1981) covered 1 mile (1.6 km).

      (13) Line 62 - what type of pain?

      Janal et al. (1984) implemented thermal, ischemic, and cold pressor pain in their study and observed a hypoalgesic effect in response to thermal and ischemic pain that was reversed under NLX administration. We have now specified this in the text.

      (14) Line 67 - please place "i.e., the insula, ACC and prefrontal regions" in parentheses.

      Done.

      (15) Lines 67-69 - please provide clarity on the nature of the interventions being employed. For example, are you referring to interventions to reduce/overcome pain? Or are you referring to approaches to experimentally induce or increase pain during exercise? In either case, please be specific on the interventions employed, and why this variation in approach may make it challenging to draw a conclusion.

      The interventions employed by several studies aimed to investigate the pharmacological underpinnings of the pain modulatory effect of exercise and were, thus, pharmacological interventions. The primary objective of these interventions is usually not to reduce/induce/decrease/increase pain but to block a specific receptor type to infer the involvement/role of this receptor type in pain modulation through exercise. In the context of exercise and pain specifically, the most frequently used pharmacological intervention consists of administering a µ-opioid receptor antagonist (naltrexone/naloxone (NLX)). Depending on which type of µ-opioid receptor antagonist is used, different administration protocols are employed (i.e., oral or intravenous administration, different doses, only bolus without constant injection). This variability in the administration protocols of these pharmacological interventions can account for different findings of the extent of opioidergic involvement in exercise-induced pain modulation. We have now refined the corresponding section to increase the precision and clarity of the interventions used.

      (16) Line 69 - administration of what?

      This passage refers to the variability of administration of µ-opioid receptor antagonists such as naloxone (NLX) or naltrexone. We have now specified this in the corresponding line.

      (17) Line 74 - EIH?

      As described above, we have chosen the term 'exercise-induced pain modulation' as an umbrella term for both exercise-induced hypoalgesia and hyperalgesia. However, the reviewer is correct that specifically studies investigating exercise-induced hypoalgesia have been criticised. Still, the proposed criticism also applies to studies detecting hyperalgesia and we would, thus, argue for retaining the term ‘exercise-induced pain modulation’ here for the sake of consistency.

      (18) Line 75 - please define "single-arm pre-post measurements"

      We appreciate the reviewer's comment. Single-arm pre-post measurement studies assign participants to a single experimental condition, with pain assessed only once before and once after the intervention. This study design has several limitations, particularly for examining exercise-induced modulation of pain; notably, it does not account for habituation to noxious stimuli (Vaegter and Jones, 2020). Consequently, with only one pre- and one post-intervention assessment, a reduction in post-intervention pain ratings might erroneously be credited to the exercise intervention rather than to habituation to the noxious stimuli. Incorporating randomised controlled trials with multiple measurement blocks not only mitigates these limitations but also provides a clearer understanding of how individual bouts of exercise influence pain perception.

      (19) Line 84 - is (40) a reference?

      We apologise for this oversight and have now updated the reference by Borszcz et al. (2018) to be displayed correctly.

      (20) Line 86 - is that 10 min per block (i.e. 40 min exercise time), or 10 min in total? If the former please include "per block" at the end of the sentence (Line 87).

      The reviewer is correct in assuming that we employed 10 min of cycling per block, resulting in a total of 40 minutes of cycling. We have updated the sentence now including ‘per block’ as suggested by the reviewer.

      (21) Line 89 - when you refer to "painfulness" are you referring to the intensity of pain experienced? If so, I think "pain intensity" would be more appropriate.

      In the current study, participants were asked about the ‘painfulness’ of each stimulus based on previous studies (Horing et al., 2019; Horing & Büchel, 2022; Tinnermann et al., 2022). The term ‘painfulness’ is a composite measure of ‘pain intensity’ (sensory dimension) and ‘pain unpleasantness’ (affective dimension) (Talbot et al., 2019). Since unpleasantness is also a definitional criterion of pain (‘Terminology | International Association for the Study of Pain’, n.d.) and previous research shows a high correlation between ‘pain unpleasantness’ and ‘pain intensity’ (Granot et al., 2008; Talbot et al., 2019) we have opted for the term ‘painfulness’ as a more comprehensive measure. Inherently, these two measures are highly correlated.

      (22) Line 91-93 - the way this is written could be suggestive of this being separate to the cycling blocks. Please rephrase to confirm that this was administered prior to the commencement of the cycling blocks.

      We have refined the sentence to make it clearer that the drug treatment was administered before the cycling block commenced on each of the experimental days. We would like to further specify, that whilst the bolus dose of the treatment was administered prior to the experiment, a constant intravenous supply of SAL/NLX was maintained throughout the experiment using an infusion pump.

      (23) Methods general - why only 10 min of exercise? It is likely that there is a 'dose effect' of exercise on EIH, whereby the intensity of exercise and the duration of the exercise are important. Short-duration but high-intensity exercise can induce EIH, as can moderate duration low-intensity exercise. But, for this protocol, was the intensity high enough or long enough to meet the 'dose' needed?

      We thank the reviewer for their question. Our decision to employ 10-minute exercise blocks was rooted in both scientific evidence on exercise-induced hypoalgesia and the (clinical) applicability of the findings. Research has shown that exercise durations ranging from 8 minutes to 2 hours of aerobic exercise can induce hypoalgesia (for review see Koltyn (2002)). Specifically, several studies have induced hypoalgesia with 10-15 minutes of aerobic exercise (Gomolka et al., 2019; Gurevich et al., 1994; Haier et al., 1981; Jones et al., 2019; Sternberg et al., 2001; Vaegter et al., 2015). Furthermore, many prior studies have employed exercise durations that are tailored to professional or amateur athletes, which may not be practical for healthy individuals with lower fitness levels who may find it challenging to engage in longer sessions, such as an hour of running. When considering applying these findings to the clinical chronic pain population, it is crucial to assess the manageability of proposed exercise protocols. We believe that 10 minutes of exercise, whilst being a relatively brief exercise duration, may still be sufficient to elicit exercise-induced hypoalgesia.

      (24) Methods general - what was the time gap between each round (i.e. after the fMRI, how long before the participant started the next cycling block?).

      After each fMRI run, the participants were taken out of the MR scanner. The HR and SpO2 were measured, and participants were given the chance to go to the restroom before being positioned on the bike to start the next block. All in all, the time between the end of the fMRI scan and the start of the next block ranged from 5 to 10 minutes. We have now included this specification in the methods section.

      (25) Methods general - there is some evidence to show that the EIH effect is less consistently shown when heat is used to induce pain - was there a reason heat was used as the pain induction method here?

      We thank the reviewer for their comment. Indeed, previous meta-analyses by Naugle et al. (2012) report larger effect sizes for pressure pain (Cohen’s d = 0.69) closely followed by heat pain (d = 0.59). In light of this evidence, we included both pain modalities in the current study. Notably, we found no significant differences in pressure pain responses between LI and HI exercise. It is important to emphasise that the term "pressure pain" predominantly encompasses studies employing handheld pressure algometry, whereas our investigation utilised a pressure cuff. This methodological variation raises the possibility that our findings—and corresponding effect sizes—may not be directly comparable to prior pressure pain studies.

      (26) Methods general - please be consistent in the use of terminology. In some areas, you use the phrase "cycling block" whereas in other areas it is referred to as a "cycling run".

      We have revised the methods section to be more precise with the terms ‘run’ and ‘block’.

      (27) Line 571-573 - Please detail how participants were excluded based on scores from STAI and BDI-II.

      We apologise for the error, as it should read that one participant was excluded based on a BMI (body mass index) below 18. No participant had to be excluded based on the STAI or BDI-II score in the current study. We have corrected this in the manuscript.

      (28) Line 636-651 - the FTP20 test has been shown not to be a valid marker of the separation between the heavy and severe exercise intensity domains (see Wong et al 2023 - https://doi.org/10.1080/02640414.2023.2176045). Given that participants completed the high intensity cycle in 'zone 4' (91-106% of FTP), it is probable that participants could have completed this 10 min in either the heavy or the severe exercise intensity domains, with significant implications for the relative challenge of this 10 min of exercise. Why was zone 4 used? What are the implications of this? Please discuss and include this as a limitation.

      We thank the reviewer for their comment as it touches upon the challenges of accurately estimating exercise intensities. It is indeed crucial to consider the boundaries between moderate, heavy, and severe intensity domains, as delineated by physiological markers.

      The study by Wong et al. (2023) is interesting; it assesses blood lactate and VO2 levels at FTP and FTP+15 Watts. Despite being highly relevant for the field, some of the findings should be interpreted with caution due to the low sample size of 13 participants, consisting of 11 male and only 2 female cyclists, which may limit generalisability. Additionally, the testing protocol implemented in the study to determine participants' FTP consisted of 5 minutes of self-paced pedalling at 100 Watts followed by a 20-minute maximal, self-paced time trial. This differs from the FTP20 test as implemented in the current study (see Supplemental Table S1) or by other studies (McGrath et al., 2019). The finding in Wong et al. (2023) that participants were only able to sustain cycling at FTP for an average of 33 minutes suggests that the deviating protocol overestimates FTP. Mackey and Horner (2021) propose that the validity of the FTP20 test might rely on the warm-up used before FTP20 testing and the training status of athletes.

      However, we acknowledge that without direct measurements of VO2max or blood lactate levels, it is challenging to determine the precise intensity domain in which each participant was operating in the current study. Still, the RPE (low: M = 8.59, SD = 1.32; high: M = 14.92, SD = 1.98) suggests that participants operated in the heavy-intensity domain in the HI exercise condition. This is further supported by the relative power (%FTP) maintained in the HI (M = 105; SD = 0.05; Author response image 5, purple) and LI (M = 58; SD = 0.06; Author response image 5, green) exercise conditions (difference: t(37) = 44.58, P < 2.2e-16, d = 6.46), confirming the accuracy of the implemented FTP test as well as the maintained power throughout the cycling blocks. Thus, we would argue that participants in the current study predominantly exercised in the heavy domain during the HI exercise condition. We have included the relative power in Figure 3A, replacing the absolute power.

      Finally, we propose that discussing exercise intensity domains within the context of our study enriches the understanding of exercise-induced hypoalgesia without undermining the integrity of our findings. We have now included a discussion of the validity of the FTP20 test as a demarcation point concerning the intensity domains.

      Author response image 5.

      Raincloud plot of relative power (%FTP) during low (green) and high (purple) intensity exercise. Individual data points depict subject-specific averages across blocks.

      (29) Line 676 - please provide further information on each cycling run/block. Did each participant complete a total of 4 runs (i.e., a total of 40 minutes of exercise), with 2 runs completed at a high intensity and 2 runs completed at a low intensity in a randomised order (e.g., for one participant this could be 10 minutes at low, followed by 10 minutes at high, followed by 10 minutes at low, followed by 10 minutes at high)? Figure 1 details this nicely, however, it would be helpful to read in-text.

      The reviewer is correct in assuming that there were a total of 4 blocks on each experimental day. Participants completed cycling in 2 blocks at HI and in 2 blocks at LI in a pseudorandomised order. This order was kept constant across experimental days (i.e. completing the same block order on Day 2 and Day 3). We have detailed this further in the Methods section.

      (30) Discussion general - it is possible that EIH could be induced via different mechanisms and that these mechanisms are at least in part due to exercise intensity. For example, EIH from higher-intensity exercise might have some contribution from CPM.

      We thank the reviewer for their comment. Previous research aimed to disentangle the two seemingly similar mechanisms of exercise-induced hypoalgesia (EIH) and conditioned pain modulation (CPM) (Ellingson et al., 2014; Rice et al., 2019; Samuelly-Leichtag et al., 2018; Vaegter et al., 2014). CPM is typically induced by applying a tonic noxious stimulus that decreases pain sensitivity to another noxious stimulus applied simultaneously or shortly after at a distant body part (Graven-Nielsen & Arendt-Nielsen, 2010). Despite EIH and CPM showing distinct mechanisms, it cannot be completely ruled out that there are at least partially overlapping mechanisms driving the two phenomena (Rice et al., 2019). Due to our study design, where the time difference between cycling blocks and the applied pain was on average five minutes, it is unlikely that CPM is the driving pain modulatory mechanism in our study setup.

      (31) Line 101 - as this was preregistered, should the study design be followed and then reported?

      We have conducted the study adhering to the preregistered study design and now report the results for pressure pain (Supplemental Figure S1). Some of the preregistered analyses (i.e. directly comparing heat and pressure pain) were beyond the scope of the current study and will be reported separately.

      (32) Line 110 - please provide some data on the fitness levels and how this is classified as high/low.

      The FTP (relative to body weight) was used as an estimate of cardiovascular and endurance fitness (Valenzuela et al., 2018). We refrained from classifying the fitness levels dichotomously as low or high since this is a subjective measure in a sample of healthy individuals of diverse fitness levels. Instead, we utilised the FTP as a more nuanced metric for comparison.

      (33) Lines 159-160 - in the context of the difference in intensity between the sessions. But, it is likely that the high-intensity exercise would have posed quite different relative challenge between participants.

      We thank the reviewer for their comment. As described above, we did not obtain direct measurements of VO2max or blood lactate levels making it challenging to determine the precise intensity domain in which each participant was operating in the current study. However, all participants received the same instructions to the BORG rating scale ensuring the comparability of RPE across participants to a certain extent.

      (34) Figure 3C - what instructions and familiarisation were given to participants regarding the 6-20 Borg scale? In Figure 3C it looks as though several participants rated the low exercise intensity at 6. This would/should be equivalent to sitting quietly, so it looks as though at least several participants did not understand how to use the RPE - please discuss.

      Indeed, three participants rated the LI exercise condition at 6 due to an error in the translation of the scale instruction. Participants were instructed that the lower anchor point of the scale (6) referred to ‘extremely light’ instead of ‘no exertion’. Thus, we have rescaled the RPE ratings where a rating of 6 now corresponds to a 7 (‘extremely light’) on the BORG scale and again calculated the paired t-test. There is still a significant difference in the RPE between exercise intensities (t(38) = 19.65, P < 2.2e-16, d = 3.69; Author response image 6). We have corrected this in the manuscript accordingly and updated Figure 3C.

      Author response image 6.

      Raincloud plot of rating of perceived exertion (RPE) on the BORG scale during low (green) and high (purple) intensity exercise. Individual data points depict subject-specific averages across blocks. A rating of 6 reflects ‘no exertion’ and 20 reflects ‘maximal exertion’.
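
      As an illustration, the rescaling and the paired comparison described above can be sketched as follows. This is a minimal sketch with hypothetical ratings for five participants, not the analysis code or the data of the study; the helper names `rescale_rpe` and `paired_t` are assumptions introduced only for this example.

```python
import math
import statistics


def rescale_rpe(rating):
    """Map a rating of 6 to 7 ('extremely light'); leave other ratings unchanged."""
    return 7 if rating == 6 else rating


def paired_t(x, y):
    """Return (t, df) for a paired t-test on the differences y - x."""
    diffs = [b - a for a, b in zip(x, y)]
    n = len(diffs)
    mean_d = statistics.mean(diffs)
    sd_d = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    t = mean_d / (sd_d / math.sqrt(n))
    return t, n - 1


# Hypothetical RPE ratings (6-20 scale), one value per participant
low = [6, 8, 9, 6, 10]
high = [14, 15, 17, 13, 16]

low_rescaled = [rescale_rpe(r) for r in low]
t, df = paired_t(low_rescaled, high)
print(f"t({df}) = {t:.2f}")
```

      Only the ratings of 6 are shifted, so the comparison between intensities is affected solely for participants who used the lower anchor.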

      (35) Line 171 - is (37, 38) a reference?

      We apologise for this oversight and have now updated the references to be displayed correctly.

      (36) Line 176-18 - is this interaction sufficiently powered? Differences between sexes are not mentioned in the pre-registered study

      We have conducted an additional post-hoc power analysis for the interaction of drug, fitness level, and sex on differential heat pain ratings. We employed the power analysis for mixed models implemented in R (powerCurve) with 1000 simulations. This revealed that, at a power of 0.8 (1 − β), a sample size of n = 27 would have been sufficient to detect this effect (Author response image 7). Despite not having preregistered the factor ‘sex’, we believe that the observed results provide valuable insights that contribute to a deeper understanding of the data. We have labelled these analyses as exploratory, emphasising the need for caution in their interpretation. However, we feel it is essential to report these findings to inform future studies, ensuring that such factors are adequately considered.

      Author response image 7.

      Post-hoc power analysis for behavioural effects from the linear mixed effects (LMER) model with interaction drug, fitness level, and sex using powerCurve in R with a target power of 0.8 and 1000 simulations.
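
      The logic of a simulation-based power curve can be sketched in a few lines. The sketch below is a deliberate simplification and not the mixed-model analysis reported above: it assumes a one-sample test on standardised difference scores, a hypothetical effect size of 0.5, and a normal approximation for the critical value. The function name `simulated_power` and all numbers are assumptions for illustration.

```python
import math
import random
import statistics


def simulated_power(n, effect_size, n_sims=500, alpha=0.05, seed=1):
    """Fraction of simulated samples of size n in which the effect is detected."""
    rng = random.Random(seed)
    # Normal approximation to the critical value; slightly liberal for small n
    z_crit = statistics.NormalDist().inv_cdf(1 - alpha / 2)
    hits = 0
    for _ in range(n_sims):
        sample = [rng.gauss(effect_size, 1.0) for _ in range(n)]
        t = statistics.mean(sample) / (statistics.stdev(sample) / math.sqrt(n))
        if abs(t) > z_crit:
            hits += 1
    return hits / n_sims


# Power as a function of sample size for a hypothetical effect size of 0.5
power_curve = {n: simulated_power(n, effect_size=0.5) for n in (10, 20, 30, 40)}
for n, power in power_curve.items():
    print(n, power)
```

      The smallest sample size whose simulated power reaches the target (for example 0.8) is then read off the curve.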

      (37) Line 227 - this is not what this analysis shows. The comparison is low vs high-intensity exercise on pain modulation, not exercise vs. no exercise. You cannot conclude that aerobic exercise has no effect on pain modulation because you did not do that comparison (i.e. no baseline (without exercise) for pain).

      We agree with the reviewer and have rephrased the sub-headline accordingly to reflect that there is no difference in exercise-induced hypoalgesia between HI and LI aerobic exercise.

      (38) Methods General - why was a control condition not used, or at least a baseline pain response, so that low/high-intensity exercise could be compared to a baseline? Given this, I'm not sure I agree with the study conclusions (abstract: 'These results indicate that aerobic exercise has no overall effect on pain in a mixed population sample') because you have compared high vs low-intensity exercise, not exercise vs. no exercise.

      As for the lack of a resting control condition, we acknowledge that our study was not designed to test the overall effect of exercise versus no exercise. However, our primary objective was to compare different exercise intensities, hypothesising that low-intensity (LI) exercise would induce less pain modulation as compared to high-intensity (HI) exercise. By exploring this, we aimed to enhance understanding of the dose-response relationship between exercise and pain modulation. To better reflect this focus, we have revised the misleading phrasing regarding the ‘overall’ effect of exercise to clearly emphasize our primary aim: comparing HI and LI exercise. The reviewer suggests an interesting interpretation of the data, namely that exercise-induced hypoalgesia might have occurred for both exercise intensities, since the pain ratings provided were lower than the anticipated intensities as determined by the calibration. The fact that this difference is lower in the naloxone (NLX) condition could provide evidence of opioidergic mechanisms underlying this effect.

      Unfortunately, the current study is not designed to comprehensively answer this question since there was no resting control condition. In particular, the lower pain ratings under SAL (Figure 6) could be due to exercise triggering the descending pain modulatory system (DPMS), but equally due to the default activation of the DPMS. Only an additional “no exercise” condition could disentangle this. Furthermore, habituation to noxious stimuli can influence pain ratings, resulting in lower pain ratings during the experiment as compared to the calibration.

      (39) Line 285 - or that better-trained individuals have a greater EIH response to higher intensity exercise, but both those of low and high fitness have established EIH after low intensity exercise. Given there isn't a 'no exercise' baseline, it is hard to make conclusions about EIH effect generally, only comparisons between high/low exercise intensity.

      We thank the reviewer for their comment. We agree that we cannot establish whether all participants showed a hypoalgesic response to the LI exercise with the current study design. However, our results show that participants with higher fitness levels showed increased hypoalgesia after HI exercise compared to those with lower fitness levels. We have refined the sentence accordingly.

      (40) Figure 7A - the regression line here is not that convincing.

      We acknowledge the reviewer’s concern regarding the regression line. However, it is important to note that the significant main effect of fitness level on differences in pain ratings in the SAL condition (β = 6.45, CI [1.25, 11.65], SE = 2.56, t(38) = 2.52, P = 0.02) supports the assertion that higher fitness levels are associated with greater hypoalgesia following HI exercise compared to LI exercise. While the trend may not be visible for all data points, the statistical analysis provides a robust basis for the observed relationship (r = 0.33, P = 0.038).
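
      For readers who wish to see how the reported quantities relate to one another, the arithmetic can be sketched as follows. The values are taken from the paragraph above; the sample size n = 39 used for the correlation is an assumption inferred from the degrees of freedom reported elsewhere in this response, and the conversion from r to t is the standard formula for a Pearson correlation, not a re-analysis of the data.

```python
import math

# Values reported in the text
beta, se = 6.45, 2.56
ci_low, ci_high = 1.25, 11.65

# t statistic of a regression coefficient: estimate divided by its standard error
t_beta = beta / se

# Critical value implied by the confidence interval half-width
implied_crit = (ci_high - ci_low) / (2 * se)

# t statistic corresponding to a Pearson correlation r with n observations
r, n = 0.33, 39  # n = 39 is an assumption, not stated for this analysis
t_r = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

print(round(t_beta, 2), round(implied_crit, 2), round(t_r, 2))
```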

      (41) Line 354 - the NLX infusion was double-blind, but what are the implications of participants knowing that they completed high/low-intensity exercise - this cannot be blinded.

      The reviewer is correct that the exercise intensities cannot be blinded. To account for potential expectation effects of exercise on several psychological and physiological domains (including pain), participants completed a questionnaire on the calibration day where they indicated the extent to which they expected acute exercise to affect several domains (Lindheimer et al., 2019). They could rate each domain on a Likert scale ranging from ‘large decrease’ (-3) to ‘large increase’ (3) with 0 denoting ‘no effect’. This format was chosen to allow measuring the direction and magnitude of expectation effects and to avoid being directive or suggestive (Lindheimer et al., 2019). Despite including other psychological and physiological domains in the questionnaire (i.e., stress, anxiety, energy, memory) we focused on the specific pain domains (muscle pain, joint pain, and whole-body pain) to establish participants’ expectations regarding the effect of acute exercise on pain. We tested whether the expectation ratings for each pain type were significantly different from 0 (no effect) using a one-sample t-test.

      There was no significant effect for muscle pain (t(38) = 1.78, P = 0.08, M = 0.39, SE = 0.12), joint pain (t(38) = -0.12, P = 0.90, M = -0.03, SE = 0.11), or whole-body pain (t(38) = -1.05, P = 0.30, M = -0.21, SE = 0.12), suggesting no expectation effect on these pain domains in the overall sample (Supplemental Figure S10A). Since there is variation in the data, we calculated the correlation of the expectation ratings in the different pain domains with the difference score between the pain ratings in the SAL condition (LI – HI rating; Supplemental Figure S10B). This analysis yielded no significant correlation in any of the pain domains (joint pain: r = 0.11, P = 0.49; muscle pain: r = -0.07, P = 0.68; whole-body pain: r = 0.07, P = 0.68).

      Moreover, given that we have not been able to show a difference between the exercise intensities on pain modulation, expectation effects are likely not to contribute to this null effect.

      (42) Line 356-358 - and this comparison (and primary hypothesis) is not blinded.

      While we agree with the reviewer that this comparison is not, and potentially cannot be, blinded, we would like to reiterate our results from the previous paragraph that indicate that such expectation effects of exercise on pain were not present in the sample and, thus, did not seem to have influenced the results. It is noteworthy that the double-blind design of our study specifically pertains to the pharmacological intervention employed.

      (43) Line 358-360 - this could be explained by both types of exercise inducing EIH via the same mechanism (which is disrupted by NLX).

      We thank the reviewer for their comment and would like to refer back to the reviewer's comment number 38 for a response to this.

      (44) Line 360-361 - this conclusion cannot be drawn, because you have only compared high vs low intensity exercise. So, the conclusion should be 'These results suggest that there is no difference between high and low aerobic exercise intensity on heat-induced pain'.

      We agree with the reviewer and have rephrased the sentence to reflect the claim accurately.

      (45) Line 396 - as previously discussed, this conclusion cannot be drawn through this study design.

      We agree with the reviewer and have rephrased the sub-headline accordingly to reflect that there is no difference in exercise-induced hypoalgesia between HI and LI aerobic exercise.

      (46) Line 399 - please expand on this point - it is critical to the hypothesis and should also be included in the introduction. What intensities/duration/dose of aerobic exercise is generally established to cause EIH?

      We thank the reviewer and agree that this is a crucial aspect that requires further specification. Below we have expanded on the duration/intensities shown to elicit exercise-induced hypoalgesia and included a concise version of this detailed paragraph in the manuscript introduction.

      For aerobic exercise, different methods have been employed to determine exercise intensity levels i.e., through the VO2max, age-predicted HRmax, or incremental intensities (Koltyn, 2002). Most studies using VO2max as a measure of exercise intensity (Koltyn et al., 1996; Micalos & Arendt-Nielsen, 2016; Vaegter et al., 2014) were able to induce hypoalgesia with HI levels ranging between 65%-75% VO2max. When using the HRmax as a measure of determining exercise intensities, HI exercise at 70%-75% of the HRmax has been shown to produce greater hypoalgesia compared to moderate intensity at 50% HRmax (Naugle et al., 2014; Vaegter et al., 2014). Furthermore, previous research has suggested that HI exercise produces greater hypoalgesia compared to LI exercise (60-70% HRmax vs. light activity: M. D. Jones et al., 2019; 70% vs. 50% HRmax: Naugle et al., 2014; 75% vs. 50% VO2max: Vaegter et al., 2014).

      Furthermore, different durations can be regarded as suitable, with 8 minutes to 2 hours of aerobic exercise having been shown to induce hypoalgesia (for review see Koltyn, 2002). Hoffman et al. (2004) showed a hypoalgesic response after 30 minutes but not after 10 minutes of cycling at 75% VO2max. In contrast, other studies were able to induce hypoalgesia at 10-15 minutes of HI aerobic exercise (75% VO2max: Gomolka et al., 2019; 63% VO2max: Gurevich et al., 1994; self-paced: Haier et al., 1981; 60-70% HRmax: Jones et al., 2019; 85% HRmax: Sternberg et al., 2001; 75% VO2max: Vaegter et al., 2015).

      (47) Line 400-401 - please define high intensity.

      We thank the reviewer for their comment. The referenced studies by Vaegter et al. (2014) and Jones et al. (2019) based the estimation of HI and LI exercise on an age-related target heart rate corresponding to VO2max and HRmax, respectively. In Vaegter et al. (2014), the HI condition corresponded to 75% VO2max, while the LI to 50% VO2max. In Jones et al. (2019), the HI exercise condition corresponded to 60% and 70% of HRmax, while the LI condition was defined as pedalling slowly against a light resistance of 0.5 kg of force to maintain a rating of perceived exertion (RPE) not above resting. We have included this clarification in the relevant section to elucidate the intensities of the chosen exercise conditions.

      (48) Line 403-405 - I'm not sure I follow (perhaps I have misunderstood) - pain induction was completed after exercise in the MRI scanner, so there was no distraction effect of exercise in either condition. A baseline could have been established in the same way and there would be exactly the same conditions, just without prior exercise.

      We agree with the reviewer that a resting baseline condition in the context of exercise-induced pain modulation allows for the investigation of a potential hypoalgesic effect of exercise compared to no exercise. Nevertheless, it is important to note that previous studies (Brooks et al., 2017; Sprenger et al., 2012) have shown that cognitive pain modulation is mediated by endogenous opioids. Therefore, tasks with different attentional loads potentially influence post-task pain ratings. Although we agree with the reviewer that the effect of distraction or attentional load would be minimal in the MR scanner, there still could be an effect of different cognitive loads from exercise vs. no exercise. Nevertheless, we focus the discussion on investigating the dose-response relationship between different exercise intensities where an ‘active’ control condition might contribute to a more nuanced understanding of exercise-induced pain modulation.

      (49) Line 403-411 - this is fine (although I do not agree that this was the best methodological decision), however, it does limit the conclusions that can be drawn (as previously mentioned). That is, you cannot conclude that no EIH occurred, only that there was no difference between low and high-intensity exercise in post-exercise pain response.

      We agree with the reviewer that the comparison of HI vs. LI exercise does not allow for an interpretation of the overall effect of exercise as opposed to no exercise on pain modulation. The comparison of HI and LI exercise allows the investigation of a dose-response relationship of these distinct exercise intensities. While LI exercise might not be a 'pure' control condition in the traditional sense, it is valuable for exploring the complexities of exercise and pain interaction.

      (50) Line 419-422 - sorry I do not follow - you say that moderate intensity exercise most reliably induces EIH but then select exercise intensities that are likely to be in the heavy or severe intensity domain? Please also include in this discussion the limitations of FTP20 as a threshold marker (see Wong et al) and the implications on the results/conclusions.

      We thank the reviewer for their comment. In the referenced sentence, we have defined the HI exercise as described in the reviews. Specifically, Wewege and Jones (2020) reported hypoalgesia to be greater after higher-intensity exercise, although the intensity was not further specified. Naugle et al. (2012) noted that HI exercise (i.e., 75% of VO2max) produced greater hypoalgesia, while Koltyn (2002) indicated that hypoalgesia occurs at intensities ranging from 60% to 75% of VO2max but more reliably at 75% VO2max or higher. Consequently, we have removed the term ‘moderate’, as it does not accurately reflect what has been reported in the reviews and could be misleading. Moreover, we have clarified the specific criteria for what is considered high (or higher) intensity exercise in the referenced reviews.

      We kindly ask the reviewers to refer back to the previous comment (reviewer comment number 28) regarding the discussion of the intensity domains and the FTP20 test as demarcation point for these intensity domains.

      (51) Line 422-425 - indeed, pacing is an important element of this test, which inexperienced cyclists have difficulty with when they are not provided with proper familiarisation.

      We agree with the reviewer that the FTP20 test has mainly been validated and employed in experienced cyclists and requires further validation in non-athletes of both sexes. However, since we have used an extensive warm-up period and several paced steps (intervals, 5-minute time-trial) as well as recovery periods (Supplemental Table S1) based on McGrath et al. (2019), we propose that participants were thoroughly familiarised with the elements of pacing before the 20-minute FTP estimation took place. On average, participants showed a variation of M = 21.80 Watts (SE = 1.44 Watts) during the 20-minute paced FTP20 test (Supplemental Figure S11A). Interestingly, our data suggest that participants with a higher FTP showed higher variation of power output (Watts) during the 20-minute FTP test compared to individuals with lower fitness levels (Supplemental Figure S11B).

      (52) Line 425-427 - please remove this, the RPE difference between exercise bouts is not evidence that participants cycled at FTP.

      We thank the reviewer for their comment. However, we would propose to include the rating of perceived exertion (RPE) since it shows that the exercise intensities have been perceived as significantly different by the participants. This behavioural measure of exertion is potentially important for a broader audience to understand the exercise implementation beyond physiological markers.

      (53) Line 432 - high vs. low-intensity aerobic exercise

      We have changed the sentence accordingly to support the claim of the study that there was no difference in exercise-induced pain modulation between HI and LI aerobic exercise.

      (54) Line 447-449 - this seems contradictory to the first line of this paragraph (430-432) - i.e. that the heterogenous sample may have caused the null finding. Why deliberately select a participant sample that is likely to lead to a null effect?

      In the current study, we aimed to include participants of diverse fitness levels and both sexes to verify the findings on exercise-induced pain modulation in a broader population. We consider this important concerning translational aspects of EIH. Indeed, our heterogeneous sample may have ‘caused’ the observed null effect, but at the same time, it suggests that more homogenous (sometimes composed solely of male athletes) samples employed in many earlier studies might have skewed the understanding of exercise-induced pain modulation and thus unintentionally suggested a (non-existing) generalisation of this effect to the general population.

      (55) Line 532-456 - although Koltyn found electrical pain to have the greatest effect?

      The review by Naugle et al. (2012) reported effect sizes for heat (Cohen's d = 0.59) and pressure pain intensity (d = 0.69) following aerobic exercise but did not provide effect sizes for electrical pain intensity. They noted that the effect size for electrical pain intensity after isometric exercise was d = 0.40, which is lower than that for heat and pressure pain. While Koltyn (2002) stated that electrical and pressure stimuli induce exercise-induced hypoalgesia more consistently than thermal pain, the review did not clarify whether this applies to pain threshold, intensity, or tolerance, nor did it provide effect sizes. Given that electrical, pressure, and heat pain are the most commonly used methods to induce quantifiable pain in the context of exercise studies (Vaegter and Jones, 2020), we based our decision to use heat and pressure pain primarily on Naugle et al.'s findings.

      (56) Line 468-469 - why leave out content that was pre-registered (i.e. difference between pressure and heat pain) but includes analysis that wasn't (i.e. sex differences)? If a study is going to be pre-registered, then isn't it important to follow that design?

      We thank the reviewer for this comment. We have conducted the study adhering to the preregistered study design and now report the results for pressure pain (Supplemental Figure S1). Some of the preregistered analyses (i.e. directly comparing heat and pressure pain) were beyond the scope of the current study and will be reported separately.

      (57) Line 532-525 - and how could this have been accounted for?

      We apologise for any confusion, as we are unsure about the specific reference the reviewer is making based on the provided line numbers. We believe the question relates to how the potential effects of endocannabinoids were considered in the current study design, and we've addressed that in our response. In human studies, it is not possible to centrally block endocannabinoids, which makes it difficult to directly estimate their role in exercise-induced pain modulation in humans. Measuring endocannabinoids in the blood might not adequately capture changes in endocannabinoid levels in the brain throughout the different exercise intensity conditions. Despite these limitations, exploring the role of endocannabinoids in exercise-induced pain modulation presents a promising avenue for future research that could enhance our understanding of pain mechanisms and improve pain management strategies.

      (58) Limitations General - please include the other limitations discussed in this review.

      Done.

      (59) Line 530 - please amend this conclusion, in line with previous comments.

      Done.

      We would like to thank the reviewer for critically evaluating the manuscript and providing insightful comments. We appreciate the reviewer recognising the strengths of our work and believe that their suggestions will contribute to improving the quality of the manuscript.

    1. Author response:

      The following is the authors’ response to the current reviews.

      Reviewer #1 (Public review): 

      Devakinandan et al. present a revised version of their manuscript. Their scRNA-seq data is a valuable resource to the community, and they further validate their findings via in situ hybridizations and electron microscopy. Overall, they have addressed my major concerns. I only have two minor comments. 

      (1) The authors note in Figure 4I, and K that because the number of C2 V2Rs or H2-Mv receptors increased while the normalized expression of Gnao1 remained constant (and likewise for V1Rs and Gnai2 in Figure 4-S4C) that their results are unlikely to be capturing doublets. I'm not sure that this is the case. If the authors added together two V2R cells the total count of every gene might double, but the normalized expression of Gnao1 would remain the same. To address this concern, the authors should also show the raw counts for Gnao1 as well as the total number of UMIs for these cells. 

      In Figure 4I, 4K and Figure 4-Figure supplement 4C, we plotted on the Y-axis the sum of normalized counts of all V1R/V2R/H2-Mv genes expressed in each cell along with the normalized expression value of Gnao1/Gnai2. Both VR/H2-Mv and Gnao1/Gnai2 are normalized values, with normalization based on LogNormalize (mentioned in methods). We show here plots of total expression calculated from raw counts corresponding to the same figures. Raw counts of VRs/H2-Mv and Gnao1/Gnai2 are plotted separately due to the difference in scale. The overall trend matches the normalized counts, with minor fluctuations in Gnao1/Gnai2.

      Author response image 1.
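
      The reviewer's point about normalization can be illustrated with a small sketch. It assumes the commonly used LogNormalize definition, ln(1 + count / total counts × scale factor), with a scale factor of 10,000; the scale factor, the gene label `receptor_x`, and all counts are hypothetical and are not taken from the dataset.

```python
import math


def log_normalize(counts, scale_factor=10_000):
    """Per-cell log-normalization: ln(1 + count / total * scale_factor)."""
    total = sum(counts.values())
    return {gene: math.log1p(c / total * scale_factor) for gene, c in counts.items()}


# Hypothetical raw UMI counts for a single cell
cell = {"Gnao1": 40, "receptor_x": 25, "other": 935}

# Idealised doublet of two identical cells: every raw count doubles
doublet = {gene: 2 * c for gene, c in cell.items()}

norm_single = log_normalize(cell)
norm_doublet = log_normalize(doublet)

for gene in cell:
    print(gene, round(norm_single[gene], 3), round(norm_doublet[gene], 3))
print("total UMIs:", sum(cell.values()), "vs", sum(doublet.values()))
```

      In this idealised case the normalized values are identical for the single cell and the doublet, whereas the raw counts and the total number of UMIs double, which is why raw counts are informative here.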

      As mentioned in our response to version-1 reviews and in our manuscript, doublets generally are a random combination of two cells, and the probability that a combinatorial pattern is due to a doublet is proportional to the abundance of cells expressing those genes. It is possible that some of the family-C V2R combinations represented by 2 cells are doublets because of their widespread expression. The frequency of combinatorial expression patterns, greater than a set threshold of 2 cells, that we observed for family ABD V2Rs or V1Rs (supplementary tables 7, 8) indicates co-expression and is unlikely to result from random doublets. For instance, 134 cells express two V1Rs, of which 44 cells express Vmn1r85+Vmn1r86, 21 cells express Vmn1r184+Vmn1r185, 13 express Vmn1r56+Vmn1r57, and 6 express Vmn1r168+Vmn1r177. Some of the co-expression combinations we reported were also identified and verified experimentally in Lee et al., 2019 and Hills et al., 2024.

      The co-expression of multiple family-C2 V2Rs (Vmn2r2-Vmn2r7) along with ABD V2Rs per cell as shown in our data, has been shown experimentally in earlier studies.      
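
      The abundance argument above can be made concrete with a rough calculation. The sketch assumes that doublets pair cells at random, so that the chance of a doublet combining a cell of type A with a cell of type B is 2 × f_A × f_B; the cell number, doublet rate, and frequencies are hypothetical values chosen for illustration, not estimates from the dataset.

```python
def expected_random_doublets(n_cells, doublet_rate, frac_a, frac_b):
    """Expected number of random doublets that pair a type-A with a type-B cell."""
    n_doublets = n_cells * doublet_rate
    # Two orderings (A then B, B then A) under random pairing
    return n_doublets * 2 * frac_a * frac_b


# Two rare receptor populations (0.5% of cells each)
rare = expected_random_doublets(10_000, 0.05, 0.005, 0.005)

# Two widely expressed populations (30% of cells each)
common = expected_random_doublets(10_000, 0.05, 0.30, 0.30)

print(rare, common)
```

      Under these assumptions, a combination of two rare receptors observed in many cells is far more frequent than random doublets would predict, while combinations of widely expressed genes seen in only a few cells remain compatible with doublets.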

      (2) As requested, the authors have now added a colorbar to the pseudocolored images in Figure 7. However, this colorbar still doesn't have any units. Can the authors add some units, or clarify in the methods how the raw data relates to the colors (e.g. is it mapped linearly, on a log scale, with gamma or other adjustments, etc.)? Moreover, it's also unclear what the dots in the backgrounds of plots like Figure 7E mean. Are they pixels? Showing the individual lines, the average for each animal, or omitting them entirely, might make more sense.

      We used the Fire LUT with a linear scale in Fiji/ImageJ to assign the scale to the pseudo-colored images in Figure 7. We will include this description in our methods and thank the reviewer for pointing it out. The dots in the background are described in the Figure 7 legend as fluorescence intensity values normalized to a 0-1 scale and color coded for each antibody. The trendline was fitted on these values.

      Reviewer #2 (Public review): 

      Summary: 

      The study focuses on the vomeronasal organ, the peripheral chemosensory organ of the accessory olfactory system, by employing single-cell transcriptomics. The authors analyzed the mouse vomeronasal organ, identifying diverse cell types through their unique gene expression patterns. Developmental gene expression analysis revealed that two classes of sensory neurons diverge in their maturation from common progenitors, marked by specific transient and persistent transcription factors. A comparative study between major neuronal subtypes, which differ in their G-protein sensory receptor families and G-protein subunits (Gnai2 and Gnao1, respectively), highlighted a higher expression of endoplasmic reticulum (ER) associated genes in Gnao1 neurons. Moreover, distinct differences in ER content and ultrastructure suggest some intriguing roles of ER in Gnao1-positive vomeronasal neurons. This work is likely to provide useful data for the community and is conceptually novel with the unique role of ER in a subset of vomeronasal neurons. This reviewer has some minor concerns and some suggestions to improve the manuscript.

      Strengths: 

      (1) The study identified diverse cell types based on unique gene expression patterns, using single-cell transcriptomics.

      (2) The analysis suggests that two classes of sensory neurons diverge during maturation from common progenitors, characterized by specific transient and persistent transcription factors.

      (3) A comparative study highlighted differences in Gnai2- and Gnao1-positive sensory neurons. 

      (4) Higher expression of endoplasmic reticulum (ER) associated genes in Gnao1 neurons. 

      (5) Distinct differences in ER content and ultrastructure suggest unique roles of ER in Gnao1-positive vomeronasal neurons. 

      (6) The research provides a conceptually novel view of the unique role of ER in a subset of vomeronasal neurons, offering valuable insights to the community.

      Reviewer #3 (Public review): 

      Summary: 

      In this manuscript, Devakinandan and colleagues have undertaken a thorough characterization of the cell types of the mouse vomeronasal organ, focusing on the vomeronasal sensory neurons (VSNs). VSNs are known to arise from a common pool of progenitors that differentiate into two distinct populations characterized by the expression of either the G protein subunit Gnao1 or Gnai2. Using single-cell RNA sequencing followed by unsupervised clustering of the transcriptome data, the authors identified three Gnai2+ VSN subtypes and a single Gnao1+ VSN type. To study VSN developmental trajectories, Devakinandan and colleagues took advantage of the constant renewal of the neuronal VSN pool, which allowed them to harvest all maturation states. All neurons were re-clustered and a pseudotime analysis was performed. The analysis revealed the emergence of two pools of Gap43+ clusters from a common lineage, which differentiate into many subclusters of mature Gnao1+ and Gnai2+ VSNs. By comparing the transcriptomes of these two pools of immature VSNs, the authors identified a number of differentially expressed transcription factors in addition to known markers. Next, by comparing the transcriptomes of mature Gnao1+ and Gnai2+ VSNs, the authors report an enrichment of ER-related genes in Gnao1+ VSNs. Using electron microscopy, they found that this enrichment was associated with specific ER morphology in Gnao1+ neurons. Finally, the authors characterized chemosensory receptor expression and co-expression (as well as H2-Mv proteins) in mature VSNs, which recapitulated known patterns. 

      Strengths: 

      The data presented here provide new and interesting perspectives on the distinguishing features between Gnao1+ and Gnai2+ VSNs. These features include newly identified markers, such as transcription factors, as well as an unsuspected ER-related peculiarity in Gnao1+ neurons, consisting in a hypertrophic ER and an enrichment in ER-related genes. In addition, the authors provide a comprehensive picture of specific co-expression patterns of V2R chemoreceptors and H2-Mv genes. 

      Importantly, the authors provide a browser (scVNOexplorer) for anyone to explore the data, including gene expression and co-expression, number and proportion of cells, with a variety of graphical tools (violin plots, feature plots, dot plots, ...). 


      The following is the authors’ response to the original reviews.

      Public Reviews: 

      Reviewer #1 (Public Review): 

      Devakinandan and colleagues present a manuscript analyzing single-cell RNA-sequencing data from the mouse vomeronasal organ. The main advances in this manuscript are to identify and verify the differential expression of genes that distinguish apical and basal vomeronasal neurons. The authors also identify the enriched expression of ER-related genes in Gnao1 neurons, which they verify with in situ hybridizations and immunostaining, and also explore via electron microscopy. Finally, the results of this manuscript are presented in an online R shiny app. Overall, these data are a useful resource to the community. I have a few concerns about the manuscript, which I've listed below.

      General Concerns: 

      (1) The authors mention that they were unable to identify the cells in cluster 13. This cluster looks similar to the "secretory VSN" subtype described in a recent preprint from C. Ron Yu's lab (10.1101/2024.02.22.581574). The authors could try comparing or integrating their data with this dataset (or that in Katreddi et al. 2022) to see if this is a common cell type across datasets (or arises from a specific type of cell doublets). In situ hybridizations for some of the marker genes for this cluster could also highlight where in the VNO these cells reside. 

      Cluster 13 (Obp2a+) cells identified in our study share gene expression markers with the “putative secretory” cells described in Hills et al. By the time that preprint became publicly available, our manuscript had already been communicated. We have now performed RNA-ISH for Obp2a, the top marker of this cluster, and found it to be expressed in cells from glandular tissue on the non-sensory side. Some of the other markers associated with this cluster, such as Obp2b and Lcn3, belong to the lipocalin family of proteins. Hence, in our estimate, these markers collectively represent non-sensory glandular tissue. We have added Obp2a RNA-ISH to Figure 2-figure supplement-1A and to the results section of our revised manuscript. Cluster 13 also contains cells expressing Vmn1r37, which is typically expressed in neuronal cells. However, we do not see Obp2a mRNA in the sensory epithelium. It is possible that cluster 13 comprises a heterogeneous mixture of cells, some of which are clearly non-sensory cells from glandular tissue, co-clustered with other cell types; it is also possible that Obp2a is expressed in neurons below the detection level of our assay, which will require further experiments. We have no basis for confidently assigning this cluster as a neuronal cell type; hence, we excluded it from the downstream analysis of neurons.

      We used the data from Hills et al. to compare the co-expression characteristics of V2Rs; this comparison is added as Figure 3-figure supplement 3.

      (2) I found the UMAPs for the neurons somewhat difficult to interpret. Unlike Katreddi et al. 2022 or Hills et al. 2024, it's tricky to follow the developmental trajectories of the cells in the UMAP space. Perhaps the authors could try re-embedding the data using gene sets that don't include the receptors? It would also be interesting to see if the neuron clusters still cluster by receptor-type even when the receptors are excluded from the gene sets used for clustering. Plots relating the original clusters to the neuronal clusters, or dot plots showing marker gene expression for the neuronal clusters might both be useful. For example, right now it's difficult to interpret clusters like n8-13. 

      a) We have revised the UMAP in Figure 3A, and labeled mature, immature, progenitor neurons so that it is easier to follow the developmental trajectory. 

      b) In our revised text we have explicitly drawn equivalence between neuronal clusters from Figure 1 to re-clustered neurons in subsequent figures (Figure 3 and 4 in revised submission). For developmental analysis, we merged mature Gnao1, Gnai2 neuronal subclusters to two major clusters that are equivalent to original neuronal clusters in Figure 1. As UMAP is an arbitrary representation of cells, we also show expression of markers for major neuronal cell types in Figure 1C and Figure 3-figure supplement 1B, helpful in making the connection.  

      c) The purpose of re-clustering with higher resolution was to identify sub-populations within Gnao1 and Gnai2 neurons. It was useful for making sense of mature Gnao1 neurons, where family-C Vmn2r and H2-Mv expression maps onto distinct subclusters. Along with the neuronal subclusters in revised Figure 3-figure supplement-1, we include a dot plot of gene expression markers.

      d) In Figure 3-figure supplement-2, we show a comparison of neuronal clusters with and without VRs. Exclusion of VRs did not substantially alter mature neuron dichotomy into Gnao1/Gnai2. Only Gnao1 subclusters n1/n3 whose organization is dependent on family-C Vmn2r expression were affected, as well as redistribution of subcluster n8 from Gnai2 neurons. VR expression does not seem to be the primary determinant of VSN cluster identity.

      Reviewer #2 (Public Review): 

      Summary: 

      The study focuses on the vomeronasal organ, the peripheral chemosensory organ of the accessory olfactory system, by employing single-cell transcriptomics. The author analyzed the mouse vomeronasal organ, identifying diverse cell types through their unique gene expression patterns. Developmental gene expression analysis revealed that two classes of sensory neurons diverge in their maturation from common progenitors, marked by specific transient and persistent transcription factors. A comparative study between major neuronal subtypes, which differ in their G-protein sensory receptor families and G-protein subunits (Gnai2 and Gnao1, respectively), highlighted a higher expression of endoplasmic reticulum (ER) associated genes in Gnao1 neurons. Moreover, distinct differences in ER content and ultrastructure suggest some intriguing roles of ER in Gnao1-positive vomeronasal neurons. This work is likely to provide useful data for the community and is conceptually novel with the unique role of ER in a subset of vomeronasal neurons. This reviewer has some minor concerns and some suggestions to improve the manuscript. 

      Strengths: 

      (1) The study identified diverse cell types based on unique gene expression patterns, using single-cell transcriptomics.

      (2) The analysis suggests that two classes of sensory neurons diverge during maturation from common progenitors, characterized by specific transient and persistent transcription factors. 

      (3) A comparative study highlighted differences in Gnai2- and Gnao1-positive sensory neurons. 

      (4) Higher expression of endoplasmic reticulum (ER) associated genes in Gnao1 neurons. 

      (5) Distinct differences in ER content and ultrastructure suggest unique roles of ER in Gnao1-positive vomeronasal neurons. 

      (6) The research provides a conceptually novel view of the unique role of ER in a subset of vomeronasal neurons, offering valuable insights to the community.

      Weaknesses: 

      (1) The connection between observations from sc RNA-seq and EM is unclear.

      (2) The lack of quantification for the ER phenotype is a concern. 

      We have extensively quantified the ER phenotype as shown in Figure 7, Figure 7-figure supplement-1 in our revised version. We would like to point out that the connection between scRNA-seq and EM was made due to our observations in the same figures, that levels of a number of ER luminal and ER membrane proteins were higher in Gnao1 compared to Gnai2 neurons. This led us to hypothesize a differential ER content or ultrastructure, which was verified by EM.

      Reviewer #3 (Public Review): 

      Summary: 

      In this manuscript, Devakinandan and colleagues have undertaken a thorough characterization of the cell types of the mouse vomeronasal organ, focusing on the vomeronasal sensory neurons (VSNs). VSNs are known to arise from a common pool of progenitors that differentiate into two distinct populations characterized by the expression of either the G protein subunit Gnao1 or Gnai2. Using single-cell RNA sequencing followed by unsupervised clustering of the transcriptome data, the authors identified three Gnai2+ VSN subtypes and a single Gnao1+ VSN type. To study VSN developmental trajectories, Devakinandan and colleagues took advantage of the constant renewal of the neuronal VSN pool, which allowed them to harvest all maturation states. All neurons were re-clustered and a pseudotime analysis was performed. The analysis revealed the emergence of two pools of Gap43+ clusters from a common lineage, which differentiate into many subclusters of mature Gnao1+ and Gnai2+ VSNs. By comparing the transcriptomes of these two pools of immature VSNs, the authors identified a number of differentially expressed transcription factors in addition to known markers. Next, by comparing the transcriptomes of mature Gnao1+ and Gnai2+ VSNs, the authors report the enrichment of ER-related genes in Gnao1+ VSNs. Using electron microscopy, they found that this enrichment was associated with specific ER morphology in Gnao1+ neurons. Finally, the authors characterized chemosensory receptor expression and coexpression (as well as H2-Mv proteins) in mature VSNs, which recapitulated known patterns. 

      Strengths: 

      The data presented here provide new and interesting perspectives on the distinguishing features between Gnao1+ and Gnai2+ VSNs. These features include newly identified markers, such as transcription factors, as well as an unsuspected ER-related peculiarity in Gnao1+ neurons, consisting of a hypertrophic ER and an enrichment in ER-related genes. In addition, the authors provide a comprehensive picture of specific co-expression patterns of V2R chemoreceptors and H2-Mv genes. 

      Importantly, the authors provide a browser (scVNOexplorer) for anyone to explore the data, including gene expression and co-expression, number and proportion of cells, with a variety of graphical tools (violin plots, feature plots, dot plots, ...). 

      Weaknesses: 

      The study still requires refined analyses of the data and rigorous quantification to support the main claims. 

      The method description for filtering and clustering single-cell RNA-sequencing data is incomplete. The Seurat package has many available pipelines for single-cell RNA-seq analysis, with a significant impact on the output data. How did the authors pre-process and normalize the data? Was the pipeline used with default settings? What batch correction method was applied to the data to mitigate possible sampling or technical effects? Moreover, the authors do not describe how cell and gene filtering was performed. The data in Figure 7-Supplement 3 show that one-sixth of the V1Rs do not express any chemoreceptor, while over a hundred cells express more than one chemoreceptor. Do these cells have unusually high or low numbers of genes or counts? To exclude the possibility of a technical artifact in these observations, the authors should describe how they dealt with putative doublet cells or debris. Surprisingly, some clusters are characterized by the expression of specific chemoreceptors (VRs). Have these been used for clustering? If so, clustering should be repeated after excluding these receptors. 

      The identification of the VSN types should be consistent across the different analyses and validated. The data presented in Figure 1 lists four mature VSN types, whereas the re-clustering of neurons presented in Figure 3 leads to a different subdivision. At present, it remains unclear whether these clusters reflect the biology of the system or are due to over-clustering of the data, and therefore correspond to either noise or arbitrary splitting of continua. Clusters should be merged if they do not correspond to discrete categories of cells, and correspondence should be established between the different clustering analyses. To validate the detected clusters as cell types, markers characteristic of each of these populations can be evaluated by ISH or IHC. 

      There is a lack of quantification of imaging data, which provides little support for the ER-related main claim. Quantification of co-expression and statistics on labeling intensity or coverage would greatly strengthen the conclusions and the title of the paper.

      a) scRNA-seq data analysis methods: Our revised submission has expanded the methods section with details of the parameters, filtering criteria and software used.

      b) Inclusion/exclusion of VRs: Figure 3-Figure supplement-2 of our revised submission shows a comparison of neuronal sub-clusters with and without VRs. Overall sub-cluster identities were not affected by VR exclusion, except for Gnao1 sub-clusters n1/n3, which are governed by family-C Vmn2r1/Vmn2r2, and the redistribution of Gnai2 cluster n8. The minimal effect of VRs on Gnai2 sub-clustering is also confirmed by the lack of V1Rs in the dot plot showing markers of neuronal clusters.

      c) Neuronal clusters and potential over-clustering: we pooled neuronal cells from Figure 1 and re-clustered them to identify sub-populations within Gnao1 and Gnai2 neurons. Several of the neuronal sub-clusters we identified, including progenitors, immature neurons and mature neurons, are validated by previous studies with well-known markers. Amongst the mature neurons, the biological basis of the four Gnao1 neuron sub-clusters (n1-n4) is discussed in our co-expression section (Figure 4A-E), and these are also validated by previous experimental studies. These Gnao1 clusters are organized according to the expression of family-C V2Rs (Vmn2r1 or Vmn2r2) as well as H2-Mv genes. Within the Gnai2 sub-clusters, n12 and n13 exclusively express markers that distinguish them from n8-n11, which we have described in our revised version. However, n8-n11 do not have definitive markers, and determining whether these sub-clusters are part of a continuum or are over-clustered will require further extensive experiments and analysis. We prefer to show all sub-clusters, including the Gnai2 sub-clusters, in Figure 3-Figure supplement-1, along with a dot plot of sub-cluster gene expression, so that these data are available for future experiments and analysis. We share the concern that some Gnai2 sub-clusters may not have an obvious biological basis at this time. Hence, in our revised submission, we have merged the mature Gnao1 and mature Gnai2 sub-clusters for the developmental analysis shown in Figure 3A.

      d) Quantification of the ER phenotype: In our revised submission, we provide extensive quantification of the ER phenotype in Figure 7 and Figure 7-figure supplement-1.

      e) We think that the cells expressing zero as well as two V1Rs are real and cannot be attributed to debris or doublets for the following reasons:

      i) Cells expressing no V1Rs are not necessarily debris, because they express other neuronal markers at the same level as cells that express one or two V1Rs. For instance, the Gnai2 expression level is the same across cells expressing 0, 1 or 2 V1Rs, which we have included in Figure 4-figure supplement 4-C of our revised submission. The higher expression threshold used in our analysis may have somewhat increased the proportion of cells with zero V1Rs. Similarly, Gnao1 levels stay the same across cells expressing multiple V2Rs and H2-Mv genes per cell, indicating that these are unlikely to be doublets (Figure 4 I-K). The frequency of each co-expression combination (Supplementary Tables 7 and 8) is itself an indication of whether the combination reflects real single cells or an artifact.

      ii) Cells co-expressing V1R genes: We listed the frequency of cells co-expressing V1R gene combinations in Supplementary Table 8. Among 134 cells that express two V1Rs, 44 cells express Vmn1r85+Vmn1r86, 21 express Vmn1r184+Vmn1r185, 13 express Vmn1r56+Vmn1r57, 6 express Vmn1r168+Vmn1r177, and so on. Doublets are generally a random combination of two cells. Here, each specific co-expression combination is represented by multiple cells, which is highly unlikely to arise by random chance. Some of the co-expression combinations we reported were also identified and verified experimentally in Lee et al., 2019 and Hills et al., 2024.
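
      The reasoning in point ii) can be made concrete with a back-of-the-envelope calculation. The sketch below is only an illustration under strong, hypothetical assumptions that are not taken from the manuscript: it supposes that every two-receptor cell is a doublet, that there are 100 V1R genes, and that the two receptors in a doublet are drawn uniformly at random. Only the counts 134 and 44 come from the response above. Real doublet frequencies would depend on receptor abundance, so the numbers indicate orders of magnitude at best.

```python
from math import comb

N_V1R_GENES = 100    # hypothetical number of V1R genes (assumption)
N_TWO_V1R_CELLS = 134  # cells expressing two V1Rs (from the response)
OBSERVED_PAIR = 44     # cells with Vmn1r85+Vmn1r86 (from the response)

# Under the uniform-doublet assumption, any specific receptor pair
# has probability 1 / C(n, 2) of appearing in a given doublet.
p_pair = 1.0 / comb(N_V1R_GENES, 2)

# Expected number of cells carrying one specific pair.
expected = N_TWO_V1R_CELLS * p_pair


def binomial_tail(n, k_min, p):
    """P(X >= k_min) for X ~ Binomial(n, p)."""
    return sum(
        comb(n, k) * p**k * (1.0 - p) ** (n - k)
        for k in range(k_min, n + 1)
    )


tail = binomial_tail(N_TWO_V1R_CELLS, OBSERVED_PAIR, p_pair)
```

      Under these assumptions the expected count for any one pair is far below one cell, so seeing the same pair in dozens of cells is not what random doublets would produce.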

      Recommendations for the authors:

      Reviewing Editor (Recommendations for the Authors): 

      The editor had a query about the analysis of FPRs, which are a third family of sensory receptors in the rodent VNO. 

      In our study, FPRs were found to be expressed in subsets of Gnai2 and Gnao1 neurons as well as in non-neuronal cells. These can be easily searched at www.scvnoexplorer.com. For instance, Fpr1 and Fpr2 are expressed in immune cell clusters 2, 6, 8 and 10, whereas Fpr-3 is expressed in Gnao1 subcluster n1. Consistent with earlier reports (10.1073/pnas.0904464106, 10.1038/nature08029), expression of Fpr-rs3, Fpr-rs4, Fpr-rs6 and Fpr-rs7 is restricted to Gnai2 neurons, of which Fpr-rs3 and Fpr-rs4 are limited to Tmbim1+ Gnai2 neurons.

      Reviewer #1 (Recommendations For The Authors):

      (1) The reference to "genders" on page 3 should be changed to "sexes". 

      We have modified the text.   

      (2) Did the authors identify any Ascl1+ GBCs in their data? 

      Ascl1+ GBCs were identified and are now marked in Figure 3-figure supplement 1B of our revised version.

      (3) The plots in Figures 1B and 2B say they're depicting gene "Expression", but it looks like the gene expression was z-scored. If so, the authors should describe how the expression was scaled. 

      We have modified the legend title to ‘scaled expression’ and described the basis of scaling in the methods section of our revised version. 
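
      If the scaling is a per-gene z-score, as the reviewer surmises, it could be computed along the lines of the sketch below. This is an assumption for illustration only: the function name and the example values are hypothetical, and the choice of the population standard deviation is arbitrary; the methods section of the revised manuscript remains the authority on how expression was actually scaled.

```python
from statistics import mean, pstdev


def zscore(values):
    """Center values on their mean and divide by the population SD."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A gene with identical expression everywhere has no variation to scale.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Hypothetical average expression of one gene across four clusters.
cluster_means = [0.0, 1.0, 2.0, 5.0]
scaled = zscore(cluster_means)
```

      After this transformation each gene has mean 0 and unit variance across clusters, which is why a heatmap of scaled values shows relative enrichment rather than absolute expression.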

      (4) The main text mentions Figure 2C, but maybe this refers to the right part of Figure 2B?

      Panel 2C was mistakenly not marked in the figure. We have now marked it in revised Figure 2.    

      (5) The authors should attempt to describe the other branch points in the trajectory shown in Figure 3A. If they don't seem biologically plausible, then the authors might want to reconsider using Slingshot for their analyses.

      We do not seek to claim additional branch points within mature Gnao1 or Gnai2 neurons from our analysis. Whether additional branch points exist that lead to subcategories within mature neurons requires extensive experimental investigation. Hence, in our revised submission, we have merged the mature Gnai2 / Gnao1 subclusters for the pseudotime developmental analysis, to keep our analysis focused on the single branch point at immature neurons.

      (6) The most significantly enriched gene in Figure 3B in immature Gnao1+ neurons is Cnpy1, which is also an ER protein. It could also be interesting to look at its expression or speculate on its function in immature neurons. 

      Multiple ER genes were found to be enriched in Gnao1 neurons. We would not be comfortable speculating on the function of individual genes, without a proper study, which is beyond the scope of this manuscript.      

      (7) For figures with pseudo-colored expressions, it would be useful to have color bars. I'm also not sure the pseudocolors are necessary; presenting the data in grayscale or a single color like green might also be sufficient. 

      We used pseudocolor in the IHC images of ER proteins because the fluorescence signal intensity varies widely along the apical-to-basal axis for various proteins. In some cases, grayscale images could give the false impression that there is no signal in apical Gnai2 neurons, whereas pseudocolor shows a low fluorescence level in these neurons. We have added an intensity scale bar to the figures in our revised version.

      (8) For in situ images with two colors it would be more colorblind-friendly to use green and magenta rather than green and red.

      Since no single color palette can help readers with different types of colorblindness, we decided to rely on users' operating systems, which can render images in a color palette suited to their type of colorblindness. We believe this is a better option, as described here: https://markusmeister.com/2021/07/26/figure-design-for-colorblindreaders-is-outdated/

      (9) The heatmap in Figure 7E would likely look more accurate without interpolation/aliasing/smoothing. 

      We have not performed smoothing on any of the heatmaps. We have noticed that heatmaps sometimes take time to load in software (such as Adobe Acrobat), giving the impression of smoothing. Changing the zoom level or reopening the file may fix this.

      (10) Rather than just citing the literature on the unfolded protein response in the MOE, it could be useful to cite work on the ATF5 expression and the UPR in the VNO (e.g. 10.1101/239830v1 or 10.12688/f1000research.13659.1).

      We have cited and commented on the ATF5 VNO expression in our discussion. 

      (11) I might try to condense the discussion. Additionally, in the discussion, the section on receptor co-expression comes before that on the VNO ER, so I might consider reorganizing the figures and results to present all of the scRNA-seq analyses (including the receptor co-expression figure) first before the figures on the ER. 

      We welcome this suggestion and have reorganized figures and results such that the scRNA-seq analysis flow is maintained before ER results.   

      Reviewer #2 (Recommendations For The Authors): 

      (1) Upregulation of ER-related mRNAs and expanded ER lumen in Gnao1-positive neurons is interesting, but the connection between these observations is unclear. The authors can strengthen the link by adding immunohistochemistry of representative ER proteins to test if the upregulation of mRNAs related to ER results in increased levels of these proteins in the ER of these neurons.

      Connection between scRNA-seq and EM was made due to our observations that levels of a number of ER luminal and membrane proteins were higher in Gnao1 compared to Gnai2 neurons (Figure 7, Figure 7-figure supplement-1 in our revised submission). This led us to hypothesize a differential ER content or ultrastructure, which was verified by EM. We have also addressed the question of whether upregulation of mRNAs related to ER proteins results in their increased levels (Figure 7-figure supplement-2). In some cases, for example Hspa5 (Bip), mRNA as well as protein levels are upregulated in Gnao1 neurons (see Figure 3A volcano plot, Figure 5-figure supplement-1 RNA-ISH, Figure 7-figure supplement-1 comparison of mRNA levels, Figure 7F immunofluorescence). However, there are other genes in the same figures, for which mRNA levels are not upregulated, yet protein levels are higher in Gnao1 neurons. As mentioned in our text and discussion, upregulated mRNA levels as well as post-transcriptional mechanisms are both likely to play a role in upregulating ER protein levels in Gnao1 neurons.       

      (2) In Figure 3, the authors seemed to exclude cluster 13 from Figure1 in the pseudotime analysis without justification. 

      Cluster 13 has markers such as Obp2a, Obp2b and Lcn3. We confirmed via RNA-ISH (Figure 2-figure supplement-1A in our revised submission) that Obp2a maps to cells from glandular tissue on the non-sensory side. Cluster 13 also contains cells expressing Vmn1r37, which is typically expressed in neuronal cells. However, we do not see Obp2a mRNA in the sensory epithelium. It is possible that cluster 13 comprises a heterogeneous mixture of cells, some of which are non-sensory glandular cells, co-clustered with other cell types; it is also possible that Obp2a is expressed in neurons below the detection level of our assay. Further experiments will be required to distinguish between these possibilities. We have no basis for confidently assigning this cluster as a neuronal cell type; hence, it was excluded from the downstream analysis of neurons.

      (3) In Figure 3, the line appears to suggest that Gnao1-positive cells can be progenitors of Gnai2-positive cells. Please clarify. 

      We thank the reviewer for pointing this out. We did not seek to give the impression that Gnao1 cells can be progenitors of Gnai2 cells. This may be due to the placement of dots in the trajectory leading to misinterpretation and the UMAP itself. We have modified the pseudotime trajectory in our revised version to make it more intuitive. 

      (4) Figure 3: Please label pseudotime lineage cluster identities. 

      Cluster identities are now labeled in Figure 3A pseudotime lineage as well as in Figure 3-figure supplement-1 dot plot.     

      (5) Figure 4: Please label the genes used for in situ hybridization in the volcano plot. 

      Genes used for RNA-ISH are labeled (bold font) in the volcano plot in Figure 5A.  

      (6) Figure 4: Please clarify which genes shown in the in situ hybridization figures correspond to which GO terms. 

      We have added supplementary table-10 containing gene ontology terms associated with genes for which RNA-ISH was performed. 

      (7) The EM shown in Figure 5 makes this work unique and intriguing. However, the lack of quantification for the ER phenotype is a concern. For example, does the ER area of a given cell correlate with the relative position of the cells along the apical-basal axis of the vomeronasal organ? What about the ER morphology in the progenitor cells? 

      We show here a quantification of the ER area from the low-magnification EM image shown in Figure 8A. The ER area increases towards the basal side of the cross-section. However, this quantification is complicated by the following factors: a) processing for EM results in some shrinkage of the tissue, and b) Gnao1 neurons follow an invaginating pattern in cross-sections. For these reasons, some Gnao1 neurons can come very close to, and at times lie adjacent to, Gnai2 neurons in EM cross-sections. Owing to a lack of contrast, it is also harder to identify the ER within a cell at low magnification, especially in the apical zone. The plot shown here does indicate that the ER area of a cell roughly correlates with its position along the apical-basal axis. In our revised submission, we have quantified the fluorescence intensities of various ER proteins along the apical-basal axis from confocal images (Figure 7, Figure 7-figure supplement-1).

      Author response image 2.

      ROIs (yellow) are manually drawn in the sensory epithelium wherever the ER could be identified without ambiguity. The area and centroid of each ROI are calculated, and the x coordinate of each centroid is used to position the ER area along the apical-basal axis, as shown in the plot below.

      Establishing ER ultrastructure in progenitor or immature cells, as well as unambiguously quantifying ER area in mature neurons, requires identifying these cells in cross-sections using fluorescent molecular markers, followed by correlative light and electron microscopy (CLEM). This procedure is technically challenging and beyond the scope of our manuscript.
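
      The ROI measurement described in the caption of Author response image 2 (area and centroid of a manually drawn outline) can be sketched with the shoelace formula, one common way to obtain both quantities from polygon vertices. The function name and the example coordinates are hypothetical, and we do not claim this is how the measurements were actually computed in the authors' software.

```python
def polygon_area_centroid(vertices):
    """Return (area, (cx, cy)) of a simple polygon via the shoelace formula."""
    n = len(vertices)
    twice_area = 0.0
    cx = cy = 0.0
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]  # wrap around to close the polygon
        cross = x0 * y1 - x1 * y0
        twice_area += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if twice_area == 0:
        raise ValueError("degenerate polygon")
    signed_area = twice_area / 2.0
    # Dividing by the signed area keeps the centroid correct
    # for both clockwise and counterclockwise vertex order.
    cx /= 6.0 * signed_area
    cy /= 6.0 * signed_area
    return abs(signed_area), (cx, cy)


# Hypothetical rectangular ROI, in arbitrary image units.
roi = [(2.0, 1.0), (6.0, 1.0), (6.0, 3.0), (2.0, 3.0)]
area, (cx, cy) = polygon_area_centroid(roi)
```

      Plotting each ROI's area against the x coordinate of its centroid then gives the kind of position-versus-ER-area relationship the response describes.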

      Reviewer #3 (Recommendations For The Authors): 

      (1) The main claim is about ER differences between Gnao1+ and Gnai2+ VSN. The ISH, IHC, and EM microscopy images are not quantified and, therefore, poorly support this main claim.

      In our revised submission, we provide extensive quantification of the ER phenotype in Figure 7 and Figure 7-figure supplement-1. Quantification of ER area from EM images is challenging, as described above in our response to Reviewer #2, recommendation 7.

      (2) The annotation of VSN subclusters should be more rigorous, consistent throughout the paper (VSN clusters are inconsistent between Figure 1 and Figure 3, and the multiplication of subclusters in Figure 3 is not discussed), and verified (using ISH or IHC) that they reflect discrete, actual cell types. The authors should provide a list of differentiating marker genes for the clusters in Figure 3. At present, it remains unclear whether these clusters are the result of over-clustering of cells (and therefore represent either noise or arbitrary splits of continua) or whether they reflect the biology of the system. Subsequent characterization of these curated VSN subtypes (as done in Figure 4) would add value to the study.

      We pooled neuronal cells from Figure 1 and re-clustered them at higher resolution to identify subtypes. Several of the neuronal sub-clusters we identified, including progenitors, immature neurons and mature neurons, are validated by previous studies with well-known markers. Amongst the mature neurons, the biological basis of the four Gnao1 neuron sub-clusters (n1-n4) is discussed in our analysis, and these are also validated by previous experimental studies. These Gnao1 clusters are organized according to the expression of family-C V2Rs (Vmn2r1 or Vmn2r2) as well as H2-Mv genes. Within the Gnai2 sub-clusters, n12 and n13 exclusively express markers that distinguish them from n8-n11, which we have described in our revised version. However, Gnai2 n8-n11 do not have definitive markers, and determining whether these sub-clusters are part of a continuum or are over-clustered will require further extensive experiments and analysis. We prefer to show all sub-clusters, including the Gnai2 sub-clusters, in Figure 3-Figure supplement-1, along with a dot plot of sub-cluster gene expression, so that these data are available for future experiments and analysis. We share the concern that some Gnai2 sub-clusters may not have an obvious biological basis at this time. Hence, in our revised submission, we have merged the mature Gnao1 and mature Gnai2 sub-clusters for the developmental analysis shown in Figure 3A.

      (3) Some clusters are characterized by the expression of specific chemoreceptors (VRs). Have these been used for clustering? If so, clustering should be repeated after excluding these receptors.

      Figure 3-Figure supplement-2 of our revised submission shows a comparison of neuron clusters with and without VRs. We also describe in the results, specific clusters that are affected by exclusion of VRs.  

      (4) Given the title and the data, the paper should be structured around its main claim (i.e. differential ER environment between VSN types). For example, Figure 7, which deals with the characterization of receptor expression and co-expression in VSNs, is sandwiched between the validation of ER substructure (Figure 6) and the timing of coexpression of ER chaperone genes (Figure 8). The data presented in Figure 7 would fit better if used as a validation of the dataset prior to the investigation presented in the current Figure 4. In addition, we suggest that expression and co-expression diagnostics should be used to filter cells for subsequent analyses.

      We appreciate this suggestion and have reorganized the figures in our revised version. Our subsequent analysis showing enrichment of ER-related genes at the RNA and protein levels covers all Gnao1 neurons and is not restricted to a specific subset. This is reflected in the ISH and IHC of ER genes.

      (5) Figure 7-Supplement 3 suggests the presence of co-expressed V1Rs in VSNs. It is unclear from the data presented whether these co-expressing cells are artifactual cell doublets and should be removed from the analysis or whether the expression of the coexpressed receptors reflects a reality. To better address this observation, one may want to see the expression levels of the individual co-expressed V1rs in Figure 7-Supplement 3 rather than the sum of V1r expression. I am also concerned about the unusually high frequency of "empty" neurons (i.e. without expressed VRs). Could these be debris?

      We think that the cells expressing zero as well as two V1Rs are real and cannot be attributed to debris or doublets for the following reasons:

      i) Cells expressing no V1Rs are not necessarily debris because they express other neuronal markers at the same level as cells that express one or two V1Rs. For instance, the Gnai2 expression level across cells expressing 0, 1, 2 V1Rs is the same, which we have included in Figure 4-figure supplement 4-C of our revised submission. Higher expression threshold values used in our analysis may have somewhat increased the proportion of cells with zero V1Rs. Similarly, Gnao1 levels across cells expressing multiple V2Rs and H2-Mv per cell stay the same, indicating that these are unlikely to be doublets (Figure 4 I-K). As doublets are formed randomly, the frequency of each co-expression combination (Supplementary Table 7 and 8) itself is an indication of whether it is represented by a single cell or an artifact.

      ii) Cells co-expressing V1R genes: All cells used for co-expression analysis were filtered via an expression threshold (Figure 4-figure supplement 1D), which eliminates cells with low counts of V1R expression. We listed the frequency of cells co-expressing V1R gene combinations in Supplementary table - 8. Among 134 cells that express two V1Rs, 44 cells express Vmn1r85+Vmn1r86, 21 express Vmn1r184+Vmn1r185, 13 express Vmn1r56+Vmn1r57, 6 express Vmn1r168+Vmn1r177, and so on. Doublets generally are a random combination of two cells. Here, each specific co-expression combination represents multiple cells and is highly unlikely to arise by random chance.

      iii) Some of the co-expression combinations we reported were identified earlier and verified experimentally in Lee et al., 2019, using FACS-based single-cell collection in 96-well plates following the cellseq-2 protocol with a very low chance of doublets, and in Hills et al., 2024.
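
      The random-pairing argument in point ii) can be made concrete with a rough calculation. The sketch below is illustrative only: the counts (134 two-receptor cells, 44 of them with one specific pair) come from the response, but the per-receptor cell frequency is a hypothetical placeholder, and treating all 134 cells as doublets is a deliberately generous assumption.

```python
def expected_pair_count(n_doublets, freq_a, freq_b):
    """Expected number of doublets made of one A-type and one B-type cell,
    assuming doublets pair cells at random (either order, hence the factor 2)."""
    return n_doublets * 2.0 * freq_a * freq_b


# Counts quoted in the response.
n_two_receptor_cells = 134
observed_pair_count = 44  # cells expressing Vmn1r85 + Vmn1r86

# HYPOTHETICAL: fraction of neurons expressing each receptor of the pair.
assumed_receptor_freq = 0.01

expected = expected_pair_count(
    n_two_receptor_cells, assumed_receptor_freq, assumed_receptor_freq
)
print(f"expected under random pairing: {expected:.4f}")
print(f"observed: {observed_pair_count}")
```

      Under these assumptions the expected count for any one specific pair is far below one cell, so a pair seen in dozens of cells is hard to attribute to random doublets. The conclusion depends on the assumed frequency, which would need to be replaced with measured values.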

      (6) The authors use either dot plots or scatter plots to show gene expression in cell clusters. It looks nice, but it is very difficult to deduce population levels of expression from these plots. Could we see the distribution of gene expression across clusters using more quantitative visualizations such as violin or box plots?

      Dot plots are mainly used in our manuscript to show markers of cell clusters in Figure 1, Figure 2 and Figure 3-figure supplement 1. We would like to show at least 5 marker genes for each cluster that are important for identifying the cell type. Using violin or bar plots for this would either make the panels extremely large and overwhelming, especially with 16 clusters in Figure 1 and 13 clusters in Figure 3-figure supplement 1, or make the bars/violins too small to interpret. Hence, for the sake of simplicity, we used dot plots to give our readers a bird's-eye view of gene expression differences across clusters. Scatter plots were used when we wanted to compare the expression levels of genes between male and female samples and to show the expression of two genes (VRs) simultaneously in a single cell. This cannot be achieved with violin/box plots. However, we have made our dataset available at scvnoexplorer.com to explore the expression patterns across cell clusters with different visualization options, including violin or box plots.

      (7) To investigate whether sex might bias clustering, the authors calculated the Pearson coefficient of gene expression between sexes for each cluster. Given the high coefficient observed across all clusters (although no threshold is used), the authors conclude that there was no bias. While the overall effect may show a strong similarity in gene expression in each cluster between the sexes, this overlooks all the genes that are significantly differentially expressed. It would be worth investigating and discussing these differences. Relatedly, what batch correction method was applied to the data (to mitigate any possible sampling or technical effect)?

      We chose the Pearson coefficient as a representative parameter to show that there is no bias. In addition, we have performed differential expression analysis for each cluster and the results are in supplementary table-1. Apart from known sexually dimorphic genes, no genes are significantly differentially expressed (adjusted p-values greater than 0.05). This was also shown by earlier studies using bulk RNAseq (doi.org/10.1371/journal.pgen.1004593, doi.org/10.1186/s12864-017-4364-4). We used depth normalization to integrate samples and described this in the methods section of our revised version.

      (8) We found the method description to be incomplete for the single-cell RNA sequencing analyses. The method section should include a detailed explanation of the code used by the authors to analyze the data. The Seurat package has many available pipelines for single-cell RNA-seq analysis, which have a major impact on the output data. It is therefore imperative to describe which of these pipelines were used and whether the pipeline was run with default settings. 

      Our revised submission expands the methods section with details of the parameters, filtering criteria and software used.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      This study uses a variety of approaches to explore the role of the cerebellum, and in particular Purkinje cells (PCs), in the development of postural control in larval zebrafish. A chemogenetic approach is used to either ablate PCs or disrupt their normal activity and a powerful, high-throughput behavioural tracking system then enables quantitative assessment of swim kinematics. Using this strategy, convincing evidence is presented that PCs are required for normal postural control in the pitch axis. Calcium imaging further shows that PCs encode tilt direction. Evidence is also presented that suggests the role of the cerebellum changes over the course of early development, although this claim is rather less robust in the current version of the paper. Finally, the authors build on their prior work showing that both axial muscles and pectoral fins contribute to "climbs" and show evidence that suggests PCs are required for correct engagement of the fins during this behaviour. Overall, establishing a role for the cerebellum in postural control is not very surprising. However, a clear motivation of this study was to establish a robust experimental platform to investigate the changing role of cerebellar circuits in the development of postural control in the highly experimentally accessible zebrafish larvae, and in this regard, the authors have certainly succeeded.

      Overall, I consider this an excellent paper, with some room for improvement in aspects of presentation, discussion, and some aspects of the data analysis.

      We thank the reviewer for their kind comments and support. In the revision we have addressed their concerns regarding data presentation and analysis. Additionally, we have expanded our introduction and discussion to address questions of presentation.  

      Reviewer #2 (Public Review):

      Summary:

      Franziska Auer et al. investigate the role of cerebellar Purkinje cells in controlling posture in larval zebrafish using the chemogenetic tool TRPV1/capsaicin to bidirectionally manipulate (i.e., activate or ablate) these cells. This tool has been developed for zebrafish previously but has not been applied to Purkinje cells.

      High-throughput behavioral experiments are presented to monitor how body posture is affected by these perturbations. The analysis of postural control focuses on a specific subaspect of posture: the body tilt-angle relative to horizontal just before a swim bout is executed, quantified separately for pre-ascent and pre-dive bouts. They report a broad bimodal distribution of pre-ascent bout posture ranging from -20 to +40 degrees, while the pre-dive bout posture was more Gaussian, ranging between -40 and 0 degrees. The treatment effect is quantified as the change in the median of these distributions.

      Purkinje cell activation and ablation in 7 days post-fertilization (dpf) fish shifted the median of the ascending bout posture distributions to positive values. The authors hypothesize that the stochastic nature of the activation process might desynchronize Purkinje cell activity, thus abolishing Purkinje cells' role in postural control, similar to ablation. However, this does not explain why dive bout posture decreased upon activation but was unaffected by ablation. 

      To test whether the role of Purkinje cells in postural control matures over development, the authors repeated the ablation experiments at 14 dpf. They state that "at 14 dpf, the effects of Purkinje cell lesions on posture were more widespread than at 7 dpf." However, this effect size is comparable to that observed at 7 dpf, suggesting no further maturation of the role of Purkinje cells in pre-ascending bout postural control. The median pre-dive bout posture decreased at 14 dpf, contrasting with no effect at 7 dpf, yet this change was comparable in effect size to the activation effect on Purkinje cells at 7 dpf. The current data breadth may not be sufficient to conclude that signatures of emerging cerebellar control of posture across early development were uncovered.

      The study's exploration of activating Purkinje cells in freely swimming fish using TRPV1/ capsaicin is of special interest, but the practicability of this method is unclear from the current presentation. It would be beneficial to present the distribution of the percentage of activatable Purkinje cells across animals and time points to provide insight into the method's efficiency. Discussing this limitation and potential improvements would aid in evaluating the method, especially since the authors report that the activation experiments were labor-intensive, limiting repeat experiments. This may explain why the activation experiment at 7 dpf is the only data presented with cell activation, with other analyses performed using the cell ablation capabilities of the TRPV1/capsaicin method.

      Another data point at 14dpf would significantly strengthen the conclusions.

      The authors analyze Purkinje cell-controlled fin-trunk coordination by examining ascending bout posture across different swim bout speeds. They make the important finding that pectoral fin movements contribute significant lift for median and fast swim bouts but not for slow ones, and that Purkinje cell ablation disrupts lift generation at all speeds.

      Finally, the authors examined whether Purkinje cell activity encodes postural tilt-angle by performing calcium imaging on 31 cells from 8 fish using their Tilt In Place Microscope (TIPM). They report that they could decode the tilt-angle from individual neurons with a highly tuned response, and also from neurons that were not obviously tuned when pooling them and analyzing the population response. However, due to the non-simultaneous recordings across animals, definitive conclusions about population-level encoding should be made cautiously; it might be better to suggest potential population encoding that needs confirmation with more targeted experiments involving simultaneous recordings.

      Strengths:

      - The study introduces a novel application of the chemogenetic tool TRPV1/capsaicin to study cerebellar function in zebrafish.

      - High-throughput behavioral experiments provide detailed analysis of postural control.

      - The further investigation of Purkinje cell-controlled fin-trunk coordination offers new insights into motor control mechanisms.

      - The use of calcium imaging to decode postural tilt-angle from Purkinje cell activity presents interesting preliminary results on neuronal population encoding.

      Weaknesses:

      - The term "disruption" for postural control effects may lead to misleading expectations.

      - The supporting data show only subtle median shifts in postural angle, raising questions about the significance of observed effects. Statistical methods that account for the hierarchical structure of the data might be required to support the conclusions.

      - The study's data breadth may not be sufficient to conclude emerging cerebellar postural control across early development.

      - The current presentation does not adequately detail the practicability and efficiency of the TRPV1/capsaicin method for activating Purkinje cells, and the labor-intensive nature of these experiments constrains the ability to replicate and validate the findings.

      - Non-simultaneous recordings in calcium imaging necessitate cautious interpretation of population-level encoding results.

      We appreciate the reviewer's thoughtful and detailed feedback. In response, we have made several changes to highlight key points in our manuscript. We have adjusted our wording to more accurately reflect the scope of our findings. Finally, we have clarified and expanded the methods used.

      Reviewer #3 (Public Review):

      Summary:

      This paper uses a new chemogenetic tool to investigate the role of cerebellar Purkinje cells in postural control. Using a high-throughput behavioral assay, they show that activation or ablation of Purkinje cells affects various aspects of postural control in zebrafish larvae during spontaneous swimming and that the effects are more pronounced at later developmental time points, where the Purkinje cell number is much greater. Using a sophisticated imaging assay, they record Purkinje cell activity in response to the tilt of the fish and show that some Purkinje cells are tuned to tilt direction and that the direction can even be decoded from untuned neurons.

      Strengths:

      Overall the study is nice, using a range of tools to address a fundamental question about the role of the cerebellum in postural control in fish.

      Weaknesses:

      (1) The data in Figure 1 that establishes the method seems to be based on a very small number of experiments and lacks some statistical analysis.

      (2) The choice and presentation of the statistical and analysis methods used in Figures 2-5 could be improved.

      We thank the reviewer for their comments. We have added additional statistical analyses for the activation experiments, and improved data presentation.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Overall I think this is a great paper.

      * Introduction and Discussion.

      The Introduction (and Discussion) do little to explain what is understood about cerebellar control of posture and what major outstanding questions remain. The first paragraph of the Introduction seems to argue that the role of the cerebellum in control of posture is well established and line 24 attempts to motivate the present study by virtue of the fact that terrestrial locomotion is "complex". This might be true but is not necessarily a major obstacle given the suite of powerful approaches available in rodent neuroscience. What are the major challenges that are hard to tackle in rodents and what specific questions can the larval zebrafish help to answer? What about development (which gets no mention at all)? I'm not suggesting a comprehensive review of every aspect of cerebellar physiology, but I think the Introduction should attempt to outline the current hypotheses in a little more detail and highlight what we still need to understand.

      We take the Reviewer’s point that there is more to say in the Introduction. We feel that multi-dimensional limb biomechanics and proprioception are two aspects of terrestrial locomotion that support our use of the word “complexity.” However, we don’t dwell on this point because, as the reviewer correctly states, the suite of tools for rodent neuroscience & behavior is expansive and, in our opinion, not a limiting factor. Instead, we said what we felt we could regarding the potential contribution of the larval zebrafish in the last paragraph of the Discussion. In the revision, we have added details about the development of cerebellum to the introduction (though this, of course, is an expansive topic and well-beyond the scope of the Introduction), highlighted some of the historical limitations in rodent posture analysis, and set up the .

      * Figure 2: 'Arrows denote the shift towards more nose-up postures'. I think the distribution is quite easy to interpret without these arrows; I suggest removing them.

      We have removed the arrows.  

      * IQR is sometimes stated as a single number and sometimes as a range. It should be consistent and unless eLife has guidance to the contrary, I suggest that it be the latter.

      Thank you for pointing that out. We now report it as the values at the 25th and 75th percentiles for all IQRs.

      * Figure S2: For 14 dpf fish the axes are labelled PC2/3 - is this an error?

      We have changed it to a 3-dimensional plot for both 7 and 14 dpf data to show comparable plots for both ages (now Figure S5 F and G). For the analysis in the 14dpf fish the clearest separation was in the space defined by the 2nd and 3rd principal component.  

      * In the methods, there is insufficient detail given about fluorescent imaging.

      We added information on how the fluorescent imaging was performed to the ‘Confocal imaging’ section as well as to the ‘Functional imaging’ section.

      * Abstract

      In my opinion, the statement "Here, we used a powerful chemogenetic tool (TRPV1/ capsaicin) to *define the role of Purkinje cells*..." is too strong. Whilst the evidence that PCs are required for postural control is certainly strong, what exactly these cells do in the service of postural control is far from clear (as the authors indeed acknowledge in the Discussion). As such, I wouldn't say their role has been "defined".

      We changed the word to “describe” to better reflect our findings.

      * aldoca transgenic.

      This appears to be a beautiful transgenic line but the data showing the extent of its expression and evidence that in the cerebellum it exclusively labels PCs isn't clear enough.

      (i) Ideally Figure 1A would show an image of a whole animal to provide an overview of transgene expression but instead it seems to be (the legend is unclear) a cartoon with a confocal projection of part of the brain overlaid.

      We have updated the figure legend to be clearer that we show a cartoon of a larval zebrafish with the confocal image overlaid. The aldoca promoter has been previously described and exclusively labels Purkinje cells (10.1523/JNEUROSCI.3352-10.2010).

      (ii) Figure 1B shows expression in the cerebellum, but how are we to understand that all the labelled cells are PCs? Are all PCs labelled, or only a subset? Perhaps a double labelling with a PC in situ marker could be done to demonstrate colocalisation?

      As above, the aldoca promoter has been previously described; to the best of our knowledge in the Hibi lab’s hands (and ours) it labels Purkinje cells exclusively, and it labels all of them (10.1523/JNEUROSCI.3352-10.2010).

      * Chemogenetic validation.

      Overall, the chemogenetic approach to abrogate PC function looks to be very powerful. The authors state in several places that a contribution of this paper is in its "establishing the validity of TRPV1/capsaicin-mediated perturbations". However, the data in Figure 1, along with various comments in other parts of the paper raise some questions:

      (i) For experiments depolarising PCs with 1 µM CSn, the sample size is tiny: Two transgenic animals and one control. Moreover, it is stated "in one fish ... we observed a small number of neurons at the 9h timepoint with bright, speckled fluorescence suggestive of cell death". Was this one out of two transgenics?! In the discussion, I didn't understand the statement "ensure adequate brightness levels *to achieve sufficient depolarization without excitotoxicity*". Does this "excitotoxicity" relate to the speckled fluorescence observation?

      Overall, the very small sample size and comments about excitotoxicity and cell death raise concerns about the approach that I think warrant clearer treatment in the results (including information about the assessment of transgene expression, % embryos judged to have suitable expression), especially as this paper is seeking to establish the validity of the method.

      We note first that the method has been previously validated (https://doi.org/10.1038/nmeth.3691) and that we build on this work. For the experiment described, the point was to identify an acceptable duration for exposure. To that end, we analyzed 6 animals for up to 6h (including the washout experiments in Figure S1B) where we never observed any speckled fluorescence; we limited our behavioral experiments to 6h accordingly. We thought it would be worth including the observation of speckled fluorescence at the 9h timepoint for future reference. To directly address the comment, we have increased the number of analyzed cells and fish for the 1 µM capsaicin experiments and added statistical analysis (lines 65-67).

      When screening for transgene expression we selected fish that had clearly visible expression, but that did not look overly bright, and used the same criteria when screening fish for the GCaMP imaging and for behavior. Around a quarter of the fish that had aldoca:TRPV1-tagRFP expression had a usable expression level for the activation experiment. We have added this information to the Results (line 62) and Methods (lines 369-372).

      (ii) The authors note "capsaicin could sporadically activate subsets of Purkinje cells" and further speculate about PC activity and synchrony in the discussion. Figure 1 seems to rely on single images at widely spaced time points but given that they are set up to do 2-photon calcium imaging, why didn't they collect continuous time series data and analyse the temporal patterns of activity across the transgenic PC population?

      We have added time series data for calcium imaging after 1 µM of capsaicin in TRPV1- and TRPV1+ cells to Supplementary Figure S1A. Here too we see sporadic increases in calcium levels at similar rates: 0% for TRPV1- and 15-19% for TRPV1+ (see also Figure S1 legend).

      (iii) The axonopathy and cell death resulting from 10 µM Csn is quite dramatic.

      However, here the authors do not appear to have included a TRPV1 negative control (although oddly they did for 1 µM treatment) so it is currently unclear whether or not a high conc of Csn alone might be cytotoxic.

      Chen et al (https://doi.org/10.1038/nmeth.3691) have established the TRPV1/capsaicin method in zebrafish with broad neuronal label and did not see any effect with high doses of capsaicin in TRPV1 negative fish.  

      * Behavioural assessment - stats

      Overall, the disruption of postural stability after PC manipulations is convincing.

      However, I have a few queries about the statistics:

      (i) In this section, the statistical unit was not clear. The tables, which are otherwise very useful, give no indication of N. The legend text does report "8 repeats/149 control fish" and "across experimental repeats" suggesting the statistical unit might be the repeats rather than animals, but this should be clarified. In Figure 2G, individual data points should be plotted if N=8, or a representation of the distribution (eg violin or box and whisker plots) if N = 149.

      We apologize for the confusion. Given the variable numbers of bouts, a single experimental repeat does not allow for an accurate estimate of expected value. Below we simulated how accurately the median can be estimated based on increasing sample sizes (Author response image 1). Given that large numbers of bouts are necessary to accurately estimate the median we pool the data for all experiments and use resampling statistics to estimate bias in our estimate.

      Author response image 1.

      Median estimation based on increasing sample size
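
      The kind of simulation summarized in this image can be sketched as follows. This is a minimal illustration, not the authors' code: the posture values are synthetic draws from a hypothetical normal distribution, and the function name, sample sizes, and number of draws are all assumptions chosen for the example.

```python
import random
import statistics


def median_error_by_sample_size(population, sample_sizes, n_draws=500, seed=0):
    """For each sample size, repeatedly sample with replacement and return the
    mean absolute error of the sample median relative to the population median."""
    rng = random.Random(seed)
    true_median = statistics.median(population)
    errors = {}
    for n in sample_sizes:
        abs_errors = []
        for _ in range(n_draws):
            sample = rng.choices(population, k=n)
            abs_errors.append(abs(statistics.median(sample) - true_median))
        errors[n] = statistics.fmean(abs_errors)
    return errors


# HYPOTHETICAL stand-in for pooled pre-bout postures (degrees), not real data.
data_rng = random.Random(1)
postures = [data_rng.gauss(10.0, 15.0) for _ in range(20000)]

errors = median_error_by_sample_size(postures, [20, 200, 2000])
for n, err in errors.items():
    print(f"n={n:5d}  mean |median error| = {err:.2f} deg")
```

      The error of the estimated median shrinks as the number of sampled bouts grows, which is the rationale given above for pooling bouts across repeats rather than estimating a median per repeat.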

      (ii) Related to the above, I hope it might be easier to interpret the unexpected change in climb posture in ablation controls once the data for individual repeats is shown.

      When we analyze the data as single repeats we see considerable variability between different repeats due to undersampling. We tested the medians for the single repeats for outliers to ensure that the shift is not due to a single repeat skewing the distribution. We did not detect any outliers in the pre-lesion control or in the post-lesion control group. (Outliers were determined as deviating more than 3 times the scaled median absolute deviation (MAD) from the median. A scaling factor of 1.4826 was used to ensure that MAD-based outlier detection is consistent with other methods like Z-scores.) We added this information to lines 133-134 and to the Methods section under Statistics.
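
      The outlier rule described in parentheses above can be written out as a short sketch. The threshold (3 scaled MADs) and the scaling factor (1.4826) are taken from the response; the function name and the per-repeat medians are hypothetical, and the example deliberately includes one extreme value to show the rule firing.

```python
import statistics

MAD_SCALE = 1.4826  # makes the MAD comparable to a standard deviation for normal data


def mad_outliers(values, n_mads=3.0):
    """Flag values lying more than n_mads scaled MADs away from the median."""
    med = statistics.median(values)
    abs_devs = [abs(v - med) for v in values]
    scaled_mad = MAD_SCALE * statistics.median(abs_devs)
    return [dev > n_mads * scaled_mad for dev in abs_devs]


# HYPOTHETICAL per-repeat medians (degrees); the last value is an artificial outlier.
repeat_medians = [9.8, 10.4, 11.1, 10.0, 9.5, 10.9, 10.2, 18.0]
flags = mad_outliers(repeat_medians)
for value, is_outlier in zip(repeat_medians, flags):
    print(f"{value:5.1f}  outlier={is_outlier}")
```

      Because both the center and the spread are medians, a single extreme repeat barely shifts the threshold, which makes this rule better suited than a mean and standard deviation when there are only a handful of repeats.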

      (iii) In some parts of this section, including the Tables, the authors report the 95% CI of the median, rather than IQR. In this case, they should report the z-value used for 95% CI estimation.

      As we are using resampling to estimate the 95% confidence interval of the median, there is no z-value as in a traditional normal-distribution-based confidence interval. Instead, we explicitly define the 2.5th and 97.5th percentiles from the bootstrapped sample distribution, which captures the middle 95% of the data, representing the 95% confidence interval.
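
      A percentile bootstrap of this kind can be sketched in a few lines. This is an illustration under stated assumptions rather than the authors' implementation: the data are synthetic, the number of resamples is arbitrary, and the percentiles are read off by simple index into the sorted bootstrap medians.

```python
import random
import statistics


def bootstrap_median_ci(data, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the median.

    Resamples the data with replacement n_boot times, then returns the
    alpha/2 and 1 - alpha/2 percentiles of the resampled medians.
    """
    rng = random.Random(seed)
    n = len(data)
    boot_medians = sorted(
        statistics.median(rng.choices(data, k=n)) for _ in range(n_boot)
    )
    lo_idx = int((alpha / 2) * n_boot)
    hi_idx = int((1 - alpha / 2) * n_boot) - 1
    return boot_medians[lo_idx], boot_medians[hi_idx]


# HYPOTHETICAL posture sample (degrees), not real data.
data_rng = random.Random(2)
postures = [data_rng.gauss(10.0, 15.0) for _ in range(500)]

ci_low, ci_high = bootstrap_median_ci(postures)
print(f"median = {statistics.median(postures):.2f} deg")
print(f"95% CI = [{ci_low:.2f}, {ci_high:.2f}] deg")
```

      No distributional assumption enters the interval, which is why no z-value appears: the bounds are simply the 2.5th and 97.5th percentiles of the resampled medians.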

      * It is stated that "fish adopted more nose-up postures before *and throughout* climb bouts". Figure 2F seems to show posture before the climb, but where is the "throughout" data? It would be useful if Figure 2E, J could be extended to make a bit clearer these two phases of postural assessment.

      We removed the phrase ‘throughout climb bouts’ as we are not showing the posture throughout the bout and to avoid over complicating the interpretation.  

      * Why were PCs not activated at 14 dpf (eg using 1 µM Csn)?

      Due to shifts in priorities the first author will not be continuing this series of experiments, and so this additional experiment will have to wait for someone to pick up this line of inquiry.

      * The authors appear to claim that the difference in phenotype in 7 versus 14 dpf animals following high conc Csn treatment is indicative of a changing role for cerebellar PCs over this developmental period. For instance, in reference to the 14 dpf ablation phenotype, the authors write "reveals the functional emergence of Purkinje cell control of dives" and in the abstract they talk about "emerging control of posture across early development". However, can they rule out that the phenotypic differences might instead reflect differential sensitivity of the relevant PC (sub)populations to CSn at the two ages? If this caveat cannot be discounted then I suggest it is acknowledged e.g. in the discussion.

      As previously established, all Purkinje cells are labeled in the aldoca line (10.1523/JNEUROSCI.3352-10.2010). Fluorescence is brighter at 14 dpf compared to 7 dpf, suggesting higher levels of TRPV1. We therefore assume that at 14 dpf, the high concentration of Csn is sufficient to ablate Purkinje cells. At 14 dpf, cerebellar damage is visible under a standard dissecting microscope. The preponderance of evidence therefore speaks against a previously undiscovered subpopulation of TRPV1-expressing Purkinje cells that are, by mechanisms yet unknown, resistant to high doses of capsaicin.

      * Fin-body "coordination"

      The ideas and data around fin-body coordination are very intriguing.

      (i) The statement "fin engagement is speed-dependent" would benefit from a stats test to show this is indeed significant. The data in Figure 4B suggest a rather high degree of variance.

      This is an important point; we appreciate the Reviewer’s attention. We have added statistics to show this is speed dependent to lines 167-169 and show the corresponding plot in the supplement in Figure S4. "Here, we observed that fin engagement is speed-dependent, with faster bouts producing greater lift for a given axial rotation (Spearman correlation coefficient: control 0.2193; 10 µM capsaicin: 0.0397; Z-test after z-transformation: p < 0.001)."
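
      A Z-test on z-transformed correlation coefficients, as named in the quoted text, is commonly carried out with the Fisher transformation. The sketch below assumes that standard form; the two coefficients are the ones reported above, but the sample sizes are hypothetical placeholders because the bout counts are not given here, so the printed statistic should not be read as the authors' result.

```python
import math


def compare_correlations(r1, n1, r2, n2):
    """Two-sided Z-test for the difference between two independent
    correlation coefficients after the Fisher z-transformation."""
    z1, z2 = math.atanh(r1), math.atanh(r2)
    standard_error = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z_stat = (z1 - z2) / standard_error
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z_stat) / math.sqrt(2.0))
    return z_stat, p_value


# Coefficients from the response; sample sizes are HYPOTHETICAL.
r_control, r_capsaicin = 0.2193, 0.0397
n_control, n_capsaicin = 5000, 5000

z_stat, p_value = compare_correlations(
    r_control, n_control, r_capsaicin, n_capsaicin
)
print(f"z = {z_stat:.2f}, p = {p_value:.3g}")
```

      The result depends strongly on the number of bouts in each group, so the placeholder sample sizes would need to be replaced with the real counts. Some treatments also use a slightly larger variance term for Spearman coefficients; the plain Fisher form is used here for simplicity.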

      (ii) The statement "After capsaicin exposure, the slopes of the medium fast speed bins were significantly lower (Figure 4C), reflecting *a loss of speed-dependent modulation*" is not convincing. The slope is likely a function of both speed and Csn treatment, and the comparisons in Figure 4C appear to be testing the latter, not the former.

      We understand the reviewer’s point. However, the slope for the slow bouts remains unchanged. We therefore conclude that the reduction in fin-body slope is speed-dependent rather than a speed-independent reduction of slope overall.

      We have made this clearer by adding Supplementary Figure S4 and changing the text in lines 177-179.

      (iii) I'd like to understand more about the phenotype of the fin-amputated animals. Were any "bout" parameters changed? Did the animals still attempt climbs and was the distribution of the upward rotation parameter similar to controls? The text states "the slope of the relationship between upward rotation and lift was indistinguishable from zero" but the stats reported in the text are comparisons between groups while Table 5 shows 95% CIs that don't span zero. Some clarification would be useful here.

      We appreciate the Reviewer’s interest. We’ve studied climbing in fin-amputated animals at length here: https://doi.org/10.7554/eLife.45839 and here: https://doi.org/10.1016/j.celrep.2023.112573 and have added these references in line 183.

      (iv) The authors repeatedly refer to fin-body *coordination* but it is not clear whether the loss of lift after PC ablation is a result of an explicit coordination defect (i.e. changes in the relative timing and/or kinematics between fins and axial motion components), versus a simple reduction in pectoral fin engagement. Either result could be interesting, but this should be clarified.

      Thank you for pointing that out. In the fastest speed bin, we observed an increase in upward rotation and a decrease in average fin lift. In contrast, the medium speed bin showed no significant changes in average fin lift or upward rotation (see Author response image 2 and Tables 4 and 5), yet already displayed coordination deficits. Based on these observations, we argue that Purkinje cell lesions primarily affect coordination, rather than simply reducing one specific parameter such as lift or rotation (line 293-298).

      We have added fin lift and rotation values from Author response image 2 for all speed bins to Tables 4 and 5.

      Author response image 2.

      Fin lift and rotation for slow, medium and fast bouts

      * PC activity and decoding of pitch direction.

      The clever TIPM method is used to collect calcium data that convincingly shows that individual PCs can encode pitch-tilt direction. However, a population of "not tuned" cells are also identified, and here I found the analysis of their responses and the argument that they encode pitch direction at a population level difficult to follow.

      (i) First, although the naming of the cells implies that individual neurons do not encode pitch direction, I did not find this convincing. Figures 5F/G suggest that several "not tuned" cells in fact show quite consistent differences in activity across trial types and indeed in terms of their average responses sit as far from the unity line as do several "tuned" cells.

      The Reviewer’s comment helped us clarify some key points. First, tuned and untuned cells were categorized based on a Directionality Index threshold of 0.35; some cells might look similar in Figures 5F/G, but the responses of Purkinje cells are highly variable, so overall there was no consistent tuning. We have clarified this in the text in lines 203-207. Below we have plotted the Up versus Down responses for the 10 least tuned cells (sorted by directionality index). While some cells have higher responses on average to one direction, we think that the variability makes it difficult to support a claim for “tuning.” We have also tested the support vector machine on the least tuned cells to confirm that the chosen cutoff for tuned/untuned is not affecting our claim that untuned cells can encode position (see also Author response image 4).

      Author response image 3.

      Trial-by-trial variability

      (ii) It is therefore not very surprising that PCA (and the SVM decoder) distinguishes trial type. I would guess that PCA assigns the largest weights to these most tuned of the "not tuned" cells, and the 3-5 cell decoders do well when these cells happen to be sampled.

      Author response image 4.

      Decoding accuracy of the 3/5/7 least tuned cells

      This was an interesting idea. To rule out that it is only the most tuned cells that contain the information, we tested the decoder on the 3/5/7 least tuned cells; here too, 5 or more cells are better able to accurately decode the direction. We have added the decoding accuracy to the text in lines 221-224.

      (iii) As I understand the analysis, Figure 5G shows responses for "not tuned" cells over 21 trials (of each type) but these are not the same trials for the different cells? How then is population coding being assessed?

      We have updated the text and refer to this data as a “pseudo-population” in lines 216 and 218 for all experiments where we combined cells from different fish. For technical reasons, when we perform TIPM at eccentric angles we must use sparsely labelled fish to ensure that we can find the same cells over a 60 degree range. We have repeated our analyses for TIPM centered at the horizon, where we can record from entire populations from a single fish.  

      (iv) Furthermore, Figure S2 shows a somewhat different analysis with decoding accuracy measured on a fish-by-fish basis. In this case, are these decoders for simultaneously imaged neurons? Is this a cross-validated measure of decoding accuracy?

      Yes, as above, Figure S4 (former S2) shows decoding on a fish-by-fish basis for simultaneously recorded neurons. Yes, it was 5-fold cross-validated. We have updated the text in lines 490-494.

      Reviewer #2 (Recommendations For The Authors):

      - Postural control involves various aspects such as balance, coordination, relative body part orientations, and stability. Discussing these and presenting in this context the specific subaspect characterized in this study would help clarify which aspect of postural control the work focuses on.

      The Reviewer makes an interesting point, but we think their description of what constitutes postural control is overly broad. Specifically, control of “relative body part orientations in space” by definition requires coordination, and subserves balance and stability. We acknowledge, of course, that different aspects can be and often are treated independently. While interesting, a full treatment of what comprises “postural control” is beyond the scope of the paper, as it would require reconciling the terms across taxa, effectors, environments and well over a century of experiments.

      We contend that posture — particularly underwater — is best defined as the relative orientation of body parts in space. For fish, those parts consist of predominantly axial muscles and secondarily fins. We present these definitions in the Introduction and thank the Reviewer for encouraging us to more clearly shape our findings.

      - Disruption of posture or postural control: The use of the word "disruption" could lead to misleading expectations. While it may not be incorrect, it suggests a significant loss of equilibrium, an obvious increase in postural variability, or at least a noticeable effect when observing an individual animal's behavior. However, the supporting data show only a subtle median shift in postural angle within a very broad distribution averaged over many individuals. This effect was only significant when comparing fish with a control group, not when comparing fish posture before and after the treatment.

      Replacing "disruption" with "modification" would be more cautious.

      We take the Reviewer’s point and have adjusted our wording to “modifies postural control” in lines 137, 266, and 283.

      - Statistical significance: Consider aligning the asterisk notation with conventional standards (e.g., * for p < 0.05, ** for p < 0.01, *** for p < 0.001) to enhance clarity for readers. On the other hand, the individual measurements might not be independent (e.g., measurements from the same fish, or the same tank are likely to be correlated), so using the Wilcoxon rank-sum test (Mann-Whitney U test) on pooled data might lead to incorrect conclusions. Methods that account for the hierarchical structure of the data might be required to support the conclusions.

      We take the Reviewer’s point about the importance of conventions; however, we have never found “more stars = more significant” to be all that helpful in evaluating claims. Instead, we’ve opted to have both a significance and an effect size criterion; a “star” here reflects our considered confidence in the difference we observe.

      We agree that the hierarchical nature of pooled data is worth considering/presenting.

      We performed a two-way analysis of variance (ANOVA) on the interquartile ranges (IQRs) of the single experimental repeats for the 7 days post-fertilization (dpf) activation, 7dpf lesion, and 14dpf lesion experiments. The ANOVA revealed no significant main effects, supporting the strategy of pooling experimental repeats to estimate distributions.

      The results of the ANOVA, along with the IQRs for all experimental repeats, are presented in Tables 6-11. We have also clarified this in the methods section in lines 505-509.

      - Data representation: All data of postural angles should be represented in the form of violin plots to show the underlying distributions of the postural angles, especially given that the effect size is small relative to the dispersion of the distribution of the postural angle and that this distribution is also not Gaussian but bimodal, and different before and after the treatments.

      We take the Reviewer’s point that seeing the full distribution can be useful. We have added plots of the raw distributions for the data in Figure 3 as supplemental Figure S3.

      - Showing the distributions will provide the necessary information for the reader to evaluate the importance of the effect. For all data shown in Table 1, the distributions should be presented in the supplementary information.

      As requested, we have added the distributions of the data in Table 1 to the supplement (Figure S2).

      - Roll posture: A statement about whether roll posture is perturbed by Purkinje cell manipulation would be a piece of important additional information helping to understand how strong the 'disruption' of posture is.

      We haven’t assessed roll posture, as this is not practical in the current version of the SAMPL apparatus. We have added this limitation to the results (line 116) but also note that as our manipulations are bilateral, we don’t anticipate any systematic changes to roll.   

      - Comparison with other methods: Add a discussion on how the TRPV1/capsaicin method compares with other methods, such as using nitroreductase (Ntr) for targeted pharmaco-genetic ablation of cells by treatment with metronidazole or the possibility to ablate Purkinje cells by KillerRed, as the authors’ lab has done previously. Both methods have been applied to ablate Purkinje cells in larval zebrafish. What are the advantages of the TRPV1 method compared to these when neglecting the activation possibility?

      Thank you for that suggestion; we have added a section to the discussion where we compare the TRPV1/capsaicin lesion to other lesion methods (lines 334-336).

      - Describe the decoding algorithm: The decoding algorithm used could be described more in detail in the methods section.

      We have described the decoding algorithm in more detail in the methods under ‘Functional GCaMP imaging in Purkinje cells.’ Line 488+ 

      We used a support vector machine (SVM) with a linear kernel. The SVM model was trained using k-fold cross-validation, which splits the data into k subsets (folds). At each iteration, the model was trained on k-1 folds and tested on the remaining fold, ensuring that the model performance was evaluated on unseen data in each fold. Permutations were performed on randomized trial identity as a null hypothesis (5-fold cross-validation; 100 shuffles for randomization). Accuracy was calculated as 1 minus the classification loss.  
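
      The cross-validation and shuffle logic described above can be sketched in a few lines. This is an illustration only, not the authors' analysis code: to stay dependency-free, a nearest-centroid rule stands in for the linear-kernel SVM, and all function names, the toy data layout (one feature vector per trial), and the random seeds are assumptions. The part being illustrated is the fold handling and the null distribution built from shuffled trial identity.

```python
import random


def nearest_centroid_predict(train_x, train_y, test_x):
    """Stand-in classifier (NOT an SVM): assign each trial to the closest class mean."""
    centroids = {}
    for label in set(train_y):
        rows = [x for x, y in zip(train_x, train_y) if y == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]

    def dist2(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    return [min(centroids, key=lambda c: dist2(x, centroids[c])) for x in test_x]


def kfold_accuracy(x, y, k=5, seed=0):
    """Fraction of held-out trials classified correctly (1 - classification loss)."""
    idx = list(range(len(x)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k disjoint folds covering all trials
    correct = 0
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        pred = nearest_centroid_predict(
            [x[i] for i in train], [y[i] for i in train], [x[i] for i in fold]
        )
        correct += sum(p == y[i] for p, i in zip(pred, fold))
    return correct / len(x)


def shuffled_null(x, y, k=5, n_shuffles=100, seed=0):
    """Null distribution of accuracy obtained by shuffling trial identity."""
    rng = random.Random(seed)
    null = []
    for s in range(n_shuffles):
        y_perm = list(y)
        rng.shuffle(y_perm)
        null.append(kfold_accuracy(x, y_perm, k=k, seed=s))
    return null
```

      Decoding accuracy on the real labels is then compared with the spread of the shuffled accuracies; swapping in an actual linear SVM only changes the prediction function.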

      - Availability of code: The link to the data and code repository is not working.

      Thank you for pointing that out; we have fixed it now. In the lower right of the page you can see the history of all changes to the repository, including the entry on 2023-09-08 where the corresponding author set it to “public.” When we checked thanks to your comment, it had been set to “private,” without any record of when/why. We reset it on 2024-10-17. We will continue to check it periodically in the future and apologize in advance if it is unavailable; this is the first time we’ve seen that happen.

      - Electrophysiological Control: Including an electrophysiological characterization of the activation of Purkinje cells by the TRPV1/capsaicin would significantly strengthen the validity of the method.

      We take the Reviewer’s point that electrophysiological characterization is a way to strengthen the validity of the method. However, Chen et al. (https://doi.org/10.1038/nmeth.3691) have performed electrophysiology during neuronal activation and concluded that TRPV1 activation with capsaicin indeed increases neuronal activity and firing rates. Our calcium imaging and lesion experiments amply demonstrate that Purkinje cells are sensitive to TRPV1-mediated currents. We therefore do not believe that the additional information gained by arduous electrophysiological evaluation is merited here.

      - Describe more in detail how climb and dive bouts are defined. The height difference between consecutive bouts measured 250ms before the bout of executions.

      Climb and dive bouts are split by the angle of their trajectory. If the fish moves up (i.e. trajectory angle greater than 0), the bout is considered a climb bout, and vice versa for dive bouts. 250 ms prior to the maximum speed is roughly the time at which the fish initiates a bout, so the pre-bout posture is measured at this point. The time-courses of bouts are dissected extensively in Zhu et al. 2023. We have added a definition for climb and dive bouts to the Methods section under ‘Behavior analysis’, lines 453 and 454.
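
      The definition above amounts to a sign test on the trajectory angle plus a fixed look-back from peak speed. A minimal sketch, under stated assumptions: the function names are hypothetical, a trajectory of exactly 0 is arbitrarily labelled a dive (the text does not say how ties are handled), and the frame rate is a parameter rather than a value taken from the apparatus.

```python
def classify_bout(trajectory_deg):
    """Label a bout by the sign of its trajectory angle (degrees, positive = upward)."""
    # Assumption: a trajectory of exactly 0 is treated as a dive.
    return "climb" if trajectory_deg > 0 else "dive"


def pre_bout_index(peak_speed_index, frame_rate_hz, lead_s=0.25):
    """Frame index at which pre-bout posture is read: lead_s seconds before peak speed."""
    offset = round(lead_s * frame_rate_hz)
    # Clamp to the first frame if the recording starts less than lead_s before the peak.
    return max(peak_speed_index - offset, 0)
```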

      - Figure 1H: Why can't you ablate all Purkinje cells but only about 80%?

      This is an excellent question. We opted for an extremely conservative count, and included everything that still resembled a cell, even if it might not be functional or was already dying. Our counts are therefore likely an underestimate of the percentage of cells that were lost. We have added this point to the text in lines 393-395.

      - Figure 2C: The method is not fully clear. At 8dpf 0.1uM capsaicin is added to the chamber. At what time after the application of capsaicin did the behavioral recording start?

      We started recording about 10-15 min after adding the 1 µM Csn to the chambers. The fish were fed after 6 h in capsaicin. We have added this information to the Methods section, lines 404-408.

      - Figure 2F: What indicates the shown confidence interval? Also median with a 95% confidence interval calculated over the experiments in parallel?

      The distributions shown in Figure 2F take data from all experiments pooled. We use resampling methods to determine the variability in our estimates. The distribution plots show the median and the 25th and 75th percentiles of the resampled distribution. We have added this information to the figure legends.
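
      One common way to obtain such a resampled distribution is a bootstrap: draw the pooled observations with replacement many times, recompute the statistic each time, and summarize the resulting estimates. The sketch below assumes that scheme; the exact resampling procedure, the number of resamples, and the function name are assumptions rather than details given in the response.

```python
import random
import statistics


def bootstrap_summary(values, statistic=statistics.median, n_resamples=1000, seed=0):
    """Return the 25th, 50th and 75th percentiles of a bootstrapped statistic."""
    rng = random.Random(seed)
    n = len(values)
    # Each resample draws n observations with replacement from the pooled data.
    estimates = [statistic(rng.choices(values, k=n)) for _ in range(n_resamples)]
    q25, q50, q75 = statistics.quantiles(estimates, n=4, method="inclusive")
    return q25, q50, q75
```

      Resampling pooled observations treats them as exchangeable, which is why the separate check on between-repeat variability reported earlier in this response matters.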

      - Figure 3: Subtitles on panel D and E indicating <climb bout posture> and would facilitate reading.

      We have added the subtitles to those panels.

      - Figure 4: Describe in the methods how recordings from individual fish were mapped onto each other to superimpose the Purkinje cell locations recorded from the 8 fish.

      We have added the respective section to the methods: Line 481 - 483

      “To map the anatomical locations of the recorded cells, we imaged overview stacks for each fish. These stacks were manually aligned in Illustrator, and the cells included in the analysis were reidentified and color-coded according to their tuning properties.”

      Reviewer #3 (Recommendations For The Authors):

      Major points:

      (1) Lines 74-81. The data presented here and in later experiments to argue for an effect of capsaicin on neural activity lacks statistical rigor because of the apparently very small numbers of animals/cells assessed. For example, the control appears to involve 4 cells assessed from 1 animal, and the experimental group is just 2 animals. Given that the interpretation of the paper depends upon this result, it is worthwhile to show the result more clearly, and with some statistical analysis. They argue in the discussion that "Our imaging assay established that 1 µM of capsaicin would stochastically activate subsets of Purkinje cells" which seems a stretch from the data as presented.

      We appreciate this point, which was shared by Reviewer 1. We have added more data and performed statistical analysis (lines 63-67 as well as Figure S1A).

      (2) I found the practice of sorting effects by a mixture of effect size and p-value to be a little arbitrary, although in this case, it seems likely that it identified the most relevant effects. I would have preferred to see some attempt to correct for multiple comparisons (e.g. by resampling with the identities of fish shuffled to estimate the distribution of each measurement for this population size), followed by filtering for effect size after establishing a corrected threshold for significance.

      We take the Reviewer’s point, though we note that critical values for effect size and p-value are inevitably “a little arbitrary.” We can’t do the exact analysis the Reviewer suggests as we do not measure data from individual fish for these experiments. However, we did calculate new critical p-values (added to the Tables) that account for multiple comparisons using Šidák’s method.
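
      For reference, Šidák's method sets the per-comparison critical p-value to 1 - (1 - alpha)^(1/m) for m independent comparisons at a family-wise level alpha. A minimal sketch follows; the function name is hypothetical, and the alpha and m used in any call are placeholders, since the response does not list them.

```python
def sidak_critical_p(alpha, n_comparisons):
    """Per-comparison critical p-value that keeps the family-wise error rate at alpha."""
    return 1.0 - (1.0 - alpha) ** (1.0 / n_comparisons)
```

      The result is always slightly less strict than the Bonferroni value alpha / m, and the two converge as alpha becomes small.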

      (3) Figure 4. The data here is a little strange in that the slope in the control condition for medium speed is given as much larger than for slow, but the data in the two cases appears largely overlapping for most of the range of behavior, only diverging for the most extreme rotations. It seems perhaps that the measurement of slope is strongly dependent on these most extreme values. The authors might want to consider the use of robust regression methods which might mitigate these effects.

      This is an interesting observation and we appreciate the Reviewer’s thoughtful suggestion. We now use a robust regression method (bisquare weighting of residuals).

      We have adjusted all values in lines 175-177 and added the regression method to the Methods section, line 520.
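
      To show what bisquare weighting does to extreme values, here is a small sketch of a straight-line fit by iteratively reweighted least squares with Tukey bisquare weights. It is not the authors' implementation: the function name, the tuning constant 4.685, the MAD-based scale estimate, and the fixed iteration count are conventional choices assumed for illustration.

```python
import statistics


def bisquare_fit(x, y, c=4.685, n_iter=50):
    """Fit y = a + b*x by iteratively reweighted least squares with bisquare weights."""
    w = [1.0] * len(x)  # the first pass is ordinary least squares
    a = b = 0.0
    for _ in range(n_iter):
        sw = sum(w)
        mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
        my = sum(wi * yi for wi, yi in zip(w, y)) / sw
        sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
        sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
        b = sxy / sxx
        a = my - b * mx
        resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
        # Robust scale: median absolute residual, rescaled for normal consistency.
        scale = statistics.median(abs(r) for r in resid) / 0.6745
        if scale == 0.0:
            break  # at least half of the points are fitted exactly
        limit = c * scale
        # Bisquare weights: smooth down-weighting, zero beyond the cutoff.
        w = [(1 - (r / limit) ** 2) ** 2 if abs(r) < limit else 0.0 for r in resid]
    return a, b
```

      Points whose residuals exceed the cutoff receive zero weight, so a handful of extreme rotations cannot dominate the slope estimate.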

      (4) Figure 5. The 'principal component analysis' description is extremely unclear. The text says that PCA 'showed near-complete segregation of trial types' but it is not explained how this was achieved with PCA or how this was quantified. Figure panels show the data plotted using different pairs of PCs showing visual evidence of segregation. In the methods, it is stated that "We performed principal component analysis" and that "cells were used for principal component analysis and subsequent support vector machine decoding analysis". What is meant exactly by 'performed PCA'? Was PCA used in a dimensionality reduction step? And if so, how many and which PCs were chosen and why? For visualization of the separation, the authors show arbitrary pairs of PCs. Could it be better to use a method more suited to that purpose such as linear discriminant analysis?

      PCA was used to define a subspace to qualitatively evaluate if different trials could be separated. Once it became clear that they could, we next trained a binary decoder on the complete dataset (i.e. no dimensionality reduction). We did not perform linear discriminant analysis as the unsupervised PCA already showed separation of trial types. We have made this clearer in lines 212-214.

      (5) Why does the decoding analysis use only untuned cells? Isn't it equally, or more, interesting to know how well tilt can be encoded using all cells? It is unclear to me what we learn by selecting only untuned cells for this analysis (although I agree it is interesting that this does work).

      We focused exclusively on untuned cells because including even a single highly tuned cell in the population decoding leads to excellent results. By using untuned cells, we test whether there is directionality information that is not visible just by looking at the up/down responses of single cells. We have made this clear in lines 217-218.

      Minor points and corrections:

      (1) Maybe consider losing the words 'powerful' (I think it is overused and not well defined) and 'reagent'. Reagent is normally used for something that participates in a reaction. It is a bit odd to use it to refer to a transgenic animal. Later it is called a 'tool' which seems better.

      We have changed the wording and refer to it as a tool throughout the paper.

      (2) Figure 1D. Please use a color bar to indicate the scale.

      We have added a color scale to the panel.

      (3) Saying that 'posture' increases is confusing, although the meaning can be inferred from the overall context and the definitions in the Methods - could Posture be capitalized to indicate a specific definition is being used rather than the general meaning?

      This suggestion agrees with those made by Reviewer 2. We have changed the wording to “postural angle.” 

      (4) The arrowheads in Figure 2FHK are unnecessary and confusing (why are some horizontal and some vertical?).

      Thank you for that suggestion; we have removed the arrowheads.

      (5) Figure 3 The legend should indicate that the image is shown with an inverted lookup table.

      We have updated the legend.

      (6) Figure 3 D and E Titles would be helpful, so it is not necessary to refer to the legend to understand the difference.

      We have added titles to the figure panels.

      (7) The dwell time for the 2-photon experiments is given in the manuscript, but I think the authors meant microseconds?

      Thank you for pointing that out. We have corrected it to microseconds.

    1. Author response:

      The following is the authors’ response to the original reviews.

      We performed multiple new experiments and analyses in response to the reviewers’ concerns, and incorporated the results of these analyses in the main text and in multiple substantially revised or new figures. Before embarking on a point-by-point reply to the reviewers’ concerns, we here briefly summarize our most important revisions.

      First, we addressed a concern shared by Reviewers #1-3 about a lack of information about our DNA sequences. To this end, we redesigned multiple figures (Figures 3, 4, 5, S8, S9, S10, S11, and S12) to include the DNA sequences of each tested promoter, the specific mutations that occurred in it, the resulting changes in position-weight-matrix (PWM) scores, and the spacing between promoter motifs. Second, Reviewers #1 and #2 raised concerns about a lack of validation of our computational predictions and the resulting incompleteness of the manuscript. To address this issue, we engineered 27 reporter constructs harboring specific mutations, and experimentally validated our computational predictions with them. Third, we expanded our analysis to study how a more complete repertoire of other sigma 70 promoter motifs such as the UP-element and the extended -10 / TGn motif affects gene expression driven by the promoters we study. Fourth, we addressed concerns by Reviewer #3 about the role of the Histone-like nucleoid-structuring protein (H-NS) in promoter emergence and evolution. We did this by performing both experiments and computational analyses, which are now shown in the newly added Figure 5. Fifth, to satisfy Reviewer #3’s concerns about missing details in the Discussion, we have rewritten this section, adding additional details and references. 

      We next describe these and many other changes in a point-by-point reply to each reviewer’s comments. In addition, we append a detailed list of changes to each section and figure to the end of this document.

      Reviewer #1 (Public Review):

      Summary:

      This study by Fuqua et al. studies the emergence of sigma70 promoters in bacterial genomes. While there have been several studies to explore how mutations lead to promoter activity, this is the first to explore this phenomenon in a wide variety of backgrounds, which notably contain a diverse assortment of local sigma70 motifs in variable configurations. By exploring how mutations affect promoter activity in such diverse backgrounds, they are able to identify a variety of anecdotal examples of gain/loss of promoter activity and propose several mechanisms for how these mutations interact within the local motif landscape. Ultimately, they show how different sequences have different probabilities of gaining/losing promoter activity and may do so through a variety of mechanisms.

      We thank Reviewer #1 for taking the time to read and provide critical feedback on our manuscript. Their summary is fundamentally correct.

      Major strengths and weaknesses of the methods and results:

      This study uses Sort-Seq to characterize promoter activity, which has been adopted by multiple groups and shown to be robust. Furthermore, they use a slightly altered protocol that allows measurements of bi-directional promoter activity. This combined with their pooling strategy allows them to characterize expressions of many different backgrounds in both directions in extremely high throughput which is impressive! A second key approach this study relies on is the identification of promoter motifs using position weight matrices (PWMs). While these methods are prone to false positives, the authors implement a systematic approach which is standard in the field. However, drawing these types of binary definitions (is this a motif? yes/no) should always come with the caveat that gene expression is a quantitative trait that we oversimplify when drawing boundaries.

      The point is well-taken. To clarify this and other issues, we have added a section on the limitations of our work to the Discussion. Within this section we include the following sentences (lines 675-680):

      “Additionally, future studies will be necessary to address the limitations of our own work. First, we use binary thresholding to determine i) the presence or absence of a motif, ii) whether a sequence has promoter activity or not, and iii) whether a part of a sequence is a hotspot or not. While chosen systematically, the thresholds we use for these decisions may cause us to miss subtle but important aspects of promoter evolution and emergence.”

      Their approach to randomly mutagenizing promoters allowed them to find many anecdotal examples of different types of evolutions that may occur to increase or decrease promoter activity. However, the lack of validation of these phenomena in more controlled backgrounds may require us to further scrutinize their results. That is, their explanations for why certain mutations lead or obviate promoter activity may be due to interactions with other elements in the 'messy' backgrounds, rather than what is proposed.

      Thank you for raising this important point. To address it, we have conducted extensive new validation experiments for the newest version of this manuscript. For the “anecdotal” examples you described, we created 27 reporter constructs harboring the precise mutation that leads to the loss or gain of gene expression, and validated its ability to drive gene expression. The results from these experiments are in Figures 3, 4, 5, and Supplemental Figures S8-S11, and are labeled with a ′ (prime) symbol.

      These experiments not only confirm the increases and decreases in fluorescence that our analysis had predicted. They also demonstrate, with the exception of two (out of 27) false-positive discoveries, that background mutations do not confound our analysis. We mention these two exceptions (lines 364-367):

      “In two of these hotspots, our validation experiments revealed no substantial difference in gene expression as a result of the hotspot mutation (Fig S8F′ and Fig S8J′). In both of these false positives, new -10 boxes emerge in locations without an upstream -35 box.”

      An appraisal of whether the authors achieved their aims, and whether the results support their conclusions:

      The authors express a key finding that the specific landscape of promoter motifs in a sequence affects the likelihood that local mutations create or destroy regulatory elements. The authors have described many examples, including several that are non-obvious, and show convincingly that different sequence backgrounds have different probabilities for gaining or losing promoter activity. While this overarching conclusion is supported by the manuscript, the proposed mechanisms for explaining changes in promoter activity are not sufficiently validated to be taken for absolute truth. There is not sufficient description of the strength of emergent promoter motifs or their specific spacings from existing motifs within the sequence. Furthermore, they do not define a systematic process by which mutations are assigned to different categories (e.g. box shifting, tandem motifs, etc.) which may imply that the specific examples are assigned based on which is most convenient for the narrative.

      To summarize, Reviewer #1 criticizes the following three aspects of our work in this comment. 1) The mechanisms we proposed are not sufficiently validated. 2) The description of motifs, spacing, and PWM scores are not shown. 3) How mutations are classified into different categories (i.e. box-shifting, tandem motifs, etc.) is not systematically defined. 

      These are all valid criticisms. In response, we performed an extensive set of follow-up experiments and analyses, and redesigned the majority of the figures. Here is a more detailed response to each criticism:

      (1) Proposed mechanisms for explaining changes in promoter activity are not sufficiently validated. We engineered 27 reporter constructs harboring the specific mutations in the parents that we had predicted to change promoter activity. For each, we compared their fluorescence levels with their wild-type counterpart. The results from these experiments are in Figures 3, 4, 5, and Supplemental Figures S8, S9, S10, S11, and S12, and are labeled with a ′ (prime) symbol.

      (2) No sufficient description of the strength of emergent promoter motifs or their specific spacings. We redesigned the figures to include the DNA sequences of the parent sequences, as well as the degenerate consensus sequences for each mutation. We additionally now highlight the specific motif sequences, their respective PWM scores, and by how much the score changes upon mutation. Finally, we annotated the spacing of motifs. These changes are in Figures 3, 4, 5, and Supplemental Figures S8, S9, S10, S11, and S12.

      We note that in many cases, high-scoring PWM hits for the same motif can overlap (i.e. two -10 motifs or two -35 motifs overlap). Additionally, the proximity of a -35 and -10 box does not guarantee that the two boxes are interacting. Together, these two facts can result in an ambiguity of the spacer size between two boxes. To avoid any reporting bias, we thus often report spacer sizes as a range (see Figure panels 4F, S8D, S8F-L, S9A, S9H, S10A, and S10E). The smallest spacer we annotate is in Figure 4F with 10 bp, and the largest is in Figure S8D with 26 bp. Any more “extreme” distances are not annotated, and it is left to the reader to decide whether an interaction is present.
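
      The ambiguity described above can be made concrete: when several overlapping hits exist for each box, every -35/-10 pairing gives a different spacer, and the honest summary is a range. The sketch below is illustrative only. The function name, the 6 bp box length, the use of 0-based start coordinates, and the reuse of the 10 bp and 26 bp annotation limits as filter defaults are all assumptions.

```python
BOX_LENGTH = 6  # assumed hexamer length for both the -35 and the -10 box


def spacer_range(minus35_starts, minus10_starts, min_spacer=10, max_spacer=26):
    """Return (smallest, largest) spacer in bp over all admissible -35/-10 pairings.

    Start positions are 0-based coordinates of the first base of each PWM hit.
    Returns None if no pairing falls within the allowed spacer limits.
    """
    spacers = sorted(
        {
            m10 - (m35 + BOX_LENGTH)
            for m35 in minus35_starts
            for m10 in minus10_starts
            if min_spacer <= m10 - (m35 + BOX_LENGTH) <= max_spacer
        }
    )
    return (spacers[0], spacers[-1]) if spacers else None
```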

      (3) No systematic process by which mutations are assigned to different categories such as box shifting, tandem motifs, etc. We opted to reformulate these categories completely, because the phenotypic effects of a previously mentioned “tandem motif” were actually a byproduct of H-NS repression (see the newly added Figure S12).

      We also agree that the categories were ambiguous. We now introduce two terms: homo-gain and hetero-gain of -10 and -35 boxes. The manuscript now clearly defines these terms, and the relevant passage now reads as follows (lines 430-435): 

      “We found that these mutations frequently create new boxes overlapping those we had identified as part of a promoter (Fig S9). This occurs when mutations create a -10 box overlapping a -10 box, a -35 box overlapping a -35 box, a -10 box overlapping a -35 box, or a -35 box overlapping a -10 box. We call the resulting event a “homo-gain” when the new box is of the same type as the one it overlaps, and otherwise a “hetero-gain”. In either case, the creation of the new box does not always destroy the original box.”
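
      The two terms defined in this passage reduce to a comparison of box types. A minimal sketch, with a hypothetical function name and string labels chosen for illustration:

```python
def classify_gain(new_box, overlapped_box):
    """Classify a newly created box by the type of the existing box it overlaps."""
    valid = {"-10", "-35"}
    if new_box not in valid or overlapped_box not in valid:
        raise ValueError("box types must be '-10' or '-35'")
    # Same type as the overlapped box: homo-gain; different type: hetero-gain.
    return "homo-gain" if new_box == overlapped_box else "hetero-gain"
```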

      Impact of the work on the field, and the utility of the methods and data to the community:

      From this study, we are more aware of different types of ways promoters can evolve and devolve, but do not have a better ability to predict when mutations will lead to these effects. Recent work in the field of bacterial gene regulation has raised interest in bidirectional promoter regions. While the authors do not discuss how mutations that raise expression in one direction may affect another, they have created an expansive dataset that may enable other groups to study this interesting phenomenon. Also, their variation of the Sort-Seq protocol will be a valuable example for other groups who may be interested in studying bidirectional expression. Lastly, this study may be of interest to groups studying eukaryotic regulation as it can inform how the evolution of transcription factor binding sites influences short-range interactions with local regulator elements.

      Any additional context to understand the significance of the work:

      The task of computationally predicting whether a sequence drives promoter activity is difficult. By learning what types of mutations create or destroy promoters from this study, we are better equipped for this task.

      We thank Reviewer #1 again for their time and their thoughtful comments.

      Reviewer #2 (Public Review):

      Summary:

      Fuqua et al investigated the relationship between prokaryotic box motifs and the activation of promoter activity using a mutagenesis sequencing approach. From generating thousands of mutant daughter sequences from both active and non-active promoter sequences they were able to produce a fantastic dataset to investigate potential mechanisms for promoter activation. From these large numbers of mutated sequences, they were able to generate mutual information with gene expression to identify key mutations relating to the activation of promoter island sequences.

      We thank Reviewer #2 for reading and providing a thorough review of our manuscript. 

      Strengths:

      The data generated from this paper is an important resource to address this question of promoter activation. Being able to link the activation of gene expression to mutational changes in previously nonactive promoter regions is exciting and allows the potential to investigate evolutionary processes relating to gene regulation in a statistically robust manner. Alongside this, the method of identifying key mutations using mutual information in this paper is well done and should be standard in future studies for identifying regions of interest.

      Thank you for your kind words.

      Weaknesses:

      While the generation of the data is superb, the focus only on these mutational hotspots removes a lot of the information available to the authors to generate robust conclusions. For instance:

      (1) The linear regression in S5 used to demonstrate that the number of mutational hotspots correlates with the likelihood of a mutation causing promoter activation is driven by three extreme points.

      A fair criticism. In response, we have chosen to remove the analysis of this trend from the manuscript entirely. (Additionally, Pnew and mutual information calculations both relied on the fluorescence scores of daughter sequences, so the finding was circular in its logic.)

      (2) Many of the arguments also rely on the number of mutational hotspots being located near box motifs. The context-dependent likelihood of this occurring is not taken into account given that these sequences are inherently box motif rich. So, something like an enrichment test to identify how likely these hot spots are to form in or next to motifs.

      Another good point. To address it, we carried out a computational analysis where we randomly scrambled the nucleotides of each parent sequence while maintaining the coordinates for each mutual information “hotspot.” This scrambling results in significantly less overlap between hotspots and boxes. This analysis is now depicted in Figure 2C and described in lines 272-296.
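
      The logic of such a scrambling control can be sketched as follows. This is only an illustration under stated assumptions: boxes are detected here by a stand-in rule (at most one mismatch to the hexamers TATAAT or TTGACA) rather than by the PWMs used in the study, and the function names, the number of scrambles, and the empirical p-value are hypothetical choices.

```python
import random

# Stand-in for PWM-based box detection (assumption, not the study's method).
CONSENSUS = ("TATAAT", "TTGACA")


def find_boxes(seq, max_mismatch=1):
    """Return (start, end) intervals of hexamers close to a consensus."""
    boxes = []
    for i in range(len(seq) - 5):
        hexamer = seq[i:i + 6]
        for cons in CONSENSUS:
            if sum(a != b for a, b in zip(hexamer, cons)) <= max_mismatch:
                boxes.append((i, i + 6))
                break
    return boxes


def hotspots_in_boxes(seq, hotspots):
    """Count hotspot positions that fall inside at least one box."""
    boxes = find_boxes(seq)
    return sum(any(s <= h < e for s, e in boxes) for h in hotspots)


def scramble_test(seq, hotspots, n_scrambles=1000, seed=0):
    """Compare the observed hotspot/box overlap with scrambled sequences.

    The hotspot coordinates stay fixed; only the nucleotides are shuffled,
    which preserves base composition but destroys motif positions.
    Returns (observed overlap, mean scrambled overlap, empirical p-value).
    """
    rng = random.Random(seed)
    observed = hotspots_in_boxes(seq, hotspots)
    null = []
    for _ in range(n_scrambles):
        bases = list(seq)
        rng.shuffle(bases)
        null.append(hotspots_in_boxes("".join(bases), hotspots))
    # One-sided empirical p-value with the usual +1 correction.
    p = (1 + sum(x >= observed for x in null)) / (1 + n_scrambles)
    return observed, sum(null) / len(null), p
```

      Because the shuffle preserves each parent's base composition, a lower overlap in the scrambled sequences cannot be attributed to AT-richness alone.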

      (3) The link between changes in expression and mutations in surrounding motifs is assessed with two-sided Mann Whitney U tests. This method assumes that the sequence motifs are independent of one another, but the hotspots of interest occur either in 0, 3, 4, or 5s in sequences. There is therefore no sequence where these hotspots can be independent and the correlation causation argument for motif change on expression is weakened.

      This is a fair criticism and a limitation of the MWU test. To better support our reasoning, we engineered 27 reporter constructs harboring the specific mutations in the parents that we had predicted to change promoter activity. For each, we compared their fluorescence levels with their wild-type counterpart. The results from these experiments are in Figures 3, 4, 5, and Supplemental Figures S8, S9, S10, S11, and S12 and are labeled with a ′ (prime) symbol.

      These experiments not only confirm the increases and decreases in fluorescence that our analysis had predicted; they also demonstrate, with the exception of two (out of 27) false-positive discoveries, that background mutations do not confound our analysis. We mention these two exceptions (lines 364-367):

      “In two of these hotspots, our validation experiments revealed no substantial difference in gene expression as a result of the hotspot mutation (Fig S8F′ and Fig S8J′). In both of these false positives, new -10 boxes emerge in locations without an upstream -35 box.”

      (4) The distance between -10 and -35 was mentioned briefly but not taken into account in the analysis.

      We have now included these spacer distances where appropriate. These changes are in Figures 3, 4, 5, and Supplemental Figures S8, S9, S10, S11, and S12.

      We note that in many cases, high-scoring PWM hits for the same motif can overlap (i.e. two -10 motifs or two -35 motifs overlap). Additionally, the proximity of a -35 and -10 box does not guarantee that the two boxes are interacting. Together, these two facts can make the spacer size between two boxes ambiguous. To avoid any reporting bias, we thus often report spacer sizes as a range (see Figure panels 4F, S8D, S8F-L, S9A, S9H, S10A, and S10E). The smallest spacer we annotate is 10 bp (Figure 4F), and the largest is 26 bp (Figure S8D). We do not annotate more “extreme” distances and leave it to the reader to decide whether an interaction is present.
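
      To illustrate why overlapping hits lead to a spacer range rather than a single value, here is a minimal sketch. It assumes that boxes are 6 bp hexamers given by their 0-based start coordinates and that the spacer is the number of bases between the end of the -35 box and the start of the -10 box; the function name and the coordinates in the example are hypothetical.

```python
def spacer_range(minus35_starts, minus10_starts, box_len=6):
    """Return (min, max) spacer length in bp, or None if no valid pair exists.

    Every -35 hit is paired with every -10 hit that starts at or after the
    end of that -35 hit, so overlapping hits for the same motif produce
    several candidate spacers.
    """
    spacers = [
        s10 - (s35 + box_len)
        for s35 in minus35_starts
        for s10 in minus10_starts
        if s10 >= s35 + box_len  # -35 box must lie fully upstream
    ]
    if not spacers:
        return None
    return min(spacers), max(spacers)
```

      With two overlapping -35 hits starting at positions 10 and 12 and a single -10 hit starting at position 33, `spacer_range([10, 12], [33])` yields a range of 15 to 17 bp.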

      The authors propose mechanisms of promoter activation based on a few observations that are treated independently but occur concurrently. To address this using complementary approaches such as analysis focusing on identifying important motifs, using something like a glm lasso regression to identify significant motifs, and then combining with mutational hotspot information would be more robust.

      This is a great idea, and we pursued it as part of the revision. For each parent sequence, we mapped the locations of all -10 and -35 box motifs in the daughters, then reduced each sequence to a binary representation, either encoding or not encoding these motifs, also referred to as a “hot-encoded matrix.” We subsequently performed a Lasso regression between the hot-encoded matrices and the fluorescence scores of each daughter sequence. The regression then outputs “weights” to each of the motifs in the daughters. The larger a motif’s weight is, the more the motif influences promoter activity. The Author response image 1 describes our workflow.

      Author response image 1.
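
      The encoding and regression steps described above can be sketched in a few lines. This is a didactic coordinate-descent Lasso written from scratch, not the code used for the analysis; the motif identifiers, the objective scaling (1/(2n) times the squared error plus λ times the L1 norm of the weights), and the iteration count are assumptions made for illustration.

```python
def encode(daughters, motif_ids):
    """One row per daughter: 1.0 if the daughter carries the motif, else 0.0."""
    return [[1.0 if m in d else 0.0 for m in motif_ids] for d in daughters]


def soft_threshold(z, lam):
    """Shrink z towards zero by lam; values within [-lam, lam] become 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso(X, y, lam, n_iter=200):
    """Minimise (1/(2n))*||y - Xw - b||^2 + lam*||w||_1 by coordinate descent."""
    n, p = len(X), len(X[0])
    y_mean = sum(y) / n
    col_means = [sum(row[j] for row in X) / n for j in range(p)]
    # Centre features and response so the intercept can be recovered afterwards.
    Xc = [[row[j] - col_means[j] for j in range(p)] for row in X]
    resid = [v - y_mean for v in y]  # residual for w = 0
    col_sq = [sum(Xc[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    w = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # constant column carries no information
            # Correlation of feature j with the residual that excludes feature j.
            rho = sum(Xc[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * w[j]
            new_w = soft_threshold(rho, lam) / col_sq[j]
            delta = new_w - w[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= Xc[i][j] * delta
                w[j] = new_w
    intercept = y_mean - sum(col_means[j] * w[j] for j in range(p))
    return w, intercept
```

      Larger values of λ shrink more weights to exactly zero, which is why the fitted weights can change qualitatively across the range of λ values tested. In practice one would use an established library implementation and select λ by cross-validation.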

      We really wanted this analysis to work, but unfortunately, the computational model does not behave robustly, even when testing multiple values of the hyperparameter lambda (λ), which controls the trade-off between model bias and variance.

      The regression assigns strong weights almost exclusively to -10 boxes, and assigns weak to even negative weights to -35 boxes. While initially exciting, these weights do not consistently align with the results from the 27 constructs with individual mutations that we tested experimentally. This ultimately suggests that the regression is overfitting the data.

      We do think a LASSO-regression approach can be applied to explore how individual motifs contribute to promoter activity. However, effectively implementing such a method would require a substantially more complex analysis. We respectfully believe that such an approach would distract from the current narrative, and would be more appropriate for a computational journal in a future study. 

      Because this analysis was inconclusive, we have not made it part of the revised manuscript. However, we hope that our 27 experimentally validated new constructs with individual mutations are sufficient to address the reviewer’s concerns regarding independent verification of our computational predictions.

      Other elements known to be involved in promoter activation including TGn or UP elements were not investigated or discussed.

      Thank you for highlighting this potentially important oversight. In response, we have performed two independent analyses to explore the role of TGn in promoter emergence in evolution. First, we computationally searched for -10 boxes with the bases TGn immediately upstream of them in the parent sequences, and found 18 of these “extended -10 boxes” in the parents (lines 143-145):

      “On average, each parent sequence contains ~5.32 -10 boxes and ~7.04 -35 boxes (Fig S1). 18 of these -10 boxes also include the TGn motif upstream of the hexamer.”
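
      A search for extended -10 boxes of this kind can be sketched as below, assuming that the start coordinate of each -10 hexamer is already known (for example from a PWM scan) and that “immediately upstream” means the three bases directly preceding the hexamer. The function names are hypothetical.

```python
def has_tgn_upstream(seq, box_start):
    """True if the three bases directly upstream of a -10 hexamer read T, G, any.

    box_start is the 0-based index of the first base of the hexamer.
    """
    if box_start < 3:
        return False  # not enough upstream sequence to hold TGn
    return seq[box_start - 3:box_start - 1].upper() == "TG"


def count_extended_minus10(seq, box_starts):
    """Count the -10 boxes in seq that carry a TGn motif directly upstream."""
    return sum(has_tgn_upstream(seq, s) for s in box_starts)
```

      For instance, in the toy sequence `AATGCTATAATGG` the hexamer TATAAT starts at index 5 and is preceded by TGC, so it counts as an extended -10 box.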

      However, only 20% of these boxes were found in parents with promoter activity (lines 182-185):

      “We also note that 30% (15/50) of parents have the TGn motif upstream of a -10 box, but only 20% (3/15) of these parents have promoter activity (underlined with promoter activity: P4-RFP, P6-RFP, P8-RFP, P9-RFP, P10-RFP, P11-GFP, P12-GFP, P17-GFP, P18-GFP, P18-RFP, P19-RFP, P22-RFP, P24-GFP, P25-GFP, P25-RFP).”

      Second, we computationally searched through all of the daughter sequences to identify new -10 boxes with TGn immediately upstream. We found 114 -10 boxes with the bases TGn upstream. However, only 5 new -10 boxes (2 with TGn) were associated with increasing fluorescence (lines 338-345):

      “On average, 39.5 and 39.4 new -10 and -35 boxes emerged at unique positions within the daughter sequences of each mutagenized parent (Fig 3A,B), with 1’562 and 1’576 new locations for -10 boxes and -35 boxes, respectively. ~22% (684/3’138) of these new boxes are spaced 15-20 bp away from their cognate box, and ~7.3% (114/1’562) of the new -10 boxes have the TGn motif upstream of them. However, only a mere five of the new -10 boxes and four of the new -35 boxes are significantly associated with increasing fluorescence by more than +0.5 a.u. (Fig 3C,D).”

      In addition, we now study the role of UP elements. This analysis showed that the UP element plays a negligible role in promoter emergence within our dataset. It is discussed in a new subsection of the results (lines 591-608).

      Collectively, these additional analyses suggest that the presence of TGn plus a -10 box is insufficient to create promoter activity, and that the UP element does not play a significant role in promoter emergence or evolution.

      Reviewer #3 (Public Review):

      Summary:

      Like many papers in the last 5-10 years, this work brings a computational approach to the study of promoters and transcription, but unfortunately disregards or misrepresents much of the existing literature and makes unwarranted claims of novelty. My main concerns with the current paper are outlined below although the problems are deeply embedded.

      We thank Reviewer #3 for taking the time to review this manuscript. We have made extensive changes to address their concerns about our work.

      Strengths:

      The data could be useful if interpreted properly, taking into account i) the role of translation ii) other promoter elements, and iii) the relevant literature.

      Weaknesses:

      (1) Incorrect assumptions and oversimplification of promoters.

      - There is a critical error on line 68 and Figure 1A. It is well established that the -35 element consensus is TTGACA but the authors state TTGAAA, which is also the sequence represented by the sequence logo shown and so presumably the PWM used. It is essential that the authors use the correct -35 motif/PWM/consensus. Likely, the authors have made this mistake because they have looked at DNA sequence logos generated from promoter alignments anchored by either the position of the -10 element or transcription start site (TSS), most likely the latter. The distance between the TSS and -10 varies. Fewer than half of E. coli promoters have the optimal 7 bp separation with distances of 8, 6, and 5 bp not being uncommon (PMID: 35241653). Furthermore, the distance between the -10 and -35 elements is also variable (16, 17, and 18 bp spacings are all frequently found, PMID: 6310517). This means that alignments, used to generate sequence logos, have misaligned -35 hexamers. Consequently, the true consensus is not represented. If the alignment discrepancies are corrected, the true consensus emerges. This problem seems to permeate the whole study since this obviously incorrect consensus/motif has been used throughout to identify sequences that resemble -35 hexamers.

      We respectfully but strongly disagree that our analysis has misrepresented the true nature of -35 boxes. First, accounting for more A’s at position 5 in the PWM is not going to lead to a “critical error.” This is because positions 4-6 of the motif barely have any information content (bits) compared to positions 1-3 (see Fig 1A). This assertion is not just based on our own PWM, but based on ample precedent in the literature. In PMID 14529615, TTG is present in 38% of all -35 boxes, but ACA only in 8%. In PMID 29388765, with the -10 instance TATAAT, the -35 instance TTGCAA yields stronger promoters compared to the -35 instance TTGACA (See their Figure 3B).

      In PMID 29745856 (Figure 2), the most information content lies in positions 1-3, with the A and C at position 5 both nearly equally represented, as in our PWM. In PMID 33958766 (Figure 1) an experimentally-derived -35 box is even reduced to a “partial” -35 box which only includes positions 1 and 2, with consensus: TTnnnn.

      In addition, we did not derive the PWMs as the reviewer describes. The PWMs we use are based on computational predictions that are in excellent agreement with experimental results. Specifically, the PWMs we use are from PMID 29728462, which acquired 145 -10 and -35 box sequences from the top 3.3% of computationally predicted boxes from RegulonDB. See PMID 14529615 for the computational pipeline that was used to derive the PWMs, which independently aligns the -10 and -35 boxes to create the consensus sequences. The -35 PWM correlates significantly and strongly with an experimentally derived -35 box (see Figure S4 in the Supporting Information of Belliveau et al., PNAS 2017; Pearson correlation coefficient = 0.89). Within the 145 -35 boxes, the exact consensus sequence (TTGACA) that Reviewer #3 is concerned about is present 6 times in our matrix, and has a PWM score above the significance threshold. In other words, TTGACA is classified as a -35 box in our dataset.
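
      The argument that positions 4-6 contribute little to a PWM score can be made concrete with a toy calculation. The position frequencies below are invented to mimic the qualitative pattern described above (high information at positions 1-3, low information at positions 4-6); they are not the matrix used in the study, and the uniform background of 0.25 is likewise an assumption.

```python
import math

# Hypothetical position frequencies for a -35 box (NOT the study's matrix).
FREQS = [
    {"A": 0.05, "C": 0.05, "G": 0.05, "T": 0.85},
    {"A": 0.05, "C": 0.05, "G": 0.05, "T": 0.85},
    {"A": 0.10, "C": 0.05, "G": 0.75, "T": 0.10},
    {"A": 0.50, "C": 0.20, "G": 0.10, "T": 0.20},
    {"A": 0.40, "C": 0.35, "G": 0.10, "T": 0.15},
    {"A": 0.50, "C": 0.15, "G": 0.10, "T": 0.25},
]
BACKGROUND = 0.25  # assumed uniform base composition


def information_content(column):
    """Information content of one PWM column in bits (maximum 2 for DNA)."""
    return 2.0 + sum(p * math.log2(p) for p in column.values() if p > 0)


def pwm_score(hexamer):
    """Log-odds score of a hexamer against the uniform background."""
    return sum(math.log2(FREQS[i][base] / BACKGROUND)
               for i, base in enumerate(hexamer))


if __name__ == "__main__":
    for seq in ("TTGACA", "TTGAAA", "ATGACA"):
        print(seq, round(pwm_score(seq), 2))
```

      With these toy frequencies, swapping C for A at position 5 changes the score far less than a single mismatch at position 1, which is the qualitative point made above about the low information content of positions 4-6.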

      We now provide DNA sequences for each of the figures to improve accessibility and reproducibility. A reader can now use any PWM or method they wish to interpret the data.

      - An uninformed person reading this paper would be led to believe that prokaryotic promoters have only two sequence elements: the -10 and -35 hexamers. This is because the authors completely ignore the role of the TG motif, UP element, and spacer region sequence. All of these can compensate for the lack of a strong -35 hexamer and it's known that appending such elements to a lone -10 sequence can create an active promoter (e.g. PMIDs 15118087, 21398630, 12907708, 16626282, 32297955). Very likely, some of the mutations, classified as not corresponding to a -10 or -35 element in Figure 2, target some of these other promoter motifs.

      Thank you for bringing this oversight to our attention. We have performed two independent analyses to explore the role of TGn in promoter emergence in evolution. First, we computationally searched for -10 boxes with the bases TGn immediately upstream of them in the parent sequences, and found 18 of these “extended -10 boxes” in the parents (lines 143-145):

      “On average, each parent sequence contains ~5.32 -10 boxes and ~7.04 -35 boxes (Fig S1). 18 of these -10 boxes also include the TGn motif upstream of the hexamer.”

      However, only 20% of these boxes were found in parents with promoter activity (lines 182-185):

      “We also note that 30% (15/50) of parents have the TGn motif upstream of a -10 box, but only 20% (3/15) of these parents have promoter activity (underlined with promoter activity: P4-RFP, P6-RFP, P8-RFP, P9-RFP, P10-RFP, P11-GFP, P12-GFP, P17-GFP, P18-GFP, P18-RFP, P19-RFP, P22-RFP, P24-GFP, P25-GFP, P25-RFP).”

      Second, we computationally searched through all of the daughter sequences to identify new -10 boxes with TGn immediately upstream. We found 114 -10 boxes with the bases TGn upstream. However, only 5 new -10 boxes (2 with TGn) were associated with increasing fluorescence (lines 338-345):

      “On average, 39.5 and 39.4 new -10 and -35 boxes emerged at unique positions within the daughter sequences of each mutagenized parent (Fig 3A,B), with 1’562 and 1’576 new locations for -10 boxes and -35 boxes, respectively. ~22% (684/3’138) of these new boxes are spaced 15-20 bp away from their cognate box, and ~7.3% (114/1’562) of the new -10 boxes have the TGn motif upstream of them. However, only a mere five of the new -10 boxes and four of the new -35 boxes are significantly associated with increasing fluorescence by more than +0.5 a.u. (Fig 3C,D).”

      In addition, we now study the role of UP elements. This analysis showed that the UP element plays a negligible role in promoter emergence within our dataset. It is discussed in a new subsection of the results (lines 591-608) and in the newly added Figure S13.

      Collectively, these additional analyses suggest that the presence of TGn plus a -10 box is insufficient to create promoter activity, and that the UP element does not play a significant role in promoter emergence or evolution.

      - The model in Figure 4C is highly unlikely. There is no evidence in the literature that RNAP can hang on with one "arm" in this way. In particular, structural work has shown that sequence-specific interactions with the -10 element can only occur after the DNA has been unwound (PMID: 22136875). Further, -10 elements alone, even if a perfect match to the consensus, are non-functional for transcription. This is because RNAP needs to be directed to the -10 by other promoter elements, or transcription factors. Only once correctly positioned, can RNAP stabilise DNA opening and make sequence-specific contacts with the -10 hexamer. This makes the notion that RNAP may interact with the -10 alone, using only domain 2 of sigma, extremely unlikely.

      This is a valid criticism, and we thank the reviewer for catching this problem. In response, we have removed the model and pertinent figures throughout the entire manuscript.

      (2) Reinventing the language used to describe promoters and binding sites for regulators.

      - The authors needlessly complicate the narrative by using non-standard language. For example, on page 1 they define a motif as "a DNA sequence computationally predicted to be compatible with TF binding". They distinguish this from a binding site "because binding sites refer to a location where a TF binds the genome, rather than a DNA sequence". First, these definitions are needlessly complicated, why not just say "putative binding sites" and "known binding sites" respectively? Second, there is an obvious problem with the definitions; many "motifs" will also be "binding sites". In fact, by the time the authors state their definitions, they have already fallen foul of this conflation; in the prior paragraph they stated: "controlled by DNA sequences that encode motifs for TFs to bind". The same issue reappears throughout the paper.

      We agree that this was needlessly complicated. We now just refer to every sequence we study as a motif. A -10 box is a motif, a -35 box is a motif, a putative H-NS binding site is an H-NS motif, etc. The word “binding site” no longer occurs in the manuscript.

      - The authors also use the terms "regulatory" and "non-regulatory" DNA. These terms are not defined by the authors and make little sense. For instance, I assume the authors would describe promoter islands lacking transcriptional activity (itself an incorrect assumption, see below) as non-regulatory. However, as horizontally acquired sections of AT-rich DNA these will all be bound by H-NS and subject to gene silencing, both promoters for mRNA synthesis and spurious promoters inside genes that create untranslated RNAs. Hence, regulation is occurring.

      Another fair point. We have thus changed the terminology throughout to “promoter” and “non-promoter.”

      - Line 63: "In prokaryotes, the primary regulatory sequences are called promoters". Promoters are not generally considered regulatory. Rather, it is adjacent or overlapping sites for TFs that are regulatory. There is a good discussion of the topic here (PMID: 32665585). 

      We have rewritten this. The sentence now reads (lines 67-69):

      “A canonical prokaryotic promoter recruits the RNA polymerase subunit σ70 to transcribe downstream sequences (Burgess et al., 1969; Huerta and Collado-Vides, 2003; Paget and Helmann, 2003; van Hijum et al., 2009).”

      (3) The authors ignore the role of translation.

      - The authors' assay does not measure promoter activity alone, this can only be tested by measuring the amount of RNA produced. Rather, the assay used measures the combined outputs of transcription and translation. If the DNA fragments they have cloned contain promoters with no appropriately positioned Shine-Dalgarno sequence then the authors will not detect GFP or RFP production, even though the promoter could be making an RNA (likely to be prematurely terminated by Rho, due to a lack of translation). This is known for promoters in promoter islands (e.g. Figure 1 in PMID: 33958766).

      We agree that this is definitely a limitation of our study, which we had not discussed sufficiently. In response, we now discuss this limitation in a new section of the discussion (lines 680-686):

      “Second, we measure protein expression through fluorescence as a readout for promoter activity. This readout combines transcription and translation. This means that we cannot differentiate between transcriptional and post-transcriptional regulation, including phenomena such as premature RNA termination (Song et al., 2022; Uptain and Chamberlin, 1997), post-transcriptional modifications (Mohanty and Kushner, 2006), and RNA-folding from riboswitch-like sequences (Mandal and Breaker, 2004).”

      - In Figure S6 it appears that there is a strong bias for mutations resulting in RFP expression to be close to the 3' end of the fragment. Very likely, this occurs because this places the promoter closer to RFP and there are fewer opportunities for premature termination by Rho.

      The reviewer raises a very interesting possibility. To validate it, we have performed the following analysis. We took the RFP expression values from the 9’934 daughters with single mutations in all 25 parent sequences (P1-RFP, P2-RFP, … P25-RFP), and plotted the location of the single mutation (horizontal axis) against RFP expression (vertical axis) in Author response image 2. 

      Author response image 2.

      The distribution is uniform across the sequences, showing that distance from the RBS is not likely the reason for this observation. Since this analysis was uninformative with respect to distance from the RBS, we chose not to include it in the manuscript.

      (4) Ignoring or misrepresenting the literature.

      - As alluded to above, promoter islands are large sections of horizontally acquired, high AT-content, DNA. It is well known that such sequences are i) packed with promoters driving the expression of RNAs that aren't translated ii) silenced, albeit incompletely, by H-NS and iii) targeted by Rho which terminates untranslated RNA synthesis (PMIDs: 24449106, 28067866, 18487194). None of this is taken into account anywhere in the paper and it is highly likely that most, if not all, of the DNA sequences the authors have used contain promoters generating untranslated RNAs.

      Thank you for pointing out that our original submission was incomplete in this regard. We address these concerns with new analyses, including some new experiments. First, Rho-dependent termination is associated with the RUT motif, which is very rich in cytosines (PMID: 30845912). Given that the AT-content of our sequences is between 65% and 78%, canonical Rho-dependent termination is unlikely. Nevertheless, we computationally searched for Rho-dependent terminators using the available code from PMID: 30845912, and the algorithm did not identify any putative RUTs. Because this analysis was not informative, we did not include it in the paper.

      We analyzed the role of H-NS on promoter emergence and evolution within our dataset using both experimental and computational approaches. These additional analyses are now shown in the newly-added Figure 5 and the newly-added Figure S12. We found that H-NS represses P22-GFP and P12-RFP and affects the bidirectionality of P20. More specifically, to analyze the effects of H-NS, we first compared the fluorescence levels of parent sequences in a Δhns background vs the wild-type (DH5α) background in Figure 5A. We found 6 candidate H-NS targets, with P22-GFP and P12-RFP exhibiting the largest changes in fluorescence (lines 496-506):

      “We plot the fluorescence changes in Fig 5A as distributions for the 50 parents, where positive and negative values correspond to an increase or decrease in fluorescence in the Δhns background, respectively. Based on the null hypothesis that the parents are not regulated by H-NS, we classified outliers in these distributions (1.5 × the interquartile range) as H-NS-target candidates. We refer to these outliers as “candidates” because the fluorescence changes could also result from indirect trans-effects from the knockout (Mattioli et al., 2020; Metzger et al., 2016). This approach identified 6 candidates for H-NS targets (P2-GFP, P19-GFP, P20-GFP, P22-GFP, P12-RFP, and P20-RFP). For GFP, the largest change occurs in P22-GFP, increasing fluorescence ~1.6-fold in the mutant background (two-tailed t-test, p=1.16×10⁻⁸) (Fig 5B). For RFP, the largest change occurs in P12-RFP, increasing fluorescence ~0.5-fold in the mutant background (two-tailed t-test, p=4.33×10⁻¹⁰) (Fig 5B).”
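
      The outlier rule in the quoted passage can be sketched as follows, assuming the common convention that a value is an outlier if it lies more than 1.5 × IQR below the first quartile or above the third quartile. The quartile method (the default of Python's `statistics.quantiles`) and the function name are assumptions; the manuscript may compute quartiles differently.

```python
import statistics


def iqr_outliers(changes, k=1.5):
    """Return the names whose value lies more than k*IQR outside the quartiles.

    changes maps a construct name to its fluorescence change between the
    knockout and the wild-type background.
    """
    values = list(changes.values())
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return sorted(name for name, v in changes.items() if v < low or v > high)
```

      Applied to a dictionary of fluorescence changes, the function returns the constructs that would be flagged as candidates under this convention; as the quoted passage notes, a flagged construct may still reflect an indirect trans-effect rather than direct regulation.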

      We also observed that the Δhns background affected the bidirectionality of P20 (lines 507-511):

      “We note that for template P20, which is a bidirectional promoter, GFP expression increases ~2.6-fold in the Δhns background (two-tailed t-test, p=1.59×10⁻⁶). Simultaneously, RFP expression decreases ~0.42-fold in the Δhns background (two-tailed t-test, p=4.77×10⁻⁴) (Fig S12A). These findings suggest that H-NS also modulates the directionality of P20’s bidirectional promoter through either cis- or trans-effects.”

      We then searched for regions where losing H-NS motifs in hotspots significantly changed fluorescence. We identified 3 motifs in P12-RFP and P22-GFP (lines 522-528):

      “For P22-GFP, a H-NS motif lies 77 bp upstream of the mapped promoter. Mutations which destroy this motif significantly increase fluorescence by +0.52 a.u. (two-tailed MWU test, q=1.07×10⁻³) (Fig 5E). For P12-RFP, one H-NS motif lies upstream of the mapped promoter’s -35 box, and the other upstream of the mapped promoter’s -10 box. Mutations that destroy these H-NS motifs significantly increase fluorescence by +0.53 and +0.51 a.u., respectively (two-tailed MWU test, q=3.28×10⁻⁴⁰ and q=4.42×10⁻⁵⁰) (Fig 5F,G). Based on these findings, we conclude that these motifs are bound by H-NS.”

      We are grateful for the suggestion to look at the role of H-NS in our dataset. Our analysis revealed a more plausible explanation for what we formerly referred to as a “Tandem Motif” in the original submission. Previously, we had shown that in P12-RFP, when a -35 box is created next to the promoter’s -35 box, or a -10 box next to the promoter’s -10 box, expression decreases. These new -10 and -35 boxes, however, also overlap with the two H-NS motifs in P12-RFP. We tested these exact point mutations in reporter plasmids and in the Δhns background, and found that the Δhns background rescues this loss in expression (see Figure S12). This analysis is in the newly added subsection: “The binding of H-NS changes when new -10 and -35 boxes are gained” and can be found at lines 529-563. We summarize the findings in a final paragraph of the section (lines 556-563):

      “To summarize, we present evidence that H-NS represses both P22-GFP and P12-RFP in cis. H-NS also modulates the bidirectionality of P20-GFP/RFP in cis or trans. In P22-GFP, the strongest H-NS motif lies upstream of the promoter. In P12-RFP, the strongest H-NS motifs lie upstream of the -10 and -35 boxes of the promoter. We note that there are 16 additional H-NS motifs surrounding the promoter in P12-RFP that may also regulate P12-RFP (Fig S12G). Mutations in two of these two H-NS motifs can create additional -10 and -35 boxes that appear to lower expression. However, the effects of these mutations are insignificant in the absence of H-NS, suggesting that these mutations actually modulate H-NS binding.”

      We also agree that the majority of these sequences are likely driving the expression of many untranslated RNAs (see Purtov et al., 2014). We thus now define a promoter more carefully as follows (lines 113-119):

      “In this study, we define a promoter as a DNA sequence that drives the expression of a (fluorescent) protein whose expression level, measured by its fluorescence, is greater than a defined threshold. We use a threshold of 1.5 arbitrary units (a.u.) of fluorescence. This definition does not distinguish between transcription and translation. We chose it because protein expression is usually more important than RNA expression whenever natural selection acts on gene expression, because it is the primary phenotype visible to natural selection (Jiang et al., 2023).” 

      We also state this as a limitation of our study in the Discussion (lines 680-686):

      “Second, we measure protein expression through fluorescence as a readout for promoter activity. This readout combines transcription and translation. This means that we cannot differentiate between transcriptional and post-transcriptional regulation, including phenomena such as premature RNA termination (Song et al., 2022; Uptain and Chamberlin, 1997), post-transcriptional modifications (Mohanty and Kushner, 2006), and RNA-folding from riboswitch-like sequences (Mandal and Breaker, 2004).”

      - The authors state that GC content does not correlate with the emergence of new promoters. It is known that GC content does correlate to the emergence of new promoters because promoters are themselves AT-rich DNA sequences (e.g. see Figure 1 of PMID: 32297955). There are two reasons the authors see no correlation in this work. First, the DNA sequences they have used are already very AT-rich (between 65 % and 78 % AT-content). Second, they have only examined a small range of different AT-content DNA (i.e. between 65 % and 78 %). The effect of AT-content on promoter emergence is most clearly seen between AT-content of between around 40 % and 60 %. Above that level, the strong positive correlation plateaus.

      We respectfully disagree that the reviewer’s point is pertinent. The reviewer is referring to the likelihood that a sequence is a promoter, which indeed increases with AT-content, whereas we focus on the likelihood that a sequence becomes a promoter through DNA mutation. We note that if a DNA sequence is more AT-rich, then it is more likely to have -10 and -35 boxes, because their consensus sequences are also AT-rich. However, H-NS and other transcriptional repressors also bind to AT-rich sequences. This could also explain the saturation observed above 60% AT-content in PMID 32297955. Perhaps we can address this trend in future work.

      - Once these authors better include and connect their results to the previous literature, they can also add some discussion of how previous papers in recent years may have also missed some of this important context.

      We apologize for this oversight. We have rewritten the Discussion section to include the following points below. Many of the newly added references come from the group of David Grainger, who works on H-NS repression, bidirectional promoters, promoter emergence, promoter motifs, and spurious transcription in E. coli. More specifically:

      (1) The role of pervasive transcription and the likelihood of promoter emergence (lines 614-621):

      “Instead, we present evidence that promoter emergence is best predicted by the level of background transcription each non-promoter parent produces, a phenomenon also referred to as “pervasive transcription” (Kapranov et al., 2007).

      From an evolutionary perspective, this would suggest that sequences that produce such pervasive transcripts – including the promoter islands (Panyukov and Ozoline, 2013) and the antisense strand of existing promoters (Dornenburg et al., 2010; Warman et al., 2021), may have a proclivity for evolving de-novo promoters compared to other sequences (Kapranov et al., 2007; Wade and Grainger, 2014).”

      (2) How our results contradict the findings from Bykov et al., 2020 (lines 622-640):

      “A previous study randomly mutagenized the appY promoter island upstream of a GFP reporter, and isolated variants with increased and decreased GFP expression. The authors found that variants with higher GFP expression acquired mutations that 1) improve a -10 box to better match its consensus, and simultaneously 2) destroy other -10 and -35 boxes (Bykov et al., 2020). The authors concluded that additional -10 and -35 boxes repress expression driven by promoter islands. Our data challenge this conclusion in several ways. 

      First, we find that only ~13% of -10 and -35 boxes in promoter islands actually contribute to promoter activity. Extrapolating this percentage to the appY promoter island, ~87% (100% - 13%) of the motifs would not be contributing to its activity. Assuming the appY promoter island is not an outlier, this would suggest that during random mutagenesis, these inert motifs might have accumulated mutations that do not change fluorescence. Indeed, Bykov et al. (2020) also found that a similar frequency of -10 and -35 boxes were destroyed in variants selected for lower GFP expression, which supports this argument. Second, we find no evidence that creating a -10 or -35 box lowers promoter activity in any of our 50 parent sequences. Third, we also find no evidence that destruction of a -10 or -35 box increases promoter activity without plausible alternative explanations, i.e. overlap of the destroyed box with an H-NS site, destruction of the promoter, or simultaneous creation of another motif as a result of the destruction. In sum, -10 and -35 boxes are not likely to repress promoter activity.”

      (3) How other sequence features besides the -10 and -35 boxes may influence promoter emergence and activity (lines 661-671):

      “These findings suggest that we are still underestimating the complexity of promoters. For instance, the -10 and -35 boxes, extended -10, and the UP-element may be one of many components underlying promoter architecture. Other components may include flanking sequences (Mitchell et al., 2003), which have been observed to play an important role in eukaryotic transcriptional regulation (Afek et al., 2014; Chiu et al., 2022; Farley et al., 2015; Gordân et al., 2013). Recent studies on E. coli promoters even characterize an AT-rich motif within the spacer sequence (Warman et al., 2020), and other studies use longer -10 and -35 box consensus sequences (Lagator et al., 2022). Another possibility is that there is much more transcriptional repression in the genome than anticipated (Singh et al., 2014). This would also coincide with the observed repression of H-NS in P22-GFP and P12-RFP, and accounts of H-NS repression in the full promoter island sequences (Purtov et al., 2014).”

      (4) The limits of our experimental methodology (lines 675-686):

      “Additionally, future studies will be necessary to address the limitations of our own work. First, we use binary thresholding to determine i) the presence or absence of a motif, ii) whether a sequence has promoter activity or not, and iii) whether a part of a sequence is a hotspot or not. While chosen systematically, the thresholds we use for these decisions may cause us to miss subtle but important aspects of promoter evolution and emergence. Second, we measure protein expression through fluorescence as a readout for promoter activity. This readout combines transcription and translation. This means that we cannot differentiate between transcriptional and post-transcriptional regulation, including phenomena such as premature RNA termination (Song et al., 2022; Uptain and Chamberlin, 1997), post-transcriptional modifications (Mohanty and Kushner, 2006), and RNA-folding from riboswitch-like sequences (Mandal and Breaker, 2004).”

      (5) An updated take-home message (lines 687-694):

      “Overall, our study demonstrates that -10 and -35 boxes neither prevent existing promoters from driving expression, nor do they prevent new promoters from emerging by mutation. It shows how mutations can create new -10 and -35 boxes near or on top of preexisting ones to modulate expression. However, randomly creating a new -10 or -35 box will rarely create a new promoter, even if the new box is appropriately spaced upstream or downstream of a cognate box. Ultimately our study demonstrates that promoter models need to be further scrutinized, and that using mutagenesis to create de-novo promoters can provide new insights into promoter regulatory logic.”

      (5) Lack of information about sequences used and mutations.

      - To properly assess the work any reader will need access to the sequences cloned at the start of the work, where known TSSs are within these sequences (ideally +/- H-NS, which will silence transcription in the chromosomal context but may not when the sequences are removed from their natural context and placed in a plasmid). Without this information, it is impossible to assess the validity of the authors' work.

      Thank you for raising this point. Please see Data S1 for the 25 template sequences (P1-P25) used in this study, and Data S2 for all of the daughter sequences.

      For brevity, we have addressed the reviewer’s request to look at the role of H-NS in their comment (4) “Ignoring or misrepresenting the literature.”

      We do not have information about the predicted transcription start sites (TSS) for the parent sequences because the program which identified them (Platprom) is no longer available. Regardless, having TSS coordinates would not validate or invalidate our findings, since we already know that the promoter islands produce short transcripts throughout their sequences, and we are primarily interested in promoters which can produce complete transcripts.

      - The authors do not account for the possibility that DNA sequences in the plasmid, on either side of the cloned DNA fragment, could resemble promoter elements. If this is the case, then mutations in the cloned DNA will create promoters by "pairing up" with the plasmid sequences. There is insufficient information about the DNA sequences cloned, the mutations identified, or the plasmid, to determine if this is the case. It is possible that this also accounts for mutational hotspots described in the paper.

      We agree that these are important points. To address the criticism that we provided insufficient information, we have now redesigned all our figures to provide this information. Specifically, the figures now include the DNA sequences, their PWM predictions, and the exact mutations that lead to promoter activity. The figures with these changes are Figures 3, 4, 5, and Supplemental Figures S8, S9, S10, S11, and S12. We now also provide more details about pMR1 in a new section of the methods (lines 740-748):

      “Plasmid MR1 (pMR1)

      The plasmid MR1 (pMR1) is a variant of the plasmid RV2 (pRV2) in which the kan resistance gene has been swapped with the cm resistance gene (Guazzaroni and Silva-Rocha, 2014). Plasmid pMR1 encodes the BBa_J34801 ribosomal binding site (RBS, AAAGAGGAGAAA) 6 bp upstream of the start codon for GFP(LVA). The plasmid also encodes a putative RBS (AAGGGAGG) (Cazemier et al., 1999) 5 bp upstream of the start codon for mCherry on the opposite strand.

      The plasmid additionally contains the low-to-medium copy number origin of replication p15A (Westmann et al., 2018).

      A map of the plasmid is available on the Github repository: https://github.com/tfuqua95/promoter_islands.”

      The reviewer also makes a valid point about promoter elements of the plasmid itself. We addressed it with the following new analyses. First, we re-examined each of the examples where new -10 and -35 boxes are gained or lost, to see if any of these hotspots occur on the flanking ends of the parent sequences. We looked specifically at the ends because they could potentially interact with -10 and -35 box-like sequences on the plasmid to form a promoter.

      Only one of these hotspots (out of 27) occurred at the end of the cloned sequences, and is thus a candidate for the phenomenon the reviewer hypothesized. This hotspot occurs in P9-GFP, where gaining a -10 box at the left flank increases expression (see Figure S8E-F’). There is indeed a -35 box 22-23 bp upstream of this -10 box on the plasmid, which could potentially affect promoter activity. 

      We tested the GFP expression of a construct harboring the point mutation which creates this -10 box on the left flank of P9-GFP. However, there was no significant difference in fluorescence between this construct and the wild-type P9-GFP (see Figure S8E-F’). Thus, this -35 box on pMR1 is not likely creating a new promoter.

      (6) Overselling the conclusions.

      Line 420: The paper claims to have generated important new insights into promoters. At the same time, the main conclusion is that "Our study demonstrates that mutations to -10 and -35 boxes motifs are the primary paths to create new promoters and to modulate the activity of existing promoters". This isn't new or unexpected. People have been doing experiments showing this for decades. Of course, mutations that make or destroy promoter elements create and destroy promoters. How could it be any other way?

      In hindsight, we agree that the original conclusion was not very novel. Our new conclusion is that -10 and -35 boxes do not repress transcription, and that our current promoter models, even with the additional motifs like the UP-element and the extended -10, are insufficient to understand promoters (lines 687-694):

      “Overall, our study demonstrates that -10 and -35 boxes neither prevent existing promoters from driving expression, nor do they prevent new promoters from emerging by mutation. It shows how mutations can create new -10 and -35 boxes near or on top of preexisting ones to modulate expression. However, randomly creating a new -10 or -35 box will rarely create a new promoter, even if the new box is appropriately spaced upstream or downstream of a cognate box. Ultimately our study demonstrates that promoter models need to be further scrutinized, and that using mutagenesis to create de-novo promoters can provide new insights into promoter regulatory logic.”

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      I would like to start by thanking the authors for presenting an interesting and well-written article for review. This paper is a welcome addition to the field, addressing modern questions in the longstanding area of bacterial gene regulation. It is both enlightening and inspiring. While I do have suggestions, I hope these are not perceived as a lack of optimism for the work.

      Thank you for your kind words and suggestions, and for providing an astute and constructive review. We feel that the manuscript has greatly improved with your suggested changes.

      ABSTRACT:

      Line 11: The sentence, "It is possible that these motifs influence..." Could be rewritten to be clearer as it is the most important point of the manuscript. It is not obvious that you're talking about how the local landscape of motifs affects the probability of promoters evolving/devolving in this location.

      We have changed the sentence to read, “Here, we ask whether the presence of such motifs in different genetic sequences influences promoter evolution and emergence.”

      INTRODUCTION:

      Line 68: Is the -35 consensus motif not TTGACA? Here it is listed as TTGAAA.

      Corrected from TTGAAA to TTGACA

      RESULTS:

      Line 92-94. In finding that the. The main takeaway from this work is that different sequences have different likelihoods of mutations creating promoters and so I believe this claim could be explored deeper with more quantitative information. Could the authors supplement this claim by including? Could you look at whether there is a correlation between the baseline expression of a parent sequence and Pnew? I expect even the inactive sequences to have some variability in measured expression.

      Thank you for this great idea. We followed up on it by plotting the baseline parent sequence fluorescence scores against Pnew. You are indeed correct, i.e., Pnew increases with baseline expression following a sigmoid function, and is now shown in Figure 1D. To report our new observations, we have added the following section to the Results (lines 219-232):

      “Although mutating each of the 40 non-promoter parent sequences could create promoter activity, the likelihood Pnew that a mutant has promoter activity, varies dramatically among parents. For each non-promoter parent, Fig 1D shows the percentage of active daughter sequences. The median Pnew is 0.046 (std. ± 0.078), meaning that ~4.6% of all mutants have promoter activity. The lowest Pnew is 0.002 (P25-GFP) and the highest 0.41 (P8-RFP), a 205-fold difference.

      We hypothesized that these large differences in Pnew could be explained by minute differences in the fluorescence scores of each parent, particularly if its score was below 1.5 a.u. Plotting the fluorescence scores of each parent (N=50) and their respective Pnew values as a scatterplot (Fig 1E), we can fit these values to a sigmoid curve (see methods). This finding helps to explain why P8-RFP has a high Pnew (0.41) and P25-GFP a low Pnew (0.002), as their fluorescence scores are 1.380 and 1.009 a.u., respectively. The fact that the inflection point of the fitted curve is at 1.51 a.u. further justifies our use of 1.5 a.u. as a cutoff for promoter and non-promoter activity.”
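
      The sigmoid relationship described above can be illustrated with a minimal sketch. It assumes a three-parameter logistic curve and a brute-force grid search; the functional form, the optimizer, the parameter grids, and the synthetic data points are all assumptions for illustration and are not taken from the manuscript's methods.

```python
import math
from itertools import product


def sigmoid(x, top, slope, midpoint):
    """Logistic curve rising from 0 to `top`, with its inflection at `midpoint`."""
    return top / (1.0 + math.exp(-slope * (x - midpoint)))


def fit_sigmoid(xs, ys, tops, slopes, midpoints):
    """Return the (top, slope, midpoint) grid point with the smallest squared error."""
    def sse(params):
        return sum((sigmoid(x, *params) - y) ** 2 for x, y in zip(xs, ys))
    return min(product(tops, slopes, midpoints), key=sse)


# Synthetic, hypothetical data: parent fluorescence scores (a.u.) and the
# P_new values a known logistic curve would produce for them.
true_params = (0.5, 20.0, 1.5)
scores = [1.0 + 0.05 * i for i in range(21)]  # 1.00 to 2.00 a.u.
p_new_values = [sigmoid(s, *true_params) for s in scores]

best = fit_sigmoid(
    scores,
    p_new_values,
    tops=[0.3, 0.4, 0.5, 0.6],
    slopes=[5.0, 10.0, 20.0, 40.0],
    midpoints=[1.3, 1.4, 1.5, 1.6, 1.7],
)
print(best)
```

      At the fitted midpoint the curve reaches half of its plateau, which is why an inflection point near the cutoff is a natural argument for placing the promoter threshold there.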

      Another potentially interesting analysis would be to see if k-mer content is correlated with Pnew. That is, determine the abundance of all hexamers in the sequence and see if Pnew is correlated with the number of hexamers present that is one nucleotide distance away from the consensus motifs (such as TcGACA or TAcAAT).

      We performed the suggested analysis by searching for k-mers that correlate with Pnew and found that no k-mer significantly correlates with Pnew (lines 240-248):

      “We then asked whether any k-mers ranging from 1-6 bp correlated with the non-promoter Pnew values (5,460 possible k-mers). 718 of these 1-6 bp k-mers are present 3 or more times in at least one non-promoter parent. We calculated a linear regression between the frequency of these 718 k-mers and each Pnew value, and adjusted the p-values to respective q-values (Benjamini-Hochberg correction, FDR=0.05). This analysis revealed six k-mers: CTTC, GTTG, ACTTC, GTTGA, AACTTC, TAACTT which correlate with Pnew. However, these correlations are heavily influenced by an outlying Pnew value of 0.41 (P8-RFP) (Fig S5C-H), and upon removing P8-RFP from the analysis, no k-mer significantly correlates with Pnew (data not shown).”
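
      As a rough sketch of the bookkeeping behind this analysis, the snippet below enumerates all 1-6 bp k-mers, counts occurrences in a sequence, and applies a Benjamini-Hochberg adjustment. The function names are hypothetical, counting overlapping occurrences on a single strand is an assumption, and the regression step itself is omitted.

```python
from itertools import product


def all_kmers(max_k, alphabet="ACGT"):
    """Every k-mer of length 1..max_k over the given alphabet."""
    return [
        "".join(letters)
        for k in range(1, max_k + 1)
        for letters in product(alphabet, repeat=k)
    ]


def count_kmer(seq, kmer):
    """Count occurrences of `kmer` in `seq`, allowing overlaps (an assumption)."""
    k = len(kmer)
    return sum(1 for i in range(len(seq) - k + 1) if seq[i:i + k] == kmer)


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted q-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone q-values.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        q_values[idx] = running_min
    return q_values


kmers = all_kmers(6)
print(len(kmers))
```

      The k-mer count printed here is 4 + 16 + 64 + 256 + 1,024 + 4,096 = 5,460, matching the number quoted in the manuscript text.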

      Line 152-157: How did you define the thresholds for 'active' or 'inactive'? It is not clear in the methods how this distinction was made.

      We have more clearly defined these thresholds in the text. A sequence with promoter activity has a fluorescence score greater than or equal to 1.5 a.u. (lines 168-172):

      “We declared a daughter sequence to have promoter activity or to be a promoter if its score was greater than or equal to 1.5 a.u., as this score lies at the boundary between no fluorescence and weak fluorescence based on the sort-seq bins (methods). Otherwise, we refer to a daughter sequence as having no promoter activity or being a non-promoter.”
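
      The threshold rule and the resulting Pnew value can be written down as a short sketch. The function names and the example scores are hypothetical; only the 1.5 a.u. cutoff and the "greater than or equal to" comparison come from the text above.

```python
PROMOTER_THRESHOLD = 1.5  # fluorescence score cutoff in arbitrary units (a.u.)


def is_promoter(score, threshold=PROMOTER_THRESHOLD):
    """A sequence counts as a promoter if its score reaches the threshold."""
    return score >= threshold


def p_new(daughter_scores, threshold=PROMOTER_THRESHOLD):
    """Fraction of daughter sequences of one parent that have promoter activity."""
    if not daughter_scores:
        raise ValueError("need at least one daughter score")
    active = sum(1 for s in daughter_scores if is_promoter(s, threshold))
    return active / len(daughter_scores)


# Hypothetical daughter scores for a single non-promoter parent.
example_scores = [1.0, 1.2, 1.5, 2.3]
print(p_new(example_scores))
```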

      Lines: 152-157: In trying to find the parent expression levels, no figure was available showing the distribution of parent expression levels. Furthermore, In looking at Data S2 & filtering out for sequences with distance 0 from the parent, I found the most active sequences did not match up with the sequences described as active in this section (e.g. p19 and p20 have a higher topstrand mean over P22, yet are not listed as active top strand sequences).

      We really appreciate you taking the time to examine the supplemental data. We previously listed the parents that had only GFP activity but no RFP activity (P22), and only RFP activity but no GFP activity (P6, P12, P13, P18, P21). We then said that P19 and P20 were bidirectional promoters, because they showed both GFP and RFP activity. In hindsight, we realize that our wording was confusing. We thus rewrote the affected paragraph, such that the bidirectional promoters are now in both lists of GFP/RFP active parents. We also now make the distinction between “templates” which comprise our 25 promoter island fragments, and “parents”, where we treat both strands separately (50 parents total). The paragraph in question now reads (lines 173-187):

      “Because some sequences in our library are unmutated parent sequences, we determined that 10/50 of the parent sequences already encode promoter activity before mutagenesis. Specifically, three parents drove expression on the top strand (P19-GFP, P20-GFP, P22-GFP), and seven did on the bottom strand (P6-RFP, P12-RFP, P13-RFP, P18-RFP, P19-RFP, P20-RFP, P21-RFP). Two parents harbor bidirectional promoters (P19 and P20). The remaining 40 parent sequences are non-promoters, with an average fluorescence score of 1.39 a.u. We note that some of these parents have a fluorescence score higher than 1.39 a.u., but less than 1.50 a.u. such as P8-RFP (1.38 a.u.), P16-RFP (1.39 a.u.), P9-GFP (1.49 a.u.), and P1-GFP (1.47 a.u.). Whether these are truly “promoters” or not, is based solely on our threshold value of 1.5 a.u. We also note that 30% (15/50) of parents have the TGn motif upstream of a -10 box, but only 20% (3/15) of these parents have promoter activity (underlined with promoter activity: P4-RFP, P6-RFP, P8-RFP, P9-RFP, P10-RFP, P11-GFP, P12-GFP, P17-GFP, P18-GFP, P18-RFP, P19-RFP, P22-RFP, P24-GFP, P25-GFP, P25-RFP). See Fig S4 for fluorescence score distributions for each parent and its daughters, and Data S2 for all daughter sequence fluorescence scores.”

      Please include a supplementary figure showing the different parent expression levels (GFP mean +/- sd). Also, please explain the discrepancy in the 'active sequences' compared to Data S2 or correct my misunderstanding.

      We have added this plot to Figure S4B. The discrepancy arose because we listed the parents that had only GFP activity but no RFP activity (P22), and only RFP activity but no GFP activity (P6, P12, P13, P18, P21). We then said that P19 and P20 were bidirectional promoters, because they showed both GFP and RFP activity. Please see our previous response regarding the ambiguity.

      Line 182: I do not see 'Fuqua and Wagner 2023' in the references (though I am familiar with the preprint).

      We have added Fuqua and Wagner, bioRxiv 2023 to the references.

      Lines 197 - 200: The distribution of hotspot locations should be compared to the distribution of mutations in the library. e.g. It is not notable that 17% of mutations are in -10 motifs if 17% of all mutations are in -10 motifs.

      Thank you for raising this point. To address it, we carried out a computational analysis where we randomly scrambled the nucleotides of each parent sequence while maintaining the coordinates for each mutual information “hotspot.” This scrambling results in significantly less overlap with hotspots and boxes. This analysis is now depicted in Figure 2C and written in lines 272-296.
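
      The logic of this scrambling control can be sketched as follows. This is a simplified stand-in rather than the analysis itself: boxes are called by a mismatch count against the -10 consensus TATAAT instead of by PWM scoring, and all function names, the example sequence, and the hotspot coordinates are hypothetical.

```python
import random


def find_boxes(seq, consensus="TATAAT", max_mismatch=1):
    """Half-open (start, end) intervals whose sequence is close to the consensus."""
    k = len(consensus)
    hits = []
    for i in range(len(seq) - k + 1):
        mismatches = sum(a != b for a, b in zip(seq[i:i + k], consensus))
        if mismatches <= max_mismatch:
            hits.append((i, i + k))
    return hits


def overlaps_any(interval, others):
    """True if the half-open interval overlaps any interval in `others`."""
    start, end = interval
    return any(start < o_end and o_start < end for o_start, o_end in others)


def hotspot_box_overlap(seq, hotspots):
    """Number of hotspots that overlap at least one box in `seq`."""
    boxes = find_boxes(seq)
    return sum(1 for h in hotspots if overlaps_any(h, boxes))


def scrambled_overlaps(seq, hotspots, n_shuffles=1000, seed=0):
    """Overlap counts after shuffling nucleotides while hotspot coordinates stay fixed."""
    rng = random.Random(seed)
    counts = []
    for _ in range(n_shuffles):
        letters = list(seq)
        rng.shuffle(letters)
        counts.append(hotspot_box_overlap("".join(letters), hotspots))
    return counts


parent = "GCGCGCTATAATGCGCGC"  # hypothetical parent sequence
hotspots = [(6, 12)]           # hypothetical hotspot coordinates
observed = hotspot_box_overlap(parent, hotspots)
null_counts = scrambled_overlaps(parent, hotspots, n_shuffles=200)
```

      Comparing `observed` against the distribution in `null_counts` gives an empirical sense of whether hotspots coincide with boxes more often than nucleotide composition alone would predict.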

      Lines 253-264: Examples 3B, 3D, and 3F should indicate the spacing between the new and existing motifs. Are these close to the 15-19 bp spacer lengths preferred by sigma70?

      Point well taken. We now annotate the spacing of motifs in Figures 3, 4, 5, and Supplemental Figures S8, S9, S10, and S11. We note that in many cases, high-scoring PWM hits for the same motif can overlap (i.e. two -10 motifs or two -35 motifs overlap). Additionally, the proximity of a -35 and -10 box does not guarantee that the two boxes are interacting. Together, these two facts can result in an ambiguity of the spacer size between two boxes. To avoid any reporting bias, we thus often report spacer sizes as a range (see Figure panels 4F, S8D, S8F-L, S9A, S9H, S10A, and S10E). The smallest spacer we annotate is in Figure 4F with 10 bp, and the largest is in Figure S8D with 26 bp. Any more “extreme” distances are not annotated, and it is left for the reader to decide if an interaction is present or not.
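
      To make the idea of a spacer range concrete, here is a small sketch that computes the spacer between a -35 and a -10 hexamer and reports a range when several overlapping hits are candidates. The function names and coordinates are hypothetical, and boxes are assumed to be 6 bp long with 0-based start positions.

```python
def spacer_length(box35_start, box10_start, box_len=6):
    """Base pairs between the end of the -35 box and the start of the -10 box."""
    return box10_start - (box35_start + box_len)


def spacer_range(box35_starts, box10_starts, box_len=6):
    """(min, max) spacer over all -35/-10 pairings where the -10 box lies downstream."""
    lengths = [
        spacer_length(s35, s10, box_len)
        for s35 in box35_starts
        for s10 in box10_starts
        if s10 >= s35 + box_len
    ]
    return (min(lengths), max(lengths)) if lengths else None


# Two overlapping -35 hits and two overlapping -10 hits (hypothetical positions).
print(spacer_range([0, 2], [23, 24]))
```

      With these example coordinates the four possible pairings give spacers of 15, 16, 17, and 18 bp, so the reported range is (15, 18).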

      Line 255: While fun, I am concerned about the 'Shiko' analogy. My understanding is the prevailing theory is that -35 recognition occurs before -10 recognition (https://doi.org/10.1073/pnas.94.17.9022, 10.1101/sqb.1998.63.141). Given this, the 'Shiko -35' concept in 3H is a bit awkward as it suggests that sigma70 stops at -10 motifs before planting down on the -35. Considering the cited paper is still in the preprint stages (and did not observe these Shiko -35 emergences), I am concerned about how this particular example will be received by the community. Perhaps more care could be done to verify that this example is consistent with generally accepted mechanisms of promoter recognition or a short clarification could be added to clarify the extent of the analogy.

      Thank you for raising this point. We decided to remove the Shiko analogy, because several readers assumed that it relates to the physical binding of RNA polymerase, rather than being an evolutionary mechanism of mutations forming complementary motifs in a stepwise manner.

      Lines 323-326: It would be helpful to describe a more systematic approach to defining emergence events into different categories. A clear definition of each category in the methods or main text would help others consistently refer to these concepts in the future. This could be helped by showing the actual parent vs daughter sequences as a supplementary figure to figures 4B, 4D, & 4G.

      We agree this could have been more clearly communicated. We have addressed this by 1) simplifying the nomenclature of these categories, 2) clearly defining these categories, and 3) showing the actual parent vs daughter sequences in Figure 4, and Supplemental Figures S9, S10, S11, and S12. More specifically:

      (1) Simplifying the nomenclature. We highlight events where gaining new -10 and -35 boxes can modify the promoter activity of parent sequences with promoter activity. This occurs when a new -10 or -35 box appears that partially overlaps with the -10 or -35 box of the actual promoter. Thus, we rename two terms: hetero-gain and homo-gain, shown in Figure 4B.

      (2) We clearly define these categories (lines 430-435):

      “We found that these mutations frequently create new boxes overlapping those we had identified as part of a promoter (Fig S9). This occurs when mutations create a -10 box overlapping a -10 box, a -35 box overlapping a -35 box, a -10 box overlapping a -35 box, or a -35 box overlapping a -10 box. We call the resulting event a “homo-gain” when the new box is of the same type as the one it overlaps, and otherwise a “hetero-gain”. In either case, the creation of the new box does not always destroy the original box.”
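
      The two categories defined above can be expressed as a small classification rule. This is only a sketch: the tuple representation, the function name, and the choice to let a same-type overlap take precedence when a new box overlaps boxes of both types are assumptions, not part of the manuscript's definition.

```python
def classify_gain(new_box, promoter_boxes):
    """Classify a newly created box relative to the boxes of an existing promoter.

    Boxes are (kind, start, end) tuples with kind "-10" or "-35" and
    half-open coordinates. Returns "homo-gain", "hetero-gain", or None
    if the new box overlaps no promoter box.
    """
    kind, start, end = new_box
    overlapping = [
        box for box in promoter_boxes
        if start < box[2] and box[1] < end
    ]
    if not overlapping:
        return None
    # Assumption: a same-type overlap wins if both types are overlapped.
    if any(box[0] == kind for box in overlapping):
        return "homo-gain"
    return "hetero-gain"


promoter = [("-35", 10, 16), ("-10", 33, 39)]  # hypothetical promoter layout
print(classify_gain(("-10", 35, 41), promoter))
print(classify_gain(("-10", 12, 18), promoter))
```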

      In the original manuscript, there was an additional third category, where gaining a -35 box upstream of the promoter’s -35 box, and gaining a -10 box upstream of the promoter’s -10 box decreased expression. We referred to this as a “tandem motif” and it can be found in Figure S12C,D. However, in response to comment “(4) Ignoring or misrepresenting the literature” from Reviewer #3, we carried out an analysis of the binding of H-NS (see Figure 5 and Figure S12). This analysis revealed that this “tandem motif” phenomenon was actually the result of changing the affinity of H-NS to these regions. Thus, the “tandem motif” is probably spurious.

      DISCUSSION:

      Line 378-379: Since hotspots are essentially areas where promoters appear, wouldn't it be obvious that having more hotspots (i.e. areas where more promoters appear) would equate to a higher probability of new promoters? It would be helpful to clarify why this isn't obvious. This could be resolved by adding more complexity to the statement, such as showing that the level of mutual information found in a hotspot or across all hotspots in a sequence is correlated with Pnew.

      A fair criticism. In response, we have chosen to remove the analysis of this trend from the manuscript entirely. (Additionally, Pnew and mutual information calculations both relied on the fluorescence scores of daughter sequences, so the finding was circular in its logic.)

      Line 394-396: This comparison of findings to Bykov et al should include a bit more justification for the proposed mechanism and how it specifically was observed in this paper. What did they observe and how do these findings relate?

      We gladly followed this suggestion, and added the following two paragraphs to the discussion (lines 622-640).

      “A previous study randomly mutagenized the appY promoter island upstream of a GFP reporter, and isolated variants with increased and decreased GFP expression. The authors found that variants with higher GFP expression acquired mutations that 1) improve a -10 box to better match its consensus, and simultaneously 2) destroy other -10 and -35 boxes (Bykov et al., 2020). The authors concluded that additional -10 and -35 boxes repress expression driven by promoter islands. Our data challenge this conclusion in several ways. 

      First, we find that only ~13% of -10 and -35 boxes in promoter islands actually contribute to promoter activity. Extrapolating this percentage to the appY promoter island, ~87% (100% - 13%) of the motifs would not be contributing to its activity. Assuming the appY promoter island is not an outlier, this would suggest that during random mutagenesis, these inert motifs might have accumulated mutations that do not change fluorescence. Indeed, Bykov et al. (2020) also found that a similar frequency of -10 and -35 boxes were destroyed in variants selected for lower GFP expression, which supports this argument. Second, we find no evidence that creating a -10 or -35 box lowers promoter activity in any of our 50 parent sequences. Third, we also find no evidence that destruction of a -10 or -35 box increases promoter activity without plausible alternative explanations, i.e. overlap of the destroyed box with an H-NS site, destruction of the promoter, or simultaneous creation of another motif as a result of the destruction. In sum, -10 and -35 boxes are not likely to repress promoter activity.”

      METHODS:

      Line 500: Could you provide more details on PMR1 (e.g. size, copy number, RBS strength) or a reference? I could not find this easily.

      Thank you for pointing out this oversight. In response, we have added the following subsection to the methods (lines 740-748):

      “Plasmid MR1 (pMR1)

      The plasmid MR1 (pMR1) is a variant of the plasmid RV2 (pRV2) in which the kan resistance gene has been swapped with the cm resistance gene (Guazzaroni and Silva-Rocha, 2014). Plasmid pMR1 encodes the BBa_J34801 ribosomal binding site (RBS, AAAGAGGAGAAA) 6 bp upstream of the start codon for GFP(LVA). The plasmid also encodes a putative RBS (AAGGGAGG) (Cazemier et al., 1999) 5 bp upstream of the start codon for mCherry on the opposite strand.

      The plasmid additionally contains the low-to-medium copy number origin of replication p15A (Westmann et al., 2018).

      A map of the plasmid is available on the Github repository: https://github.com/tfuqua95/promoter_islands.”

      Line 581: What was the sequencing instrument &/or depth?

      We now report this information as follows (Methods, lines 918-922):

      “Illumina sequencing

      The amplicon pool was sequenced by Eurofins Genomics (Eurofins GmbH, Germany) using a NovaSeq 6000 (Illumina, USA) sequencer, with an S4 flow cell, and a PE150 (Paired-end 150 bp) run. In total, 282’843’000 reads and 84’852’900’000 bases were sequenced. Raw sequencing reads can be found here: https://www.ncbi.nlm.nih.gov/bioproject/1071572.”

      SUPPLEMENT:

      Supplementary Figure 2: Why does the GFP control produce a bimodal distribution?

      The GFP+ culture was inoculated directly from a glycerol stock. The bimodal distribution probably results from a subset of the bacteria having lost the GFP-coding insert, because the left-most peak coincides with the negative control.

      Reviewer #2 (Recommendations For The Authors):

      This paper would benefit from a clear definition of what constitutes an active promoter as this is only mentioned as justification for the use of arbitrary values for fluorescence.

      Good point. To clarify, we now include this new paragraph in the introduction (lines 112-119):

      “In this study, we define a promoter as a DNA sequence that drives the expression of a (fluorescent) protein whose expression level, measured by its fluorescence, is greater than a defined threshold. We use a threshold of 1.5 arbitrary units (a.u.) of fluorescence. This definition does not distinguish between transcription and translation. We chose it because protein expression is usually more important than RNA expression whenever natural selection acts on gene expression, because it is the primary phenotype visible to natural selection (Jiang et al., 2023).”

      There needs to be a clear distinction in the use of the word "sequences", as it is often used interchangeably for the 25 parent sequences and for the 50 possible sequence directions in which the promoter can act. It is confusing going from one to the other.

      We agree that this distinction is important. To make it clearer, we now introduce an additional term (lines 119-130). Our experiments start from 25 promoter island fragments (P1-P25), which we now call template sequences. Each template sequence comprises both DNA strands. The parent sequences are the top and bottom strands of each template sequence. Therefore, there are now 50 parent sequences (P1-GFP, P1-RFP, P2-GFP…, P25-RFP). By treating each strand as its own sequence, we no longer have to refer to the strand, avoiding the earlier confusion.
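
      The template/parent relationship can be sketched in a few lines: each double-stranded template yields one parent read on the top strand (GFP) and one read on the bottom strand (RFP). The function names and the placeholder sequences are hypothetical; the naming scheme P1-GFP, P1-RFP, and so on follows the text above.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Bottom-strand sequence read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def parents_from_templates(templates):
    """Map each template name to its two strand-specific parent sequences."""
    parents = {}
    for name, top_strand in templates.items():
        parents[f"{name}-GFP"] = top_strand                      # top strand
        parents[f"{name}-RFP"] = reverse_complement(top_strand)  # bottom strand
    return parents


# Placeholder templates P1..P25; real sequences are in the study's Data S1.
templates = {f"P{i}": "TTGACA" + "A" * i for i in range(1, 26)}
parents = parents_from_templates(templates)
print(len(templates), len(parents))
```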

      The description of the hotspots is often unclear and trying to determine if 3 out of 9 hotspots come from one parent sequence or multiple is not possible. A table denoting this information would be most helpful.

      We agree, and now provide this information in Data S3.

      Finally, the description of the proposed mechanism of promoter activation via mutation of motifs should not be in the results but in the discussion, as it has insufficient evidence and would require further experimental validation.

      We remedied this problem by providing experimental validation of the proposed mechanisms. Specifically, we created the precise mutations that caused a loss or gain of a -10 or a -35 box, and measured the level of gene expression they drive with a plate reader. Because we chose to provide this experimental validation, we opted to leave the mechanisms of promoter activation in the results section.

      The (Fuqua and Wagner 2023) paper is not in the references.

      We have added Fuqua and Wagner, bioRxiv 2023 to the references.

      I enjoyed the paper and wish the authors the best for their future work.

      Thank you for taking the time to review our manuscript!

      Reviewer #3 (Recommendations For The Authors):

      The paper has major flaws. For example:

      The data need to be analysed with correct promoter sequence element sequences (TTGACA for the -35 element).

      The discrepancy lies in the frequency of A’s vs C’s at position #5 of the PWM. Our PWM was built with more A’s than C’s at this position, but also includes C’s in this position. However, we respectfully disagree that using a different -35 box PWM is going to change the outcomes of our study. First, positions 4-6 of the PWM barely have any information content (bits) compared to positions 1-3 (see Fig 1A). This assertion is not just based on our own PWM, but based on ample precedent in the literature. In PMID 14529615, TTG is present in 38% of all -35 boxes, but ACA only 8%. In PMID 29388765, with the -10 instance TATAAT, the -35 instance TTGCAA yields stronger promoters compared to the -35 instance TTGACA (See their Figure 3B). In PMID 29745856 (Figure 2), the most information content lies in positions 1-3, with the A and C at position 5 both nearly equally represented, as in our PWM. In PMID 33958766 (Figure 1) an experimentally-derived -35 box is even reduced to a “partial” -35 box which only includes positions 1 and 2, with consensus: TTnnnn. Additionally, the -35 box PWM that we used significantly and strongly correlates with an experimentally derived -35 box (see Supporting Information from Figure S4 of Belliveau et al., PNAS 2017. Pearson correlation coefficient = 0.89). We now provide DNA sequences for each of the figures to improve accessibility and reproducibility. A reader can now use any PWM or method they wish to interpret the data.
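      The argument about information content can be illustrated with a minimal sketch of how bits per position are computed from a motif count matrix. The counts below are hypothetical and chosen only to mimic a -35-like motif with conserved positions 1-3; they are not the PWM used in the study, and a uniform base background is assumed.

```python
import math

BASES = "ACGT"

def information_content(counts):
    """Per-position information content (bits) of a motif count matrix.

    counts: list of dicts mapping each base to its observed count at that
    position. Assumes a uniform background (maximum of 2 bits per position).
    """
    bits = []
    for column in counts:
        total = sum(column[b] for b in BASES)
        entropy = 0.0
        for b in BASES:
            p = column[b] / total
            if p > 0:
                entropy -= p * math.log2(p)
        bits.append(2.0 - entropy)
    return bits

# Hypothetical counts for a 6-bp -35-like motif (illustration only, not the
# authors' PWM): positions 1-3 are well conserved, positions 4-6 are not.
toy_counts = [
    {"A": 5, "C": 5, "G": 5, "T": 85},
    {"A": 5, "C": 5, "G": 5, "T": 85},
    {"A": 10, "C": 5, "G": 75, "T": 10},
    {"A": 40, "C": 25, "G": 15, "T": 20},
    {"A": 30, "C": 35, "G": 15, "T": 20},
    {"A": 45, "C": 20, "G": 15, "T": 20},
]

for pos, b in enumerate(information_content(toy_counts), start=1):
    print(f"position {pos}: {b:.2f} bits")
```

      In such a matrix, swapping the relative frequencies of A and C at a low-information position changes the score of a candidate site only slightly, which is the point made in the response.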

      The data need to be analysed taking into account the role of other promoter elements and sequences for translation.

      Point well taken. 

      Thank you for bringing this oversight to our attention. We have performed two independent analyses to explore the role of TGn in promoter emergence in evolution. First, we computationally searched for -10 boxes with the bases TGn immediately upstream of them in the parent sequences, and found 18 of these “extended -10 boxes” in the parents (lines 143-145):

      “On average, each parent sequence contains ~5.32 -10 boxes and ~7.04 -35 boxes (Fig S1). 18 of these -10 boxes also include the TGn motif upstream of the hexamer.”

      However, only 20% of the parents carrying these boxes have promoter activity (lines 182-185):

      “We also note that 30% (15/50) of parents have the TGn motif upstream of a -10 box, but only 20% (3/15) of these parents have promoter activity (underlined with promoter activity: P4-RFP, P6-RFP, P8-RFP, P9-RFP, P10-RFP, P11-GFP, P12-GFP, P17-GFP, P18-GFP, P18-RFP, P19-RFP, P22-RFP, P24-GFP, P25-GFP, P25-RFP).”

      Second, we computationally searched through all of the daughter sequences to identify new -10 boxes with TGn immediately upstream. We found 114 -10 boxes with the bases TGn upstream. However, only 5 new -10 boxes (2 with TGn) were associated with increasing fluorescence (lines 338-345):

      “Mutations indeed created many new -10 and -35 boxes in our daughter sequences. On average, 39.5 and 39.4 new -10 and -35 boxes emerged at unique positions within the daughter sequences of each mutagenized parent (Fig 3A,B), with 1’562 and 1’576 new locations for -10 boxes and -35 boxes, respectively. ~22% (684/3’138) of these new boxes are spaced 15-20 bp away from their cognate box, and ~7.3% (114/1’562) of the new -10 boxes have the TGn motif upstream of them. However, only a mere five of the new -10 boxes and four of the new -35 boxes are significantly associated with increasing fluorescence by more than +0.5 a.u. (Fig 3C,D).”
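      A search for “extended -10 boxes” of the kind described above can be sketched as follows. This is a simplification under stated assumptions: -10 boxes are called here by exact match to a list of hexamers (the consensus TATAAT by default), whereas the analysis in the response calls them with a PWM and a score threshold. The function name and its arguments are hypothetical.

```python
import re

def find_extended_minus10(seq, hexamers=("TATAAT",)):
    """Return 0-based start positions of -10 hexamers preceded by TGn.

    `hexamers` stands in for motif calls: an exact match to the listed
    hexamers replaces the PWM-threshold scan described in the response.
    """
    seq = seq.upper()
    hits = []
    for hexamer in hexamers:
        # A lookahead is used so that overlapping occurrences are all reported.
        pattern = re.compile(r"(?=TG[ACGT](" + re.escape(hexamer) + r"))")
        for match in pattern.finditer(seq):
            hits.append(match.start(1))
    return sorted(hits)

# Toy sequences (hypothetical): only the first has TGn directly upstream.
print(find_extended_minus10("CCTGATATAATGG"))
print(find_extended_minus10("CCCCTATAATGG"))
```

      Running the same scan over parent and daughter sequences and comparing the hit positions would identify boxes that are new in the daughters.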

      In addition, we now study the role of UP elements. This analysis showed that the UP element plays a negligible role in promoter emergence within our dataset.  It is discussed in a new subsection of the results (lines 591-608).

      “The UP-element does not strongly influence promoter activity in our dataset.

      The UP element is an additional AT-rich promoter motif that can lie upstream of a -35 box in a promoter sequence (Estrem et al., 1998; Ross et al., 1993). We asked whether the creation of UP-elements also creates or modulates promoter activity in our dataset. To this end, we first identified a previously characterized position-weight matrix for the UP element (NNAAAWWTWTTTTNNWAAASYM, PWM threshold score = 19.2 bits) (Estrem et al., 1998) (Fig S13A). We then computationally searched for UP-element-specific hotspots within the parent sequences, i.e., locations in which mutations that gain or lose UP-elements lead to significant fluorescence increases (Mann-Whitney U-test, Fig S7 and methods. See Data S8 for the coordinates, fluorescence changes, and significance). The analysis did not identify any UP elements whose mutation significantly changes fluorescence.

      We then repeated the analysis with a less stringent PWM threshold of 4.8 bits (1/4th of the PWM threshold score). This time, we identified 74 “UP-like” elements that are created or destroyed at unique positions within the parents. 23 of these motifs significantly change fluorescence when created or destroyed. However, even with this liberal threshold, none of these UP-like elements increase fluorescence by more than 0.5 a.u. when gained, or decrease fluorescence by more than 0.5 a.u. when lost (Fig S13B). This finding ultimately suggests that the UP element plays a negligible role in promoter emergence within our dataset.”
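      The UP-element consensus quoted above is written in IUPAC ambiguity codes. As a rough illustration of what the consensus encodes, the sketch below expands it into a regular expression and scans a sequence for exact consensus matches. This is a stand-in only: the analysis in the response scores sites with a PWM against a bit threshold, which tolerates mismatches that an exact consensus match does not. The function names are hypothetical.

```python
import re

# IUPAC nucleotide ambiguity codes expanded to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "W": "[AT]", "S": "[CG]", "M": "[AC]", "K": "[GT]",
    "R": "[AG]", "Y": "[CT]", "N": "[ACGT]",
}

UP_CONSENSUS = "NNAAAWWTWTTTTNNWAAASYM"  # consensus quoted in the response

def consensus_to_regex(consensus):
    """Compile an IUPAC consensus into a regex that reports overlapping hits."""
    body = "".join(IUPAC[code] for code in consensus)
    return re.compile("(?=(" + body + "))")

def find_consensus(seq, consensus=UP_CONSENSUS):
    """Return 0-based start positions of exact consensus matches in seq."""
    pattern = consensus_to_regex(consensus)
    return [m.start(1) for m in pattern.finditer(seq.upper())]

# Toy site (hypothetical) built to satisfy every position of the consensus.
site = "CC" + "AAA" + "AT" + "T" + "A" + "TTTT" + "CC" + "A" + "AAA" + "G" + "C" + "A"
print(find_consensus("GG" + site + "GG"))
```

      A single substitution at a fixed position (for example, the T at consensus position 8) removes the exact match, which is the kind of gain or loss event the hotspot analysis tests for.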

      Collectively, these additional analyses suggest that the presence of TGn plus a -10 box is insufficient to create promoter activity, and that the UP element does not play a significant role in promoter emergence or evolution.

      The full sequences used need to be provided and mutations resulting in new promoters need to be shown.

      To Figures 3, 4, 5, and Supplemental Figures S8, S9, S10, S11, and S12, we have added the sequences that created or destroyed the promoters, and their PWM scores.

      The paper needs to be rewritten to take into account the relevant literature on i) promoter islands (i.e. sections of horizontally acquired AT-rich DNA) ii) generation and loss of promoters by mutation.

      We have rewritten the introduction. The majority of these points are now addressed in the following two new paragraphs (lines 92-112):

      “Recent work shows that mutations can help new promoters to emerge from promoter motifs or from sequences adjacent to such motifs (Bykov et al., 2020; Fuqua and Wagner, 2023; Yona et al., 2018). However, encoding -10 and -35 boxes is insufficient to drive complete transcription of a gene coding sequence. For instance, the E. coli genome contains clusters of -10 and -35 boxes that are bound by RNA polymerase and produce short oligonucleotide fragments, but rarely create complete transcripts. Such clusters are called promoter islands, and are strongly associated with horizontally-transferred DNA (Bykov et al., 2020; Panyukov and Ozoline, 2013; Purtov et al., 2014; Shavkunov et al., 2009). 

      There are two proposed explanations for why promoter islands do not create full transcripts. First, the TF H-NS may repress promoter activity in promoter islands. This is because in a Δhns background, transcript levels from the promoter islands increase (Purtov et al., 2014). However, mutagenizing a specific promoter island (appY) until it transcribes a GFP reporter reveals that in-vitro H-NS binding does not significantly change when GFP levels increase (Bykov et al., 2020). Thus, it is not clear whether H-NS actually represses the complete transcription of these sequences. The second proposed explanation is that excessive promoter motifs silence transcription. The aforementioned study found that promoter activity increases when mutations improve a -10 box to better match its consensus (TAAAAAT→TATACT), while simultaneously destroying surrounding -10 and -35 boxes (Bykov et al., 2020). However, we note that if these surrounding motifs never contributed to GFP fluorescence to begin with, then mutations could also simply have accumulated in them during random mutagenesis without affecting promoter activity.”

      In closing, we would like to thank all three reviewers again for your time to engage with this manuscript.

      Summary of specific changes that we have made to each section of the manuscript 

      • Abstract

      - We updated the abstract to include the finding that more than 1’500 new -10s and -35s are created in our dataset, but only ~0.3% of them actually create de-novo promoter activity.

      - We no longer highlight the conclusion that the majority of promoters emerge and evolve from -10 and -35 boxes.

      • Introduction

      - We have added more background information about the UP-element and the TGn motif.

      - We better describe the promoter islands and the results identified by Bykov et al., 2020.

      • Results: Promoter island sequences are enriched with motifs for -10 and -35 boxes.

      - We clarify how the -10 and -35 PWMs we use were derived.

      - We refer to the 25 promoter island fragments as “Template sequences” (P1-P25). The “parent sequences” now correspond to the top and bottom strands of each template (N=50, P1-GFP, P1-RFP, P2-GFP, …, P25-RFP).

      - We elaborate that ~7% of the -10 boxes in the template sequences have the TGn motif.

      - In the previous version of the manuscript, if there were overlapping -10 boxes or overlapping -35 boxes, we counted these to be a single -10 box or a single -35 box, respectively. In the new version of the manuscript, we now treat each motif as an independent box. Because of this, the number of -10 and -35 boxes per parent has slightly increased.

      • Results: Non-promoters vary widely in their potential to become promoters.

      - We make a clear distinction between promoters and non-promoters, and define the parent sequences.

      - We note that only 20% of parents with an “extended -10 box” have promoter activity.

      • Results: Promoter emergence correlates with minute differences in background promoter levels.

      - We added an analysis where we compare Pnew to the parent fluorescence levels, even if they are below 1.5 a.u. We find that the distribution of Pnew matches a sigmoid function.

      • Results: Promoter emergence does not correlate with simple sequence features

      - We added an analysis comparing k-mer counts to Pnew.

      - We updated the way we count -10 and -35 boxes, and recalculated the correlation with Pnew. The P and R2 values have changed, but Pnew still does not significantly correlate with -10 or -35 box counts.

      • Results: Promoters emerge and evolve only from specific subsets of -10 and -35 boxes

      - We have added an analysis where we computationally scramble the wild-type parent sequences while maintaining the coordinates of the mutual information hotspots. This reveals that the overlap with -10 and -35 motifs is not a coincidence of dense promoter motif encoding.

      - We found a computational error in our analysis and updated the percent overlap between -10 boxes and -35 boxes with mutual information hotspots. The results are similar.

        o 14% of -10 boxes overlap with hotspots with our new way of defining -10 and -35 boxes.

      • Results: New -10 and -35 boxes readily emerge, but rarely lead to de-novo promoter activity

      - We quantify how often a new -10 and -35 box is created at a unique position within our collection of promoter fragments, and how often this results in a -10 and -35 box being appropriately spaced, and how often this actually leads to de-novo promoter activity.

        o We quantify how often a TGn sequence lies upstream of a new -10 box.

      • Results: Promoters can emerge when mutations create motifs but not by destroying them.

      - For each example, we added the DNA sequences of the wild-type region of interest and the mutant region of interest that results in the gain of promoter activity, and their respective PWM scores. 

      - We created constructs to validate each example by testing their fluorescence on a plate reader.

      - We removed the P1-GFP example from the main figure, as it was a false-positive in the dataset. It is now in Fig S8.

      - We removed the Shiko Emergence metaphor because it could be confused with a binding mechanism for RNA polymerase.

      • Results – Gaining new motifs over existing motifs increases and decreases promoter activity.

      - We removed the “Tandem motif” because it is more likely caused by H-NS binding.

      - We renamed the mechanisms to be “hetero-gain” and “homo-gain” for simplicity, and clearly define how we classified each sequence into each category.

      - We now include the DNA sequences, the PWM scores, the spacer lengths, and the fluorescence values from constructs harboring the predicted point mutations.

      • Results – Histone-like nucleoid-structuring protein (H-NS) represses P12-RFP and P22-GFP.

      - This is a new analysis, which explores the role of the TF H-NS in repressing the parent sequences. 

      - We identified putative H-NS motifs in P12-RFP and P22-GFP.

      - We show experimentally that in a H-NS null background, a bidirectional promoter (P20) becomes unidirectional, even though P20 does not contain an obvious H-NS motif.

      - In the original version of the manuscript, we describe a phenomenon where gaining a -35 box upstream of a promoter’s -35 box, or a -10 box upstream of a promoter’s -10 box significantly decreases expression. We called this phenomenon a “tandem motif.” However, in the newest version of the manuscript, we find that these fluorescence decreases are rescued in a H-NS null background, suggesting the finding was actually due to H-NS binding modulation and not -10 and -35 boxes.

      • Results – The UP-element does not strongly influence promoter activity in our dataset.

      - We used a PWM for the UP element to see if gaining or losing UP motifs was significantly correlated with increasing or decreasing expression. Even with a liberal PWM threshold, the analysis did not find any UP elements whose gain or loss changes fluorescence by more than 0.5 a.u.

      • Discussion

      - We rewrote the discussion to account for the new analyses and the results on H-NS, the UP-element, and the extended -10.

      - We better explain how our results clash with the results from the Bykov paper.

      - We fit our results into the context of David Grainger’s papers.

      • Methods

      - Added an explanation about pMR1.

      - Added methods describing how we created the point mutation constructs.

      - Added the methods for the plate reader.

      - Added the methods for Illumina sequencing.

      - Added the methods for the sigmoid curve-fitting.

      • Figure 1

      - Panel E compares how Pnew (the probability of a daughter sequence having a fluorescence score greater than 1.5 a.u.) associates with the fluorescence scores of each parent sequence.

      - Panel F was originally in Figure S5. In the originally submitted version of the manuscript, if there were overlapping -10s or overlapping -35s, we counted these to be a single -10 or a single -35, respectively. In the new version of the manuscript, we now treat each motif as an independent box. Because of this, the r2 and p values have changed, but the conclusions have not (Pnew still does not significantly correlate with -10 or -35 box counts).

      • Figure 2

      - Panel C now includes a stacked barplot showing the percentage of -10 and -35 boxes that overlap with mutual information hotspots when the parent sequences are randomly scrambled computationally.

      • Figure 3

      - Panels A-C were added to explain how we define a new -10/-35 box, and how many such new boxes each parent has. These panels also illustrate how we associate the presence or absence of a motif with significant changes in fluorescence scores of the daughter sequences.

      - We moved the example of P1-GFP to Figure S8 because when we tested the specific mutation which leads to gaining the -10 box, fluorescence did not change.

      - We now include the DNA sequences, the PWM scores, the spacer lengths, and the fluorescence values from reporter constructs harboring the point mutations predicted by our computational analyses.

      - Cartoons of RNA polymerase have been removed.

      • Figure 4

      - The tandem-motif has been removed from the figure.

      - Cartoons of RNA polymerase have been removed.

      - We now include the DNA sequences, the PWM scores, the spacer lengths, and the fluorescence values from constructs harboring the point mutations predicted by our computational analyses.

      • Figure 5

      - This is a new figure analyzing the role of H-NS in promoter evolution and emergence.

      • Figure S4

      - Panel B now shows the wild-type parent scores and their standard deviations from the sort-seq experiment.

      • Figure S5

      - Panels with -10 and -35 box counts moved to Figure 1.

      - The panel comparing Pnew to hotspot counts was removed.

      - Correlations between different k-mers and Pnew are added to panels C-H.

      • Figure S8

      - We now include the DNA sequences, the PWM scores, the spacer lengths, and the fluorescence values from constructs harboring the point mutations predicted by our computational analyses.

      • Figure S9

      - We now include the DNA sequences, the PWM scores, the spacer lengths, and the fluorescence values from constructs harboring the point mutations predicted by our computational analyses.

      • Figure S10

      - We now include the DNA sequences, the PWM scores, the spacer lengths, and the fluorescence values from constructs harboring the point mutations predicted by our computational analyses.

      • Figure S11

      - Added DNA sequences and PWM scores.

      • Figure S12

      - A new figure with further insights about H-NS.

      • Figure S13

      - A new figure regarding the UP-element analysis.

      • Figure S14

      - Added Panel D to show how we created mutant reporter constructs for validation.

    1. Author response:

      The issue of a control without blue light illumination was raised. Clearly without the light we will not obtain any signal in the fluorescence microscopy experiments, which would not be very informative. Instead, we changed the level of blue light illumination in the fluorescence microscopy experiments (figure 4A) and the response of the bacteria scales with dosage. It is very hard to find an alternative explanation, beyond that the blue light is stressing the bacteria and modulating their membrane potentials.

      One of the referees refuses to see wavefronts in our microscopy data. We struggle to understand whether it is an issue with definitions (Waigh has published a tutorial on the subject in Chapter 5 of his book ‘The physics of bacteria: from cells to biofilms’, T.A.Waigh, CUP, 2024 – figure 5.1 shows a sketch) or something subtler on diffusion in excitable systems. We stand by our claim that we observe wavefronts, similar to those observed by Prindle et al<sup>1</sup> and Blee et al<sup>2</sup> for B. subtilis biofilms.

      The referee is questioning our use of ThT to probe the membrane potential. We believe the Pilizota and Strahl groups are treating the E. coli as unexcitable cells, leading to their problems. Instead, we believe E. coli cells are excitable (containing the voltage-gated ion channel Kch) and we now clearly state this in the manuscript. Furthermore, we include a section here discussing some of the issues with ThT.


      Use of ThT as a voltage sensor in cells

      ThT is now used reasonably widely in the microbiology community as a voltage sensor in both bacterial [Prindle et al]<sup>1</sup> and fungal cells [Pena et al]<sup>12</sup>. ThT is a small cationic fluorophore that loads into the cells in proportion to their membrane potential, thus allowing the membrane potential to be measured from fluorescence microscopy measurements.

      Previously ThT was widely used to quantify the growth of amyloids in molecular biology experiments (standardized protocols exist and dedicated software has been created)<sup>13</sup> and there is a long history of its use<sup>14</sup>. ThT fluorescence is bright, stable and slow to photobleach.

      Author response image 1 shows a schematic diagram of the ThT loading in E. coli in our experiments in response to illumination with blue light. Similar results were previously presented by Mancini et al<sup>15</sup>, but regimes 2 and 3 were mistakenly labelled as artefacts.

      Author response image 1.

      Schematic diagram of ThT loading during an experiment with E. coli cells under blue light illumination i.e. ThT fluorescence as a function of time. Three empirical regimes for the fluorescence are shown (1, 2 and 3).

      The classic study of Prindle et al on bacterial biofilm electrophysiology established the use of ThT in B. subtilis biofilms by showing similar results occurred with DiSc3, which is widely used as a Nernstian voltage sensor in cellular biology<sup>1</sup>, e.g. with mitochondrial membrane potentials in eukaryotic organisms, where there is a large literature. We repeated such a comparative calibration of ThT with DiSc3 in a previous publication with both B. subtilis and P. aeruginosa cells<sup>2</sup>. ThT thus functioned well in our previous publications with Gram-positive and Gram-negative cells.

      However, to our knowledge, there are now two groups questioning the use of ThT and DiSc3 as voltage sensors with E. coli cells<sup>15-16</sup>. The first, by the Pilizota group, claims ThT only works as a voltage sensor in regime 1 of Author response image 1, using a method based on the rate of rotation of flagellar motors. Another slightly contradictory study by the Strahl group claims DiSc3<sup>16</sup> only acts as a voltage sensor with the addition of an ionophore for potassium, which allows free movement of potassium through the E. coli membranes.

      Our resolution to this contradiction is that ThT does indeed work reasonably well with E. coli. The Pilizota group’s model for rotating flagellar motors assumes the membrane voltage is not varying due to excitability of the membrane voltage (otherwise a non-linear Hodgkin-Huxley type model would be needed to quantify their results) i.e. E. coli cells are unexcitable. We show clearly in our study that ThT loading in E. coli is a function of irradiation with blue light and is a stress response of the excitable cells. This is in contradiction to the Pilizota group’s model. The Pilizota group’s model also requires the awkward fiction of why cells decide to unload and then reload ThT in regimes 2 and 3 of Author response image 1 due to variable membrane partitioning of the ThT. Our simple explanation is that it is just due to the membrane voltage changing and no membrane permeability switch needs to be invoked. The Strahl group’s<sup>16</sup> results with DiSc3 are also explained by a neglect of the excitable nature of E. coli cells that are reacting to blue light irradiation. Adding ionophores to the E. coli membranes makes the cells unexcitable, reduces their response to blue light and thus leads to simple loading of DiSc3 (the physiological control of K+ in the cells by voltage-gated ion channels has been short circuited by the addition of the ionophore).

      Further evidence for our model that ThT functions as a voltage sensor with E. coli includes:

      1) The 3 regimes in Author response image 1 from ThT correlate well with measurements of extracellular potassium ion concentration using TMRM i.e. all 3 regimes in Author response image 1 are visible with this separate dye (figure 1d).

      2) We are able to switch regime 3 in Author response image 1 off and then on again by using knock downs of the potassium ion channel Kch in the membranes of the E. coli and then reinserting the gene back into the knock downs. This cannot be explained by the Pilizota model.

      We conclude that ThT works reasonably well as a sensor of membrane voltage in E. coli and the previous contradictory studies<sup>15-16</sup> are because they neglect the excitable nature of the membrane voltage of E. coli cells in response to the light used to make the ThT fluoresce.

      Three further criticisms of the Mancini et al method<sup>15</sup> for calibrating membrane voltages include:

      1) E. coli cells have clutches that are not included in their models. Otherwise the rotation of the flagella would be entirely enslaved to the membrane voltage allowing the bacteria no freedom to modulate their speed of motility.

      2) Ripping off the flagella may perturb the integrity of the cell membrane and lead to different loading of the ThT in the E. coli cells.

      3) Most seriously, the method ignores the activity of many other ion channels (beyond H+) on the membrane voltage that are known to exist with E. coli cells e.g. Kch for K+ ions. The Pilizota group uses a simple Nernstian battery model developed for mitochondria in the 1960s. It is not adequate to explain our results.

      An additional criticism of the Winkel et al study<sup>17</sup> from the Strahl group is that it indiscriminately switches between discussion of mitochondria and bacteria e.g. on page 8 ‘As a consequence the membrane potential is dominated by H+’. Mitochondria are slightly alkaline intracellular organelles with external ion concentrations in the cytoplasm that are carefully controlled by the eukaryotic cells. E. coli are not i.e. they have neutral internal pHs, with widely varying extracellular ionic concentrations and have reinforced outer membranes to resist osmotic shocks (in contrast mitochondria can easily swell in response to moderate changes in osmotic pressure).

      A quick calculation of the equilibrium membrane voltage of E. coli can be easily done using the Nernst equation dependent on the extracellular ion concentrations defined by the growth media (the intracellular ion concentrations in E. coli are 0.2 M K+ and 10<sup>-7</sup> M H+ i.e. there is a factor of a million fewer H+ ions). Thus in contradiction to the claims of the groups of Pilizota<sup>15</sup> and Strahl<sup>17</sup>, H+ is a minority determinant to the membrane voltage of E. coli. The main determinant is K+. For a textbook version of this point the authors can refer to Chapter 4 of D. White, et al’s ‘The physiology and biochemistry of prokaryotes’, OUP, 2012, 4th edition.
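      The quick calculation referred to above can be sketched with the Nernst equation, E = (RT/zF) ln(c_out/c_in). The intracellular concentrations are those quoted in the response; the extracellular concentrations and the temperature (37 °C) are assumptions chosen purely for illustration and would need to be replaced by the values for the actual growth medium.

```python
import math

R = 8.314462618   # gas constant, J mol^-1 K^-1
F = 96485.33212   # Faraday constant, C mol^-1

def nernst_mV(c_out, c_in, z=1, temperature_K=310.15):
    """Equilibrium (Nernst) potential in mV, inside relative to outside."""
    return 1000.0 * (R * temperature_K) / (z * F) * math.log(c_out / c_in)

# Intracellular values quoted in the response.
K_IN = 0.2     # M
H_IN = 1e-7    # M

# Assumed extracellular values (hypothetical medium, for illustration only).
K_OUT = 0.005  # M
H_OUT = 1e-7   # M, i.e. external pH 7

print(f"[K+]in / [H+]in = {K_IN / H_IN:.1e}")
print(f"E_K = {nernst_mV(K_OUT, K_IN):.1f} mV")
print(f"E_H = {nernst_mV(H_OUT, H_IN):.1f} mV")
```

      Note that a single-ion Nernst potential is only part of the picture: the contribution of each ion to the resting potential also depends on its relative membrane permeability, which this sketch does not model.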

      Even in mitochondria the assumption that H+ dominates the membrane potential and the cells are unexcitable can be questioned e.g. people have observed pulsatile depolarization phenomena with mitochondria<sup>18-19</sup>. A large number of K+ channels are now known to occur in mitochondrial membranes (not to mention Ca2+ channels; mitochondria have extensive stores of Ca2+) and they are implicated in mitochondrial membrane potentials. In this respect the seminal Nobel Prize-winning research of Peter Mitchell (1961) on mitochondria needs to be amended<sup>20</sup>. Furthermore, the mitochondrial work is clearly inapplicable to bacteria (the proton motive force, PMF, will instead subtly depend on non-linear Hodgkin-Huxley equations for the excitable membrane potential, similar to those presented in the current article). A much more sophisticated framework has been developed to describe electrophysiology by the mathematical biology community to describe the activity of electrically excitable cells (e.g. with neurons, sensory cells and cardiac cells), beyond Mitchell’s use of the simple stationary equilibrium thermodynamics to define the Proton Motive Force via the electrochemical potential of a proton (the use of the word ‘force’ is unfortunate, since it is a potential). The tools developed in the field of mathematical electrophysiology<sup>8</sup> should be more extensively applied to bacteria, fungi, mitochondria and chloroplasts if real progress is to be made.


      Related to the previous point, we now cite articles from the Pilizota and Strahl groups in the main text (one from each group). Unfortunately, the space constraints of eLife mean we cannot make a more detailed discussion in the main article.

      In terms of modelling the ion channels, the Hodgkin-Huxley type model proposes that the Kch ion channel can be modelled as a typical voltage-gated potassium ion channel i.e. with a 𝑛<sup>4</sup> term in its conductivity. The literature agrees that Kch is a voltage-gated potassium ion channel based on its primary sequence<sup>3</sup>. The protein has the typical 6 transmembrane helix motif for a voltage-gated ion channel. The agent-based model assumes little about the structure of ion channels in E. coli, other than they release potassium in response to a threshold potassium concentration in their environment. The agent-based model is thus robust to the exact molecular details chosen and predicts the anomalous transport of the potassium wavefronts reasonably well (the modelling was extended in a recent Physical Review E article<sup>4</sup>). Such a description of reaction-anomalous diffusion phenomena has not to our knowledge been previously achieved in the literature<sup>5</sup> and in general could be used to describe other signaling molecules.
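      The n<sup>4</sup> term mentioned above can be made concrete with a minimal sketch of a Hodgkin-Huxley potassium conductance, I_K = g_K n^4 (V - E_K), with dn/dt = α_n(V)(1 - n) - β_n(V) n. The rate functions and parameter values below are the classic squid-axon ones from the textbook Hodgkin-Huxley model, used here only as placeholders; they are an assumption and are not the parameters fitted for Kch in the manuscript.

```python
import math

def alpha_n(v_mV):
    """Opening rate of the n gate (1/ms), classic Hodgkin-Huxley form."""
    x = v_mV + 55.0
    if abs(x) < 1e-7:
        return 0.1  # limit of 0.01*x / (1 - exp(-x/10)) as x -> 0
    return 0.01 * x / (1.0 - math.exp(-x / 10.0))

def beta_n(v_mV):
    """Closing rate of the n gate (1/ms), classic Hodgkin-Huxley form."""
    return 0.125 * math.exp(-(v_mV + 65.0) / 80.0)

def n_inf(v_mV):
    """Steady-state open fraction of a single n gate at fixed voltage."""
    a, b = alpha_n(v_mV), beta_n(v_mV)
    return a / (a + b)

def potassium_current(v_mV, n, g_k=36.0, e_k=-77.0):
    """I_K = g_K * n^4 * (V - E_K); placeholder g_K (mS/cm^2) and E_K (mV)."""
    return g_k * n ** 4 * (v_mV - e_k)

def relax_n(v_mV, n0, t_end_ms=20.0, dt_ms=0.01):
    """Forward-Euler integration of dn/dt = alpha*(1 - n) - beta*n at fixed V."""
    n = n0
    for _ in range(int(round(t_end_ms / dt_ms))):
        n += dt_ms * (alpha_n(v_mV) * (1.0 - n) - beta_n(v_mV) * n)
    return n

# Voltage step from -65 mV to -20 mV: the gate opens and the current grows.
n_rest = n_inf(-65.0)
n_step = relax_n(-20.0, n_rest)
print(f"n at rest = {n_rest:.3f}, n after step = {n_step:.3f}")
print(f"I_K at rest = {potassium_current(-65.0, n_rest):.2f}")
print(f"I_K after step = {potassium_current(-20.0, n_step):.2f}")
```

      The fourth power makes the conductance rise steeply with depolarization, which is the characteristic voltage-gated behaviour the response attributes to Kch.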

      1. Prindle, A.; Liu, J.; Asally, M.; Ly, S.; Garcia-Ojalvo, J.; Süel, G. M., Ion channels enable electrical communication in bacterial communities. Nature 2015, 527, 59.

      2. Blee, J. A.; Roberts, I. S.; Waigh, T. A., Membrane potentials, oxidative stress and the dispersal response of bacterial biofilms to 405 nm light. Physical Biology 2020, 17, 036001.

      3. Milkman, R., An E. coli homologue of eukaryotic potassium channel proteins. PNAS 1994, 91, 3510-3514.

      4. Martorelli, V.; Akabuogu, E. U.; Krasovec, R.; Roberts, I. S.; Waigh, T. A., Electrical signaling in three-dimensional bacterial biofilms using an agent-based fire-diffuse-fire model. Physical Review E 2024, 109, 054402.

      5. Waigh, T. A.; Korabel, N., Heterogeneous anomalous transport in cellular and molecular biology. Reports on Progress in Physics 2023, 86, 126601.

      6. Hodgkin, A. L.; Huxley, A. F., A quantitative description of membrane current and its application to conduction and excitation in nerve. Journal of Physiology 1952, 117, 500.

      7. Dawson, S. P.; Keizer, J.; Pearson, J. E., Fire-diffuse-fire model of dynamics of intracellular calcium waves. PNAS 1999, 96, 606.

      8. Keener, J.; Sneyd, J., Mathematical Physiology. Springer: 2009.

      9. Coombes, S., The effect of ion pumps on the speed of travelling waves in the fire-diffuse-fire model of Ca2+ release. Bulletin of Mathematical Biology 2001, 63, 1.

      10. Blee, J. A.; Roberts, I. S.; Waigh, T. A., Spatial propagation of electrical signals in circular biofilms. Physical Review E 2019, 100, 052401.

      11. Gorochowski, T. E.; Matyjaszkiewicz, A.; Todd, T.; Oak, N.; Kowalska, K., BSim: an agent-based tool for modelling bacterial populations in systems and synthetic biology. PloS One 2012, 7, 1.

      12. Pena, A.; Sanchez, N. S.; Padilla-Garfias, F.; Ramiro-Cortes, Y.; Araiza-Villanueva, M.; Calahorra, M., The use of thioflavin T for the estimation and measurement of the plasma membrane electric potential difference in different yeast strains. Journal of Fungi 2023, 9 (9), 948.

      13. Xue, C.; Lin, T. Y.; Chang, D.; Guo, Z., Thioflavin T as an amyloid dye: fibril quantification, optimal concentration and effect on aggregation. Royal Society Open Science 2017, 4, 160696.

      14. Meisl, G.; Kirkegaard, J. B.; Arosio, P.; Michaels, T. C. T.; Vendruscolo, M.; Dobson, C. M.; Linse, S.; Knowles, T. P. J., Molecular mechanisms of protein aggregation from global fitting of kinetic models. Nature Protocols 2016, 11 (2), 252-272.

      15. Mancini, L.; Tian, T.; Guillaume, T.; Pu, Y.; Li, Y.; Lo, C. J.; Bai, F.; Pilizota, T., A general workflow for characterization of Nernstian dyes and their effects on bacterial physiology. Biophysical Journal 2020, 118 (1), 4-14.

      16. Buttress, J. A.; Halte, M.; Winkel, J. D. t.; Erhardt, M.; Popp, P. F.; Strahl, H., A guide for membrane potential measurements in Gram-negative bacteria using voltage-sensitive dyes. Microbiology 2022, 168, 001227.

      17. Derk te Winkel, J.; Gray, D. A.; Seistrup, K. H.; Hamoen, L. W.; Strahl, H., Analysis of antimicrobial-triggered membrane depolarization using voltage sensitive dyes. Frontiers in Cell and Developmental Biology 2016, 4, 29.

      18. Schwarzländer, M.; Logan, D. C.; Johnston, I. G.; Jones, N. S.; Meyer, A. J.; Fricker, M. D.; Sweetlove, L. J., Pulsing of membrane potential in individual mitochondria. The Plant Cell 2012, 24, 1188-1201.

      19. Huser, J.; Blatter, L. A., Fluctuations in mitochondrial membrane potential caused by repetitive gating of the permeability transition pore. Biochemical Journal 1999, 343, 311-317.

      20. Mitchell, P., Coupling of phosphorylation to electron and hydrogen transfer by a chemi-osmotic type of mechanism. Nature 1961, 191 (4784), 144-148.

      21. Baba, T.; Ara, M.; Hasegawa, Y.; Takai, Y.; Okumura, Y.; Baba, M.; Datsenko, K. A.; Tomita, M.; Wanner, B. L.; Mori, H., Construction of Escherichia coli K-12 in-frame, single-gene knockout mutants: the Keio collection. Molecular Systems Biology 2006, 2, 1.

      22. Schindelin, J.; et al., Fiji: an open-source platform for biological-image analysis. Nature Methods 2012, 9, 676.

      23. Hartmann, R.; et al., Quantitative image analysis of microbial communities with BiofilmQ. Nature Microbiology 2021, 6 (2), 151.


      The following is the authors’ response to the original reviews.

      Critical synopsis of the articles cited by referee 2:

      (1) ‘Generalized workflow for characterization of Nernstian dyes and their effects on bacterial physiology’, L.Mancini et al, Biophysical Journal, 2020, 118, 1, 4-14.

      This is the central article used by referee 2 to argue that there are issues with the calibration of ThT for the measurement of membrane potentials. The authors use a simple Nernstian battery (SNB) model and unfortunately it is wrong when voltage-gated ion channels occur. Huge oscillations occur in the membrane potentials of E. coli that cannot be described by the SNB model. Instead, a Hodgkin-Huxley model is needed, as shown in our eLife manuscript and multiple other studies (see above). Arrhenius kinetics are assumed in the SNB model for pumping with no real evidence, and the generalized workflow involves ripping the flagella off the bacteria! The authors construct an elaborate ‘work flow’ to ensure their ThT results can be interpreted using their erroneous SNB model over a limited range of parameters.

      (2) ‘Non-equivalence of membrane voltage and ion-gradient as driving forces for the bacterial flagellar motor at low load’, C.J.Lo, et al, Biophysical Journal, 2007, 93, 1, 294.

      An odd de novo chimeric species is developed using an E. coli chassis which uses Na+ instead of H+ for the motility of its flagellar motor. Its relevance to wild-type E. coli is not clear, due to the massive physiological perturbations involved. An SNB model is used to fit the data over a very limited parameter range, with all the concomitant errors.

      (3) ‘Single-cell bacterial electrophysiology reveals mechanisms of stress-induced damage’, E.Krasnopeeva, et al, Biophysical Journal, 2019, 116, 2390.

      The abstract says ‘PMF defines the physiological state of the cell’. This statement is hyperbolic. An extremely wide range of molecules contribute to the physiological state of a cell. PMF does not even define the electrophysiology of the cell e.g. via the membrane potential. There are 0.2 M of K+ compared with 0.0000001 M of H+ in E. coli, so K+ is arguably a million times more important for the membrane potential than H+ and thus the electrophysiology!
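      The order-of-magnitude comparison above can be checked with a few lines of arithmetic. The sketch below uses only the two concentrations quoted in the text (0.2 M K+ and 0.0000001 M H+, i.e. pH 7); the variable and function names are illustrative, and the result is a ratio of concentrations only.

```python
import math

# Concentrations quoted in the text (mol/L).
K_CONCENTRATION_M = 0.2    # K+ in E. coli
H_CONCENTRATION_M = 1e-7   # H+ at pH 7


def concentration_ratio(major: float, minor: float) -> float:
    """Return how many times more concentrated the major ion is."""
    return major / minor


ratio = concentration_ratio(K_CONCENTRATION_M, H_CONCENTRATION_M)
orders_of_magnitude = math.log10(ratio)

print(f"K+/H+ concentration ratio: {ratio:.1e}")
print(f"Orders of magnitude: {orders_of_magnitude:.1f}")
```

      The ratio comes out at about two million, which is the basis for the "million times" figure used in this response.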

      Equation (1) in the manuscript assumes no other ions are exchanged during the experiments other than H+. This is a very bad approximation when voltage-gated potassium ion channels move the majority ion (K+) around!

      In our model, Figure 4A is better explained by depolarisation due to K+ channels closing than by direct irreversible photodamage. Why does the ThT fluorescence increase again for the second hyperpolarization event if the ThT is supposed to be damaged? It does not make sense.

      (4) ‘The proton motive force determines E. coli robustness to extracellular pH’, G.Terradot et al, 2024, preprint.

      This article expounds the SNB model once more. It still ignores the voltage-gated ion channels. Furthermore, it ignores the effect of the dominant ion in E. coli, K+. The manuscript is incorrect as a result and I would not recommend publication.

      In general, an important problem is being researched i.e. how the membrane potential of E. coli is related to motility, but there are serious flaws in the SNB approach and the experimental methodology appears tenuous.

      Answers to specific questions raised by the referees

      Reviewer #1 (Public Review):

      Summary:

      Cell-to-cell communication is essential for higher functions in bacterial biofilms. Electrical signals have proven effective in transmitting signals across biofilms. These signals are then used to coordinate cellular metabolisms or to increase antibiotic tolerance. Here, the authors have reported for the first time coordinated oscillation of membrane potential in E. coli biofilms that may have a functional role in photoprotection.

      Strengths:

      - The authors report original data.

      - For the first time, they showed that coordinated oscillations in membrane potential occur in E. coli biofilms.

      - The authors revealed a complex two-phase dynamic involving distinct molecular response mechanisms.

      - The authors developed two rigorous models inspired by 1) Hodgkin-Huxley model for the temporal dynamics of membrane potential and 2) Fire-Diffuse-Fire model for the propagation of the electric signal.

      - Since its discovery by comparative genomics, the Kch ion channel has not been associated with any specific phenotype in E. coli. Here, the authors proposed a functional role for the putative K+ Kch channel: enhancing survival under photo-toxic conditions.

      We thank the referee for their positive evaluations and agree with these statements.

      Weaknesses:

      - Since the flow of fresh medium is stopped at the beginning of the acquisition, environmental parameters such as pH and RedOx potential are likely to vary significantly during the experiment. It is therefore important to exclude the contributions of these variations to ensure that the electrical response is only induced by light stimulation. Unfortunately, no control experiments were carried out to address this issue.

      The electrical responses occur almost instantaneously when stimulation with blue light begins, i.e. they are too fast to be caused by a build-up of pH changes. We are not sure what the referee means by redox potential, since it is an attribute of all chemicals that are able to donate or receive electrons. The electrical response to stress appears to be caused by ROS, since the response is removed when ROS scavengers are added, i.e. pH plays at most a very minor role.

      - Furthermore, the control parameter of the experiment (light stimulation) is the same as that used to measure the electrical response, i.e. through fluorescence excitation. The use of the PROPS system could solve this problem.

      We were enthusiastic at the start of the project to use the PROPS system in E. coli as presented by J.M.Kralj et al, ‘Electrical spiking in E. coli probed with a fluorescent voltage-indicating protein’, Science, 2011, 333, 6040, 345. However, the people we contacted in the microbiology community said that it had some technical issues, and there have been no subsequent studies using PROPS in bacteria after the initial promising study. The fluorescent protein system recently presented in PNAS seems more promising, ‘Sensitive bacterial Vm sensors revealed the excitability of bacterial Vm and its role in antibiotic tolerance’, X.Jin et al, PNAS, 120, 3, e2208348120.

      - Electrical signal propagation is an important aspect of the manuscript. However, a detailed quantitative analysis of the spatial dynamics within the biofilm is lacking. In addition, it is unclear if the electrical signal propagates within the biofilm during the second peak regime, which is mediated by the Kch channel. This is an important question, given that the fire-diffuse-fire model is presented with emphasis on the role of K+ ions.

      We have presented a more detailed account of the electrical wavefront modelling work, which is currently under review in a physics journal: ‘Electrical signalling in three dimensional bacterial biofilms using an agent based fire-diffuse-fire model’, V.Martorelli, et al, 2024, https://www.biorxiv.org/content/10.1101/2023.11.17.567515v1

      - Since deletion of the kch gene inhibits the long-term electrical response to light stimulation (regime II), the authors concluded that K+ ions play a role in the habituation response. However, Kch is a putative K+ ion channel. The use of specific drugs could help to clarify the role of K+ ions.

      Our recent electrical impedance spectroscopy publication provides further evidence that Kch is associated with large changes in conductivity, as expected for a voltage-gated ion channel (https://pubs.acs.org/doi/10.1021/acs.nanolett.3c04446, 'Electrical impedance spectroscopy with bacterial biofilms: neuronal-like behavior', E.Akabuogu et al, ACS Nanoletters, 2024, in print).

      - The manuscript as such does not allow us to properly conclude on the photo-protective role of the Kch ion channel.

      That Kch has a photoprotective role is our current working hypothesis. The hypothesis fits with the data, but we are not saying we have proven it beyond all possible doubt.

      - The link between membrane potential dynamics and mechanosensitivity is not captured in the equation for the Q-channel opening dynamics in the Hodgkin-Huxley model (Supp Eq 2).

      Our model is agnostic with respect to the mechanosensitivity of the ion channels, although we deduce that mechanosensitive ion channels contribute to ion channel Q.

      - Given the large number of parameters used in the models, it is hard to distinguish between prediction and fitting.

      This is always an issue with electrophysiological modelling (compared with most heart and brain modelling studies we are very conservative in the choice of parameters for the bacteria). In terms of predicting the different phenomena observed, we believe the model is very successful.

      Reviewer #2 (Public Review):

      Summary of what the authors were trying to achieve:

      The authors thought they studied membrane potential dynamics in E. coli biofilms. They thought so because they were unaware that the dye they used to report that membrane potential in E. coli has been previously shown not to report it. Because of this, the interpretation of the authors' results is not accurate.

      We believe the Pilizota work is scientifically flawed.

      Major strengths and weaknesses of the methods and results:

      The strength of this work is that all the data is presented clearly, and accurately, as far as I can tell.

      The major critical weakness of this paper is the use of ThT dye as a membrane potential dye in E. coli. The work is unaware of a publication from 2020 (https://www.sciencedirect.com/science/article/pii/S0006349519308793) that demonstrates that ThT is not a membrane potential dye in E. coli. Therefore I think the results of this paper are misinterpreted. The same publication I reference above presents a protocol on how to carefully calibrate any candidate membrane potential dye in any given condition.

      We are aware of this study, but believe it to be scientifically flawed. We do not cite the article because we do not think it is a particularly useful contribution to the literature.

      I now go over each results section in the manuscript.

      Result section 1: Blue light triggers electrical spiking in single E. coli cells

      I do not think the title of the result section is correct for the following reasons. The above-referenced work demonstrates the loading profile one should expect from a Nernstian dye (Figure 1). It also demonstrates that ThT does not show that profile and explains why this is so. ThT only permeates the membrane under light exposure (Figure 5). This finding is consistent with blue light peroxidising the membrane (see also Figure 4 of the following work, https://www.sciencedirect.com/science/article/pii/S0006349519303923, on light-induced damage to the electrochemical gradient of protons; I am sure there are more references for this).

      The Pilizota group invokes some elaborate artefacts to explain the lack of agreement with a simple Nernstian battery model. The model is incorrect, not the fluorophore.

      Please note that the loading profile (only observed under light) in the current manuscript in Figure 1B as well as in the video S1 is identical to that in Figure 3 from the above-referenced paper (i.e. https://www.sciencedirect.com/science/article/pii/S0006349519308793), and corresponding videos S3 and S4. This kind of profile is exactly what one would expect theoretically if the light is simultaneously lowering the membrane potential as the ThT is equilibrating, see Figure S12 of that previous work. There, it is also demonstrated by the means of monitoring the speed of bacterial flagellar motor that the electrochemical gradient of protons is being lowered by the light. The authors state that applying the blue light for different time periods and over different time scales did not change the peak profile. This is expected if the light is lowering the electrochemical gradient of protons. But, in Figure S1, it is clear that it affected the timing of the peak, which is again expected, because the light affects the timing of the decay, and thus of the decay profile of the electrochemical gradient of protons (Figure 4 of https://www.sciencedirect.com/science/article/pii/S0006349519303923).

      We think the proton effect is a million times weaker than that due to potassium, i.e. 0.2 M K+ versus 10⁻⁷ M H+. We can comfortably neglect the influx of H+ in our experiments.

      I find Figure S1D interesting. There the authors load TMRM, which is a membrane voltage dye that has been used extensively (as far as I am aware this is the first reference for that and it has not been cited: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1914430/). As visible from the last TMRM reference I give, TMRM will only load the cells in Potassium Phosphate buffer with NaCl (and often we used EDTA to permeabilise the membrane). It is not fully clear (to me) whether here TMRM was prepared in rich media (it explicitly says so for ThT in Methods but not for TMRM), but it seems so. If this is the case, it likely also loads because of the damage to the membrane done with light, and therefore I am not surprised that the profiles are similar.

      The vast majority of cells continue to be viable. We do not think membrane damage is dominating.

      The authors then use CCCP. First, a small correction, as the authors state that it quenches membrane potential. CCCP is a protonophore (https://pubmed.ncbi.nlm.nih.gov/4962086/), so it collapses electrochemical gradient of protons. This means that it is possible, and this will depend on the type of pumps present in the cell, that CCCP collapses electrochemical gradient of protons, but the membrane potential is equal and opposite in sign to the DeltapH. So using CCCP does not automatically mean membrane potential will collapse (e.g. in some mammalian cells it does not need to be the case, but in E. coli it is https://www.biorxiv.org/content/10.1101/2021.11.19.469321v2). CCCP has also been recently found to be a substrate for TolC (https://journals.asm.org/doi/10.1128/mbio.00676-21), but at the concentrations the authors are using CCCP (100 µM) that should not affect the results. However, the authors then state because they observed, in Figure S1E, a fast efflux of ions in all cells and no spiking dynamics this confirms that observed dynamics are membrane potential related. I do not agree that it does. First, Figure S1E does not appear to show transients; instead, it is visible that after 50 min treatment with 100 µM CCCP, ThT dye shows no dynamics. The action of a Nernstian dye is defined. It is not sufficient that a charged molecule is affected in some way by electrical potential; this needs to be in a very specific way to be a Nernstian dye. Part of the profile of ThT loading observed in https://www.sciencedirect.com/science/article/pii/S0006349519308793 is membrane potential related, but not in a way that is characteristic of Nernstian dye.

      Our understanding of the literature is CCCP poisons the whole metabolism of the bacterial cells. The ATP driven K+ channels will stop functioning and this is the dominant contributor to membrane potential.
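      For reference, the equilibrium behaviour that defines a Nernstian dye can be written down directly: a membrane-permeable ion of charge z distributes across the membrane so that c_in/c_out = exp(-zFVm/RT). The sketch below is a textbook illustration of that relation only, not the calibration workflow of either group; the function name, the temperature and the example voltages are assumptions chosen for illustration.

```python
import math

R = 8.314462618   # gas constant, J/(mol K)
F = 96485.33212   # Faraday constant, C/mol


def nernstian_accumulation(vm_mV: float, z: int = 1, temp_K: float = 298.15) -> float:
    """Equilibrium ratio c_in / c_out for an ion of charge z at membrane potential vm_mV."""
    vm_V = vm_mV / 1000.0
    return math.exp(-z * F * vm_V / (R * temp_K))


# Hypothetical membrane potentials (mV), chosen only for illustration.
for vm in (0.0, -60.0, -120.0, -180.0):
    print(f"Vm = {vm:7.1f} mV -> c_in/c_out = {nernstian_accumulation(vm):10.1f}")
```

      In this idealized picture, a more negative (hyperpolarized) potential gives a larger intracellular accumulation of a cationic dye, roughly tenfold per 59 mV at 25 °C.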

      Result section 2: Membrane potential dynamics depend on the intercellular distance

      In this chapter, the authors report that the time to reach the first intensity peak during ThT loading is different when cells are in microclusters. They interpret this as electrical signalling in clusters because the peak is reached faster in microclusters (as opposed to slower because intuitively in these clusters cells could be shielded from light). However, shielding is one possibility. The other is that the membrane has changed in composition and/or the effective light power the cells can tolerate (with mechanisms to handle light-induced damage, some of which authors mention later in the paper) is lower. Given that these cells were left in a microfluidic chamber for 2 hours to attach in growth media according to Methods, there is sufficient time for that to happen. In Figure S12 C and D of that same paper from my group (https://ars.els-cdn.com/content/image/1-s2.0-S0006349519308793-mmc6.pdf) one can see the effects of peak intensity and timing of the peak on the permeability of the membrane. Therefore I do not think the distance is the explanation for what authors observe.

      Shielding would provide the reverse effect, since hyperpolarization begins in the dense centres of the biofilms. For the initial 2 hours the cells receive negligible blue light. Neither of the referee’s comments thus seems tenable.

      Result section 3: Emergence of synchronized global wavefronts in E. coli biofilms

      In this section, the authors exposed a mature biofilm to blue light. They observe that the intensity peak is reached faster in the cells in the middle. They interpret this as the ion-channel-mediated wavefronts moved from the center of the biofilm. As above, cells in the middle can have different membrane permeability to those at the periphery, and probably even more importantly, there is no light profile shown anywhere in SI/Methods. I could be wrong, but the SI3 A profile is consistent with a potential Gaussian beam profile visible in the field of view. In Methods, I find the light source for the blue light and the type of microscope but no comments on how 'flat' the illumination is across their field of view. This is critical to assess what they are observing in this result section. I do find it interesting that the ThT intensity collapsed from the edges of the biofilms. In the publication I mentioned (https://www.sciencedirect.com/science/article/pii/S0006349519308793#app2), the collapse of fluorescence was not understood (other than it is not membrane potential related). It was observed in Figure 5A, C, and F, that at the point of peak, electrochemical gradient of protons is already collapsed, and that at the point of peak cell expands and cytoplasmic content leaks out. This means that this part of the ThT curve is not membrane potential related. The authors see that after the first peak collapsed there is a period of time where ThT does not stain the cells and then it starts again. If after the first peak the cellular content leaks, as we have observed, then staining that occurs much later could be simply staining of cytoplasmic positively charged content, and the timing of that depends on the dynamics of cytoplasmic content leakage (we observed this to be happening over 2h in individual cells). ThT is also a non-specific amyloid dye, and in starving E. coli cells formation of protein clusters has been observed (https://pubmed.ncbi.nlm.nih.gov/30472191/), so such cytoplasmic staining seems possible.

      It is very easy to see if the illumination is flat (Köhler illumination) by comparing the intensity of background pixels on the detector. It was flat in our case. Protons have little to do with our work for reasons highlighted before. Differential membrane permeability is a speculative phenomenon not well supported by any evidence and with no clear molecular mechanism.
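      One simple way to make such a flatness check quantitative is to compare background intensities near the centre of the field of view with those near its edges. The sketch below is a hypothetical illustration, not the procedure used in the manuscript: the function name, the width of the border strip and the synthetic test images are all assumptions.

```python
from statistics import mean


def centre_to_edge_ratio(image, border=2):
    """Ratio of mean intensity in the image interior to that in a border strip.

    `image` is a list of equal-length rows of background pixel intensities.
    A ratio close to 1 suggests flat illumination; a ratio well above 1
    suggests a brighter centre (e.g. a Gaussian-like beam profile).
    """
    rows, cols = len(image), len(image[0])
    centre, edge = [], []
    for r in range(rows):
        for c in range(cols):
            in_border = (r < border or r >= rows - border or
                         c < border or c >= cols - border)
            (edge if in_border else centre).append(image[r][c])
    return mean(centre) / mean(edge)


# Synthetic examples (not measured data): a flat field and a centre-weighted field.
flat = [[100.0] * 10 for _ in range(10)]
peaked = [[100.0 - 2.0 * ((r - 4.5) ** 2 + (c - 4.5) ** 2) for c in range(10)]
          for r in range(10)]

print(f"flat field ratio:   {centre_to_edge_ratio(flat):.3f}")
print(f"peaked field ratio: {centre_to_edge_ratio(peaked):.3f}")
```

      In practice the input would be restricted to pixels known to contain no cells, so that only the illumination profile contributes to the ratio.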

      Finally, I note that authors observe biofilms of different shapes and sizes and state that they observe similar intensity profiles, which could mean that my comment on 'flatness' of the field of view above is not a concern. However, the scale bar in Figure 2A is not legible, so I can't compare it to the variation of sizes of the biofilms in Figure 2C (67 to 280um). Based on this, I think that the illumination profile is still a concern.

      The referee now contradicts themselves and wants a scale bar to be more visible. We have changed the scale bar.

      Result section 4: Voltage-gated Kch potassium channels mediate ion-channel electrical oscillations in E. coli

      First I note at this point, given that I disagree that the data presented thus 'suggest that E. coli biofilms use electrical signaling to coordinate long-range responses to light stress' as the authors state, it gets harder to comment on the rest of the results.

      In this result section the authors look at the effect of Kch, a putative voltage-gated potassium channel, on ThT profile in E. coli cells. And they see a difference. It is worth noting that in the publication https://www.sciencedirect.com/science/article/pii/S0006349519308793 it is found that ThT is also likely a substrate for TolC (Figure 4), but that scenario could not be distinguished from the one where TolC mutant has a different membrane permeability (and there is a publication that suggests the latter is happening: https://onlinelibrary.wiley.com/doi/10.1111/j.1365-2958.2010.07245.x). Given this, it is also possible that Kch deletion affects the membrane permeability. I do note that in video S4 I seem to see more of, what appear to be, plasmolysed cells. The authors do not see the ThT intensity with this mutant that appears long after the initial peak has disappeared, as they see in WT. It is not clear how long they waited for this, as from Figure S3C it could simply be that the dynamics of this is a lot slower, e.g. Kch deletion changes membrane permeability.

      The work that TolC provides a possible passive pathway for ThT to leave cells seems slightly niche. It just demonstrates another mechanism for the cells to equilibrate the concentrations of ThT in a Nernstian manner, i.e. driven by the membrane voltage.

      The authors themselves state that the evidence for Kch being a voltage-gated channel is indirect (line 54). I do not think there is a need to claim function from a ThT profile of E. coli mutants (nor do I believe it's good practice), given how accurate single-channel recordings are currently. To know the exact dependency on the membrane potential, ion channel recordings on this protein are needed first.

      We have good evidence from electrical impedance spectroscopy experiments that Kch increases the conductivity of biofilms (https://pubs.acs.org/doi/10.1021/acs.nanolett.3c04446, 'Electrical impedance spectroscopy with bacterial biofilms: neuronal-like behavior', E.Akabuogu et al, ACS Nanoletters, 2024, in print).

      Result section 5: Blue light influences ion-channel mediated membrane potential events in E. coli

      In this chapter the authors vary the light intensity and stain the cells with PI (this dye gets into the cells when the membrane becomes very permeable), and the extracellular environment with K+ dye (I have not yet worked carefully with this dye). They find that different amounts of light influence ThT dynamics. This is in line with previous literature (both papers I have been mentioning: Figure 4 of https://www.sciencedirect.com/science/article/pii/S0006349519303923 and https://ars.els-cdn.com/content/image/1-s2.0-S0006349519308793-mmc6.pdf, especially SI12), but does not add anything new. I think the results presented here can be explained with previously published theory and do not indicate that the ion-channel mediated membrane potential dynamics is a light stress relief process.

      The simple Nernstian battery model proposed by Pilizota et al is erroneous in our opinion for reasons outlined above. We believe it will prove to be a dead end for bacterial electrophysiology studies.

      Result section 6: Development of a Hodgkin-Huxley model for the observed membrane potential dynamics

      This results section starts with the authors stating: 'our data provide evidence that E. coli manages light stress through well-controlled modulation of its membrane potential dynamics'. As stated above, I think they are instead observing the process of ThT loading while the light is damaging the membrane and thus simultaneously collapsing the electrochemical gradient of protons. As stated above, this has been modelled before. And then, they observe a ThT staining that is independent from membrane potential.

      This is an erroneous niche opinion. Protons have little say in the membrane potential since there are so few of them. The membrane potential is mostly determined by K+.

      I will briefly comment on the Hodgkin-Huxley (HH) based model. First, I think there is no evidence for two channels with different activation profiles as authors propose. But also, the HH model has been developed for neurons. There, the leakage and the pumping fluxes are both described by a constant representing conductivity, times the difference between the membrane potential and Nernst potential for the given ion. The conductivity in the model is given as gK*n^4 for potassium, gNa*m^3*h for sodium, and gL for leakage, where gK, gNa and gL were measured experimentally for neurons. Here n, m, and h are variables that describe the experimentally observed voltage-gated mechanism of neuronal sodium and potassium channels. (Please see Hodgkin AL, Huxley AF. 1952. Currents carried by sodium and potassium ions through the membrane of the giant axon of Loligo. J. Physiol. 116:449-72 and Hodgkin AL, Huxley AF. 1952. A quantitative description of membrane current and its application to conduction and excitation in nerve. J. Physiol. 117:500-44).

      In the 70 years since Hodgkin and Huxley first presented their model, a huge number of similar models have been proposed to describe cellular electrophysiology. We are not being hyperbolic when we state that the HH models for excitable cells are like the Schrödinger equation for molecules. We carefully adapted our HH model to reflect the currently understood electrophysiology of E. coli.
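      The conductance terms described in the reviewer's comment can be made concrete with the classical squid-axon formulation. The sketch below evaluates the three ionic currents for given gating variables; the parameter values are the commonly quoted textbook values for the squid giant axon (in the modern sign convention) and are assumptions here, not the parameters of the bacterial model in the manuscript. The class and function names are likewise illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HHParams:
    # Commonly quoted squid-axon values (assumed for illustration only).
    g_K: float = 36.0    # maximal K+ conductance, mS/cm^2
    g_Na: float = 120.0  # maximal Na+ conductance, mS/cm^2
    g_L: float = 0.3     # leak conductance, mS/cm^2
    E_K: float = -77.0   # K+ reversal potential, mV
    E_Na: float = 50.0   # Na+ reversal potential, mV
    E_L: float = -54.4   # leak reversal potential, mV


def ionic_currents(V, n, m, h, p=HHParams()):
    """Return (I_K, I_Na, I_L) in uA/cm^2 for voltage V (mV) and gating variables n, m, h."""
    I_K = p.g_K * n**4 * (V - p.E_K)         # conductance gK*n^4
    I_Na = p.g_Na * m**3 * h * (V - p.E_Na)  # conductance gNa*m^3*h
    I_L = p.g_L * (V - p.E_L)                # constant leak conductance gL
    return I_K, I_Na, I_L


# Example with illustrative gating values near rest.
I_K, I_Na, I_L = ionic_currents(V=-65.0, n=0.32, m=0.05, h=0.6)
print(f"I_K = {I_K:.3f}, I_Na = {I_Na:.3f}, I_L = {I_L:.3f} uA/cm^2")
```

      Each current is a conductance multiplied by the driving force (V minus the reversal potential), which is the structure that any adaptation to bacteria has to re-parameterize.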

      Thus, in applying the model to describe bacterial electrophysiology one should ensure near equilibrium requirement holds (so that (V-VQ) etc terms in authors' equation Figure 5 B hold), and potassium and other channels in a given bacterium have similar gating properties to those found in neurons. I am not aware of such measurements in any bacteria, and therefore think the pump leak model of the electrophysiology of bacteria needs to start with fluxes that are more general (for example Keener JP, Sneyd J. 2009. Mathematical physiology: I: Cellular physiology. New York: Springer or https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0000144).

      The reference is to a slightly more modern version of a simple Nernstian battery model. The model will not oscillate and thus will not help modelling membrane potentials in bacteria. We are unsure where the equilibrium requirement comes from (inadequate modelling of the dynamics?).

      Result section 7: Mechanosensitive ion channels (MS) are vital for the first hyperpolarization event in E. coli.

      The results that Msc channels affect the profile of ThT dye are interesting. It is again possible that the membrane permeability of these mutants has changed and therefore the dynamics have changed, so this needs to be checked first. I also note that our results show that the peak of ThT coincides with cell expansion. For this to be understood a model is needed that also takes into account the link between maintenance of electrochemical gradients of ions in the cell and osmotic pressure.

      The evidence for permeability changes in the membranes seems to be tenuous.

      A side note is that the authors state that the Msc responds to stress-related voltage changes. I think this is an overstatement. Mscs respond predominantly to membrane tension and are mostly nonspecific (see how their action recovers cellular volume in this publication: https://www.pnas.org/doi/full/10.1073/pnas.1522185113). Authors cite references 35-39 to support this statement. These publications still state that these channels are predominantly membrane tension-gated. Some of the references state that the presence of external ions is important for tension-related gating but sometimes they gate spontaneously in the presence of certain ions. Other publications cited don't really look at gating with respect to ions (39 is on clustering). This is why I think the statement is somewhat misleading.

      We have reworded the discussion of Mscs since the literature appears to be ambiguous. We will try to run some electrical impedance spectroscopy experiments on the Msc mutants in the future to attempt to remove the ambiguity.

      Result section 8: Anomalous ion-channel-mediated wavefronts propagate light stress signals in 3D E. coli biofilms.

      I am not commenting on this result section, as it would only be applicable if ThT was membrane potential dye in E. coli.

      Ok, but we disagree on the use of ThT.

      Aims achieved/results support their conclusions:

      The authors clearly present their data. I am convinced that they have accurately presented everything they observed. However, I think their interpretation of the data and conclusions is inaccurate in line with the discussion I provided above.

      Likely impact of the work on the field, and the utility of the methods and data to the community:

      I do not think this publication should be published in its current format. It should be revised in light of the previous literature as discussed in detail above. I believe presenting it in its current form on eLife pages would create unnecessary confusion.

      We believe many of the Pilizota group articles are scientifically flawed and are causing the confusion in the literature.

      Any other comments:

      I note that while this work studies E. coli, it references papers in other bacteria using ThT. For example, in lines 35-36 authors state that bacteria (Bacillus subtilis in this case) in biofilms have been recently found to modulate membrane potential citing the relevant literature from 2015. It is worth noting that the most recent paper https://journals.asm.org/doi/10.1128/mbio.02220-23 found that ThT binds to one or more proteins in the spore coat, suggesting that it does not act as a membrane potential dye in Bacillus spores. It is possible that it still reports membrane potential in Bacillus cells and the recent results are strictly spore-specific, but these should be kept in mind when using ThT with Bacillus.

      ThT was used successfully in previous studies of normal B. subtilis cells (by our own group and A.Prindle, ‘Spatial propagation of electrical signal in circular biofilms’, J.A.Blee et al, Physical Review E, 2019, 100, 052401, J.A.Blee et al, ‘Membrane potentials, oxidative stress and the dispersal response of bacterial biofilms to 405 nm light’, Physical Biology, 2020, 17, 2, 036001, A.Prindle et al, ‘Ion channels enable electrical communication in bacterial communities’, Nature, 2015, 527, 59-63). The connection to low-metabolism spore research seems speculative.

      Reviewer #3 (Public Review):

      It has recently been demonstrated that bacteria in biofilms show changes in membrane potential in response to changes in their environment, and that these can propagate signals through the biofilm to coordinate bacterial behavior. Akabuogu et al. contribute to this exciting research area with a study of blue light-induced membrane potential dynamics in E. coli biofilms. They demonstrate that Thioflavin-T (ThT) intensity (a proxy for membrane potential) displays multiphasic dynamics in response to blue light treatment. They additionally use genetic manipulations to implicate the potassium channel Kch in the latter part of these dynamics. Mechanosensitive ion channels may also be involved, although these channels seem to have blue light-independent effects on membrane potential as well. In addition, there are challenges to the quantitative interpretation of ThT microscopy data which require consideration. The authors then explore whether these dynamics are involved in signaling at the community level. The authors suggest that cell firing is both more coordinated when cells are clustered and happens in waves in larger, 3D biofilms; however, in both cases evidence for these claims is incomplete. The authors present two simulations to describe the ThT data. The first of these simulations, a Hodgkin-Huxley model, indicates that the data are consistent with the activity of two ion channels with different kinetics; the Kch channel mutant, which ablates a specific portion of the response curve, is consistent with this. The second model is a fire-diffuse-fire model to describe wavefront propagation of membrane potential changes in a 3D biofilm; because the wavefront data are not presented clearly, the results of this model are difficult to interpret. Finally, the authors discuss whether these membrane potential changes could be involved in generating a protective response to blue light exposure; increased death in a Kch ion channel mutant upon blue light exposure suggests that this may be the case, but a no-light control is needed to clarify this.

      In a few instances, the paper is missing key control experiments that are important to the interpretation of the data. This makes it difficult to judge the meaning of some of the presented experiments.

      (1) An additional control for the effects of autofluorescence is very important. The authors conduct an experiment where they treat cells with CCCP and see that Thioflavin-T (ThT) dynamics do not change over the course of the experiment. They suggest that this demonstrates that autofluorescence does not impact their measurements. However, cellular autofluorescence depends on the physiological state of the cell, which is impacted by CCCP treatment. A much simpler and more direct experiment would be to repeat the measurement in the absence of ThT or any other stain. This experiment should be performed both in the wild-type strain and in the ∆kch mutant.

      ThT is a very bright fluorophore (much brighter than a GFP). It is clear from the images of non-stained samples that autofluorescence provides a negligible contribution to the fluorescence intensity in an image.

      (2) The effects of photobleaching should be considered. Of course, the intensity varies a lot over the course of the experiment in a way that photobleaching alone cannot explain. However, photobleaching can still contribute to the kinetics observed. Photobleaching can be assessed by changing the intensity, duration, or frequency of exposure to excitation light during the experiment. Considerations about photobleaching become particularly important when considering the effect of catalase on ThT intensity. The authors find that the decrease in ThT signal after the initial "spike" is attenuated by the addition of catalase; this is what would be predicted by catalase protecting ThT from photobleaching (indeed, catalase can be used to reduce photobleaching in time lapse imaging).

      Photobleaching was negligible over the course of the experiments. We employed techniques such as reducing sample exposure time and using the appropriate light intensity to minimize photobleaching.

      (3) It would be helpful to have a baseline of membrane potential fluctuations in the absence of the proposed stimulus (in this case, blue light). Including traces of membrane potential recorded without light present would help support the claim that these changes in membrane potential represent a blue light-specific stress response, as the authors suggest. Of course, ThT is blue, so if the excitation light for ThT is problematic for this experiment the alternative dye tetramethylrhodamine methyl ester perchlorate (TMRM) can be used instead.

      Unfortunately, the fluorescent baseline is too weak to measure cleanly in this experiment. The collective response of all the bacteria hyperpolarizing at the same time appears to dominate the signal (measurements in the eLife article and new potentiometry measurements).

      (4) The effects of ThT in combination with blue light should be more carefully considered. In mitochondria, a combination of high concentrations of blue light and ThT leads to disruption of the PMF (Skates et al. 2021 BioRXiv), and similarly, ThT treatment enhances the photodynamic effects of blue light in E. coli (Bondia et al. 2021 Chemical Communications). If present in this experiment, this effect could confound the interpretation of the PMF dynamics reported in the paper.

      We think the PMF plays a minority role in determining the membrane potential in E. coli, for the reasons outlined before (H+ is a minority ion in E. coli compared with K+).

      (5) Figures 4D - E indicate that a ∆kch mutant has increased propidium iodide (PI) staining in the presence of blue light; this is interpreted to mean that Kch-mediated membrane potential dynamics help protect cells from blue light. However, Live/Dead staining results in these strains in the absence of blue light are not reported. This means that the possibility that the ∆kch mutant has a general decrease in survival (independent of any effects of blue light) cannot be ruled out.

      Both bacterial strains had similar growth curves and also engaged in membrane potential dynamics for the duration of the experiment. We were interested in bacterial cells that exhibited membrane potential dynamics in the presence of the stress. Bacterial cells need to be alive to engage in membrane potential dynamics (hyperpolarize) under stress conditions. Cells that engaged in membrane potential dynamics and later stained red were only counted after the entire duration. We believe that the wildtype handles the light stress better than the ∆kch mutant, as measured with PI.

      (6) Additionally in Figures 4D - E, the interpretation of this experiment can be confounded by the fact that PI uptake can sometimes be seen in bacterial cells with high membrane potential (Kirchhoff & Cypionka 2017 J Microbial Methods); the interpretation is that high membrane potential can lead to increased PI permeability. Because the membrane potential is largely higher throughout blue light treatment in the ∆kch mutant (Fig. 3AB), this complicates the interpretation of this experiment.

      Kirchhoff & Cypionka 2017 J Microbial Methods, using fluorescence microscopy, suggested that changes in membrane potential dynamics can introduce experimental bias when propidium iodide is used to confirm the viability of the bacterial strains B. subtilis (DSM-10) and Dinoroseobacter shibae when these are starved of oxygen (via N2 gassing) for 2 hours. They attempted to support their findings by using CCCP to stop the membrane potential dynamics (but never showed any pictorial or plotted data for this confirmatory experiment). In our experimental methodology, cell death was not forced on the cells by introducing an extra burden or via anoxia. We believe that the accumulation of PI in the ∆kch mutant is not due to high membrane potential dynamics but reflects PI showing damaged/dead cells without bias. We think that propidium iodide is good for this experiment. Propidium iodide is a dye that is extensively used in life sciences. PI has also been used in the study of bacterial electrophysiology (https://pubmed.ncbi.nlm.nih.gov/32343961/) and no membrane potential related bias was reported.

      Throughout the paper, many ThT intensity traces are compared, and described as "similar" or "dissimilar", without detailed discussion or a clear standard for comparison. For example, the two membrane potential curves in Fig. S1C are described as "similar" although they have very different shapes, whereas the curves in Fig. 1B and 1D are discussed in terms of their differences although they are evidently much more similar to one another. Without metrics or statistics to compare these curves, it is hard to interpret these claims. These comparative interpretations are additionally challenging because many of the figures in which average trace data are presented do not indicate standard deviation.

      Comparison of small changes in the absolute intensities is problematic in such fluorescence experiments. We mean the shape of the traces is similar and they can be modelled using a HH model with similar parameters.
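
      One way to make "similar shape" checkable, independent of absolute intensity, is to rescale each trace to [0, 1] and then report a correlation coefficient and a residual. The sketch below is illustrative only and is not the analysis used in the manuscript; the function names and the three example traces are hypothetical, and it assumes both traces are sampled at the same time points.

```python
import math

def min_max_normalize(trace):
    """Rescale a trace to [0, 1] so that only its shape is compared."""
    lo, hi = min(trace), max(trace)
    if hi == lo:
        raise ValueError("trace is constant; its shape is undefined")
    return [(x - lo) / (hi - lo) for x in trace]

def pearson_r(a, b):
    """Pearson correlation between two equally sampled traces."""
    if len(a) != len(b):
        raise ValueError("traces must have the same length")
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def rmse(a, b):
    """Root-mean-square difference between two equally sampled traces."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))

def shape_similarity(trace_a, trace_b):
    """Compare shapes after removing offset and scale from each trace."""
    na, nb = min_max_normalize(trace_a), min_max_normalize(trace_b)
    return {"pearson_r": pearson_r(na, nb), "rmse": rmse(na, nb)}

# Hypothetical traces (arbitrary units), NOT data from the paper.
trace_1 = [10, 40, 25, 15, 20, 35, 50]          # peak, dip, second rise
trace_2 = [100, 400, 250, 150, 200, 350, 500]   # same shape, 10x brighter
trace_3 = [50, 45, 40, 35, 30, 25, 20]          # monotonic decay

print(shape_similarity(trace_1, trace_2))
print(shape_similarity(trace_1, trace_3))
```

      Min-max rescaling is a deliberate choice here: it discards the absolute intensities that the response above describes as unreliable, so the two numbers reflect shape alone. It is sensitive to single noisy extreme points, so smoothing or percentile-based rescaling may be preferable on real traces.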

      The differences between the TMRM and ThT curves that the authors show in Fig. S1C warrant further consideration. Some of the key features of the response in the ThT curve (on which much of the modeling work in the paper relies) are not very apparent in the TMRM data. It is not obvious to me which of these traces will be more representative of the actual underlying membrane potential dynamics.

      In our experiment, TMRM was used to confirm the dynamics observed using ThT. However, ThT appears to be more photostable than TMRM (especially towards the 2nd peak). The most interesting observation is that with both dyes, all phases of the membrane potential dynamics were conspicuous (the first peak, the quiescent period and the second peak). The time periods for these three episodes were also similar.

      A key claim in this paper (that dynamics of firing differ depending on whether cells are alone or in a colony) is underpinned by "time-to-first peak" analysis, but there are some challenges in interpreting these results. The authors report an average time-to-first peak of 7.34 min for the data in Figure 1B, but the average curve in Figure 1B peaks earlier than this. In Figure 1E, it appears that there are a handful of outliers in the "sparse cell" condition that likely explain this discrepancy. Either an outlier analysis should be done and the mean recomputed accordingly, or a more outlier-robust method like the median should be used instead. Then, a statistical comparison of these results will indicate whether there is a significant difference between them.

      The key point is the comparison of standard errors on the standard deviation.
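
      The reviewer's suggestion of an outlier analysis or a median can be illustrated with a short sketch. It uses Tukey's 1.5 x IQR rule, which is one common convention and an assumption here, not a method named in the manuscript; the times-to-first-peak are invented for illustration and are not data from the paper.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's rule)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in values if v < low or v > high]

def summarize(values):
    """Report the mean alongside outlier-robust alternatives."""
    outliers = iqr_outliers(values)
    kept = [v for v in values if v not in outliers]
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mean_without_outliers": statistics.mean(kept),
        "outliers": outliers,
    }

# Invented times-to-first-peak in minutes; NOT data from the paper.
sparse_cells = [2.0, 2.5, 2.5, 3.0, 3.0, 3.0, 3.5, 3.5, 4.0, 4.5, 20.0, 25.0]

summary = summarize(sparse_cells)
print(summary)
```

      In this invented sample two late-firing cells roughly double the mean while leaving the median almost unchanged, which is the kind of discrepancy between a reported mean and the peak of an average curve that the reviewer describes.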

      In two different 3D biofilm experiments, the authors report the propagation of wavefronts of membrane potential; I am unable to discern these wavefronts in the imaging data, and they are not clearly demonstrated by analysis.

      The first data set is presented in Figures 2A, 2B, and Video S3. The images and video are very difficult to interpret because of how the images have been scaled: the center of the biofilm is highly saturated, and the zero value has also been set too high to consistently observe the single cells surrounding the biofilm. With the images scaled this way, it is very difficult to assess dynamics. The time stamps in Video S3 and on the panels in Figure 2A also do not correspond to one another although the same biofilm is shown (and the time course in 2B is also different from what is indicated in 2B). In either case, it appears that the center of the biofilm is consistently brighter than the edges, and the intensity of all cells in the biofilm increases in tandem; by eye, propagating wavefronts (either directed toward the edge or the center) are not evident to me. Increased brightness at the center of the biofilm could be explained by increased cell thickness there (as is typical in this type of biofilm). From the image legend, it is not clear whether the image presented is a single confocal slice or a projection. Even if this is a single confocal slice, in both Video S3 and Figure 2A there are regions of "haze" from out-of-focus light evident, suggesting that light from other focal planes is nonetheless present. This seems to me to be a simpler explanation for the fluorescence dynamics observed in this experiment: cells are all following the same trajectory that corresponds to that seen for single cells, and the center is brighter because of increased biofilm thickness.

      We thank the reviewer for this important observation. We have made changes to the figures to address this confusion. The cell cover has no influence on the observed membrane potential dynamics. The entire biofilm was exposed to the same blue light at each time. Therefore all parts of the biofilm received equal amounts of the blue light intensity. The membrane potential dynamics were not influenced by cell density (see Fig 2C).

      The second data set is presented in Video S6B; I am similarly unable to see any wave propagation in this video. I observe only a consistent decrease in fluorescence intensity throughout the experiment that is spatially uniform (except for the bright, dynamic cells near the top; these presumably represent cells that are floating in the microfluidic and have newly arrived to the imaging region).

      A visual inspection of Video S6B shows a fast rise, a decrease in fluorescence and a second rise (supplementary figure 4B). The data for the fluorescence was carefully obtained using the Imaris software. We created a curved geometry on each slice of the confocal stack. We analyzed the surfaces of this curved plane along the z-axis. This was carried out in Imaris.

      3D imaging data can be difficult to interpret by eye, so it would perhaps be more helpful to demonstrate these propagating wavefronts by analysis; however, such analysis is not presented in a clear way. The legend in Figure 2B mentions a "wavefront trace", but there is no position information included - this trace instead seems to represent the average intensity trace of all cells. To demonstrate the propagation of a wavefront, this analysis should be shown for different subpopulations of cells at different positions from the center of the biofilm. Data is shown in Figure 8 that reflects the velocity of the wavefront as a function of biofilm position; however, because the wavefronts themselves are not evident in the data, it is difficult to interpret this analysis. The methods section additionally does not contain sufficient information about what these velocities represent and how they are calculated. Because of this, it is difficult for me to evaluate the section of the paper pertaining to wave propagation and the predicted biofilm critical size.

      The analysis is considered in more detail in a more expansive modelling article, currently under peer review in a physics journal, ‘Electrical signalling in three dimensional bacterial biofilms using an agent based fire-diffuse-fire model’, V.Martorelli, et al, 2024 https://www.biorxiv.org/content/10.1101/2023.11.17.567515v1

      There are some instances in the paper where claims are made that do not have data shown or are not evident in the cited data:

      (1) In the first results section, "When CCCP was added, we observed a fast efflux of ions in all cells"- the data figure pertaining to this experiment is in Fig. S1E, which does not show any ion efflux. The methods section does not mention how ion efflux was measured during CCCP treatment.

      We have worded this differently to properly convey our results.

      (2) In the discussion of voltage-gated calcium channels, the authors refer to "spiking events", but these are not obvious in Figure S3E. Although the fluorescence intensity changes over time, it's hard to distinguish these fluctuations from measurement noise; a no-light control could help clarify this.

      The calcium transients observed were not due to noise or artefacts.

      (3) The authors state that the membrane potential dynamics simulated in Figure 7B are similar to those observed in 3D biofilms in Fig. S4B; however, the second peak is not clearly evident in Fig. S4B and it looks very different for the mature biofilm data reported in Fig. 2. I have some additional confusion about this data specifically: in the intensity trace shown in Fig. S4B, the intensity in the second frame is much higher than the first; this is not evident in Video S6B, in which the highest intensity is in the first frame at time 0. Similarly, the graph indicates that the intensity at 60 minutes is higher than the intensity at 4 minutes, but this is not the case in Fig. S4A or Video S6B.

      The confusion stated here has now been addressed. Also it should be noted that while Fig 2.1 was obtained with an LED light source, Fig S4A was obtained using a laser light source. While obtaining the confocal images (for Fig S4A), the light intensity was controlled to further minimize photobleaching. Most importantly, there is evidence of a slow rise to the 2nd peak in Fig S4B. The first peak, quiescence and slow rise to the second peak are evident.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Scientific recommendations:

      - Although Fig 4A clearly shows that light stimulation has an influence on the dynamics of cell membrane potential in the biofilm, it is important to rule out the contribution of variations in environmental parameters. I understand that for technical reasons, the flow of fresh medium must be stopped during image acquisition. Therefore, I suggest performing control experiments, where the flow is stopped before image acquisition (15min, 30min, 45min, and 1h before). If there is no significant contribution from environmental variations (pH, RedOx), the dynamics of the electrical response should be superimposed whatever the delay between stopping the flow and switching on the light.

      In the current study, we focused on how E. coli cells and biofilms react to blue light stress via their membrane potential dynamics. This involved growing the cells and biofilms, stopping the media flow and obtaining data immediately. We believe that stopping the flow not only helped us to manage data acquisition, it also helped us reduce the effect of environmental factors. In a future study we will expand the work to include how the membrane potential dynamics evolve in the presence of changing environmental factors, for example those induced by stopping the flow at varied times.

      - Since TMRM signal exhibits a linear increase after the first response peak (Supplementary Figure 1D), I recommend mitigating the statement at line 78.

      - To improve the spatial analysis of the electrical response, I suggest plotting kymographs of the intensity profiles across the biofilm. I have plotted this kymograph for Video S3 and it appears that there is no electrical propagation for the second peak. In addition, the authors should provide technical details of how R^2(t) is measured in the first regime (Figure 7E).

      See the dedicated simulation article for more details. https://www.biorxiv.org/content/10.1101/2023.11.17.567515v1
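
      For readers unfamiliar with the kymograph the reviewer proposes, the sketch below shows one way such a plot could be built from an image time series: intensity is averaged in radial bins around an assumed biofilm center for each frame, giving a time-by-radius table. This is a minimal illustration, not the analysis used in the manuscript or in the cited modelling article. All names are hypothetical, the frames are synthetic, and plain Python lists are used only to keep the example dependency-free; an array library would be the usual choice for real image stacks.

```python
import math

def radial_kymograph(frames, center, n_bins, max_radius):
    """Return one row per frame and one column per radial bin.

    frames: list of 2D images (lists of rows of pixel intensities).
    center: (row, col) of the assumed biofilm center.
    Each entry is the mean intensity of the pixels whose distance from the
    center falls in that bin, or None if the bin contains no pixels.
    """
    bin_width = max_radius / n_bins
    kymograph = []
    for image in frames:
        sums = [0.0] * n_bins
        counts = [0] * n_bins
        for r, row in enumerate(image):
            for c, value in enumerate(row):
                dist = math.hypot(r - center[0], c - center[1])
                b = int(dist / bin_width)
                if b < n_bins:
                    sums[b] += value
                    counts[b] += 1
        kymograph.append([s / k if k else None for s, k in zip(sums, counts)])
    return kymograph

def peak_frame_per_bin(kymograph):
    """Index of the frame in which each radial bin is brightest."""
    peaks = []
    for b in range(len(kymograph[0])):
        column = [row[b] for row in kymograph]
        if any(v is None for v in column):
            peaks.append(None)
        else:
            peaks.append(max(range(len(column)), key=column.__getitem__))
    return peaks

def synthetic_ring_frames(size=9, n_frames=4):
    """Synthetic test movie: a bright ring one pixel further out per frame."""
    center = (size // 2, size // 2)
    frames = []
    for t in range(n_frames):
        frames.append([
            [1.0 if int(math.hypot(r - center[0], c - center[1])) == t else 0.0
             for c in range(size)]
            for r in range(size)
        ])
    return frames, center

frames, center = synthetic_ring_frames()
kymo = radial_kymograph(frames, center, n_bins=4, max_radius=4.0)
print(peak_frame_per_bin(kymo))
```

      In such a table a propagating wavefront shows up as a peak time that shifts with radius, whereas a response that is simultaneous across the biofilm gives the same peak frame in every bin. The finite slope of the peak-time-versus-radius relation would then give a propagation speed.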

      - Line 152: To assess the variability of the latency, the authors should consider measuring the variance divided by the mean instead of SD, which may depend on the average value.

      We are happy with our current use of standard error on the standard deviation. It shows what we claim to be true.
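
      The quantities under discussion can be computed side by side. The sketch below reports the sample standard deviation, an approximate standard error on that standard deviation, the coefficient of variation, and the variance-to-mean ratio suggested by the reviewer. The standard-error expression SD / sqrt(2(n - 1)) is the usual large-sample approximation for normally distributed data and is an assumption here, since the response does not state which formula was used; the latencies are invented and are not data from the paper.

```python
import math
import statistics

def dispersion_summary(latencies):
    """Summarize the spread of a set of latencies in several ways."""
    n = len(latencies)
    mean = statistics.mean(latencies)
    sd = statistics.stdev(latencies)  # sample SD (n - 1 in the denominator)
    return {
        "mean": mean,
        "sd": sd,
        # Approximate standard error of the SD, assuming normal data.
        "se_of_sd": sd / math.sqrt(2 * (n - 1)),
        # Dimensionless spread relative to the mean.
        "cv": sd / mean,
        # Variance divided by the mean, as suggested by the reviewer.
        "variance_over_mean": sd ** 2 / mean,
    }

# Invented latencies in minutes; NOT data from the paper.
group_a = [2.0, 3.0, 4.0, 5.0, 6.0]
group_b = [2 * x for x in group_a]  # same relative spread, doubled mean

summary_a = dispersion_summary(group_a)
summary_b = dispersion_summary(group_b)
print(summary_a)
print(summary_b)
```

      The second group has twice the standard deviation of the first purely because its mean is doubled, while the coefficient of variation is identical. This is the dependence on the average value that the reviewer raises.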

      - Line 154-155: To truly determine whether the amplitude of the "action potential" is independent of biofilm size, the authors should not normalise the signals.

      Good point. We qualitatively compared both normalized and unnormalized data. Recent electrical impedance spectroscopy measurements (unpublished) indicate that the electrical activity is an extensive quantity i.e. it scales with the size of the biofilms.

      - To clarify the role of K+ in the habituation response, I suggest using valinomycin at sub-inhibitory concentrations (10µM). Besides, the high concentration of CCCP used in this study completely inhibits cell activity. Not surprisingly, no electrical response to light stimulation was observed in the presence of CCCP. Finally, the Kch complementation experiment exhibits a "drop after the first peak" on a single point. It would be more convincing to increase the temporal resolution (1min->10s) to show that there is indeed a first and a second peak.

      An interesting experiment for the future.

      - Line 237-238: There are only two points suggesting that the dynamics of hyperpolarization are faster at higher irradiance (Fig 4A). The authors should consider adding a third intermediate point at 17µW/mm^2 to confirm the statement made in this sentence.

      Multiple repeats were performed. We are confident of the robustness of our data.

      - Line 249 + Fig 4E: It seems that the data reported on Fig 4E are extracted from Fig 4D. If this is indeed the case, the data should be normalised by the total population size to compare survival probabilities under the two conditions. It would also be great to measure these probabilities (for WT and ∆kch) in the presence of ROS scavengers.

      - To distinguish between model fitting and model predictions, the authors should clearly state which parameters are taken from the literature and which parameters are adjusted to fit the experimental data.

      - Supplementary Figure 4A: why can't we see any wavefront in this series of images?

      For the experimental data, the wavefront was analyzed using the Imaris software. We systematically created a ROI with a curved geometry within the confocal stack (the biofilm). The ThT fluorescence was traced along the surface of the curved geometry and analyzed along the z-axis.

      - Fig 7B: Could the authors explain why the plateau is higher in the simulations than in the biofilm experiments? Could they add noise on the firing activities?

      See the dedicated Martorelli modelling article. In general, adding noise would require stochastic Hodgkin-Huxley modelling, and the fluorescence data (and electrical impedance spectroscopy data) presented do not have extensive noise (due to collective averaging over many bacterial cells).

      - Supplementary Figure 4B: Why can't we see the second peak in confocal images?

      The second peak is present although not as robust as in Fig 2B. The confocal images were obtained with a laser source. Therefore we tried to create a balance between applying sufficient light stress on the bacterial cells and mitigating photobleaching.

      Editing recommendations:

      The editing recommendations below have been applied where appropriate.

      - Many important technical details are missing (e.g. R^2, curvature, and 445nm irradiance measurements). Error bars are missing from most graphs. The captions should clearly indicate if these are single-cell or biofilm experiments, strain name, illumination conditions, number of experiments, SD, or SE. Please indicate on all panels of all figures in the main text and in the supplements, which are the conditions: single cell vs. biofilm, strains, medium, centrifugal vs centripetal etc..., where relevant. Please also draw error bars everywhere.

      We have now made appropriate changes. We specifically use cells when we were dealing with single cells and biofilms when we worked on biofilms. We decided to describe the strain name either on the panel or the image description.

      - Line 47-51: The way the paragraph is written suggests that no coordinated electrical oscillations have been observed in Gram-negative biofilms. However, Hennes et al (referenced as 57 in this manuscript) have shown that a wave of hyperpolarized cells propagates in a colony of Neisseria gonorrhoeae, which is a Gram-negative bacterium.

      We are now aware of this work. It was not published when we first submitted our work and the authors claim the waves of activity are due to ROS diffusion NOT propagating waves of ions (coordinated electrical wavefronts).

      - Line 59: "stressor" -> "stress" or "perturbation".

      The correction has been made.

      - Line 153: Please indicate in the Material&Methods how the size of the biofilm is measured.

      The biofilm size was obtained using BiofilmQ, and the step-by-step guide for using BiofilmQ is now stated.

      - Figure 2A: Please provide associated brightfield images to locate bacteria.

      - Line 186: Please remove "wavefront" from the caption. Fig2B only shows the average signal as a function of time.

      This correction has been implemented.

      - Fig 3B,C: Please indicate single cell and biofilm on the panels and also WT and ∆kch.

      - Line 289: I suggest adding "in single cell experiments" to the title of this section.

      - Fig 5A: blue light is always present at regular time intervals during regime I and II. The presence of blue light only in regime I could be misleading.

      - Fig 5C: The curve in Fig 5D seems to correspond to the biofilm case. The curve given by the model, should be compared with the average curve presented in Fig 1D.

      - Fig 6A, B, and C: These figures could be moved to supplements.

      - Line 392: Replace "turgidity" with "turgor pressure".

      - Fig 7C,E: Please use a log-log scale to represent these data and indicate the line of slope 1.

      - Fig 7E: The x-axis has been cropped.

      - Please provide a supplementary movie for the data presented in Fig 7E.

      - Line 455: E. coli biofilms do not express ThT.

      - Line 466: "\gamma is the anomalous exponent". Please remove anomalous (\gamma can equal 1 at this stage).

      - Line 475: Please replace "section" with "projection".

      - Line 476: Please replace "spatiotemporal" with "temporal". There is no spatial dependency in either figure.

      - Line 500: Please define Eikonal approximation.

      - Fig 8 could be moved to supplements.

      - Line 553: "predicted" -> "predict".

      - Line 593: Could the authors explain why their model offers much better quantitative agreement?

      - Line 669: What does "universal" mean in that context?

      - Line 671: A volume can be pipetted but not a concentration.

      - Line 676: Are triplicates technical or biological replicates?

      - Sup Fig1: Please use minutes instead of seconds in panel A.

      - Model for membrane dynamics: "The fraction of time the Q+ channel is open" -> "The dynamics of Q+ channel activity can be written". Ditto for K+ channel...

      - Model for membrane dynamics: "the term ... is a threshold-linear". This function is not linear at all. Why is it called linear? Also, please describe what \sigma is.

      - ABFDF model: "releasing a given concentration" -> "releasing a local concentration" or "a given number" but it's not \sigma anymore. Besides, this \sigma is unlikely related to the previous \sigma used in the model of membrane potential dynamics in single cells. Please consider renaming one or the other. Also, ions are referred to as C+ in the text and C in equation 8. Am I missing something?

      Reviewer #2 (Recommendations For The Authors):

      I have included all my comments as one review. I have done so, despite the fact that some minor comments could have gone into this section, because I decided to review each Result section. I thus felt that not writing it as one review might be harder to follow. I have however highlighted which comments are minor suggestions or where I felt corrections were needed.

      However, while I am happy with all my comments being public, given their nature I think they should be shown to authors first. Perhaps the authors want to go over them and think about it before deciding if they are happy for their manuscript to be published along with these comments, or not. I will highlight this in an email to the editor. I question whether in this case, given that I am raising major issues, publishing both the manuscript and the comments is the way to go as I think it might just generate confusion among the audience.

      Reviewer #3 (Recommendations For The Authors):

      I was unable to find any legends for any of the supplemental videos in my review materials, and I could not open supplemental video 5.

      I made some comments in the public review about the analysis and interpretation of the time-to-fire data. One of the other challenges in this data set is that the time resolution is limited- it seems that a large proportion of cells have already fired after a single acquisition frame. It would be ideal to increase the time resolution on this measurement to improve precision. This could be done by imaging more quickly, but that would perhaps necessitate more blue light exposure; an alternative is to do this experiment under lower blue light irradiance where the first spike time is increased (Figure 4A).

      In the public review, I mentioned the possible impact of high membrane potential on PI permeability. To address this, the experiment could be repeated with other stains, or the viability of blue light-treated cells could be addressed more directly by outgrowth or colony-forming unit assays.

      In the public review, I mentioned the possible combined toxicity of ThT and blue light. Live/dead experiments after blue light exposure with and without ThT could be used to test for such effects, and/or the growth curve experiment in Figure 1F could be repeated with blue light exposure at a comparable irradiance used in the experiment.

      Throughout the paper and figure legends, it would help to have more methodological details in the main text, especially those that are critical for the interpretation of the experiment. The experimental details in the methods section are nicely described, but the data analysis section should be expanded significantly.

      At the end of the results section, the authors suggest a critical biofilm size of only 4 µm for wavefront propagation (not much larger than a single cell!). The authors show responses for various biofilm sizes in Fig. 2C, but these are all substantially larger. Are there data for cell clusters above and below this size that could support this claim more directly?

      The authors mention image registration as part of their analysis pipeline, but the 3D data sets in Video S6B and Fig. S4A do not appear to be registered. Were these registered prior to the velocity analysis reported in Fig. 8?

      One of the most challenging claims to demonstrate in this paper is that these membrane potential wavefronts are involved in coordinating a large, biofilm-scale response to blue light. One possible way to test this might be to repeat the Live/Dead experiment in planktonic culture or the single-cell condition. If the protection from blue light specifically emerges due to coordinated activity of the biofilm, the Kch mutant would not be expected to show a change in Live/Dead staining in non-biofilm conditions.

      Line 140: How is "mature biofilm" defined? Also on this same line, what does "spontaneous" mean here?

      Line 151: "much smaller": Given that the reported time for 3D biofilms is 2.73 ± 0.85 min and in microclusters is 3.27 ± 1.77 min, this seems overly strong.

      Line 155: How is "biofilm density" characterized? Additionally, the data in Figure 2C are presented in distance units (µm), but the text refers to "areal coverage". Please define the meaning of these distance units in the legend and/or here in the text (is this the average radius?).

      Lines 161-162: These claims seem strong given the data presented before, and the logic is not very explicit. For example, in the second sentence, the idea that this signaling is used to "coordinate long-range responses to light stress" does not seem strongly evidenced at this point in the paper. What is meant by a long-range response to light stress- are there processes to respond to light that occur at long-length scales (rather than on the single-cell scale)? If so, is there evidence that these membrane potential changes could induce these responses? Please clarify the logic behind these conclusions.

      Lines 235-236: In the lower irradiance conditions, the responses are slower overall, and it looks like the ThT intensity is beginning to rise at the end of the measurement. Could a more prominent second peak be observed in these cases if the measurement time was extended?

      Line 242-243: The overall trajectories of extracellular potassium are indeed similar, but the kinetics of the second peak of potassium are different than those observed by ThT (it rises some minutes earlier)- is this consistent with the idea that Kch is responsible for that peak? Additionally, the potassium dynamics also reflect the first peak- is this surprising given that the Kch channel has no effect on this peak?

      Line 255-256: Again, this seems like a very strong claim. There are several possible interpretations of the catalase experiment (which should be discussed); this experiment perhaps suggests that ROS impacts membrane potential, but does not obviously indicate that these membrane potential fluctuations mitigate ROS levels or help the cells respond to ROS stress. The loss of viability in the ∆kch mutant might indicate a link between these membrane potential experiments and viability, but it is hard to interpret without the no-light control I mention in the public review.

      Lines 313-315: "The model predicts... the external light stress". Please clarify this section. First, where does this prediction arise from in the modeling work? Second, I am not sure what is meant by "modulates the light stress" or "keeps the cell dynamics robust to the intensity of external light stress" (especially since the dynamics clearly vary with irradiance, as seen in Figure 4A).

      Line 322: I am not sure what "handles the ROS by adjusting the profile of the membrane potential dynamics" means. What is meant by "handling" ROS? Is the hypothesis that membrane potential dynamics themselves are protective against ROS, or that they induce a ROS-protective response downstream, or something else? Later in lines 327-8 the authors write that changes in the response to ROS in the model agree with the hypothesis, but just showing that ROS impacts the membrane potential does not seem to demonstrate that this has a protective effect against ROS.

      Line 365-366: This section title seems confusing: mechanosensitive ion channels totally ablate membrane potential dynamics; they don't have a specific effect on the first hyperpolarization event. The claim that mechanosensitive ion channels are specifically involved in the first event also appears in the abstract.

      Also, the apparent membrane potential is much lower even at the start of the experiment in these mutants- is this expected? This seems to imply that these ion channels also have a blue light independent effect.

      Lines 368, 371: Should be VGCCs rather than VGGCs.

      Line 477: I believe the figure reference here should be to Figure 7B, not 6B.

      Line 567-568: "The initial spike is key to registering the presence of the light stress." What is the evidence for this claim?

      Line 592-594: "We have presented much better quantitative agreement..." This is a strong claim; it is not immediately evident to me that the agreement between model and prediction is "much better" in this work than in the cited work. The model in Figure 4 of reference 57 seems to capture the key features of their data. Clarification is needed about this claim.

      Line 613: "...strains did not have any additional mutations." This seems to imply that whole genome sequencing was performed. Is this the case?

      Line 627: I believe this should refer to Figure S2A-B rather than S1.

      Line 719: What percentage of cells did not hyperpolarize in these experiments?

      Lines 751-754: As I mentioned above, significant detail is missing here about how these measurements were made. How is "radius" defined in 3D biofilms like the one shown in Video S6B, which looks very flat? What is meant by the distance from the substrate to the core, since usually in this biofilm geometry, the core is directly on the substrate? Most importantly, this only describes the process of sectioning the data- how were these sections used to compute the velocity of ThT signal propagation?

      I also have some comments specifically on the figure presentation:

      Normalization from 0 to 1 has been done in some of the ThT traces in the paper, but not all. The claims in the paper would be easiest to evaluate if the non-normalized data were shown- this is important for the interpretation of some of the claims.

      Some indication of standard deviation (error bars or shading) should be added to all figures where mean traces are plotted.

      Throughout the paper, I am a bit confused by the time axis; the data consistently starts at 1 minute. This is not intuitive to me, because it seems that the blue light being applied to the cells is also the excitation laser for ThT- in that case, shouldn't the first imaging frame be at time 0 (when the blue light is first applied)? Or is there an additional exposure of blue light 1 minute before imaging starts? This is consequential because it impacts the measured time to the first spike. (Additionally, all of the video time stamps start at 0).

      Please increase the size of the scale bars and bar labels throughout, especially in Figure 2A and S4A.

      In Figure 1B and D, it would help to decrease the opacity on the individual traces so that more of them can be discerned. It would also improve clarity to have data from the different experiments shown with different colored lines, so that variability between experiments can be clearly visualized.

      Results in Figure 1E would be easier to interpret if the frequency were normalized to total N. It is hard to tell from this graph whether the edges and bin widths are the same between the data sets, but if not, they should be. Also, it would help to reduce the opacity of the sparse cell data set so that the full microcluster data set can be seen as well.

      Biofilm images are shown in Figures 2A, S3A, and Video S3- these are all of the same biofilm. Why not take the opportunity to show different experimental replicates in these different figures? The same goes for Figure S4A and Video S6B, which again are of the same biofilm.

      Figure 2C would be much easier to read if the curves were colored in order of their size; the same is true for Figure 4A and irradiance.

      The complementation data in Figure S3D should be moved to the main text figure 3 alongside the data about the corresponding knockout to make it easier to compare the curves.

      Figure S3E: Is the Y-axis in this graph mislabeled? It is labeled as ThT fluorescence, but it seems that it is reporting fluorescence from the calcium indicator?

      Video S6B is very confusing - why does the video play first forwards and then backwards? Unless I am looking very carefully at the time stamps it is easy to misinterpret this as a rise in the intensity at the end of the experiment. Without a video legend, it's hard to understand this, but I think it would be much more straightforward to interpret if it only played forward. (Also, why is this video labeled 6B when there is no video 6A?)

    1. Author response:

      The following is the authors’ response to the original reviews.

      The points raised let us critically rethink our approach, our results, and our conclusions. They also gave us the chance to elaborate on some critical aspects that were mentioned. With the help of the reviewers, we made some clarifications in the point-by-point responses and implemented them in the manuscript. In addition, we modified the figures as suggested:

      - The colors in Figure 1C, D, G and H have been adapted as suggested

      - We added a Figure 2-figure supplement 1, which strengthens our conclusion in Figure 2

      - As asked by reviewer #1 (weaknesses #3), we added the data about neutrophil numbers in the different organs (Figure 6-figure supplement 3C).

      Reviewer #1 (Public Review):

      Summary:

      - Extracellular ATP represents a danger-associated molecular pattern associated with tissue damage and can also act in an autocrine fashion in macrophages to promote proinflammatory responses, as observed in a previous paper by the authors in abdominal sepsis. The present study addresses an important aspect possibly conditioning the outcome of sepsis, namely the release of ATP by bacteria. The authors show that sepsis-associated bacteria do in fact release ATP in a growth-dependent and strain-specific manner. However, whether this bacteria-derived ATP plays a role in the pathogenesis of abdominal sepsis has not been determined. To address this question, a number of mutant strains of E. coli have been used, first to correlate bacterial ATP release with growth and then with outer membrane integrity and bacterial death. By using E. coli transformants expressing the ATP-degrading enzyme apyrase in the periplasmic space, the paper nicely shows that abdominal sepsis by these transformants results in significantly improved survival. This effect was associated with a reduction of peritoneal macrophages and CX3CR1+ monocytes, and an increase in neutrophils. To extrapolate the function of bacterial ATP from the systemic response to microorganisms, the authors exploited bacterial OMVs either loaded or not with ATP to investigate the systemic effects devoid of living microorganisms. This approach showed that ATP-loaded OMVs induced degranulation of neutrophils after lysosomal uptake, suggesting that this mechanism could contribute to sepsis severity.

      Strengths:

      - A strong part of the study is the analysis of E. coli mutants to address different aspects of bacterial release of ATP that could be relevant during systemic dissemination of bacteria in the host.

      We want to thank the reviewer for recognizing this important aspect of our experimental approach.

      Weaknesses:

      - As pointed out in the limitations of the study, whether ATP-loaded OMVs provide a mechanistic proof of the pathogenetic role of bacteria-derived ATP independently of live microorganisms in sepsis is interesting but not definitively convincing. It could be useful to see whether degranulation of neutrophils is differentially induced by apyrase-expressing vs control E. coli transformants.

      We thank the reviewer for raising several important points. In our study, we assessed local and systemic effects of released bacterial ATP. The consequences of local bacterial ATP release were assessed using an apyrase-expressing E. coli transformant. Locally, bacterial ATP resulted in a decrease in neutrophil numbers and we hypothesize that directly released bacterial ATP either leads to neutrophil death (e.g. via P2X7 receptor (Proietti et al., 2019)) or interferes with the recruitment of neutrophils (e.g. via P2Y receptors (Junger, 2011)).

      The systemic consequences were assessed using ATP-loaded and empty OMV. We have shown that degranulation is induced by OMV-derived bacterial ATP. ATP-containing OMV are engulfed by neutrophils, reach their endolysosomal compartment and might activate purinergic receptors, which then leads to aberrant degranulation. This concept, which needs to be explored in future studies, is fundamentally different from classical purinergic signaling via bacterial ATP released directly into the extracellular space.

      It is possible that neutrophil degranulation is also modulated by directly released bacterial ATP. We agree that this should be assessed in future studies. Also, the role of OMV-derived bacterial ATP should be assessed locally as well as the importance of directly released vs. OMV-mediated bacterial ATP dissected locally. Based on our measurements (Figure 4-figure supplement 1A and Figure 5C), we estimate that the effect of OMV-derived bacterial ATP might be much smaller than the effects of directly released bacterial ATP. Thus, direct ATP release might predominate locally. However, we fully agree that this has to be investigated in a future study to reconcile the different aspects of bacterial ATP signaling. A paragraph will be added to the manuscript, in which we discuss this particular issue.

      - Also, the increase of neutrophils in bacterial ATP-depleted abdominal sepsis, which has better outcomes than "ATP-proficient" sepsis, seems difficult to correlate to the hypothesized tissue damage induced by ATP delivered via non-infectious OMVs.

      We fully acknowledge the mentioned discrepancy. What we propose is that bacterial ATP exhibits different functions that are dependent on the release mechanism (see above). Locally, in the peritoneal cavity, neutrophil numbers are decreased by directly released bacterial ATP. Remotely, ATP is delivered via OMV and impacts on neutrophil function. We agree that, in particular, in the peritoneal cavity, both effects may play a role. However, the impact of directly released bacterial ATP seems to be dominant (see above).

      We propose that neutrophils are decreased locally because of directly released bacterial ATP, which prevents efficient infection control and, therefore, impairs sepsis survival. In addition, these fewer neutrophils might even be dysregulated by the engulfment of bacterial ATP delivered via OMV, which leads to an upregulated and possibly aberrant degranulation process worsening local and remote tissue damage. We agree that in addition to neutrophil numbers, the function of local neutrophils should be assessed with and without the influence of OMV-delivered bacterial ATP. This could be done by RNA sequencing of primary neutrophils from the peritoneal cavity or neutrophil cell lines as well as degranulation assays.

      - Are the neutrophil counts affected by ATP delivered via OMVs?

      This is difficult to show in the peritoneal cavity, where both directly released bacterial ATP and OMV-derived bacterial ATP are present. We did, however, assess such a putative difference for the systemic organs and the blood, where we did not find any differences in neutrophil numbers.

      Author response image 1.

      - A comparison of cytokine profiles in the abdominal fluids of E. coli and OMV treated animals could be helpful in defining the different responses induced by OMV-delivered vs bacterial-released ATP. The analyses performed on OMV treated versus E. coli infected mice are not closely related and difficult to combine when trying to draw a hypothesis for bacterial ATP in sepsis.

      We fully agree that there are several open questions that remain to be elucidated, in particular, to differentiate the local role of directly released versus OMV-delivered bacterial ATP. In this study, we laid the foundation for future in vivo research to examine the specific role of bacterial ATP in sepsis. Such future research avenues might be to investigate the local effects of OMV-delivered bacterial ATP, and how neutrophil migration, apoptosis and degranulation are altered. We agree that exploration of the local secretory immune response and cytokine profiles is relevant to understanding the different mechanisms by which bacterial ATP alters sepsis. However, such experiments should ideally be performed in systems where the source and the delivery of ATP can be modulated locally.

      - Also it was not clear why lung neutrophils were used for the RNAseq data generation and analysis.

      Thank you for this remark. We have chosen primary lung neutrophils for four reasons:

      (1) Isolation of primary lung neutrophils allowed us to assess an in vivo response that would not have been possible with cell lines.

      (2) The lung and the respiratory system are among the clinically most important organs affected during sepsis resulting in a significant cause of mortality.

      (3) We show in Figure 6C that specifically in the lung, OMV are engulfed by neutrophils, which shows the relevance of the lung also in our study context.

      (4) And finally, lung neutrophils were chosen to examine specifically distant and not local effects.

      Reviewer #2 (Public Review):

      Summary:

      - In their manuscript "Released Bacterial ATP Shapes Local and Systemic Inflammation during Abdominal Sepsis", Daniel Spari et al. explored the dual role of ATP in exacerbating sepsis, revealing that ATP from both host and bacteria significantly impacts immune responses and disease progression.

      Strengths:

      - The study meticulously examines the complex relationship between ATP release and bacterial growth, membrane integrity, and how bacterial ATP potentially dampens inflammatory responses, thereby impairing survival in sepsis models. Additionally, this compelling paper implies a concept that bacterial OMVs act as vehicles for the systemic distribution of ATP, influencing neutrophil activity and exacerbating sepsis severity.

      We thank the reviewer for mentioning these key points and supporting the relevance of our study.

      Weaknesses:

      (1) The researchers extracted and cultivated abdominal fluid on LB agar plates, then randomly picked 25 colonies for analysis. However, they did not conduct 16S rRNA gene amplicon sequencing on the fluid itself. It is worth noting that the bacterial species present may vary depending on the individual patients. It would be beneficial if the authors could specify whether they've verified the existence of unculturable species capable of secreting high levels of Extracellular ATP.

      Most septic complications are caused by a limited spectrum of bacteria, belonging mainly either to the Firmicutes or the Proteobacteria phyla, including E. coli, K. pneumoniae, S. aureus or E. faecalis (Diekema et al., 2019; Mureșan et al., 2018). We validated this well documented existing evidence by randomly assessing 25 colonies. For the planned experiments, it was crucial to work with culturable bacteria; otherwise, ATP measurements, the modulation of ATP generation or loading of OMV would not have been possible. Using such culturable bacteria allowed us to describe mechanisms of ATP release.

      We fully agree that hard-to-culture or unculturable bacteria might contribute significantly to septic complications. This, however, would need to be explored in future studies using extensive culturing methods (Cheng et al., 2022).

      (2) Do mice lacking commensal bacteria show a lack of extracellular ATP following cecal ligation puncture?

      ATP is typically secreted by many cells of the host in active and passive manners in the case of any injury, including cecal ligation and puncture (Burnstock, 2016; Dosch et al., 2018; Eltzschig et al., 2012; Idzko et al., 2014). We hypothesize that bacterial ATP is a potential priming agent at early stages of sepsis, and indeed, at such early time points, a comparison of peritoneal ATP levels between germ-free and colonized mice could support our hypothesis. Future studies addressing this question must, however, correct for the different immune responses between germ-free and colonized mice. This is of utmost importance, especially for the cecal ligation and puncture model, since the cecum of germ-free mice is extremely large, making such experiments hard to control.

      (3) The authors isolated various bacteria from abdominal fluid, encompassing both Gram-negative and Gram-positive types. Nevertheless, their emphasis appeared to be primarily on the Gram-negative E. coli. It would be beneficial to ascertain whether the mechanisms of Extracellular ATP release differ between Gram-positive and Gram-negative bacteria. This is particularly relevant given that the Gram-positive bacterium E. faecalis, also isolated from the abdominal fluid, is recognized for its propensity to release substantial amounts of Extracellular ATP.

      We fully agree with this comment. In this paper, we used E. coli as our model organism to determine the principles of sepsis-associated bacterial ATP release and therefore focused on gram-negative bacteria. In addition to the direct, growth-dependent release, we found a relevant impact of OMV-delivered bacterial ATP. For this latter purpose, a gram-negative strain, in which OMV generation has been well described (Schwechheimer & Kuehn, 2015), was chosen. Recently, gram-positive bacteria have been shown to secrete ATP and OMV as well (Briaud & Carroll, 2020; Hironaka et al., 2013; Iwase et al., 2010). Given the fundamental differences in the structure of the cell wall of gram-positive bacteria and the mechanisms of OMV generation and release, future studies are required to assess the relevance of directly released and OMV-delivered ATP in gram-positive bacteria.

      (4) The authors observed changes in the levels of LPM, SPM, and neutrophils in vivo. However, it remains uncertain whether the proliferation or migration of these cells is modulated or inhibited by ATP receptors like P2Y receptors. This aspect requires further investigation to establish a convincing connection.

      We fully agree with this comment. The decrease in LPM and the consequential predomination of SPM have been well described after inflammatory stimuli in the context of the macrophage disappearance reaction (Ghosn et al., 2010). Also, it has been shown that purinergic signaling modulates infiltration of neutrophils and can lead to cell death as a consequence of P2Y and P2X receptor activation (Junger, 2011; Proietti et al., 2019). In our study, we propose that intracellular purinergic receptors contribute to neutrophil function during sepsis. Having introduced the general principles and fundamentals of bacterial ATP with our studies, we fully agree that additional experiments need to address downstream purinergic receptor activation. That, however, would go beyond the scope of our study.

      (5) Additionally, is it possible that the observed in vivo changes could be triggered by bacterial components other than Extracellular ATP? In this research field, a comprehensive collection of inhibitors is available, so it is desirable to utilize them to demonstrate clearer results.

      This question is of utmost importance and defined the choice of our model and experimental approach. When we started the project, we used two different E. coli mutants that release low (ompC) and high (eaeH) amounts of ATP. However, the limitation of this approach is that these are different bacteria, which may also differ in the components they secrete or the surface proteins they express. We, therefore, decided against that approach. With the approach we finally used (same bacterium, just with and without ATP), we aimed to minimize the influence of non-ATP bacterial components.

      (6) Have the authors considered the role of host-derived Extracellular ATP in the context of inflammation?

      Yes, the role of host-derived extracellular ATP in inflammation and sepsis is well studied, albeit with contradictory results (Csóka et al., 2015; Ledderose et al., 2016). These conflicting data were the rationale to test the relevance of bacterial ATP. We suggest that bacterial ATP is essential in the early phase of sepsis when bacteria invade the sterile compartment and before an efficient host response, including the eukaryotic release of ATP, is established.

      (7) The authors mention that Extracellular ATP is rapidly hydrolyzed by ectonucleotidases in vivo. Are the changes of immune cells within the peritoneal cavity caused by Extracellular ATP released from bacterial death or by OMVs?

      This is a relevant question that was also asked by reviewer #1, and we answered it in detail above (weaknesses comments #1 and #2). From our ATP measurements (Figure 4-figure supplement 1A and Figure 5C), we conclude that locally, the role of directly released bacterial ATP (extracellular) predominates over OMV-derived bacterial ATP. Furthermore, the mechanisms of action of directly released and OMV-derived bacterial ATP (within OMV, engulfed and transported to the endolysosomal compartment) are different, and extracellular ATP in particular has been described to lead to apoptosis via P2X7 signaling.

      (8) In the manuscript, the sample size (n) for the data consistently remains at 2. I would suggest expanding the sample size to enhance the robustness and rigor of the results.

      Two biological replicates (independent cultures) were used only for the bacterial cultures in Figure 1, Figure 2, and Figure 3; these yielded similar results with very small standard deviations, indicating their robustness. In the in vitro experiments in Figure 5, we used a sample size of 6 (three biological replicates measured in technical duplicates), since we saw bigger deviations in our measurements. For the in vivo experiments, we always used 5 or more animals in at least two independent experiments.

      Reviewer #2 (Recommendations For The Authors):

      (9) Line 37: 11 million sepsis-related deaths were reported "in" 2017.

      The passage has been corrected as suggested.

      (10) By the way, the similar colors used in Figure 1C and G are too chaotic, making them difficult to distinguish.

      We agree, the colors have been adapted.

      Author response image 2.

      (11) All "in vivo" and "in vitro" should be italicized.

      We italicized all of them.

      (12) The title of Figure 4 is confusing: "Impairs sepsis outcome in vivo?" Could you make it more specific?

      We agree, the title has been rephrased:

      “Bacterial ATP reduces neutrophil counts and reduces survival in a mouse model of abdominal sepsis.”

      (13) Line 314-316: The sentence "Potentially, despite the lack of a transporter, ATP may similarly to eukaryotic cells leak (Yegutkin et al., 2006) across the inner membrane into the periplasmic space that lacks the enzymes for ATP generation." sounds odd.

      This passage was reformulated in the manuscript.

      “Despite the lack of a transporter, ATP may leak across the inner membrane into the periplasmic space. Such leakage may be similar to baseline leakage in eukaryotic cells (Yegutkin et al., 2006).”

      (14) The numerical notation in the paper is odd: sometimes it uses a prime symbol as a superscript (such as line 504), and sometimes it does not (such as line 421). Should it be standardized to "3,200" and "150,000"?

      Thank you for this remark. The numbers have been standardized throughout the manuscript.

      (15) Line "0.4 mm EP cuvettes" should be "0.4 cm EP cuvettes"

      The specified passage has been corrected as suggested.

      References

      Briaud, P., & Carroll, R. K. (2020). Extracellular Vesicle Biogenesis and Functions in Gram-Positive Bacteria. Infection and Immunity, 88(12), 10.1128/iai.00433-20. https://doi.org/10.1128/iai.00433-20

      Burnstock, G. (2016). P2X ion channel receptors and inflammation. Purinergic Signalling, 12(1), 59–67. https://doi.org/10.1007/s11302-015-9493-0

      Cheng, A. G., Ho, P.-Y., Aranda-Díaz, A., Jain, S., Yu, F. B., Meng, X., Wang, M., Iakiviak, M., Nagashima, K., Zhao, A., Murugkar, P., Patil, A., Atabakhsh, K., Weakley, A., Yan, J., Brumbaugh, A. R., Higginbottom, S., Dimas, A., Shiver, A. L., … Fischbach, M. A. (2022). Design, construction, and in vivo augmentation of a complex gut microbiome. Cell, 185(19), 3617-3636.e19. https://doi.org/10.1016/j.cell.2022.08.003

      Csóka, B., Németh, Z. H., Törő, G., Idzko, M., Zech, A., Koscsó, B., Spolarics, Z., Antonioli, L., Cseri, K., Erdélyi, K., Pacher, P., & Haskó, G. (2015). Extracellular ATP protects against sepsis through macrophage P2X7 purinergic receptors by enhancing intracellular bacterial killing. The FASEB Journal, 29(9), 3626–3637. https://doi.org/10.1096/fj.15-272450

      Diekema, D. J., Hsueh, P.-R., Mendes, R. E., Pfaller, M. A., Rolston, K. V., Sader, H. S., & Jones, R. N. (2019). The Microbiology of Bloodstream Infection: 20-Year Trends from the SENTRY Antimicrobial Surveillance Program. Antimicrobial Agents and Chemotherapy, 63(7), e00355-19. https://doi.org/10.1128/AAC.00355-19

      Dosch, M., Gerber, J., Jebbawi, F., & Beldi, G. (2018). Mechanisms of ATP Release by Inflammatory Cells. International Journal of Molecular Sciences, 19(4), 1222. https://doi.org/10.3390/ijms19041222

      Eltzschig, H. K., Sitkovsky, M. V., & Robson, S. C. (2012). Purinergic Signaling during Inflammation. New England Journal of Medicine, 367(24), 2322–2333. https://doi.org/10.1056/NEJMra1205750

      Ghosn, E. E. B., Cassado, A. A., Govoni, G. R., Fukuhara, T., Yang, Y., Monack, D. M., Bortoluci, K. R., Almeida, S. R., Herzenberg, L. A., & Herzenberg, L. A. (2010). Two physically, functionally, and developmentally distinct peritoneal macrophage subsets. Proceedings of the National Academy of Sciences, 107(6), 2568–2573. https://doi.org/10.1073/pnas.0915000107

      Hironaka, I., Iwase, T., Sugimoto, S., Okuda, K., Tajima, A., Yanaga, K., & Mizunoe, Y. (2013). Glucose Triggers ATP Secretion from Bacteria in a Growth-Phase-Dependent Manner. Applied and Environmental Microbiology, 79(7), 2328–2335. https://doi.org/10.1128/AEM.03871-12

      Idzko, M., Ferrari, D., & Eltzschig, H. K. (2014). Nucleotide signalling during inflammation. Nature, 509(7500), 310–317. https://doi.org/10.1038/nature13085

      Iwase, T., Shinji, H., Tajima, A., Sato, F., Tamura, T., Iwamoto, T., Yoneda, M., & Mizunoe, Y. (2010). Isolation and Identification of ATP-Secreting Bacteria from Mice and Humans. Journal of Clinical Microbiology, 48(5), 1949–1951. https://doi.org/10.1128/JCM.01941-09

      Junger, W. G. (2011). Immune cell regulation by autocrine purinergic signalling. Nature Reviews Immunology, 11(3), 201–212. https://doi.org/10.1038/nri2938

      Ledderose, C., Bao, Y., Kondo, Y., Fakhari, M., Slubowski, C., Zhang, J., & Junger, W. G. (2016). Purinergic Signaling and the Immune Response in Sepsis: A Review. Clinical Therapeutics, 38(5), 1054–1065. https://doi.org/10.1016/j.clinthera.2016.04.002

      Mureșan, M. G., Balmoș, I. A., Badea, I., & Santini, A. (2018). Abdominal Sepsis: An Update. The Journal of Critical Care Medicine, 4(4), 120–125. https://doi.org/10.2478/jccm-2018-0023

      Proietti, M., Perruzza, L., Scribano, D., Pellegrini, G., D’Antuono, R., Strati, F., Raffaelli, M., Gonzalez, S. F., Thelen, M., Hardt, W.-D., Slack, E., Nicoletti, M., & Grassi, F. (2019). ATP released by intestinal bacteria limits the generation of protective IgA against enteropathogens. Nature Communications, 10(1), Article 1. https://doi.org/10.1038/s41467-018-08156-z

      Schwechheimer, C., & Kuehn, M. J. (2015). Outer-membrane vesicles from Gram-negative bacteria: Biogenesis and functions. Nature Reviews Microbiology, 13(10), 605–619. https://doi.org/10.1038/nrmicro3525

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations For The Authors):

      Major concerns:

      (1) The biological significance of the inhibitory effects of human Abeta42 on gamma-secretase activity is not clear. As the authors mention in the Discussion, it is plausible that Abeta42 may concentrate up to microM levels in endosomes. However, subsets of FAD mutations in APP and presenilin 1 and 2 increase the Abeta42/Abeta40 ratio and lead to Abeta42 deposition in the brain. The APP knock-in mice NLF and NLGF also develop Abeta42 deposition in an age-dependent manner, although they produce more human Abeta42 than human Abeta40.

      If the production of Abeta42 were attenuated, this would result in less Abeta42 deposition in the brain. So, it is unlikely that human Abeta42 interferes with gamma-secretase activity under physiological conditions. This reviewer has the impression that inhibition of gamma-secretase by human Abeta42 is an interesting artifact of high Abeta42 concentrations. If the authors disagree with this comment, the manuscript needs more discussion of this point.

      We thank the Reviewer for raising this key conceptual point; we acknowledge that it was insufficiently discussed in the original manuscript. In response, we introduced the following paragraphs in the discussion section of the revised manuscript:

      “From a mechanistic standpoint, the competitive nature of the Aβ42-mediated inhibition implies that it is partial, reversible, and regulated by the relative concentrations of the Aβ42 peptide (inhibitor) and the endogenous substrates (Figure 10C and 10D). The model that we put forward is that cellular uptake, as well as endosomal production of Aβ, result in increased intracellular concentration of Aβ42, facilitating γ-secretase inhibition and leading to the buildup of APP-CTFs (and γ-secretase substrates in general). As Aβ42 levels fall, the augmented concentration of substrates shifts the equilibrium towards their processing and subsequent Aβ production. As Aβ42 levels rise again, the equilibrium is shifted back towards inhibition. This cyclic inhibitory mechanism will translate into pulses of (partial) γ-secretase inhibition, which will alter γ-secretase-mediated signaling (arising from increased CTF levels at the membrane or decreased release of soluble intracellular domains from substrates). These alterations may affect the dynamics of systems oscillating in the brain, such as NOTCH signaling, implicated in memory formation, and potentially others (related to e.g. cadherins, p75 or neuregulins). It is worth noting that oscillations in γ-secretase activity induced by treatment with the γ-secretase inhibitor semagacestat have been proposed to have contributed to the cognitive alterations observed in semagacestat-treated patients in the failed Phase-3 IDENTITY clinical trial (7) and that semagacestat, like Aβ42, acts as a high affinity competitor of substrates (85).

      The convergence of Aβ42 and tau at the synapse has been proposed to underlie synaptic dysfunction in AD (86-89), and recent assessment of APP-CTF levels in synaptosome-enriched fractions from healthy control, SAD and FAD brains (temporal cortices) has shown that APP fragments concentrate at higher levels in the synapse in AD-affected than in control individuals (90). Our analysis adds that endogenous Aβ42 concentrates in synaptosomes derived from end-stage AD brains to reach ~10 nM, a concentration that in CM from human neurons inhibits γ-secretase in PC12 cells (Figure 7). Furthermore, the restricted localization of Aβ in endolysosomal vesicles, within synaptosomes, likely increases the local peptide concentration to the levels that inhibit γ-secretase-mediated processing of substrates in this compartment. In addition, we argue that the deposition of Aβ42 in plaques may be preceded by a critical increase in the levels of Aβ present in endosomes and the cyclical inhibition of γ-secretase activity that we propose. Under this view, reductions in γ-secretase activity may be a (transient) downstream consequence of increases in Aβ due to failed clearance, as represented by plaque deposition, contributing to AD pathogenesis.”

      We have also added figures 10C and 10D, presented here for convenience.

      Author response image 1.
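
      The competitive model described in the quoted discussion can be illustrated with the standard Michaelis-Menten rate law for a competitive inhibitor, v = Vmax[S] / (Km(1 + [I]/Ki) + [S]). The sketch below is a minimal illustration under that textbook assumption; the function names and all parameter values (Km, Ki, concentrations) are hypothetical placeholders, not values measured in the study. It shows the qualitative behavior the response relies on: inhibition is partial and is relieved as substrate accumulates.

```python
def competitive_rate(s, i, vmax=1.0, km=1.0, ki=1.0):
    """Michaelis-Menten rate in the presence of a competitive inhibitor.

    s, i, km and ki share one concentration unit. All defaults are
    illustrative placeholders, not parameters from the manuscript.
    """
    return vmax * s / (km * (1.0 + i / ki) + s)


def fractional_activity(s, i, km=1.0, ki=1.0):
    """Rate with inhibitor divided by the uninhibited rate (between 0 and 1)."""
    inhibited = competitive_rate(s, i, km=km, ki=ki)
    uninhibited = competitive_rate(s, 0.0, km=km, ki=ki)
    return inhibited / uninhibited


if __name__ == "__main__":
    # Hypothetical inhibitor level held fixed while substrate builds up.
    for s in (0.5, 1.0, 5.0, 50.0):
        print(f"[S] = {s:5.1f}  residual activity = {fractional_activity(s, i=2.0):.2f}")
```

      With a fixed inhibitor level, the residual activity rises towards 1 as the substrate concentration increases, which is the equilibrium shift towards processing that the cyclic model invokes.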

      (2) It is not clear whether the FRET-based assay in living cells really reflects gamma-secretase activity.

      This reviewer thinks that the authors need at least biochemical data, such as levels of Abeta. 

      We have established a novel, HiBiT-tag-based assay reporting on the global γ-secretase activity in cells, using as a proxy the total levels of secreted HiBiT-tagged Aβ peptides. The assay and findings are presented in the revised manuscript as follows:

      In the Results section, in the “Aβ42 treatment leads to the accumulation of APP C-terminal fragments in neuronal cell lines and human neuron” subsection:

      “The increments in the APP-CTF/FL ratio suggested that Aβ42 (partially) inhibits the global γ-secretase activity. To further investigate this, we measured the direct products of the γ-secretase-mediated proteolysis of APP. Since the detection of the endogenous Aβ products via standard ELISA methods was precluded by the presence of exogenous human Aβ42 (treatment), we used an N-terminally tagged version of APPC99 and quantified the amount of total secreted Aβ, which is a proxy for the global γ-secretase activity. Briefly, we overexpressed human APPC99 N-terminally tagged with a short, 11-amino-acid HiBiT tag in human embryonic kidney (HEK) cells, treated these cultures with human Aβ42 or p3 17-42 peptides at 1 μM or DAPT (GSI) at 10 µM, and determined total HiBiT-Aβ levels in conditioned media (CM). DAPT was considered to result in full γ-secretase inhibition, and hence the values recorded in DAPT-treated conditions were used for the background subtraction. We found a ~50% reduction in luminescence signal, directly linked to HiBiT-Aβ levels, in CM of cells treated with human Aβ42 and no effect of p3 peptide treatment, relative to the DMSO control (Figure 3D). The observed reduction in the total Aβ products is consistent with the partial inhibition of γ-secretase by Aβ42.”

      In Methods:

      “Analysis of γ-secretase substrate proteolysis in cultured cells using secreted HiBiT-Aβ or -Aβ-like peptide levels as a proxy for the global γ-secretase endopeptidase activity

      HEK293 cells stably expressing APP-CTF (C99) or a NOTCH1-based substrate (similar in size to APP-C99), both N-terminally tagged with the HiBiT tag, were plated at a density of 10,000 cells per well (96-well plate) and, 24 h after plating, treated with Aβ or p3 peptides diluted in OPTIMEM (Thermo Fisher Scientific) supplemented with 5% FBS (Gibco). Conditioned media was collected and subjected to analysis using the Nano-Glo® HiBiT Extracellular Detection System (Promega). Briefly, 50 µl of the medium was mixed with 50 µl of the reaction mixture containing LgBiT Protein (1:100) and Nano-Glo HiBiT Extracellular Substrate (1:50) in Nano-Glo HiBiT Extracellular Buffer, and the reaction was incubated for 10 minutes at room temperature. The luminescence signal corresponding to the amount of the extracellular HiBiT-Aβ or -Aβ-like peptides was measured using a Victor plate reader with default luminescence measurement settings.”

      As the direct substrate of γ-secretase was used in this analysis, the observed reduction (~50%) in the levels of N-terminally tagged (HiBiT) Aβ peptides in the presence of 1 µM Aβ42, relative to control conditions, demonstrates a selective inhibition of γ-secretase by Aβ42 (but not by p3). These data complement the FRET-based findings presented in Figure 5.
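
      The background subtraction and normalization described for the HiBiT assay can be sketched as follows. This is a minimal illustration, assuming that the mean DAPT signal defines zero γ-secretase activity and the mean DMSO signal defines 100%; the function name and the luminescence readings are hypothetical and are not data from the study.

```python
from statistics import mean


def percent_activity(treated, dmso, dapt):
    """Express luminescence readings as % of the DMSO control.

    The mean DAPT reading (assumed to reflect full inhibition) is
    subtracted as background from both treated and control signals.
    """
    background = mean(dapt)
    control = mean(dmso) - background
    if control <= 0:
        raise ValueError("DMSO signal must exceed the DAPT background")
    return [100.0 * (x - background) / control for x in treated]


if __name__ == "__main__":
    # Hypothetical raw luminescence values (arbitrary units).
    dapt_wells = [100.0, 120.0, 110.0]
    dmso_wells = [1100.0, 1120.0, 1110.0]
    abeta42_wells = [610.0, 600.0, 620.0]
    for value in percent_activity(abeta42_wells, dmso_wells, dapt_wells):
        print(f"{value:.1f}% of control")
```

      Subtracting the DAPT background before normalizing matters here: without it, signal that does not depend on γ-secretase would make the apparent inhibition look smaller than it is.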

      (3) Processing of APP-CTF in living cells is not only the cleavage by gamma-secretase. This reviewer thinks that the authors need at least biochemical data, such as levels of Abeta in Figures 4, 5 and 7.

      We tried to measure the levels of Aβ peptides secreted by cells into the culture medium directly by ELISA (using different protocols) or MS (using established methods, as reported in Koch et al, 2023), but exogenous Aβ42 (treatment) present at relatively high levels interfered with the readout and rendered the analysis inconclusive. 

      However, we were successful in the determination of total secreted (HiBiT-tagged) Aβ peptides from the HiBiT-tagged APP-C99 substrate, as indicated in the previous point. The quantification of the levels of these peptides showed that Aβ42 treatment resulted in a ~50% reduction in the γ-secretase-mediated processing of the tagged substrate.

      In addition, we would like to highlight that our analysis of the contribution of other APP-CTF degradation pathways, using cycloheximide-based assays in the constant presence of γ-secretase inhibitor, failed to reveal significant differences between Aβ42-treated cells and controls (Figure 6B & C). The lack of a significant impact of Aβ42 on the half-life of APP-CTFs under the conditions of γ-secretase inhibition maintained by inhibitor treatment is consistent with the proposed Aβ42-mediated inhibitory mechanism.

      (4) Similar to comment #3. Processing of Pancad-CTF and p75 in living cells may be not only the cleavage by gamma-secretase. This reviewer thinks that the authors need at least biochemical data, such as levels of ICDs in Figures 6C and E. 

      To address this comment, we have now performed additional experiments in which we measured N-terminal Aβ-like peptides derived from a NOTCH1-based substrate using the HiBiT-based assay. These experiments showed a reduction in the aforementioned peptides in the cells treated with Aβ42 relative to the vehicle control, and hence further confirmed the inhibitory action of Aβ42. These new data have been included as Figure 8D in the revised manuscript and described as follows:

      Finally, we measured the direct N-terminal products generated by γ-secretase proteolysis from a HiBiT-tagged NOTCH1-based substrate, an estimate of the global γ-secretase activity. We quantified the Aβ-like peptides secreted by HEK 293 cells stably expressing this HiBiT-tagged substrate upon treatment with 1 µM Aβ1-42, p3 17-42 peptide or DAPT (GSI) (Figure 8D). DAPT treatment was considered to result in a complete γ-secretase inhibition, and hence the values recorded in the DAPT condition were used for background subtraction. A significant ~20% reduction in the amount of secreted N-terminal HiBiT-tagged peptides derived from the NOTCH1-based substrates in cells treated with Aβ1-42 supports the inhibitory action of Aβ1-42 on γ-secretase-mediated proteolysis.

      Minor concerns:

      (1) Murine Abeta42 may be converted to murine Abeta38 easily, compared to human Abeta42. This may be a reason why murine Abeta42 exhibits no inhibitory effect on gamma-secretase activity. 

      To address this question, we performed additional experiments in which we assessed the processing of murine Aβ42 into Aβ38. Analogous to human Aβ42, the murine Aβ42 peptide was not processed to Aβ38 in the assay conditions. These new data have been integrated into the manuscript and added as Supplementary Figure 1B.

      (2) It is curious to know the levels of C99 and C83 in cells in supplementary figure 3.  

      The conditions used in these assays were analogous to those used in Figure 3 (i.e. treatment with Aβ peptides at 1 µM concentrations). Such conditions were associated with profound and consistent APP-CTF accumulation in this model system.

      Reviewer #2 (Recommendations For The Authors):

      In the current study, the authors show that Aβs with low affinity for γ-secretase can, when present at relatively high concentrations, compete with the longer, higher affinity APPC99 substrate for binding and processing. They also performed kinetic analyses and demonstrate that human Aβ1-42 inhibits γ-secretase-mediated processing of APP C99 and other substrates. Interestingly, neither murine Aβ1-42 nor human p3 (amino acids 17-42 of Aβ) peptides exerted inhibition under similar conditions. The authors also show that human Aβ1-42-mediated inhibition of γ-secretase activity results in the accumulation of unprocessed substrates, which leads to p75-dependent activation of caspase 3 in basal forebrain cholinergic neurons (BFCNs) and PC12 cells.

      These analyses demonstrate that, as seen for γ-secretase inhibitors, Aβ1-42 potentiates this marker of apoptosis. However, there are no in vivo data to support the physiological significance of the current finding. The authors should show in APP KO mice whether gamma-secretase enzymatic activity is elevated or not, and whether putting back Aβ42 peptide will abolish these in vivo effects.

      The findings presented in this manuscript form the basis for further in vitro and in vivo research to investigate the mechanisms of inhibition and its contribution to brain pathophysiology. Here, we used well-controlled model systems to investigate a novel mechanism of Aβ42 toxicity. Multiple mechanisms regulate the local concentration of Aβ42 in vivo, making the dissection of the biochemical mechanisms of the inhibition more complex. Nevertheless, beyond the scope of this report, we consider these very reasonable comments as a motivation for further research activities. 

      The experimental concentrations for Aβ42 peptide in the assay are too high, which are far beyond the physiological concentrations or pathological levels. The artificial observations are not supported by any in vivo experimental evidence.

      It is correct that in the majority of the experiments we used low μM concentrations of Aβ42. However, we would like to note that we have also performed experiments in which conditioned medium collected from human APP.Swe-expressing neurons was used as a source of Aβ. In these experiments, the total Aβ concentration was in the low nM range (0.5-1 nM) (Figure 7). Treatment with this conditioned medium led to an increase in APP-CTF levels, supporting the conclusion that low nM concentrations of Aβ are sufficient for partial inhibition of γ-secretase.

      In addition, we highlight that analyses of the brains of AD-affected individuals have shown that APP-CTFs accumulate in both sporadic and genetic forms of the disease (Pera et al. 2013, Vaillant-Beuchot et al. 2021), and recent work has revealed a correlation between APP-CTFs and Aβ levels at the synapse (Ferrer-Raventós et al. 2023). We therefore assessed the concentration of Aβ42 in synaptosomes derived from frontal cortices of post-mortem AD and age-matched non-demented (ND) control individuals. Our findings and conclusions are included in the revised version as follows:

      In the results section:

      “We next investigated the levels of Aβ42 in synaptosomes derived from frontal cortices of post-mortem AD and age-matched non-demented (ND) control individuals (Figure 10B). To this end, we prepared synaptosomes from frozen brain tissues using a Percoll gradient procedure (62, 63). Intact synaptosomes were spun to obtain a pellet, which was resuspended in a minimal amount of PBS, allowing us to estimate the volume containing the resuspended synaptosome sample. This is likely an overestimate of the actual synaptosome volume. Finally, synaptosomes were lysed in RIPA buffer and Aβ peptide concentrations were measured using ELISA (MSD). We observed that the concentration of Aβ42 in the synaptosomes from (end-stage) AD tissues was significantly higher (10.7 nM) than in those isolated from non-demented tissues (0.7 nM), p<0.0005***. These data provide evidence for accumulation at nM concentrations of endogenous Aβ42 in synaptosomes in end-stage AD brains. Given that we measured Aβ42 concentration in synaptosomes, we speculate that even higher concentrations of this peptide may be present in the endolysosome vesicle system, and therein inhibit the endogenous processing of APP-CTF at the synapse. Of note, treatment of PC12 cells with conditioned medium containing even lower amounts of Aβ (low nanomolar range, 0.5-1 nM) resulted in the accumulation of APP-CTFs.”

      In the discussion: 

      “The convergence of Aβ42 and tau at the synapse has been proposed to underlie synaptic dysfunction in AD (86-89), and recent assessment of APP-CTF levels in synaptosome-enriched fractions from healthy control, SAD and FAD brains (temporal cortices) has shown that APP fragments concentrate at higher levels in the synapse in AD-affected than in control individuals (90).  Our analysis adds that endogenous Aβ42 concentrates in synaptosomes derived from end-stage AD brains to reach ~10 nM, a concentration that in CM from human neurons inhibits γ-secretase in PC12 cells (Figure 7). Furthermore, the restricted localization of Aβ in endolysosomal vesicles, within synaptosomes, likely increases the local peptide concentration to the levels that inhibit γ-secretase-mediated processing of substrates in this compartment. In addition, we argue that the deposition of Aβ42 in plaques may be preceded by a critical increase in the levels of Aβ present in endosomes and the cyclical inhibition of γ-secretase activity that we propose. Under this view, reductions in γ-secretase activity may be a (transient) downstream consequence of increases in Aβ due to failed clearance, as represented by plaque deposition, contributing to AD pathogenesis. ”
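
      The dependence of the synaptosomal Aβ42 estimate on the resuspension volume can be made explicit with a short unit conversion. The sketch below is an illustration only: it assumes that the ELISA reports a mass concentration in the lysate, that all Aβ42 in the lysate originated from the resuspended pellet, and a molecular weight of about 4514 g/mol for Aβ1-42. The function name, volumes and readings are hypothetical and do not reproduce the authors' calculation.

```python
ABETA42_MW = 4514.0  # g/mol, approximate molecular weight of Abeta1-42


def synaptosome_nM(elisa_pg_per_ml, lysate_volume_ul, resuspension_volume_ul,
                   mw=ABETA42_MW):
    """Convert an ELISA reading on the lysate into nM within the
    resuspended synaptosome volume (all inputs are hypothetical)."""
    if resuspension_volume_ul <= 0:
        raise ValueError("resuspension volume must be positive")
    pg_total = elisa_pg_per_ml * lysate_volume_ul / 1000.0  # pg in the lysate
    pmol_total = pg_total / mw                              # pg / (g/mol) = pmol
    micromolar = pmol_total / resuspension_volume_ul        # pmol/ul = uM
    return micromolar * 1000.0                              # uM -> nM


if __name__ == "__main__":
    # Same hypothetical reading, two assumed synaptosome volumes.
    for volume_ul in (100.0, 10.0):
        conc = synaptosome_nM(4514.0, lysate_volume_ul=100.0,
                              resuspension_volume_ul=volume_ul)
        print(f"assumed volume {volume_ul:5.1f} ul: {conc:.1f} nM")
```

      Because the concentration scales inversely with the assumed volume, an overestimated synaptosome volume yields an underestimated concentration, which is consistent with the argument that local Aβ42 levels may be higher than the measured values.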

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      In 2019, Wilkinson and colleagues (PMID: 31142833) managed to break the veil on a 20-year open question of how to properly culture and expand Hematopoietic Stem Cells (HSCs). Although this study is revolutionizing the HSC biology field, several questions regarding the mechanisms of expansion remain open. Leveraging this gap, Zhang et al. embarked on a much-needed investigation regarding HSC self-renewal in this particular culturing setting.

      The authors first tackled the known caveat that some HSC membrane markers are altered during in vitro cultures by functionally establishing EPCR (CD201) as a reliable and stable HSC marker (Figure 1), demonstrating that this compartment is also responsible for long-term hematopoietic reconstitution (Figure 3). Next, in Figure 2, the authors performed single-cell omics to shed light on the potential mechanisms involved in HSC maintenance, and interestingly it was shown that several hematopoietic populations like monocytes and neutrophils are also present in these culture conditions, which had not been reported. The study goes on to functionally characterize these cultured HSCs (cHSC). The authors elegantly demonstrate using state-of-the-art barcoding strategies that these culturing conditions provoke heterogeneity in the expanding HSC pool (Figure 4). In the last experiment (Figure 5), it was demonstrated that cHSC not only retain their high EPCR expression levels but upon transplantation, these cells remain more quiescent than freshly-isolated controls.

      Taken together, this study independently validates that the proposed culturing system works and provides new insights into the mechanisms whereby HSC expansion takes place.

      Although most of the conclusions of this study are well supported by the present manuscript, some aspects regarding experimental design and especially the data analysis should be clarified and possibly extended.

      1) The first major point regards the single-cell (sc) omics performed on whole cultured cells (Figure 2):

      a. The authors claim that both RNA and ATAC were performed and indeed some ATAC-seq data is shown in Figure 2B, but this collected data seems to be highly underused.

      We appreciate the opportunity to clarify our analytical approach and the rationale behind it. In our study, we employed a novel deep learning framework, SAILERX, for our analysis. This framework is specifically designed to integrate multimodal data, such as RNAseq and ATACseq. The advantage of SAILERX lies in its ability to correct for technical noise inherent in sequencing processes and to align information from different modalities. Unlike methods that force a hard alignment of modalities into a shared latent space, SAILERX allows for a more refined integration. It achieves this by encouraging the local structures of the two modalities, as measured by pairwise similarities, to be similar.

      To put it more simply, SAILERX combines RNAseq and ATACseq data, ensuring that the unique characteristics of each data type are respected and used to enhance the overall biological picture, rather than forcing them into a uniform framework.

      While it is indeed possible to analyze the ATAC-seq and RNA-seq modalities separately, and we acknowledge the potential value in such an approach, our primary objective in this study was to highlight the relatively low content of HSCs in cultures. This finding is a key point of our work, and the multiome data support this from a molecular point of view.

      The Seurat object we provide was created to facilitate further analysis by interested researchers. This object simplifies the exploration of both the ATAC-seq and RNA-seq data, allowing for additional investigations that may be of interest to the scientific community. We hope this explanation clarifies our methodology and its implications.

      b. It's not entirely clear to this reviewer the nature of the so-called "HSC signatures"(SF2C) and why exactly these genes were selected. There are genes such as Mpl and Angpt1 which are used for Mk-biased HSCs. Maybe relying on other HSC molecular signatures (PMID: 12228721, for example) would not only bring this study more into the current field context but would also have a more favorable analysis outcome. Moreover reclustering based on a different signature can also clarify the emergence of relevant HSC clusters.

      The HSC signature used in our work was selected based on well-referenced datasets on well-defined HSPCs, as detailed in the "v. HSC signature" section of our methods. This signature was also projected onto another single-cell RNA sequencing dataset generated from ex vivo expanded HSC culture (PMID: 35971894, see Author response image 1 below), demonstrating again an association primarily with the most primitive cells (at least based on gene expression).

      Author response image 1.

      Projection of "our" HSC signature on scRNAseq data from independent work.

      In further response to the suggestion here, we have also examined in our data the molecular signature of HSCs referenced in PMID: 12228721, as well as another HSC signature from PMID: 26004780 (Author response image 2). While these signatures do indeed enrich for cells that fall in the cluster of molecularly defined HSCs, our analysis indicates that neither of them significantly improves the identification of HSCs in our dataset compared to the signature we originally used. This finding reinforces our confidence in the appropriateness of our chosen HSC signature for this study.

      Author response image 2.

      Projection of alternative HSC signatures onto the SAILERX UMAP.

      Regarding the specific genes Mpl and Angpt1, we respectfully oppose the view that these genes are exclusively associated with MK-biased HSCs. There is substantial evidence supporting the broader role of Mpl in regulating HSCs, regardless of any particular "lineage bias". Similarly, while Angpt1 has been less extensively studied, its role in HSCs, as examined in PMID: 25821987, suggests a more general association with HSCs rather than a specific impact on MKs. Therefore, we maintain that it is more accurate to consider these genes as HSC-associated rather than restricted to MK-biased HSCs.

      Finally, addressing the comment on reclustering based on different signatures, we would like to clarify that the clustering process is independent of the projection of signatures. The clustering aims to identify cell populations based on their overall molecular profiles, and while signatures can aid in characterizing these populations, they do not influence the clustering process itself.

      c. The authors took the hard road to perform experiments with the elegant HSC-specific Fgd5-reporter, and they claim in lines 170-171 that it "failed to clearly demarcate in our single-cell multimodal data". This seems like a rather vague statement and leads to the idea that the scRNA-seq experiment is not reliable. It would be interesting to show a UMAP with this gene expression regardless and also potentially some other HSC markers.

      We understand the concerns raised about our statement on the performance of the Fgd5-reporter in our multimodal data analysis. Our aim was not to suggest that single-cell molecular data are unreliable. Instead, we intended to point out specific challenges associated with scRNA sequencing, notably the high rates of dropout. Regarding the specific example of Fgd5, it appears this transcript is not efficiently captured by 10x technology. Our previous 10x scRNA-seq experiments on cells from the Fgd5 reporter strain (Säwén et al., eLife 2018; Konturek-Ciesla et al., Cell Rep. 2023) support this observation. Despite cells being sorted as Fgd5-reporter positive, many showed no detectable transcripts.

      We consider it pertinent to note that our study integrates ATAC-seq data in conjunction with single-cell molecular data. We believe that this integration, coupled with the analytical methods we have employed, potentially offers a way to address some of the limitations typically associated with scRNA sequencing. However, in assessing frequencies, we observe that the number of candidate HSCs identified via single-cell molecular data is substantially higher than the number identified through flow cytometry, the latter of which we demonstrate correlates functionally with genuine long-term repopulating activity.

      With respect to Fgd5, as depicted in our analysis below, there appears to be an enrichment of cells in the cluster identified as HSCs, as well as a significant representation in the cycling cell cluster (Author response image 3). Regarding the projection of other individual genes, the Seurat object we have provided allows for such projections to be readily performed. This offers an opportunity for further exploration and validation of our findings by interested researchers.

      Author response image 3.

      Feature plot depicting Fgd5 expression in the SAILERX UMAP.

      2) During the discussion and in Figure 4, the authors ponder and demonstrate that this culturing system can provoke divergent HSC clone expansion, which also has functional consequences. This is a known caveat of the original system, but in more recent publications from the original group (PMID: 36809781 and PMID: 37385251) small alterations to the protocol seem to alleviate clone selection. It's intriguing why the authors have not included these parameters at least in some experiments to show reproducibility or why these studies are not mentioned in the discussion section.

      Thank you for pointing out the recent publications (PMID: 36809781 and PMID: 37385251) that discuss modifications to the HSC culturing system. We appreciate the opportunity to address why these were not included in our discussion or experiments.

      Firstly, it is important to note that these papers were published after the submission of our manuscript. In fact, one of the studies (PMID: 36809781) references the preprint version of our work on bioRxiv. This timing meant that we were unable to consider these studies in our initial manuscript or incorporate any of their findings into our experimental designs.

      Furthermore, as strong advocates for the peer-review system, we prioritize references that have undergone this rigorous process. Preprints, while valuable for early dissemination of research findings, do not offer the same level of scrutiny and validation as peer-reviewed publications. Our approach was to rely on the most relevant and rigorously reviewed literature available to us at the time of submission. This included, most notably, the original and ground-breaking work by Wilkinson et al., which provided a foundational basis for our research.

      We acknowledge that the field of HSC research is rapidly evolving, and new findings, such as those mentioned, are continually emerging. These new studies undoubtedly contribute valuable insights into HSC culturing systems and their optimization. However, given the timing of their publication relative to our study, we were not able to include them in our analysis or discussion.

      3) In this reviewer's opinion, the finding that transplanted cHSC are more quiescent than freshly isolated controls is the most remarkable aspect of this manuscript. There is a point of concern and an intriguing thought that sprouts from this experiment. It is imperative that for this experiment the same HSC dose is transplanted in both groups. This, however, is technically difficult since the membrane markers of the two groups are different. Although after 8 weeks chimerism levels seem to be the same (SF5D) for both groups, it would strengthen the evidence if the authors could demonstrate that the same number of HSCs were transplanted in both groups, likely by limiting dose experiments. Finally, it's interesting that even though EE100 cells underwent multiple replication rounds (adding to their replicative aging), these cells remained more quiescent once they were in an in vivo setting. Since the last author of this manuscript also has expertise in HSC aging, it would be interesting to explore whether these cells have "aged" during the expansion process by assessing whether they display an aged phenotype (myeloid-skewed output in serial transplantations and/or assessing their transcriptional age).

      We thank the reviewer for the insightful observations regarding the quiescence of transplanted cultured HSCs. We appreciate the opportunity to clarify the experimental design and its implications, particularly in the context of HSC aging.

      The primary aim of comparing cKit-enriched BM cells with cultured cells was to investigate whether ex vivo activated HSCs exhibit a proliferation pattern similar to that of in vivo quiescent HSCs post-transplantation. This comparison was crucial for evaluating the similarity between in vitro cultured and "unmanipulated" HSC behavior. While we acknowledge the technical challenge of transplanting equivalent HSC doses between groups due to differing membrane markers, our study design focused on assessing stem cell activity post-culture. This was quantitatively evaluated by calculating the repopulating units (detailed in Table 1 and Fig S4G), rather than through a limiting dilution assay. There is extensive literature demonstrating the correlation between these assays, although of course the limiting dilution assay is designed to provide a more exact output.
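
      Repopulating units are commonly derived from competitive transplantation data with the formula RU = (% donor × C) / (100 − % donor), where C is the number of competitor RUs and one RU is conventionally assigned to 10^5 competitor BM cells. The sketch below illustrates that conventional formula only; whether the values in Table 1 were computed in exactly this way is an assumption, and the function name and input values are hypothetical.

```python
def repopulating_units(donor_percent, competitor_cells, cells_per_ru=1e5):
    """Estimate donor repopulating units from donor chimerism.

    donor_percent: donor-derived contribution in percent (0 <= x < 100).
    competitor_cells: number of competitor BM cells co-transplanted.
    cells_per_ru: competitor cells defining one RU (conventional 1e5).
    """
    if not 0.0 <= donor_percent < 100.0:
        raise ValueError("donor_percent must be in the range [0, 100)")
    competitor_ru = competitor_cells / cells_per_ru
    return donor_percent * competitor_ru / (100.0 - donor_percent)


if __name__ == "__main__":
    # Hypothetical examples: chimerism (%) and competitor dose.
    for donor, competitors in ((50.0, 2e5), (80.0, 5e5)):
        ru = repopulating_units(donor, competitors)
        print(f"{donor:.0f}% donor vs {competitors:.0e} competitors: {ru:.1f} RU")
```

      The estimate grows steeply as chimerism approaches 100%, so RU values derived from very high chimerism are less precise than a limiting dilution readout.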

      Regarding the intriguing aspect of HSC aging in the context of ex vivo expansion, our observations indicate that both the subfraction of ex vivo expanded cells (Fig 3 and Fig S3) and the entire cultured population (Fig 4B, Fig 5B, Fig S4A, and Fig S5B) maintain long-term multilineage reconstitution capacity post-transplantation. This suggests that the PVA-culture system does not lead to apparent signs of "HSC aging," despite the cells undergoing active self-renewal in vitro. This is further supported by our serial transplantation experiments, where cultured cells continued to demonstrate multilineage capacity rather than any evident myeloid-biased reconstitution 16 weeks post-second transplantation (see Author response image 4 below).

      Author response image 4.

      Serial transplantation behavior of ex vivo expanded HSCs. 5 million whole BM cells from primary transplantation were transplanted together with 5 million competitor whole BM cells. The control group was transplanted with 100 cHSCs freshly isolated from BM for the primary transplantation. Mann-Whitney test was applied and the asterisks indicate significant differences. *, p < 0.05; **, p < 0.01; ***, p < 0.0001. Error bars denote SEM.

      However, we recognize the complexity of defining HSC aging and the potential for the culture system to influence certain aspects of this process. The association of aging signature genes with HSC primitiveness and young signature genes with differentiation presents an interesting dichotomy. Our analysis of a native dataset on young mice and the projection of aged signatures onto our multiome data (shown below for a set of genes known to be induced at higher levels in aged HSCs, e.g. Wahlestedt et al., Nature Comm 2017; aging scRNAseq data from PMID: 36581635) does not directly indicate that the culture system promotes HSC aging compared to aged Lin-Sca+Kit+ cells. Yet, we do not rule out the possibility that culturing may influence other facets of the HSC aging process.

      In conclusion, while our current data do not provide direct evidence of induced HSC aging through the culture system, this remains a compelling area for future research. The potential impact of ex vivo culture on aspects of the HSC aging process warrants further exploration, and we appreciate your suggestion in this regard.

      Author response image 5.

      No evident signs of "molecular aging" following ex vivo expansion of HSCs. Young and aged scRNAseq data from PMID: 36581635 were integrated and explored from the perspective of known genes associated with HSC aging. The top row depicts contribution to UMAPs from young and aged cells (two left plots), cell cycle scores of the cells, and the expression of EPCR and CD48 as example markers for primitive and more differentiated cells, respectively. The expression of the HSC aging-associated genes Wwtr1, Cavin2, Ghr, Clu and Aldh1a1 was then assessed in the data as well as in the SAILERX UMAP of cultured HSCs (bottom row).

      Reviewer #2 (Public Review):

      Summary:

      In this study, Zhang and colleagues characterise the behaviour of mouse hematopoietic stem cells when cultured in PVA conditions, a recently published method for HSC expansion (Wilkinson et al., Nature, 2019), using multiome analysis (scRNA-seq and scATACseq in the same single cell) and extensive transplantation experiments. The latter are performed in several settings including barcoding and avoiding recipient conditioning. Collectively the authors identify several interesting properties of these cultures namely: 1) only very few cells within these cultures have long-term repopulation capacity, many others, however, have progenitor properties that can rescue mice from lethal myeloablation; 2) single-cell characterisation by combined scRNAseq and scATACseq is not sufficient to identify cells with repopulation capacity; 3) expanded HSCs can be engrafted in unconditioned host and return to quiescence.

      The authors also confirm previous studies that EPCRhigh HSCs have better reconstitution capability than EPCRlow HSCs when transplanted.

      Strengths:

      The major strength of this manuscript is that it describes how functional HSCs are expanded in PVA cultures to a deeper extent than what has been done in the original publication. The authors are also mindful of considering the complexities of interpreting transplantation data. As these PVA cultures become more widely used by the HSC community, this manuscript is valuable as it provides a better understanding of the model and its limitations.

      Novelty aspects include:

      • The authors determined that small numbers of expanded HSCs enable transplantation into non-conditioned syngeneic recipients.

      • This is to my knowledge the first report characterising the output of PVA cultures by multiome. This could be a very useful resource for the field.

      • They are also the first to my knowledge to use barcoding to quantify HSC repopulation capacity at the clonal level after PVA culture.

      • It is also useful to report that HSCs isolated from fetal livers do expand less than their adult counterparts in these PVA cultures.

      Weaknesses:

      • The analysis of the multiome experiment is limited. The authors do not discuss what cell types, other than functional or phenotypic HSCs are present in these cultures (are they mostly progenitors or bona fide mature cells?) and no quantifications are provided.

      The primary objective of our manuscript was to characterize the features of HSCs expanded from ex vivo culture. In this context, our analysis of the single cell multiome sequencing data was predominantly centered on elucidating the heterogeneity of cultures, along with subsequent in vivo functional analysis. This focus is reflected in our comparisons between the molecular features of ex vivo cultured candidate HSCs (cHSCs) and "fresh/unmanipulated" HSCs, as illustrated in Figures 2D-E of our manuscript.

      Our findings provide substantial evidence that ex vivo expanded cells share significant similarities with HSCs isolated from the BM in terms of molecular features, differentiation potential, heterogeneity, and in vivo stem cell activity/function. This suggests that the ex vivo culture system closely mimics several aspects of the in vivo environment, thereby broadening the potential applications of this system for HSC research.

      Regarding the presence of other cell types in the cultures, it is important to note that most cells did not express mature lineage markers, suggesting their immature status. However, we acknowledge the presence of some mature lineage marker-positive cells within the cultures. These cells are represented by the endpoints in our SAILERX UMAP, indicating a progression from immature to more differentiated states within the culture system.

      While the main emphasis of our study was on HSCs, we understand the importance of acknowledging and briefly discussing the presence and characteristics of other cell types in the cultures. This aspect provides a more comprehensive understanding of the culture system and its impact on cellular heterogeneity, although it was for the most part beyond the scope of our studies.

      • Barcoding experiments are technically elegant but do not bring particularly novel insights.

      We respectfully disagree with the view that our barcoding experiments do not offer novel insights. We believe that the application of barcoding technology in our study represents a significant advancement over previous methods, both in terms of quantitative rigor and ethical considerations.

      In the foundational work by Wilkinson et al., clonal assessments were indeed performed, but these were limited in scope and largely served as proof of concept. Our use of barcoding technology, on the other hand, allowed for a comprehensive quantitative assessment of the expansion potential of HSC clones. This technology enabled us to rigorously quantify the number of HSC clones capable of undergoing at least three self-renewing divisions (e.g. those clones present in 5 separate animals), while also revealing the heterogeneity in their expansion potential.
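      The link between the number of animals carrying a clone and the number of self-renewing divisions can be made explicit with a small calculation. The sketch below is only a lower-bound argument under a stated assumption: every division doubles the cell number and every daughter retains HSC identity. The function name is hypothetical.

```python
def min_divisions(n_descendants: int) -> int:
    """Minimum rounds of division for one cell to yield n_descendants.

    Assumes the best case of purely symmetric self-renewal, so the result
    is a lower bound: ceil(log2(n_descendants)), computed with integers.
    """
    if n_descendants < 1:
        raise ValueError("n_descendants must be at least 1")
    return (n_descendants - 1).bit_length()


# A barcode recovered in 5 separate animals implies at least 5 functional
# daughter HSCs; two rounds give at most 4 cells, so three are needed.
print(min_divisions(5))  # 3
```

      Any asymmetric or differentiating division only increases the number of rounds required, so the bound is conservative.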

      One alternative approach could have been to culture single HSCs and distribute the progeny among multiple mice for analysis. However, when considering the sheer number of mice that would be required for such an experiment for quantitative assessments, it becomes evident that viral barcoding is a far superior method. Not only does it provide a more efficient and scalable approach to assessing clonal expansion, but it also significantly reduces the number of animals required for the study, aligning with the principles of ethical research and animal welfare.

      In conclusion, we assert that the barcoding experiments conducted in our study are not only technically robust but also yield novel quantitative insights into the dynamics of HSC clones within expansion cultures. These insights have value not only for current research but also hold potential implications for future applications.

      • The number of mice analysed in certain experiments is fairly low (Figures 1 and 5).

      We would like to clarify our approach in the context of the 3R (replacement, refinement, and reduction) policy, which guides ethical considerations in animal research.

      In alignment with the 3R principles, our study was designed to minimize the use of experimental animals wherever possible. For most experiments, including those presented in Figures 1 and 5, we adopted a standard of using five mice per group. Based on the effect sizes we observed, we concluded that this sample size was appropriate for most parts of our study.

      Specifically for Figure 5, we used two animals per time point, totaling seven animals per treatment group. It is important to note that we did not monitor the same animals over time but used different animals at each time point, as mice had to be sacrificed for the type of analyses conducted. Despite the seemingly small sample size, the results we obtained were remarkably consistent across groups. This consistency provided strong evidence that ex vivo activated HSCs return to a more quiescent state after being transplanted into unconditioned recipients. Given the clear and consistent nature of these results, we determined that including more animals for the purpose of additional statistical analysis was not necessary.

      Our approach reflects a balance between adhering to ethical standards in animal research and ensuring the scientific validity and reliability of our findings. We believe that the sample sizes chosen for our experiments are justified by the consistent and significant results we obtained, which contribute meaningfully to our understanding of HSC behavior post-transplantation.

      • The manuscript remains largely descriptive. While the data can be used to make useful recommendations to future users working with PVA cultures and in general with HSCs, those recommendations could be more clearly spelled out in the discussion.

      We fully agree that many aspects of our study are indeed descriptive, which is reflective of the exploratory and foundational nature of this type of research.

      We have strived to provide clear and direct recommendations for researchers interested in utilizing the PVA culture system, which we believe are evident throughout our manuscript:

      1) Utility of Viral Delivery in HSC Research: Our research, particularly through the use of barcoding experiments, underscores the effectiveness of viral delivery methods in HSC studies. While barcoding itself is a significant tool, it is the underlying process of viral delivery that truly exemplifies the potential of this approach. Our work shows that the culture system is highly conducive to maintaining HSC activity, which is critical for genetic manipulation. This is evident not only in our current study but also in our previous work that included transient delivery methods (Eldeeb et al., Cell Reports 2023).

      2) Non-conditioned transplantation: Our findings suggest that non-conditioned transplantation can be a valuable method in studying both normal and malignant hematopoiesis. This approach can complement genetic lineage tracing models, providing a more native and physiological context for hematopoietic research. We state this explicitly in our discussion.

      3) Integration with recent technical advances: The combination of the PVA culture system with recent developments in transplantation biology, genome engineering, and single-cell technologies holds significant promise. This integration is likely to yield exciting discoveries with relevance to both basic and clinically oriented hematopoietic research. This is the end statement of our discussion.

      While our manuscript is in a way tailored to those with experience in HSC research, we have made a concerted effort to ensure that the content is accessible and informative to a broader audience, including those less familiar with this area of study. Our intention is to provide a resource that is both informative for experts in the field and approachable for newcomers.

      • The authors should also provide a discussion of the other publications that have used these methods to date.

      We would like to clarify that the scope of literature on the specific methods we employed, particularly in the context of our research objectives, is not extensive. Most of the existing references on these methods come from a relatively narrow range of research groups. In preparing our manuscript, we tried to be comprehensive yet selective in our citations to maintain focus and relevance. Our referencing strategy was guided by the aim to include literature that was most directly pertinent to our study's methodologies and findings.

      Overall, the authors succeeded in providing a useful set of experiments to better interpret what type of HSCs are expanded in PVA cultures. More in-depth mining of their bioinformatic data (by the authors or other groups) is likely to highlight other interesting/relevant aspects of HSC biology in relation to this expansion methodology.

      We are grateful for the overall positive assessment of our work and the recognition of its contributions to understanding HSC expansion in PVA cultures.

      We agree that every study, including ours, has its limitations, particularly regarding the scope and depth of exploration. It is challenging to cover every aspect comprehensively in a single study. Our research aimed to provide a foundational understanding of HSCs in PVA cultures, and we are pleased that this goal appears to have been met.

      We also concur with your point on the potential for further in-depth mining of our bioinformatic data. Our hope is that this data can serve as a resource (or at least a starting point) for other investigators.

      In conclusion, we hope that our responses have adequately addressed your queries and clarified any concerns. We are committed to contributing to the growth of knowledge in HSC research and look forward to the advancements that our study might enable, both within our team and the wider scientific community.

      Reviewer #1 (Recommendations For The Authors):

      1) In Line 150, the R packages can/should be mentioned just in the method section;

      We have moved this text to the methods section.

      2) In Figure F3C adding a legend next to the plot would assist the reader in identifying which populations are referred to, as the same color palette is used for other panels;

      We have now adjusted the figure legend position to make it clearer for the reader.

      3) In Figure 4D, for the pre-culture experiments 1000 cHSCs were used and then in the post-culture 1200 cHSCs were used. Can the authors justify the different numbers?

      The decision to use 1000 cHSCs in the pre-culture experiments and 1200 cHSCs in the post-culture experiments was not based on a specific rationale favoring one cell number over the other. In our Method section, we have detailed our experimental design, which was structured to provide robust and reliable readouts of HSC behavior and characteristics in different conditions.

      We consider the two cell numbers – 1000 and 1200 – to be quite similar in the context of our experimental aims. Since the readouts here are based on clonal assessments, this slight difference in cell numbers is unlikely to significantly impact the overall conclusions drawn from these experiments. The primary focus of our study was on qualitative aspects of HSC behavior and function, rather than on quantitative differences that might arise from small variations in initial cell numbers.

      4) In SF5F it would help readers if a line plot (per group) was also shown together with the dot plots. Moreover, applying statistics to the trend lines (Wilcoxon, for example) would strengthen the argument that cHSCs divide less than control cells.

      We would like to clarify that the data presented in SF5F were derived from different animals at each respective time point. As such, the data points at each time point represent independent measurements from separate animals, rather than a continuous measurement from the same set of animals over time. Therefore, creating a line plot that connects each time point within a group would inadvertently convey a misleading impression of a longitudinal study on the same animals, which is not reflective of the actual experimental design. Instead, the dot plot format was chosen as it more accurately depicts the independent and discrete nature of the measurements at each time point. Our current data presentation method was selected to provide the most accurate and transparent representation of our findings.

      Reviewer #2 (Recommendations For The Authors):

      Listed below are recommendations to further improve this manuscript:

      Major Comments

      1) Fig 1: the authors showed that EPCRhigh HSCs have better reconstitution capability than EPCRlow HSCs via bone marrow transplantation. Additionally, mice receiving cultured EPCRhigh SLAM LSK cells were more efficiently radioprotected than those receiving PVA expanded EPCRlow SLAM LSK.

      a. In addition to Fig.1F, authors should show the lineage distributions and chimerism of mice receiving cultured EPCRhigh and EPCRlow SLAM LSK respectively.

      We have indeed analyzed the lineage distribution in these experiments, and our findings indicate no statistically significant differences between the groups (see graph in Author response image 6). This suggests that the cultured EPCRhigh and EPCRlow SLAM LSK cells do not preferentially differentiate into specific lineages in a way that would impact the overall interpretation of our results.

      Author response image 6.

      Regarding the chimerism in peripheral blood (PB) lineages, Fig. 1F in our manuscript currently shows the PB myeloid chimerism. We chose to focus on this parameter as it most directly relates to our study's objectives. Here, we did not transplant competitor cells, and in most cases, the chimerism levels reached 100% for lineages other than T cells (T cells being more radioresistant). Based on our analysis, including data on chimerism in other PB lineages would not significantly enhance the understanding of the functional capacity of the transplanted cells, as the myeloid chimerism data already provides a robust indicator of their engraftment and functional potential.

      We believe that our current presentation of data in Fig. 1F, along with the additional analyses provided in the results section, offers a comprehensive understanding of the behavior and potential of the cultured EPCRhigh and EPCRlow SLAM LSK cells.

      b. Fig1F: only 5 mice were used in each group. Could this result occur by chance? Testing with Fisher's exact test with the data provided results in p=0.16. The authors should consider adding more animals or adding the p-value above (or from another relevant test) for readers' consideration.

      We acknowledge the point that only five mice were used in each group and understand the concern regarding the robustness of our findings.

      As correctly noted, applying Fisher's exact test to the data in Fig. 1F results in a p-value which does not reach the conventional threshold for statistical significance. However, one might also consider the analysis of the KM survival curve, which is associated with a p-value of 0.0528 (Fig. 1F, left graph below; Gehan-Breslow-Wilcoxon test). A similar test on the single-cell culture transplantation experiment (Fig. 1E, right graph below) also demonstrated statistical significance (p-value = 0.0485).

      While these p-values meet (or are very close to) the conventional criteria for statistical significance (p<0.05), we have chosen to place greater emphasis on effect sizes rather than strictly on p-values. This decision is based on our belief that effect sizes provide a more direct and meaningful measure of the biological impact observed in our experiments. We find that the effect sizes observed are compelling and consistent with the overall narrative of our study.
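      For readers who want to see how a Fisher's exact test behaves at this group size, the following is a minimal, standard-library sketch of the two-sided test on a 2×2 table. The counts used in the example are hypothetical and chosen only for illustration; they are not taken from Fig. 1F, and the function name is ours.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Probability that the top-left cell equals x given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance guards against floating-point ties.
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))


# Hypothetical outcome: 5 of 5 animals survive in one group, 2 of 5 in the other.
print(round(fisher_exact_two_sided(5, 0, 2, 3), 3))  # 0.167
```

      With five animals per group, even a fairly large difference in outcome yields a p-value above 0.05 in this test, which illustrates why a categorical test at this sample size can be less informative than the effect size itself.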

      Author response image 5.

      2) The characterisation of the multiome experiment is highly underdeveloped.

      a. From an experimental point of view, it is not clear how the PVA culture for this experiment was started. Are there technical/biological replicates? Have several PVA cultures been pooled together?

      We have included these details in the revised text to ensure a comprehensive understanding of our experimental setup.

      b. Fig2B: The authors should present more data as to how each of the clusters was annotated (bubble plot of marker genes used for annotation?) and importantly the percentage of cells in each of the clusters. It is particularly relevant to note what % is the cluster annotated as HSCs and compare that to the % of phenotypic HSCs and the % repopulating HSCs calculated in the transplantation experiments.

      In our study, the annotation of clusters was primarily based on reference genes for cell types from prior works in the field, such as from our recent work (Konturek-Ciesla et al., Cell Reports 2023). Additionally, we employed transcription factor (TF) motifs to assign identities to these clusters. This approach is relatively standard in the field, and we believe it provides a robust framework for our analysis. We included information on some of the key TF motifs used to guide our annotations.

      Regarding the assignment of a percentage to cells within the HSC cluster, we initially had reservations about the utility of this measure. This is because the transcriptional identity of HSCs might not align precisely with their identity based on candidate HSC protein markers. There are complexities related to transcriptional continuums that could influence the interpretation of such data. However, acknowledging your request for this information, we have now included the percentage of cells in the HSC cluster in Fig. 2B for reference.

      We also wish to highlight that when isolating EPCR+ cells, which encompasses a range of CD48 expression, clustering becomes much less distinct, as shown in Fig. 2E. Most of these cells do not demonstrate long-term functional HSC activity in a transplantation setting (as presented in Figure 3). This observation underscores the challenges in deducing HSC identity based solely on molecular data and reinforces the importance of functional validation.

      c. Are there any mature cells in these PVA cultures? The annotations presented in the table under the UMAP are vague: Is cluster 4 monocytes or monocyte progenitors? Same for clusters 0, 1 and 7 - are these progenitors or more mature cells? How were HPCs (cluster 3) distinguished from cHSCs (cluster 5)?

      We agree with your observation that the annotations for certain clusters, such as clusters 4, 0, 1, and 7, as well as the distinction between HPCs (cluster 3) and cHSCs (cluster 5), appear vague. This vagueness to some extent stems from the challenges inherent in comparing cultured cells to their counterparts isolated directly from animals. Most reference data defining cell types are derived from cells in their native state, and less is known about how these definitions translate to the progeny of HSPCs cultured in vitro.

      In our study, we used the expression of reference genes and enriched transcription factor motifs to annotate clusters. This method, while useful, has its limitations in precisely defining the maturation stage of cells in culture. The enrichment of lineage-defining factors at the ends of the UMAP suggests the presence of more mature cells, whereas the lack of lineage marker expression in the majority of cells implies a general lack of terminal differentiation.

      This issue is not necessarily unique to the culture situation, as similar challenges in cell type annotation are encountered in other contexts, such as the analysis of granulocyte-macrophage progenitors in bone marrow, where a vast range of cell types and clusters are identified (e.g., PMID: 26627738). To try to address these challenges, we employed an approach detailed in the methods section under the header "iv. ATAC processing and cluster annotation." We assessed marker genes for clusters using Enrichr for cell types, relying on databases designed to provide gene expression identities to defined cell types. This methodology informed our references to the clusters.

      In summary, while our annotations provide a general overview of the cell types present in the cultures, we acknowledge the complexities and limitations in precisely defining these types, particularly in distinguishing between progenitors and more mature cells. We hope this explanation clarifies our approach and the considerations behind our cluster annotations, but at the same time feel that the alternative approaches have their own drawbacks.

      d. What is the meaning of the trajectories presented in Figure 2C? In the absence of a comparison to i) what is observed either when HSCs are cultured in control/non-expanding conditions ii) an in vivo landscape of differentiation in mouse bone marrow; this analysis does not bring any relevant piece of information.

      We understand the perspective on comparisons to control conditions and in vivo differentiation landscapes. However, we respectfully disagree with the viewpoint that the analysis that we have performed does not bring relevant information.

      The trajectory analysis in Figure 2C is intended to provide insights into the cell types generated in our PVA cultures and the potential differentiation pathways they may follow. This kind of analysis is particularly valuable in the context of understanding how in vitro cultures can support HSC maintenance and differentiation, which is a topic of significant interest in the field. For instance, studies like PMID: 31974159 have highlighted the importance of combining in vitro HSC cultures with molecular investigations.

      While we acknowledge that our analysis would benefit from a direct comparison to control or non-expanding conditions, as well as to an in vivo differentiation landscape, we believe that the information provided by our current analysis still holds substantial value. It offers a glimpse into the possible cellular dynamics and differentiation routes within our culture system, which can be a valuable reference point for other investigators working with similar systems.

      Regarding the confidence in computed differentiation trajectories, we recognize that this is an area where caution is warranted. Computational approaches to define cell differentiation pathways have inherent limitations and should be interpreted within the context of their assumptions and the data available. This challenge is not unique to our work but is a broader issue in the field of computational biology.

      In conclusion, while we agree that additional comparative analyses could further enrich our findings, we maintain that the trajectory analysis presented in Figure 2C contributes meaningful insights into cell differentiation in our PVA culture system. We believe these insights are of interest and value to researchers exploring the complex interplay of HSC maintenance and differentiation in vitro.

      3) The addition of barcoding experiments is appreciated. However, it is already known that upon transplantation clonal output is highly heterogeneous, with a small number of clones predominating over others. This is particularly the case after myeloablation conditioning.

      a. The "pre-culture" experimental design makes sense. The "post-culture" one is however ambiguous in terms of result interpretation. The authors observe fewer clones contributing to a large proportion of the graft (>5%) than in the "pre-culture" setting. Their interpretation is that expanded HSCs are functionally more homogeneous than the input HSCs. However, in the pre-culture experiment, there are 19 days of expansion during which there will be selection pressures over culture plus ongoing differentiation. In the post-culture experiment, there is no time for such pressures to be exerted. Therefore the conclusion drawn by the authors is not the only conclusion. I would encourage the authors to compare the "pre-culture" experiment to an experiment in which cHSCs are in culture for 48h, then barcoded, and then transplanted. This would be much more informative and would allow a proper comparison of expanded HSCs vs input HSCs.

      We understand the perspective that a shorter culture period would reduce the influence of selection pressures and differentiation, potentially allowing for a more direct comparison between expanded HSCs and input HSCs. However, we would like to point out that similar experiments have been conducted in the past, as referenced in our work (PMID: 28224997) and others (PMID: 21964413). These studies have demonstrated a significant heterogeneity in the reconstituting clones when barcoding is done early and cells are transplanted directly.

      In light of previous research, we are confident that our methodology — tracking the fates of candidate HSC clones throughout the culture period and assessing the outcomes of individual cells from these expanding clones — yields significant and pertinent insights. We want to highlight the significance of barcoding cells late in the culture, a strategy that allows us to barcode cells that have already been subjected to potential selection pressures within the culture environment. Our primary objective is to investigate the effects of these selection pressures on the subsequent in vivo behavior of the cells that emerge from this process. By focusing on this aspect, we aim to deepen the understanding of how in vitro culture conditions influence the functional characteristics and heterogeneity of HSCs after expansion. We believe this approach provides a unique perspective on the adaptive changes HSCs undergo during culture and their implications for transplantation efficacy and HSC biology. Our study thus addresses a critical question in the field: how do the conditions and selection pressures inherent to in vitro culture impact the quality and behavior of HSCs upon their return to an in vivo environment?

      b. Another experiment the authors may consider is barcoding in unconditioned recipients as there the bottleneck of selecting specific clones should be lower. In addition, this could nicely complement the return to quiescence observed in Figure 5 (see point below)

      We agree that this experiment could provide valuable insights, particularly in understanding how different selection pressures might affect HSC clones in various transplantation contexts. It would indeed be a worthwhile complement to our observations in Figure 5 regarding the return to quiescence of HSCs post-transplantation.

      However, we would like to point out that our study already includes a substantial amount of data and analyses aimed at addressing specific research questions within this defined scope. The addition of an experiment with barcoding in unconditioned recipients, while undoubtedly relevant and interesting, would extend beyond the boundaries we set for this particular study.

      4) Figure 5D-F, only 2 animals per condition were tested, so the experiment is underpowered for any statistics. How about cell viability of cHSC after in vitro culture? The authors have also not tested whether there is a difference in cell viability post-transplant between EE100 and control. In addition, comparing cell cycle profiles of donor EPCR+ HSCs in these transplanted mice would provide additional evidence to support the conclusion.

      Regarding the sample size, we acknowledge that only two animals per condition were used in these experiments, which limits the statistical power for robust quantitative analysis. This decision was guided by ethical considerations to minimize animal use, in line with the 3Rs principle (Replacement, Reduction, Refinement). Despite the small sample size, we believe that the strong trends observed in these experiments are indicative and consistent with our broader findings, although we recognize the limitations in terms of statistical generalization. At the same time, as we have written in the public response: "Specifically for Figure 5, we used two animals per time point, totaling seven animals per treatment group. It is important to note that we did not monitor the same animals over time but used different animals at each time point, as mice had to be sacrificed for the type of analyses conducted."

      In the context of post-transplant analysis, conducting separate viability assessments on transplanted cells is not typically informative. This is because non-viable cells would naturally be eliminated through biological processes such as phagocytosis soon after transplantation. Therefore, any post-transplant viability analysis would not provide meaningful insights into the engraftment potential or behavior of the transplanted cells.

      However, it is important to note that in all our cell isolation and analysis protocols, we routinely include viability markers. This practice ensures that the cell populations we study and report on are indeed viable. Including these markers is a standard part of our methodology and contributes to the accuracy and reliability of our data.

      Regarding the comparison of cell cycle profiles, we chose to focus on the cell trace assay as a means to monitor and track cell division history, which directly addresses the central theme here - informing on the proliferation and quiescence dynamics of transplanted HSCs. While comparing cell cycle profiles could perhaps offer an additional layer of information, we did not deem it essential for our core objectives.

      5) Several publications have used these PVA cultures and made comments on their strengths and limitations. They do not overlap with this study but should be discussed here for completeness (for example Che et al, Cell Reports, 2022; Becker et al., Cell Stem Cell, 2023; Igarashi, Blood Advances, 2023).

      See comments to reviewer 1.

      Minor Comments

      Figure 1C: should add in the legend that this is in peripheral blood.

      Figure 2C: typo in the title.

      Figure 3A: typo in "equivalent".

      We thank the reviewer for catching these errors, which we have now corrected.

      Figure 3B and 3C: symbol colours of EPCRhighCD48+ and EPCR- are too similar to distinguish the 2 groups easily. We highly recommend using contrasting colours.

      For easier visualization, we have changed the symbol types and colors in our revised version.

      Figure 3B and S3A-B: authors should show statistical significance in comparing the 4 fractions.

      We have now added this information.

      In the discussion, the authors rightly point out a paper that described EPCR+ HSCs. There are other papers that also looked at EPCR intensity (high vs low), for example, Umemoto et al., EMBO J, 2022.

      While we acknowledge the relevance of the paper you mentioned, we faced constraints in the number of references we could include. Therefore, we prioritized citing the original demonstration of EPCR as an HSC marker, particularly focusing on the work by the Mulligan laboratory, which established that cells expressing the highest levels of EPCR exhibit the most potent HSC activity. We believe this reference most directly supports the core focus of our study and provides the necessary context for our findings.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This valuable study reports on the potential of neural networks to emulate simulations of human ventricular cardiomyocyte action potentials for various ion channel parameters with the advantage of saving simulation time in certain conditions. The evidence supporting the claims of the authors is solid, although the inclusion of open analysis of drop-off accuracy and validation of the neural network emulators against experimental data would have strengthened the study. The work will be of interest to scientists working in cardiac simulation and quantitative pharmacology.

      Thank you for the kind assessment. It is important for us to point out that, while limited, experimental validation was performed in this study and is thoroughly described in the work.

      Reviewer 1 - Comments

      This manuscript describes a method to solve the inverse problem of finding the initial cardiac activations to produce a desired ECG. This is an important question. The techniques presented are novel and clearly demonstrate that they work in the given situation. The paper is well-organized and logical.

      Strengths:

      This is a well-designed study, which explores an area that many in the cardiac simulation community will be interested in. The article is well written and I particularly commend the authors on transparency of methods description, code sharing, etc. - it feels rather exemplary in this regard and I only wish more authors of cardiac simulation studies took such an approach. The training speed of the network is encouraging and the technique is accessible to anyone with a reasonably strong GPU, not needing specialized equipment.

      Weaknesses:

      Below are several points that I consider to be weaknesses and/or uncertainties of the work:

      C I-(a) I am not convinced by the authors’ premise that there is a great need for further acceleration of cellular cardiac simulations - it is easy to simulate tens of thousands of cells per day on a workstation computer, using simulation conditions similar to those of the authors. I do not really see an unsolved task in the field that would require further speedup of single-cell simulations. At the same time, simulations offer multiple advantages, such as the possibility to dissect mechanisms of the model behaviour, and the capability to test its behaviour in a wide array of protocols - whereas a NN is trained for a single purpose/protocol, and does not enable a deep investigation of mechanisms. Therefore, I am not sure the cost/benefit ratio is that strong for single-cell emulation currently.

      An area that is definitely in need of acceleration is simulations of whole ventricles or hearts, but it is not clear how much potential for speedup the presented technology would bring there. I can imagine interesting applications of rapid emulation in such a setting, some of which could be hybrid in nature (e.g. using simulation for the region around the wavefront of propagating electrical waves, while emulating the rest of the tissue, which is behaving more regularly/predictably, and is likely to be emulated well), but this is definitely beyond the scope of this article.

      Thank you for this point of view. Simulating a population of a few thousand cells is completely feasible on single desktop machines, and for fixed, known parameters, emulation may not fill one's need. Yet we still foresee a great untapped potential for rapid evaluations of ionic models, such as for the gradient-based inverse problem presented in the paper. Such inverse optimization requires several thousand evaluations per cell, and thus finding maximum conductances for the presented experimental data set (13 cell pairs control/drug → 26 APs) purely through simulations would require roughly a day of simulation time even by a very conservative estimate (3.5 seconds per simulation, 1000 simulations per optimization). Additionally, the emulator provides local sensitivity information between the AP and maximum conductances in the form of the gradient, which enables a whole new array of efficient optimization algorithms [Beck, 2017]. To further emphasize these points, we added the number of emulations and runtime of each conducted experiment in the specific section and a paragraph in the discussion that addresses this point:

      "Cardiomyocyte EP models are already very quick to evaluate in the scale of seconds (see Section 2.3.1), but the achieved runtime of emulations allows to solve time consuming simulation protocols markedly more efficient. One such scenario is the presented inverse maximum conductance estimation problem (see Section 3.1.2 and Section 3.1.3), where for estimating maximum conductances of a single AP, we need to emulate the steady state AP at least several hundred times as part of an optimization procedure. Further applications include the probabilistic use of cardiomyocyte EP models with uncertainty quantification [Chang et al., 2017, Johnstone et al., 2016] where thousands of samples of parameters are potentially necessary to compute a distribution of the steady-state properties of subsequent APs, and the creation of cell populations [Muszkiewicz et al., 2016, Gemmell et al., 2016, Britton et al., 2013]." (Section 4.2)

      We believe that rapid emulations are valuable for several use-cases, where thousands of evaluations are necessary. These include the shown inverse problem, but similarly arise in uncertainty quantification, or cardiomyocyte population creation. Similarly, new use-cases may arise as such efficient tools become available. Additionally, we provided the number of evaluations along with the runtimes for each of the conducted experiments, showing how essential these speedups are to realize these experiments in reasonable timeframes. Utilizing these emulations in organ-level electrophysiological models is a possibility, but the potential problems in such scenarios are much more varied and depend on a number of factors, making it hard to pin-point the achievable speed-up using ionic emulations.
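      The "roughly a day" figure quoted in this response can be reproduced with a short back-of-the-envelope calculation. The sketch below only uses the numbers stated above (26 APs, 1000 simulations per optimization, 3.5 seconds per simulation); the function and variable names are illustrative and not taken from the authors' code.

```python
# Back-of-the-envelope estimate of pure-simulation time for the inverse problem.
# The three constants come from the response text; all names are illustrative.
N_APS = 26                     # 13 cell pairs (control/drug)
SIMS_PER_OPTIMIZATION = 1000   # conservative number of evaluations per AP
SECONDS_PER_SIMULATION = 3.5   # conservative runtime of one simulation


def total_simulation_hours(n_aps, sims_per_optimization, seconds_per_simulation):
    """Total wall-clock hours if every evaluation is a full simulation."""
    total_seconds = n_aps * sims_per_optimization * seconds_per_simulation
    return total_seconds / 3600.0


hours = total_simulation_hours(N_APS, SIMS_PER_OPTIMIZATION, SECONDS_PER_SIMULATION)
print(f"Estimated simulation time: {hours:.1f} h")
```

      With these inputs the estimate lands slightly above 24 hours, which matches the order of magnitude given in the response.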

      C I-(b) The authors run a cell simulation for 1000 beats, training the NN emulator to mimic the last beat. It is reported that the simulation of a single cell takes 293 seconds, while emulation takes only milliseconds, implying a massive speedup. However, I consider the claimed speedup achieved by emulation to be highly context-dependent, and somewhat too flattering to the presented method of emulation. Two specific points below:

      First, it appears that a not overly efficient (fixed-step) numerical solver scheme is used for the simulation. On my (comparable, also a Threadripper) CPU, using the same model ("ToR-ORd-dyncl"), but a variable step solver ode15s in Matlab, a simulation of a cell for 1000 beats takes ca. 50 seconds, rather than the 293 seconds reported by the authors. This can be further sped up by parallelization when more cells than available cores are simulated: on 32 cores, this translates into ca. 2 seconds amortized time per cell simulation (I suspect that the NN-based approach cannot be parallelized in a similar way?). By amortization, I mean that if 32 models can be simulated at once, a simulation of X cells will not take X · 50 seconds, but (X/32) · 50 seconds (with only minor overhead, as this task scales well across cores).

      Second, and this is perhaps more important - the reported speed-up critically depends on the number of beats in the simulation - if I am reading the article correctly, the runtime compares a simulation of 1000 beats versus the emulation of a single beat. If I run a simulation of a single beat across multiple simulated cells (on a 32-core machine), the amortized runtime is around 20 ms per cell, which is only marginally slower than the NN emulation. On the other hand, if the model was simulated for aeons, comparing this to a fixed runtime of the NN, one can get an arbitrarily high speedup.

      Therefore, I’d probably emphasize the concrete speedup less in an abstract and I’d provide some background on the speedup calculation such as above, so that the readers understand the context-dependence. That said, I do think that a simulation for anywhere between 250 and 1000 beats is among the most reasonable points of comparison (long enough for reasonable stability, but not too long to beat an already stable horse; pun with stables was actually completely unintended, but here it is...). I.e., the speedup observed is still valuable and valid, albeit in (I believe) a somewhat limited sense.
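      The amortization argument made in this comment can be written out as a small calculation. This is a minimal sketch under idealized assumptions (no parallelization overhead, identical runtime for every cell); the function name and the example population size of 3200 cells are hypothetical, while the 50 seconds per cell and 32 cores are the reviewer's figures.

```python
import math


def amortized_seconds_per_cell(n_cells, seconds_per_cell, n_cores):
    """Idealized wall-clock time per cell when cells run in parallel batches.

    Cells are processed in batches of `n_cores`; each batch takes
    `seconds_per_cell` of wall-clock time (overhead is ignored).
    """
    n_batches = math.ceil(n_cells / n_cores)
    wall_clock_seconds = n_batches * seconds_per_cell
    return wall_clock_seconds / n_cells


# Reviewer's figures: ca. 50 s per 1000-beat simulation, 32 cores.
# 3200 cells is an arbitrary example population size.
per_cell = amortized_seconds_per_cell(n_cells=3200, seconds_per_cell=50.0, n_cores=32)
print(f"Amortized time per cell: {per_cell:.2f} s")
```

      When the number of cells is a multiple of the core count, this reduces to the (X/32) · 50 expression in the comment divided by X, i.e. 50/32 seconds per cell.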

      We agree that the speedup comparison only focused on a very specific case and needs to be more thoroughly discussed and benchmarked. One of the main strengths of the emulator is to cut the time of prepacing to steady state, which is known to be a potential bottleneck for the speed of the single-cell simulations. The time it takes to reach the steady state in the simulator is heavily dependent on the actual maximum conductance configuration, and the speed-up thus varies strongly from case to case. The differences in architecture of the simulator and emulator further make direct comparisons very difficult. In the revised version we now go into more detail regarding the runtime calculations and also compare it to an adaptive time stepping simulation (Myokit [Clerx et al., 2016]) in a new subsection:

      "The simulation of a single AP (see Section 2.1) sampled at a resolution of 20kHz took 293s on one core of a AMD Ryzen Threadripper 2990WX (clock rate: 3.0GHz) in CARPentry. Adaptive timestep solver of variable order, such as implemented in Myokit [Clerx et al., 2016], can significantly lower the simulation time (30s for our setup) by using small step sizes close to the depolarization (phase 0) and increasing the time step in all other phases. The emulation of a steady state AP sampled at a resolution of 20kHz for t ∈ [−10, 1000]ms took 18.7ms on a AMD Ryzen 7 3800X (clock rate: 3.9GHz) and 1.2ms on a Nvidia A100 (Nvidia Corporation, USA), including synchronization and data copy overhead between CPU and GPU.

      "The amount of required beats to reach the steady state of the cell in the simulator has a major impact on the runtime and is not known a-priori. On the other hand, both simulator and emulator runtime linearly depends on the time resolution, but since the output of the emulator is learned, the time resolution can be chosen arbitrarily without affecting the AP at the sampled times. This makes direct performance comparisons between the two methodologies difficult. To still be able to quantify the speed-up, we ran Myokit using 100 beats to reach steady state, taking 3.2s of simulation time. In this scenario, we witnessed a speed-up of 171 and 2 · 10³ of our emulator on CPU and GPU respectively (again including synchronization and data copy overhead between CPU and GPU in the latter case). Note that both methods are similarly expected to have a linear parallelization speedup across multiple cells.

      For the inverse problem, we parallelized the problem for multiple cells and keep the problem on the GPU to minimize the overhead, achieving emulations (including backpropagation) that run in 120µs per AP at an average temporal resolution of 2kHz. We consider this the peak performance which will be necessary for the inverse problem in Section 3.1.2." (Section 2.3.1)

      Note that the mentioned parallelization across multiple machines/hardware applies equally to the emulator and simulator (linear speed-up), though the utilization for single cells is most likely different (single vs. multi-cell parallelization).
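      The speed-up factors quoted from Section 2.3.1 follow from dividing the reference simulation time by the emulation time. The sketch below recomputes them from the runtimes stated in the quoted text (3.2 s for Myokit with 100 beats, 18.7 ms on CPU, 1.2 ms on GPU); the names are illustrative. Note that the plain ratio for the GPU comes out somewhat above the rounded figure given in the quote, so the quoted value is best read as an order of magnitude.

```python
# Runtimes as stated in the quoted manuscript text; names are illustrative.
MYOKIT_100_BEATS_SECONDS = 3.2    # adaptive-step simulation, 100 beats
EMULATOR_CPU_SECONDS = 18.7e-3    # one emulated AP on CPU
EMULATOR_GPU_SECONDS = 1.2e-3     # one emulated AP on GPU, incl. copy overhead


def speedup(reference_seconds, emulator_seconds):
    """Ratio of reference runtime to emulator runtime."""
    return reference_seconds / emulator_seconds


cpu_speedup = speedup(MYOKIT_100_BEATS_SECONDS, EMULATOR_CPU_SECONDS)
gpu_speedup = speedup(MYOKIT_100_BEATS_SECONDS, EMULATOR_GPU_SECONDS)
print(f"CPU speed-up: {cpu_speedup:.0f}x, GPU speed-up: {gpu_speedup:.0f}x")
```

      Because the reference time scales with the number of prepacing beats, changing the 100-beat assumption changes both factors proportionally, which is the context dependence raised in C I-(b).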

      C I-(c) It appears that the accuracy of emulation drops off relatively sharply with increasing real-world applicability/relevance of the tasks it is applied to. That said, the authors are to be commended on declaring this transparently, rather than withholding such analyses. I particularly enjoyed the discussion of the not-always amazing results of the inverse problem on the experimental data. The point on low parameter identifiability is an important one and serves as a warning against overconfidence in our ability to infer cellular parameters from action potentials alone. On the other hand, I’m not that sure the difference between small tissue preps and single cells which authors propose as another source of the discrepancy will be that vast beyond the AP peak potential (probably much of the tissue prep is affected by the pacing electrode?), but that is a subjective view only. The influence of coupling could be checked if the simulated data were generated from 2D tissue samples/fibres, e.g. using the Myokit software.

      Given the points above (particularly the uncertain need for further speedup compared to running single-cell simulations), I am not sure that the technology generated will be that broadly adopted in the near future.

      However, this does not make the study uninteresting in the slightest - on the contrary, it explores something that many of us are thinking about, and it is likely to stimulate further development in the direction of computationally efficient emulation of relatively complex simulations.

      We agree that the parameter identifiability is an important point of discussion. While the provided experimental data already gave us great insights, we still believe that, given the differences in the setup, we cannot draw conclusions about the source of inaccuracies with absolute certainty. The suggested experiment to test the influence of coupling is of interest for future works and has been integrated into the discussion. Further details are given in the response to the recommendation R III-(t).

      Reviewer 2 - Comments

      Summary:

      This study provided a neural network emulator of the human ventricular cardiomyocyte action potential. The inputs are the corresponding maximum conductances and the output is the action potential (AP). It used the forward and inverse problems to evaluate the model. The forward problem was solved for synthetic data, while the inverse problem was solved for both synthetic and experimental data. The NN emulator tool enables the acceleration of simulations, maintains high accuracy in modeling APs, effectively handles experimental data, and enhances the overall efficiency of pharmacological studies. This, in turn, has the potential to advance drug development and safety assessment in the field of cardiac electrophysiology.

      Strengths:

      1) Low computational cost: The NN emulator demonstrated a massive speed-up of more than 10,000 times compared to the simulator. This substantial increase in computational speed has the potential to expedite research and drug development processes.

      2) High accuracy in the forward problem: The NN emulator exhibited high accuracy in solving the forward problem when tested with synthetic data. It accurately predicted normal APs and, to a large extent, abnormal APs with early afterdepolarizations (EADs). High accuracy is a notable advantage over existing emulation methods, as it ensures reliable modeling and prediction of AP behavior.

      C II-(a) Input space constraints: The emulator relies on maximum conductances as inputs, which explain a significant portion of the AP variability between cardiomyocytes. Expanding the input space to include channel kinetics parameters might be challenging when solving the inverse problem with only AP data available.

      Thank you for this comment. We consider this limitation a major drawback, as discussed in Section 4.3. Identifiability is already an issue when only considering the most important maximum conductances. Further extending the problem to include kinetics will most likely only increase the difficulty of the inverse problem. For the forward problem though, it might be of interest to people studying ionic models to further analyze the effects of channel kinetics.

      C II-(b) Simplified drug-target interaction: In reality, drug interactions can be time-, voltage-, and channel state-dependent, requiring more complex models with multiple parameters compared to the oversimplified model that represents the drug-target interactions by scaling the maximum conductance at control. The complex model could also pose challenges when solving the inverse problem using only AP data.

      Thank you for pointing out this limitation. We slightly adapted Section 4.3 to further highlight some of these limitations. Note however that the experimental drugs used have been shown to be influenced by this drug interaction to varying degrees [Li et al., 2017] (e.g. dofetilide vs. cisapride). However, the discrepancy in identifiability was mostly channel-based (0%-100%), whereas the variation in identifiability between drugs was much lower (39%-66%).

      C II-(c) Limited data variety: The inverse problem was solved using AP data obtained from a single stimulation protocol, potentially limiting the accuracy of parameter estimates. Including AP data from various stimulation protocols and incorporating pacing cycle length as an additional input could improve parameter identifiability and the accuracy of predictions.

      The proposed emulator architecture currently only considers the discussed maximum conductances as input and thus can only compensate when using different stimulation protocols. However, the architecture itself does not prohibit including any of these as parameters for future variants of the emulator. We potentially foresee future works extending on the architecture with modified datasets to include other parameters of importance, such as channel kinetics, stimulation protocols and pacing cycle lengths. These will however vary between the actual use-cases one is interested in.

      C II-(d) Larger inaccuracies in the inverse problem using experimental data: The reasons for this result are not quite clear. Hypotheses suggest that it may be attributed to the low parameter identifiability or to the training data set having been collected in small tissue preparations.

      The low parameter identifiability on some channels (e.g. GK1) poses a problem, for which we state multiple potential reasons. As of yet, no final conclusion can be drawn, warranting further research in this area.

      Reviewer 3 - Comments

      Summary:

      Grandits and colleagues were trying to develop a new tool to accelerate pharmacological studies by using neural networks to emulate the human ventricular cardiomyocyte action potential (AP). The AP is a complex electrical signal that governs the heartbeat, and it is important to accurately model the effects of drugs on the AP to assess their safety and efficacy. Traditional biophysical simulations of the AP are computationally expensive and time-consuming. The authors hypothesized that neural network emulators could be trained to predict the AP with high accuracy and that these emulators could also be used to quickly and accurately predict the effects of drugs on the AP.

      Strengths:

      One of the study’s major strengths is that the authors use a large and high-quality dataset to train their neural network emulator. The dataset includes a wide range of APs, including normal and abnormal APs exhibiting EADs. This ensures that the emulator is robust and can be used to predict the AP for a variety of different conditions.

      Another major strength of the study is that the authors demonstrate that their neural network emulator can be used to accelerate pharmacological studies. For example, they use the emulator to predict the effects of a set of known arrhythmogenic drugs on the AP. The emulator is able to predict the effects of these drugs, even though it had not been trained on these drugs specifically.

      C III-(a) One weakness of the study is that it is important to validate neural network emulators against experimental data to ensure that they are accurate and reliable. The authors do this to some extent, but further validation would be beneficial. In particular for the inverse problem, where the estimation of pharmacological parameters was very challenging and led to particularly large inaccuracies.

      Thank you for this recommendation. Further experimental validation of the emulator in the context of the inverse problem would definitely be beneficial. Still, an important observation is that the identifiability varies greatly between channels. While the inverse problem is an essential reason for utilizing the emulator, it is also empirically validated for the pure forward problem and synthetic inverse problem, together with the (limited) experimental validation. The sources of problems arising in estimating the maximum conductances of the experimental tissue preparations are important to discuss in future works, as we now further emphasize in the discussion. See also the response to the recommendations R III-(t).

      Reviewer 1 - Recommendations

      R I-(a) Could further detail on the software used for the emulation be provided? E.g. based on section 2.2.2, it sounds like a CPU, as well as GPU-based emulation, is possible, which is neat.

      Indeed as suspected, the emulator can run on both CPUs and GPUs and features automatic parallelization (per-cell, but also multi-cell), which is enabled by the engineering feats of PyTorch [Paszke et al., 2019]. This is now outlined in a bit more detail in Sec. 2 and 5.

      "The trained emulator is provided as a Python package, heavily utilizing PyTorch [Paszke et al., 2019] for the neural network execution, allowing it to be executed on both CPUs and NVidia GPUs." (Section 5)

      R I-(b) I believe that a potential use of NN emulation could be also in helping save time on prepacing models to stability - using the NN for "rough" prepacing (e.g. 1000 beats), and then running a simulation from that point for a smaller amount of time (e.g. 50 beats). One could monitor the stability of states, so if the prepacing was inaccurate, one could quickly tell that these models develop their state vector substantially, and they should be simulated for longer for full accuracy - but if the model was stable within the 50 simulated beats, it could be kept as it is. In this way, the speedup of the NN and accuracy and insightfulness of the simulation could be combined. However, as I mentioned in the public review, I’m not sure there is a great need for further speedup of single-cell simulations. Such a hybrid scheme as described above might be perhaps used to accelerate genetic algorithms used to develop new models, where it’s true that hundreds of thousands to millions of cells are eventually simulated, and a speedup there could be practical. However, one would have to have a separate NN trained for each protocol in the fitness function that is to be accelerated, and this would have to be retrained for each explored model architecture. I’m not sure if the extra effort would be worth it - but maybe yes to some people.

      Thank you for this valuable suggestion. As pointed out in C I-(a), one goal of this study was to reduce the time-consuming task of prepacing. Still, in its current form the emulator could not be utilized for prepacing simulators, as only the AP is computed by the emulator. For initializing a simulation at the N-th beat, one would additionally need all computed channel state variables. However, a simple adaptation of the emulator architecture would allow it to also output the mentioned state variables.

      R I-(c) Re: "Several emulator architectures were tried on the training and validation data sets and the final choice was hand-picked as a good trade-off between high accuracy and low computational cost" - is it that the emulator architecture was chosen early in the development, and the analyses presented in the paper were all done with one previously selected architecture? Or is it that the analyses were attempted with all considered architectures, and the well-performing one was chosen? In the latter case, this could flatter the performance artificially and a test set evaluation would be worth carrying out.

      We apologize for the unclear description of the architectural validation. The validation was in fact carried out with 20% of the training data (data set #1), which is completely disjoint from the test sets (#2, #3, #4, formerly data set #1 and #2) on which the evaluation was presented. To further clarify the four different data sets used in the study, we have now dedicated an additional section to describing each set and where it was used (see also our response below R I-(d)), and summarize them in Table 1, which we also added at R II-(a). The cited statement was slightly reworked.

      "Several emulator architectures were tried on the training and validation data sets and the final choice was hand-picked as a good trade-off between high accuracy on the validation set (#1) and low computational runtime cost." (Section 2.2.2)

      R I-(d) When using synthetic data for the forward and inverse problem, with the various simulated drugs, is it that split of the data into training/validation test set was done by the drug simulated (i.e., putting 80 drugs and the underlying models in the training set, and 20 into test set)? Or were the data all mixed together, and 20% (including drugs in the test set) were used for validation? I’m slightly concerned by the potential of "soft" data leaks between training/validation sets if the latter holds. Presumably, the real-world use case, especially for the inverse problem, will be to test drugs that were not seen in any form in the training process. I’m also not sure whether it’s okay to reuse cell models (sets of max conductances) between training and validation tests - wouldn’t it be better if these were also entirely distinct? Could you please comment on this?

      We completely agree with the main points of apprehension that training, validation and test sets all serve a distinct purpose and should not be arbitrarily mixed. However, this is only a result of the sub-optimal description of our datasets, which we heavily revised in Section 2.2.1 (Data, formerly 2.3.1). We now present the data using four distinct numbers: The initial training/validation data, now called data set #1 (formerly no number), is split 80%/20% into training and validation sets (for architectural choices) respectively. The presented evaluations in Section 2.3 (Evaluation) are purely performed on data set #2 (normal APs, formerly #1), #3 (EADs, formerly #2) and #4 (experimental).
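      The data-set layout described in this response can be sketched as follows. This is a minimal illustration and not the authors' code: the sample count, the seed and the helper name split_train_validation are hypothetical. Only the 80%/20% split of data set #1 into training and validation parts comes from the response; data sets #2 to #4 are kept entirely separate and are used for evaluation only.

```python
import random


def split_train_validation(n_samples, train_fraction=0.8, seed=0):
    """Shuffle the sample indices of data set #1 and split them 80%/20%.

    Returns (train_indices, validation_indices); the two lists are disjoint.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_train = int(round(train_fraction * n_samples))
    return indices[:n_train], indices[n_train:]


# 1000 is a placeholder size; the real size of data set #1 is not given here.
train_idx, val_idx = split_train_validation(1000)
print(len(train_idx), len(val_idx))
```

      Architectural choices would then be made on the validation indices only, so that the reported evaluation on data sets #2 to #4 is not influenced by model selection.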

      R I-(e) For the forward problem on EADs, I’m not sure if the 72% accuracy is that great (although I do agree that the traces in Fig 12-left also typically show substantial ICaL reactivation, but this definitely should be present, given the IKr and ICaL changes). I would suggest that you also consider the following design for the EAD investigation: include models with less severe upregulation of ICaL and downregulation of IKr, getting a population of models where a part manifests EADs and a part does not. Then you could run the emulator on the input data of this population and be able to quantify true/false positive/negative detections. I think this is closer to a real-world use case where we have drug parameters and a cell population, and we want to quickly assess the arrhythmic risk, with some drugs being likely entirely non-risky, some entirely risky, and some between (although I still am not convinced it’s that much of an issue to just simulate this in a couple of thousands of cells).

      Thank you for pointing out this alternative to address the EAD identification task. Even though the values chosen in Table 2 seem excessively large, we still only witnessed EADs in 171 of the 950 samples. Border cases, which are close to exhibiting EADs, are the hardest for the NN emulator to estimate. As suggested, we now include the study with the full 950 samples (non-EAD & EAD) and classify the emulated AP into one of the labels for each sample. The previously mentioned 72.5% now represents the sensitivity, whereas our accuracy in such a scenario becomes 90.8% (total ratio of correct classifications):

      "The data set #3 was used second and Appendix C shows all emulated APs, both containing the EAD and non-EAD cases. The emulation of all 950 APs took 0.76s on the GPU specified in Section 2.2.3. We show the emulation of all maximum conductances and the classification of the emulation. The comparison with the actual EAD classification (based on the criterion outlined in Appendix A) results in true-positive (EAD both in the simulation and emulation), false-negative (EAD in the simulation, but not in the emulation), false-positive (EAD in the emulation, but not in the simulation) and true-negative (no EAD both in the emulation and simulation). The emulations achieved 72.5% sensitivity (EAD cases correctly classified) and 94.9% specificity (non-EAD cases correctly classified), with an overall accuracy of 90.8% (total samples correctly classified). A substantial amount of wrongly classified APs showcase a notable proximity to the threshold of manifesting EADs. Figure 7 illustrates the distribution of RMSEs in the EAD APs between emulated and ground truth drugged APs. The average RMSE over all EAD APs was 14.5mV with 37.1mV being the maximum. Largest mismatches were located in phase 3 of the AP, in particular in emulated APs that did not fully repolarize." (Section 3.1.1)
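
      The relation between the three reported percentages can be sketched as follows. This is a minimal illustration, not the authors' evaluation code. The totals (171 EAD and 779 non-EAD samples out of 950) follow from the text, but the individual confusion-matrix counts are hypothetical values back-calculated to be consistent with the reported percentages.

```python
def classification_metrics(tp: int, fn: int, fp: int, tn: int) -> dict:
    """Sensitivity, specificity and accuracy from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),   # EAD cases correctly classified
        "specificity": tn / (tn + fp),   # non-EAD cases correctly classified
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
    }


# ASSUMPTION: these four counts are not given in the text; they are
# back-calculated so that tp + fn = 171 (EAD) and fp + tn = 779 (non-EAD).
metrics = classification_metrics(tp=124, fn=47, fp=40, tn=739)

for name, value in metrics.items():
    print(f"{name}: {100 * value:.1f}%")
```

      Because only 171 of the 950 samples exhibit EADs, the accuracy is dominated by the non-EAD class, which is why it is noticeably higher than the sensitivity.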

      R I-(f) Figure 1 - I think a large number of readers will understand the mathematical notation describing inputs/outputs; that said, there may be a substantial number of readers who may find that hard to read (e.g. lab-based researchers, or simulation-based researchers not familiar with machine learning). At the same time, this is a very important part of the paper to explain what is done where, so I wonder whether using words to describe the inputs/outputs would not be more practical and easier to understand (e.g. ”drug-based conductance scaling factor” instead of ”s” ?). It’s just an idea - it needs to be tried to see if it wouldn’t make the figure too cluttered.

      We agree that the mathematical notation may be confusing to some readers. As a compromise between using verbose wording and mathematical notation, we introduced a legend in the lower right corner of the figure that briefly describes the notation in order to help with interpreting the figure.

      R I-(g) ”APs with a transmembrane potential difference of more than 10% of the amplitude between t = 0 and 1000 ms were excluded” - I’m not sure I understand what exactly you mean here - could you clarify?

      With this criterion, we try to discard data that is far away from fully repolarizing within the given time frame, which applies to 116 APs in data set #1 and 50 APs in data set #3. We added a small side note into the text:

      "APs with a transmembrane potential difference of more than 10% of the amplitude between t = 0 and 1000ms (indicative of an AP that is far away from full repolarization) were excluded." (Section 2.2.1)
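
      The exclusion criterion quoted above can be sketched as a small check. This is a minimal sketch under stated assumptions: the amplitude is taken as the difference between the maximum and minimum of the trace, and the first and last samples are assumed to correspond to t = 0 and t = 1000 ms. The function name and example traces are hypothetical.

```python
def is_far_from_repolarized(vm, threshold=0.10):
    """Return True if the AP should be excluded.

    vm: transmembrane potential samples in mV, where vm[0] is taken at
        t = 0 ms and vm[-1] at t = 1000 ms (assumption).
    threshold: allowed fraction of the AP amplitude (10% in the text).
    """
    amplitude = max(vm) - min(vm)          # assumed amplitude definition
    difference = abs(vm[-1] - vm[0])       # potential difference, start vs end
    return difference > threshold * amplitude


# Hypothetical coarse traces, for illustration only.
repolarized = [-85.0, 30.0, 10.0, -40.0, -84.0]
not_repolarized = [-85.0, 30.0, 10.0, -20.0, -50.0]

print(is_far_from_repolarized(repolarized))      # difference of 1 mV
print(is_far_from_repolarized(not_repolarized))  # difference of 35 mV
```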

      R I-(h) Speculation (for the future) - it looks like a tool like this could be equally well used to predict current traces, as well as action potentials. I wonder, would there be a likely benefit in feeding back the currents-traces predictions on the input of the AP predictor to provide additional information? Then again, this might be already encoded within the network - not sure.

      Although not possible with the chosen architecture (see also R I-(b)), it is worth thinking about an implementation in future works and to study differences to the current emulator.

      Entirely minor points:

      R I-(i) ”principle component analysis” → principal component analysis

      Fixed

      R I-(j) The paper will be probably typeset by elife anyway, but the figures are often quite far from their sections, with Results figures even overflowing into Discussion. This can be often fixed by using the !htb parameters (\begin{figure}[!htb]), or potentially by using ”\usepackage[section]{placeins}” and then ”\FloatBarrier” at the start and end of each section (or subsection) - this prevents floating objects from passing such barriers.

      Thank you for these helpful suggestions. We tried reducing the spacing between the figures and their references in the text, hopefully improving the reader’s experience.

      R I-(k) Alternans seems to be defined in Appendix A (as well as repo-/depolarization abnormalities), but is not really investigated. Or are you defining these just for the purpose of explaining what sorts of data were also included in the data?

      We defined alternans since this was an exclusion criterion for generating simulation data.

      Reviewer 2 - Recommendations

      R II-(a) Justification for methods selection: Explain the rationale behind important choices, such as the selection of specific parameters and algorithms.

      Thank you for this recommendation. We tried to increase the transparency of our choices by introducing a separate data section (Section 2.2.1) that summarizes all data sets and their use cases and collects many of the explanations. Additionally, we added an overview table (Table 1) of the utilized data.

      Author response table 1.

      Table 1: Summary of the data used in this study, along with their usage and the number of valid samples. Note that each AP is counted individually, also in cases of control/drug pairs.

      R II-(b) Interpretation of the evaluation results: After presenting the evaluation results, consider interpretations or insights into what the results mean for the performance of the emulator. Explain whether the emulator achieved the desired accuracy or compare it with other existing methods.

      In the revised version, we tried to further expand the discussion on possible applications of our emulator (Section 4.2). See also our response to C I-(a). To the best of our knowledge, there are currently no out-of-the-box methods available for directly comparing all experiments we considered in our work.

      Reviewer 3 - Recommendations

      R III-(a) In the introduction (Page 3) and then also in the 2.1 paragraph authors speak about the ”limit cycle”: Do you mean steady state conditions? In that case, it is more common to use steady state.

      When speaking about the limit cycle, we refer to what is also sometimes called the steady state, depending on the field of research and/or personal preference. We now mention both terms at the first occurrence, but stick with the limit cycle terminology, which can also be found in other works, see e.g. [Endresen and Skarland, 2000].

      R III-(b) On page 3, while comparing NN with GP emulators, I still don’t understand the key reason why NN can solve the discontinuous functions with more precision than GP.

      The potential problems in modeling sharp discontinuities using GPs are further explained in the referenced work [Ghosh et al., 2018] and further references therein:

      "Statistical emulators such as Gaussian processes are frequently used to reduce the computational cost of uncertainty quantification, but discontinuities render a standard Gaussian process emulation approach unsuitable as these emulators assume a smooth and continuous response to changes in parameter values [...] Applying GPs to model discontinuous functions is largely an open problem. Although many advances (see the discussion about non-stationarity in [Shahriari et al., 2016] and the references in there) have been made towards solving this problem, a common solution has not yet emerged. In the recent GP literature there are two specific streams of work that have been proposed for modelling non-stationary response surfaces including those with discontinuities. The first approach is based on designing nonstationary processes [Snoek et al., 2014] whereas the other approach attempts to divide the input space into separate regions and build separate GP models for each of the segmented regions. [...]"([Ghosh et al., 2018])

      We integrated a short segment of this explanation into Section 1.

      R III-(c) Why do authors prefer to use CARPentry and not directly openCARP?

      The use of CARPentry is purely a practical choice since the simulation pipeline was already set up. However, as we now point out in Sec. 2.1 (Simulator), simulations can also be performed using any openly available ionic simulation tool, such as Myokit [Clerx et al., 2016], OpenCOR [Garny and Hunter, 2015] and openCARP [Plank et al., 2021]. We emphasized this in the text.

      "Note, that the simulations can also be performed using open-source software such as Myokit [Clerx et al., 2016], OpenCOR [Garny and Hunter, 2015] and openCARP [Plank et al., 2021]." (Section 2.1)

      R III-(d) In paragraph 2.1:

      (a) In this sentence: ”Various solver and sampling time steps were applied to generate APs and the biomarkers used in this study (see Appendix A)” this reviewer suggests putting the Appendix reference near “biomarkers”. In addition, a figure that shows the test of various solver vs. sampling time steps could be interesting and can be added to the Appendix as well.

      (b) Why did the authors set the relative difference below 5% for all biomarkers? Please give a reference to that choice. Instead, why choose 2% for the time step?

      1) We adjusted the reference to be closer to “biomarkers”. While we agree that further details on the influence of the sampling step would be of interest to some of the readers, we feel that it is far beyond the scope of this paper.

      2) There is no specific reference we can provide for the choice. Our goal was to reach 5% relative difference, which we surpassed by the chosen time steps of 0.01 ms (solver) and 0.05 ms (sampling), leading to only 2% difference. We rephrased the sentence in question to make this clear.

      "We considered the time steps with only 2% relative difference for all AP biomarkers (solver: 0.01ms; sampling: 0.05ms) to offer a sufficiently good approximation." (Section 2.1)

      R III-(e) In the caption of Figure 1 authors should include the reference for AP experimental data (are they from Orvos et al. 2019 as reported in the Experimental Data section?)

      We added the missing reference as requested. As correctly assumed, they are from [Orvos et al., 2019].

      R III-(f) Why do authors not use experimental data in the emulator development/training?

      For the supervised training of our NN emulator, we need to provide the maximum conductances of our chosen channels for each AP. While it would be beneficial to also include experimental data in the training to diversify the training data, the exact maximum conductances in the considered retrospective experiments are not known. If such data were available with low measurement uncertainty, it would be possible to include them.

      R III-(g) What is TP used in the Appendix B? I could not find the acronymous explanation.

      We are sorry for the oversight, TP refers to the time-to-peak and is now described in Appendix A.

      R III-(h) Are there any reasons for only using ST and no S1? Maybe are the same?

      The global sensitivity analysis is further outlined in Appendix B, which also shows S1 (first-order effects) and ST (total-order effects, including all interactions) together (Figure 11) [Herman and Usher, 2017], as well as their differences (e.g. in TP). Since S1 only captures first-order effects, it may fail to capture higher-order interactions between the maximum conductances; we therefore favored ST.
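
      The difference between S1 and ST can be illustrated on a toy model. The following is a minimal sketch, not the authors' analysis and not the SALib API: it uses a plain Monte Carlo scheme with the commonly used Saltelli (first-order) and Jansen (total-order) estimators. The toy function, the uniform input range and all names are assumptions made for illustration. For a purely multiplicative model, the first-order indices are close to zero while the total-order indices are close to one.

```python
import random


def sobol_indices(f, dim, n=20000, seed=0):
    """Monte Carlo estimates of first-order (S1) and total-order (ST) indices.

    Inputs are drawn independently from U(-1, 1) (assumption for this sketch).
    """
    rng = random.Random(seed)

    def draw():
        return [rng.uniform(-1.0, 1.0) for _ in range(dim)]

    A = [draw() for _ in range(n)]
    B = [draw() for _ in range(n)]
    fA = [f(x) for x in A]
    fB = [f(x) for x in B]
    mean = sum(fA) / n
    var = sum((y - mean) ** 2 for y in fA) / n

    s1, st = [], []
    for i in range(dim):
        # AB_i: samples of A with the i-th input replaced by that of B.
        fAB = [f(a[:i] + [b[i]] + a[i + 1:]) for a, b in zip(A, B)]
        # Saltelli-type estimator for the first-order effect.
        v_i = sum(yb * (yab - ya) for ya, yb, yab in zip(fA, fB, fAB)) / n
        # Jansen estimator for the total-order effect.
        vt_i = sum((ya - yab) ** 2 for ya, yab in zip(fA, fAB)) / (2 * n)
        s1.append(v_i / var)
        st.append(vt_i / var)
    return s1, st


def toy_model(x):
    """Hypothetical output that depends on the inputs only via an interaction."""
    return x[0] * x[1]


s1, st = sobol_indices(toy_model, dim=2)
print("S1:", [round(v, 2) for v in s1])
print("ST:", [round(v, 2) for v in st])
```

      In this toy case a ranking based on S1 alone would suggest that neither input matters, whereas ST attributes the full output variance to both.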

      R III-(i) In Training Section Page 8. It is not clear why it is necessary to resample data. Can you motivate?

      The resampling part is motivated by exactly capturing the swift depolarization dynamics, whereas the output from CARPentry is uniformly sampled. This is now further highlighted in the text.

      "Then, the data were non-uniformly resampled from the original uniformly simulated APs, to emphasize the depolarization slope with a high accuracy while lowering the number of repolarization samples. For this purpose, we resampled the APs [...]" (Section 2.2.1)
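
      The idea of non-uniform resampling can be sketched as follows. The actual resampling grid is not given in this excerpt, so the grid below (dense during the first 10 ms, coarse afterwards) is purely hypothetical, as are the function names and the placeholder trace. Only the uniform sampling step of 0.05 ms is taken from the text.

```python
from bisect import bisect_right


def interpolate(t_query, times, values):
    """Linear interpolation of (times, values) at t_query; times must increase."""
    if t_query <= times[0]:
        return values[0]
    if t_query >= times[-1]:
        return values[-1]
    k = bisect_right(times, t_query)  # times[k - 1] <= t_query < times[k]
    weight = (t_query - times[k - 1]) / (times[k] - times[k - 1])
    return values[k - 1] + weight * (values[k] - values[k - 1])


def nonuniform_grid(dense_until=10.0, dt_dense=0.1, t_end=1000.0, dt_coarse=5.0):
    """HYPOTHETICAL grid: fine steps around depolarization, coarse steps after."""
    n_dense = round(dense_until / dt_dense)
    dense = [i * dt_dense for i in range(n_dense)]
    n_coarse = round((t_end - dense_until) / dt_coarse)
    coarse = [dense_until + i * dt_coarse for i in range(n_coarse + 1)]
    return dense + coarse


def resample(times, values, grid):
    """Evaluate a uniformly sampled trace on a non-uniform time grid."""
    return [interpolate(t, times, values) for t in grid]


# Uniformly sampled trace at 0.05 ms over 1000 ms (20001 samples).
uniform_t = [0.05 * i for i in range(20001)]
uniform_v = [2.0 * t for t in uniform_t]  # placeholder signal, not an AP

grid = nonuniform_grid()
resampled = resample(uniform_t, uniform_v, grid)
print(len(uniform_t), "->", len(resampled), "samples")
```

      With such a grid, most of the retained samples fall into the short interval containing the upstroke, while the slower repolarization is represented by far fewer points.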

      R III-(j) For the training of the neuronal network, the authors used the ADAM algorithm: have you tested any other algorithm?

      For training neural networks, ADAM has become the current de-facto standard and is certainly a robust choice for training our emulator. While there may exist slightly faster, or better-suited training algorithms, we witnessed (qualitative) convergence in the training (Equation (2)). We thus strongly believe that the training algorithm is not a limiting factor in our study.

      R III-(k) What is the amount of the drugs tested? Is the same dose reported in the description of the second data set or the values are only referring to experimental data? Moreover, it is not clear if in the description of experimental data, the authors are referring to newly acquired data (since they described in detail the protocol) or if they are obtained from Orvos et al. 2019 work.

      In all scenarios, we tested 5 different drugs (cisapride, dofetilide, sotalol, terfenadine, verapamil). We revised our previous presentation of the available data, and now try to give a concise overview of the utilized data (Section 2.2.1 and Table 1) and of the drug comparison with the CiPA distributions (Table 5, former 4). Note that in the latter case, the available expected channel scaling factors by the CiPA distributions vary, but are now clearly shown in Table 5.

      R III-(l) In Figure 4, I will avoid the use of “control” in the legend since it is commonly associated with basal conditions and not with the drug administration.

      The terminology “control” in this context is in line with works from the CiPA initiative, e.g. [Li et al., 2017], and refers to the state of cell conditions before the drug wash-in. We added a minor note the first time we use the term control in the introduction to emphasize that we refer to the state of the cell before administering any drugs.

      "To compute the drugged AP for given pharmacological parameters is a forward problem, while the corresponding inverse problem is to find pharmacological parameters for given control (before drug administration) and drugged AP." (Section 1)

      R III-(m) In Table 1 when you referred to Britton et al. 2017 work, I suggest adding also 10.1371/journal.pcbi.1002061.

      We added the suggested article as a reference.

      R III-(n) For the minimization problem, only data set #1 has been used. Have you tested data set #2?

      In the current scenario, we only tested the inverse problem for data set #2 (former #1). The main purpose of data set #3 (former #2) was to test the possibility of emulating EAD APs. Given the overall lower performance in comparison to data set #2 (former #1), we also expect deteriorated results in comparison to the existing inverse synthetic problem.

      R III-(o) In Figure 6 you should have the same x-axis (we could not see any points in the large time scale for many biomarkers). Why dVmMax is not uniformed distributed compared to the others? Can you comment on that?

      As suggested, we re-adjusted the x-range to show the center of distributions. Additionally, we denoted in each subplot the number of outliers which lie outside of the shown range. The error distribution on dVmMax exhibits a slightly off-center, left-tailed normal distribution, which we now describe a bit more in the revised text:

      "While the mismatches in phase 3 were simply a result of imperfect emulation, the mismatches in phase 0 were a result of the difficulty in matching the depolarization time exactly. [...] Likewise, the difficulty in exactly matching the depolarization time leads to elevated errors and more outliers in the biomarkers influenced by the depolarization phase (TP and dVmMax)," (Section 3.1.1)

      R III-(p) Page 14. Can the authors better clarify ”the average RMSE over all APs 13.6mV”: is it the mean for all histograms in Figure 7? (In Figure 5 is more evident the average RMSE).

      The average RMSE uses the same definition for Figures 5 and 7: It is the average over all the RMSEs for each pair of traces (simulated/emulated), though the number of samples is much lower for the EAD data set and the RMSEs are not normally distributed.
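
      The definition above can be written out as a short sketch: one RMSE per pair of simulated and emulated traces, followed by the mean (and maximum) over all pairs. The function names and the toy traces are hypothetical and only serve to illustrate the computation.

```python
import math


def rmse(simulated, emulated):
    """Root-mean-square error between two equally sampled traces (mV)."""
    squared = [(s - e) ** 2 for s, e in zip(simulated, emulated)]
    return math.sqrt(sum(squared) / len(squared))


def rmse_summary(pairs):
    """Average and maximum RMSE over all (simulated, emulated) pairs."""
    values = [rmse(sim, emu) for sim, emu in pairs]
    return sum(values) / len(values), max(values)


# Hypothetical two-sample traces, for illustration only.
pairs = [
    ([0.0, 0.0], [3.0, 3.0]),   # RMSE of 3 mV
    ([1.0, 1.0], [1.0, 1.0]),   # perfect emulation, RMSE of 0 mV
]
average, maximum = rmse_summary(pairs)
print(f"average RMSE: {average} mV, maximum RMSE: {maximum} mV")
```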

      R III-(q) In Table 4, the information on which drugs are considered should be added.

      For each channel, we added the names of the drugs for which respective data from the CiPA initiative were available.

      R III-(r) Pag. 18, second paragraph, there is a repetition of ”and”.

      Fixed

      R III-(s) The pair’s combination of scaling factors for simulating synthetic drugs reported in Table 2, can be associated with some effects of real drugs? In this case, I suggest including the information or justifying the choice.

      The scaling factors in Table 2 are used to create data set #3 (former #2) and are meant to provide several APs that exhibit EADs. This is described in more detail in the new data section, Section 2.2.1:

      "Data set #3: The motivation for creating data set #3 was to test the emulator on data of abnormal APs showing the repolarization abnormality EAD. This is considered a particularly relevant AP abnormality in pharmacological studies because of its role in the genesis of drug-induced ventricular arrhythmias [Weiss et al., 2010]. Drug data were created using ten synthetic drugs with the hERG channel and the Cav1.2 channel as targets. To this end, ten samples with pharmacological parameters for GKr and PCa (Table 2) were generated and the synthetic drugs were applied to the entire synthetic cardiomyocyte population by scaling GKr and PCa with the corresponding pharmacological parameter. Of the 1000 APs simulated, we discarded APs with a transmembrane potential difference of more than 10% of the amplitude between t = 0 and 1000ms (checked for the last AP), indicative of an AP that does not repolarize within 1000ms. This left us with 950 APs, 171 of which exhibit EAD (see Appendix C)." (Section 2.2.1)

      R III-(t) A general comment on the work is that the authors claim that their study highlights the potential of NN emulators as a powerful tool for increased efficiency in future quantitative systems pharmacology studies, but they wrote ”Larger inaccuracies were found in the inverse problem solutions on experimental data highlight inaccuracies in estimating the pharmacological parameters”: so, I was wondering how they can claim the robustness of NN use as a tool for more efficient computation in pharmacological studies.

      The discussed robustness directly refers to efficiently emulating steady-state/limit cycle APs from a set of maximum conductances (forward problem, Section 3.1.1). We extensively evaluated the algorithm and feel that, given the low emulation RMSE of APs (< 1 mV), the statement is warranted. The inverse estimation, enabled through this rapid evaluation, performs well on synthetic data, but shows difficulties for experimental data. Note, however, that at this point there are multiple potential sources for these problems, as highlighted in the Evaluation section (Section 4.1), and Table 5 (former 4) highlights the difference in accuracy of estimating per-channel maximum conductances, revealing a potentially large discrepancy. The emulator also offers future possibilities to incorporate additional information in the form of either priors or more detailed measurements (e.g. calcium transients), and can potentially be improved to a point where the inverse problem can also be satisfactorily solved in experimental preparations, though further analysis will be required.

      References

      [Beck, 2017] Beck, A. (2017). First-order methods in optimization. SIAM.

      [Britton et al., 2013] Britton, O. J., Bueno-Orovio, A., Ammel, K. V., Lu, H. R., Towart, R., Gallacher, D. J., and Rodriguez, B. (2013). Experimentally calibrated population of models predicts and explains intersubject variability in cardiac cellular electrophysiology. Proceedings of the National Academy of Sciences, 110(23).

      [Chang et al., 2017] Chang, K. C., Dutta, S., Mirams, G. R., Beattie, K. A., Sheng, J., Tran, P. N., Wu, M., Wu, W. W., Colatsky, T., Strauss, D. G., and Li, Z. (2017). Uncertainty quantification reveals the importance of data variability and experimental design considerations for in silico proarrhythmia risk assessment. Frontiers in Physiology, 8.

      [Clerx et al., 2016] Clerx, M., Collins, P., de Lange, E., and Volders, P. G. A. (2016). Myokit: A simple interface to cardiac cellular electrophysiology. Progress in Biophysics and Molecular Biology, 120(1):100–114.

      [Endresen and Skarland, 2000] Endresen, L. and Skarland, N. (2000). Limit cycle oscillations in pacemaker cells. IEEE Transactions on Biomedical Engineering, 47(8):1134–1137.

      [Garny and Hunter, 2015] Garny, A. and Hunter, P. J. (2015). OpenCOR: a modular and interoperable approach to computational biology. Frontiers in Physiology, 6.

      [Gemmell et al., 2016] Gemmell, P., Burrage, K., Rodríguez, B., and Quinn, T. A. (2016). Rabbit-specific computational modelling of ventricular cell electrophysiology: Using populations of models to explore variability in the response to ischemia. Progress in Biophysics and Molecular Biology, 121(2):169–184.

      [Ghosh et al., 2018] Ghosh, S., Gavaghan, D. J., and Mirams, G. R. (2018). Gaussian process emulation for discontinuous response surfaces with applications for cardiac electrophysiology models.

      [Herman and Usher, 2017] Herman, J. and Usher, W. (2017). SALib: An open-source python library for sensitivity analysis. J. Open Source Softw., 2(9):97.

      [Johnstone et al., 2016] Johnstone, R. H., Chang, E. T., Bardenet, R., de Boer, T. P., Gavaghan, D. J., Pathmanathan, P., Clayton, R. H., and Mirams, G. R. (2016). Uncertainty and variability in models of the cardiac action potential: Can we build trustworthy models? Journal of Molecular and Cellular Cardiology, 96:49–62.

      [Li et al., 2017] Li, Z., Dutta, S., Sheng, J., Tran, P. N., Wu, W., Chang, K., Mdluli, T., Strauss, D. G., and Colatsky, T. (2017). Improving the in silico assessment of proarrhythmia risk by combining hERG (human ether-à-go-go-related gene) channel–drug binding kinetics and multichannel pharmacology. Circulation: Arrhythmia and Electrophysiology, 10(2).

      [Muszkiewicz et al., 2016] Muszkiewicz, A., Britton, O. J., Gemmell, P., Passini, E., Sánchez, C., Zhou, X., Carusi, A., Quinn, T. A., Burrage, K., Bueno-Orovio, A., and Rodriguez, B. (2016). Variability in cardiac electrophysiology: Using experimentally-calibrated populations of models to move beyond the single virtual physiological human paradigm. Progress in Biophysics and Molecular Biology, 120(1):115–127.

      [Orvos et al., 2019] Orvos, P., Kohajda, Z., Szlovák, J., Gazdag, P., Árpádffy-Lovas, T., Tóth, D., Geramipour, A., Tálosi, L., Jost, N., Varró, A., and Virág, L. (2019). Evaluation of possible proarrhythmic potency: Comparison of the effect of dofetilide, cisapride, sotalol, terfenadine, and verapamil on hERG and native iKr currents and on cardiac action potential. Toxicological Sciences, 168(2):365–380.

      [Paszke et al., 2019] Paszke, A., Gross, S., Massa, F., Lerer, A., Bradbury, J., Chanan, G., Killeen, T., Lin, Z., Gimelshein, N., Antiga, L., Desmaison, A., Kopf, A., Yang, E., DeVito, Z., Raison, M., Tejani, A., Chilamkurthy, S., Steiner, B., Fang, L., Bai, J., and Chintala, S. (2019). PyTorch: An Imperative Style, High-Performance Deep Learning Library. In Advances in Neural Information Processing Systems, volume 32. Curran Associates, Inc.

      [Plank et al., 2021] Plank, G., Loewe, A., Neic, A., Augustin, C., Huang, Y.-L., Gsell, M. A., Karabelas, E., Nothstein, M., Prassl, A. J., Sánchez, J., Seemann, G., and Vigmond, E. J. (2021). The openCARP simulation environment for cardiac electrophysiology. Computer Methods and Programs in Biomedicine, 208:106223.

      [Shahriari et al., 2016] Shahriari, B., Swersky, K., Wang, Z., Adams, R. P., and de Freitas, N. (2016). Taking the Human Out of the Loop: A Review of Bayesian Optimization. Proceedings of the IEEE, 104(1):148–175.

      [Snoek et al., 2014] Snoek, J., Swersky, K., Zemel, R., and Adams, R. (2014). Input Warping for Bayesian Optimization of Non-Stationary Functions. In Proceedings of the 31st International Conference on Machine Learning, pages 1674–1682. PMLR.

      [Weiss et al., 2010] Weiss, J. N., Garfinkel, A., Karagueuzian, H. S., Chen, P.-S., and Qu, Z. (2010). Early afterdepolarizations and cardiac arrhythmias. Heart Rhythm, 7(12):1891–1899.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      The study by Ghafari et al. addresses a question that is highly relevant for the field of attention as it connects structural differences in subcortical regions with oscillatory modulations during attention allocation. Using a combination of magnetoencephalography (MEG) and magnetic resonance imaging (MRI) data in human subjects, inter-individual differences in the lateralization of alpha oscillations are explained by asymmetry of subcortical brain regions. The results are important, and the strength of the evidence is convincing. Yet, clarifying the rationale, reporting the data in full, a more comprehensive analysis, and a more detailed discussion of the implications will strengthen the manuscript further.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The authors re-analysed the data of a previous study in order to investigate the relation between asymmetries of subcortical brain structures and the hemispheric lateralization of alpha oscillations during visual spatial attention. The visual spatial attention task crossed the factors of target load and distractor salience, which made it possible to also test the specificity of the relation of subcortical asymmetries to lateralized alpha oscillations for specific attentional load conditions. Asymmetry of globus pallidus, caudate nucleus, and thalamus explained inter-individual differences in attentional alpha modulation in the left versus right hemisphere. Multivariate regression analysis revealed that the explanatory potential of these regions' asymmetries varies as a function of target load and distractor salience.

      Strengths:

      The analysis pipeline is straightforward and follows in large parts what the authors have previously used in Mazzetti et al (2019). The authors use an interesting study design, which allows for testing of effects specific to different dimensions of attentional load (target load/distractor salience). The results are largely convincing and in part replicate what has previously been shown. The article is well-written and easy to follow.

      We thank the reviewer for their interest in our study.

      Weaknesses:

      While the article is interesting to read for researchers studying alpha oscillations in spatial attention, I am somewhat sceptical about whether this article is of high interest to a broader readership. Although I read the article with interest, the conceptual advance made here can be considered mostly incremental. As the authors describe, the present study's main advance is that it does not include reward associations (as in previous work) and includes different levels of attentional load. While these design features and the obtained results indeed improve our general understanding of how asymmetries of subcortical structures relate to lateralized alpha oscillations, the conceptual advance is somewhat limited.

      We thank the reviewer for their constructive comment. We would like to highlight that this is the first study to show a relationship between the asymmetry of subcortical structures and attention-modulated alpha oscillations in a task that did not involve any reward associations, reward processing being the most studied role of the basal ganglia. We also believe there is value in having a second study linking the asymmetry in volume of subcortical structures to the modulation of alpha oscillations, as this surprising finding also has important clinical implications (see below). We edited the manuscript as below to explain the advances made in this study:

      Introduction (Line 112): “Our current findings broaden our understanding of how subcortical structures are involved in modulating alpha oscillations during top-down spatial attention, in the absence of any reward or value associations.”

      Discussion (Line 301): “It has also been shown that the spatial extent of pathological change in subcortical structures can predict cognitive changes in Parkinson’s Disease (43). […] Changes in neocortical oscillatory activity have also been observed in neurological disorders which mainly are known to affect subcortical structures. For example, individuals with Alzheimer's Disease demonstrate an increase in slow oscillatory activities and a decrease in higher frequency oscillations (45). Moreover, in patients with Parkinson’s Disease, the power of beta oscillations increases when they are dopamine-depleted relative to when they are on dopaminergic medication (46).”

      While the analysis of the relation of individual subcortical structures to alpha lateralization in different attentional load conditions is interesting, I am not convinced that the present analysis is suited to draw strong conclusions about the subcortical regions' specificity. For example, the Thalamus (Fig. 5) shows a significant negative beta estimate only in one condition (low-load target, non-salient distractor) but not in the other conditions. However, the actual specificity of the relation of thalamus asymmetry to lateralized alpha oscillations would require that the beta estimate for this one condition is significantly higher than the beta estimates for the other three conditions, which has not been tested as far as I understand.

      We thank the reviewer for this constructive comment. We agree with the reviewer that we should compare the beta values across the conditions. We therefore decided to better harness the multivariate nature of our analysis. Multivariate regression analysis allows one to test the null hypothesis that a given predictor does not contribute to all the dependent variables. A rejection of this hypothesis would suggest that lateralization of a given region of interest significantly predicts variability across all 4 of the task conditions, whereas failure to reject the null would imply that the predictive relationship holds only for that single condition. We tested this global null hypothesis using a MANOVA test and found the following, which we have added to the manuscript:

      Results (Line 250): “To ascertain whether each predictor contributes to all conditions, we conducted statistical tests on the results of our MMR using the null hypothesis that a given regressor does not impact all dependent variables. We found that while, with marginal significance, caudate nucleus can predict variability across all four of the task conditions (F(26,4) = 2.82, p-value = 0.046), the predictive relationships of thalamus (F(26,4) = 2.43, p-value = 0.073) with condition 1, and globus pallidus (F(26,4) = 2.29, p-value = 0.087) with conditions 2 and 3 hold only for these conditions. In sum, this demonstrates that when the task is easiest (condition 1), the thalamus is related to alpha modulation. When the task is most difficult (condition 4), the caudate nucleus relates to the alpha modulation, however, its contributions are substantial enough to predict outcomes across all conditions. For the conditions with medium difficulty (conditions 2 and 3) the globus pallidus is related to the alpha band modulation.”

      Method (Line 599): “To examine the specificity of each regressor for lateralized alpha in each condition, we statistically assessed the results of the MMR against the null hypothesis that a particular predictor does not contribute to all dependent variables, employing a MANOVA test in RStudio (version 2022.02.2) (80).”

      Discussion (Line 337): “Thalamus, Globus Pallidus, and Caudate nucleus play varying roles across different load conditions.”

      Discussion (Line 361): “Although these findings highlight the varying contributions of different regions, they do not imply a lack of evidence for correlations between these subcortical structures and other load conditions.”

      Discussion (Line 379): “Additionally, we refrained from directly comparing the contributions of subcortical structures to different conditions due to low statistical power. […] In future studies it would be interesting to design an experiment directly addressing which subcortical regions contribute to distractor and target load in terms of modulating the alpha band activity. In order to ensure sufficient statistical power for doing so possibly each factor needs to be addressed in different experiments.”
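
      The global test described in this response can be illustrated with a short sketch. It is not the authors' RStudio code: it uses simulated data, hypothetical names (`wilks_lambda`, `exact_f`, `lv`, `hlm`), and assumes Wilks' lambda as the MANOVA statistic, which the response does not specify. For one predictor, the statistic compares the residual cross-product matrix of the full multivariate regression with that of the model lacking the predictor.

```python
import numpy as np


def wilks_lambda(X, Y, col):
    """Wilks' lambda for H0: column `col` of X has a zero coefficient for every column of Y."""
    def residual_sscp(design):
        # Least-squares fit of all dependent variables at once.
        beta, *_ = np.linalg.lstsq(design, Y, rcond=None)
        resid = Y - design @ beta
        return resid.T @ resid  # residual sums-of-squares-and-cross-products matrix

    e_full = residual_sscp(X)
    e_reduced = residual_sscp(np.delete(X, col, axis=1))
    return np.linalg.det(e_full) / np.linalg.det(e_reduced)


def exact_f(lam, n_obs, n_params, n_dvs):
    """Exact F transform of Wilks' lambda when a single predictor is tested."""
    df1 = n_dvs
    df2 = n_obs - n_params - n_dvs + 1
    return (1.0 - lam) / lam * df2 / df1, df1, df2


# Simulated stand-in data: 33 participants, 3 volume predictors, 4 task conditions.
rng = np.random.default_rng(0)
n_obs, n_dvs = 33, 4
lv = rng.normal(size=(n_obs, 3))                           # hypothetical lateralized volumes
X = np.column_stack([np.ones(n_obs), lv])                  # intercept + 3 predictors
hlm = 0.8 * lv[:, [0]] + rng.normal(size=(n_obs, n_dvs))   # hypothetical HLM per condition

lam = wilks_lambda(X, hlm, col=1)
f_stat, df1, df2 = exact_f(lam, n_obs, X.shape[1], n_dvs)
print(f"Wilks' lambda = {lam:.3f}, F({df1}, {df2}) = {f_stat:.2f}")
```

      A lambda close to 1 means that dropping the predictor barely changes the residuals in any condition; smaller values indicate that the predictor explains variance in at least one dependent variable.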

      Reviewer #3 (Public Review):

      Summary:

      In this study, Ghafari et al. explored the correlation between hemispheric asymmetry in the volume of various subcortical regions and lateralization of posterior alpha-band oscillations in a spatial attention task with varying cognitive demands. To this end, they combined structural MRI and task MEG to investigate the relationship between hemispheric differences in the volume of basal ganglia, thalamus, hippocampus, and amygdala and hemisphere-specific modulation of alpha-band power. The authors report that differences in the thalamus, caudate nucleus, and globus pallidus volume are linked to the attention-related changes in alpha band oscillations with differential correlations for different regions in different conditions of the design (depending on the salience of the distractor and/or the target).

      Strengths:

      The manuscript contributes to filling an important gap in current research on attention allocation which commonly focuses exclusively on cortical structures. Because it is not possible to reliably measure subcortical activity with non-invasive electrophysiological methods, they correlate volumetric measurements of the relevant subcortical regions with cortical measurements of alpha band power. Specifically, they build on their own previous finding showing a correlation between hemispheric asymmetry of basal ganglia volumes and alpha lateralization by assessing a task without an explicit reward component. Furthermore, the authors use differences in saliency and perceptual load to disentangle the individual contributions of the subcortical regions.

      We appreciate the reviewer’s interest in our study.

      Weaknesses:

      The theoretical bases of several aspects of the design and analyses remain unclear. Specifically, we missed statements in the introduction about why it is reasonable, from a theoretical perspective, to expect:

      (i) a link between volumetric measurements and task activity;

      We thank the reviewer for this constructive feedback. We have now addressed this concern in the revised manuscript.

      Discussion (Line 293): “It has been demonstrated that extensive navigation experience enlarges the size of the right hippocampus (40). Furthermore, in terms of neurological disorders, it is well established that shrinkage (atrophy) in specific regions is a predictor of a number of neurological and psychiatric conditions including Parkinson’s disease, dementia, and Huntington’s disease. […] It has also been shown that the spatial extent of pathological change in subcortical structures can predict cognitive changes in Parkinson’s Disease (43). […] Changes in neocortical oscillatory activity have also been observed in neurological disorders which mainly are known to affect subcortical structures. For example, individuals with Alzheimer's Disease demonstrate an increase in slow oscillatory activities and a decrease in higher frequency oscillations (45). Moreover, in patients with Parkinson’s Disease, the power of beta oscillations increases when they are dopamine-depleted relative to when they are on dopaminergic medication (46).”

      (ii) a specific link with hemispheric asymmetry in subcortical structures (While focusing on hemispheric lateralization might circumvent the problem of differences in head size, it would be better to justify this focus theoretically, which requires for example a short review of evidence showing ipsilateral vs contralateral connections between the relevant subcortical and cortical structures);

      We thank the reviewer for this helpful comment, which helped us clarify the manuscript. We addressed this issue in the revised manuscript and have also complemented it with papers directly investigating asymmetry of subcortical regions in relation to neurological disorders:

      Introduction (Line 102): “We utilized the hemispheric laterality of subcortical structures and alpha modulation to overcome issues related to individual variations in oscillatory power and head size.”

      Discussion (Line 314): “Employing hemispheric lateralization was motivated by the organizational characteristic of structural asymmetry in healthy brain (47). Additionally, considering the effects of aging (48) and neurodegenerative disorders, such as Alzheimer's Disease (49), on brain symmetry influenced this approach. Furthermore, computing lateralization indices for individuals addresses the challenge of accommodating variations in both head size and the power of oscillatory activity.”

      Discussion (Line 374): “Furthermore, in this study, our emphasis has been on assessing the size of subcortical structures. Future investigations could explore subcortical white matter connectivities and hemispheric asymmetries. This approach has previously been conducted on superior longitudinal fasciculus (SLF) (61,62) and holds potential for examining cortico-subcortical connectivity in the context of oscillatory asymmetries.”

      (iii) effects not only in basal ganglia and thalamus, but also hippocampus and amygdala (a justification of selection of all ROIs);

      We thank the reviewer for this comment. We assessed the hippocampus and amygdala because they are automatically segmented in the FIRST algorithm. As our analysis showed they did not show a relation to the modulation of alpha oscillations, these regions also provide a useful control for our approach. Therefore, we included all subcortical structures in the model and evaluated their predictive impact. This is now addressed in the revised manuscript.

      Method (Line 477): “FIRST is an automated model-based tool that runs a two-stage affine transformation to MNI152 space, to achieve a robust pre-alignment of thalamus, caudate nucleus, putamen, globus pallidus, hippocampus, amygdala, and nucleus accumbens based on individual’s T1-weighted MR images.”

      Method (Line 576): “The absence of a relationship between modulations of alpha oscillations and the hippocampus and amygdala was expected as these regions typically are not associated with the allocation of spatial attention and thus add validity to our approach.”

      (iv) effects that depend on distractor versus target salience (a rationale for the specific two-factor design is missing);

      We thank the reviewer for this comment, which helped us clarify the manuscript. The two-factor design serves to investigate how the allocation of attentional resources specifically relates to mechanisms of excitability and suppression. For this reason, both the salience of the distractor (associated with suppression) and the perceptual load of the target (associated with excitability) had to be manipulated. We clarified the rationale in the revised version as below:

      Introduction (Line 96): “We analyzed MEG and structural data from a previous study (27), in which spatial cues guided participants to covertly attend to one stimulus (target) and ignore the other (distractor). To investigate the relationship between the allocation of attentional resources and mechanisms of neural excitability and suppression, the target load and the visual saliency of the distractor were manipulated using a noise mask. This load/salience manipulation resulted in four conditions that affect the attentional demands of target and distractor.”

      (v) effects in the absence of reward (why it is important to show that the effect seen previously in a task with reward is seen also in a task without reward);

      We thank the reviewer for this clarifying comment. We addressed this question in the introduction and discussion as below:

      Introduction (Line 107): “By examining their role in a task without explicit reward, we aim to elucidate the generalizability of the contributions of subcortical structures to spatial attention modulation. Such a finding would implicate a role for the basal ganglia in cognition beyond the well-studied realm of the estimation of choice values (33). Specifically, in a prior study (28), we observed that the contributions of the basal ganglia were most pronounced when the items in question were associated with a reward. Our current findings broaden our understanding of how subcortical structures are involved in modulating alpha oscillations during top-down spatial attention, in the absence of any reward or value associations.”

      Discussion (Line 333): “This convergence of results not only corroborates the validity and consistency of our findings but also extends the empirical foundation supporting the predictive role of the asymmetry of globus pallidus in modulating alpha oscillations beyond reward valence and to the context of attention.”

      (vi) effects on rapid frequency tagging.

      We thank the reviewer for this constructive comment. We have now included this analysis and added the results to the revised manuscript.

      Results (Line 224): “It is worth noting that neither the behavioural nor the rapid invisible frequency tagging (RIFT) measures showed significant relationships with LVs and HLM(α) (Supplementary material, Figure 1 and Table 3).”

      Discussion (Line 396): “We did not find any association between the power of RIFT signal and the size asymmetry of subcortical structures. Since the Bayes factors were less than 0.1, we conclude that our RIFT null findings are robust, suggesting a dissociation between how alpha oscillations and neuronal excitability indexed by RIFT relate to subcortical structures.”

      Method (Line 548): “We computed the modulation index (MI) for rapid invisible frequency tagging (RIFT) by averaging the power of the signal in sensors on the right when attention was directed to the right compared to when it was directed to the left. This calculation was also performed for sensors on the left. Consequently, we identified the top 5 sensors on each side with the highest MI as the Region of Interest (ROI). Utilizing the sensors within the ROI, we computed hemispheric lateralization modulation (HLM) of RIFT by summing the average MI(RIFT) of the right sensors and the average MI(RIFT) of the left sensors, obtaining one HLM(RIFT) value for each participant. For a more comprehensive analysis, refer to reference (24).”
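
      The steps in this Method excerpt can be sketched as follows. The excerpt does not give the MI formula, so the normalized difference used here is an assumption, and all function names and power values are hypothetical; reference (24) of the manuscript remains the authoritative description.

```python
def modulation_index(power_attend_right, power_attend_left):
    """Per-sensor MI; the normalized-difference form is an assumption, not from the source."""
    return [(r - l) / (r + l) for r, l in zip(power_attend_right, power_attend_left)]


def roi_mean(mi_values, k=5):
    """Average MI over the k sensors with the highest MI (the ROI on one side)."""
    top = sorted(mi_values, reverse=True)[:k]
    return sum(top) / len(top)


def hlm(mi_right_sensors, mi_left_sensors, k=5):
    """HLM as the sum of the two ROI averages, following the wording of the Method text."""
    return roi_mean(mi_right_sensors, k) + roi_mean(mi_left_sensors, k)


# Hypothetical power values for six sensors per side (arbitrary units).
right_mi = modulation_index([3.0, 2.0, 2.5, 1.5, 2.0, 1.0], [1.0, 2.0, 1.5, 1.5, 1.0, 1.0])
left_mi = modulation_index([1.0, 1.5, 1.0, 2.0, 1.0, 1.2], [2.0, 1.5, 3.0, 2.0, 1.5, 1.2])
print(hlm(right_mi, left_mi))  # one HLM(RIFT) value for this hypothetical participant
```

      Because each participant contributes a single HLM value, the result can be entered directly as the dependent variable of a between-subject regression.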

      Supplementary Materials (Line 839): “Figure 1. Lateralization volume of thalamus, caudate nucleus and globus pallidus in relation to hemispheric lateralization modulation of rapid invisible frequency tagging (HLM(RIFT)) on the right and behavioural asymmetry on the left. A and E, The beta coefficients for the best model (having three regressors) associated with a generalized linear model (GLM) where lateralization volume (LV) values were defined as explanatory variables for HLM(RIFT) (A) and behavioural asymmetry (E). Error bars indicate standard error of the mean (SEM). B and F, Partial regression plot showing the association between LVTh and HLM(RIFT) (B, p-value = 0.59) and behavioural asymmetry (F, p-value = 0.38) while controlling for LVGP and LVCN. C and G, Partial regression plot showing the association between LVGP and HLM(RIFT) (C, p-value = 0.16) and behavioural asymmetry (G, p-value = 0.80) while controlling for LVTh and LVCN. D and H, Partial regression plot showing the association between LVCN and HLM(RIFT) (D, p-value = 0.53) and behavioural asymmetry (H, p-value = 0.74) while controlling for LVTh and LVGP. Negative (or positive) LV indices denote greater left (or right) volume for a given substructure; similarly, negative HLM(RIFT) values indicate stronger modulation of RIFT power in the left compared with the right hemisphere, and vice versa; a positive behavioural asymmetry value shows higher accuracy when the target was on the right as compared with the left, and vice versa for negative behavioural asymmetry values. The dotted curves in B, C, D, F, G, and H indicate 95% confidence bounds for the regression line fitted on the plot in red.”

      Author response image 1.

      Second, the results are not fully reported. The model space and the results from the model comparison are omitted. Behavioral data and rapid frequency tagging results are not shown. Without having access to the data or the results of the analyses, the reader cannot evaluate whether the null effect corresponds to the absence of evidence or (as claimed in the discussion) evidence of absence.

      We thank the reviewer for this constructive suggestion. In the revised manuscript, we incorporated the model space, model comparisons, BIC values from the models, behavioral and rapid frequency tagging analysis methods, and their respective results. Additionally, we computed Bayes factors for our null findings to enhance the interpretability of our results.

      Results (Line 199): “This model predicted the HLM(α) values significantly in the GLM (F3,29 = 7.4824, p = 0.0007, adjusted R2 = 0.376) as compared with an intercept-only null model (Figure 4A).”

      Although the beta estimate of LVGP only showed a positive trend, removing it from the regression resulted in worse models (AIC and BIC tables in supplementary material).

      Supplementary materials (Line 827): “Table 1. Akaike Information Criterion (AIC) and Bayesian Information Criterion (BIC) values for all possible combinations of regressors (Lateralized Volume of subcortical structures). The selected model, with the lowest AIC, is marked in green.”

      Author response table 1.

      Author response table 2.

      Author response table 3.

      Bayes factors for the correlation of hemispheric laterality of subcortical structures with hemispheric lateralization modulation of rapid invisible frequency tagging (HLM(RIFT)) and with behavioural asymmetry (BA). The Pearson correlation of each subcortical structure with HLM(RIFT) and with behavioural asymmetry was calculated. The likelihood of the data under the alternative hypothesis (presence of a correlation) was subsequently compared to the likelihood under the null hypothesis (absence of a correlation). As demonstrated in the table, all Bayes factors were below or very close to 1, indicating evidence for the null hypothesis.

      For the results of frequency tagging signal, we have now included this analysis and added the results to the revised manuscript. We refer the reviewer to our response to the weakness (vi) from reviewer #3.
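
      The model comparison reported in this response (every combination of lateralized-volume regressors ranked by AIC and BIC) can be sketched as below. This is an illustration on simulated data, not the authors' analysis: the structure labels, effect sizes, and function names are hypothetical, and the criteria are computed up to an additive constant with `k` counting regression coefficients including the intercept, which is one common convention and an assumption here.

```python
import itertools
import numpy as np


def information_criteria(X, y):
    """AIC and BIC (up to an additive constant) for an ordinary least-squares fit."""
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ beta) ** 2))
    fit_term = n * np.log(rss / n)
    return fit_term + 2 * k, fit_term + k * np.log(n)


def rank_models(predictors, y):
    """Fit every non-empty subset of predictors and sort the models by AIC."""
    n = len(y)
    rows = []
    for size in range(1, len(predictors) + 1):
        for combo in itertools.combinations(predictors, size):
            X = np.column_stack([np.ones(n)] + [predictors[name] for name in combo])
            aic, bic = information_criteria(X, y)
            rows.append((combo, aic, bic))
    return sorted(rows, key=lambda row: row[1])


# Simulated stand-in data for 33 participants and seven subcortical structures.
rng = np.random.default_rng(1)
names = ["Th", "CN", "Pu", "GP", "Hi", "Am", "NAc"]
lv = {name: rng.normal(size=33) for name in names}
hlm_alpha = -2.0 * lv["Th"] + 2.0 * lv["GP"] + rng.normal(scale=0.5, size=33)

ranking = rank_models(lv, hlm_alpha)
best_combo, best_aic, best_bic = ranking[0]
print(len(ranking), "models; lowest AIC:", best_combo)
```

      Lower values indicate a better trade-off between fit and complexity; BIC penalizes additional regressors more heavily than AIC once the sample exceeds about eight observations.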

      Third, it remains unclear whether the MMS is the best approach to analyzing effects as a function of target and distractor salience. To address the question of whether the effects of subcortical volumes on alpha lateralization vary with task demands (which we assume is the primary research question of interest, given the factorial design), we would like to evaluate some sort of omnibus interaction effect, e.g., by having target and distractor saliency interact with the subcortical volume factors to predict alpha lateralization. Without such analyses, the results are very hard to interpret. What are the implications of finding the differential effects of the different volumes for the different task conditions without directly assessing the effect of the task manipulation? Moreover, the report would benefit from a further breakdown of the effects into simple effects on unattended and attended alpha, to evaluate whether effects as a function of distractor (vs target) salience are indeed accompanied by effects on unattended (vs attended) alpha.

      The reviewer is correct that we did not directly compare between task conditions when we assessed the predictive relationship between basal ganglia lateralization and alpha lateralization. We opted for the multivariate regression approach as this allowed us to simultaneously model the predictive relationship between our continuous predictors and HLM alpha in each condition, allowing us to be most efficient with our level of statistical power (N=33). Indeed, directly comparing between task conditions within one model would require 16 regressors (1 (intercept) + 4-1 to model the difference between conditions + 3 to model the regressors + 3 x 3 to model each region x task condition interaction). This approach would be underpowered given our sample size, and the ensuing results are likely to be unreliable.
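
      The count of 16 follows from the terms listed in the parentheses. A minimal sketch of that arithmetic, with a hypothetical function name, is:

```python
def count_regressors(n_conditions, n_regions):
    """Number of columns in a single model with region x condition interactions."""
    intercept = 1
    condition_terms = n_conditions - 1            # condition differences against a baseline
    region_terms = n_regions                      # main effect of each lateralized volume
    interaction_terms = n_regions * (n_conditions - 1)
    return intercept + condition_terms + region_terms + interaction_terms


total = count_regressors(n_conditions=4, n_regions=3)
print(total)  # 1 + 3 + 3 + 9 = 16
participants_per_regressor = 33 / total
print(round(participants_per_regressor, 2))
```

      With 33 participants this leaves roughly two participants per regressor, which is the basis for the concern about statistical power.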

      However, we statistically analysed our regression results. Multivariate regression analysis allows one to test the null hypothesis that a given predictor does not contribute to all the dependent variables. A rejection of this hypothesis would suggest that lateralization of a given region of interest significantly predicts variability across all 4 of the task conditions, whereas failure to reject the null would imply that the predictive relationship holds only for that single condition. We tested this global null hypothesis using a MANOVA test and reported the findings in response to weakness two from reviewer #1.

      Discussion (Line 384): “In future studies it would be interesting to design an experiment directly addressing which subcortical regions contribute to distractor and target load in terms of modulating the alpha band activity. In order to ensure sufficient statistical power for doing so possibly each factor needs to be addressed in different experiments.”

      The fourth concern is that the discussion section is not quite ready to help the reader appreciate the implications of key aspects of the findings. What are the implications for our understanding of the roles of different subcortical structures in the various psychological component processes of spatial attention? Why does the volumetric asymmetry of different subcortical structures have diametrically opposite effects on alpha lateralization? Instead, the discussion section highlights that the different subcortical structures are connected in circuits: "Globus pallidus also has wide projections to the thalamus and can thereby impact the dorsal attentional networks by modulating prefrontal activities." If this is true, then why does the effect of the GP dissociate from that of the thalamus? Also, what is it about the current behavioural paradigm that makes the behavioral readout insensitive to variation in subcortical volume (or alpha lateralization?)?

      We thank the reviewer for this feedback. These are indeed all good points, and we hope that our findings will inspire further research to address these issues. In the revised manuscript we now write:

      Discussion (Line 349): “The opposite effect of the globus pallidus compared to the thalamus is striking, and possibly explained by the globus pallidus containing GABAergic interneurons. Thus the inhibitory nature of the globus pallidus projections to thalamus could explain why they are related to the alpha modulation in different manners (57).”

      Discussion (Line 379): “Moreover, the current study faced methodological constraints, limiting the analysis to the entire thalamus. […] It would be of great interest to conduct further investigations to quantify the distinct impacts of individual thalamic nuclei on the association between subcortical structures and the modulation of oscillatory activity.”

      Discussion (Line 388): “Moreover, our failure to identify a relationship between the lateralized volume of subcortical structures and behavioural measures should be addressed in studies that are better designed to capture performance asymmetries (63). Individual preferences toward one hemifield, which were not addressed in the current study design, could potentially strengthen the power to detect correlations between structural variations in the subcortical structures and behavioural measures.”

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Minor comment:

      Between-subject correlation/regression analyses always rely on the assumption that the underlying dependent measures are reliable. While the reliability of asymmetries of subcortical structures can be assumed, the reliability of lateralized alpha oscillations during spatial attention can be questioned. It would be helpful if the authors could test the reliability of alpha lateralization, for instance by calculating HLM(a) in the first and second half of the experiment and correlating the resulting HLM(a) values (split-half reliability).

      We thank the reviewer for their insightful comment. We acknowledge that the between-subject regression relies on the reliability of alpha lateralization. Nonetheless, a previous study has demonstrated consistent results regarding HLM(α). We have further elaborated on these aspects in the discussion section:

      Discussion (Line 328): “Furthermore, our regression analysis outcomes align with the findings of Mazzetti et al. (28) underscoring the significant predictive influence exerted by the lateralized volume of globus pallidus on the modulation of hemispheric lateralization in alpha oscillations during spatial attention tasks. This convergence of results not only corroborates the validity and consistency of our findings but also extends the empirical foundation supporting the predictive role of the asymmetry of globus pallidus in modulating alpha oscillations within the context of attention.”

      Reviewer #3 (Recommendations For The Authors):

      We recommend that a revised version of the manuscript

      • Clarifies the theoretical basis for the 6 key design & analysis choices that we have outlined above;

      We thank the reviewer for their precision. We addressed the concerns outlined above in the previous section.

      • Also clarifies the task description (perhaps referring to target and distractor salience instead of target load versus distractor salience might help);

      Thank you for this constructive comment. We used the terms ‘load’ for the target and ‘salience’ for the distractor because the noise manipulation of the faces reduces the salience of the image, which makes distractors less distracting (easier) but targets more perceptually loaded (harder). These terms are now explained in the revised manuscript.

      Method (Line 447): “Over trials, the perceptual load of targets was manipulated using a noise mask; noisy targets are harder to detect than clear targets and therefore incur greater perceptual load in their detection. The saliency of distractor stimuli was also manipulated using a noise mask; noisy distractor stimuli are less salient than clear distractors and therefore less disruptive to performance on the detection task. The noise mask was created by randomly swapping 50% of the stimulus pixels (Figure 1B). This manipulation resulted in four target-load/distractor-saliency conditions: (1) target: low load, distractor: low saliency (i.e., clear target, noisy distractor), (2) target: high load, distractor: low saliency (i.e., noisy target, noisy distractor), (3) target: low load, distractor: high saliency (i.e., clear target, clear distractor), (4) target: high load, distractor: high saliency (i.e., noisy target, clear distractor) (Figure 1B and C).”
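
      The noise mask described in this excerpt can be illustrated with a small sketch. The excerpt states only that 50% of the pixels were randomly swapped; treating "swapping" as shuffling the values of a randomly chosen half of the pixel positions among themselves is an assumption, and the function name and the flat pixel list are hypothetical.

```python
import random


def noise_mask(pixels, fraction=0.5, seed=None):
    """Return a copy of `pixels` in which a random `fraction` of positions exchange values."""
    rng = random.Random(seed)
    out = list(pixels)
    n_swap = int(len(out) * fraction)
    positions = rng.sample(range(len(out)), n_swap)   # positions taking part in the swap
    shuffled = positions[:]
    rng.shuffle(shuffled)                             # random reassignment among them
    for src, dst in zip(positions, shuffled):
        out[dst] = pixels[src]
    return out


# Hypothetical 10 x 10 grayscale stimulus flattened into a list of pixel values.
clear_stimulus = list(range(100))
noisy_stimulus = noise_mask(clear_stimulus, fraction=0.5, seed=1)
changed = sum(a != b for a, b in zip(clear_stimulus, noisy_stimulus))
print(changed, "of", len(clear_stimulus), "pixels moved")
```

      Under this reading the pixel values themselves are preserved, so the overall intensity of the image is unchanged while its spatial structure is degraded.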

      • Fully reports all the data, including those of the model comparisons, the behavioural results, and the rapid frequency tagging results;

      We thank the reviewer for this constructive comment. We refer the reviewer to our responses to the second comment and to comment (vi) from reviewer #3.

      • Reports interaction effects to directly test the modulating role of task demands in the link between volume and alpha, and break down the alpha lateralization indices into their simple effects on the ipsilateral and contralateral hemispheres;

      Task demands have been addressed in our response to weakness two from reviewer #1.

      Regarding the second part of the comment, in our study, to compare the lateralized modulation of alpha oscillations between the right and left hemispheres, we computed hemispheric lateralization modulation. This involved dividing trials into attention right and attention left. Subsequently, we calculated the lateralization index separately for sensors on the right and left. Specifically, this entailed computing ipsilateral – contralateral for sensors on the right and contralateral – ipsilateral for sensors on the left side of the brain. We addressed this concern in methods section as below:

      Method (Line 537): “As MI(α) consistently represents power of alpha in attention right versus attention left conditions, it entails the comparison between ipsilateral and contralateral alpha modulation power for sensors located on the right side of the head. The same comparison applies inversely for sensors situated on the left side of the brain.”

      • Clarifies in the discussion section the specific implications of the results for our understanding of the link between distinct subcortical structures and distinct component processes of spatial attention.

      We thank the reviewer for their constructive comment. This point is addressed in response to the fourth concern of reviewer #3.

      More detailed specific recommendations are provided below:

      • Line 40ff: In this paragraph, the theoretical framework concerning the function of the subcortical regions of interest is described. Here, the authors jump back and forth between the role of the basal ganglia and the role of the thalamus. For clarity, we would advise to describe the functions of these two structures one after the other. And include a justification for assessing the hippocampus and the amygdala.

      We appreciate the reviewer’s precision in this comment. We now describe these structures one after the other in the revised manuscript as below:

      Introduction (Line 44): “For instance, it has been shown that the pulvinar plays an important role in the modulation of neocortical alpha oscillations associated with the allocation of attention (9). Studies in rats and non-human primates have shown that both the thalamus and superior colliculus are involved in the control of spatial attention by contributing to the regulation of neocortical activity (9-11). Notably, when the largest nucleus of the thalamus, the pulvinar, was inactivated after muscimol infusion, the monkey’s ability to detect colour changes in attended stimuli was lowered. This behavioral deficit occurred when the target was in the receptive field of V4 neurons that were connected to the lesioned pulvinar (12). The basal ganglia play a role in different aspects of cognitive control, encompassing attention (13,14), behavioural output (15), and conscious perception (16). Moreover, the basal ganglia contribute to visuospatial attention by linking with cortical regions like the prefrontal cortex via the thalamus.”

      Justification for assessing the hippocampus and the amygdala has been addressed in response to weakness (iii) from reviewer #3.

      • The authors mention they defined symmetric clusters of 5 sensors in each hemisphere that showed the highest modulation, but it is not clear how this number of sensors was determined a priori.

      We thank the reviewer for their comment. We edited the revised manuscript as below:

      Method (Line 536): “Ten sensors were selected to ensure sufficient coverage of the region exhibiting alpha modulation as judged from prior work (62).”

      • In line 141, the abbreviation HLM is first mentioned but the concept of "hemispheric lateralization modulation of alpha power" is only mentioned in the following section. For the ease of the reader, the abbreviation could be mentioned together with this concept at the beginning of this paragraph.

      We thank the reviewer for their attention. In the revised manuscript, HLM(α) is now introduced together with its concept.

      Results (Line 153): “Next, we computed the hemispheric lateralization modulation of alpha power (HLM(α)) in each individual.”

      • In line 188 of the results section, it is mentioned that the table including the AIC values for model comparisons is in the supplementary material, however, we could not locate this table.

      We thank the reviewer for their constructive feedback. The supplementary materials were uploaded in a separate file, and it must not have been available to the reviewers. We have now added the supplementary materials to the end of the manuscript for convenience.

      • Figure 4 is missing the panel headers A, B, C, and D.

      We thank the reviewer for their precision. This figure is now fixed.

      Author response image 2.

      • In lines 205 and 206, behavioral and rapid frequency tagging analysis are mentioned. For the behavioral analysis, the method is described, but no results are provided. For the rapid frequency tagging, neither the methods nor the results are described. To evaluate the strength of this (non)-evidence, we would advise to elaborate on these analysis steps and report the results in the supplementary material.

      We thank the reviewer for this constructive comment. A brief explanation of the analysis method of rapid frequency tagging signal is added to the revised manuscript.

      Method (Line 548): “We computed the modulation index (MI) for rapid invisible frequency tagging (RIFT) by averaging the power of the signal in sensors on the right when attention was directed to the right compared to when it was directed to the left. This calculation was also performed for sensors on the left. Consequently, we identified the top 5 sensors on each side with the highest MI as the Region of Interest (ROI). Utilizing the sensors within the ROI, we computed hemispheric lateralization modulation (HLM) of RIFT by summing the average MI(RIFT) of the right sensors and the average MI(RIFT) of the left sensors, obtaining one HLM(RIFT) value for each participant. For a more comprehensive analysis, refer to reference (24).” For a more detailed answer, we refer the reviewer to the second comment from reviewer #3.

      • For the paragraph starting at line 209, we would recommend referring to Figure 1.

      We thank the reviewer for their suggestion. This paragraph is now referring to Figure 1.

      Results (Line 229): “To relate load and salience conditions of the task to the relationship between subcortical structures and the alpha activity, we combined low-load or high-load targets with high-saliency or low-saliency distractors to manipulate the perceptual load appointed to each trial (Method section, Figure 1).”

      • Figure 5 as well as the report of the beta weights in this section shows a difference in the direction of the effect for the thalamus compared to the globus pallidus and caudate nucleus which is not discussed in this section.

      We thank the reviewer for bringing this important point to our attention. We addressed this comment in the discussion section as below:

      Discussion (Line 349): “The opposite effect of the globus pallidus compared to the thalamus is striking, and possibly explained by the globus pallidus containing GABAergic interneurons. Thus the inhibitory nature of the globus pallidus projections to thalamus could explain why they are related to the alpha modulation in different manners (54).”

      Discussion (Line 379): “Moreover, the current study faced methodological constraints, limiting the analysis to the entire thalamus. […] It would be of great interest to conduct further investigations to quantify the distinct impacts of individual thalamic nuclei on the association between subcortical structures and the modulation of oscillatory activity.“

      • Comment 2 on line 80 is addressed in the paragraph following 264 by describing volumetric changes in basal ganglia in neurodegenerative disorders such as PD or Huntington's. Still, the link of how a decrease in volume in this region could be causally linked to changes in alpha-band power could be better supported.

      We thank the reviewer for their constructive feedback. We are here highlighting the significant correlation between subcortical structures and changes in attention modulated alpha oscillation. We added a few more references to the discussion supporting the relationship between size and function in relation to neurological disorders. We also edited the manuscript to make this point clearer as below:

      Introduction (Line 113): “Our current findings broaden our understanding of how subcortical structures are involved in modulating alpha oscillations during top-down spatial attention, independent of any reward or value associations. “

      Discussion (Line 305): “Changes in neocortical oscillatory activity have also been observed in neurological disorders which mainly are known to affect subcortical structures. For example, individuals with Alzheimer's Disease demonstrate an increase in slow oscillatory activities and a decrease in higher frequency oscillations (42). Moreover, in patients with Parkinson’s Disease, the power of beta oscillations increases relatively to when they are dopamine-depleted compared with when they are on dopaminergic medication (43). “

      • Related to the previous comment on behavioral and rapid frequency tagging results, these are difficult to evaluate without mention of the methods and/or results.

      We thank the reviewer for this comment. We refer the reviewer to our response to the second comment from reviewer #3.

      • The authors show differential effects of target load and distractor saliency; however, we missed the description of how these two variables differ conceptually as they are both described as contributing to task difficulty and it is not described why we would expect differential effects for these concepts (or in other words, how the authors explain the differential effects).

      We thank the reviewer for their comment. Directly comparing between task conditions within one model would result in an extra 16 regressors (1 (intercept) + 4-1 to model the difference between conditions + 3 to model the regressors + 3 x 3 to model each region x task condition interaction). Given our sample size, this study is underpowered to directly compare alpha lateralisation in contralateral versus ipsilateral conditions. For a more detailed answer please refer to our response to weakness two from reviewer #1.
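
      The regressor count quoted above can be checked with a short sketch. The function and parameter names are illustrative, and it assumes that the "3 regressors" are the three subcortical regions discussed in this response (thalamus, globus pallidus, caudate nucleus).

```python
def count_regressors(n_conditions=4, n_regions=3):
    """Count design-matrix columns for a model with region x condition interactions."""
    intercept = 1
    condition_terms = n_conditions - 1                   # dummy-coded condition differences
    region_terms = n_regions                             # one main effect per region (assumed)
    interaction_terms = n_regions * (n_conditions - 1)   # region x condition
    return intercept + condition_terms + region_terms + interaction_terms


total = count_regressors()
```

      With four task conditions and three regions this reproduces the 1 + 3 + 3 + 9 breakdown given in the response.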

      • Line 364ff: Based on the description of the experimental design, it is not clear to us whether participants only had to report on the change in gaze for the stimulus in the cued hemifield.

      We thank the reviewer for this comment, which prompted us to clarify the experimental design as below:

      Method (Line 440): “Then followed a 1000 ms response interval where participants were asked to respond with their right or left index finger whether the gaze direction of the cued face shifted left or right.”

      • Line 47ff: As mentioned above, the AIC table is not included. Further, as it is mentioned that BIC values led to similar results (indicating that they are not identical), it would be valuable to report both AIC and BIC values.

      We thank the reviewer for their constructive feedback. The supplementary materials were uploaded in a separate file, and it must not have been available to the reviewers. We have now added the BIC values and attached the supplementary materials to the end of the manuscript for convenience.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary

      This article by Zhai et al. investigates sterol transport in bacteria. Synthesis of sterols is rare in bacteria but occurs in some, such as M. capsulatus, where the sterols are found primarily in the outer membrane. In a previous paper the authors discovered an operon consisting of five genes, with two of these genes encoding demethylases involved in sterol demethylation. In this manuscript, the authors set out to investigate the functions of the other three genes in the operon. Interestingly, through a bioinformatic analysis, they show that they are an inner membrane transporter of the RND family, a periplasmic binding protein, and an outer membrane-associated protein, all potentially involved with lipid transport, so providing a means of transporting the lipids to the outer membrane. These proteins are then extensively investigated through lipid pulldowns, binding analysis on all three, and X-ray crystallography and docking of the latter two.

      Strengths

      The lipid pulldowns and associated MST binding analysis are convincing, clearly showing that sterols are able to bind to these proteins. The structures of BstB and BstC are high resolution with excellent maps that allow docking studies to be carried out. These structures are distinct from sterol-binding proteins in eukaryotes.

      We thank the reviewer for their favorable impression of this work.

      Weaknesses

      While the docking and molecular dynamics studies are consistent with the binding of sterols to BstB and BstC, this is not backed up particularly well. The MST results of mutants in the binding pocket of BstB have relatively little effect, and while I agree with the authors this may be because of the extensive hydrophobic interactions that the ligand makes with the protein, it is difficult to make any firm conclusions about binding.

      We agree with the reviewer that at this point, there is no experimental evidence to define the sterol binding site in BstB. While in the manuscript we allude to the extensive hydrophobic interactions as being especially stabilizing and difficult to eliminate with one or two mutations, we are now also aware that hydrogen-bonding interactions with the polar head of the sterols are quite important (see data on BstC, where disruption of that interaction significantly reduces the equilibrium affinity for sterols). Our MD simulations show that at least 3 protein amino acids can participate in H-bonding with the sterols. Moreover, recent work from our lab shows that even ligand site waters can extend an H-bonding network around the polar head of the lipid (Zhai et al., ChemBioChem 2023, 24, e202300156), thereby enabling H-bonding with amino acids that are further away from the ligand site. It is therefore difficult to predict which mutations will sufficiently destabilize the binding. While this question is one we will tackle in future studies focused on obtaining high-resolution substrate-bound structures of BstB or homologs, the findings reported here are still relevant and timely, and we posit will spur the discovery of functional homologs, including some in organisms that are more tractable.

      The authors also discuss the possibility of a secondary binding site in BstB based on a slight cavity in domain B next to a flexible loop. This is not backed up in any way and seems unlikely.

      The reviewer is correct in that the evidence for this second binding site is weak. While the crystallographic structure shows a highly hydrophobic region and the binding studies suggest cooperativity exists in the binding of the 4-methylsterol substrate, the docking studies do not strongly support binding at that site. As such, we have clarified in the manuscript that a second hydrophobic cavity is observed, but that its role in ligand interaction remains unexplored.

      Reviewer #2 (Public Review):

      Summary:

      In eukaryotes, sterols are crucial for signaling and regulating membrane fluidity, however, the mechanism governing cholesterol production and transport across the cell membrane in bacteria remains enigmatic. The manuscript by Zhai et al. sheds light on this topic by uncovering three potential cholesterol transport proteins. Through comprehensive bioinformatics analysis, the authors identified three genes bstA, bstB, and bstC encoding proteins which share homology with transporters, periplasmic binding proteins, and periplasmic components superfamily, respectively. Furthermore, the authors confirmed the specific interaction between these three proteins and C-4 methylated sterols and determined the structures of BstB and BstC. Combining these structural insights with molecular dynamics simulation, they postulated several plausible substrate binding sites within each protein.

      Strengths:

      The authors have identified 3 proteins that seem likely to be involved in sterol transport between the inner and outer membrane. The structures are of high quality, and the sterol binding experiments support a role for these proteins in sterol transport.

      We thank the reviewer for this positive view of our work.

      Weaknesses:

      While the author's model is very plausible, direct evidence for a role of BstABC in transport, or that the 3 proteins function together in a single pathway, is limited.

      The reviewer is correct that we were unable to demonstrate that the three proteins work together to transport 4-methylsterols. This is not for lack of trying. We first attempted gene deletion studies, and as mentioned in the manuscript (with more details now provided in the experimental section), this appeared to be lethal. We then attempted in vitro exchange experiments, in which the proteins would be used to transfer sterols from sterol-loaded “heavy” liposomes to sterol-free “light” liposomes – such exchange assays are frequently performed with eukaryotic sterol transporters (see Chung et al., Science 2015, https://doi.org/10.1126/science.aab1370). These assays were not successful because 1) sterols incorporated poorly into liposomes made with E. coli polar lipids and yielded leaky liposomes; 2) liposomes prepared with the TLE of M. capsulatus proved more stable, but no appreciable exchange was observed; we reasoned that this might be due to the absence of an energy source for BstA, the RND component for which we have expressed and purified only the soluble periplasmic domain. Given the technical difficulty of these in vitro transport experiments, we will continue to pursue in vivo demonstration of function as new homologs are identified.

      Reviewer #3 (Public Review):

      Summary:

      The work in this manuscript builds on prior efforts by this team to understand how sterols are biosynthesized and utilized in bacteria. The study reports a new function for three genes encoded near sterol biosynthesis enzymes, suggesting the resulting proteins function as a sterol transport system. Biochemical and structural characterization of the two soluble components of the pathway establishes that both proteins can bind sterols, with a preference for 4-methylated derivatives. High-resolution x-ray structures of the apoproteins reveal hydrophobic cavities of the appropriate size to accommodate these substrates. Docking and molecular dynamics simulations confirm this observation and provide specific insights into residues involved in substrate binding.

      Strengths:

      The manuscript is comprehensive and well-written. The annotation of a new function in a set of proteins related to bacterial sterol usage is exciting and likely to enable further study of this phenomenon - which is currently not well understood. The work also has implications for improving our understanding of lipid usage in general among bacterial organisms.

      We thank the reviewer for this synopsis of our work.

      Weaknesses:

      The authors might consider moving some of the bioinformatics figures to the main text, given how much space is devoted to this topic in the results section.

      We have taken this advice and moved Figure S1 to the main manuscript.

      Reviewer #1 (Recommendations For The Authors):

      1. In the analysis of the MST data, the authors quote Hill coefficients. How reliable are these numbers? For BstB, for instance, it seems unlikely that more than one molecule would bind. Can the analysis be done without needing to include Hill coefficients?

      We used fits that did and did not invoke cooperativity – see below. We are certain that both BstA and BstB are better fit with cooperativity invoked.

      Author response image 1.
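
      For readers unfamiliar with the two kinds of fit being compared, the standard Hill binding model is shown below as a reference point. The exact parametrization used for the MST fits is not stated in this response, so the symbols are assumptions: θ is the fraction bound, [L] the sterol concentration, K_d the concentration at half-saturation, and n the Hill coefficient.

```latex
\theta = \frac{[L]^{n}}{K_d^{\,n} + [L]^{n}},
\qquad
n = 1 \;\Rightarrow\; \theta = \frac{[L]}{K_d + [L]}
```

      Fixing n = 1 gives the non-cooperative single-site isotherm, while leaving n free allows the fit to capture cooperativity (n > 1).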

      2. In looking at the maps associated with the structures, which were included in the review package, I see that two citric acid molecules fit beautifully into the density where currently PEG has been modelled. This needs to be fixed and some comments may be appropriate in the manuscript.

      We thank the reviewer for calling our attention to this. Citric acid has now been added to the model, and we reason that these are present in the structure because citric acid was used in the crystallization condition. The revised model is now present in the PDB.

      3. It is not necessary to show the two molecules in the asymmetric unit in Figure 4 given that it is not a dimer. This doesn't add anything to the manuscript.

      We now show a single molecule of BstC in Figure 4 (now Figure 5).

      4. I wouldn't consider the loops shown in Figure S4 as disordered. They have slightly higher B-values but are not completely mobile.

      We did not refer to these loops as disordered. In the text, we say they “exhibit poor electron densities, suggesting conformational sampling of more than one state (Fig. S4A).”

      Reviewer #2 (Recommendations For The Authors):

      pg 7, "hinting at an astounding distinction": I might suggest a word other than astounding that conveys how statistically unlikely, unusual, etc. this result is.

      Thank you – we have removed “astounding”.

      pg 7, paragraph 2: Here the authors show that in the SSN analysis, BstB proteins cluster separately and suggest this implies a distinction in function. However, they also show that PhnD homologs do not cluster separately (distributed across multiple clusters), yet presumably have similar functions. I am not familiar with SSN, but it seems to me that the second statement about PhnD implies that the first statement about BstB might not be valid, i.e., if PhnD doesn't cluster based on function, on what basis can we conclude that BstB does? On what basis does clustering occur in the SSN analysis? Might it be driven by things other than function? This comment also concerns the final paragraph of this section.

      The reviewer is correct in that PhnD homologs occupy separate clusters of the SSN. Many of these homologs were crystallized with phosphate-like compounds, but it is possible that they have non-overlapping substrate scopes and are therefore functionally distinct. As for the basis of clustering, the SSN is fully sequence-based. What has been observed is that proteins with highly similar sequences can have similar functions – but this is not always true.
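
      To make the basis of clustering concrete, a minimal sketch of a sequence similarity network follows. It assumes the common construction in which two sequences are joined by an edge when their pairwise alignment score meets a chosen threshold and clusters are the connected components; the protein names, scores, and threshold are hypothetical and are not data from the manuscript.

```python
from collections import defaultdict


def ssn_clusters(pairwise_scores, threshold):
    """Group sequences into connected components of the thresholded similarity graph."""
    adjacency = defaultdict(set)
    nodes = set()
    for (a, b), score in pairwise_scores.items():
        nodes.update((a, b))
        if score >= threshold:  # keep only edges at or above the cutoff
            adjacency[a].add(b)
            adjacency[b].add(a)

    clusters, seen = [], set()
    for start in sorted(nodes):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:  # depth-first traversal of one component
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        clusters.append(component)
    return clusters


# Hypothetical scores: two similar BstB-like sequences, two dissimilar PhnD-like ones.
scores = {
    ("BstB_1", "BstB_2"): 80,
    ("BstB_1", "PhnD_1"): 20,
    ("PhnD_1", "PhnD_2"): 30,
    ("BstB_2", "PhnD_2"): 15,
}
clusters = ssn_clusters(scores, threshold=50)
```

      Because only sequence similarity enters the calculation, two proteins with related functions can land in different clusters, as in this toy example where the two PhnD-like sequences remain separate.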

      pg 8, paragraph 1: The authors suggest that BstABC may be essential. This is probably not a critical claim and it might be simplest to just remove it, but if it is mentioned, the authors should probably explain what was attempted that failed, so a reader can assess the strength of the evidence supporting essentiality. For example, I don't see anything in the methods about genetic manipulations of M. capsulatus, so currently, this falls within the realm of "Data not shown".

      We have provided additional information about the experimental techniques used to do this. This statement was included so that it is understood that the reason for the experimental failure is unlikely to be technical in nature, as we have successfully deleted some sterol-related genes while others remain intractable.

      Fig. 2A: It is unclear to me what is being plotted here, perhaps more experimental detail is required in the form of labels and/or legend. Is this a quantification of each sterol in each fraction separated by GC? There are essentially no methods provided for the GC-MS experiments. A reference is provided, but I think providing detailed methods for these specific experiments will provide a higher degree of scientific rigor. I am not sure what is standard for GCMS, but perhaps showing spectra in the supplement that establish the identity of the bound molecules as species I and II would be appropriate?

      Additional experimental details have been provided and the figure legend has been changed to be clearer. Moreover, we now clearly state that the chromatograms shown were used to identify lipids based on retention times, with reference to spectra previously published in Wei et al., 2016.

      pg 10-11, comparison with PhnD structure: Perhaps it is worth mentioning a 3rd possible explanation for the relative opening/closing of the cleft is simply crystal packing? I don't think it necessarily has to imply anything about a difference in function. Also, the focus seems to be on this pairwise comparison, but perhaps more insights could be gleaned from an analysis that included a wider range of homologs, especially if any are thought to bind hydrophobic substrates.

      This could be true, and we have included a statement to that effect. We are unaware of homologs shown to bind to large, hydrophobic molecules.

      I think that BstB is shown upside-down in sup movies relative to other figures. If it isn't changed, perhaps adding some labels would help orient the reader.

      We have rotated the movies to be more consistent with the figures.

      Fig. S7: No units are indicated for Kds (uM?).

      Thank you – this has been fixed.

      pg 11, paragraph 2. "adjacent to three residues: Glu118, Tyr120 and Asn192": The residue number used in the text doesn't seem to match the numbering in the PDB file. I think these residues correspond to Glu98, Tyr100, and Asn172 in the PDB file.

      We regret this error. The correct numbering for both structures is now present in the deposited PDB files (7T1M for BstB and 7T1S for BstC).

      pg 12, final paragraph: The authors present binding data for BstB variants with mutations in the putative sterol binding pocket identified in the structural and MD analyses. However, these mutants had no effect on binding. The authors rationalize this in terms of the size of the interface and hydrophobic nature (which indeed, may be correct and is very plausible), and it is worth noting that many of their mutations are to Ala and would largely preserve the hydrophobic nature of the cleft. However, these mutants raise questions about where sterols actually bind. No experimental evidence is presented that substrates bind in the cleft, it is only hypothesized based on structural homology, MD simulations, etc. These mutations formally provide evidence against the hypothesis being tested; I think that has to be discussed a bit more directly, alongside the caveats the authors already discuss about hydrophobicity, etc.

      This is a valid point by the reviewer, and it is one we have attempted to address with our statement in the manuscript and in our response to reviewer 1. We have modified the relevant text to more clearly state that there is as of yet no experimental evidence for the binding of sterols to the cavity identified via molecular docking.

      pg 13: Presumably this is not the full-length lipoprotein, but has been truncated/mutated in some way? Some statement of roughly what was purified/crystallized should be stated.

      The SI methods on protein purification state that the genes of BstB and BstC without their respective signal peptides were obtained.

      pg 13, last paragraph "TN1 exhibits hybrid hydrophobicity, with the sides horizontal to cavities being hydrophobic while the vertical sides are more hydrophilic". I don't really follow the horizontal vs vertical sides. Perhaps this could be described in a different way.

      Noted and changed to “TN1 is closer to the N-terminal face of the structure, while CA1 and CA2 are proximal to the C-terminal face and form two open hydrophobic pockets; TN1 exhibits a mixture of hydrophobic and hydrophilic amino acids (Fig. 4B and Fig. S9B, Table S4).”

      pg 15-16, "Comparison to eukaryotic sterol transporters": Perhaps this would be better suited for the discussion section? Could also be streamlined; it is mostly discussing and comparing eukaryotic sterol binding domains to each other, not to BstABC.

      Given that BstB and BstC are the first identified proteins (and putative transporters) for bacterial sterol engagement, we thought a careful description of the existing sterol transporters (which are all eukaryotic) was warranted.

      Reviewer #3 (Recommendations For The Authors):

      I have just two minor suggestions for the authors if they wish to comment on or address them.

      1. Do the three proteins (BstA/B/C) form any sort of complex? Perhaps this property was not assessed - but it seemed possible that the B and C components might constitute a shuttle for the membrane-bound transporter?

      This is an important observation – the unliganded versions of these proteins show no appreciable affinity for each other. However, BstB (which would be expected to engage both with BstA and BstC) belongs to a family of proteins known to undergo significant conformational change upon substrate binding. It is possible that with substrate present, complexes are formed – we have yet to investigate this.

      2. In Figure S1, panel C - it appears that the label for the BstC cluster may have migrated away from the intended location. In this figure, it might also be useful to indicate in the caption the meaning of the red coloring of the nodes?

      The label is now fixed – thank you for drawing our attention to this.

    1. Author Response

      The following is the authors’ response to the original reviews.

      REVIEWER #1

      Leanza et al. investigated the regulation of Wnt signaling factors in the bone tissue obtained from individuals with or without type 2 diabetes. They showed that typical canonical Wnt ligands and downstream factors (Wnt10b, LEF1) are down-regulated, while Wnt5a and sclerostin mRNA are upregulated in diabetic bone tissue. Further, Wnt5a and sclerostin were associated with the content of AGEs, and SOST mRNA levels also correlated with glycemic control and disease duration.

      Strengths:

      • A strength of the study is the investigation of Wnt signaling in bone tissue from humans with type 2 diabetes. Most studies measure only serum levels of Wnt inhibitors, but this study takes it further and looks into bone specifically.

      • The measurement of AGEs and its correlation to the Wnt signaling molecules is interesting and important. The correlation of sclerostin and Wnt5a with AGEs and disease duration suggests that inhibited Wnt signaling is paralleled by higher AGE levels and potentially weaker bone.

      • The methodology in terms of obtaining the bone samples and the rigorous evaluation of RNA integrity is great and provides a solid basis for further analyses.

      Weaknesses:

      • A weakness may include the rather limited number of samples. Especially for some sub-analyses (e.g. RNA analyses), only a subset of samples was used.

      • How was the sample size determined? It seems like more samples might have been necessary to obtain significant results for methods with a higher standard deviation (e.g. histomorphometry).

      We apologize for the oversight in the description of the statistical analysis, and we thank the reviewer for the careful reading. For the sample size calculation for bone histomorphometry, we used the cohort of the only paper analyzing trabecular bone in T2D postmenopausal women by dynamic histomorphometry (Manavalan JS et al, JCEM 2012). We performed an a priori sample size calculation using G*Power 3.1.9.7, based on the t-test for the difference between two independent groups. The analysis demonstrated that, given an effect size of 2.2776769, we needed a total of 12 patients (6/group) to reach a power of 0.978. The gene expression analyses were performed not in a subset of patients but in all subjects recruited for this study. Based on the results of gene expression analysis on our main outcome (Wnt signaling), we demonstrated that for the SOST gene the effect size was 1.2733824, with a power of 0.9490065, confirming that the sample size was sufficient to achieve adequate statistical power.
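
      As a rough cross-check of this kind of power calculation, the sketch below estimates power for a two-group t-test by Monte Carlo simulation using only the Python standard library. It is not the G*Power procedure itself: the normal data model, the significance level of 0.05, the hard-coded critical values for 10 degrees of freedom, and the choice of tail are all assumptions, since the response does not state which tail was selected.

```python
import random

# Critical t values for alpha = 0.05 with 10 degrees of freedom (6 per group).
T_CRIT_DF10 = {"one_sided": 1.812, "two_sided": 2.228}


def _mean_var(values):
    """Sample mean and unbiased sample variance."""
    m = sum(values) / len(values)
    v = sum((x - m) ** 2 for x in values) / (len(values) - 1)
    return m, v


def simulated_power(effect_size, tail="one_sided", n_sim=20000, seed=0):
    """Fraction of simulated experiments (n = 6 per group) that reject the null."""
    n = 6  # the critical values above are only valid for df = 2 * 6 - 2 = 10
    rng = random.Random(seed)
    t_crit = T_CRIT_DF10[tail]
    rejections = 0
    for _ in range(n_sim):
        control = [rng.gauss(0.0, 1.0) for _ in range(n)]
        treated = [rng.gauss(effect_size, 1.0) for _ in range(n)]
        m0, v0 = _mean_var(control)
        m1, v1 = _mean_var(treated)
        pooled_var = (v0 + v1) / 2  # equal group sizes
        t = (m1 - m0) / (pooled_var * 2 / n) ** 0.5
        if tail == "one_sided":
            rejections += t > t_crit
        else:
            rejections += abs(t) > t_crit
    return rejections / n_sim


power_one_sided = simulated_power(2.2776769, tail="one_sided")
power_two_sided = simulated_power(2.2776769, tail="two_sided")
```

      Comparing the two estimates with the reported value is one way to see which test settings a reported power figure corresponds to.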

      • Why is the number of samples different for the mRNA measurements? In most cases, there were 9, but in some 8 and in some 10?

      We sincerely thank the reviewer for the opportunity to clarify such important aspects. The number of samples used for mRNA quantification may differ between the different analyzed genes due to multiple reasons: First, we used for the real-time PCR only samples with high quality ratio (260/280) between 1.8-2.0 as stated in the method section of the manuscript (Page 8, lines 163-164). Moreover, we decided not to use the undetermined values, undetectable after the amplification cycles (40 cycles in total), as specified in the method section (Page 8, line 167).
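
      The two exclusion rules described above explain why the sample count can differ from gene to gene. A minimal sketch follows, with hypothetical sample records and field names; `None` stands in for an undetermined Ct value after 40 cycles.

```python
def usable_samples(samples, gene, ratio_range=(1.8, 2.0)):
    """Return IDs of samples that pass RNA quality control and have a detectable Ct."""
    low, high = ratio_range
    kept = []
    for sample in samples:
        if not (low <= sample["a260_a280"] <= high):
            continue  # RNA purity ratio outside the accepted window
        if sample["ct"].get(gene) is None:
            continue  # undetermined: no amplification within 40 cycles
        kept.append(sample["id"])
    return kept


# Hypothetical records, for illustration only.
samples = [
    {"id": "S1", "a260_a280": 1.9, "ct": {"SOST": 28.4, "WNT10B": 31.0}},
    {"id": "S2", "a260_a280": 1.7, "ct": {"SOST": 27.9, "WNT10B": 30.2}},
    {"id": "S3", "a260_a280": 2.0, "ct": {"SOST": 29.1, "WNT10B": None}},
]
sost_ids = usable_samples(samples, "SOST")
wnt10b_ids = usable_samples(samples, "WNT10B")
```

      Here the purity filter removes one sample for every gene, while the undetermined value removes a further sample for one gene only, which yields a different n per gene.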

      Overall, this study validates findings from the group that reported similar findings in 2020. This validates their methodology and shows that alterations in Wnt signaling are reproducible in human bone tissue.

      We thank the reviewer for the positive comment, we really value her/his opinion.

      COMMENTS:

      (1) The authors could provide more details on how much of the bone was analyzed for bone histomorphometry (what area?).

      We truly thank the reviewer for allowing us to explain our methodology in more depth. First, a biopsy containing trabecular bone from the femoral head was fixed in 10% neutral buffered formalin for 24 h prior to storage in 70% ethanol. Tissues were embedded in methylmethacrylate and sectioned sagittally by the Washington University Musculoskeletal Histology and Morphometry Core. Sections were stained with Goldner’s trichrome. Then, a rectangular region of interest containing trabecular bone was chosen below the cartilage-lined joint surface and primary spongiosa. This region had an average dimension of 45 mm². Tissue processing artifacts, such as folding and edges, were excluded from the ROI. A threshold was chosen using the BIOQUANT software to automatically select trabeculae and measure bone volume. Finally, osteoid was highlighted in the software and quantified semi-automatically using a threshold and correcting with the brush tool (as shown in the image below).

      We specify that in the methods section (Page 7, lines 146-152).

      Author response image 1.

      (2) Could the number of samples used for histomorphometry be increased? That may also lead to more significant results.

      We sincerely appreciate this suggestion from the reviewer, but unfortunately all available samples for histomorphometry have been analyzed and we are not able to increase the number of recruited participants at this time. Recruitment of people with T2D undergoing hip replacement is extremely difficult given the limited number of those approved for elective surgery and compliant with our inclusion criteria. Moreover, given the long time needed to process bone samples for gene expression and histology analysis, a consistent increase in recruited subjects would require several months. However, we have previously calculated sample size for bone histomorphometry analysis using the only available data of trabecular bone in T2D postmenopausal women measured by dynamic histomorphometry (Manavalan JS et al, JCEM 2012). We performed an a priori sample size calculation using G*Power 3.1.9.7, based on the t-test of two independent groups. The analysis demonstrated that, given an effect size of 2.2776769, we needed a total of 12 patients (6/group) to reach a power of 0.978.

      (3) It would have been interesting to assess the biomechanical behavior of the bone specimens. While it is known that BMD is often higher in patients with T2D, the resistance to fractures is lower. Ideally, bone strength measures could be correlated with Wnt molecule expression and AGEs.

      We agree with the reviewer that the assessment of biomechanical parameters in our cohort would increase the importance of this study, giving more insights on the effect of downregulation of Wnt signaling on bone strength. Thus, we followed the reviewer’s suggestion and performed bone compression tests on trabecular bone cores. We found a significant decrease in bone plasticity in T2D compared to controls [Young’s modulus 21.6 (13.46-30.10) MPa vs. 76.24 (26.81-132.9) MPa; p=0.0025]. We added the results of the bone compression test in a new paragraph (Page 8, lines 191-194). In order to assess the validity of our results, we performed a post-hoc power calculation using G*Power 3.1.9.7. We demonstrated that the effect size was 1.4716626, with a power of 0.9730784, confirming that the sample size was sufficient to achieve adequate statistical power. We added methods in the related section and biomechanical data in table 3; we modified the manuscript accordingly (modifications are shown in track changes). Moreover, we also performed correlation analysis between Wnt target genes, AGEs and biomechanical parameters, showing significant correlations as reported in the added paragraph in the results section (Page 11, Lines 225-233).

      REVIEWER #2

      This study reports the levels of expression of selected genes implicated in Wnt signaling in trabecular bone from femur heads obtained after surgery from post-menopausal women with (15 women) or without (21 women) type 2 diabetes. They found higher expression levels of SOST and WNT5A, and lower expression levels of LEF-1 and WNT10B in tissues from subjects with T2D, correlating with glycemia and advanced glycation products. No significant differences in bone density were observed. Overall, this is a cross-sectional, observational study measuring a limited set of genes found to vary with glycemia in postmenopausal women undergoing hip surgery.

      Strengths:

      The study demonstrates the feasibility of measuring gene expression in post-surgical trabecular bone samples, and finds differences associated with glycemia despite a relatively small number of subjects. It can form the basis for further research on the causes and consequences of changes in elements of the WNT signaling pathway in bone biology and disease.

      Weaknesses:

      The small number of targeted genes does not provide a comprehensive view of the transcriptional landscape within which the effects are observed. The gene expression changes are not associated with cellular or physiological properties of the tissue, raising questions about the biological significance of the observations.

      We thank the reviewer for the comment. In response to his/her concerns, we have increased the number of Wnt target genes, including more interactors of the Wnt/β-catenin pathway. We measured GSK3B, AXIN2, BETA-CATENIN and SFRP5 gene expression levels, showing a significant increase in GSK3B, in line with a downregulation of Wnt signaling in T2D. We modified the manuscript in accordance with this new analysis and updated the figure 1 panel (Page 10, lines 210-213). Unfortunately, in this paper we were not able to perform experiments on cellular or physiological properties. However, in order to analyze the biological effect of the analyzed genes on the phenotype, we measured bone strength by performing compression tests on trabecular bone cores (Page 10, lines 201-203 and table 3) and used biomechanical parameters for correlation analysis with targeted genes, showing significant correlations between bone strength and Wnt genes. We added a new paragraph to the results section and a new figure panel to the main manuscript (Page 11, lines 225-233 and figure 4).

      COMMENTS:

      (1) The small number of targeted genes does not provide a comprehensive view of the transcriptional landscape within which the effects are observed. Given the author's success in obtaining good-quality RNA from trabecular bone, a more comprehensive exploration would greatly improve the quality of the study.

      We agree with the reviewer that broadening the transcriptional landscape related to Wnt signaling is of interest for this work, and we thank the reviewer for this opportunity. We were able to increase the number of Wnt target genes, including more interactors of the Wnt/β-catenin pathway, using the same cohort of patients in which we performed the other analyses. We also measured GSK3B, AXIN2, BETA-CATENIN and SFRP5 gene expression levels, showing a significant increase in GSK3B, in line with a downregulation of Wnt signaling in T2D. We modified the manuscript to include this new analysis and updated the figure panel (Page 10, lines 210-213 and Figure 1).

      (2) The gene expression changes are not associated with cellular or physiological properties of the tissue, raising questions about the biological significance of the observations. Can the authors perform immunohistochemistry to associate the changes in gene expression with protein expression?

      We sincerely thank the reviewer for drawing attention to such an important aspect. We have partially replied to this comment in the previous paragraph. Regarding immunohistochemistry analysis, it is not possible to further use the available samples. This is mainly because non-decalcified bones were embedded in plastic to allow for separate analysis of newly formed osteoid and mineralized bone. This process leads to poor antigen preservation and unsuitable detection of most targets. Moreover, antibodies for Wnt are also unreliable due to the secreted nature of the protein. Overall, this approach is unlikely to work efficiently. Similarly, RNAscope is not possible due to the resin. Optimization and validation of these analyses will need to be saved for a future study with fresh specimens.

      REVIEWER #3

      The manuscript by Leanza and colleagues explores the regulation of Wnt signaling and its association with advanced glycation end products (AGEs) accumulation in postmenopausal women with type 2 diabetes (T2D). The paper provides valuable insights into the potential mechanisms underlying bone fragility in individuals with T2D. Overall, the manuscript is well-structured, and the methodology is sound. I would suggest some minor revisions to improve clarity.

      Strengths:

      The study addresses an important and clinically relevant question concerning the mechanisms underlying bone fragility in postmenopausal women with T2D.

      The study's methodology appears sound, and the inclusion of postmenopausal women with and without T2D undergoing hip arthroplasty adds to the clinical relevance of the findings. Additionally, measuring gene expression and AGEs in bone samples provides direct insights into the study's objectives.

      The manuscript presents data clearly, and the results are well-organized.

      Weaknesses:

      Title. The title could be more specific to better reflect the content of the study. Also, the abstract should concisely summarize the study's main findings, providing some figures.

      We thank the reviewer for this suggestion, and we modified the title giving specific information on the main findings of this study. The new title is “Bone canonical Wnt signaling is downregulated in type 2 diabetes and associates with higher Advanced Glycation End-products (AGEs) content and reduced bone strength”. Moreover, we added as suggested a graphical abstract summarizing our study results.

      Introduction: the introduction would benefit from the addition of a clearer, more focused statement of the research questions or hypotheses guiding this study.

      We thank the reviewer for this opportunity. We reformulated the hypothesis of this study based on our data and new findings as follows: “we hypothesized that T2D and AGEs accumulation downregulate Wnt canonical signaling and negatively affect bone strength” (page 6, lines 116-117).

      Methods: more information is needed on the histomorphometry analysis. Surgical samples from 8 T2D and 9 non-diabetic subjects were used for histomorphometry analysis. How did these subjects compare with the other subjects in the T2D and control groups? Were they representative? How were they selected?

      We thank the reviewer for the opportunity to clarify this important point. The number of subjects included in the different analyses of the paper differs for multiple reasons. In particular, we used only bone specimens with enough trabecular bone material to perform histomorphometry analysis. The samples used in the histomorphometry analysis therefore belong to the same subjects enrolled in the study and analyzed for the other experiments of this paper. In addition, we had previously calculated the sample size for bone histomorphometry analysis using the only available data on trabecular bone in T2D postmenopausal women measured by dynamic histomorphometry (Manavalan JS et al, JCEM 2012). We performed an a priori sample size calculation using G*Power 3.1.9.7, based on the t-test for two independent groups. The analysis demonstrated that, given an effect size of 2.2776769, we needed a total of 12 patients (6/group) to reach a power of 0.978.
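
      The power figure quoted above can be sanity-checked by simulation. The following is a minimal sketch, not the G*Power procedure itself: it estimates power by Monte Carlo for a pooled-variance two-sample t-test with 6 subjects per group and the stated effect size. The significance level (0.05) and the number of tails are assumptions, since the response does not state them; the critical values are the standard t-table entries for 10 degrees of freedom. The function names are hypothetical.

```python
import math
import random
import statistics

# Standard t-table critical values for df = 10 (assumed alpha = 0.05).
T_CRIT_ONE_TAILED = 1.8125   # t_{0.95, 10}
T_CRIT_TWO_TAILED = 2.2281   # t_{0.975, 10}


def pooled_t(a, b):
    """Two-sample t statistic with pooled variance (equal-variance t-test)."""
    na, nb = len(a), len(b)
    va, vb = statistics.variance(a), statistics.variance(b)
    sp2 = ((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)
    return (statistics.fmean(a) - statistics.fmean(b)) / math.sqrt(sp2 * (1 / na + 1 / nb))


def simulated_power(d, n_per_group, t_crit, two_sided, n_sim=20000, seed=0):
    """Fraction of simulated experiments in which the test rejects H0.

    Groups are drawn from unit-variance normals whose means differ by d,
    so d plays the role of Cohen's d.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sim):
        a = [rng.gauss(d, 1.0) for _ in range(n_per_group)]
        b = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        t = pooled_t(a, b)
        stat = abs(t) if two_sided else t
        if stat > t_crit:
            hits += 1
    return hits / n_sim


if __name__ == "__main__":
    d = 2.2776769  # effect size quoted in the response
    print("one-tailed:", simulated_power(d, 6, T_CRIT_ONE_TAILED, two_sided=False))
    print("two-tailed:", simulated_power(d, 6, T_CRIT_TWO_TAILED, two_sided=True))
```

      Comparing the one-tailed and two-tailed estimates against the reported power indicates which test configuration the reported value is consistent with.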

      COMMENTS:

      (1) In the Abstract, values and p-values for comparisons, and Spearman's rho and p-values for correlations should be provided. Most adverbs (thus, accordingly, importantly) could be omitted to improve conciseness and clarity.

      We kindly thank the reviewers for this precise and careful comment. We changed the Abstract accordingly. According to the abstract style of the journal we initially reported only the main findings. We have now modified providing values and p values as requested. We defer to the wishes of the editor as to the format in which the abstract should be reported.

      (2) Result presentation: 25th and 75th percentile should be provided rather than the interquartile range, to better reflect data distribution.

      We thank the reviewer for the opportunity to better clarify this part of the results section. We changed the manuscript accordingly.

      (3) Estimated glomerular filtration rate should be calculated and provided as a marker of renal function, rather than serum creatinine values.

      We thank the reviewer for the comment. We modified the manuscript accordingly, adding the eGFR values to Table 1 and to the Results section.

      (4) The manuscript should include a statement confirming compliance with the Declaration of Helsinki, considering that human subjects were involved in the study.

      We thank the reviewer for the comment. The study was conducted in accordance with the Declaration of Helsinki. The Ethics Committee of Campus Bio-Medico University approved the present study. Informed consent was obtained from all subjects involved in the study (Page 6, lines 134-137).

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      In this manuscript, the authors investigated the role of Elg1 in the regulation of telomere length. The main role of the Elg1/RLC complex is to unload the processivity factor PCNA, mainly after completion of synthesis of the Okazaki fragment in the lagging strand. They found that Elg1 physically interacts with the CST (Cdc13-Stn1-Ten1) and propose that Elg1 negatively regulates telomere length by mediating the interaction between Cdc13 and Stn1 in a pathway involving SUMOylation of both PCNA and Cdc13. Accumulation of SUMOylated PCNA upon deletion of ELG1 or overexpression of RAD30 leads to elongated telomeres. On the other hand, the interaction of Elg1 with Stn1 is SIM-dependent and occurs concurrently with telomere replication in late S phase. In contrast, the Elg1-Cdc13 interaction is mediated by PCNA-SUMO, is independent of the SIM of Elg1 but still dependent on Cdc13 SUMOylation. The authors present a model containing two main messages: 1) PCNA-SUMO acts as a positive signal for telomerase activation; 2) Elg1 promotes Cdc13/Stn1 interaction at the expense of Cdc13/Est1 interaction, thus terminating telomerase action.

      The manuscript contains a large amount of data that make a major inroad on a new type of link between telomere replication and regulation of the telomerase. Nevertheless, the detailed choreography of the events as well as the role of PCNA-SUMO remain elusive and the data do not fully explain the role of the Stn1/Elg1 interaction. The data presented do not sufficiently support the claim that SUMO-PCNA is a positive signal for telomerase activation.

      We thank the reviewer for her/his review efforts and opinion. We have re-submitted a new version of the manuscript in which we clarify some of the criticisms presented. In a point-by-point letter we respond to all the specific queries.

      Reviewer #2 (Public Review):

      This paper purports to unveil a mechanism controlling telomere length through SUMO modifications controlling interactions between PCNA unloader Elg1 and the CST complex that functions at telomeres. This is an extremely interesting mechanism to understand, and this paper indeed reveals some interesting genetic results, leading to a compelling model, with potential impact on the field. The conclusions are largely supported by experiments examining protein-protein interactions at low resolution and ambiguous regarding directness of interactions like co-IP and yeast two-hybrid (Y2H) combined with genetics. However, some results appear contradictory and there's a lack of rigor in the experimental data needed to support claims. There is significant room for improvement and this work could certainly attain the quality needed to support the claims. The current version needs substantial revision and lacks the necessary experimental detail. Stronger support for the claims would add detail to help distinguish competing models.

      We thank the reviewer for her/his positive opinion. We have re-submitted a new version of the manuscript in which we clarify some of the criticisms presented by the referees, and added all the missing experimental details. In a point-by-point letter we respond to all the specific queries.

      Reviewer #3 (Public Review):

      This paper reveals interesting physical connections between Elg1 and CST proteins that suggest a model where Elg1-mediated PCNA unloading is linked to regulation of telomere length extension via Stn1, Cdc13, and presumably Ten1 proteins. Some of these interactions appear to be modulated by sumoylation and connected with Elg1's PCNA unloading activity. The strength of the paper is in the observations of new interactions between CST, Elg1, and PCNA. These interactions should be of interest to a broad audience interested in telomeres and DNA replication.

      We thank the reviewer for her/his positive opinion. We have re-submitted a new version of the manuscript in which we clarify some of the criticisms presented. In a point-by-point letter we respond to all the specific queries.

      What is not well demonstrated from the paper is the functional significance of the interactions described. The model presented by the authors is one interpretation of the data shown, and proposes that the role of sumoylation is to temporally regulate the Elg1, PCNA and CST interactions at telomeres. This model makes some assumptions that are not demonstrated by this work (such as Stn1 sumoylation, as noted) and are left for future testing. Alternative models that envision sumoylation as a key in promoting spatial localization could also be proposed based on the data here (as mentioned in the discussion), in addition to or instead of a role for sumoylation in enforcing a series of switches governing a tightly sequenced series of interactions and events at telomeres. Critically, the telomere length data from the paper indicates that the proposed model depicts interactions that are not necessary for telomerase activation or inhibition, as telomeres in pol30-RR strains are normal length and telomeres in elg1∆ strains are not nearly as elongated as in stn1 strains. One possibility mentioned in the paper is the PCNAS and Elg1 interactions are contributing to the negative regulation of telomerase under certain conditions that are not defined in this work. Could it also be possible that the role of these interactions is not primarily directed toward modulating telomerase activity? It will be of interest to learn more about how these interactions and regulation by Sumo function intersect with regulation of telomere extension.

      We present compelling evidence for a role of SUMOylated PCNA in telomere length regulation. Figure 1 shows that this modification is both necessary and sufficient to elongate the telomeres, indicating that PCNA SUMOylation plays a positive role in telomere elongation. The model we present is consistent with all our results. There are, of course, possible alternative models, but they usually fail to explain some of the results. We agree that the fact that pol30-RR presents normal-sized telomeres implies that SUMO-PCNA is not required for telomerase to solve the "end replication problem", but rather is needed for "sustained" activity of telomerase. Since elongated telomeres (by absence of Elg1 or by over-expression of SUMO-PCNA) was the phenotype monitored, this may require sustained telomerase activity. Similar results were seen in the past for Rnr1 (Maicher et al., 2017), and this mode depends on Mec1, rather than Tel1 (Harari and Kupiec, 2018). Telomere length regulation is complex, and we may not yet understand the whole picture. It appears that for normal “end replication problem” solution, very little telomerase activity may be needed, and spontaneous interactions at a low level may suffice. Future work may find the conditions at which telomerase switches from "end replication problem" to "sustained" activity. We have added further explanations on this subject to the Discussion section.

      We suspect, but could not prove, a role for Stn1 SUMOylation in the interactions. SUMOylation is usually transient, and notoriously hard to detect, and despite the fact that many telomeric proteins are SUMOylated, Stn1 SUMOylation could not be shown directly by us and others (Hang et al, 2011).

      Reviewer #1 (Recommendations For The Authors):

      Suggestions for improved or additional experiments, data or analyses.

      • My main concern is the claim that SUMOylated PCNA acts as a positive signal for telomerase activation. Yet the pol30-RR mutant has no impact on telomere length. The explanation of the authors is not entirely convincing.

      We are aware that the regulation of telomere length is complex, and we may not fully understand it yet. Just consider the fact that ~500 genes participate in determining the final telomere length of a yeast (Askree et al., 2004). Since mutation in EACH of these genes has a phenotype, the implication is that the joint action of 500 players determines the outcome (a dialogue of 500 participants). Having said this, we clearly show in figure 1 that mutations that prevent PCNA SUMOylation prevent telomere length elongation in cells lacking Elg1, and overexpressing SUMOylated PCNA is enough to elongate the telomeres. Thus, SUMOylation of PCNA does act as a positive signal for elongation.

      However, it appears that to fulfill the minimal requirement of dealing with the "end-replication problem", PCNA SUMOylation is not required, and only a "sustained activity" mode requires the S-PCNA signal (as we have also shown, surprisingly, for RNR1, Maicher et al. 2017). This sustained activity mode depends on Mec1, rather than Tel1 (Harari and Kupiec, 2018). Since elongated telomeres (by absence of Elg1 or by over-expression of SUMO-PCNA) was the phenotype monitored, this may require sustained telomerase activity. Telomere length regulation is complex, and we may not yet understand the whole picture. It appears that for normal “end replication problem” solution, very little telomerase activity may be needed, and spontaneous interactions at a low level may suffice (for example, unmodified PCNA may promote telomerase activity at a lower level than that of SUMO-PCNA). Future work may find the conditions at which telomerase switches from "end replication problem" to "sustained" activity.

      We have added further explanations on this subject to the Discussion section.

      • The model is entitled « Elg1 negatively regulates the telomere length by forming an interaction with the CST complex ». Nevertheless, expression of PCNA-RR completely reversed the long telomere phenotype of elg1∆ cells. Thus it appears that although the interaction between Stn1 and Cdc13 is reduced in the absence of Elg1, Elg1/Stn1 interaction is not instrumental in the formation of the CST complex and thus in the termination of telomerase activity. Does the elg1∆SIM mutant that does not interact with Stn1 impact telomere length?

      • In the model part (line 318), it is argued that the complex Elg1-Stn1 unloads SUMOylated PCNA. Elg1-Stn1 interaction depends on the SIM of Elg1. This SIM is however not required for Elg1's function in genome-wide SUMO-PCNA unloading. Is it required specifically at telomeres?

      The interactions between Elg1 and SUMOylated PCNA are carried out through both the SIM and the Threonines 386 and 387 (Shemesh et al, 2017). Consistently, the single elg1-SIM mutant has telomeres of normal length, and its effects on telomere length can only be seen when combined with mutations in the Threonines (elg1-TT386/7AA or elg1-TT386/7DD). Although the unloading of SUMOylated PCNA by Elg1 is important, the gene is not essential, and PCNA is either eventually unloaded by RFC, or spontaneously dis-assembles. This explains why the telomere length does not reach the same length in the absence of Elg1 as in the absence of, say, Stn1.

      • The model suggests that Elg1 promotes the interaction between Cdc13 and Stn1. This is based on the data presented in Figure 5 E and F. This is an important result. Because the experiment has been done on cells synchronized in S phase and the Elg1/Stn1 interaction occurs specifically at the end of S-phase, the FACS profile should be shown or a control provided to show that the two conditions are comparable.

      The FACS profile for this experiment is shown in Figure 5C.

      • Does the interaction between Cdc13 and Pol30 depend on the SUMOylation of Pol30?

      Yes. We have added this as new Figure S2, and presented the results together with Figure 3 (Figure 3 is already too crowded).

      Other points:

      • Fig 1 : it should be mentioned in the Materials and Methods or in the figure legend how the average telomere lengths (horizontal bar) were calculated from the teloblot, as the position of the bar is not always intuitive

      We estimate telomere length by using TelQuant (Rubinstein et al., 2014). We have added this to the Methods section.

      • Fig 2: Owing to the large span of telomere length in the stn1 mutants, the epistatic relationship between elg1∆ and stn1 mutants is poorly illustrated by the teloblot.

      We repeated this experiment several times, and stn1 mutants consistently gave a widely spread telomere length distribution. In ALL the blots, however, the double mutants elg1 stn1 showed a telomere length similar to that of the single stn1 mutant, and never longer.

      • It is mentioned that other mutants in the collection showed epistasis. Are any of these mutants related to telomere replication or the proposed model?

      Since we used the collection of non-essential mutants (so far), it was quite devoid of genes involved in DNA replication, which are mostly essential. An exception was siz1, which showed epistasis with elg1Δ.

      • The section entitled « Elg1's functional activity is essential for its interaction with Cdc13 » (line 205) is difficult to follow. The hierarchy between the different mutants of Elg1 on their capacity to unload PCNA is not totally in agreement with the data published in Itzkovich et al 2023 and Shemesh et al. 2017. In particular it appears to me from these papers that elg1-WalkerA 238 (KK343/4AA) mutant did not show a defect in contrast to elg1-WalkerA 238(KK343/4DD).

      We are sorry for the typo in the results. We used the elg1-WalkerA (KK343/4DD) allele, which has a normal SIM but no activity. In a nutshell, we used mutants that either did or did not show unloading activity and/or SIM. The results clearly show that you need to unload PCNA in order for the N-ter of Elg1 to interact with Cdc13.

      • Were the synchronizations done at 30°C?

      Yes. We have added the information to the Methods section.

      • ChIP experiments are not described in the Materials and Methods

      We apologize for this. They are now described.

      • In Figure 6, the PCNA rings are curiously placed at the beginning of the Okazaki fragments.

      We thank the referee for noticing, we have corrected the figure.

      Reviewer #2 (Recommendations For The Authors):

      This paper purports to unveil a mechanism controlling telomere length through SUMO modifications controlling interactions between PCNA unloader Elg1 and the CST complex that functions at telomeres. This is an extremely interesting mechanism to understand, and this paper indeed reveals some interesting genetic results, leading to a compelling model, with potential impact on the field. The conclusions are largely supported by experiments examining protein-protein interactions at low resolution and ambiguous regarding directness of interactions like co-IP and yeast two-hybrid (Y2H) combined with genetics. However, some results appear contradictory and there's a lack of rigor in the experimental data needed to support claims. There is significant room for improvement and this work could certainly attain the quality needed to support the claims. The current version needs substantial revision and lacks necessary experimental detail. Stronger support for the claims would add detail to help distinguish competing models.

      Specific comments:

      Insufficient technical detail: I could find no explanation of how overexpression was achieved. No description of how teloChIP is performed, either for the PCNA IP or how the sequence analysis is performed. Too limited details on growth like exact temperatures for the cell cycle time course.

      We have significantly expanded the Methods section to include all the technical information.

      Please do not bold and underline text for emphasis-EVER

      We have removed those from the text.

      Lines 130-132: they have not shown "accumulation of SUMOylated PCNA" anywhere; this is an inference.

      We have modified the text, it says: ”show that SUMOylated PCNA, and not unmodified or ubiquitinated PCNA, is both necessary and sufficient for telomere elongation in the presence or in the absence of Elg1.”

      Fig 2A Can authors show any other very long-telomere mutant like stn1 that does show enhancement in combination with elg1∆ to show feasibility of such phenotype?

      We don't think it is appropriate for the paper, but we have systematically created double mutants with elg1Δ and found many additive and even synergistic interactions. An example is shown in Author response image 1, taken from the PhD thesis of Taly Ben-Shitrit, a PhD student in the lab.

      Author response image 1.

      What about cdc13 or ten1? Epistatic?

      We did not test telomere length in combination with Ten1. Combining elg1 with cdc13-50 resulted in synergistic elongation. Given the complex genetic relationship between Stn1/Ten1 and Cdc13, it is hard to interpret this result.

      Seems tenuous to use Y2H to decipher protein-protein interactions occurring out of context (i.e., not at telomere but at reporter gene promoter)

      Y2H is a great method to detect interactions, even if they are transient. Whenever possible, we confirm our findings using co-IP or telo-ChIP.

      Lines 268-270: It would be more accurate to state "can be" instead of "becomes" or "is" as they have not shown that SUMOylation or PCNA unloading have occurred.

      We agree, and have changed the text.

      Cdc13snm protein level?

      Unfortunately our Western blot is not presentable, but the level of Cdc13snm was similar to that of the wt Cdc13, and this result has been already published by Hang et al., 2011.

      Fig S3A: If SUMOylated Cdc13 mediates the Stn1-Elg1 interaction, why is Stn1-Elg1 interaction maintained in cdc13snm strain? This result seems to directly contradict the premise and overall conclusion of this section that Cdc13-SUMO mediates the (Y2H) interaction of Elg1 and Stn1.

      According to our model, the interaction between Stn1 and Elg1 takes place upstream, and only then this complex interacts with SUMOylated Cdc13. Hence, if Cdc13 cannot be SUMOylated, the interaction Elg1-Stn1 is not lost, although Stn1 fails to interact with Cdc13, leading to a telomeric phenotype.

      Line 279: which data establishes Stn1-Elg1 interaction as direct? Fig 2B co-IP indicates physical but not necessarily direct interaction, but later the authors suggest that the interaction requires a SUMOylated intermediary, and Y2H in Fig. S3B doesn't demonstrate direct interaction.

      We have changed the text, taking out the word "direct".

      Co-IP shows that interaction of Elg1 with Stn1 occurs mainly during later S-phase and with an overall delay compared to initial Elg1-Pol3 interaction. Co-IP interaction between Cdc13 and Stn1 is reduced in the absence of Elg1.

      The subsection title: "The interaction of Elg1 with Stn1 takes place at telomeres only at late S-phase" is not well supported by the data. I agree the data are consistent with the idea of the interactions occurring at telomeres but there's no direct evidence of this.

      We have changed the subsection title. It now reads: "The interaction of Elg1 with Stn1 takes place only at late S-phase".

      Model: Is unloading happening at the fork? Doesn't PCNA unloading have to follow its loading which occurred behind the fork particularly on the lagging strand? The model now suggests that Stn1 itself is SUMOylated.

      Yes, according to the model Elg1 moves with the fork, unloading PCNA from the lagging strand. Once Elg1 reaches the telomeres, it interacts with Stn1 (Figure 5). This interaction requires SUMOylation of Stn1 or of some other protein, which is not PCNA (Figure 3D) nor Cdc13 (Figure S3A) and could be Stn1 itself or another telomeric protein (Hang et al., 2011).

      Title is rather vague.

      We think it summarizes what we present in the paper.

      Abstract:

      "We report that SUMOylated PCNA acts as a signal that positively regulates telomerase activity."

      I don't think this is supported or a good description of what they find

      Figure 1B clearly shows that SUMO-PCNA is both necessary and sufficient for telomere elongation.

      "and dissected the mechanism by which Elg1 and Stn1 negatively regulates telomere elongation, coordinated by SUMO."

      Again, I don't think this is sufficiently supported and the model invokes SUMOylation events not demonstrated like Stn1, which might be a significant step forward.

      On the positive side, their model makes several predictions that they could test much more directly and rigorously: for example, examining the impact of the relevant mutations in the recruitment of proteins to the telomere.

      We have dissected the mechanism, and future work will be devoted to examining the impact of the relevant mutations in the recruitment of proteins to the telomere.

      Reviewer #3 (Recommendations For The Authors):

      Comments:

      1) The telomere length analysis data presented here is consistent with an interpretation that Stn1 and Elg1 play roles in a similar telomere maintenance pathway because the telomere restriction fragment pattern in the double mutants are not longer than the stn1 single mutants. No comment is made with respect to the yellow bars in Figure 2 that presumably measure telomere length appearing to be slightly shorter than in the stn1 single mutants. It may be interesting and informative if the double mutants do in fact have some phenotype distinct from the single stn1 mutants. Is there an impact on viability in the double mutant?

      Given the variable telomeric phenotype of the single stn1 mutants, slight variations in the measurement of the median telomere size are expected. The difference observed is not likely to be significant. What is important is that the double mutants with elg1 do not show longer telomeres. In terms of fitness, the stn1 mutants grow slightly slowly, but the elg1 mutation does not slow them down further.

      2) It is somewhat surprising that no additional telomere length analysis is included that actually tests the proposed model, including whether this path could be operational only under certain conditions. Maybe this is a topic of the next paper?

      Indeed, future work will explore the conditions under which PCNA SUMOylation is essential, and those under which it is only needed.

      3) Were the error bars in Figure 5F determined only from the experiment in E? Does this represent error in measuring the data from one biological replicate? The type of error should be made clear to avoid readers assuming the data represents measurements from more than one sample in more than one experiment. The data would be stronger if it represented measurements from multiple experiments.

      The graph was made with data from three biological replicates. We show the best blot in Figure 5E. We have now stressed this in the Figure Legend.

      4) Why was only one two hybrid reporter shown? Having the multiple reporters can give confidence in interactions. (Not a big deal here given the nice co-IP data.)

      We thought that it was enough to show one reporter, as the results with a different reporter (B-gal assay) led to the same conclusions. Since these did not add information and made the paper too lengthy (and boring), we took them out. In any case, all data were verified by co-IP.

      5) Line 414 - what are the 32P-radio labeled PCR fragments? Are these solely comprised of TG1-3 repeats of some length? A bit more detail in this aspect of the method could be helpful.

      We have added an explanation on the probe in the Methods section.

      6) Line 432-433 - which anti-HA or anti-Myc antibodies are these? (very minor detail)

      We have added the details.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews: 

      Reviewer #1 (Public review): 

      Summary: 

      Liu and colleagues applied the hidden Markov model on fMRI to show three brain states underlying speech comprehension. Many interesting findings were presented: brain state dynamics were related to various speech and semantic properties, timely expression of brain states (rather than their occurrence probabilities) was correlated with better comprehension, and the estimated brain states were specific to speech comprehension but not at rest or when listening to non-comprehensible speech. 

      Strengths: 

      Recently, the HMM has been applied to many fMRI studies, including movie watching and rest. The authors cleverly used the HMM to test the external/linguistic/internal processing theory that was suggested in comprehension literature. I appreciated the way the authors theoretically grounded their hypotheses and reviewed relevant papers that used the HMM on other naturalistic datasets. The manuscript was well written, the analyses were sound, and the results had clear implications. 

      Weaknesses: 

      Further details are needed for the experimental procedure, adjustments needed for statistics/analyses, and the interpretation/rationale is needed for the results. 

      For the Experimental Procedure, we will provide a more detailed description about stimuli, and the comprehension test, and upload the audio files and corresponding transcriptions as the supplementary dataset. 

      For statistics/analyses, we have reproduced the states' spatial maps using unnormalized activity patterns. For the resting state, we observed a state resembling the baseline state described in Song, Shim, & Rosenberg (2023). However, for the speech comprehension task, all three states were characterized by network activities varying largely from zero. In addition, we have re-generated the null distribution for behavior-brain state correlations using circular shifts. The results are largely consistent with the previous findings. We have also adjusted some analyses and added new ones as recommended by the reviewer. We will revise the manuscript to incorporate these changes.

      For the interpretation/rationale: We will add a more detailed interpretation for the association between state occurrence and semantic coherence. Briefly speaking, higher semantic coherence may allow for the brain to better accumulate information over time.

      State #2 seems to be involved in the integration of information at shorter timescales (hundreds of milliseconds) while State #3 seems to be involved in the longer timescales (seconds). 

      We greatly appreciate the reviewer for the insightful comments and constructive suggestions.  

      Reviewer #2 (Public review): 

      Liu et al. applied hidden Markov models (HMM) to fMRI data from 64 participants listening to audio stories. The authors identified three brain states, characterized by specific patterns of activity and connectivity, that the brain transitions between during story listening. Drawing on a theoretical framework proposed by Berwick et al. (TICS 2023), the authors interpret these states as corresponding to external sensory-motor processing (State 1), lexical processing (State 2), and internal mental representations (State 3). States 1 and 3 were more likely to transition to State 2 than between one another, suggesting that State 2 acts as a transition hub between states. Participants whose brain state trajectories closely matched those of an individual with high comprehension scores tended to have higher comprehension scores themselves, suggesting that optimal transitions between brain states facilitated narrative comprehension. 

      Overall, the conclusions of the paper are well-supported by the data. Several recent studies (e.g., Song, Shim, and Rosenberg, eLife, 2023) have found that the brain transitions between a small number of states; however, the functional role of these states remains under-explored. An important contribution of this paper is that it relates the expression of brain states to specific features of the stimulus in a manner that is consistent with theoretical predictions. 

      (1) It is worth noting, however, that the correlation between narrative features and brain state expression (as shown in Figure 3) is relatively low (~0.03). Additionally, it was unclear if the temporal correlation of the brain state expression was considered when generating the null distribution. It would be helpful to clarify whether the brain state expression time courses were circularly shifted when generating the null. 

      In the revision, we generated the null distribution by circularly shifting the state time courses. The results remain consistent with our previous findings: p = 0.002 for the speech envelope, p = 0.007 for word-level coherence, and p = 0.001 for clause-level coherence.

      We note that in other studies which examined the relationship between brain activity and word embedding features, the group-mean correlation values are similarly low but statistically significant and theoretically meaningful (e.g., Fernandino et al., 2022; Oota et al., 2022). We think these relatively low correlations are primarily due to the high level of noise inherent in neural data. Brain activity fluctuations are shaped by a variety of factors, including task-related cognitive processing, internal thoughts, physiological states, as well as arousal and vigilance. Additionally, the narrative features we measured may account for only a small portion of the cognitive processes occurring during the task. As a result, the variance in narrative features can only explain a limited portion of the overall variance in brain activity fluctuations.

      We will replace Figure 3 and the related supplementary figures with new ones, in which the null distribution is generated via circular shift. Furthermore, we will expand our discussion to address why the observed brain-stimuli correlations are relatively small, despite their statistical significance.

      (2) A strength of the paper is that the authors repeated the HMM analyses across different tasks (Figure 5) and an independent dataset (Figure S3) and found that the data was consistently best fit by 3 brain states. However, it was not entirely clear to me how well the 3 states identified in these other analyses matched the brain states reported in the main analyses. In particular, the confusion matrices shown in Figure 5 and Figure S3 suggest that the states were confusable across studies (State 2 vs. State 3 in Fig. 5A and S3A, State 1 vs. State 2 in Figure 5B). I don't think this takes away from the main results, but it does call into question the generalizability of the brain states across tasks and populations. 

      We identified matching states across analyses based on similarity in the activity patterns of the nine networks. For each candidate state identified in other analyses, we calculated the correlation between its network activity pattern and those of the three predefined states from the main analysis, and set the one it most closely resembled to be its matching state. For instance, if a candidate state showed the highest correlation with State #1, it was labelled State #1 accordingly. 

      Each column in the confusion matrix depicts the similarity of each candidate state with the three predefined states. In Figure S3 (analysis for the replication dataset), the highest similarity occurred along the diagonal of the confusion matrix. This means that each of the three candidate states was best matched to State #1, State #2, and State #3, respectively, maintaining a one-to-one correspondence between the states from two analyses.

      For the comparison of the speech comprehension task with the resting and incomprehensible speech conditions, there was some degree of overlap or "confusion."

      In Figure 5A, there were two candidate states showing the highest similarity to State #2. In this case, we labelled the candidate state with the strongest similarity as State #2, while the other candidate state is assigned as State #3 based on the ranking of similarity. This strategy was also applied to naming of states for the incomprehensible condition. The observed confusion supports the idea that the tripartite-state space is not an intrinsic, task-free property. To make the labeling clearer in the presentation of results, we will use a prime symbol (e.g., State #3') to indicate cases where such confusion occurred, helping to distinguish these ambiguous matches.

      (3) The three states identified in the manuscript correspond rather well to areas with short, medium, and long temporal timescales (see Hasson, Chen & Honey, TiCs, 2015).

      Given the relationship with behavior, where State 1 responds to acoustic properties, State 2 responds to word-level properties, and State 3 responds to clause-level properties, the authors may want to consider a "single-process" account where the states differ in terms of the temporal window for which one needs to integrate information over, rather than a multi-process account where the states correspond to distinct processes. 

      The temporal window hypothesis provides a more fitting explanation for our results. Based on the spatial maps and their modulation by speech features, States #1, #2, and #3 seem to correspond to short, medium, and long processing timescales, respectively. We will update the discussion to reflect this interpretation.

      We sincerely appreciate the constructive suggestions from the two anonymous reviewers, which have been highly valuable in improving the quality of the manuscript.  

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors): 

      (1) The "Participants and experimental procedure" section deserves more details. I've checked Liu et al. (2020), and the dataset contained 43 participants aged 20-75 years, whereas this study contained data from 64 young adults and 30 old adult samples. The previous dataset seems to have two stories, whereas this study seems to have three. Please be specific, given that the dataset does not seem the same. Could the authors also include more descriptions of what the auditory stories were? For example, what were the contents, and how were they recorded? 

      The citation is partially incorrect. The young-adult dataset is shared with our work published in 2022 (Liu et al., 2022). The 64 participants listened to one of three stories told by a female college student in Mandarin, recounting her real-life experience of hiking, a graduate admission interview, and her first time taking a flight, respectively. The sample of older adults is from our work published in 2020 (Liu et al., 2020), which includes 30 older adults and an additional 13 young adults. The stimuli in this case were two stories told by an older woman in a Chinese dialect, describing her experience in Thailand and riding a warship, respectively. Since we aimed to explore whether the main results can be replicated in a different age group, we excluded the 13 young adults from the analysis. 

      All the stories were recorded during fMRI scanning using a noise-canceling microphone (FOMRI-III; Optoacoustics Ltd, Or-Yehuda, Israel) positioned above the speaker’s mouth. The audio recordings were subsequently processed offline with Adobe Audition 3.0 (Adobe Systems Inc., USA) to further eliminate MRI scanner noise.

      In the revised manuscript, we have updated the citation, and provided a more detailed description of the stimuli in the supplementary material. We have also uploaded the audio files along with their corresponding transcriptions to GitHub.

      (2) I am curious about individual differences in comprehension scores. Did participants have less comprehension of the audio-narrated story because the story was a hard-to-comprehend narrative or because the audio quality was low? Could the authors share examples of comprehension tests? 

      We believe two factors contribute to the individual differences in comprehension scores. First, the audio quality is indeed moderately lower than in daily-life story-listening conditions. This is because the stories were recorded and played during fMRI scanning. Although noise-canceling equipment was used, some noise still accompanied the speech, which may have made speech perception and comprehension more difficult than usual.

      Second, the comprehension test measured how much information about the story (including both main themes and details) participants could recall. Specifically, participants were asked to retell the stories in detail immediately after the scanning session. Following this free recall, the experimenters posed a few additional questions drawn from a pre-prepared list, targeting information not mentioned in their recall. If participants experienced lapses of attention or did not store the incoming information into memory promptly, they might fail to recall the relevant content. In several studies, such a task has been called a narrative recall test. However, memory plays a crucial role in real-time speech comprehension, while comprehension affects the depth of processing during memory encoding, thereby influencing subsequent recall performance. To align with prior work (e.g., Stephens et al., 2010) and our previous publications, we chose to refer to this task as narrative comprehension. 

      In the revised manuscript, we have provided a detailed description of the comprehension test (Line 907-933) and shared examples on GitHub. 

      (3) Regarding Figure 3, what does it mean for a state occurrence to follow semantic coherence? Is there a theoretical reason why semantic coherence was measured and related to brain state dynamics? A related empirical question is: is it more likely for the brain states to transition from one state to another when nearby time points share low semantic similarity compared to chance? 

      We analyzed semantic coherence and sound envelope as they capture different layers of linguistic and acoustic structure that unfold over varying temporal scales. Changes in the sound envelope typically occur on the order of milliseconds to a few hundred milliseconds, changes in word-level semantic coherence span approximately 0.24 ± 0.15 seconds, and changes in clause-level semantic coherence extend to 3.2 ± 1.7 seconds. Previous theory and empirical studies suggest that the timescales of information accumulation vary hierarchically, progressing from early sensory areas to higher-order areas (Hasson et al., 2015; Lerner et al., 2011). Based on this work, we anticipate that the three brain states, which are respectively associated with the auditory and sensory motor network, the language network and the DMN, would be selectively modulated by these speech properties corresponding to distinct timescales. 

      Accordingly, when a state occurrence aligns with (clause-level) semantic coherence, it suggests that this state is engaged in processing information accumulated at the clause level (i.e., its semantic relationship). Higher coherence facilitates better accumulation, making it more likely for the associated brain state to be activated. 

      We analyzed the relationship between state transition probability and semantic coherence, but did not find significant results. Here, the transition probability was calculated as Gamma(t) – Gamma(t-1), where Gamma refers to the state occurrence probability. The lack of significant findings may be because brain state transitions are driven primarily by more slowly changing factors. Indeed, we found that the average dwell time of the three states ranges from 9.66 to 15.29 s, which reflects much slower temporal dynamics than the relatively rapid shifts in acoustic/semantic properties. 
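
      As an illustration only (this is not the analysis script, and the function names, array shapes, and the `tr` argument are assumptions made for the sketch), the two quantities mentioned above could be computed from HMM output along these lines: the change in state occurrence probability, Gamma(t) – Gamma(t-1), and the mean dwell time of each state in a decoded state sequence.

```python
import numpy as np

def gamma_change(gamma):
    """Return Gamma(t) - Gamma(t-1) for every state.

    gamma: array of shape (T, K) with state occurrence probabilities
    (hypothetical layout: time points in rows, states in columns).
    """
    return np.diff(np.asarray(gamma, dtype=float), axis=0)

def mean_dwell_time(states, tr):
    """Mean duration, in seconds, of uninterrupted visits to each state.

    states: 1-D sequence of decoded state labels, one per time point.
    tr: repetition time in seconds (an assumed input; not given in the text).
    """
    states = np.asarray(states)
    # Index of the first time point of every run of identical labels.
    starts = np.concatenate(([0], np.flatnonzero(np.diff(states)) + 1))
    # Run lengths in time points.
    lengths = np.diff(np.concatenate((starts, [len(states)])))
    labels = states[starts]
    return {int(k): float(lengths[labels == k].mean() * tr)
            for k in np.unique(labels)}
```

      A dwell time of roughly 10 to 15 s spans several consecutive time points, which is why the difference between adjacent time points changes slowly compared with word-level features.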

      In the revised version, we have updated the Introduction to clarify the rationale for selecting the three speech properties and for exploring their relationship with brain dynamics (Line 111-118).

      (4) When running the HMM, the authors iterated K of 2 to 10 and K = 4, 10, and 12. However, the input features of the model consist of only 9 functional networks. Given that the HMM is designed to find low-dimensional latent state sequences, the choice of the number of latent states being higher than the number of input features sounds odd to me - to my speculation, it is bound to generate almost the exact same states as 9 networks and/or duplicates of the same state. I suggest limiting the K iterations from 2 to 8. For replication with Yeo et al.'s 7 networks, K iteration should also be limited to K of less than 7, or optionally, Yeo's 7 network scheme could be replaced with a 17-network scheme. 

      We understand your concern. However, the determination of the number (K) of hidden states is not directly related to the number of features (in this case, the number of networks), but rather depends on the complexity of the time series and the number of underlying patterns. Given that each state corresponds to a distinct combination of the features, even a small number of features can be used to model a system with complex temporal behaviors and multiple states. For instance, for a system with n features, assuming each is a binary variable (0 or 1), there are maximally 2^n possible underlying states. 

      In our study, we recorded brain activity over 300 time points and used the 9 networks as features. At different time points, the brain can exhibit distinct spatial configurations, reflected in the relative activity levels of the nine networks and their interactions. To accurately capture the temporal dynamics of brain activity, it is essential to explore models that allow for more states than the number of features. We note that in other HMM studies, researchers have also explored numbers of states exceeding the number of networks when searching for the best number of hidden states (e.g., Ahrends et al., 2022; Stevner et al., 2019). 

      Furthermore, Ahrends et al. (2022) suggested that “Based on the HCP-dataset, we estimate as a rule of thumb that the ratio of observations to free parameters per state should not be inferior to 200”, where the number of free parameters per state is [K*(K-1) + (K-1) + K*N*(N+1)/2]/K. According to this, there should be more than 10,980 observations when the number of states (K) is 10 (the maximal number in our study) and the number of networks (N) is 9. In our group-level HMM model, there were 64 (valid runs) * 300 (TR) = 19,200 observations for young adults, and 50 (valid runs) * 210 (TR) = 10,500 observations for older adults. Aside from the older adults' data being slightly insufficient (4.37% less than the suggestion), all other hyperparameter combinations in this study meet the recommended number of observations. 
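
      The arithmetic behind these numbers can be checked with a short sketch. The function names below are hypothetical and introduced only for illustration; the formula and the ratio of 200 are the ones quoted above.

```python
def free_params_per_state(K, N):
    """Free parameters per state for K states and N features (networks).

    Implements [K*(K-1) + (K-1) + K*N*(N+1)/2] / K as quoted in the text.
    """
    return (K * (K - 1) + (K - 1) + K * N * (N + 1) / 2) / K

def min_observations(K, N, ratio=200):
    """Observations needed so that observations / free parameters per state >= ratio."""
    return ratio * free_params_per_state(K, N)

def shortfall_percent(n_obs, K, N, ratio=200):
    """Percentage by which n_obs falls short of the recommendation (negative if it exceeds it)."""
    needed = min_observations(K, N, ratio)
    return 100.0 * (needed - n_obs) / needed
```

      With K = 10 and N = 9 this gives 54.9 free parameters per state and a recommended minimum of 10,980 observations, so 19,200 observations exceed the threshold and 10,500 fall about 4.37% short, matching the figures reported above.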

      (5) In Figure 2, the authors write that the states' spatial maps were normalized for visualization purposes. Could the authors also show visualization of brain states that are not normalized? The reason why I ask is, for example, in Song, Shim, & Rosenberg (2023), the base state was observed which had activity levels all close to the mean (which is 0 because the BOLD activity was normalized). If the activity patterns of this brain state were to be normalized after state estimation, the base state would have looked drastically different than what is reported. 

      We derived the spatial maps of the states using unnormalized activity patterns, with the BOLD signals Z-score normalized to a mean of zero. Under the speech comprehension task, the three states exhibited relatively large fluctuations in network activity levels. The activity ranges were as follows: [-0.71 to 0.51] for State #1, [-0.26 to 0.30] for State #2, and [-0.82 to 0.40] for State #3. For the resting state, we observed a state resembling the baseline state as described in Song, Shim, & Rosenberg (2023), with activity values ranging from -0.133 to 0.09. 

      In the revision, we have replaced the states' spatial maps with versions showing unnormalized activity patterns. 

      (6) In line 297, the authors speculate that "This may be because there is too much heterogeneity among the older adults". To support this speculation, the authors can calculate the overall ISC of brain state dynamics among older adults and compare it to the ISC estimated from younger adults.  

      We analyzed the overall ISC of brain state dynamics, and found the ISC was indeed significantly lower among the older adults than that among the younger adults. We have revised this statement as follows:

      These factors can diminish the inter-subject correlation of brain state dynamics— indeed, ISCs among older adults were significantly lower than those among younger adults (Figure S5)—and reduce ISC's sensitivity to individual differences in task performance (Line 321-326).

      Other comments: 

      (7) In Figure 4, the authors showed a significant positive correlation between head movement ISC with the best performer and comprehension scores. Does the average head movement of all individuals negatively correlate with comprehension scores, given that the authors argue that "greater task engagement is accompanied by decreased movement"? 

      We examined the relationship between participants' average head movement across the comprehension task and their comprehension scores. There was no significant correlation (r = 0.041, p = 0.74). In the literature (e.g., Ballenghein et al., 2019), the relationship between task engagement and head movement was also assessed at the moment-by-moment level, rather than by using time-averaged data.

      Real-time head movements reflect fluctuations in task engagement and cognitive state. In contrast, mean head movement, as a static measure, fails to capture these changes, and thus is not effective in predicting task performance.

      (8) The authors call the older-adult sample the "independent dataset". Technically, however, this dataset cannot be independent because it was collected at the same time by the same research group. I would advise replacing the word independent with something like second dataset or replication dataset. 

      We have replaced the phrase “independent dataset” with “replication dataset”. 

      (9) Pertaining to a paragraph starting in line 586: For non-parametric permutation tests, the authors note that the time courses of brain state expression were "randomly shuffled". How was this random shuffling done: was this circular-shifted randomly, or were the values within the time course literally shuffled? The latter approach, literal shuffling of the values, does not make a fair null distribution because it does not retain temporal regularities (autocorrelation) that are intrinsic to the fMRI signals. Thus, I suggest replacing all non-parametric permutation tests with random circular shifting of the time series (np.roll in Python).  

      In the original manuscript, the time course was literally shuffled. In the revised version, we circular-shifted the time course randomly (circshift.m in Matlab) to generate the null distribution. The results remain consistent with our previous findings: p = 0.002 for the speech envelope, p = 0.007 for word-level coherence, and p = 0.001 for clause-level coherence (Line 230-235). 
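
      For readers unfamiliar with the procedure, a minimal sketch of a circular-shift null is given below. It uses NumPy's `np.roll`, as the reviewer suggested, rather than the Matlab `circshift.m` used in the actual analysis; the function name, the number of permutations, and the use of a Pearson correlation as the test statistic are assumptions made for illustration.

```python
import numpy as np

def circular_shift_null(state_tc, feature, n_perm=1000, seed=0):
    """Observed correlation plus a null built by circularly shifting state_tc.

    state_tc: 1-D state expression time course.
    feature: 1-D stimulus feature time course of the same length.
    """
    rng = np.random.default_rng(seed)
    state_tc = np.asarray(state_tc, dtype=float)
    feature = np.asarray(feature, dtype=float)
    n_time = len(state_tc)

    observed = np.corrcoef(state_tc, feature)[0, 1]

    null = np.empty(n_perm)
    for i in range(n_perm):
        # Random offset in 1..n_time-1, so the unshifted series is never reused.
        shift = rng.integers(1, n_time)
        # np.roll wraps the end of the series to the start, which keeps the
        # autocorrelation structure but breaks alignment with the stimulus.
        null[i] = np.corrcoef(np.roll(state_tc, shift), feature)[0, 1]
    return observed, null
```

      Unlike literal shuffling, each surrogate series contains the same values in the same cyclic order, so the null distribution reflects the temporal smoothness of the fMRI-derived time course.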

      (10) The p value calculation should be p = (1+#(chance>=observed))/(1+#iterations) for one-tailed test and p = (1+#(abs(chance)>=abs(observed)))/(1+#iterations) for two-tailed test. Thus, if 5,000 iterations were run and none of the chances were higher than the actual observation, the p-value is p = 1/5001, which is the minimal value it can achieve. 

      We have corrected this. 
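
      The formula recommended by the reviewer can be written compactly as follows. This is a sketch with a hypothetical function name, not the code used in the revision.

```python
import numpy as np

def permutation_p(observed, null, two_tailed=False):
    """Permutation p-value with the +1 correction.

    One-tailed:  (1 + #(null >= observed)) / (1 + #iterations)
    Two-tailed:  (1 + #(|null| >= |observed|)) / (1 + #iterations)
    """
    null = np.asarray(null, dtype=float)
    if two_tailed:
        exceed = np.sum(np.abs(null) >= abs(observed))
    else:
        exceed = np.sum(null >= observed)
    return (1 + exceed) / (1 + null.size)
```

      Because of the added 1 in the numerator, the smallest attainable value with 5,000 iterations is 1/5001, so a p-value of exactly zero can never be reported.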

      (11) State 3 in Figure S2 does not resemble State 3 of the main result. Could the authors explain why they corresponded State 3 of the Yeo-7 scheme to State 3 of the nine-parcellation scheme, perhaps using evidence of spatial overlap? 

      The correspondence of states between the two schemes was established using evidence of state expression time course. 

      To assess temporal overlap, we calculated Pearson’s correlation between each candidate state obtained by the Yeo-7 scheme and the three predefined states obtained by the nine-network parcellation scheme in terms of state expression probabilities. The time courses of the 64 participants were concatenated, resulting in 19200 (300*64) time points for each state. The one that the candidate state most closely resembled was set to be its corresponding state. For instance, if a candidate state showed the highest correlation with State #1, it was labelled State #1 accordingly. As demonstrated in the confusion matrix, each of the three candidate states was best matched to State #1, State #2, and State #3, respectively, maintaining a one-to-one correspondence between the states from the two schemes.

      We also assessed the spatial overlap between the two schemes. First, a state activity value was assigned to each voxel across the whole brain (including a total of 34,892 voxels covered by both parcellation schemes). This was done for each brain state. Next, we calculated Spearman’s correlation between each candidate state obtained by the Yeo-7 scheme and the three predefined states obtained by the nine-network scheme in terms of whole-brain activities. The pattern of spatial overlap is consistent with the pattern of temporal overlap, such that each of the three candidate states was best matched to State #1, State #2, and State #3, respectively.

      Author response image 1.

      We noted that the networks between the two schemes are not well aligned in their spatial location, especially for the DMN (as shown below). This may lead to the low spatial overlap of State #3, which is dominated by DMN activity. Consequently, establishing state correspondence based on temporal information is more appropriate in this context. We therefore only reported the results of temporal overlap in the manuscript. 

      We have added a paragraph in the main text for “Establishing state correspondence between analyses” (Line 672-699). We have also updated the associated figures (Fig. S2, Fig. S3 and Fig. 5).

      Author response image 2.

      (12) Line 839: gamma parameter, on a step size of? 

      (16) Figure 3. Please add a legend in the "Sound envelope" graph what green and blue lines indicate. The authors write Coh(t) and Coh(t, t+1) at the top and Coh(t) and Coh(t+1) at the bottom. Please be consistent with the labeling. Shouldn't they be Coh(t-1, t) and Coh(t, t+1) to be exact for both? 

      We have corrected this. 

      (17) In line 226, is this one-sample t-test compared to zero? If so, please write it inside the parentheses. In line 227, the authors write "slightly weaker"; however, since this is not statistically warranted, I suggest removing the word "slightly weaker" and just noting significance in both States 1 and 2.  

      We have corrected this.

      (18) In line 288, please fix "we also whether". 

      We have corrected this. 

      (19) In Figure 2C, what do pink lines in the transition matrix indicate? Are they colored just to show authors' interests, or do they indicate statistical significance? Please write it in the figure legend.   

      Yes, the pink lines indicate a meaningful trend, showing that the between-state transition probabilities are significantly higher than those in permutation.

      We have added this information to the figure legend. 

      Reviewer #2 (Recommendations for the authors):

      (1) It is unclear how the correspondence between states across different conditions and datasets was computed. Given the spatial autocorrelation of brain maps, I recommend reporting the Dice coefficient along with a spin-test permutation to test for statistical significance.  

      The state correspondence between different conditions and between the two datasets was established using evidence of spatial overlap. The spatial overlap between states was quantified by Pearson’s correlation using the activity values (derived from the HMM) of the nine networks. For each candidate state identified in other analyses (for the Rest, MG and older-adult datasets), we calculated the correlation between its network activity pattern and those of the three predefined states from the main analysis (for the young-adult dataset), and set the one it most closely resembled to be its matching state. For instance, if a candidate state showed the highest correlation with State #1, it was labelled State #1 accordingly. 

      For the comparison between the young and older adults’ datasets (as shown below), the largest spatial overlap occurred along the diagonal of the confusion matrix, with high correlation values. This means that each of the three candidate states was best matched to State #1, State #2, and State #3, respectively, maintaining a one-to-one correspondence between the states from the two datasets. As the HMM is modelled at the level of networks, which lack accurate coordinates, we did not apply the spin-test to assess the statistical significance of the overlap. Instead, we extracted the state activity patterns from the 1000 permutations (wherein the original BOLD time courses were circularly shifted and an HMM was fitted) for the older-adult dataset. Applying the same state-correspondence strategy, we generated a null distribution of spatial overlap. The real overlap of the three states was greater than 97.97%, 95.34% and 92.39% of the instances from the permutations, respectively (as shown below). 

      Author response image 3.

      For the comparison of the main task with the resting and incomprehensible speech conditions, there was some degree of confusion: there were two candidate states showing the highest similarity to State #2. In this case, we labeled the most similar candidate as State #2. The other candidate was then assigned to the predefined state with which it had the second-highest correlation. We used a prime symbol (e.g., State #3') to denote cases where such confusion occurred. These findings support our conclusion that the tripartite organization of brain states is not a task-free, intrinsic property.
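
      The labeling rule described above can be sketched as follows. This is an illustration under stated assumptions, not the analysis code: the function name and array layout are hypothetical, and the sketch does not handle the case where a reassigned candidate collides with yet another candidate, which the text does not specify.

```python
import numpy as np

def match_states(reference, candidates):
    """Label candidate states by their most similar predefined state.

    reference: (R, N) network activity patterns of the predefined states.
    candidates: (C, N) network activity patterns of the candidate states.
    Returns the confusion matrix sim (R, C), one label per candidate, and a
    flag marking candidates that were reassigned (the "prime" cases).
    """
    reference = np.asarray(reference, dtype=float)
    candidates = np.asarray(candidates, dtype=float)
    n_ref = reference.shape[0]

    # sim[i, j]: Pearson correlation between predefined state i and candidate j.
    sim = np.corrcoef(reference, candidates)[:n_ref, n_ref:]

    initial = sim.argmax(axis=0)
    labels = initial.copy()
    primed = np.zeros(sim.shape[1], dtype=bool)

    for ref in np.unique(initial):
        tied = np.flatnonzero(initial == ref)
        if len(tied) < 2:
            continue
        # The most similar candidate keeps the label.
        keep = tied[sim[ref, tied].argmax()]
        for j in tied:
            if j != keep:
                # The other candidate takes its second-highest match.
                labels[j] = np.argsort(sim[:, j])[-2]
                primed[j] = True
    return sim, labels, primed
```

      When every column of the confusion matrix peaks on a different row, as in the replication dataset, no candidate is flagged and the mapping is one-to-one.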

      When establishing the correspondence between the Yeo-7 network and the nine-network parcellation schemes, we primarily relied on evidence from temporal overlap measures, as a clear network-level alignment between the two parcellation schemes is lacking. Temporal overlap was quantified by calculating the correlation of state occurrence probabilities between the two schemes. To achieve this, we concatenated the time courses of 64 participants, resulting in a time series consisting of 19,200 time points (300 time points per participant) for each state. Each of the three candidate states from the Yeo-7 network scheme was best matched to State #1, State #2, and State #3 from the main analyses, respectively. To determine the statistical significance of the temporal overlap, we circularly shifted each participant’s time course of state expression obtained from the Yeo-7 network scheme 1000 times. Applying the same strategy to find the matching states, we generated a null distribution of overlap. The real overlap was much higher than the instances from the permutations. 

      Author response image 4.

      In the revision, we have provided a detailed description of how the state correspondence is established and reported the statistical significance of the correspondence (Line 671-699). The associated figures have also been updated (Fig. 5, Fig. S2 and Fig. S3).  

      (2) Please clarify if circle-shifting was applied to the state expression time course when generating the null distribution for behavior-brain state correlations reported in Figure (3). This seems important to control for the temporal autocorrelation in the time courses.  

      We have updated the results by using circular shifting to generate the null distribution. The results are largely consistent with the previous ones obtained without circular shifting (Line 230-242). 

      (3) Figure 3: What does the green shaded area around the sound envelope represent? In the caption, specify whether the red line in the null distributions indicates the mean or median R between brain state expression and narrative features. It would also be beneficial to report this value in the main text. 

      The green shaded area indicates the original amplitude of the speech signal, while the blue line indicates the smoothed, low-frequency contour of amplitude changes over time (i.e., the speech envelope). We have updated the figure and explained this in the figure caption. 

      The red line in the null distributions indicates the R between brain state expression and narrative features for the real data. We have also reported the mean R of the permutations in the main text. 

      (4) The manuscript is missing a data availability statement (https://elifesciences.org/inside-elife/51839f0a/for-authors-updates-to-elife-s-datasharing-policies). 

      We have added a statement of data availability in the revision, as follows: 

      “The raw and processed fMRI data are available on OpenNeuro: https://openneuro.org/datasets/ds005623. The experimental stimuli, behavioral data and main scripts used in the analyses are provided on GitHub.”

      (5) There is a typo in line 102 ("perceptual alalyses"). 

      We have corrected this.

      We sincerely thank the two reviewers for their constructive feedback, thorough review, and the time they dedicated to improving our work.

      References:

      Ahrends, C., Stevner, A., Pervaiz, U., Kringelbach, M. L., Vuust, P., Woolrich, M. W., & Vidaurre, D. (2022). Data and model considerations for estimating time-varying functional connectivity in fMRI. NeuroImage, 252, 119026.

      Ballenghein, U., Megalakaki, O., & Baccino, T. (2019). Cognitive engagement in emotional text reading: concurrent recordings of eye movements and head motion. Cognition and Emotion. 

      Fernandino, L., Tong, J.-Q., Conant, L. L., Humphries, C. J., & Binder, J. R. (2022). Decoding the information structure underlying the neural representation of concepts. Proceedings of the National Academy of Sciences, 119(6), e2108091119. https://doi.org/10.1073/pnas.2108091119

      Hasson, U., Chen, J., & Honey, C. J. (2015). Hierarchical process memory: memory as an integral component of information processing. Trends in Cognitive Sciences, 19(6), 304-313. 

      Lerner, Y., Honey, C. J., Silbert, L. J., & Hasson, U. (2011). Topographic mapping of a hierarchy of temporal receptive windows using a narrated story. Journal of Neuroscience, 31(8), 2906-2915. https://doi.org/10.1523/JNEUROSCI.3684-10.2011

      Liu, L., Li, H., Ren, Z., Zhou, Q., Zhang, Y., Lu, C., Qiu, J., Chen, H., & Ding, G. (2022). The “two-brain” approach reveals the active role of task-deactivated default mode network in speech comprehension. Cerebral Cortex, 32(21), 4869-4884. 

      Liu, L., Zhang, Y., Zhou, Q., Garrett, D. D., Lu, C., Chen, A., Qiu, J., & Ding, G. (2020). Auditory–Articulatory Neural Alignment between Listener and Speaker during Verbal Communication. Cerebral Cortex, 30(3), 942-951. https://doi.org/10.1093/cercor/bhz138

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      The authors show that SVZ-derived astrocytes respond to a middle cerebral artery occlusion (MCAO) hypoxia lesion by secreting and modulating hyaluronan at the edge of the lesion (penumbra) and that hyaluronan is a chemoattractant to SVZ astrocytes. They use lineage tracing of SVZ cells to determine their origin. They also find that SVZ-derived astrocytes express Thbs-4 but astrocytes at the MCAO-induced scar do not. Also, they demonstrate that decreased HA in the SVZ is correlated with gliogenesis. While much of the paper is descriptive/correlative they do overexpress Hyaluronan synthase 2 via viral vectors and show this is sufficient to recruit astrocytes to the injury. Interestingly, astrocytes preferred to migrate to the MCAO than to the region of overexpressed HAS2.

      Strengths:

      The field has largely ignored the gliogenic response of the SVZ, especially with regard to astrocytic function. These cells and especially newborn cells may provide support for regeneration. Emigrated cells from the SVZ have been shown to be neuroprotective via creating pro-survival environments, but their expression and deposition of beneficial extracellular matrix molecules are poorly understood. Therefore, this study is timely and important. The paper is very well written and the flow of results is logical.

      Weaknesses:

      The main problem is that they do not show that Hyaluronan is necessary for SVZ astrogenesis and/or migration to MCAO lesions. Such loss of function studies have been carried out by studies they cite (e.g. Girard et al., 2014 and Benner et al., 2013). Similar approaches seem to be necessary in this work.

      We appreciate the comments by the reviewer. The article is, indeed, largely descriptive since we attempt to describe in detail what happens to newborn astrocytes after MCAO. Still, we have not attempted any modification to the model, such as amelioration of ischemic damage. This is a limitation of the study that we do not hide. However, we use several experimental approaches, such as lineage tracing and hyaluronan modification, to strengthen our conclusions.

      Regarding the weaknesses found by the reviewer, we do not claim that hyaluronan is necessary for SVZ astrogenesis. Indeed, we observe that when the MCAO stimulus (i.e. inflammation) is present, the HMW-HA (AAV-Has2) stimulus is less powerful (we discuss this in line 330-332). We do claim, and we believe we successfully demonstrate, the reverse situation: that SVZ astrocytes modulate hyaluronan, not at the SVZ but at the site of MCAO, i.e. the scar. However, regarding whether hyaluronan is necessary for SVZ astrogenesis, we only show a correlation between its degradation and the time-course of astrogenesis. We suggest this result as a starting point for a follow-up study. We have included a phrase in the discussion (line 310), stating that further experiments are needed to fully establish a link between hyaluronan and astrogenesis in the SVZ.

      Major points:

      (1) How good of a marker for newborn astrocytes is Thbs4? Did you co-label with B cell markers like EGFr? Is the Thbs4 gene expressed in B cells? Do scRNAseq papers show it is expressed in B cells? Are they B1 or B2 cells?

      We chose Thbs4 as a marker of newborn astrocytes based on published research (Beckervordersandforth et al., 2010; Benner et al., 2013; Llorens-Bobadilla et al., 2015; Codega et al., 2014; Basak et al., 2018; Mizrak et al., 2019; Kjell et al., 2020; Cebrian-Silla et al., 2021). Of those studies, at least three associate Thbs4 with B-type cells based on scRNAseq data (Llorens-Bobadilla et al., 2015; Cebrian-Silla et al., 2021; Basak et al., 2018). We have included a sentence about this, with the associated references, in line 92.

      We co-labeled Thbs4 with EGFR, but in the context of MCAO. We observed an increase in EGFR expression with MCAO, similar to the increase in Thbs4 following ischemia (see Author response image 1). We did not include this figure in the manuscript since we did not have tissue available from all the time points we used (7d, 60d post-ischemia).

      Author response image 1.

      Thbs4 cells, in basal and ischemic conditions, only represent a small amount of IdU-positive cells (Fig 3F), suggesting that they are mostly quiescent cells, i.e., B1 cells. However, the scRNAseq literature is not consistent about this.

      (2) It is curious that there was no increase in Type C cells after MCAO - do the authors propose a direct NSC-astrocyte differentiation?

      Type C cells are fast-proliferating cells, and our BrdU/IdU experiment (Fig. 3) suggests that Thbs4 cells are slow-proliferating cells. Some authors (Encinas lab, Spain) suggest that when the hippocampus is challenged by a harsh stimulus, such as kainate-induced epilepsy, the NSCs differentiate directly into reactive astrocytes and deplete the DG neurogenic niche (Encinas et al., 2011, Cell Stem Cell; Sierra et al., 2015, Cell Stem Cell). We believe this might be the case in our MCAO model and the SVZ niche, since we observe a decrease in DCX labeling in the olfactory bulb (Fig S5) and an increase in astrocytes in the SVZ, which migrate to the ischemic lesion. We did not want to overcomplicate an already complicated paper by dwelling on direct NSC-astrocyte differentiation or on the reactive status of these newborn astrocytes.

      (3) The paper would be strengthened with orthogonal views of z projections to show colocalization.

      We thank the reviewer for this observation. We have now included orthogonal projections in the critical colocalization IF of CD44 and hyaluronan (hyaluronan internalization) in Fig S6D, and a zoomed-in inset. Hyaluronan membrane synthesis is already depicted with orthogonal projection in Fig 6F.

      (4) It is not clear why the dorsal SVZ is analysed and focused on in Figure 4. This region emanates from the developmental pallium (cerebral cortex anlagen). It generates some excitatory neurons early postnatally and is thought to have differential signalling such as Wnt (Raineteau group).

      We decided to analyze the dorsal SVZ in depth after the BrdU experiment (Fig S3), where we observed an increase in BrdU+/Thbs4+ cells mostly in the dorsal area. Hence, the electrodes for electroporation were oriented so as to label the dorsal area. We appreciate the paper by the Raineteau lab, but we assume that this region may play other roles (apart from generating excitatory neurons early postnatally) depending on the developmental stage (our model is in adults) and/or pathological conditions (MCAO).

      (5) Several of the images show the lesion and penumbra as being quite close to the SVZ. Did any of the lesions contact the SVZ? If so, I would strongly recommend excluding them from the analysis as such contact is known to hyperactivate the SVZ.

      We thank the referee for the suggestion to exclude the harsher MCAO-lesioned animals from the analysis. Indeed, MCAO ischemia can, for methodological reasons, generate variable tissue damage that cannot be easily controlled. Thus, based on TTC staining, we had already excluded the animals with more severe tissue damage that contacted the SVZ.

      (6) The authors switch to a rat in vitro analysis towards the end of the study. This needs to be better justified. How similar are the molecules involved between mouse and rat?

      We chose the rat culture since it is a culture that we have already established in our lab and that, in our hands, is much more reproducible than the mouse brain cell culture that we occasionally use for transgenic animals only (Benito-Muñoz et al., Glia, 2016; Cavaliere et al., Front Cell Neurosci, 2013). It is true that there could be differences between rat and mouse Thbs4-cell physiology, despite the 96% identity between the rat and mouse Thbs4 protein sequences (BLASTp). In vitro, we only confirm the capacity of astrocytes to internalize hyaluronan, which was a finding that we did not expect in our in vivo experiments. Indeed, these observations, notwithstanding the obvious differences between in vivo and in vitro scenarios, suggest that HA internalization by astrocytes is a cross-species event, at least in rodents. Regarding HA, hyaluronan is similar in all species, since it is a glycan (this is why there are no antibodies against HA, and one has to rely on binding proteins such as HABP to label it).

      (7) Similar comment for overexpression of naked mole rat HA.

      We chose the naked mole rat hyaluronan synthase (HAS) because it produces HA of very high molecular weight, similar to the HA found accumulated in the glial scar, at the lesion border. The naked mole rat HAS used in mice (Gorbunova Lab) is a known tool in the ECM field (Zhang et al., 2023, Nature; Tian et al., 2013, Nature).

      Reviewer 1 (Recommendation to authors):

      (1) Line 22: most of the cells that migrate out of the SVZ are not stem cells but cells further along in the lineage - neuroblasts and glioblasts.

      We thank the reviewer for this clarification. We have modified the abstract accordingly. 

      (2) In Figure 3d the MCAO group staining with GFAP looks suspiciously like ependymal cells which have been shown to be dramatically activated by stroke models.

      The picture does show ependymal cells, which are located next to the ventricle and are indeed very proliferative in stroke. However, these cells do not express Thbs4 (Shah et al., 2018, Cell). In the quantifications from the SVZ of BrdU- and IdU-injected animals (Fig 3e and f), we only take into account Thbs4+ GFAP+ cells, not GFAP+-only cells.

      (3) The TTC injury shown in Figure 5c is too low mag.

      We apologize for the low mag. We have increased the magnification two-fold without compromising resolution. The problem might also have arisen from the compression of TIF into JPEG in the PDF export process. We will address this in the revised version by carefully selecting export settings. The images we used are all publication quality (300 ppi).

      (4) How specific to HA is HABP?

      Hyaluronic Acid Binding Protein is a canonical marker for hyaluronan that is used also in ELISA to quantify it specifically, since it does not bind other glycosaminoglycans. The label has been used for years in the field for immunochemistry, and some controls and validations have been published: Deepa et al., 2006, JBC performed appropriate controls of HABP-biotin labeling using hyaluronidase (destroys labeling) and chondroitinase (preserves labeling). Soria et al., 2020, Nat Commun checked that (i) streptavidin does not label unspecifically, and (ii) that HABP staining is reduced after hyaluronan depletion in vivo with HAS inhibitor 4MU.

      (5) A number of images are out of focus and thus difficult to interpret (e.g. SFig. 4e).

      This is true. We realized that the PDF conversion process for the preprint version has severely compressed the larger images, such as the one found in Fig. S4e. We have submitted a revised version in a better-quality PDF (the final paper will have the original TIFF files). We apologize for the technical problem.

      (6) "restructuration" is not a word.

      We apologize for the mistake and thank the reviewer for the correction. We corrected “restructuration” with “reorganization” in line 67.

      (7) While much of the manuscript is well-written and logical it could use an in-depth edit to remove awkward words and phrasings.

      A native English speaker has revised the manuscript to correct these awkward phrases. All changes are labeled in red in the revised version.

      (8) Please describe why and how you used skeleton analysis for HABP in the methods, this will be unfamiliar to most readers. The one-sentence description in the methods is insufficient.

      We have modified the text accordingly, explaining in depth the logic behind the skeleton analysis (line 204). We also added several lines of text describing the image analysis in detail (CD44/HABP spots, fractal dimension, and masks for membranal HABP, among others; lines 484-494).

      Reviewer #2 (Public Review)

      Summary:

      In their manuscript, Ardaya et al have addressed the impact of ischemia-induced gliogenesis from the adult SVZ and their effect on the remodeling of the extracellular matrix (ECM) in the glial scar. They use Thbs4, a marker previously identified to be expressed in astrocytes of the SVZ, to understand its role in ischemia-induced gliogenesis. First, the authors show that Thbs4 is expressed in the SVZ and that its expression levels increase upon ischemia. Next, they claim that ischemia induces the generation of newborn astrocyte from SVZ neural stem cells (NSCs), which migrate toward the ischemic regions to accumulate at the glial scar. Thbs4-expressing astrocytes are recruited to the lesion by Hyaluronan where they modulate ECM homeostasis.

      Strengths:

      The findings of these studies are in principle interesting and the experiments are in principle good.

      Weaknesses:

      The manuscript suffers from an evident lack of clarity and precision in regard to their findings and their interpretation.

      We thank the reviewer for the valuable feedback. We hope the changes proposed improve clarity and precision throughout the manuscript.

      (1) The authors talk about Thbs4 expression in NSCs and astrocytes, but neither of both is shown in Figure 1, nor have they used cell type-specific markers.

      As we also reported to Referee #1 (major point 1), Thbs4 is widely considered in the literature to be a valid marker for newly formed astrocytes (Beckervordersandforth et al., 2010; Benner et al., 2013; Llorens-Bobadilla et al., 2015; Codega et al., 2014; Basak et al., 2018; Mizrak et al., 2019; Kjell et al., 2020; Cebrian-Silla et al., 2021). Some of the studies mentioned here and discussed in the manuscript text also associate Thbs4 with B-type cells based on scRNAseq data (Llorens-Bobadilla et al., 2015; Cebrian-Silla et al., 2021; Basak et al., 2018). Moreover, we also showed colocalization of Thbs4 with the activated stem cell marker nestin (Fig. 2), the glial marker GFAP (Fig. 3) and the dorsal NSC marker tdTOM (from electroporation, Fig. 4).

      (2) Very important for all following experiments is to show that Thbs4 is not expressed outside of the SVZ, specifically in the areas where the lesion will take place. If Thbs4 was expressed there, the conclusion that Thbs4+ cells come from the SVZ to migrate to the lesion would be entirely wrong.

      In Figure 1a, we show that Thbs4 is expressed in the telencephalon, exclusively in the neurogenic regions like SVZ, RMS and OB, together with cerebellum and VTA, which are likely not directly topographically connected to the damaged area (cortex and striatum). Regarding the origin of Thbs4+ cells, we demonstrated their SVZ origin by lineage tracking experiments after in vivo cell labeling (Fig. 4).

      (3) Next, the authors want to confirm the expression level of Thbs4 by electroporation of pThbs4-eGFP at P1 and write that this results in 20% of total cells expressing GFP, especially in the rostral SVZ. I do not understand the benefit of this sentence. This may be a confirmation of expression, but it also shows that the GFP+ cells derive from early postnatal NSCs.

      Furthermore, these cells look all like astrocytes, so the authors could have made a point here that indeed early postnatal NSCs expressing Thbs4 generate astrocytes alongside development. Here, it would have been interesting to see how many of the GFP+ cells are still NSCs.

      We thank the reviewer for this useful remark. We have rephrased this paragraph in the results section (Line 99).

      (4) In the next chapter, the authors show that Thbs4 increases in expression after brain injury. I do not understand the meaning of the graphs showing expression levels of distinct cell types of the neuronal lineage. Please specify why this is interesting and what to conclude from that.

      Also here, the expression of Thbs4 should be shown outside of the SVZ as well.

      In Fig 2, we show the temporal expression of two markers (besides Thbs4) in the SVZ. Nestin and DCX are the gold standard markers for NSCs, with DCX present in neuroblasts. This is already explained in line 119. What we didn’t explain, and now state in line 124, is that Nestin and DCX decrease immediately after ischemia (7d time-point). This probably means that the NSCs stop differentiating into neuroblasts in favor of glioblast formation. This is also supported by the experiments in the olfactory bulb depicted in Fig. S5C-H.

      (5) Next, the origin of newborn astrocytes from the SVZ upon ischemia is revealed. The graphs indicate that the authors perfused at different time points after tMCAO. Did they also show the data of the early time points? If only of the 30dpi, they should remove the additional time points indicated in the graph. In line 127 they talk about the origin of newborn astrocytes. Until now they have not even mentioned that new astrocytes are generated. Furthermore, the following sentences are imprecise: first they write that the number of slow proliferation NSCs is increased, then they talk about astrocytes. How exactly did they identify astrocytes and separate them from NSCs? Morphologically? Because both cell types express GFAP and Thbs4.

      The same problem also occurs throughout the next chapter.

      We thank the reviewer for this interesting comment. The experiment in Fig 3 combines BrdU and IdU. This is a tricky experiment: chronic BrdU is normally analyzed after 30d, because the experimenter must wait for the washout of BrdU (it labels slow-proliferating cells). Since we also wanted to label fast-proliferating cells with IdU, we used IP injections of this nucleotide at the different time points and perfused the day after. It would not make sense to show BrdU at earlier time points. We do so in Fig 3e, just to colocalize with Thbs4 and read the tendency of the experiment. However, the quantification of BrdU (not of IdU) is done only at 30 DPI, which is explained in the methods (line 407).

      “In line 127, they talk about the origin of newborn astrocytes…” 

      Indeed, we wanted to introduce in the paragraph title that ischemia induces the generation of new astrocytes, which is more clearly described in the text. We changed the paragraph title to “Characterization of Ischemia-induced cell populations”.

      “How exactly did they identify astrocytes and separate them from NSC?” 

      With this experiment, using two different protocols to label proliferating cells (BrdU vs IdU), we wanted to track the precursor cells that give rise to astrocytes and that already express the marker Thbs4. Indeed, the differences in the increase and rate of proliferation relate only to the progenitor cells that will later differentiate into astrocytes. In this experiment we only referred to astrocytes in the last sentence: “These results suggest that, after ischemia, Thbs4-positive astrocytes derive from the slow proliferative type B cells”.

      (6) "These results suggest that ischemia-induced astrogliogenesis in the SVZ occurs in type B cells from the dorsal region, and that these newborn Thbs4-positive astrocytes migrate to the ischemic areas." This sentence is a bit dangerous and bears at least one conceptual difficulty: if NSCs generate astrocytes under normal conditions and over the course of postnatal development (which they do), then local astrocytes (expressing the tdTom because they stem from a postnatal NSC) may also react to MCAO and proliferate locally. So the astrocytes along the scar do not necessarily come from adult NSCs upon injury but from local astrocytes. If the authors state that NSCs generate astrocytes that migrate to the lesion, I would like to see that no astrocytes inside the striatum carry the tdTom reporter before MCAO is induced.

      We understand the referee’s concern about the postnatal origin of astrocytes that can also be labeled with tdTom. Our hypothesis, tested at the beginning of the paper, is that SVZ-derived astrocytes derive from slow proliferative NSCs. Thus, it is reasonable that Tom+ cells can reach the cortical region in such a short time frame. This is why we assumed that local astrocytes can’t be positive for tdTom. We characterized the expression of tdTom in sham animals and observed few tdTom+ cells in the cortex and striatum (Author response image 2 and Figure S4). The expression of tdTom mainly remains in the SVZ and the corpus callosum under physiological conditions. However, proliferation of local astrocytes labeled with tdTom (early postnatal astrocytes) could explain the small percentage of tdTom+ cells in the ischemic regions that do not express Thbs4, even though this percentage could represent other cell types such as OPCs or oligodendrocytes.

      Author response image 2.

      (7) If astrocytes outside the SVZ do not express Thbs4, I would like to see it. Otherwise, the discrimination of SVZ-derived GFAP+/Thbs4+ astrocytes and local astrocytes expressing only GFAP is shaky.

      Regarding Thbs4 outside the SVZ, we already answered this in point 2 (please refer to Fig 1A). We also quantified the expression of Thbs4+/GFAP+ astrocytes in the corpus callosum, cortex and striatum of sham and MCAO mice (Figure S5a-b) and we did not observe that local astrocytes express Thbs4 under physiological conditions.

      (8) Please briefly explain what a Skeleton analysis and a Fractal dimension analysis is, and what it is good for.

      We apologize for the brief information on the Skeleton and Fractal dimension analyses. We have included a detailed explanation of these analyses in the methods (lines 484-494).

      (9) The chapter on HA is again a bit difficult to follow. Please rewrite to clarify who produces HA and who removes it by again showing all astrocyte subtypes (GFAP+/Thbs4+ and GFAP+/Thbs4-).

      We apologize for the lack of clarity. We rewrote some passages of those chapters (changes in red), trying to convey the ideas more clearly. We also changed a panel in Figure S6b-c to clarify all astrocyte subtypes that internalize hyaluronan (Thbs4+/GFAP+ and Thbs4-/GFAP+). See Author response image 3.

      Author response image 3.

      (10) Why did the authors separate dorsal, medial, and ventral SVZ so carefully? Do they comment on it? As far as I remember, astrogenesis in physiological conditions has some local preferences (dorsal?)

      We performed the electroporation protocol in the dorsal SVZ based on previous results (Figure 3 and Figure S3). NSCs produce specific neurons in the olfactory bulb according to their location in the SVZ. However, postnatal production of astrocytes mainly occurs through local astrocyte proliferation, and the SVZ contribution is very limited at this time point.

      Reviewer #3 (Public Review)

      Summary:

      The authors aimed to study the activation of gliogenesis and the role of newborn astrocytes in a post-ischemic scenario. Combining immunofluorescence, BrdU-tracing, and genetic cellular labelling, they tracked the migration of newborn astrocytes (expressing Thbs4) and found that Thbs4-positive astrocytes modulate the extracellular matrix at the lesion border by synthesis but also degradation of hyaluronan. Their results point to a relevant function of SVZ newborn astrocytes in the modulation of the glial scar after brain ischemia. This work's major strength is the fact that it is tackling the function of SVZ newborn astrocytes, whose role is undisclosed so far.

      Strengths:

      The article is innovative, of good quality, and clearly written, with properly described Materials and Methods, data analysis, and presentation. In general, the methods are designed properly to answer the main question of the authors, being a major strength. Interpretation of the data is also in general well done, with results supporting the main conclusions of this article.

      Weaknesses:

      However, there are some points of this article that still need clarification to further improve this work.

      (1) As a first general comment, is it possible that the increase in Thbs4-positive astrocytes can also happen locally close to the glia scar, through the proliferation of local astrocytes or even from local astrocytes at the SVZ? As it was shown in published articles most of the newborn astrocytes in the adult brain actually derive from proliferating astrocytes, and a smaller percentage is derived from NSCs. How can the authors rule out a contribution of local astrocytes to the increase of Thbs4-positive astrocytes? The authors also observed that only about one-third of the astrocytes in the glial scar derived from the SVZ.

      We thank the reviewer for the interesting comment. We have extended the discussion about this topic in the manuscript (lines 333-342), including the statement that about a third of glial scar astrocytes derive from the SVZ, without downplaying the role of local astrocytes. Whether the glial scar is populated by newborn astrocytes derived from the SVZ or from local astrocytes is under debate, since some groups have found a contribution from local astrocytes (Frisén group, Magnusson et al., 2014), whereas others have observed the opposite (Li et al., 2010; Benner et al., 2013; Faiz et al., 2015; Laug et al., 2019; Pous et al., 2020).

      In our study we observed that Thbs4 expression is almost absent in the cortex and striatum of sham mice. To demonstrate that newborn astrocytes are derived from the SVZ, we used two techniques: chronic BrdU treatment and cell tracing, which mainly labels SVZ neural stem cells. Fast-proliferating cells lose BrdU quickly, so local astrocytes under ischemic conditions are not BrdU-labeled. In addition, we injected IdU the day before perfusion in order to see whether local astrocytes express Thbs4 when they respond to brain ischemia. However, we did not observe proliferating local astrocytes expressing Thbs4 after MCAO (see Author response image 4).

      Author response image 4.

      As mentioned in the response to reviewer 2, the cell tracing technique could label early postnatal astrocytes. We characterized the technique and found only a small percentage of tdTom expression in the cortex and striatum of sham animals. This tdTom population could explain the percentage of tdTom+ cells in the ischemic regions that do not express Thbs4, even though this percentage could represent other cell types such as OPCs or oligodendrocytes. Taken together, the evidence suggests that the Thbs4+ astrocyte population derives from the SVZ.

      We indeed observed a small contribution of Thbs4+ astrocytes to the glial scar. However, Thbs4+ astrocytes arrive at the lesion at a critical temporal window - when local hyper-reactive astrocytes die or lose their function. We hypothesized that Thbs4+ astrocytes could help local astrocytes or replace them in reorganizing the extracellular space and the glial scar, an instrumental process for the recovery of the ischemic area. 

      (2) It is known that the local, GFAP-reactive astrocytes at the scar can form the required ECM. The authors propose a role of Thbs4-positive astrocytes in the modulation, and perhaps maintenance, of the ECM at the scar, thus participating in scar formation likewise. So, this means that the function of newborn astrocytes is only to help the local astrocytes in the scar formation and thus contribute to tissue regeneration. Why do we need specifically the Thbs4-positive astrocytes migrating from the SVZ to help the local astrocytes? Can you discuss this further?

      Unfortunately, we could not demonstrate which molecular machinery is involved in these mechanisms, and we can only speculate the functional meaning of a second wave of glial activation. We added a lengthy discussion in lines 333-342.

      (3) The authors observed that the number of BrdU- and DCX-positive cells decreased 15 dpi in all OB layers (Fig. S5). They further suggest that ischemia-induced a change in the neuroblasts ectopic migratory pathway, depriving the OB layers of the SVZ newborn neurons. Are the authors suggesting that these BrdU/DCX-positive cells now migrate also to the ischemic scar, or do they die? In fact, they see an increase in caspase-3 positive cells in the SVZ after ischemia, but they do not analyse which type of cells are dying. Alternatively, is there a change in the fate of the cells, and astrogliogenesis is increased at the expense of neurogenesis?  The authors should understand which cells are Cleaved-caspase-3 positive at the SVZ and clarify if there is a change in cell fate. Also please clarify what happens to the BrdU/DCX-positive cells that are born at the SVZ but do not migrate properly to the OB layers.

      In fact, we cannot demonstrate the fate of the missing BrdU/DCX cells in the OB. We can reasonably speculate that, following the ischemic insult, the neurogenic machinery steers toward investing more energy in generating glial cells to support the lesion. We didn’t analyze the fate of the DCX+ cells that originally should migrate to and differentiate in the OB, whether they die or whether there is a shift in the differentiation program in the SVZ, since we consider that question to be outside the study’s scope.

      (4) The authors showed decreased Nestin protein levels at 15 dpi by western blot and immunostaining shows a decrease already at 7div (Figure 2). These results mean that there is at least a transient depletion of NSCs due to the promotion of astrogliogenesis. However, the authors show that at 30dpi there is an increase of slow proliferating NSCs (Figure 3). Does this mean, that there is a reestablishment of the SVZ cytogenic process?  How does it happen, more specifically, how NSCs number is promoted at 30dpi?  Please explain how are the NSCs modulated throughout time after ischemia induction and its impact on the cytogenic process.

      The results of the chronic BrdU treatment suggest a restoration of the SVZ cytogenic process (also observed in nestin and DCX protein expression at 30 dpi). However, we did not analyze how it happens (through asymmetric or symmetric divisions). As suggested by the Encinas group, we hypothesize that brain ischemia exhausts the neurogenic niche of the SVZ through symmetric divisions of NSCs into reactive astrocytes.

      (5) The authors performed a classification of Thbs4-positive cells in the SVZ according to their morphology. This should be confirmed with markers expressed by each of the cell subtypes.

      We thank the referee for the comment. Classifying NSCs based on markers can also be tricky because different NSC types share markers. This classification was made based on the specific morphology of each NSC type. In addition, Thbs4 expression in B-type cells has also been observed in other studies (Llorens-Bobadilla et al., 2015; Cebrian-Silla et al., 2021; Basak et al., 2018).

      (6) In Figure S6, the authors quantified HABP spots inside Thbs4-positive astrocytes. Please show a higher magnification picture to show how this quantification was done.

      We quantified HABP area and HABP spots inside Thbs4+ astrocytes with a custom FIJI script. The Thbs4 cell mask was generated by automatic thresholding within the GFAP cell mask. The HABP channel was thresholded, and the resulting binary image was processed with a 1-pixel median filter (to eliminate 1-px noise-related spots). The “Analyze particles” tool was used to sort HABP spots within the cell ROI. The HABP spot number per compartment and population was exported to Excel, and the data were normalized by dividing HABP spots per ROI by total HABP spots. See Author response image 5.

      Author response image 5.
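
      The quantification steps described above (threshold, median filter, particle detection, normalization to total spots) can be sketched in plain Python. This is only an illustrative sketch and not the authors' FIJI script: the function names, the 3x3 window used for the 1-pixel median filter, the 8-connected particle definition, and the rule that a spot counts as inside the ROI only when all of its pixels are inside are assumptions made for the example.

```python
# Illustrative sketch only; all names and parameter choices are hypothetical.

def threshold(image, cutoff):
    """Binary mask: 1 where intensity exceeds the cutoff, else 0."""
    return [[1 if v > cutoff else 0 for v in row] for row in image]

def median_filter_3x3(mask):
    """3x3 median filter on a binary mask (a majority vote over the window).

    Edge pixels use only the neighbors that exist. Isolated 1-px spots vanish.
    """
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = [mask[j][i]
                      for j in range(max(0, y - 1), min(h, y + 2))
                      for i in range(max(0, x - 1), min(w, x + 2))]
            out[y][x] = 1 if sum(window) * 2 > len(window) else 0
    return out

def label_spots(mask):
    """Return the spots as lists of (y, x) pixels, using 8-connectivity."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    spots = []
    for y in range(h):
        for x in range(w):
            if mask[y][x] and not seen[y][x]:
                stack, pixels = [(y, x)], []
                seen[y][x] = True
                while stack:  # flood fill from the seed pixel
                    cy, cx = stack.pop()
                    pixels.append((cy, cx))
                    for ny in range(max(0, cy - 1), min(h, cy + 2)):
                        for nx in range(max(0, cx - 1), min(w, cx + 2)):
                            if mask[ny][nx] and not seen[ny][nx]:
                                seen[ny][nx] = True
                                stack.append((ny, nx))
                spots.append(pixels)
    return spots

def spot_fraction_in_roi(image, roi_mask, cutoff):
    """Spots lying fully inside the cell ROI, divided by total spots."""
    spots = label_spots(median_filter_3x3(threshold(image, cutoff)))
    if not spots:
        return 0.0
    inside = sum(all(roi_mask[y][x] for y, x in spot) for spot in spots)
    return inside / len(spots)
```

      Calling `spot_fraction_in_roi(habp_image, thbs4_mask, cutoff)` on nested lists of pixel intensities returns the normalized spot count for one ROI; in practice the cutoff would come from the thresholding method used in the original script, which is not specified here.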

    1. Author Response

      We thank all three Reviewers and the editors for the time and effort they put in reading and critiquing the manuscript. Our revised manuscript includes new data and analyses that address the original concerns. These include, 1) a new Supplemental Figure characterizing Cre expression and cellular phenotypes in the hippocampus, 2) new tables that give a more comprehensive picture of the EEG recordings and statistical analyses, 3) addition of whole cell electrophysiology data, and 4) rewriting to ensure that we do not state that either mTORC1 or mTORC2 hyperactivation is sufficient to cause epilepsy. We discuss the issue of statistical power to detect reduction in generalized seizure rate in the responses below. These suggestions and additions have improved the paper and we hope they will raise both significance and strength of support for the conclusions.

      Reviewer #1 (Public Review):

      Hyperactivation of mTOR signaling causes epilepsy. It has long been assumed that this occurs through overactivation of mTORC1, since treatment with the mTORC1 inhibitor rapamycin suppresses seizures in multiple animal models. However, the recent finding that genetic inhibition of mTORC1 via Raptor deletion did not stop seizures while inhibition of mTORC2 did, challenged this view (Chen et al, Nat Med, 2019). In the present study, the authors tested whether mTORC1 or mTORC2 inhibition alone was sufficient to block the disease phenotypes in a model of somatic Pten loss-of-function (a negative regulator of mTOR). They found that inactivation of either mTORC1 or mTORC2 alone normalized brain pathology but did not prevent seizures, whereas dual inactivation of mTORC1 and mTORC2 prevented seizures. As the functions of mTORC1 versus mTORC2 in epilepsy remain unclear, this study provides important insight into the roles of mTORC1 and mTORC2 in epilepsy caused by Pten loss and adds to the emerging body of evidence supporting a role for both complexes in the disease development.

      Strengths:

      The animal models and the experimental design employed in this study allow for a direct comparison between the effects of mTORC1, mTORC2, and mTORC1/mTORC2 inactivation (i.e., same animal background, same strategy and timing of gene inactivation, same brain region, etc.). Additionally, the conclusions on brain epileptic activity are supported by analysis of multiple EEG parameters, including seizure frequencies, sharp wave discharges, interictal spiking, and total power analyses.

      Weaknesses:

      (1) The sample size of the study is small and does not allow for the assessment of whether mTORC1 or mTORC2 inactivation reduces seizure frequency or incidence. This is a limitation of the study.

      We agree that this is a minor limitation of the present study; however, for several reasons we decided not to pursue this question by increasing the number of animals. First, we performed a power analysis of the existing data. This analysis showed that we would need to use 89 animals per group to detect a significant difference (0.8 Power, p = 0.05, Mann-Whitney test) in the frequency of generalized seizures in the Pten-Raptor group and 31 animals per group in the Pten-Rictor group versus Pten alone. It is simply not feasible to perform video-EEG monitoring on this many animals for a single study. Second, even if we did do enough experiments to detect a reduction in seizure frequency, it is clear that neither Rptor nor Rictor deletion provides the kind of normalization in brain activity that we seek in a targeted treatment. Both Pten-Rptor and Pten-Rictor animals still have very frequent spike-wave events (Fig. 3D) and highly abnormal interictal EEGs (Fig. 4), suggesting that even if generalized seizures were reduced, epileptic brain activity persists. This is in contrast to the triple KO animals, which have no increase in SWD above control level and very normal interictal EEG.
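
      For readers unfamiliar with this kind of calculation, a power analysis for a Mann-Whitney comparison can be approximated by simulation. The sketch below is not the procedure or software used for the numbers above, which are not specified in this response. It assumes, purely for illustration, that seizure frequencies follow exponential distributions with hypothetical group means, and it uses the normal approximation to the Mann-Whitney U statistic (with tie correction, without continuity correction).

```python
import math
import random

def mann_whitney_p(x, y):
    """Two-sided Mann-Whitney p-value via the normal approximation."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    rank_sum_x, tie_term, i = 0.0, 0.0, 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2          # average of ranks i+1 .. j
        t = j - i                          # size of this tie group
        tie_term += t ** 3 - t
        rank_sum_x += midrank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    u = rank_sum_x - n1 * (n1 + 1) / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return 1.0
    z = (u - n1 * n2 / 2) / math.sqrt(var_u)
    return math.erfc(abs(z) / math.sqrt(2))

def simulated_power(mean_a, mean_b, n_per_group, alpha=0.05, n_sim=2000, seed=0):
    """Fraction of simulated experiments in which the test rejects at alpha."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sim):
        # Hypothetical data model: exponentially distributed seizure rates.
        a = [rng.expovariate(1 / mean_a) for _ in range(n_per_group)]
        b = [rng.expovariate(1 / mean_b) for _ in range(n_per_group)]
        if mann_whitney_p(a, b) < alpha:
            hits += 1
    return hits / n_sim

def smallest_n(mean_a, mean_b, target=0.8, candidates=range(5, 201, 5)):
    """Smallest candidate group size reaching the target power, or None."""
    for n in candidates:
        if simulated_power(mean_a, mean_b, n) >= target:
            return n
    return None
```

      With real data, the exponential draws would be replaced by resampling from the observed seizure frequencies of each group; the required group size is then the smallest `n_per_group` for which the simulated power reaches 0.8.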

      (2) The authors describe that they inactivated mTORC1 and mTORC2 in a new model of somatic Pten loss-of-function in the cortex. This is slightly misleading since Cre expression was found both in the cortex and the underlying hippocampus, as shown in Figure 1. Throughout the manuscript, they provide supporting histological data from the cortex. However, since Pten loss-of-function in the hippocampus can lead to hippocampal overgrowth and seizures, data showing the impact of the genetic rescue in the hippocampus would further strengthen the claim that neither mTORC1 nor mTORC2 inactivation prevents seizures.

      Thank you for pointing out this issue. Cre expression was observed in both the cortex and the dorsal hippocampus in most animals, and we agree that differences in cortical versus hippocampal mTOR signaling could have differential contributions to epilepsy. We initially focused our studies on the cortex because spike-and-wave discharge, the most frequent and fully penetrant EEG phenotype in our model, is associated with cortical dysfunction. In our revised submission we have included a new Figure that quantifies Cre expression in the hippocampal subfields, as well as pS6, pAkt and soma size. These new data show that the amount of Cre expression in the hippocampus is not related to the occurrence of generalized seizures. The pattern of cell size changes in hippocampal neurons is the same as observed in cortical neurons. The levels of pS6 and pAkt are not much changed in the hippocampus, likely due to the sparse Cre expression there. We interpret these findings as supporting the conclusion that the reason we do not see seizure prevention by mTORC1 or mTORC2 inactivation is not due to hippocampal-specific dysfunction.

      (3) Some of the methods for the EEG seizure analysis are unclear. The authors describe that for control and Pten-Raptor-Rictor LOF animals, all 10-second epochs in which signal amplitude exceeded 400 μV at two time-points at least 1 second apart were manually reviewed, whereas, for the Pten LOF, Pten-Raptor LOF, and Pten-Rictor LOF animals, at least 100 of the highest-amplitude traces were manually reviewed. Does this mean that not all flagged epochs were reviewed? This could potentially lead to missed seizures.

      We reviewed at least 48 hours of data from each animal manually. All seizures that were identified during manual review were also identified by the automated detection program. It is possible but unlikely that there are missed seizures in the remaining data. We have added these details to the Methods of the revised submission.
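
      The flagging criterion discussed in this comment (10-second epochs in which the signal exceeds 400 μV at two time points at least 1 second apart) can be sketched as follows. This is an illustrative reading of the criterion and not the authors' detection program: the function name and parameters are hypothetical, amplitude is assumed to mean the absolute value of the signal, and a trailing partial epoch is ignored.

```python
def flag_epochs(signal_uv, sampling_rate_hz, epoch_s=10.0,
                threshold_uv=400.0, min_gap_s=1.0):
    """Return indices of epochs that meet the amplitude criterion.

    An epoch is flagged when |signal| exceeds threshold_uv at two samples
    that are at least min_gap_s apart within that epoch.
    """
    samples_per_epoch = int(epoch_s * sampling_rate_hz)
    min_gap = int(min_gap_s * sampling_rate_hz)
    flagged = []
    last_start = len(signal_uv) - samples_per_epoch
    for start in range(0, last_start + 1, samples_per_epoch):
        epoch = signal_uv[start:start + samples_per_epoch]
        above = [i for i, v in enumerate(epoch) if abs(v) > threshold_uv]
        # Two qualifying samples exist exactly when the first and last
        # supra-threshold samples are at least min_gap apart.
        if above and above[-1] - above[0] >= min_gap:
            flagged.append(start // samples_per_epoch)
    return flagged
```

      Flagged epoch indices can then be converted back to recording time (`index * epoch_s`) and queued for manual review.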

      (4) Additionally, the inclusion of how many consecutive hours were recorded among the ~150 hours of recording per animal would help readers with the interpretation of the data.

      Thank you for this recommendation. Our revised submission includes a table with more information about the EEG recordings. The number of consecutive hours recorded varied because the wireless system depends on battery life, which was inconsistent, but each animal was recorded for at least 48 consecutive hours on at least two occasions.

      (5) Finally, it is surprising that mTORC2 inactivation completely rescued cortical thickness since such pathological phenotypes are thought to be conserved down the mTORC1 pathway. Additional comments on these findings in the Discussion would be interesting and useful to the readers.

      We agree that the relationship between mTORC2, cortical thickness, and growth in general is an interesting topic with conflicting results in the literature. We didn’t add anything to the Discussion along these lines because we are up against word limits, but comment here that soma size was increased 120% by Pten inactivation and partially normalized to a 60% increase from Controls by mTORC2 inactivation (Fig. 2C). We and others have previously shown that mTORC2 inactivation (Rictor deletion) in neurons reduces brain size, neuron soma size, and dendritic outgrowth (PMIDs: 36526374, 32125271, 23569215). In our revised submission we also include new data showing that the membrane capacitance of Pten-Ric LOF neurons is normal. Thus, we do not find it completely surprising that mTORC2 inactivation reduces the cortical thickness increase caused by Pten loss. There may still be a slight increase in cortical thickness in Pten-Rictor animals, but it is statistically indistinguishable from Controls.

      Reviewer #2 (Public Review):

      Summary:

      The study by Cullen et al presents intriguing data regarding the contribution of mTOR complex 1 (mTORC1) versus mTORC2 or both in Pten-null-induced macrocephaly and epileptiform activity. The role of mTORC2 in mTORopathies, and in particular Pten loss-of-function (LOF)-induced pathology and seizures, is understudied and controversial. In addition, recent data provided evidence against the role of mTORC1 in PtenLOF-induced seizures. To address these controversies and the contribution of these mTOR complexes in PtenLOF-induced pathology and seizures, the authors injected an AAV9-Cre into the cortex of conditional single, double, and triple transgenic mice at postnatal day 0 to remove Pten, Pten+Raptor or Rictor, and Pten+raptor+rictor. Raptor and Rictor are essentially binding partners of mTORC1 and mTORC2, respectively. One major finding is that despite preventing mild macrocephaly and increased cell size, Raptor knockout (KO, decreased mTORC1 activity) did not prevent the occurrence of seizures and the rate of SWD events, and aggravated seizure duration. Similarly, Rictor KO (decreased mTORC2 activity) partially prevented mild macrocephaly and increased cell size but did not prevent the occurrence of seizures and did not affect seizure duration. However, Rictor KO reduced the rate of SWD events. Finally, the pathology and seizure/SWD activity were fully prevented in the double KO. These data suggest the contribution of both increased mTORC1 and mTORC2 in the pathology and epileptic activity of Pten LOF mice, emphasizing the importance of blocking both complexes for seizure treatment. Whether these data apply to other mTORopathies due to Tsc1, Tsc2, mTOR, AKT or other gene variants remains to be examined.

      Strengths:

      The strengths are as follows: 1) they address an important and controversial question that has clinical application, 2) the study uses a reliable and relatively easy method to KO specific genes in cortical neurons, based on AAV9 injections in pups, 3) they perform careful video-EEG analyses correlated with some aspects of cellular pathology.

      Weaknesses:

      The study nevertheless has a few weaknesses: 1) the conclusions are perhaps a bit overstated. The data do not show that increased mTORC1 or mTORC2 are sufficient to cause epilepsy. However, the data clearly show that both increased mTORC1 and mTORC2 activity contribute to the pathology and seizure activity and as such are necessary for seizures to occur.

      We agree that our findings do not directly show that either mTORC1 or mTORC2 hyperactivity is sufficient to cause seizures, as we do not individually hyperactivate each complex in the absence of any other manipulation. We interpreted our findings in this model as suggesting that either is sufficient based on the result that there is no epileptic activity when both are inactivated, and thus assumed that there is not a third, mTOR-independent, mechanism that is contributing to epilepsy in Pten, Pten-Raptor, and Pten-Rictor animals. In addition, the histological data show that Raptor and Rictor loss each normalize activity through mTORC1 and mTORC2 respectively, suggesting that one in the absence of the other is sufficient. However, we agree that there could be other potential mTOR-independent pathways downstream of Pten loss that contribute to epilepsy. We have revised the manuscript to reflect this.

      (2) The data related to the EEG would benefit from having more mice. Adding more mice would have helped determine whether there was a decrease in seizure activity with the Rictor or Raptor KO.

      Please see response to Reviewer 1’s first Weakness.

      (3) It would have been interesting to examine the impact of mTORC2 and mTORC1 overexpression related to point #1 above.

      We are not sure that overexpression of individual components of mTORC1 or mTORC2 would result in their hyperactivation or lead to increases in downstream signaling. We believe that cleanly and directly hyperactivating mTORC1 or especially mTORC2 in vivo without affecting the other complex or other potential interacting pathways is a difficult task. Previous studies have used mTOR gain-of-function mutations as a means to selectively activate mTORC1 or pharmacological agents to selectively activate mTORC2, but it is not clear to us that the former does not affect mTORC2 activity as well, or that the latter achieves activation of mTORC2 targets other than p-Akt 473, or that it is truly selective. We agree that these would be key experiments to further test the sufficiency hypothesis, but the amount of work that would be required to perform them is more than what we can do in this Short Report.

      Reviewer #3 (Public Review):

      Summary: This study investigated the role of mTORC1 and 2 in a mouse model of developmental epilepsy which simulates epilepsy in cortical malformations. Given activation of genes such as PTEN activates TORC1, and this is considered to be excessive in cortical malformations, the authors asked whether inactivating mTORC1 and 2 would ameliorate the seizures and malformation in the mouse model. The work is highly significant because a new mouse model is used where Raptor and Rictor, which regulate mTORC1 and 2 respectively, were inactivated in one hemisphere of the cortex. The work is also significant because the deletion of both Raptor and Rictor improved the epilepsy and malformation. In the mouse model, the seizures were generalized or there were spike-wave discharges (SWD). They also examined the interictal EEG. The malformation was manifested by increased cortical thickness and soma size.

      Strengths: The presentation and writing are strong. The quality of data is strong. The data support the conclusions for the most part. The results are significant: Generalized seizures and SWDs were reduced when both Torc1 and 2 were inactivated but not when one was inactivated.

      Weaknesses: One of the limitations is that it is not clear whether the area of cortex where Raptor or Rictor were affected was the same in each animal.

      Our revised submission includes new data showing that the areas of affected cortex and hippocampus are similar across groups (Figure 1A and Supplementary Figure 1).

      Also, it is not clear which cortical cells were measured for soma size.

      Soma size was measured by dividing Nissl stain images into a 10 mm² grid. The somas of all GFP-expressing cells fully within three randomly selected grid squares in Layer II/III were manually traced. Three sections per animal at approximately Bregma -1.6, -2.1, and -2.6 were used. As Cre expression was driven by the hSyn promoter, these cells include both excitatory and inhibitory cortical neurons.

      Another limitation is that the hippocampus was affected as well as the cortex. One does not know the role of cortex vs. hippocampus. Any discussion about that would be good to add.

      See response to Reviewer 1’s second Weakness.

      It would also be useful to know if Raptor and Rictor are in glia, blood vessels, etc.

      Raptor and Rictor are thought to be ubiquitously active in mammalian cells including glia and endothelial cells. Previous studies have shown that mTOR manipulation can affect astrocyte function and blood vessel organization, however, our study induced gene knockout using an AAV that expressed Cre under control of the hSyn promoter, which has previously been shown to be selective for neurons. Manual assessment of Cre expression compared with DAPI, NeuN, and GFAP stains suggested that only neurons were affected.

      Recommendations for the authors: please note that you control which revisions to undertake from the public reviews and recommendations for the authors

      Reviewer #1 (Recommendations For The Authors):

      In addition to the comments in the public review, it is recommended that the authors provide a more representative figure for p-Akt staining in the Pten LOF condition in Figure 1 D2. The current figure is not convincing.

      Thanks for the suggestion. We have replaced the images with zoomed-in panels that better demonstrate the difference.

      Additionally, in the last paragraph of the discussion, there is a reference error to an incorrect paper (reference 18) that should be corrected.

      Thanks, corrected.

      Reviewer #2 (Recommendations For The Authors):

      Major comments:

      Comment 1: Some statements need clarifications or changes.

      (1) Abstract: "spontaneous seizures and epileptiform activity persisted despite mTORC1 or mTORC2 inactivation alone but inactivating both mTORC1 and mTORC2 normalized pathology." Did inactivation of only one also normalize the pathology? Did inactivating both normalize the seizures? Pathology is not equal to seizures.

      We have altered this statement to avoid ambiguity.

      (2) Abstract: "These results suggest that hyperactivity of both mTORC1 and mTORC2 are sufficient to cause epilepsy,". Based on the abstract, it is not clear that it is sufficient. It is necessary.

      We have altered this statement by removing the term “sufficient.”

      (3) "Thus, there is strong evidence that hyperactivation of mTORC1 downstream of PTEN disruption causes the macrocephaly, epilepsy, early mortality, and synaptic dysregulation observed in humans and model organisms [17]" I would suggest adding that the strongest evidence is that mTOR GOF mutations lead to the same pathology and epilepsy, suggesting mTORC1 is sufficient. The other findings suggest that it is necessary.

      Unless we misunderstand the Reviewer’s point, we believe this viewpoint is already encompassed by the preceding text that “These phenotypes resemble those observed in models of mTORC1-specific hyperactivation.”

      (4) Introduction (end): "suggesting that hyperactivity of either complex can lead to neuronal hyperexcitability and epilepsy".

      Comment 2: I do not agree with the title based on comment 1 above. You did not provide evidence that the mTORCs cause seizures. Your data suggest that they are necessary for seizures or contribute to seizures, but there is no evidence that mTORC2 can induce seizure.

      We softened the title by replacing “cause” with “mediate.”

      Comment 3: Fig. 1B. Could you better describe the affected regions? I can see other regions than just the cortex and hippocampus.

      Almost all affected cell bodies were in the cortex and hippocampus. The virus in the image is cell-filling and as such projections from affected neurons throughout the brain can also be seen. We have clarified this in the figure legend.

      Comment 4: I feel uneasy about the number of animals recorded for EEG to assess seizure frequency. There is not enough power to draw clear conclusions. So, please make sure not to oversell your findings, since the data are all-or-nothing (seizure or no seizure) in this case and the seizure frequency could very well be decreased with single mTOR LOF, but it is impossible to conclude. Maybe discuss this limitation of your study.

      We have addressed this in the public comments response.

      Minor:

      (1) Pten LOF: define the abbreviation.

      Done

      (2) Make sure that gene names in mice are not capitalized and italicized.

      OK

      (3) Fig 1C: could you specify in the results where the analysis was done.

      Detail added to Methods (to keep Results concise for word limit)

      (4) In the subtitle: "Concurrent mTORC1/2 inactivation, but neither alone, rescues epilepsy and interictal EEG abnormalities in focal Pten LOF". Replace "rescues" with "prevents". This is not a rescue experiment since the LOF is done at the same time.

      OK

      (5) "GS did not appear to be correlated with mTOR pathway activity (Supplementary Figure 2)." Please can you do proper correlation analysis, by plotting all the values as a function of seizure frequency independent of the condition? There is also no correlation between pAKt and seizures.

      Here are those data in Author response image 1. They are now part of Supplementary Figure 2.

      Author response image 1.

      Reviewer #3 (Recommendations For The Authors):

      Figures 1 D, and E show images that are too small to judge. Where are the layers? Please add marks.

      We replaced these images with larger, zoomed-in images to show group differences more clearly. The images no longer show multiple differentiable cortical layers.

      If Fig 1 characterizes the model, where is the seizure data? When did they start? Where did they start? Was the focus of the cortical area affected by PTEN loss of function?

      Updated figure name to reflect content. Information about the seizure phenotypes is included in Figure 3.

      Figure 2 The font size for the calibration is too small. The correlations are hard to see. Colors are not easy to discriminate.

      We edited the figure to correct these problems.

      Figure 3 shows a clear effect on generalized seizures but the text of the Results does not reflect that.

      We wanted to be cautious about interpreting these data based on the issue raised by other reviewers that they are underpowered to detect seizure reduction in the Pten-Raptor and Pten-Rictor groups. We have updated the language to attempt to strike a better balance between over- and under-interpretation. We also performed an additional analysis of the occurrence of generalized seizures to emphasize that only Control and PtRapRic animals have significantly lower seizure occurrence than Pten LOF mice (Fig 3C).

      For interictal power, was the same behavioral state chosen? Was a particular band affected?

      Epochs to be analyzed were selected automatically and were agnostic to behavioral state. Band-specific effects are outlined in Figure 4B and Table [2].

      There is no information about whether the model exhibits altered sleep, food intake, weight, etc.

      We didn’t collect information on food intake. It would be possible to look at sleep from the EEG, but that is not something that we are prepared to do at this point. Weight at endpoint was not different between genotypes but we did not collect longitudinal data on weight.

      Were the sexes different?

      Included in new Table [1]

      Where were EEG electrodes and were they subdural or not?

      Additional detail on this has been added to Methods. The screws are placed in the skull but above the dura.

      How long were continuous EEG records- the method just says 150 hr. per mouse in total.

      Included in new Table [1]

      The statistics don't discuss power, normality, whether variance was checked to ensure it did not differ significantly between groups, or whether data are mean +- sem or sd. For ANOVAs, were there multifactorial comparisons and what were F, df, and p values? Exact p for post hoc tests?

      We have added a new table (Table [3]) that gives information on the exact test used, F, p values, and exact p for post hoc tests. Information regarding power, normality, variance, post-tests and multiple comparisons corrections has been added to the Methods section “Statistical Analysis.”

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This valuable study addresses the long-term effect of warming and altered precipitation on microbial growth, as a proxy for understanding the impact of global warming. While the methods are compelling and the evidence supporting the claims is solid, additional analysis of the data would strengthen the study, which should be of broad interest to microbial ecologists and microbiologists.

      We sincerely appreciate your assessment and thoughtful comments, which are valuable and very helpful for improving our manuscript. We have carefully considered all comments and made extensive, thorough corrections and additional analyses of the data, which we hope will meet with approval.

      Reviewer #1 (Public Review):

      Warming and precipitation regime change significantly influences both above-ground and below-ground processes across Earth's ecosystems. Soil microbial communities, which underpin the biogeochemical processes that often shape ecosystem function, are no exception to this, and although research shows they can adapt to this warming, population dynamics and ecophysiological responses to these disturbances are not currently known. The Qinghai-Tibet Plateau, the Third Pole of the Earth, is considered among the most sensitive ecosystems to climate change. The manuscript described an integrated, trait-based understanding of these dynamics with the qSIP data. The experimental design and methods appear to be of sufficient quality. The data and analyses are of great value to the larger microbial ecological community and may help advance our understanding of how microbial systems will respond to global change. There are very few studies in which the growth rates of bacterial populations from multifactorial manipulation experiments on the Qinghai-Tibet Plateau have been investigated via qSIP, and the large quantity of data that comprises the study described in this manuscript, will substantially advance our knowledge of bacterial responses to warming and precipitation manipulations.

      We appreciate the encouragement and positive comments.

      Specific comments:

      (1) Please add some names of microbial groups with most common for the growth rates.

      We have added the sentence “The members in Solirubrobacter and Pseudonocardia genera had high growth rates under changed climate regimes” in the Abstract (Line 57-58).

      (2) L47-48, consider changing "microbial growth and death" to "microbial eco-physiological processes (e.g., growth and death)", and changing "such eco-physiological traits" to "such processes".

      Done (Line 47 and 48).

      (3) L50-51, the author estimated bacterial growth in alpine meadow soils of the Tibetan Plateau after warming and altered precipitation manipulation in situ. Actually, the soil samples were collected and incubated in the laboratory rather than in the field like the previous experiment conducted by Purcell et al. (2021, Global Change Biology). "In situ" would lead me to believe that the qSIP incubation was conducted in the field, so I think the use of the word in situ is inappropriate here. [https://onlinelibrary.wiley.com/doi/full/10.1111/gcb.15911]

      Agreed. We have deleted “in situ”.

      (4) L52, what does "interactive global change factors" mean?

      We have revised this sentence to “the growth of major taxa was suppressed by the single and combined effects of temperature and precipitation” (Line 52-53).

      (5) L61, in my opinion, "Microbial diversity" belongs to the category of species composition, rather than ecosystem functional services. Please revise it.

      Agree. We have deleted it.

      (6) L69, consider changing "further" to "thus".

      Done (Line 70).

      (7) L82, delete "The evidence is overwhelming that".

      Done.

      (8) L85-90, these two sentences have similar meanings, please express them concisely.

      We have deleted the sentence “Altered precipitation, particularly drought or heavy precipitation events, also tends to negatively influence soil processes and biodiversity”.

      (9) L91, the effect of drought on soil microorganisms is lacking here.

      We have added the sentence “Reduced precipitation affects soil processes notably by directly stressing soil organisms, and also altering the supply of substrates to microbes via dissolution, diffusion, and transport” in the Introduction (Line 87-89).

      (10) L102, "Growth" should be highlighted here, as changes in relative abundance can also be classified as population dynamics. The use of the term "population dynamics" will eliminate the highlight of this study in calculating the growth rate of microbial species in in-situ soil based on qSIP. Consider changing "population dynamics" to "population-growth responses" or something like that.

      Done (Line 98).

      (11) L105, please note that this citation focuses on plant physiological characteristics.

      We have revised the reference (Line 102).

      (12) L115, "soil temperature, water availability" should be considered as a direct impact of climate change, rather than an indirect impact on microorganisms.

      We have deleted them.

      (13) L134-135, please clarify the interaction types between which climate factors.

      We have deleted this sentence.

      (14) L135-138, suggest modifying or deleting this sentence. The results in this study are already eco-physiological data and do not need to be further "understood and predicted".

      We have deleted this sentence.

      (15) L150, "The experimental design has been described in previously". I think this refers to another study and not the actual incubations in this study. Also in L198, suggest a change to "Incubation conditions were similar to those previously described". So, it's clear it's not the same experiment.

      We have revised these sentences to “has been described previously in (Ma et al., 2017)” (Line 136) and “according to a previous publication” (Line 194).

      Reference:

      Ma, Z., Liu, H., Mi, Z., Zhang, Z., Wang, Y., Xu, W. et al. (2017). Climate warming reduces the temporal stability of plant community biomass production. Nature Communications, 8, 15378.

      (16) L188, change "pre-wet soil samples" to "pre-wet samples" and change "soil samples for 48h incubation" to "incubation samples". What does "pre-wet" mean? Does it represent soil pre-cultivation?

      Done. The pre-wet samples, i.e., the soil samples before incubation (T = 0 d), were used to estimate the initial microbial composition. "pre-wet" does not mean soil pre-cultivation. We have added the description “A portion of the air-dried soil samples was taken as the pre-wet treatment (i.e., before incubation without H2O addition)” in MATERIALS AND METHODS (Line 174-175).

      (17) Unify the time unit of incubation (hour or day). Consider changing "48 h" to "2 d" in Materials and Methods.

      Done.

      (18) L247, what version of RDP Classifier was used?

      We used the RDP v16 database for taxonomic annotation. We have added this information in the revision (Line 246).

      (19) L270, "average molecular weights".

      Done (Line 268).

      (20) L272-275, based on the preceding description, it appears that the culture period was limited to 48 hours. Please confirm it.

      We apologize for this mistake. We have revised it (Line 273).

      (21) L297, switch the order of the first two sentences of this paragraph.

      Done (Line 297).

      (22) L331, change "smaller-than-additive" to "smaller than their expected additive effect".

      Done (Line 331).

      (23) L374 and 381, I struggle with why "larger combined effects" than single factor effects represent higher degree of antagonism, and I think it should be "smaller combined effects".

      Agree. We have revised it according to this suggestion (Line 369 and 374).

      (24) L375, remove "than that of drought and warming".

      Done.

      (25) L405, simplify the expression, change "between different warming and rainfall regimes" to "between climate regimes"

      We have deleted this sentence.

      (26) L406-408, species are already on the phylogenetic tree and they can not "clustered at the phylogenetic branches", but the functional traits of microbes can. Please revise it.

      We have revised this sentence to “Overall, the most incorporators whose growth was influenced by the antagonistic interaction of T × P showed significant phylogenetic clustering (i.e., species clustered at the phylogenetic branches; NTI > 0, P < 0.05)” (Line 402-404).

      (27) L409, the same as above, and consider removing "The incorporators subjected to".

      We have revised this sentence to “The incorporators whose growth subjected to the additive interaction of warming × drought also showed significant phylogenetic clustering (P < 0.05)” (Line 404-406).

      (28) L412, consider changing "incorporators subjected to the synergistic interaction" to "the synergistic growth responses under multifactorial changes".

      We have revised the sentence to “incorporators whose growth is influenced by the synergistic interaction showed phylogenetically random distribution under both climate scenarios (P > 0.05)” (Line 407-409).

      (29) L505-506, please add a reference for this sentence.

      Done (Line 488).

      (30) L511-514, It should be noted that the production of MBC does not necessarily imply a net change in the C pool size. The accelerated growth rates may result in expedited turnover of MBC, rather than an increase in carbon sequestration.

      Thanks. We have deleted this sentence.

      (31) Language precision. In the discussion section there must be some additional caveats introduced to some of the claims the authors are making. For instance, L518, the author should clarify that "in this study, the bacterial growth in alpine grassland may be influenced by antagonistic interactions between multiple climatic factors after a decadal-long experiment". Because other studies may exhibit different results due to the focus on different ecosystem functions as well as environmental conditions. As such, softening of the language is recommended- lines are noted below- and these will not adjust the outcomes of this study, but support more precise interpretation.

      We have revised the sentence to “In this study, a decade-long experiment revealed that bacterial growth in alpine meadows is primarily influenced by the antagonistic interaction between T × P” (Line 497-499).

      (32) PICRUSt analysis is a good way to connect species and their functions, especially PICRUSt2, which updated the reference database and optimized the algorithm to improve its prediction accuracy (Douglas et al., 2020, Nature Biotechnology). However, the link between microbial taxonomy and microbial metabolism is still not straightforward, especially in diverse microbial communities like soils. The authors should introduce caveats within discussion that they know the limitations of their methods. For context, as a reader who does metabolisms in soils, I found myself somewhat disappointed when PICRUSt data was introduced and not properly caveated. Particularly, it might be helpful to introduce briefly in the last paragraph of the results. These caveats are necessary to not potentially overstate the author's findings, and to make sure the reader knows the authors understand the very clear limitations of these methods. [https://www.nature.com/articles/s41587-020-0548-6]

      Thanks. We have introduced caveats in DISCUSSION, that is “This is, however, still to be verified, as the functional output from PICRUSt2 is less likely to resolve rare environment-specific functions (Douglas et al., 2020)” (Line 540-542).

      Reference:

      Douglas, G., Maffei, V., Zaneveld, J., Yurgel, S., Brown, J., Taylor, C. et al. (2020). PICRUSt2 for prediction of metagenome functions. Nature Biotechnology, 38, 685-688.

      (33) Although the author has explained the potential causes for the negative effects of different climate change factors (i.e., warming, drought, and wet) on microbial growth, there seems to be a lack of a summary assertion and an extension on how climate change affects microbial growth and related ecosystem functions. It is recommended to make a general summary of the results in the last part of Discussion.

      We have added a general summary in the last paragraph of DISCUSSION, that is “Our results demonstrated that both warming and altered precipitation negatively affect the growth of grassland bacteria; However, the combined effects of warming and altered precipitation on the growth of ~70% soil bacterial taxa were smaller than the single-factor effects, suggesting antagonistic interaction. This suggests the development of multifactor manipulation experiments in precise prediction of future ecosystem services and feedbacks under climate change scenarios” (Line 552-558).

      (34) L546, please add the taxonomic information for "OTU 14".

      Done (Line 533).

      (35) L800, change "The phylogenetic tree" to "A phylogenetic tree".

      Done (Line 762).

      Reviewer #2 (Public Review):

      Summary:

      The authors aimed to describe the effect of different temperature and precipitation regimes on microbial growth responses in an alpine grassland ecosystem using quantitative 18O stable isotope probing. It was found that all climate manipulations had negative effects on microbial growth, and that single-factor manipulations exerted larger negative effects as compared to combined-factor manipulations. The degree of antagonism between factors was analyzed in detail, as well as the differential effect of these divergent antagonistic responses on microbial taxa that incorporated the isotope. Finally, a hypothetical functional profiling was performed based on taxonomic affiliations. This work gives additional evidence that altered warming and precipitation regimes negatively impact microbial growth.

      Strengths:

      A long term experiment with a thorough experimental design in apparently field conditions is a plus for this work, making the results potentially generalisable to the alpine grassland ecosystem. Also, the implementation of a qSIP approach to determine microbial growth ensures that only active members of the community are assessed. Finally, particular attention was given to the interaction between factors and a robust approach was implemented to quantify the weight of the combined-factor manipulations on microbial growth.

      We appreciate the reviewer’s positive comments.

      Weaknesses:

      The methodology does not mention whether the samples taken for the incubations were rhizosphere soil, bulk soil or a mix between both type of soils. If the samples were taken from rhizosphere soil, I wonder how the plants were affected by the infrared heaters and if the resulting shadow (also in the controls with dummy heaters) had an effect on the plants and the root exudates of the parcels as compared to plants outside the blocks? If the samples were bulk soil, are the results generalisable for a grassland ecosystem? In my opinion, it is needed to add more info on the origin of the soil samples and how these were taken.

      The samples taken for the incubations can be considered as a mixture of rhizosphere and bulk soils. During soil sampling, we did not use conventional rhizosphere soil collection methods. However, there is a certain proportion of fragmented roots in the soil samples we collected, indicating that soil properties are influenced by plants. We have added this description in MATERIALS AND METHODS (Line 158).

      To minimize the impact of physical shading on the plants, each sampling point was as far away from infrared heaters as possible. We have added this information of soil collection in MATERIALS AND METHODS, that is “In each plot, three soil cores of the topsoil (0-5 cm in depth) were randomly collected and combined as a composite sample, which can be considered as a mixture of rhizosphere and bulk soils. Each sampling point was as far away from infrared heaters as possible to minimize the impact of physical shading on the plants. The fresh soil samples were shipped to the laboratory and sieved (2-mm) to remove root fragments and stones.” (Line 157-162).

      Previous studies based on our field experiment assessed the effects of warming and altered precipitation on soil microbial communities (Zhang et al., 2016), the temporal stability of plant community biomass (Ma et al., 2017), and shifting plant species composition and grassland primary production (Liu et al., 2018). These studies provided guidance for the experimental design and execution.

      Reference:

      Zhang, KP., Shi, Y., Jing, X. et al. (2016). Effects of Short-Term Warming and Altered Precipitation on Soil Microbial Communities in Alpine Grassland of the Tibetan Plateau. Frontiers in Microbiology, 7, 1-11.

      Ma, ZY., Liu, HY., Mi, ZR. et al. (2017). Climate warming reduces the temporal stability of plant community biomass production. Nature Communications, 8, 15378.

      Liu, HY., Mi, ZR., Lin, L. et al. (2018). Shifting plant species composition in response to climate change stabilizes grassland primary production. Proceedings of the National Academy of Sciences, 115, 4051-4056.

      The qSIP calculations reported in the methodology for this work are rather superficial and the reader must be experienced in this technique to understand how the incorporators were identified and their growth quantified. For instance, the GC content of taxa was calculated for reads clustered in OTUs, and it is not discussed in the text the validity of such approach working at genus level.

      We have added the description of qSIP calculations in Supplementary Materials.

      The approach of GC content calculation can be used at genus level (Koch et al., 2018). The GC content of each bacterial taxon (Gi) was calculated using the mean density for the unlabeled (WLIGHTi) treatments (Hungate et al. 2015), rather than OTU sequence information. We have revised the sentence in MATERIALS AND METHODS, that is “the number of 16S rRNA gene copies per OTU taxon (e.g., genus or OTU) in each density fraction was calculated by multiplying the relative abundance (acquisition by sequencing) by the total number of 16S rRNA gene copies (acquisition by qPCR)” (Line 255-258).
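
      The calculation described above can be sketched in a few lines. This is a minimal illustration, not the authors' pipeline: the function names and all input numbers are hypothetical, and the linear density-to-GC relation is the one we recall from Hungate et al. (2015), so the constants should be checked against that paper before any real use.

```python
# Illustrative sketch only: function names and input values are hypothetical.

def taxon_copies_per_fraction(rel_abundance, total_copies):
    """Copies of one taxon in each fraction = relative abundance (sequencing)
    multiplied by total 16S rRNA gene copies (qPCR)."""
    return [r * t for r, t in zip(rel_abundance, total_copies)]


def weighted_mean_density(densities, copies):
    """Mean buoyant density of a taxon, weighted by its copies in each fraction."""
    total = sum(copies)
    if total == 0:
        raise ValueError("taxon has no gene copies in any fraction")
    return sum(d * c for d, c in zip(densities, copies)) / total


def gc_from_light_density(w_light):
    """GC fraction estimated from the unlabeled mean density.
    Assumed relation (recalled from Hungate et al., 2015):
    density = 0.083506 * GC + 1.646057."""
    return (w_light - 1.646057) / 0.083506


# Made-up values for one taxon across seven density fractions.
densities = [1.703, 1.707, 1.711, 1.715, 1.719, 1.723, 1.727]   # g/ml
rel_abundance = [0.01, 0.02, 0.05, 0.08, 0.04, 0.02, 0.01]      # from sequencing
total_copies = [1e5, 4e5, 9e5, 1.2e6, 6e5, 2e5, 5e4]            # from qPCR

copies = taxon_copies_per_fraction(rel_abundance, total_copies)
w_light = weighted_mean_density(densities, copies)
gc = gc_from_light_density(w_light)
print(f"weighted mean density = {w_light:.4f} g/ml, estimated GC = {gc:.3f}")
```

      Because the GC estimate depends only on the unlabeled density distribution, it does not require the OTU representative sequence, which is the point made in the response above.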

      Reference:

      Hungate, B., Mau, R., Schwartz, E., Caporaso, J., Dijkstra, P., Van Gestel, N. et al. (2015). Quantitative microbial ecology through stable isotope probing. Applied and Environmental Microbiology, 81, 7570-7581.

      Koch, B., McHugh, T., Hayer, M., Schwartz, E., Blazewicz, S., Dijkstra, P. et al. (2018). Estimating taxon-specific population dynamics in diverse microbial communities. Ecosphere, 9, e02090.

      The selection of V4-V5 region over V3-V4 region to quantify the number of copies of the 16S rRNA gene should be substantiated in the text. Classic works determined one decade ago that primer pairs that amplify V3-V4 are most suitable to assess soil bacterial communities. Hungate et al. (2015), worked with the V3-V4 region when establishing the qSIP method. Maybe the number of unassigned OTUs is related with the selection of this region.

      Both primer sets (V3-V4 and V4-V5 regions) are widely used across various sample sets and give highly similar representations of the total microbial community composition (Fadeev et al., 2021; Zhang et al., 2018).

      A previous study based on our Field Research Station of Alpine Grassland Ecosystem used V4-V5 primer pairs to investigate the effect of warming and altered precipitation on the overall bacterial community composition (Zhang et al., 2016).

      Another reason for choosing the V4-V5 primer set in this study was to integrate and compare the data with those of two previous qSIP studies (Ruan et al., 2023; Guo et al., submitted), both of which focused on the growth responses of active species to global change and used V4-V5 primer pairs.

      We have added an explanation about primer selection as “The V4-V5 primer pairs were chosen to facilitate integration and comparison with data from previous studies (Ruan et al., 2023; Zhang et al., 2016)” (Line 213-215).

      Reference:

      Fadeev, E., Cardozo-Mino, M.G., Rapp, J.Z. et al. (2021). Comparison of Two 16S rRNA Primers (V3–V4 and V4–V5) for Studies of Arctic Microbial Communities. Frontiers in Microbiology, 12

      Zhang, J.Y., Ding, X., Guan, R. et al. (2018). Evaluation of different 16S rRNA gene V regions for exploring bacterial diversity in a eutrophic freshwater lake. Science of The Total Environment, 618, 1254-1267.

      Zhang, K.P., Shi, Y., Jing, X. et al. (2016). Effects of Short-Term Warming and Altered Precipitation on Soil Microbial Communities in Alpine Grassland of the Tibetan Plateau. Frontiers in Microbiology, 7, 1-11.

      Ruan, Y., Kuzyakov, Y., Liu, X. et al. (2023). Elevated temperature and CO2 strongly affect the growth strategies of soil bacteria. Nature Communications, 14, 1-12.

      Guo, J.J., Kuzyakov, Y., Li, L. et al. (2023). Bacterial growth acclimation to long-term nitrogen input in soil. The ISME Journal, Submitted.

      Report of preprocessing and processing of the sequences does not comply state of the art standards. More info on how the sequences were handled is needed, taking into account that a significant part of the manuscript relies on taxonomic classification of such sequences. Also, an OTU approach for an almost species-dependent analysis (GC contents) should be replaced or complemented with an ASV or subOTUs approach, using denoisers such as DADA2 or deblur. Usage of functional prediction tools underestimates gene frequencies, including those related with biogeochemical significance for soil-carbon and nitrogen cycling.

      (1) We have complemented the information about sequence processing as “The raw sequences were quality-filtered using the USEARCH v.11.0 (Edgar, 2010). In brief, the paired-end sequences were merged and quality filtered with “fastq_mergepairs” and “fastq_filter” commands, respectively. Sequences < 370 bp and total expected errors > 0.5 were removed. Next, “fastx_uniques” command was implemented to remove redundant sequences. Subsequently, high-quality sequences were clustered into operational taxonomic units (OTUs) with “cluster_otus” command at a 97% identity threshold, and the most abundant sequence from each OTU was selected as a representative sequence.” (Line 238-245).

      (2) We have complemented the zero-radius OTU (ZOTU) analysis by the unoise3 command in USEARCH (https://drive5.com/usearch/manual/pipe_otus.html), as shown in Fig. S1-S2. The results showed that overall growth responses of soil bacteria to warming and precipitation changes were similar based on OTU and ZOTU analyses, i.e., warming and altered precipitation tend to negatively affect the growth of grassland bacteria and the prevalence of antagonistic interactions of T × P. The similarity of results between the different methods is reflected at the overall community level, the phylum level, the genus level and the species (i.e., OTU or ZOTU) level (Fig. S1 and S2).

      Author response image 1.

      The growth responses of grassland bacteria to warming and altered precipitation based on ZOTU analysis. The results of growth rates at the community level (A), the phylum level (B), and the ZOTU level (C and D) were similar to those based on OTU analysis. C the single and combined factor effects of climate factors on species growth, by comparing with the growth rates in T0nP. D the proportions of species growth influenced by different interaction types of T × P. T0-P represents the ambient temperature and decreased precipitation; T+-P represents warming and decreased precipitation; T0cP represents ambient temperature and precipitation; T+cP represents warming and ambient precipitation; T0+P represents ambient temperature and enhanced precipitation; T++P represents warming and enhanced precipitation. Values represent mean and the error bars represent standard deviation. Different letters indicate significant differences between climate treatments.

      Author response image 2.

      The growth responses of grassland bacteria at the genus level to warming and altered precipitation based on OTU analysis (A and C) and ZOTU analysis (B and D). A and B the single and combined factor effects of climate factors on growth in genera, by comparing with those in T0nP. C and D the proportions of genera whose growth was influenced by different interaction types of T × P.

      (3) Agreed. We have introduced the caveat about the limitations of functional prediction tools at the end of DISCUSSION, that is “This is, however, still to be verified, as the functional output from PICRUSt2 is less likely to resolve rare environment-specific functions (Douglas et al., 2020)” (Line 540-542). The caveat ensures that the reader knows the limitations of these methods and that we do not overstate our findings.

      Reference:

      Douglas, G.M., Maffei, V.J., Zaneveld, J.R. et al. (2020) PICRUSt2 for prediction of metagenome functions. Nat Biotechnol. 38, 685–688.
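
      The length and expected-error criteria quoted in point (1) above can be illustrated with a short sketch. This is not USEARCH code; it only re-expresses the stated thresholds (reads shorter than 370 bp or with total expected errors above 0.5 are removed), using the standard definition of expected errors as the sum of per-base error probabilities derived from Phred scores. The function names are hypothetical.

```python
# Illustrative re-expression of the filtering criteria; not USEARCH itself.

def expected_errors(phred_scores):
    """Total expected errors: sum of per-base error probabilities 10^(-Q/10)."""
    return sum(10 ** (-q / 10) for q in phred_scores)


def passes_filter(sequence, phred_scores, min_length=370, max_ee=0.5):
    """Keep a merged read only if it is long enough and accurate enough."""
    if len(sequence) != len(phred_scores):
        raise ValueError("sequence and quality scores differ in length")
    return len(sequence) >= min_length and expected_errors(phred_scores) <= max_ee


# Made-up reads: a long high-quality read, a long low-quality read, a short read.
good = passes_filter("A" * 400, [40] * 400)   # 400 bases at Q40
noisy = passes_filter("A" * 400, [20] * 400)  # 400 bases at Q20
short = passes_filter("A" * 300, [40] * 300)  # below the length threshold
print(good, noisy, short)
```

      Filtering on total expected errors rather than mean quality penalizes long reads with many moderately uncertain bases, which is why a read of uniformly middling quality can fail even though no single base looks poor.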

      Reviewer #2 (Recommendations For The Authors):

      General suggestions:

      Regarding the qSIP method, would be of help to see the differences in density vs number of 16S rRNA gene abundance for the most responsive bacterial groups in the different treatments, taking into account that with only 7 fractions the entire change in bacterial growth was resolved.

      We have selected three representative bacterial taxa (OTU1 belonging to Bradyrhizobium, OTU14 belonging to Solirubrobacter, OTU15 belonging to Pseudoxanthomonas), which have high growth rates in the climate change treatments. The results showed that the peaks in the 18O treatment are shifted "backwards" (greater weighted average buoyant density) compared to the 16O treatment, indicating that these species assimilate the 18O isotope into their DNA during growth.

      Author response image 3.

      The distribution of 16S rRNA gene abundance of three representative bacterial taxa (OTU1- Bradyrhizobium, OTU14-Solirubrobacter, and OTU15-Pseudoxanthomonas) in different buoyant density fractions. Values represent mean and the error bars represent standard deviation.

      Seven fractionated DNA samples were selected for sequencing because they contained more than 99% of the gene copies of each sample (please see the figure below). The DNA concentrations of the other fractions were too low to construct sequencing libraries.

      Author response image 4.

      Relative abundance of 16S rRNA gene copies in each fraction. The fractions with density between 1.703 and 1.727 g ml-1 were selected because they contained more than 99% gene copy numbers of each sample. T0-P represents the ambient temperature and decreased precipitation; T+-P represents warming and decreased precipitation; T0cP represents ambient temperature and precipitation; T+cP represents warming and ambient precipitation; T0+P represents ambient temperature and enhanced precipitation; T++P represents warming and enhanced precipitation. Values represent mean and the error bars represent standard deviation.
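
      The selection criterion in the caption above amounts to checking what share of a sample's gene copies falls inside the chosen density window. A minimal sketch follows; the function name and the example fractions are hypothetical, and only the 1.703 to 1.727 g ml-1 window comes from the caption.

```python
# Illustrative sketch only: the example fractions are made up.

def share_in_window(densities, copies, low=1.703, high=1.727):
    """Fraction of all 16S rRNA gene copies found in fractions whose
    buoyant density lies within [low, high] (g/ml)."""
    total = sum(copies)
    if total == 0:
        raise ValueError("sample has no gene copies")
    selected = sum(c for d, c in zip(densities, copies) if low <= d <= high)
    return selected / total


# Five hypothetical fractions; the outer two fall outside the window.
densities = [1.690, 1.703, 1.715, 1.727, 1.740]
copies = [1, 40, 50, 9, 0]
share = share_in_window(densities, copies)
print(f"share of copies in selected fractions: {share:.2%}")
```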

      With such dataset additional multivariate analysis would be of help to better interpret the ecological framework.

      Thanks for the suggestion. Interpreting the ecological framework is meaningful for understanding microbial responses to environmental changes.

      The main objective of this study is to investigate the growth response of soil microbes in alpine grasslands to temperature and precipitation changes, and the interaction between climate factors. Our results, as well as the results of complementary analyses (based on the sub-OTU (ZOTU) analyses shown in Author response images 1 and 2), indicate that warming and altered precipitation tend to negatively affect the growth of grassland bacteria, and that antagonistic interactions of T × P are prevalent.

      We have emphasized our research objectives and main conclusions in the revised manuscript: “The goal of current study is to comprehensively estimate taxon-specific growth responses of soil bacteria following a decade of warming and altered precipitation manipulation on the alpine grassland of the Tibetan Plateau” (Line 112-114);

      “Our results demonstrated that both warming and altered precipitation negatively affect the growth of grassland bacteria; However, the combined effects of warming and altered precipitation on the growth of ~70% soil bacterial taxa were smaller than the single-factor effects, suggesting antagonistic interaction” (Line 552-556).
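
      The distinction between additive, antagonistic and synergistic interactions summarized above can be made concrete with a simplified sketch. It assumes that each effect is expressed as the change in growth rate relative to the control and that the additive expectation is the sum of the two single-factor effects. The function name and the fixed tolerance are hypothetical; this is not the statistical procedure used in the manuscript, which would normally rely on effect sizes with confidence intervals.

```python
# Simplified illustration; the tolerance-based rule is an assumption.

def classify_interaction(effect_a, effect_b, effect_combined, tolerance=0.05):
    """Compare the observed combined effect with the additive expectation.

    effect_a, effect_b: single-factor effects relative to the control.
    effect_combined:    effect of applying both factors together.
    """
    expected = effect_a + effect_b
    if abs(effect_combined - expected) <= tolerance:
        return "additive"
    if abs(effect_combined) < abs(expected):
        return "antagonistic"   # combined effect smaller than expected
    return "synergistic"        # combined effect larger than expected


# Made-up effects of warming (-0.3) and drought (-0.2) on a growth rate.
print(classify_interaction(-0.3, -0.2, -0.25))
print(classify_interaction(-0.3, -0.2, -0.50))
print(classify_interaction(-0.3, -0.2, -0.80))
```

      In the first call the two factors together reduce growth by less than the sum of their separate effects, which is the antagonistic pattern reported for most taxa in this study.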

      Extension of interaction analysis and its conclusions should be shortened, summarizing the most relevant findings. In my opinion, it becomes a bit redundant.

      We have shortened the discussion of the interaction analysis by deleting less relevant content.

      Below are some, but not all, examples that have been deleted or revised in the Discussion:

      (1) Deleted “This result supports our second hypothesis that the interactive effects between warming and altered precipitation on soil microbial growth are not simply additive”;

      (2) Deleted “A previous study suggested that multiple global change factors had negative effects on soil microbial diversity (Yang et al., 2021)”;

      (3) Revised “A meta‐analysis of experimental manipulation revealed that the combined effects of different climate factors were usually less than expected additive effects, revealing antagonistic interactions on soil C storage and nutrient cycling processes (Dieleman et al., 2012; Wu et al., 2011). Moreover, two experimental studies on N cycling and net primary productivity demonstrated that the majority of interactions among multiple factors are antagonistic rather than additive or synergistic, thereby dampening the net effects (Larsen et al., 2011; Shaw et al., 2002)” to “A range of ecosystem processes have been revealed to be potentially subject to antagonistic interactions between climate factors, for instance, net primary productivity (Shaw et al., 2002), soil C storage and nutrient cycling processes (Dieleman et al., 2012; Wu et al., 2011; Larsen et al., 2011)” (Line 499-503);

      (4) Revised “Previous evidences suggest that warming has a negative impact on soil carbon pools (Jansson & Hofmockel, 2020; Purcell et al., 2022). During the first phase of soil warming (~ 10 years), microbial activity increased, resulting in rapid soil carbon mineralization and respiration (Melillo et al., 2017)” to “Previous evidences suggest that warming has a negative impact on soil carbon pools (Jansson & Hofmockel, 2020; Purcell et al., 2022), mainly because of the rapid soil carbon mineralization and respiration (Melillo et al., 2017)” (Line 464-466).

      I strongly suggest a functional analysis based on shotgun sequencing or RNAseq approaches. With this approach this work would be able to answer who is growing under altered T and Precipitation regimes and what are those that are growing doing.

      Thanks for the suggestion. Metagenomic sequencing is a popular approach for evaluating the potential functions of microbial communities in the environment. However, two main reasons limit the application of metagenomic or metatranscriptomic sequencing in this study: 1) most of the fractionated samples in the SIP experiment have low DNA concentrations and do not meet the requirements of library construction for sequencing; 2) metagenomics and metatranscriptomics usually have relatively low sensitivity to rare species, reducing the diversity of detected active species.

      This study focused on active microbial taxa and their growth in response to multifactorial climate change. We have added the prospect in DISCUSSION, that is “This suggests the development of methods combining qSIP with metagenomes and metatranscriptomes to assess the functional shifts of active microorganisms under global change scenarios” (Line 542-544).

      Minor suggestions:

      L121. _As

      We have deleted this sentence and relocated the hypotheses to the last paragraph of INTRODUCTION (according to the suggestion of Reviewer #3).

      Line150. Described previously in.

      Done (Line 136).

      Line500. Check whether it is better to use the word acclimatization (Coordinated response to several simultaneous stressors) in exchange of acclimation

      We have revised it according to this suggestion (Line 481).

      Fig.4C Drought

      Done (Line 761).

      Reviewer #3 (Public Review):

      Summary:

      In this paper, Ruan et al. studied the long-term impact of warming and altered precipitations on the composition and growth of the soil microbial community. The researchers adopted an experimental approach to assess the impact of climate change on microbial diversity and functionality. This study was carried out within a controlled environment, wherein two primary factors were assessed: temperature (in two distinct levels) and humidity (across three different levels). These factors were manipulated in a full factorial design, resulting in a total of six treatments. This experimental setup was maintained for ten years. To analyze the active microbial community, the researchers employed a technique involving the incorporation of radiolabeled water into biomolecules (particularly DNA) through quantitative stable isotope probing. This allowed for the tracking of the active fraction of microbes, accomplished via isopycnic centrifugation, followed by Illumina sequencing of the denser fraction. This study was followed by a series of statistical analyses to identify the impact of these two variables on the whole community and specific taxonomic groups. The full factorial design arrangement enabled the researchers to discern both individual contributions as well as potential interactions among the variables.

      Strengths:

      This work presents a timely study that assesses in a controlled fashion the potential impact of global warming and altered precipitations on microbial populations. The experimental setup, experimental approach and data analysis seem to be overall solid. I consider the paper of high interest for the whole community as it provides a baseline to the assessment of global warming on microbial diversity.

      Thanks for the encouragement and positive comments.

      Weaknesses:

      While taxonomic information is interesting, it would have been highly valuable to include transcriptomics data as well. This would allow us to understand what active pathways become enriched under warming and altered precipitations. Non-metabolic OTUs hold significance as well. The authors could have potentially described these non-incorporators and derived hypotheses from the gathered information. The work would have benefited from using more biological replicates of each treatment.

      Thanks for the valuable suggestions.

      (1) Metatranscriptomics can assess the functional profiles of the community, but it has relatively low sensitivity to rare species, which makes it difficult to link functional pathways to the numerous active taxa identified by qSIP. Additionally, because of their low DNA concentrations, most fractionated samples could not be used to construct sequencing libraries for such analyses, whereas amplicon-based sequencing was still feasible. This study therefore focused on active microbial taxa and their growth in response to multifactorial climate change. We have added the prospect in DISCUSSION, that is “This suggests the development of methods combining qSIP with metagenomes and metatranscriptomes to assess the functional shifts of active microorganisms under global change scenarios” (Line 542-544).

      (2) 18O-qSIP can identify the growing microbial species (i.e., 18O incorporators) in the environment rather than metabolically active taxa. These non-incorporators in our study were likely to be metabolically active, i.e., maintaining life activities without reproduction, or recently deceased (Blazewicz et al., 2013). Therefore, it is hard to distinguish whether these non-incorporators possess metabolic activity.

      (3) Agreed. The qSIP experiments involve the use of isotopes and the sequencing of a large number of DNA samples (90 samples per biological replicate in this study). Considering its high cost, we selected three replicates for analysis. We have explained this issue in MATERIALS AND METHODS, that is “Considering the cost of qSIP experiment (i.e., the use of isotopes and the sequencing of a large number of DNA samples), we randomly selected three out of the six plots, serving as three replicates for each treatment” (Line 154-157).

      Reference:

      Nuccio, E.E., Starr, E., Karaoz, U. et al. (2020) Niche differentiation is spatially and temporally regulated in the rhizosphere. ISME J 14, 999–1014.

      Blazewicz, S.J., Barnard, R.L., Daly, R.A., Firestone, M.K. (2013). Evaluating rRNA as an indicator of microbial activity in environmental communities: limitations and uses. The ISME Journal, 7, 2061–2068.

      Reviewer #3 (Recommendations For The Authors):

      Major comments:

      The manuscript should be written in a clearer way. The language should be more direct, so the message is conveyed faster and clearer. Some sentences, for instance, could be shortened or re-organized. Below, you will find some examples.

      We have rewritten the sentences to make the manuscript clearer. Below are some, but not all, examples that have been revised:

      (1) Deleted “(reduced precipitation, hereafter ‘drought’, or enhanced precipitation, hereafter ‘wet’)” in INTRODUCTION;

      (2) Deleted “Controlled experiments simulating climate change have investigated changes in microbial community composition as measured by shifts in the relative abundances (Evans & Wallenstein, 2014; Barnard et al., 2015). However, changes in relative abundances may be poor indicators of population responses to environmental change in some cases (Blazewicz et al., 2020). Another challenge is the presence of a large number of inactive microbial cells in the soil, which hinders the direct, quantitative measure of the ecological drivers in population dynamics (Fierer, 2017; Lennon & Jones, 2011).” in DISCUSSION;

      (3) Deleted “This result supports our second hypothesis that the interactive effects between warming and altered precipitation on soil microbial growth are not simply additive” in DISCUSSION;

      (4) Deleted “A previous study suggested that multiple global change factors had negative effects on soil microbial diversity (Yang et al., 2021)” in DISCUSSION;

      (5) Revised “A meta‐analysis of experimental manipulation revealed that the combined effects of different climate factors were usually less than expected additive effects, revealing antagonistic interactions on soil C storage and nutrient cycling processes (Dieleman et al., 2012; Wu et al., 2011). Moreover, two experimental studies on N cycling and net primary productivity demonstrated that the majority of interactions among multiple factors are antagonistic rather than additive or synergistic, thereby dampening the net effects (Larsen et al., 2011; Shaw et al., 2002)” to “A range of ecosystem processes have been revealed to be potentially subject to antagonistic interactions between climate factors, for instance, net primary productivity (Shaw et al., 2002), soil C storage and nutrient cycling processes (Dieleman et al., 2012; Wu et al., 2011; Larsen et al., 2011)” in DISCUSSION (Line 499-503);

      (6) Revised “Previous evidences suggest that warming has a negative impact on soil carbon pools (Jansson & Hofmockel, 2020; Purcell et al., 2022). During the first phase of soil warming (~ 10 years), microbial activity increased, resulting in rapid soil carbon mineralization and respiration (Melillo et al., 2017)” to “Previous evidences suggest that warming has a negative impact on soil carbon pools (Jansson & Hofmockel, 2020; Purcell et al., 2022), mainly because of the rapid soil carbon mineralization and respiration (Melillo et al., 2017)” in DISCUSSION (Line 464-466).

      I'm curious about why, even though there were six replicates of the experiment, only three samples were collected for analysis. Metagenomic analyses tend to display high variability.

      The qSIP experiments involve the use of isotopes and the sequencing of a large number of DNA samples (90 samples per biological replicate in this study). Considering the high cost, we selected three replicates for analysis.

      In Fig. 3A, the absolute growth rates (16S copies/d*g) are shown. How do you know that the efficiency of DNA extraction was similar across all treatments and therefore the absolute numbers are comparable?

      To avoid differences in extraction efficiency caused by experimental procedures, all DNA samples were extracted by the same person (the first author) within 2-3 hours using a unified cell lysis and DNA extraction procedure: cells were mechanically disrupted by bead beating with multi-size beads at 6 m s⁻¹ for 40 s, and DNA was then extracted with the FastDNA™ SPIN Kit for Soil (MP Biomedicals, Cleveland, OH, USA).

      We also measured the concentration of the extracted DNA and found no significant difference between treatments (Author response table 1).

      Author response table 1.

      Soil DNA concentration in climate change treatments after qSIP incubation (measured by Qubit® DNA HS Assay Kits).

      Values represent mean and standard deviation. T0-P represents the ambient temperature and decreased precipitation; T+-P represents warming and decreased precipitation; T0cP represents ambient temperature and precipitation; T+cP represents warming and ambient precipitation; T0+P represents ambient temperature and enhanced precipitation; T++P represents warming and enhanced precipitation. The results of ANOVA indicated no significant difference of extracted DNA concentration between treatments (p > 0.05).

      We have introduced the caveat in the DISCUSSION, that is “Note that the experimental parameters such as DNA extraction and PCR amplification efficiencies also have significant effects on the accuracy of growth assessment. This alerts the need to standardize experimental practices to ensure more realistic and reliable results” (Line 544-547).

      Line 96-99 and 121-124: "Hypotheses are typically placed at the end of the final paragraph in the Introduction section. It is advisable to relocate them there and provide a clearer description of the paper's main goal."

      We have relocated the hypotheses to the end of the INTRODUCTION and stated the main goal of this study there, that is “The goal of current study is to comprehensively estimate taxon-specific growth responses of soil bacteria following a decade of warming and altered precipitation manipulation on the alpine grassland of the Tibetan Plateau, by using the 18O-quantitative stable isotope probing (18O-qSIP)” (Line 112-115).

      Line 399: Although you describe the classification among antagonistic interactions in the Methods section, I think you should describe this in further detail here. Can you clarify how you carried out this categorization and how these results were interpreted considering the phylogenetic classification.

      We have added the description of antagonistic interactions, that is “The interaction type of T × P on the growth of ~70% incorporators was antagonistic (i.e., the combined effect size is smaller than the additive expectation) (Fig. 4C)” (Line 388-390).

      The interaction types between factors can be classified into three categories: additive, synergistic and antagonistic. Additive interactions are those in which the combined effect size of the factors equals the sum of their single effects (i.e., the additive expectation, Fig. 1B). Synergistic interactions are those in which the combined effect size is larger than the additive expectation. In contrast, antagonistic interactions are those in which the combined effect size is smaller than the additive expectation. In this study, the antagonistic interactions were further divided into three sub-categories: weak antagonistic interaction, strong antagonistic interaction, and neutralizing effect (Fig. 1B). A weak antagonistic interaction has a combined effect size smaller than the additive expectation but larger than either single-factor effect. A strong antagonistic interaction has a combined effect size smaller than either single-factor effect but larger than 0. A neutralizing effect has a combined effect size equal to 0, implying that the effects of the different factors cancel each other out.
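
      As an illustration only, the categories described above can be sketched as a small decision rule. The function name, the tolerance, and the use of effect-size magnitudes (assuming both single effects and the combined effect point in the same direction) are assumptions made for this sketch; it ignores confidence intervals, which a real classification would normally take into account, and it labels the case the text does not define (a combined effect lying between the two single effects) explicitly.

```python
def classify_interaction(effect_a, effect_b, effect_ab, tol=1e-9):
    """Classify a two-factor interaction from effect sizes (hypothetical helper).

    Assumption: effects share one direction, so magnitudes are compared.
    """
    a, b, ab = abs(effect_a), abs(effect_b), abs(effect_ab)
    additive_expectation = a + b

    if abs(ab - additive_expectation) <= tol:
        return "additive"
    if ab > additive_expectation:
        return "synergistic"

    # Everything below the additive expectation is antagonistic.
    if ab <= tol:
        return "neutralizing"
    if ab > max(a, b):
        return "weak antagonistic"
    if ab < min(a, b):
        return "strong antagonistic"
    # Combined effect lies between the two single effects: not defined in the text.
    return "antagonistic (sub-type not defined)"


if __name__ == "__main__":
    # Single effects of -0.5 and -0.25; combined effect of -0.6.
    print(classify_interaction(-0.5, -0.25, -0.6))
```

      With these inputs the combined magnitude (0.6) is below the additive expectation (0.75) but above the larger single effect (0.5), so the sketch reports a weak antagonistic interaction.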

      Methodologically, the single and combined effects of two climate factors and their interaction effects were calculated by the natural logarithm of response ratio (lnRR) and Hedges’ d, respectively (Yue et al., 2017).
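
      For readers unfamiliar with the metric, the natural logarithm of the response ratio can be sketched as follows. The function and variable names are hypothetical, and the Hedges’ d calculation for interaction effects is not reproduced here because its exact form is given in Yue et al. (2017) rather than in this response.

```python
import math


def ln_response_ratio(treatment_mean, control_mean):
    """lnRR = ln(treatment mean / control mean); both means must be positive."""
    if treatment_mean <= 0 or control_mean <= 0:
        raise ValueError("lnRR is only defined for positive means")
    return math.log(treatment_mean / control_mean)


if __name__ == "__main__":
    # Hypothetical growth rates: the treatment halves growth relative to control.
    print(round(ln_response_ratio(50.0, 100.0), 4))
```

      A negative lnRR indicates that growth under the treatment is lower than under the control, and a value of 0 indicates no change.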

      We have added an interpretation of the phylogenetic distribution patterns of incorporators, that is “The degree of phylogenetic relatedness can indicate the processes that influenced community assembly, like the extent a community is shaped by environmental filtering (clustered by phylogeny) or competitive interactions (life strategy is phylogenetically random distribution) (Evans & Wallenstein, 2014; Webb et al., 2002). The results showed that the incorporators whose growth was influenced by the antagonistic interaction of T × P showed significant phylogenetic relatedness, indicating the occurrence of taxa more likely shaped by environment filtering (i.e., selection pressure caused by changes in temperature and moisture conditions). In contrast, the growing taxa affected by synergistic interactions of T × P showed random phylogenetic distributions (Table S1), which may be explained by competition between taxa with similar eco-physiological traits or changes in genotypes (possibly through horizontal gene transfer) (Evans & Wallenstein, 2014). We also found that the extent of phylogenetic relatedness to which taxa groups of T × P interaction types varied by climate scenarios, suggesting that different climate history processes influenced the ways bacteria survive temperature and moisture stress” (Line 515-529).

      Reference:

      Evans, S.E. & Wallenstein, M.D. (2014). Climate change alters ecological strategies of soil bacteria. Ecology Letters, 17, 155-164.

      Webb, C.O., Ackerly, D.D., McPeek, M.A. & Donoghue, M.J. (2002). Phylogenies and Community Ecology. Annual Review of Ecology and Systematics, 33, 475-505.

      Yue, K., Fornara, D.A., Yang, W., Peng, Y., Peng, C., Liu, Z. et al. (2017). Influence of multiple global change drivers on terrestrial carbon storage: additive effects are common. Ecology Letters, 20, 663-672.

      Line 407-8: What do you mean with "...clustered at the phylogenetic branches" and Line 410: "cluster near the tips of the phylogenetic tree". Can you please clarify?

      Sorry for the unclear statement. We have added the explanation of NTI, that is “Nearest taxon index (NTI) was used to determine whether the species in a particular growth response are more phylogenetically related to one another than to other species (i.e., close or clustering on phylogenetic tree). NTI is an indicator of the extent of terminal clustering, or clustering near the tips of the tree (Evans & Wallenstein, 2014; Webb et al., 2002)” (Line 397-401).

      Reference:

      Evans, S.E. & Wallenstein, M.D. (2014). Climate change alters ecological strategies of soil bacteria. Ecology Letters, 17, 155-164.

      Webb, C.O., Ackerly, D.D., McPeek, M.A. & Donoghue, M.J. (2002). Phylogenies and Community Ecology. Annual Review of Ecology and Systematics, 33, 475-505.

      Could you provide some info about the biochemistry of the incorporation of heavy water into DNA molecules? What specific enzymes are typically involved?

      Due to the low DNA concentration in most fractionated samples (less than 10 ng/μL, measured by Qubit DNA HS Assay Kits), only amplicon-based sequencing analyses were feasible. This study therefore focused only on active microbial taxa and their growth in response to multifactorial climate change.

      What might be the impact of soil desiccation on bacterial survival and subsequent water uptake?

      Slow dehydration and air drying of soil is a very common phenomenon in nature (Koch et al., 2018). In this process, microorganisms reduce their metabolism and shift towards a potentially active state (Blagodatskaya and Kuzyakov, 2013). A previous study suggested that a potentially active microbial population permanently exists in soil between the active and dormant physiological states. Even under long-term starvation, the potentially active microorganisms maintain ‘physiological alertness’ to be ready for occasional substrate input (Blagodatskaya and Kuzyakov, 2013). These microorganisms are important participants in the biogeochemical cycle and are the focus of this study.

      Replacing the environmental water in the soil with 18O-labelled water is a typical practice for qSIP studies (Hungate et al. 2015; Koch et al., 2018). This process may cause disturbance to the microbial community. In this study, the soil samples were placed in a thermostatic incubator (14℃ and 16℃), rather than air-dried at 25℃ (as in most studies). The incubation temperature is relatively low (compared to 25℃) and there is no violent air convection in the incubator, resulting in slower evaporation and no significant discoloration caused by severe soil dehydration after 48 h. The process of soil drying in this study simulated the natural phenomenon, i.e., slow water loss in soil.

      We have added the description in MATERIALS AND METHODS, that is “There is no violent air convection in the incubator and the incubation temperature is relatively low (compared to 25℃ used in previous studies), resulting slower evaporation and no significant discoloration caused by severe soil dehydration after 48 h” (Line 171-174).

      Reference:

      Blagodatskaya, E. & Kuzyakov, Y. (2013) Active microorganisms in soil: Critical review of estimation criteria and approaches. Soil Biology and Biochemistry, 67, 192-211.

      Hungate, B., Mau, R., Schwartz, E., Caporaso, J., Dijkstra, P., Van Gestel, N. et al. (2015). Quantitative microbial ecology through stable isotope probing. Applied and Environmental Microbiology, 81, 7570-7581.

      Koch, B., McHugh, T., Hayer, M., Schwartz, E., Blazewicz, S., Dijkstra, P. et al. (2018). Estimating taxon-specific population dynamics in diverse microbial communities. Ecosphere, 9, e02090.

      The analysis of the 18O incorporators is interesting as it defines which microbes are metabolically active and hence growing under the different conditions tested. Would it not be worth analyzing the non-incorporators? Is it possible to identify a pattern to generate a hypothesis of why they are metabolically inactive based on this information? In the Methods section, the authors state that they identified a total of 6,938 OTUs, of which only 1,373 were found to be incorporators.

      Microbes exist in a range of metabolic states: growing, active (non-growth), dormant and recently deceased (Blazewicz et al., 2013), and there is still no clear threshold for distinguishing them. 18O-DNA qSIP can identify the growing microbial species (i.e., 18O incorporators) rather than all metabolically active taxa, because some cells are measurably metabolizing (catabolic and/or anabolic processes) without reproducing. Therefore, the non-incorporators in our study may be metabolically active, or not (recently deceased microorganisms). This study focuses on the growing microorganisms identified by 18O-qSIP.

      In this study, ~20% of microbial taxa (1,373/6,938) were identified as 18O incorporators. Microorganisms in soils frequently suffer from resource and energy constraints (Blagodatskaya and Kuzyakov, 2013). The energy requirements of species in the growing state are much higher (~30-fold) than those in the non-growing state, so the percentage of growing bacterial taxa in soil tends to be low.

      Reference:

      Blazewicz, S.J., Barnard, R.L., Daly, R.A., Firestone, M.K. (2013). Evaluating rRNA as an indicator of microbial activity in environmental communities: limitations and uses. The ISME Journal, 7, 2061–2068.

      Blagodatskaya, E. & Kuzyakov, Y. (2013) Active microorganisms in soil: Critical review of estimation criteria and approaches. Soil Biology and Biochemistry, 67, 192-211.

      Minor comments:

      Fig. 3A and 3B. Please show the results of the multiple comparisons.

      Done.

      Author response image 5.

      Bacterial growth responses to climate change and the interaction types between warming and altered precipitation. The growth rates (A), and responses (LnRR) of soil bacteria to warming and altered precipitation (B) at the whole community level. The growth rates (C), and responses of the dominant bacterial phyla (D) showed trends similar to those of the whole community. Values represent mean and the error bars represent standard deviation. Different letters indicate significant differences between climate treatments.

      Fig. 4. This figure should be self-explanatory. This diagram is challenging to understand.

      We have revised Fig. 4 to improve clarity.

      Author response image 6.

      The growth responses and phylogenetic relationship of incorporators subjected to different interaction types under two climate scenarios. A phylogenetic tree of all incorporators observed in the grassland soils (A). The inner heatmap represents the single and combined factor effects of climate factors on species growth, by comparing with the growth rates in T0nP. The outer heatmap represents the interaction types between warming and altered precipitation under two climate change scenarios. The proportions of positive or negative responses in species growth to single and combined manipulation of climate factors by summarizing the data from the inner heatmap (B). The proportions of species growth influenced by different interaction types of T × P by summarizing the data from the outer heatmap (C).

      Fig. 4. It says "Dorought" instead of "drought"

      Done (Line 760).

      Line 109: "relieves" instead of "relieved"

      Done (Line 102).

      Line 129: Should be: "We classified the interaction types as additive, synergistic, antagonistic, null and neutralizing."

      Done (Line 117).

      Line 233: How were the 16S rRNA sequences from each density fraction analyzed?

      (1) Raw sequencing data processing:

      The raw 16S rRNA gene sequences of each density fraction were quality-filtered using USEARCH v.11.0 (Edgar, 2010). The paired-end sequences were merged and quality-filtered with the “fastq_mergepairs” and “fastq_filter” commands, respectively. Sequences shorter than 370 bp or with total expected errors > 0.5 were removed. Next, the “fastx_uniques” command was used to identify the unique sequences. Subsequently, high-quality sequences were clustered into operational taxonomic units (OTUs) with the “cluster_otus” command at a 97% identity threshold, and the most abundant sequence from each OTU was selected as a representative sequence. The taxonomic affiliation of the representative sequence was determined using the RDP classifier (Wang et al., 2007).
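
      To make the filtering criterion concrete, the expected-error rule can be sketched in a few lines. This is not the USEARCH implementation: the function names are hypothetical, a Phred+33 quality encoding is assumed, and only the two thresholds quoted above (370 bp and 0.5 expected errors) come from the text. The expected number of errors in a read is the sum of the per-base error probabilities, each equal to 10^(-Q/10).

```python
def expected_errors(quality_string, phred_offset=33):
    """Sum of per-base error probabilities, assuming Phred+33 encoding."""
    return sum(10 ** (-(ord(ch) - phred_offset) / 10) for ch in quality_string)


def passes_filter(sequence, quality_string, min_length=370, max_expected_errors=0.5):
    """Keep reads of at least min_length with at most max_expected_errors."""
    if len(sequence) != len(quality_string):
        raise ValueError("sequence and quality string must have equal length")
    return (len(sequence) >= min_length
            and expected_errors(quality_string) <= max_expected_errors)


if __name__ == "__main__":
    read = "A" * 400
    high_quality = "I" * 400   # 'I' encodes Q40, error probability 1e-4 per base
    low_quality = "5" * 400    # '5' encodes Q20, error probability 1e-2 per base
    print(passes_filter(read, high_quality), passes_filter(read, low_quality))
```

      In this toy case the Q40 read accumulates about 0.04 expected errors and is kept, whereas the Q20 read accumulates about 4 and is discarded.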

      (2) qSIP calculation:

      Sequencing data reflect the relative abundance of taxa in a community. We multiplied each OTU’s relative abundance (obtained by sequencing) by the number of 16S rRNA gene copies (obtained by qPCR) to obtain the number of gene copies per OTU in each fraction. Then, the proportion of gene copies of a specific OTU in each fraction relative to the total amount of gene copies in one sample was calculated and used as a weight for calculating the weighted average buoyant density (the critical parameter for assessing microbial growth).
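
      The weighting described above can be sketched as follows for a single OTU in a single sample. The function name and the example numbers are hypothetical, and the sketch assumes that the "total amount of gene copies" used for the weights is that OTU's copies summed over all density fractions of the sample.

```python
def weighted_average_density(relative_abundance, total_copies, densities):
    """Weighted average buoyant density of one OTU across density fractions.

    relative_abundance: the OTU's relative abundance in each fraction (sequencing)
    total_copies: total 16S rRNA gene copies in each fraction (qPCR)
    densities: buoyant density of each fraction (g/mL)
    """
    if not (len(relative_abundance) == len(total_copies) == len(densities)):
        raise ValueError("all inputs must cover the same fractions")
    # Gene copies of this OTU in each fraction.
    otu_copies = [r * c for r, c in zip(relative_abundance, total_copies)]
    total = sum(otu_copies)
    if total == 0:
        raise ValueError("OTU not detected in any fraction")
    # Weight each fraction's density by its share of the OTU's copies.
    return sum(copies / total * d for copies, d in zip(otu_copies, densities))


if __name__ == "__main__":
    wad = weighted_average_density(
        relative_abundance=[0.1, 0.2, 0.1],
        total_copies=[1000, 2000, 1000],
        densities=[1.68, 1.70, 1.72],
    )
    print(round(wad, 4))
```

      Comparing this quantity between the 18O-labelled and unlabelled incubations gives the density shift from which isotope incorporation, and hence growth, is estimated.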

      Line 366: "Three single-factor ... between warming and altered precipitation" -> "The individual impact of warming, drought, and wet conditions resulted in the most substantial negative effects on bacterial growth compared with the effects of warming x drought and warming x wet. A result that illustrates the negative interactions between warming and modified precipitations patterns."

      Done (Line 365-368).

      Line 376: "Similar with the result of whole growth of bacteria community, the growth responses of the major bacterial phyla were also negatively influenced by single climate factors". This sentence is hard to read. Maybe something like this: "Growth of the major bacterial phyla was also negatively influenced by the individual climate factors".

      Done (Line 371-372).

      Line 383: "In particular, the effects of wet and warming neutralized each other, resulting the net effects became zero on the growth rates of the phyla Actinobacteria and Bacteroidetes". "In Actinobacteria and Bacteroidetes, the effect of wet and warming neutralized each other, as the combined effect of these two factors had no effect on growth".

      Done (Line 377-379).

      Line 390: "The individual warming treatment (T+nP) reduced the growth rates of 75% incorporators..." "Warming (T+nP) reduced the growth of 75% of the taxonomic groups, which was followed by drought and wet."

      Done (Line 384-385).

      Line 392: "The combined manipulations of warming and altered precipitation lowered the percentages of incorporators with negative responses compared with single factor manipulation, especially warming and enhanced precipitation manipulation" -> "Warming x drought and warming x wet had a smaller impact on the growth of incorporators, compared with single effects."

      Done (Line 385-387).

      Line 468. This sentence "To the best ..." is not necessary.

      We have deleted this sentence.

      Line 476. Is it really "synthesis" the word you want to use?

      We have deleted this sentence.

      Line 477. Maybe should written like this: "Consistent with our findings, a recent experimental study demonstrated that 15 years of warming reduced the growth rate of soil bacteria in a montane meadow in northern Arizona."

      Done (Line 459-461).

      Line 490 and 502. Consider using "however" only once in a paragraph.

      We have deleted the second “however” (Line 483).

      Line 555-559. Based on genomic data you cannot predict the functional role of microbes in the environment. These sentences are speculative. Please, consider using less strong affirmations and focus more on the pathways that are enriched in the incorporators.

      Agreed. We have deleted this content.

    1. Author Response

      The following is the authors’ response to the original reviews.

      We thank the reviewers for their careful, critical, and insightful evaluation of our manuscript.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The preprint by Laganowsky and co-workers describes the use of mutant cycles to dissect the thermodynamic profile of specific lipid recognition by the ABC transporter MsbA. The authors use native mass spectrometry with a variable temperature source to monitor lipid binding to the native protein dimer solubilized in detergent. Analysis of the peak intensities (that is, relative abundance) of 1-3 bound lipids as a function of solution temperature and lipid concentration yields temperature-dependent Kds. The authors use these to then generate van't Hoff plots, from which they calculate the enthalpy and entropy contributions to binding of one, two, and in some cases, three lipids to MsbA.

      The authors then employ mutant cycles, in which basic residues involved in headgroup binding are mutated to alanine. By comparing the thermodynamic signatures of single and double (and in one instance triple) mutants, they aim to identify cooperativity between the different positions. They furthermore use inward and outward locking conditions which should control access to the different binding sites determined previously.

      The main conclusion is that lipid binding to MsbA is driven mainly by energetically favorable entropy increase upon binding, which stems from the release of ordered water molecules that normally coordinate the basic residues, which helps to overcome the enthalpic barrier of lipid binding. The authors also report an increase in lipid binding at higher temperatures which they attribute to a non-uniform heat capacity of the protein. Although they find that most residue pairs display some degree of cooperativity, particularly between the inner and outer lipid binding sites, they do not provide a structural interpretation of these results.
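
      For orientation, the van't Hoff step mentioned in this summary can be sketched with synthetic numbers. The sketch assumes a temperature-independent enthalpy, so that ln(Ka) is linear in 1/T with slope -ΔH/R and intercept ΔS/R; it therefore does not cover the non-linear plots that indicate a heat capacity change. The function name and all values are hypothetical and are not taken from the manuscript.

```python
import math

R = 8.314  # gas constant, J mol^-1 K^-1


def vant_hoff_fit(temperatures_k, kds_molar):
    """Least-squares fit of ln(Ka) = -dH/(R*T) + dS/R, with Ka = 1/Kd.

    Returns (dH in J/mol, dS in J/mol/K) for the binding reaction.
    """
    x = [1.0 / t for t in temperatures_k]
    y = [math.log(1.0 / kd) for kd in kds_molar]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return -slope * R, intercept * R


if __name__ == "__main__":
    # Synthetic, entropy-driven binding: unfavorable dH, favorable dS.
    temps = [288.15, 293.15, 298.15, 303.15, 308.15]
    true_dh, true_ds = 20_000.0, 150.0
    kds = [math.exp((true_dh - t * true_ds) / (R * t)) for t in temps]
    dh, ds = vant_hoff_fit(temps, kds)
    print(round(dh), round(ds, 1))
```

      In this synthetic case the dissociation constant decreases as temperature rises, which is the signature of binding that is opposed by enthalpy and driven by entropy.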

      Strengths:

      The use of double mutant cycles and mass spectrometry to dissect lipid binding is novel and interesting. For example, the observation that mutating a basic residue in the inner and one in the outer binding site abolishes lipid binding to a greater extent than the individual mutations is highly informative even without having to break it down into thermodynamic terms (see "weaknesses" section). In this sense, the method and data reported here opens new avenues for the structure/activity relationship of MsbA. The "mutant cycle" approach is in principle widely applicable to other membrane proteins with complex lipid interactions.

      Weaknesses:

      The use of double mutant cycles to dissect binding energies is well-established, and has, as the authors point out, been employed in combination with mass spectrometry to study protein-protein interactions. Its application to extract thermodynamic parameters is robust in cases where a single binding event is monitored, e.g. the formation of a complex with well-defined stoichiometry, where dissociation constants can be determined with high confidence. It is, however, complicated significantly by the fact that for MsbA-lipid interactions, we are not looking at a single binding event, but a stochastic distribution of lipids across different sites. Even if the protein is locked in a specific conformation, the observation of a single lipid adduct does not guarantee that the one lipid is always bound to a specific site. In some of the complexes detected by MS, the lipid is likely bound somewhere else. Lipid binding Kds from mass spectrometry, although helpful in some instances as a proxy for global binding affinities, should therefore be taken with a grain of salt.

      We agree with the reviewer that while we measure lipid binding (as a mass shift), we do not know the binding location(s). Given this issue, we have added text on this important point to the discussion section and elaborate more broadly on this problem in the context of membrane protein-lipid interactions. Tackling this issue represents a frontier challenge for the field.

      The authors analyze the difference in binding upon mutating binding sites (ddG etc). Here, another complicating factor comes into play, the fact that mutation of a binding site (which the authors show reduces lipid binding) may instead allow the lipid to bind to a lower-affinity site elsewhere. Unfortunately, the authors do not specify the protein concentration, but assuming it is in the single-digit micromolar range, as common for native MS experiments, lipid and protein concentrations are almost equal for most of the data points, resulting in competition between binding sites for free lipids. As a rule of thumb, for Kd measurements, the concentration of the constant component, the protein, should be far below the Kd, to avoid working in the "titration" regime rather than the "binding" regime (see Jarmoskaite et al, eLife 2020). I cannot determine whether this is the case here. The way I understand the double mutant cycle approach, reliable Kd measurements are required to accurately determine dH and TdS, so I would encourage the authors to confirm their Kd values using complementary methods before in-depth interpretations of the thermodynamic components.

      The reviewer references an article in eLife by Jarmoskaite and co-workers describing “titration” vs “binding” regimes. Below we paste a snippet from this article:

      Author response image 1.

      Equation 4a is an expression for the fraction of protein bound to ligand, which holds universally, i.e., if we know the concentrations of the molecules at equilibrium (including those unbound or free), then we can obtain the special ratio, or equilibrium constant, at a given temperature. Jarmoskaite et al. note that in practice (using traditional biophysical approaches) one cannot readily distinguish protein that is free from protein that is bound to ligand (see highlighted part above). While this assumption is the basis of their eLife assessment, it does NOT apply to native mass spectrometry data. It is important to realize that the mole fraction (or concentration) of the apo and each lipid-bound state, i.e., [P], [PL], [PL2], …, [PLn+1], can readily be obtained directly from the deconvoluted mass spectrum. This is unlike other biophysical methods, which are ensemble measurements that report the amount of heat or the fraction of total ligand bound to protein. Since we can discern each lipid-bound state, including the free protein and free ligand concentrations, the equilibrium binding constants can be calculated directly, and the protein and ligand concentrations become irrelevant. In principle, equilibrium constants for protein-lipid interactions can be calculated from one mass spectrum. To increase transparency, we have updated the results section to highlight the important difference between the native MS approach and less robust traditional approaches that are riddled with underlying issues/assumptions.

      We appreciate the reviewer’s suggestion of using complementary methods to confirm Kd values. In our previous report [1], we determined binding thermodynamics for soluble protein-ligand interactions using native MS, surface plasmon resonance (SPR), and isothermal titration calorimetry (ITC) and found that the techniques yield similar binding constants and thermodynamic parameters. Because these soluble proteins have well-defined ligand binding, carrying out a complementary study was rather straightforward. We have also shown consistent findings for native MS and SPR of a membrane protein interacting with a soluble, regulatory protein [2]. However, membrane proteins can bind the first few lipids very specifically and, upon addition of more lipid, bind additional lipids that represent rather weak binding. Thus, traditional approaches would report on the ensemble of lipids bound to the membrane protein: specific lipid binding sites (such as the inner and outer LPS binding sites in MsbA) are saturable, but additional binding will also be observed, i.e., the system does not behave like a traditional soluble protein-ligand interaction. In the past we have used a fluorescent-lipid competition binding assay [3] to corroborate native MS results for Kir3.2, which showed a direct correlation. The disadvantage of this complementary approach is that it uses a non-natural, fluorescent-modified lipid. Unfortunately, there is no commercial source for a fluorophore-modified KDL.
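
      As a rough illustration of the point above, sequential dissociation constants can be computed from the relative abundances of the apo and lipid-bound states in a single deconvoluted spectrum. This sketch is not the authors' analysis code: the function names are hypothetical, it assumes that all states are detected with equal efficiency, and it assumes that the free lipid concentration is obtained by mass balance from the total protein and lipid concentrations.

```python
def mole_fractions(intensities):
    """Convert peak intensities of P, PL, PL2, ... into mole fractions."""
    total = sum(intensities)
    return [i / total for i in intensities]


def free_ligand_by_mass_balance(intensities, protein_total, ligand_total):
    """Free ligand = total ligand minus ligand bound to protein (assumption)."""
    fractions = mole_fractions(intensities)
    bound_per_protein = sum(n * f for n, f in enumerate(fractions))
    return ligand_total - protein_total * bound_per_protein


def sequential_kds(intensities, free_ligand):
    """Kd_n = [PL(n-1)] * [L_free] / [PLn] for n = 1, 2, ...

    Protein concentration cancels in the ratio of adjacent states.
    """
    fractions = mole_fractions(intensities)
    return [fractions[n - 1] * free_ligand / fractions[n]
            for n in range(1, len(fractions))]


if __name__ == "__main__":
    peaks = [50.0, 30.0, 20.0]  # hypothetical intensities of P, PL, PL2
    l_free = free_ligand_by_mass_balance(peaks, protein_total=1e-6, ligand_total=5e-6)
    print(sequential_kds(peaks, l_free))
```

      Because each Kd depends only on the ratio of adjacent states and the free ligand concentration, one spectrum is in principle sufficient, which is the distinction from ensemble measurements drawn above.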

      It is somewhat counterintuitive that for many double mutants, and the triple mutant, the entropic component becomes more favorable compared to the WT protein. If the increase in entropy upon lipid binding comes from the release of ordered water molecules around the basic residues (a reasonable assumption) why does this apply even more in proteins where several basic residues have been changed to alanine, which coordinate far fewer water molecules?

      There are many factors that contribute to the change in entropy of the system, beyond solvation entropy, and deciphering the entropic contributions of the various components remains a challenging task. We have revised the manuscript to emphasize that solvation is one component of the entropic term and other components are likely at play.

      The authors could devote more attention to the fact that they use detergent micelles as a vehicle for lipid binding studies. To a limited extent, detergents compete with lipids for binding, and are present in extreme excess over the lipid. The micelle likely changes its behavior in response to temperature changes. For example, the packing around the protein loosens up upon heating, which may increase the chance for lipids to bind. In this case, the increase in binding at higher temperatures may not be related to a change in heat capacity. This question could be addressed by MD simulations, if it's not already in the literature.

      The detergent and its concentration are consistent for all the different MsbA proteins in this study. In fact, we observe linear van’t Hoff plots with positive and negative slopes as well as non-linear curves that are convex or concave. The MsbA proteins (wt or mutant), trapped or not, all display unique temperature-dependent responses. The reviewer’s suggestion that increasing temperature loosens the packing of detergent and thereby promotes lipid binding is clearly NOT that simple. If detergent was significantly influencing lipid binding (as suggested by the reviewer), then increasing its concentration should impact lipid binding. In a previous study, we found no difference in membrane protein-lipid thermodynamics even when the concentration of detergent was increased five-fold [1]. We repeated similar experiments for MsbA and find that the increased detergent concentration does not impact the abundances of lipid-bound states. Author response image 2 shows MsbA in the presence of lipid in 2x CMC (panels a and b) and 10x CMC (panels c and d). As can be seen, no appreciable difference in the lipid-bound signal is observed.

      Author response image 2.

      We applaud the suggestion of MD simulation. However, it is far beyond the scope of this paper and it is not clear what would really be learned.

      Reviewer #2 (Public Review):

      Summary:

      This is a solid study that dissects the thermodynamics of lipopolysaccharide (LPS) transporter MsbA and LPS. Native ESI-MS and the novel strategies developed by the authors were employed to quantify the affinities of LPS-MsbA interactions and its temperature dependence. Here, the equilibrium of lipid-protein interactions occurs in the micellar phase. The double-/triple-mutant cycle analysis and van't Hoff analysis allowed a full thermodynamic description of the lipid-protein interactions and the analysis of thermodynamic coupling between LPS binding sites. The most notable result would be that LPS-MsbA interaction is largely driven by entropy involving the negative heat capacity, a signature of the solvent reorganization effect (here authors attribute the solvent effect to "water" reorganization). The entropy driven lipid binding has been previously reported by the same authors for Kir1,2-PIP2 interactions.

      Strengths:

      1. This is overall a very thorough and rigorous study providing the detailed thermodynamic principles of LPS-MsbA interaction.

      2. The double and triple-mutant cycle approaches are newly applied to lipid-protein interactions, enabling detailed thermodynamics between LPS binding sites.

      3. The entropy-driven protein-lipid interaction is surprising. The binding seems to be mainly mediated by the electrostatic interaction between the positively charged residues on the protein and the negatively charged or polar headgroup of LPS, which could be thought of as "enthalpic" (making of a strong bond relative to that with solvent).

      Weaknesses:

      1. This study is a good contribution to the field, but it was difficult to find novel biological insights or methodological novelty from this study.

      1a. Thermodynamic analysis of lipid-protein interactions, an example of entropy-driven lipid-protein interactions, and the cooperativity between lipid binding sites have been reported by the authors' group. Also, cooperativity between binding sites in general has been reported in numerous studies of biomolecular interactions.

      We appreciate the reviewer for highlighting our previous work. Of course, a single study does not establish a pattern, such as entropy-driven lipid-protein interactions.

      While we agree with the reviewer that cooperativity in biomolecular interactions has been established for many soluble protein systems, by no means do we have a detailed understanding of membrane protein-lipid interactions. This work is an important contribution to expanding on classical work on soluble protein systems to more challenging membrane protein systems and their interactions with lipids.

      1b. It is not clear how this study provides new insights into the understanding of LPS transport mechanisms. Probably, authors could strengthen the Discussion by providing biological insights-how the residue coupling.

      The thermodynamics provides us with deeper insight into the chemical principles that drive specific membrane protein-lipid interactions. We have revised the discussion to highlight the importance of thermodynamics and the contribution of individual residues to KDL binding. We also note that the inner and outer LPS binding sites appear to be coupled, which is a new finding.

      2. One to three LPS molecules bind to MsbA, but it is unclear whether bound KDL occupies inner or outer cavities, or both and how a specific mutation affects the affinity of specific LPS (i.e., to inner or to outer cavities). Based on the known structures, the maximal number of LPS is three. It is possible that the inner and outer cavities have different LPS affinities. Also, there can be multiple one-LPS-bound states, two-LPS-bound states if LPS strictly binds to the binding sites indicated by the structures. This aspect is beyond the scope of this study and difficult to address, but without this information, it seems hard to tell what is going on in the system.

      In our response above, we note that lipids will bind to membrane proteins at specific site(s) and weaker sites, often described as non-annular lipids. The revision includes this discussion point.

      3. If a single mutation is introduced to the inner cavity, its effect will be "doubled" because the inner cavity is shared by two identical subunits. This effect needs to be clarified in the result section.

      Great point. In addition, an outer-site mutation will also impact not one but both outer binding sites. The revised manuscript makes note of this point.

      4. In the result section, "Mutant cycle analysis of KDL binding to vanadate-trapped MsbA.":

      4a. It seems necessary to show the mass spectra for Msb-ADP-vanadate complex as well as its lipid bound forms.

      In the original submission, the mass spectra of vanadate trapped MsbA with KDL binding was provided in Supplementary Figures 10 and 11.

      4b. The rationale of this section (i.e., what mechanistic insights can be obtained from this study) is unclear. For example, it is not sure what meaningful information can be obtained from a single type (ADP/vanadate) of the bound state regarding the ATP-driven function of MsbA.

      MsbA is dynamic and populates different conformations. Trapping with vanadate locks the transporter in an outward-facing state with the NBDs interacting. This provides the opportunity to characterize binding to the exterior site. We revised the manuscript to note this point.

      Reviewer #3 (Public Review):

      Summary:

      In this paper presented by Liu et al, native MS on the lipid A transporter MsbA was used to obtain thermodynamic insight into protein-lipid interactions. By performing the analyses at different lipid A concentrations and temperatures, dissociation constants for 2-3 lipid A binding sites were determined, as well as enthalpies were calculated using nonlinear van't Hoff fitting. Changes in Gibbs free energy were then calculated based on the determined dissociation constants, and together with the enthalpy values obtained via van't Hoff analysis, the entropic contribution to lipid binding (DeltaS*T) was indirectly determined.

      Strengths:

      This is an extensive high quality native MS dataset that provides unique opportunities to gain insights into the thermodynamic parameters underlying lipid A binding. In addition, it provides coupling energies between mutations introduced into MsbA, that are implicated in lipid A binding.

      Weaknesses:

      The data all rely on the accuracy of determining KD values for lipid binding to MsbA. For the weaker binding sites, the range of lipid concentrations probed were in fact too low to generate highly accurate data. Another weakness is a lack of clear evidence, which KD values belong to which of the possible lipid A binding sites.

      See our detailed response to reviewer 1 regarding Kd determination using native MS compared to other techniques. We chose to focus on the first three lipid binding events and adjusted the concentrations accordingly to titrate these three. As noted above, the Kd values can be determined from one mass spectrum. For rigor, we include different titration points and fit a sequential binding model to the data; the fits are shown in the supplemental material and are quite reasonable.
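      The statement that Kd values can be read from a single spectrum can be illustrated with a minimal sketch of a sequential binding model. The function names, concentrations, and Kd values below are hypothetical and chosen only for illustration; they are not the analysis code or data from the manuscript. The sketch assumes that peak intensities are proportional to the mole fractions of the apo and lipid-bound species and that free lipid equals total lipid minus bound lipid.

```python
def mole_fractions(free_ligand, kds):
    """Mole fractions of P, PL, PL2, ... for sequential binding.

    kds[i] is the stepwise dissociation constant for binding ligand i+1,
    in the same concentration units as free_ligand.
    """
    terms = [1.0]
    for kd in kds:
        terms.append(terms[-1] * free_ligand / kd)
    total = sum(terms)
    return [t / total for t in terms]


def kds_from_spectrum(fractions, protein_total, ligand_total):
    """Recover stepwise Kd values from the mole fractions of one spectrum."""
    # Bound ligand: each PLi species carries i ligands.
    bound = protein_total * sum(i * f for i, f in enumerate(fractions))
    free_ligand = ligand_total - bound
    # Kd_i = [PL(i-1)] * [L] / [PLi]; total protein cancels in the ratio.
    return [fractions[i - 1] * free_ligand / fractions[i]
            for i in range(1, len(fractions))]


# Hypothetical round trip: simulate one spectrum, then recover the Kd values.
true_kds = [0.5, 2.0, 5.0]   # hypothetical values, in uM
protein_total = 0.5          # hypothetical, in uM
free = 1.0                   # hypothetical free lipid, in uM
fractions = mole_fractions(free, true_kds)
bound = protein_total * sum(i * f for i, f in enumerate(fractions))
recovered = kds_from_spectrum(fractions, protein_total, free + bound)
```

      Fitting several titration points, as described above, constrains the same parameters with more data than a single spectrum provides.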

      Regarding multiple lipids binding to different site(s), we were able to distinguish high-affinity from low-affinity PIP binding to Kir3.2 in a previous study [4]. This was apparent from the mole fraction curves for some lipid-bound states not returning to zero. We agree that binding to multiple sites can be an issue. However, other techniques report on the ensemble of binding and, hence, no real useful information is obtained. Native MS enables one step in the right direction by dissecting the different lipid-bound states. Future work will need to address this forefront question in the field, which we now point out in the discussion.

      Reviewer #1 (Recommendations For The Authors):

      Experiments/analysis: In short, there should be a proof of principle experiment that the thermodynamic constants determined by MS are accurate. Once that is done, the authors can add a more engaging structural interpretation of the results from the mutant cycles (which they seem to consciously avoid in the present manuscript?). How are cooperative residues coupled? Why?

      See our detailed response to reviewer 1 above.

      The manuscript is well-written, but Figures 3-5 are somewhat repetitive and require a lot of time to understand. Schematics of the main findings in each figure would help the uninitiated reader.

      We agree the illustrations are complex but there is rich data being shown.

      Figure 2 C contains an x-axis label error.

      Corrected.

      Reviewer #2 (Recommendations For The Authors):

      1. Lines 128-129: "Like other mutant cycle studies, we assume the single- and double-mutations do not disrupt binding at specific sites on MsbA."

      This statement is obscure and needs to be clarified. Does this mean that the mutations still allow binding of KDL, or the mutations do not disrupt the conformational integrity of the binding sites?

      This statement has been removed.

      2. Lines 137-139: "More specifically, R78 coordinates one of the characteristic phosphoglucosamine (P-GlcN) substituents of KDL whereas K299 interacts with a carboxylic acid group in the headgroup of KDL."

      Two identical subunits form a dimer interface that forms an LPS binding site. Thus, a single mutation on the inner cavity will disrupt two binding sites on LPS. One R78 to P-GlcN and the other to a sugar backbone. Also, one K299 interacts with a carboxylic acid group in the headgroup and the other to an unknown (not clear in the figure).

      As noted above, a mutation at the outer site will also impact both outer sites. We have made note of this caveat.

      3. Lines 171-172: "leading to an increase in ΔG by ~4 kJ/mol (Fig. 2d)"

      Relative to what?

      Corrected.

      4. Lines 172-173: "Mutant cycle analysis indicates a coupling energy (ΔΔGint) of 1.7 ± 0.4 kJ/mol that contributes to the stability of KDL-MsbA complex."

      The sign of DDG (DDH,DDS)_int is a bit confusing. I recommend that authors define the meaning of negative or positive sign of DDG_int (DDH,DDS) at this point. Here, a positive sign means favorable cooperation between the two mutated residues. Sometimes, researchers designate a positive cooperativity as a negative sign.

      The literature on mutant cycles does not appear to follow a consensus on the sign. Here, we have revised the manuscript to note that a positive sign means favorable cooperation, and we follow the formalism recently described by Horovitz, Sharon, and co-workers [5].
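      To make the sign convention concrete, the following is a minimal sketch of a double-mutant cycle calculation. It assumes binding free energies are computed as dG = RT ln(Kd) and defines the coupling energy so that a positive value indicates favorable cooperation, consistent with the convention stated above. The exact expression used in the manuscript may differ in form, and all numbers are hypothetical.

```python
import math

R_KJ = 8.314462618e-3  # gas constant in kJ/(mol*K)


def binding_dg(kd_molar, temp_k):
    """Binding free energy in kJ/mol from a dissociation constant in M."""
    return R_KJ * temp_k * math.log(kd_molar)


def coupling_energy(dg_wt, dg_mut_a, dg_mut_b, dg_double):
    """Coupling energy of a double-mutant cycle in kJ/mol.

    Assumed convention: positive means the two residues cooperate
    favorably, i.e. the double mutant loses less binding energy than
    the sum of the two single mutants.
    """
    return (dg_mut_a + dg_mut_b) - (dg_wt + dg_double)


# Hypothetical binding free energies (kJ/mol), not values from the study.
additive = coupling_energy(-40.0, -36.0, -37.0, -33.0)   # independent residues
coupled = coupling_energy(-40.0, -36.0, -37.0, -34.7)    # cooperating residues
```

      In the additive case the single-mutant losses (4 and 3 kJ/mol) sum exactly to the double-mutant loss, so the coupling energy is zero. In the coupled case the double mutant loses only 5.3 kJ/mol, which gives a positive coupling energy. The same cycle can be written for enthalpy and for the entropic term.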

      5. Lines 182-185: "Enthalpy and entropy for KDL binding MsbA R188A was largely similar to the wild-type protein (Fig 3a). However, the R243A mutation resulted in an increase in entropy, compensated for by an increase in positive enthalpy (Fig 3a)."

      The thermodynamic parameters for R243A mutation change in a similar manner to WT and R188A. It is R238A, not R243A, whose DH-DS interplay shows a distinct pattern from WT. Please, reword this sentence.

      The sentence has been revised.

      6. Lines 252-253: Solvation of polar groups in aqueous solvent has been ascribed to positive heat capacities whereas negative for apolar solvation.

      This statement is not precise. More precisely, the collapse of apolar molecules from their solvated state leads to the negative "change" in heat capacity.

      The sentence has been corrected.

      7. Line 262-267: "These hydrophilic patches will be highly solvated, which will be desolvated upon binding lipids contributing favorably to entropy. In the case of MsbA, the selected lysine and arginine residues (based alpha carbon position) are separated by about 9 to 18 Å (PDB 8DMM). This distance could result in overlap of solvation shells that collectively contribute to the positive coupling enthalpy observed for MsbA-KDL interactions."

      This statement is too speculative without presenting the degree of solvation of the residues targeted for mutation. More quantitative arguments seem to be needed.

      We have removed the speculative statement.

      Reviewer #3 (Recommendations For The Authors):

      In this paper presented by Liu et al, native MS on the lipid A transporter MsbA was used to obtain thermodynamic insight into protein-lipid interactions. By performing the analyses at different lipid A concentrations and temperatures, dissociation constants for 2-3 lipid A binding sites were determined, as well as enthalpies were calculated using nonlinear van't Hoff fitting.

      Changes in Gibbs free energy were then calculated based on the determined dissociation constants, and together with the enthalpy values obtained via van't Hoff analysis the entropic contribution to lipid binding (DeltaS*T) was indirectly determined.

      Correction: In the case of linear van’t Hoff plots, dH and dS were determined directly from the plot. For the nonlinear form of the van’t Hoff equation, which does not include an entropy fitting parameter, we back-calculated dS using dH and dG at a given temperature.
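      The two routes to the entropy described in this correction can be sketched as follows. This is a minimal illustration using the textbook linear van't Hoff relation, ln Ka = -dH/(RT) + dS/R, together with the back-calculation dS = (dH - dG)/T. The function names and the synthetic data are hypothetical, and the nonlinear fit itself is not shown.

```python
import math

R_J = 8.314462618  # gas constant in J/(mol*K)


def linear_vant_hoff(temps_k, kas):
    """Least-squares fit of ln(Ka) against 1/T.

    Returns (dH in J/mol, dS in J/(mol*K)) from the slope (-dH/R)
    and the intercept (dS/R).
    """
    xs = [1.0 / t for t in temps_k]
    ys = [math.log(k) for k in kas]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return -slope * R_J, intercept * R_J


def entropy_from_dh_dg(dh, ka, temp_k):
    """Back-calculate dS at one temperature from dH and dG = -RT ln(Ka)."""
    dg = -R_J * temp_k * math.log(ka)
    return (dh - dg) / temp_k


# Synthetic, hypothetical data generated from dH = 20 kJ/mol, dS = 180 J/(mol*K).
temps = [288.15, 293.15, 298.15, 303.15, 310.15]
kas = [math.exp(-20000.0 / (R_J * t) + 180.0 / R_J) for t in temps]
dh_fit, ds_fit = linear_vant_hoff(temps, kas)
ds_back = entropy_from_dh_dg(dh_fit, kas[2], temps[2])
```

      For a strictly linear plot the two routes agree. When the plot is curved, dH depends on temperature, so the back-calculated dS is specific to the temperature at which dH and dG are evaluated.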

      The authors then included single, double and triple mutants of residues known based on cryo-EM and X-ray structures to interact with Lipid A either in the large inward-facing cavity or at a secondary binding site accessible at the surface of outward-facing MsbA, and determined the thermodynamic parameters of these mutants alone and combined to gain access to coupling energies of pairwise interactions. This method has its roots in studying pair-wise interactions of protein-protein interfaces, generally known as thermodynamic mutant cycle analysis.

      Having the main expertise in ABC transporter structure-function, I will judge the paper mostly from the standpoint of what I can learn as a transporter expert from this study and whether the insights are of value for researchers with average biophysical knowledge.

      My overall impression of the manuscript is that, while it contains a wealth of experimental data using the innovative and unique method of native mass spectrometry, it is hard to understand what one can learn from this analysis beyond their interesting key finding that entropy plays an important role in lipid binding (but only at certain temperatures). In particular, the lessons learned from the coupling energy analysis of the introduced mutations is hard to grasp/digest for me with regards to what I can learn from these numbers (other than learning that there are such coupling effects).

      We agree the thermodynamic data is rich. Often a ddGint of zero is reported as indicating no coupling or significance, but here the value is due to compensating ddH and -TddS terms. In our view, this work forms the foundation for additional studies to better understand the coupling energetic terms beyond ddGint.

      In some instances, the text/figure legends are a bit unclear or contain some typos; but this part can easily be handled in a revision. The discussion is well written and embeds the main findings in the (still rather limited) literature on thermodynamic analyses of lipid binding of membrane proteins.

      Major points

      1. The authors may have clarified the following point in a previous paper; but at least in this paper, it is unclear to me how they purified MsbA without lipid A. The reason I am asking is that in our experience, if one purifies MsbA expressed from E. coli with standard detergents (e.g. beta-DDM) one will find a perfect density for Lipid A when determining an inward-facing structure by cryo-EM. According to the Methods, MsbA is purified initially in DDM, and rebuffered to C10E5 during size exclusion chromatography. When looking at Fig. 2b, the authors state (or assume?) that if no lipid A is added, MsbA has 0 % lipid A bound.

      We have previously reported details of MsbA sample prep and optimization [6]. The revised manuscript makes note of this previous work and refers the reader to the publication. Yes, we see no appreciable signal for lipid A bound to MsbA (see Fig 2b).

      We also note that samples of MsbA prepared using DDM are highly heterogeneous and contaminated by a battery of small molecules (that we suspect are co-purified lipids). These contaminants will inadvertently impact biochemical studies.

      2. A second topic where further clarification is in my view needed is the question of the conformations that were probed and the lipid binding sites. If I get the experimental rationale correctly, most of the data were determined in the absence of nucleotides, and only a small subset (Fig. 5) of data were determined in the presence of ATP-vanadate. However, structural evidence for the cytosolic lipid A binding site has been only determined for outward-facing MsbA (PDB: 8DMM), but has thus far not been seen in any of the inward-facing cryo-EM structures of MsbA, including recent well-resolved cryo-EM structures showing excellent density for the lipid A bound to the inward-facing cavity (PDB: 7PH2). Further, there is only one lipid A molecule that can be accommodated by the inward-facing cavity, whereas (owing to the symmetry of the homodimer) two lipid A can be bound sideways to outward-facing MsbA. Now, my understanding problem is why one does see up to three lipid A molecules bound to inward-facing apo MsbA, e.g. Fig. 2b and elsewhere. Where are they expected to bind? And what is the evidence supporting these additional binding sites?

      See our detailed response to reviewer 1. If we add more lipid, we see more lipid binding to MsbA, like every other membrane protein we have studied. This data clearly indicates that there are more KDL binding site(s) – deciphering the affinity of these site(s) represents a problem on the horizon.

      A further question is which lipid A binding sites are present in vanadate-trapped MsbA. Here, there are two identical binding sites (at the surface of each MsbA molecule), and it is therefore surprising to see that the affinities for the first and the second binding site are so different (see e.g. Supplementary Fig. 13).

      Great point. A logical explanation (described for other biochemical systems) is the two exterior LPS binding sites display negative cooperativity i.e., binding at one site weakens the affinity at the other site.
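      The reasoning behind negative cooperativity at two identical sites can be checked against the purely statistical expectation. The sketch below assumes two identical sites and that the measured values are stepwise (macroscopic) dissociation constants; the function names and numbers are hypothetical. For independent identical sites with microscopic constant k, the stepwise constants are k/2 and 2k, so Kd2/Kd1 = 4 without any interaction between the sites.

```python
def macroscopic_kds_two_sites(k_micro):
    """Stepwise Kd values for two identical, independent sites."""
    return k_micro / 2.0, 2.0 * k_micro


def cooperativity_factor(kd1, kd2):
    """Ratio of the observed Kd2/Kd1 to the statistical value of 4.

    1 means independent sites, > 1 negative cooperativity,
    < 1 positive cooperativity.
    """
    return kd2 / (4.0 * kd1)


# Hypothetical values in arbitrary concentration units.
kd1, kd2 = macroscopic_kds_two_sites(1.0)
independent = cooperativity_factor(kd1, kd2)
negative = cooperativity_factor(0.5, 10.0)
```

      Under these assumptions, only a Kd2/Kd1 ratio well above 4 points to negative cooperativity rather than to statistics alone.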

      Finally, what is the evidence that in vanadate-trapped MsbA, all molecules have closed NBDs and thus assume the outward-facing conformation? It is not uncommon that vanadate trapping leads to NBD closure only in a subfraction of all transporters (hence not in 100 % of them).

      Yes, the native mass spectrum shows no appreciable signal for MsbA not trapped with vanadate/ADP. In our previous cryoEM study [6] using the vanadate-trapped transporter, we did not observe particles with the NBDs dissociated in space. Regarding samples from other labs, a native mass spectrum could shed light on the population of untrapped protein; however, most studies use SDS-PAGE for quality control of their purified samples. This technique is not sufficient to address underlying biochemical issues.

      We do have a new report in preparation describing a new discovery regarding trapping efficiency of MsbA.

      3. The key parameter underlying the entire thermodynamic analysis of wt and mutant MsbA is the dissociation/association constant, which is used to calculate Gibbs free energy and, via van't Hoff analysis, enthalpy. Entropy is not determined directly, but in fact indirectly from these two numbers, both depending on the measurement quality of the dissociation/association constant. Now, when looking at the fitted curves as shown in Figure 2b (and in the supplement), determination of the dissociation constant for KDL1 (blue curves) looks reasonable and the determined KDs are within the range of measured points. However, for KDL2 (red) and even more so KDL3 (yellow), the determined KD values (Supplementary Table 5) are typically higher than the highest KDL concentration used in the assay (1.5 uM). For this reason, and despite the fact that error bars of the fits look reasonably small, I still have doubts about the reliability of these KD values for KDL2 and KDL3.

      Hence, the surprisingly strong changes of enthalpy/entropy values for different mutants/temperatures may have their origin in incorrectly determined KD values.

      The increase in binding affinity of subsequent lipid binding events is consistent with many reports from our group [1, 2, 4, 6-9] and that of Prof. Robinson [10, 11] on this topic. As noted above, we indeed observe linear van’t Hoff plots with positive and negative slopes as well as non-linear curves that are convex or concave. The MsbA proteins (wt or mutant), trapped or not, all display unique temperature-dependent responses. If, as the reviewer suggests, the Kd values were incorrectly or randomly determined, then none of the binding data should follow the thermodynamic van’t Hoff equations. This is simply not the case: the error bars and fits are reasonable. Backing up even further, looking at the raw native mass spectra (see Supplementary Figures 1-3 and 10-11), one can see different temperature dependence of lipid binding.

      Minor points

      1. Lines 116-131: this section reads as an extended introduction/aims, and does not contain any results.

      This section has been moved to introduction.

      2. Lines 137-139: suggested to check whether these interactions are also present in recently determined cryo-EM structures determined at fairly high resolution (PDB: 7PH2)

      The interactions of MsbA and LPS (bound at the interior site) are comparable for PDB 7PH2 and 6BPL.

      3. Lines 144-146: suggested to elaborate in more detail on the fitting procedure here, as the KD values determined in this way are the foundation of all quantitative assessments.

      Details of data analysis and the fitting procedure are provided in methods.

      4. Figure legend, Fig. 2: Technically, MsbA was solubilized and purified in DDM and detergent exchange was done on SEC to C10E5.

      Corrected.

      5. Figure legend, Fig. 4: description in a) on deconvoluted mass spec data is incorrect. Letter below needs to be adjusted accordingly.

      Corrected.

      6. Figure legend, Fig. 5: suggested to mention in Figure legend title that here we look at ADP-vanadate trapped MsbA.

      Corrected.

      References

      1. Cong, X., et al., Determining Membrane Protein–Lipid Binding Thermodynamics Using Native Mass Spectrometry. J Am Chem Soc, 2016. 138(13): p. 4346-4349.

      2. Cong, X., et al., Allosteric modulation of protein-protein interactions by individual lipid binding events. Nat Commun, 2017. 8(1): p. 2203.

      3. Qiao, P., et al., Insight into the Selectivity of Kir3.2 toward Phosphatidylinositides. Biochemistry, 2020. 59(22): p. 2089-2099.

      4. Qiao, P., et al., Entropy in the Molecular Recognition of Membrane Protein-Lipid Interactions. J Phys Chem Lett, 2021. 12(51): p. 12218-12224.

      5. Sokolovski, M., et al., Measuring inter-protein pairwise interaction energies from a single native mass spectrum by double-mutant cycle analysis. Nat Commun, 2017. 8(1): p. 212.

      6. Lyu, J., et al., Structural basis for lipid and copper regulation of the ABC transporter MsbA. Nat Commun, 2022. 13(1): p. 7291.

      7. Patrick, J.W., et al., Allostery revealed within lipid binding events to membrane proteins. Proc Natl Acad Sci U S A, 2018. 115(12): p. 2976-2981.

      8. Schrecke, S., et al., Selective regulation of human TRAAK channels by biologically active phospholipids. Nat Chem Biol, 2021. 17(1): p. 89-95.

      9. Zhu, Y., et al., Cupric Ions Selectively Modulate TRAAK-Phosphatidylserine Interactions. J Am Chem Soc, 2022. 144(16): p. 7048-7053.

      10. Tang, H., et al., The solute carrier SPNS2 recruits PI(4,5)P(2) to synergistically regulate transport of sphingosine-1-phosphate. Mol Cell, 2023. 83(15): p. 2739-2752 e5.

      11. Yen, H.Y., et al., PtdIns(4,5)P(2) stabilizes active states of GPCRs and enhances selectivity of G-protein coupling. Nature, 2018. 559(7714): p. 423-427.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations For The Authors):

      I have one major concern regarding this draft of the manuscript:

      (1) In the manuscript (lines 130-31) it is stated that "About 55% (8/15) of mice with unilateral AAV-hM3Dq centered in the PMv showed an increase in LH release above 0.5ng/ml within 10-20 min following the CNO injection". However, data at time zero are not shown for 4 of the 8 "LH peak" animals. The missing data at time zero seem problematic for the analysis of the CNO-stimulated cohort. As mentioned in the manuscript, the area under the curve was calculated between the range of -10 to 20 min post-injection. Because diestrus animals have spontaneous LH pulses, it is highly possible that an LH pulse is initiated in the 10 minutes prior to drug delivery, as seen in the AAV-mCherry group in 1D, and similarly in 2C. Given the current form of analysis, it seems possible that a spontaneous LH pulse initiated anywhere up to 10 minutes prior to drug delivery could conceivably count as an experimentally induced "LH peak". Can you address this concern?

      We understand the reviewer’s concern about spontaneous LH pulses. This is why we have been very strict in our analysis and have taken multiple approaches to analyze these data. In our hM3Dq group, 55% of the animals responded to CNO with an increase in LH, while none responded in the negative control group. In addition, in the clozapine group, where no time 0 points were missing, 100% of the animals with hM3Dq showed an LH increase after the injection, while only 28% (2/7) showed the increase in the negative control group. Strictly speaking, the DREADDs approach doubled the chances of an LH increase. Note that the spontaneous LH peaks observed in negative controls or during baseline show a very sharp increase followed by a decrease at the next time point, while the 4 “PMv hits” without time 0 and with an increase in LH in the CNO-hM3Dq group showed a sustained rise after the 10 min or prolonged high LH levels (above 1ng/ml) even 30 min after the injection. Ultimately, the cFOS levels in the PMv of the CNO-hM3Dq group with an increase in LH are significantly higher than in any other group, and the number of cFOS neurons is highly correlated with LH levels. Another important aspect that should not be dismissed is that in this experimental design we used unilateral injections in animals that are in a fed state; therefore, the role of leptin in raising LH levels is probably dampened.

      We have added a statement to clarify this issue.

      The following are minor concerns:

      a) Figure 4 a-d, it is clear that Vglut2 is absent in the VMH, but it seems more relevant to show this expression pattern in the PMv.

      We chose the VMH because it has a very dense collection of either LeprCre;VGlut2 or Vglut2 only cells and it illustrates very well the conditional Vglut2 deletion at small and high magnifications. In the PMv, however, the distribution of these cells is sparse. The reviewer is correct that for the current study, the PMv is more relevant and therefore, we have included images of the PMv showing a control and a LeprCre-Vglut2floxed animal in higher magnification.

      b) Methods section, targeting PMv: please check the injection coordinate: "dura-mater [dorsoventral -0.54]"

      Thank you for noticing this mistake. All coordinates for the injection have now been corrected (-5.4 mm, ±0.5 and -5.4mm).

      Reviewer #2 (Recommendations For The Authors):

      This is a very well-written manuscript by Saenz de Meira and colleagues on a careful study reporting on the key role of glutamate transporter vGlut2 expression in the neurons of the ventral premammillary nucleus (PMv) of the hypothalamus expressing the leptin receptor LepRb in energy homeostasis, puberty, and estrous cyclicity. The authors first show using cre-dependent chemogenetic viral tools that the selective activation of the PMv LepRb neurons induces luteinizing hormone (LH) release. Then the authors demonstrate that the selective invalidation of vGlut2 in LepRb-expressing cells throughout the body induces obesity and mild alteration of sexual maturation in both sexes and blunted estrous cyclicity in females. Finally, the authors knock out vGlut2 in PMv neurons in which they reintroduce LepRb expression in an otherwise LepRb-null background using an AAV Cre approach. This latter very elegant experiment shows that while the sole re-expression of LepRb in PMv neurons in LepRb-null mice was shown before to restore puberty onset, deleting vGlut2 in LepRb-expressing PMv neurons blunts this effect.

      My specific comments are as follows. Please note that none of them require additional experiments and that they can be answered by amending the text.

      (1) Please provide information on the serotypes and promoters of the AAVs used in the study to enhance reproducibility.

      Thank you, serotypes and promoters have been added for all AAVs.

      (2) Please reformulate lines 220-221. Indeed, this reviewer does not agree with the fact that balanopreputial separation (BPS) is a sign of puberty completion. BPS is merely a sign of the advancement of sexual maturation, akin to vaginal opening in females. In certain mouse strains, BPS coincides with mini puberty rather than puberty. The definitive sign of puberty completion involves the presence of spermatozoa in the vas deferens (equivalent to the first ovulation/first estrus in females).

      Thank you for this remark, this statement has now been modified.

      (3) The authors convincingly show that the potential contamination of the arcuate nucleus of the hypothalamus (ARH) with the AAV injections targeted to the PMv should not account for the DREADD-mediated activation of LH release. However, do the authors believe that DREADD activation of LepRb-expressing PMv neurons, inducing cFOS expression in these neurons, could also activate ARH kisspeptin neurons (which do not express LepRb) via transsynaptic action? Alternatively, do they posit direct activation of GnRH cell bodies in the preoptic region or GnRH axon/dendrites in the ARH/median eminence region?

      Thank you for this comment. We do not have enough evidence from this DREADDs experiment to make a strong prediction about the downstream pathways. However, as discussed, in the DREADDs khrGFP females we observed very few kisspeptin cells expressing cFOS, which weakens the case for a PMv to ARH kisspeptin action in this setting. Given that our LepR-Cre;Vglut2flox animals showed no alterations in kiss1 gene expression but a strong decrease in GnRH release, we hypothesize that this acute activation of LH is mediated by direct inputs from the PMv to GnRH neurons, while acknowledging the possible existence of alternative pathways. These arguments have been added to the discussion.

      (4) This reviewer finds it intriguing that glutamatergic signaling is required for LepRb re-expression in the PMv to restore fertility. Given that the authors and others have shown that PMv neurons heavily express NOS1, the activity of which is known to heavily rely on glutamatergic NMDAR activation, the authors may want to contextualize their results in light of the recent study showing that NOS1 is found to be a new causative gene in people with congenital hypogonadotropic hypogonadism.

      Thank you for the advice, we have added a paragraph discussing the possible involvement of nNos from PMv neurons in the discussion.

      (5) Does the absence of vGlut2 have any impact on the obesity phenotype in mice where LepRb is selectively re-expressed in the PMv?

      We followed the weight of these animals after the AAV injections. However, because of the difficulty of generating dual homozygous mice (LepRnull homozygotes are infertile) and of producing adequate stereotaxic injections with minimal contamination of adjacent nuclei, the groups could not all be run together, and we therefore refrained from performing a comparative analysis of energy balance. Analyses of body weight in LepRnull mice with reactivation of LepR in PMv neurons have been published before (Donato et al., 2011 using the Flp/Frt model and Mahany et al., 2018 using the Cre/loxP system). No difference in body weight was observed in either study. Below is the progression of body weight in mice with reactivation of LepR and deletion of Vglut2 in PMv neurons. We added a comment in this regard.

      Author response image 1.

      Reviewer #3 (Recommendations For The Authors):

      The authors examined the effects of glutamate release from PMv LepR neurons in the regulation of puberty and reproduction in female mice. Multiple genetic mouse models were utilized to either manipulate PMv LepR neuron activities, or to delete glutamate vesicle transporters from LepR neurons. The authors have been quite rigorous in validating these models and exploring potential contaminations. Most of the data presented are solid and convincing, and support the conclusion. This reviewer has the following suggestions for the authors to further improve this work and the manuscript.

      (1) The DREADD study had some issues. For example, "2 out of 7 control mice with no AAV showed an increase in LH...", indicating that LH increase may just happen randomly. More importantly, 45% of PMv-hit mice did not show LH response to CNO, making it hard to interpret the positive LH responses from the other 55% PMv-hit mice undergoing the same treatment. Overall, there are just too many variabilities in these DREADD data for anyone to come up with a clean and convincing conclusion. This reviewer suggests repeating these experiments or removing the DREADD data altogether. After all, the rest of the results are much more convincing and stand alone to support the role of glutamate release from these PMv LepR neurons.

      We appreciate the reviewer’s concern. Indeed, LH shows spontaneous pulsatility, which is one of the biggest challenges in our field. We have answered this concern for Reviewer 1 above and modified the text accordingly. We decided to keep the data in the publication because we believe that this is very important evidence supporting our observations, since this is the only experiment that approaches the role of the PMv in a free-moving, ad libitum fed mouse model that is not deficient for leptin signaling or glutamatergic neurotransmission. Altogether, this paper strongly supports a role for glutamate signaling in leptin’s action on reproductive function. Until now, evidence for this role had been dismissed or contested.

      (2) The mCherry signals in Figure 3 are of low quality and do not look like cell bodies.

      We have now equally increased the contrast and brightness in all higher magnification images of mCherry neurons (Fig 3F, G, I and J) to improve their visibility. The lower magnification images are high quality images of areas with a high density of mCherry positive neurons. Thick sections (30 µm) at low magnification compromise the focus at different Z-axis levels. We feel that images 3E and 3H are important to define the location of cells in the arcuate nucleus. Colocalization and mCherry expression are clear in high magnification images.

      (3) The validation of Vglut2 deletion in LepR neurons (Fig. 4A-D) is very nice and convincing, but the images are from the VMH region. Why not show the PMv region?

      As mentioned to Reviewer 1, we chose the VMH because it has a very dense collection of either LeprCre;VGlut2 or Vglut2 only cells and it illustrates the Vglut2 deletion very well at low and high magnifications. In the PMv, however, the distribution of these cells is sparse. The reviewer is correct that for the current study the PMv is more relevant, and we have therefore included images of the PMv showing a control and a LeprCre-Vglut2floxed animal at higher magnification.

      (4) Figures 4-5 used LepR-Cre as controls, while Figure 6 used Vglut2flox as controls. Why? Also, how did the authors set up the breedings to generate "littermates" in each of these studies?

      We used the LepR-Cre as controls for our experiments since we need Cre homozygotes for proper Cre expression and we had the LepR-Cre homozygous colony from the DREADDs experiment. Also, these mice had previously been thoroughly evaluated and no metabolic and/or reproductive disruptions were noticed (please see lines 213-214 of the original submission). However, our LepR-Cre colony had to be drastically reduced during COVID and suffered from unexpected Δ recombination leading to loss of Vglut2 homozygotes. To overcome these issues, we used VGlut2-floxed controls for the gene expression and GnRH immunoreactivity experiments. These mice had previously been used as controls for metabolic experiments with the LepCre-Vglut2fl genotype (Xu et al., 2013 Mol Metab), showing no deficiencies in the metabolic phenotype.

      As described in the methods section (lines 464-466 of the original preprint), to inactivate glutamate in leptin responsive cells, LepRb-Cre mice were crossed with mice carrying loxP-modified Vglut2 alleles. Our experimental mice were homozygous for the LepRb-Cre allele (LepRb^cre/cre) and homozygous for the Vglut2-loxP allele (Vglut2^fl/fl). Our controls consisted of mice homozygous for the Cre allele (LepRb^cre/cre;Vglut2^+/+, named LepRb-Cre) or homozygous for the Vglut2-loxP allele (LepRb^+/+;Vglut2^fl/fl, named Vglut2flox). Both experimental (LepRb^cre/cre;Vglut2^fl/fl, named LepRbΔVglut2) and control mice were derived from the same litters, with parents homozygous for one of the genes and heterozygous for the other gene (LepRb^cre/cre;Vglut2^fl/+ or LepRb^cre/+;Vglut2^fl/fl). Mice were genotyped at weaning (21 days) and again at the end of the experiments.
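
      To make the littermate logic of this breeding scheme concrete, the following is a minimal sketch of the expected offspring genotype frequencies for one possible pairing. It assumes, purely for illustration, that both parents are homozygous for LepRb-Cre and heterozygous for the floxed Vglut2 allele, and that the two loci assort independently; the actual pairings used are not specified beyond the description above, and the function names are hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def gametes(genotype):
    """List every gamete for a genotype given as one 2-tuple of alleles per locus.

    Assumes independent assortment between loci (an illustrative assumption).
    """
    return list(product(*genotype))


def cross(parent_a, parent_b):
    """Return expected offspring genotype frequencies for a two-parent cross."""
    counts = Counter()
    for gamete_a, gamete_b in product(gametes(parent_a), gametes(parent_b)):
        # Sort the two alleles at each locus so that "+/fl" and "fl/+" are pooled.
        offspring = tuple(
            "/".join(sorted((a, b))) for a, b in zip(gamete_a, gamete_b)
        )
        counts[offspring] += 1
    total = sum(counts.values())
    return {genotype: Fraction(n, total) for genotype, n in counts.items()}


# Hypothetical pairing: both parents Cre homozygous, Vglut2 fl/+ heterozygous.
parent = (("cre", "cre"), ("fl", "+"))
litter = cross(parent, parent)

for genotype, freq in sorted(litter.items()):
    print(genotype, freq)
```

      Under these assumptions, a quarter of each litter would carry the experimental genotype (Cre homozygous, Vglut2 fl/fl) and a quarter the LepRb-Cre control genotype (Cre homozygous, Vglut2 +/+), which is one way both groups can arise as littermates.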

      (5) The labeling of Figures 5E-F is missing, making it hard to read.

      We have confirmed that Figures 5E and F were mentioned in the figure legends and in the results text. To make the figure easier to read, we have added the Y axis titles to Figure 5C, D, E and F; these were previously shown only in Fig 5A and B.

      (6) The last experiment was very nice confirming the role of glutamate release from PMv LepR neurons. However, the key phenotypes (puberty development, pregnancy) were not graphed and only stated in the text.

      Thank you for your comment. Since the key result is that none of the LeprLoxTb;Vglut2flox animals showed vaginal opening or pregnancy, we don’t feel the need to graph this. All the details of the reproductive and metabolic phenotyping of the Lepr-loxTB with re-expression of LepR in the PMv were described in Mahany et al., 2018.

    1. Author Response

      The following is the authors’ response to the original reviews.

      This important study shows that two methods of sleep induction in the fly, optogenetic activation of the dorsal fan-shaped body (which is rapidly reversible and maintains a neuronal activity signature similar to wakefulness), and Gaboxadol-induced sleep (which shuts down neuronal activity), produce distinct forms of sleep and have different effects on brain-wide neural activity. The majority of the conclusions of the paper are supported by compelling data, but the evidence supporting the claim that the two interventions trigger distinct transcriptional responses is incomplete.

      Thank you for the helpful and detailed reviews. We feel that these have improved the manuscript considerably, and hopefully the additional figures in this Reply letter will help further convince our readers.

      Public Review

      In this study, Anthoney and coworkers continue an important, unique, and technologically innovative line of inquiry from the van Swinderen lab aimed at furthering our understanding of the different sleep stages that may exist in Drosophila. Here, they compare the physiological and transcriptional hallmarks of sleep that have been induced by two distinct means, a pharmacological block of GABA signaling and optogenetic activation of dorsal fan-shaped-body neurons. They first employ an incredibly impressive fly-on-the-ball 2-photon functional imaging setup to monitor neural activity during these interventions, and then perform bulk RNA sequencing of fly brains at different stages. These transcriptomic analyses lead them to (a) knocking out nicotinic acetylcholine receptor subunits and (b) knocking down AkhR throughout the fly brain, testing the impact of these genetic interventions on sleep behaviors in flies. Based on this work, the authors present evidence that optogenetically and pharmacologically induced sleep produces highly distinct brain-wide effects on physiology and transcription. The study is of significant interest, is easy to read, and the figures are mostly informative. However, there are features of the experimental design and the interpretation of results that diminish enthusiasm.

      a- Conditions under which sleep is induced for behavioral vs neural and transcriptional studies

      1- There is a major conceptual concern regarding the relationships between the physiological and transcriptomic effects of optogenetic and pharmacological sleep promotion, and the effects that these manipulations have on sleep behavior. The authors show that these two means of sleep-induction produce remarkably distinct physiological and transcriptional responses, however, they also show that they produce highly similar effects on sleep behavior, causing an increase in sleep through increases in the duration of sleep bouts. If dFB neurons were promoting active sleep, the sleep it produces should be more fragmented than the sleep induced by the drug, because the latter is supposed to produce quiet sleep. Yet both manipulations seem to be biasing behavior toward quiet sleep.

      This is a correct observation, which is already evident in our sleep architecture data (Figure 2E-H): chronic optogenetic sleep induction promotes longer sleep bouts that are similar in structure (bout number vs bout duration) to those produced by THIP feeding. Since our plots in Figure 2E-H follow the 5min sleep criterion cutoff, upon the Reviewer’s advice we re-analyzed our optogenetic experiments for short (1-5min) sleep. These are graphed below in Author response image 1. As can be seen, and as suspected by the Reviewer, the optogenetic manipulation does not increase the total amount of short sleep; indeed, it decreases it compared to baseline (these are for the exact same data as in Figure 2). Optogenetic sleep induction does not create a bunch of short sleep bouts.

      Author response image 1.

      Short sleep in optogenetic experiments. A. Average baseline (±SEM) 1-5min sleep across a day and night. B. Average (±SEM) 1-5min sleep in optogenetically activated flies, across a day and night.
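
      The distinction between short (1-5min) sleep and sleep meeting the 5min criterion can be sketched as follows. This is a minimal illustration only, not the analysis pipeline used for the figures: it assumes activity is binned per minute, treats a bout of exactly 5 minutes as meeting the criterion, and uses hypothetical function names and made-up example counts.

```python
from itertools import groupby


def inactivity_bouts(counts_per_min):
    """Return the length in minutes of each run of consecutive zero-activity bins."""
    return [
        sum(1 for _ in run)
        for is_inactive, run in groupby(counts_per_min, key=lambda c: c == 0)
        if is_inactive
    ]


def split_sleep(counts_per_min, criterion_min=5):
    """Split inactivity into short sleep (1 to <criterion) and criterion sleep."""
    bouts = inactivity_bouts(counts_per_min)
    short = [b for b in bouts if 1 <= b < criterion_min]
    long_ = [b for b in bouts if b >= criterion_min]
    return {
        "short_sleep_min": sum(short),
        "short_bouts": len(short),
        "criterion_sleep_min": sum(long_),
        "criterion_bouts": len(long_),
    }


# Made-up per-minute activity counts for one fly (illustrative values only).
example = [3, 0, 0, 1, 0, 0, 0, 0, 0, 0, 2, 0, 5]
summary = split_sleep(example)
print(summary)
```

      In this toy trace the inactivity runs last 2, 6 and 1 minutes, so 3 minutes are counted as short sleep and 6 minutes as sleep under the 5min criterion. Summing short sleep separately is what allows the two manipulations to be compared for fragmented versus consolidated sleep.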

      We agree with the reviewer that this observation might seem inconsistent with the idea that optogenetic activation promotes active sleep, and that short sleep is active sleep. However, it does not necessarily follow that optogenetic activation has to produce short sleep. Indeed, we know from our brain imaging data (and the associated behavioral analysis) that active sleep will persist for as long as we induce it with red light. While we have not induced it for longer than 15 minutes (Tainton-Heap et al, Current Biology, 2021; Troup et al, J. of Neuroscience, 2023), this is already clearly longer than a <5min sleep bout. So our interpretation is that the longer sleep bouts induced by optogenetic activation are prolonged active sleep, rather than quiet sleep. In other words, this artificial sleep manipulation induces prolonged active sleep, rather than many short sleep bouts. This is of course different from what happens during spontaneous sleep. We have tried to be clearer about sleep bout durations in the revised manuscript (e.g., the new Figure 3), and we now admit early in the results (lines 376-380) that we don’t know what optogenetic activation looks like in the fly brain beyond 15 minutes.

      2- The authors show that the pharmacological block of GABA signaling and the optogenetic activation of dorsal fan-shaped-body neurons cause different responses on brain activity. Based on these recordings and the behavioral and brain transcriptomic data they then claim that these responses correspond to different sleep states and are associated with the expression and repression of a different constellation of genes. Nevertheless, neural activity in animals was recorded following short stimulations whereas behavioral and transcriptomic data were obtained following chronic stimulation. In this regard, it would be interesting to determine how the 12-hour pharmacological intervention they employed for their transcriptomic analysis changes neural activity throughout the brain - 12 hours will likely be too long for the open-cuticle preps, but an in-between time-point (e.g. 1h) would probably be equally informative.

      The longest we’ve imaged brain activity for optogenetic sleep induction is 15 minutes, as discussed above. We see no changes in activity across this time, which would normally have led to a quiet sleep stage in spontaneous sleep recordings. Whole-brain imaging after 10 hours of optogenetic sleep induction (our RNA collection timepoint) is not realistic, and even 1 hour is difficult. We have however conducted overnight electrophysiological recordings (with multichannel silicon probes), where we activated the same R23E10 neurons for successive 20-minute bouts (alternating with 20min of no red light). We are preparing this work for publication (Van De Poll, et al). We see no evidence of optogenetic activation of this circuit ever producing anything resembling quiet sleep. Since we are not in a position to provide this new electrophysiological data in the current study, we are careful to clarify that we have not investigated what brain imaging looks like after chronic optogenetic activation (lines 376-380). We are showing through diverse lines of evidence that what is called sleep can look different in flies.

      b- Efficiency of THIP treatment under different conditions

      1- There are no data to quantify how THIP alters food consumption. It is evident that flies consume it otherwise they would not show increased sleep. However, they may consume different amounts of food overall than the minus THIP controls. This might have an influence on the animal's metabolism, which could at least explain the fact that metabolism-related genes are regulated (Figure 5). Therefore, in the current state, it is not possible to be certain that gene regulation events measured in this experiment are solely due to THIP effects on sleep.

      We have two arguments against this reasonable criticism. First, as discussed above, the optogenetic flies are sleeping at least as much as the THIP-fed flies, so in principle they also might be feeding less. But we see no metabolic gene downregulation in the optogenetic dataset. We include this counterargument in the discussion (lines 752-756). Second, together with our co-author Paul Shaw, we have shown by tracking dye consumption that THIP-fed flies are not eating less than controls (Dissel et al, Current Biology, 2015). We show those results again below in Author response image 2 to support our reasoning that feeding is not an issue.

      Author response image 2.

      Flies were fed blue dye in their food while being sleep deprived (SD), or while being induced to sleep with 0.1mg/ml THIP in their food, or both. Dye consumption was measured in triplicate for pooled groups of 16 flies. Average absorbance at 625nm (± standard deviation) is shown. Experiments were not significantly different (ANOVA of means).

      2- A similar problem exists in the sleep deprivation experiments. If flies are snapped every 20 seconds, they may not have the freedom to consume appropriate amounts of food, and therefore their consumption of THIP or ATR may be smaller than in non-sleep deprived controls. Thus, it would be crucial to know whether the flies that are sleep-deprived (i.e. shaken every 20 seconds for 12 hours) actually consume comparable amounts of food (and therefore THIP) as those that are undisturbed. If not, then perhaps the transcriptional differences between the two groups are not sleep-specific, but instead reflect varying degrees of exposure to THIP.

      Please see our response to the similar critique above, and how Author response image 2 addresses this concern.

      3- The authors should further discuss the slow action of THIP perfusion vs dFB activation, especially as flies only seem to fall asleep several minutes after THIP is being washed away. Is it a technical artifact? If not, it may not be unreasonable to hypothesize that THIP, at the concentration used, could prevent flies from falling asleep, and that its removal may lower the concentration to a point that allows its sleep-promoting action. The authors could easily test this by extending THIP treatment for another 4-5 minutes.

      The reviewer is partially correct in suggesting a technical artifact: THIP does not get washed away immediately after 5min of perfusion. The drip system we employ means that THIP concentration will slowly increase to the maximum concentration of 0.2mg/ml, and then slowly get diluted away at a rate of 1.25ml/minute (this is all in the Methods). In a previous study (Yap et al, Nature Communications, 2017) we used this exact same perfusion procedure to test a range of THIP concentrations, and settled on 0.2mg/ml as the lowest that reliably induced quiet sleep within 5 minutes. Higher concentrations induced quiet sleep faster, so the alternate explanation proposed by the Reviewer is not supported. We feel that our previous electrophysiological study provided the necessary groundwork for using the same approach and dosage here for our whole-brain imaging readout.

      c- Comments regarding the behavioral assays

      1- L319-322: the authors conclude that dFB stimulation and THIP consumption have similar behavioral effects on sleep. However, this is inaccurate as in Figure S1 they explain that one increases bout number in both day and night and the other one only during the day.

      We have now added a caveat about night bout architecture being different (lines 353-356). Figure S1 is now Figure 3.

      2- The behavioral definitions used for active and quiet sleep do not fit well with strong evidence that deep sleep (defined by lowered metabolic rates) is probably most closely associated with bouts of inactivity that are much longer than the >5min duration used here, i.e., probably 30min and longer (Stahl et al. 2017 Sleep 40: zsx084). Given that the authors are providing evidence that quiet sleep is correlated with changes in the expression of metabolism related genes, they should at least discuss the fact that reductions in metabolism have been shown to occur after relatively long bouts of inactivity and might reconsider their behavioral sleep analysis (i.e., their criteria for sleep state) with this in mind.

      Interestingly, induced sleep bout durations are on average longer for the optogenetic manipulation (40min vs 25min); this was evident in Figure S1C vs S1F (now Figure 3). So, as discussed above, this provides a counterargument to sleep bout duration alone being indicative of metabolic processes associated with quiet sleep: the optogenetic dataset did not uncover metabolic-related pathways as relevant to that sleep manipulation. We refer to Stahl et al, Sleep, 2017, in our discussion (lines 748-750), making exactly this point about metabolic rates being decreased in longer sleep bouts, and following up with our observation that optogenetic flies sleep just as much, and their bouts are actually longer. So clearly different processes must be involved.

      d- Comments regarding the recordings of neuronal activity

      1- There is an additional concern regarding the proposed active and quiet sleep states that rest at the heart of this study. Here these two states in the fly are compared to the REM and NREM sleep states observed in mammals and the parallels between active fly sleep and REM and quiet fly sleep and NREM provide the framework for the study. The establishment of such parallel sleep states in the fly is highly significant and identifying the physiological and molecular correlates of distinct sleep stages in the fly is of critical importance to the field. However, the proposal that the dorsal fan shaped body (dFB) neurons promote active sleep runs counter to the prevailing model that these neurons act as a major site of sleep homeostasis. If quiet sleep were akin to NREM, wouldn't we expect the major site of sleep homeostasis in the brain to promote it? Furthermore, the authors state that the effects of dFB neuron excitation on transcription have "almost no overlap" (line 500) with the transcriptomic effects of sleep deprivation (Supplementary Table 3), which is not what would be expected if dFB neurons are tracking sleep pressure and promoting sleep, as suggested by a growing body of convergent work summarized on page four of the manuscript. Wouldn't the 10h excitation of the dFB neurons be predicted to mimic the effects of sleep deprivation if these neurons "...serve as the discharge circuit for the insect's sleep homeostat..." (line 60)? Shouldn't their prolonged excitation produce an artificial increase in sleep drive (even during sleep) that would favor deep, restorative sleep? How do the authors interpret their results with regard to the current prevailing model that dFB neurons act as a major site of sleep homeostasis? This study could be seen as evidence against it, but the authors do not discuss this in their Discussion.

      These are all excellent and thoughtful points, which have made us re-think parts of our discussion. First off, the potential comparison with REM and NREM is entirely speculative, and we have tried to make that more obvious in the introduction and the discussion (e.g., see lines 43, 708, 818). The evidence that the FB neurons (and maybe others) are involved in the homeostatic regulation of sleep is well-supported in the literature, so that part of the discussion holds. However, we concede that the timing of our sleep manipulations could benefit from more explanation. We conducted these during the flies’ subjective day, after the animals had presumably had a good night’s sleep. This means that we induced either kind of sleep for 10 daytime hours, which presumably replaced whatever behavioural states would ‘naturally’ be happening during the day. Female flies sleep less during the day than at night, and we have shown in previous work that daytime sleep quality is different from night-time sleep (van Alphen et al, Journal of Neuroscience, 2013), leading us to suggest that most ‘deep’ or quiet sleep happens at night, for flies. Following this reasoning, daytime optogenetic activation might not be depriving flies of much quiet sleep, or accumulating a deep sleep drive as the Reviewer proposes. Rather, both induced sleep manipulations could be providing 10 hours of either kind of sleep that the flies don’t really ‘need’. Why did we design it this way? Firstly, we were interested in simply asking what these chronic sleep manipulations do to gene expression in rested flies, and how they might be similar or different. We focussed on daytime manipulations to avoid precisely the confound of sleep pressure, and also because we observed red-light artifacts at night for our optogenetic experiments (which we reported).
      Our sleep deprivation strategy was designed specifically as a control for the THIP (Gaboxadol) experiments, to control for non-sleep related effects of the drug (see below our rationale for why this was less crucial for the optogenetic experiments). In conclusion, we had a logical rationale for how the experiments were done, centred on the straightforward question of whether these two different approaches to sleep induction were having similar effects in well-rested flies. In retrospect, we were not anticipating the Reviewer’s thoughtful logic regarding the dFB’s potential role in also regulating deep sleep homeostasis. We now provide some discussion along these lines to make readers aware of this line of reasoning, as well as our rationale for why prolonged optogenetic sleep induction was not sleep-depriving (lines 768-777).

      2- Regarding the physiological effects of Gaboxadol, to what extent is the quieting induced by this drug reminiscent of physiology of the brains of flies spontaneously meeting the behavioral criterion for quiet sleep? Given the relatively high dose of the drug being delivered to the de-sheathed brain in the imaging experiments (at least when compared to the dose used in the fly food), one worries that the authors may be inducing a highly abnormal brain state that might bear very little resemblance to the deeply sleeping brain under normal conditions. As the authors acknowledge, it is difficult to compare these two situations. Comparing the physiological state of brains put to sleep by Gaboxadol and brains that have spontaneously entered a deep sleep state therefore seems critical.

      As discussed above, our Gaboxadol (THIP) perfusion concentration (0.2mg/ml) was the minimal dosage that effectively induced sleep within 5 minutes, based upon previously published work (Yap et al, Nature Communications, 2017). Lower concentrations were unreliable, with some never inducing sleep at all. Comparisons with feeding THIP are tenuous, and we make that clear in our discussion (lines 731-735). Nevertheless, the Reviewer makes an excellent point about comparisons with spontaneous ‘quiet’ sleep. Here, we feel well supported (please see Author response image 3 below, comparing THIP-induced sleep (this work, B) and spontaneous sleep (A) from a previous study). In our previous study (Tainton-Heap et al, 2021) we showed that neural activity and connectivity decrease during spontaneous quiet sleep. This is what we also see with THIP perfusion. In contrast, in Troup et al, J. of Neuroscience (2023) we confirm that neither neural activity nor connectivity changes during optogenetic R23E10 activation, and general anesthesia – unlike THIP – does NOT produce a quiet brain state. Our finding that THIP effects are nothing like general anesthesia (at the level of brain activity levels) suggests a physiological sleep state closer to spontaneous quiet sleep. We elaborate on this important observation in our results, also pointing to crucial differences with general anesthesia (lines 411-415).

      Author response image 3.

      THIP-induced sleep resembles quiet spontaneous sleep. A. Calcium imaging data from spontaneously sleeping flies, taken from Tainton-Heap et al, 2021. Left, percent neurons active; right, mean degree, a measure of connectivity among active neurons. Both measures decrease during later stages of sleep. B. Calcium imaging data from flies induced to sleep with 5min of 0.2mg/ml THIP perfusion (this study). Left, percent neurons active; right, mean degree. Both measures are significantly decreased, resembling the later stages of spontaneous sleep, which we have termed ‘quiet sleep’. Hence THIP-induced sleep resembles quiet sleep. Note that the genetic background is different in A and B, hence the different baseline activity levels.

      3- There are some issues with Figure 3, in particular 3C-D. It is not clear whether these panels show representative traces or an average, however both the baseline activity and fluorescence are different between C and D, in particular in their amplitude. Therefore, it is difficult to attribute the differences between C and D to the stimulation itself or to the previously different baseline. In addition, the fact that flies with dFB activation seem to keep a basal level of locomotor activity whereas THIP-treated ones don't is quite striking, however it is not being discussed. Finally, the authors claim that the flies eventually wake up from THIP-induced sleep (L360-361), however there are no data to support this statement.

      These are representative traces, which is a way of showing the raw calcium data (Cell ID) so readers can see for themselves that one manipulation silences whereas the other does not – even though flies become inactive for both. The Y-axis scale is standard deviation of the experiment mean. Since THIP decreases neural activity, then the baseline is comparatively higher. Since optogenetic activation does not change average neural activity levels, the baseline is centered on zero. This is an outcome of our analysis method and does not reflect any ‘true’ baseline. We have now clarified this in our figure legend. We now also confess that flies rendered asleep optogenetically can be ‘twitchy’ (line 374). Finally, we show data for 3 flies that were recorded until they woke up. The rest were verified behaviorally, after the experiment. This is now explained in the Methods.

      4- In Figure 4C, it is strange that the SEM is always exactly the same across the whole experiment. Readers should be aware that there might have been an issue when plotting the figure.

      This is not a mistake; the standard errors are just all quite close (between 0.17 and 0.22). This is because of the way we did the analysis, asking how many flies responded to each stimulus event, with incremental levels of responsiveness. This is explained in the Methods. The figure makes the important point of sleep and recovery.

      e- Comments regarding the transcript analyses

      1- General comment: the title of this manuscript is inaccurate - the "transcriptome" commonly refers to the entirety of all transcripts in a cell/tissue/organ/animal (including genes that are not differentially expressed following their interventions), and it is therefore impossible to "engage two non-overlapping transcriptomes" in the same tissue. Perhaps the words "transcriptional programs" or "transcriptional profiles" would be more accurate here?

      We thank the Reviewer for this advice and have changed the title as proposed.

      2- Given the sensitivity of transcriptomic methods, there is a significant concern that the optogenetic experiments are not as well controlled as they could be. Given the need for supplemental all-trans retinal (ATR) for functional light gating of channelrhodopsins in the fly, it is convenient to use flies with Gal4-driven opsin that have not been given supplemental ATR as a negative control, particularly as a control for the effects of light. However, there is another critical control to do here. Flies bearing the UAS-opsin responder element but lacking the GAL4 driver and that have been fed ATR are critical for confirming that the observed effects of optogenetic stimulation are indeed caused by the specific excitation of the targeted neurons and not due to leaky opsin expression, or the effect of ATR feeding under light stimulation or some combination of these factors. Given the sensitivity of transcriptomic methods, it would be good to see that the candidate transcripts identified by comparing ATR+ and ATR- R23E10GAL4/UAS-Chrimson flies are also apparent when comparing R23E10GAL4/UAS-Chrimson (ATR+) with UAS-Chrimson (ATR+) alone.

      We have not done these experiments on UAS-Chrimson/+ controls. Like many others in our field, we viewed non-ATR flies as the best controls, because this involves identical genotypes. Since we were however aware that ATR feeding itself could affect gene expression, we specifically checked for this with our early (1-hour) collection timepoint. We only found 26 gene expression differences between ATR and -ATR flies at this early timepoint, compared with 277 for the 10-hour timepoint. We detail this rationale in our results, explaining why this is a convincing control for ATR feeding. If there was leaky opsin expression / activity, this would have been evident in our design. Regarding the cumulative effect of light, this would also have been accounted for in our design, as only 1 hour would have elapsed in our first timepoint compared to 10 hours in our second. While the Reviewer is correct in saying that parental controls are called for in many Drosophila experiments, this becomes quickly unmanageable in transcriptomic studies, which is exactly why well-designed +ATR vs -ATR comparisons in the exact same strain are most appropriate. We feel that our 1-hr timepoint mostly addresses this concern.

      3- Figures about qPCR experiments (5G and 6G) are problematic. First, whereas the authors seem satisfied with the 'good correspondence' between their RNA-seq and qPCR results, this is true for only ~9/19 genes in 5G and 2/6 genes in 6G. Whereas discrepancies are not rare between RNA-seq and qPCR, the text in L460-461 and 540-541 is misleading. In addition, it is unclear whether the n=19 in L458 refers to the number of genes tested or the number of replicates. If the qPCR includes replicates, this should be more clearly mentioned, and error bars should be added to the corresponding figures.

      We consider that our qPCR validations were convincing, as they mostly changed in the ‘right’ direction. We agree that there are some discrepancies, so have modified our language to reflect this. We have also clarified that 19 refers to the number of genes validated by qPCR in that THIP dataset. All qPCRs involved three technical replicates. We prefer to keep these histograms the way they are to convey these simple trends. For complete transparency, we now provide a supplemental Excel worksheet with all of the qPCR data, alongside corresponding RNAseq data and stats for the selected genes (Supplementary Table 9).

      4- There is a lack of error bars for all their RNAseq and qPCR comparisons, which is particularly surprising because the authors went to great lengths and analyzed an applaudably large amount of independent biological replicates, yet the variability observed in the corresponding molecular data is not reported.

      The genes reported in each of our datasets and associated supplemental figures and tables were all significant, as determined by criteria outlined in the Methods. However, we appreciate that readers might want to get a sense of the values and variances involved, as well as access to the entire gene datasets. We now provide all of these as additional ‘sheets’ in our existing supplemental tables (S2-S7), so this should be very easy to navigate and evaluate. In addition to the previously provided lists for significant genes, in the second Excel sheet (‘All genes’) readers will be able to see the data for all 5 replicates, for the significant genes as well as all other ~15,000 genes (listed in alphabetical order). We feel that this will be a helpful resource, because admittedly significance thresholds can still be a little arbitrary and some readers might want to look up ‘their’ genes of interest.

      Comments to authors

      Other comments

      1- Text in L441 & 606 is misleading. According to ref 52, AkhR is involved specifically in starvation-induced sleep loss, and not in general sleep regulation.

      Corrected.

      2- The language used in L568-570 and 573-574 is confusing. The authors should specify that the knock down of cholinergic subunits, rather than the subunits themselves is what causes sleep to increase or decrease.

      Corrected.

      3- The authors' investigation of cholinergic receptor subunits function is very preliminary, and it is difficult to draw any conclusion from what is presented here. In particular, their behavioral data is difficult to reconcile with the RNA-seq data showing overexpression of both short sleep increasing and short sleep decreasing subunits. Without knowing where in the brain these subunits are required for controlling sleep, the data in Figure 7 is difficult to appreciate.

      We have now conducted additional experiments where we specifically knocked down these alpha receptor subunits (all 7 of them) in the R23E10 neurons. This seemed an obvious knockdown location, to determine if any of these subunits regulated activity in the same sleep promoting neurons that were the focus of this study. We found that alpha1 knockdown in these neurons had similar sleep phenotypes, which we believe is an important result. Since this functional localisation is a logical ending for the paper, we have now made it the final figure.

      Suggestions & comments

      1- It would be interesting if the authors could discuss their findings that metabolism genes are downregulated in THIP flies in the context of recent work that showed upregulation of mitochondrial ROS after sleep deprivation (Kempf et al, 2019).

      We now add the Kempf 2019 reference and allude to how those findings could be consistent with ours.

      2- The fact that THIP-induced sleep persists long after THIP removal (Fig 3D) is very intriguing and interesting. This suggests that the drug might trigger a sleep-inducing pathway that can continue on its own without the drug, once activated.

      This is correct, and in stark contrast to the optogenetic manipulation we employ, which does not appear to show such sleep inertia. We have now added a sentence highlighting this interesting difference (lines 394-396).

      3- The authors identify many new genes regulated in response to specific methods for sleep induction. These are all potentially interesting candidates for further studies investigating the molecular basis of sleep. It would be interesting to know which of these genes are already known to display circadian expression patterns.

      By providing all of the gene lists, these are now available to ask questions such as these. We hesitate however to delve into this domain for this work, as our main goal was to compare these two kinds of sleep in flies.

      4- The brain-wide monitoring of neural activity invites a number of very exciting follow-up experiments - most importantly, it would be fascinating to establish, which neurons are active in the different phases the authors describe! Are these neurons that are involved in transmitting external visual stimuli to the central brain? Do they also project into the central complex? They could make use of the large collection of existing driver lines in the fly and they could also exploit the extraordinary knowledge of the connectome and transcriptome of the fly brain.

      Thank you for sharing our enthusiasm for these likely future directions.

      5- The Dalpha2,3,4,6 and 7 Knock-out strains they generate will be a useful reagent for the Drosophila neuroscience community once the efficiency/success of the knock-out has been confirmed by qPCR.

      These knockout strains have all been confirmed by our co-authors Hang Luong, Trent Perry, and Philip Batterham. These knockout confirmations are outlined in publications that we reference (Perry et al, 2021).

      Materials and methods:

      1- This study has employed custom-built apparatus and custom-written code/scripts, but these do not appear to be available to the reader. For the sake of replicability, the authors should make these available.

      The code/scripts are available via the University of Queensland research data management system as described in the Methods, and can be sent by the Lead Contact. The imaging hardware and analysis code are identical to what was described in a previous publication, and available as directed therein (Tainton-Heap et al, 2021).

      2- Also, the authors should give details on the food used to rear their flies. Fly media comes in several common forms and sleep is sensitive to diet.

      This has now been elaborated in the beginning of the Methods.

      3- The light regime used for optogenetic excitation of dFB neurons consists of 12h of uninterrupted bright red LED light. Most optogenetic stimulations consist of pulsed high frequency flashes interlaced with pauses in illumination. Can dFB neurons be driven constitutively with 12 hours of bright light?

      We showed in Tainton-Heap (2021) that 7Hz pulsed red light had exactly the same effect on R23E10/Chrimson readouts as continuous red light, which is why we opted here to provide continuous red light. That optogenetic sleep induction can be driven continuously for 12 hours is evident by our 24-hour sleep profiles. However, we agree that one could question whether sleep quality is similar after 12 hours. To address this, we did an additional experiment where we stimulated the flies hourly, to determine if their behavioural responsiveness to mechanical stimuli changed over the course of continued sleep induction, for both optogenetic and THIP-induced sleep. We present the data below in Author response image 4. As can be seen in these new analyses, while optogenetic sleep induction persists across 12 daytime hours (speed is close to zero throughout), flies do indeed become more responsive later in the day. This could have two different interpretations: either some sleep functions are being satisfied over time, or the activation regime is becoming less effective over time. Either way, these data show that at our 10-hour daytime timepoint, unstimulated flies are still largely inactive, even though their arousal thresholds might have gradually changed; so the uninterrupted red-light regime is still effective. The comparison with THIP is interesting: here there does not seem to be a change in responsiveness over time; the drug just decreases behavioral responsiveness throughout. Together, these experiments support our view that both approaches are sleep-promoting throughout the 12-hour day, although we appreciate that sleep quality is not identical.

      Author response image 4.

      A) The average speed of baseline (grey) and optogenetically-activated flies (green) across 24 hours. Red dots indicate vibration stimulus times. B) The average speed of control (grey) and THIP-fed flies (blue) across 24 hours. Flies are all R23E10/Chrimson. n=87 for optogenetic, n=88 for -THIP, n=85 for +THIP.

      4- The authors use the SNAP apparatus to prevent THIP-treated flies from sleeping to tease out possible sleep-independent effects. This is an excellent control. Why have the authors not done the same with the optogenetic treatment? It's surprising not to see this control given the concern the authors express (lines 501 - 502) that the dFB manipulation might be paralyzing awake flies, which certainly seems possible given the light regimes used. Why not test this directly with SNAP?

      We appreciate that this may have been a valuable additional control. However, we designed this control for the THIP experiments specifically because of concerns about THIP’s (yet unknown) mechanism of action in flies. THIP is a GABAergic drug that most likely has many off-target effects that have little to do with sleep, hence the need for a control where we compare to flies that ingested THIP but have been prevented from sleeping. In contrast, R23E10-driven sleep induction is exactly that: a circuit that induces sleep when activated. Whatever specific neurons might really be involved, the Gal4 circuit is sleep-inducing. This is well supported by multiple publications. The most appropriate control for assessing transcriptomic effects during optogenetic sleep here is not preventing sleep, but rather no increased sleep in flies that have not ingested ATR, and comparing that to effects of ATR alone, which is what we have done. Adding a sleep-deprivation layer onto both of these analyses may have been interesting, but would have required a lot more analyses and is not strictly necessary to identify relevant sleep-related genes. We have rephrased the misleading sentence about paralyzing flies, to instead clarify that lack of overlap with the SD dataset suggests that optogenetic activation is not preventing sleep functions from being engaged.

      5- A pairwise comparison of ZT01 and ZT10 does not address circadian expression cycles in a meaningful way. There will be strong effects of the LD cycle here. I suggest toning this down. (Though it is gratifying to see the expected changes in the core clock genes.)

      We have changed the language from ‘circadian’ to ‘light-dark’ to address this, although have kept the word ‘circadian’ when referring specifically to genes such as per, clock, timeless, etc.

      6- Line 109: There is a reference missing.

      We now provide the relevant reference.

      Results

      1- General comment regarding the figures: a general effort could be made to improve the design and quality of the figures and make them more readable. There are a lot of issues such as stretched or misaligned text, badly drawn frames, etc.

      We think we know which figures this might relate to (e.g., Figures 3,4B), so we have adjusted where appropriate.

      2- Instead of 'dFB-induced' (e.g., L77) it would be more accurate to use 'optogenetically-induced'

      Thank you for this helpful advice. We have changed our language throughout to say ‘optogenetically-induced’.

      3- Figure S1 should be integrated in the main figure to make the quantification more easily accessible.

      We have integrated Figure S1 into the main figures. It is now Figure 3.

      5- It would be good to include red light controls in Figure 2C, E, G.

      Making Figure S1 a main figure has better highlighted the fact that we have done red light controls (‘baseline’).

      6- line 313: Fig2E-H - these graphs would benefit if the authors made it more obvious where the maximum sleep amount would fall - i.e. the combination of bouts and minutes that add up to 12 hours (and therefore the entire day/night)

      If a fly were to sleep uninterrupted for all 12 hours of a day or night, that would amount to a sleep bout 720 minutes long. We do not feel that identifying this maximum on these graphs would be helpful. It should be clear from the data that a floor is reached with very few sleep bouts exceeding 60 minutes in our paradigm. To help orient the reader though, we now clarify in the figure legend that the maximum is 720 minutes or 12 hours.

      7- Fig. 2B, D: It was not clear why the authors took the 3-day average here. Doesn't that lead to a whole range of very different behaviors? I could, perhaps naively, imagine that a fly's behavior changes after 2 days of almost-permanent sleep?

      We took the 3-day average because the effect of THIP on each successive day was not significantly different (see Author response image 5, below). Flies wake up enough to have a good feed (see Author response image 2) and then go back to sleep. Since this is however an important point raised by the reviewer, we now mention in the Methods that sleep duration was not different among the 3 averaged days and nights (lines 193-195).

      Author response image 5.

      Data from THIP feeding experiment (Figure 2B) in manuscript, separated into 3 successive days and nights, with THIP-fed flies (blue) compared to controls (white). Averages ± SD are shown; sample sizes are the same as in Figure 2D. No THIP data was significantly different across days and nights (ANOVA of means).

      8- In Figure 2C the authors compare optogenetically induced to "spontaneous sleep," which I think refers to baseline sleep before stimulation, according to the figure. I think the proper comparison would be to the red light control (ATR-; though see the comment above regarding optogenetic controls).

      This information was provided in Figure S1. We now provide it as a main Figure 3, as requested above.

      We also made a point about red light having an effect at night, which is why we focussed on daytime effects for our transcriptomic comparisons. We feel that the ATR-fed flies (minus red light) are an appropriate control here for optogenetically-induced sleep: same exact genotype and ATR feeding, just no optogenetic activation. We would therefore prefer to keep these graphs as they are, especially since we show -ATR data subsequently.

      9- Figures 3A and 4A are redundant; Figure 3B has some active ROIs that are outside of the brain. I am not sure how this is possible?

      We have removed the redundant 4A and replaced it with the THIP molecule to clearly signal what this figure is focussed on. In Figure 3B (now 4B), the brain mask is a visual estimate made from the middle of the image stack. Some neurons in other layers are outside this single-layer estimate. All neurons were accounted for.

      10- Figure 4B is confusing. It took me a while to understand and so it can do with re-drawing in a more accessible way.

      We agree that this was confusing, e.g. there were too many arrows. We have redrawn and simplified (Now 5A).

      11- The authors state that flies wake up from THIP-induced sleep on the ball, but in Figure 4D there appears to be fewer samples for flies who have woken up from THIP (3) compared to those observed before THIP administration. Are flies dying?

      None of the flies died. Most flies were removed from imaging to confirm recovery, while 3 were left in our imaging setup to measure brain activity upon recovery. These results are in Figure 5C and now clarified in the Methods.

      12- Fig5C,D: I'm surprised that by far the most significant changes (in terms of log2-FC and p-val) occur in the sleep-deprived flies? It is not clear to me what the authors mean by effects that "relate waking process"? Perhaps they could elaborate on this?

      We have removed the phrase ‘relates to waking processes’. We now also remark on the high level of fold-change in many of these genes but refrain from discussing this further in the results. It is interesting though.

      13- The sentence in L425-428 is unclear - it would be good to rephrase this.

      We have rephrased this sentence; hopefully it is clearer now.

      14- Text in L544-545 is confusing. What do you mean by 'less clear'?

      We have replaced ‘less clear’ with ‘not dominated by a single category’.

      15- It is unclear what is the control in Fig 7A. It would be good to mention what strain was used.

      Different knockout strains had different controls. These are identified in the figure legend and Methods.

      16- L579-581: it would be helpful to include this data in a supplementary figure.

      We now provide this as a supplementary figure as requested (Supplementary Figure 6).

      17- There is no information about R57C10 in the methods - it would be good to explain which neurons this line labels, and why you chose it.

      We now clarify in the methods that R57C10-Gal4 is a pan-neural driver, and provide a reference.

      18- Table S5 - If I'm not mistaken then the first line should say 1h, not 10h.

      Corrected

    1. Author response:

      The following is the authors’ response to the original reviews.

      We would like to thank the reviewers for helping us improve our article and software. The feedback that we received was very helpful and constructive, and we hope that the changes that we have made are indeed effective at making the software more accessible, the manuscript clearer, and the online documentation more insightful as well. A number of comments related to shared concerns, such as:

      • the need to describe various processing steps more clearly (e.g. particle picking, or the nature of ‘dust’ in segmentations)

      • describing the features of Ais more clearly, and explaining how it can interface with existing tools that are commonly used in cryoET

      • a degree of subjectivity in the discussion of results (e.g. about Pix2pix performing better than other networks in some cases.)

      We have now addressed these important points, with a focus on streamlining not only the workflow within Ais but also making interfacing between Ais and other tools easier. For instance, we explain more clearly which file types Ais uses and we have added the option to export .star files for use in, e.g., Relion, or meshes instead of coordinate lists. We also include information in the manuscript about how the particle picking process is implemented, and how false positives (‘dust’) can be avoided. Finally, all reviewers commented on our notion that Pix2pix can work ‘better’ despite reaching a higher loss after training. As suggested, we included a brief discussion about this idea in the supplementary information (Fig. S6) and used it to illustrate how Ais enables iteratively improving segmentation results. 

      Since receiving the reviews we have also made a number of other changes to the software that are not discussed below but that we nonetheless hope have made the software more reliable and easier to use. These include expanding the available settings, slight changes to the image processing that can help speed it up or avoid artefacts in some cases, improving the GUI-free usability of Ais, and incorporating various tools that should help make it easier to use Ais with remote data (e.g. doing annotation on an office PC, but model training on a more powerful remote PC). We have also been in contact with a number of users of the software, who reported issues or suggested various other miscellaneous improvements, and many of whom had found the software via the reviewed preprint.

      Reviewer 1 (Public Review):

      This paper describes "Ais", a new software tool for machine-learning-based segmentation and particle picking of electron tomograms. The software can visualise tomograms as slices and allows manual annotation for the training of a provided set of various types of neural networks. New networks can be added, provided they adhere to a Python file with an (undescribed) format. Once networks have been trained on manually annotated tomograms, they can be used to segment new tomograms within the same software. The authors also set up an online repository to which users can upload their models, so they might be re-used by others with similar needs. By logically combining the results from different types of segmentations, they further improve the detection of distinct features. The authors demonstrate the usefulness of their software on various data sets. Thus, the software appears to be a valuable tool for the cryo-ET community that will lower the boundaries of using a variety of machine-learning methods to help interpret tomograms. 

      We thank the reviewer for their kind feedback and for taking the time to review our article. On the basis of their comments, we have made a number of changes to the software, article, and documentation that we think have helped improve the project and render it more accessible (especially for interfacing with different tools, e.g. the suggestions to describe the file formats in more detail). We respond to all individual comments one-by-one below.

      Recommendations:

      I would consider raising the level of evidence that this program is useful to *convincing* if the authors would adequately address the suggestions for improvement below.

      (1) It would be helpful to describe the format of the Python files that are used to import networks, possibly in a supplement to the paper. 

      We have now included this information in both the online documentation and as a supplementary note (Supplementary Note 1). 

      (2) Likewise, it would be helpful to describe the format in which particle coordinates are produced. How can they be used in subsequent sub-tomogram averaging pipelines? Are segmentations saved as MRC volumes? Or could they be saved as triangulations as well? More implementation details like this would be good to have in the paper, so readers don't have to go into the code to investigate. 

      Coordinates: previously, we only exported arrays of coordinates as tab-separated .txt files, compatible with e.g. EMAN2. We now added a selection menu where users can specify whether to export either .star files or tsv .txt files, which together we think should cover most software suites for subtomogram averaging. 

      Triangulations: We have now improved the functionality for exporting triangulations. In the particle picking menu, there is now the option to output either coordinates or meshes (as .obj files). This was previously possible in the Rendering tab, but with the inclusion in the picking menu exporting triangulations can now be done for all tomograms at once rather than manually one by one.

      Edits in the text: the output formats were previously not clear in the text. We have now included this information in the introduction:

      “[…] To ensure compatibility with other popular cryoET data processing suites, Ais employs file formats that are common in the field, using .mrc files for volumes, tab-separated .txt or .star files for particle datasets, and the .obj file format for exporting 3D meshes.”
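
      The two coordinate formats named above can be illustrated with a minimal sketch. It is not taken from the Ais source: the function names are hypothetical, the column order (x, y, z) is an assumption, and the .star example uses the RELION coordinate labels `_rlnCoordinateX/Y/Z` without claiming that these are exactly the columns Ais writes.

```python
import io

def write_tsv(coords, handle):
    """Write one particle per line as tab-separated x, y, z values."""
    for x, y, z in coords:
        handle.write(f"{x}\t{y}\t{z}\n")

def write_star(coords, handle):
    """Write a minimal STAR loop using RELION-style coordinate labels.

    The exact set of columns written by Ais is an assumption here.
    """
    handle.write("data_\n\nloop_\n")
    labels = ("_rlnCoordinateX", "_rlnCoordinateY", "_rlnCoordinateZ")
    for index, label in enumerate(labels, start=1):
        handle.write(f"{label} #{index}\n")
    for x, y, z in coords:
        handle.write(f"{x:.2f} {y:.2f} {z:.2f}\n")

# Two made-up particle coordinates, written to in-memory buffers.
coords = [(10.0, 20.0, 30.0), (40.5, 50.5, 60.5)]
tsv_buffer, star_buffer = io.StringIO(), io.StringIO()
write_tsv(coords, tsv_buffer)
write_star(coords, star_buffer)
```

      Writing to a file instead only requires passing an open file handle in place of the buffers.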

      (3) In Table 2, pix2pix has much higher losses than alternatives, yet the text states it achieves fewer false negatives and fewer false positives. An explanation is needed as to why that is. Also, it is mentioned that a higher number of epochs may have improved the results. Then why wasn't this attempted? 

      The architecture of Pix2pix is quite different from that of the other networks included in the test. Whereas all others are trained to minimize a binary cross entropy (BCE) loss, Pix2pix uses a composite loss function that is a weighted combination of the generator loss and a discriminator penalty, neither of which employs BCE. However, to be able to compare loss values, we do compute a BCE loss value for the Pix2pix generator after every training epoch. This is the value reported in the manuscript and in the software. Although Pix2pix’ BCE loss does indeed diminish during training, the model is not actually optimized to minimize this particular value and a comparison by BCE loss is therefore not entirely fair to Pix2pix. This is pointed out (in brief) in the legend to the table: 

      “Unlike the other architectures, Pix2pix is not trained to minimize the bce loss but uses a different loss function instead. The bce loss values shown here were computed after training and may not be entirely comparable.”
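
      For reference, the BCE value used for this comparison can be sketched as follows. This is a generic, pure-Python illustration of mean binary cross-entropy over a list of pixel predictions, not the code used in Ais; the function name and the clipping constant `eps` are assumptions.

```python
import math

def bce_loss(predictions, targets, eps=1e-7):
    """Mean binary cross-entropy between predicted probabilities and 0/1 labels."""
    total = 0.0
    for p, t in zip(predictions, targets):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / len(predictions)

# Near-perfect predictions give a loss close to zero,
# while maximally uncertain predictions give ln(2).
perfect = bce_loss([1.0, 0.0], [1, 0])
uncertain = bce_loss([0.5, 0.5], [1, 0])
```

      A network that is optimized for a different objective can still be scored this way after each epoch, which is why the resulting number is comparable in form but not necessarily in meaning.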

      Regarding the extra number of epochs for Pix2pix: here, we initially ran into the problem that the number of samples in the training data was low for the number of parameters in Pix2pix, leading to divergence later during training. This problem did not occur for most other models, so we decided to keep the data for the discussion around Table 1 and Figure 2 limited to that initial training dataset. After that, we increased the sample size (from 58 to 170 positive samples) and trained the model for longer. The resulting model was used in the subsequent analyses. This was previously implicit in the text but is now mentioned explicitly and in a new supplementary figure. 

      “For the antibody platform, the model that would be expected to be one of the worst based on the loss values, Pix2pix, actually generates segmentations that seem well-suited for the downstream processing tasks. It also outputs fewer false positive segmentations for sections of membranes than many other models, including the lowest-loss model UNet. Moreover, since Pix2pix is a relatively large network, it might also be improved further by increasing the number of training epochs. We thus decided to use Pix2pix for the segmentation of antibody platforms, and increased the size of the antibody platform training dataset (from 58 to 170 positive samples) to train a much improved second iteration of the network for use in the following analyses (Fig. S6).”

      (4) It is not so clear what absorb and emit mean in the text about model interactions. A few explanatory sentences would be useful here. 

      We have expanded this paragraph to include some more detail.

      “Besides these specific interactions between two models, the software also enables pitching multiple models against one another in what we call ‘model competition’. Models can be set to ‘emit’ and/or ‘absorb’ competition from other models. Here, to emit competition means that a model’s prediction value is included in a list of competing models. To absorb competition means that a model’s prediction value will be compared to all values in that list, and that this model’s prediction value for any pixel will be set to zero if any of the competing models’ prediction value is higher. On a pixel-by-pixel basis, all models that absorb competition are thus suppressed whenever their prediction value for a pixel is lower than that of any of the emitting models.”
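
      The emit/absorb rule described in this paragraph can be sketched in a few lines. The sketch below is an illustration under stated assumptions, not the Ais implementation: predictions are represented as nested lists rather than image arrays, the function name `apply_competition` is hypothetical, and a model that both emits and absorbs is assumed not to compete with itself.

```python
def apply_competition(predictions, emitters, absorbers):
    """Suppress absorbing models wherever an emitting model scores higher.

    predictions: dict mapping model name -> 2D list of prediction values.
    emitters / absorbers: sets of model names.
    Returns a new dict; the input is left unchanged.
    """
    result = {name: [row[:] for row in image] for name, image in predictions.items()}
    for name in absorbers:
        for r, row in enumerate(predictions[name]):
            for c, value in enumerate(row):
                # Zero this pixel if any competing emitter has a strictly higher value.
                if any(predictions[e][r][c] > value for e in emitters if e != name):
                    result[name][r][c] = 0.0
    return result

predictions = {
    "membrane": [[0.9, 0.2], [0.4, 0.1]],
    "carbon": [[0.3, 0.8], [0.4, 0.0]],
}
out = apply_competition(predictions, emitters={"carbon"}, absorbers={"membrane"})
```

      In this toy example only the membrane pixel where carbon scores higher (0.8 against 0.2) is set to zero; ties are left untouched because the rule requires a strictly higher competing value.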

      (5) Under Figure 4, the main text states "the model interactions described above", but because multiple interactions were described it is not clear which ones they were. Better to just specify again. 

      Changed as follows:

      “The antibody platform and antibody-C1 complex models were then applied to the respective datasets, in combination with the membrane and carbon models and the model interactions described above (Fig. 4b): the membrane avoiding carbon, and the antibody platforms colocalizing with the resulting membranes”.

      (6) The next paragraph mentions a "batch particle picking process to determine lists of particle coordinates", but the algorithm for how coordinates are obtained from segmented volumes is not described. 

      We have added a paragraph to the main text to describe the picking process:

      “This picking step comprises a number of processing steps (Fig. S7). First, the segmented (.mrc) volumes are thresholded at a user-specified level. Second, a distance transform of the resulting binary volume is computed, in which every nonzero pixel in the binary volume is assigned a new value, equal to the distance of that pixel to the nearest zero-valued pixel in the mask. Third, a watershed transform is applied to the resulting volume, so that the sets of pixels closest to any local maximum in the distance transformed volume are assigned to one group. Fourth, groups that are smaller than a user-specified minimum volume are discarded. Fifth, groups are assigned a weight value, equal to the sum of the prediction value (i.e. the corresponding pixel value in the input .mrc volume) of the pixels in the group. For every group found within close proximity to another group (using a user-specified value for the minimum particle spacing), the group with the lower weight value is discarded. Finally, the centroid coordinate of the grouped pixels is considered the final particle coordinate, and the list of all coordinates is saved in a tab-separated text file.

      “As an alternative output format, segmentations can also be converted to and saved as triangulated meshes, which can then be used for, e.g., membrane-guided particle picking. After picking particles, the resulting coordinates are immediately available for inspection in the Ais 3D renderer (Fig. S8).”
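
      The sequence of picking steps can be sketched in simplified form. The code below is a stand-in written for illustration and differs from the described algorithm in two ways that should be kept in mind: the distance transform and watershed are replaced by a plain 6-connected flood fill (so touching particles are not split), and the spacing rule is applied greedily from the heaviest group downwards, which is one possible reading of the text. The function name, the sparse dictionary representation of the volume, and the (z, y, x) index order are all assumptions.

```python
from collections import deque
import math

NEIGHBOURS = ((1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1))

def pick_particles(volume, threshold, min_volume, min_spacing):
    """volume: dict mapping (z, y, x) voxel indices to prediction values."""
    # Threshold the segmentation into a binary mask.
    mask = {voxel for voxel, value in volume.items() if value >= threshold}

    # Simplification: group voxels by flood fill instead of
    # distance transform + watershed.
    groups, seen = [], set()
    for start in sorted(mask):
        if start in seen:
            continue
        seen.add(start)
        queue, group = deque([start]), []
        while queue:
            z, y, x = queue.popleft()
            group.append((z, y, x))
            for dz, dy, dx in NEIGHBOURS:
                neighbour = (z + dz, y + dy, x + dx)
                if neighbour in mask and neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        groups.append(group)

    # Discard groups below the minimum volume ('dust').
    groups = [g for g in groups if len(g) >= min_volume]

    # Weight = summed prediction value; centroid = mean voxel index.
    candidates = []
    for g in groups:
        weight = sum(volume[v] for v in g)
        centroid = tuple(sum(axis) / len(g) for axis in zip(*g))
        candidates.append((weight, centroid))

    # Among groups closer than min_spacing, keep the one with the higher weight.
    picked = []
    for weight, centroid in sorted(candidates, reverse=True):
        if all(math.dist(centroid, kept) >= min_spacing for kept in picked):
            picked.append(centroid)
    return picked

# Toy volume: a strong 2x2x2 blob, a weaker two-voxel blob nearby, and one speck.
volume = {(z, y, x): 0.9 for z in range(2) for y in range(2) for x in range(2)}
volume[(0, 0, 3)] = 0.6
volume[(0, 0, 4)] = 0.6
volume[(9, 9, 9)] = 0.8
picked = pick_particles(volume, threshold=0.5, min_volume=2, min_spacing=5.0)
picked_close = pick_particles(volume, threshold=0.5, min_volume=2, min_spacing=1.0)
```

      With a minimum spacing of 5 voxels only the centroid of the strong blob survives; lowering the spacing to 1 voxel also keeps the weaker neighbour, while the single-voxel speck is removed by the minimum-volume filter in both cases.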

      The two supplementary figures are pasted below for convenience. Fig. S7 is new, while Fig. S8 was previously Fig. S10 -the reference to this figure was originally missing in the main text, but is now included.

      (7) In the Methods section, it is stated that no validation splits are used "in order to make full use of an input set". This sounds like an odd decision, given the importance of validation sets in the training of many neural networks. Then how is overfitting monitored or prevented? This sounds like a major limitation of the method. 

      In our experience, the best way of preparing a suitable model is to (iteratively) annotate a set of training images and visually inspect the result. Since the manual annotation step is the bottleneck in this process, we decided not to use a validation split in order to make full use of an annotated training dataset (i.e. a validation split of 20% would mean that 20% of the manually annotated training data is not used for training).

      We do recognize the importance of using separate data for validation, or at least offering the possibility of doing so. We have now added a parameter to the settings (and made a Settings menu item available in the top menu bar) where users can specify what fraction (0, 10, 20, or 50%) of training datasets should be set aside for validation. If the chosen value is not 0%, the software reports the validation loss as well as the size of the split during training, rather than (as was done previously) the training loss. We have, however, set the default value for the validation split to 0%, for the same reason as before. We also added a section to the online documentation about using validation splits, and edited the corresponding paragraph in the methods section:

      “The reported loss is that calculated on the training dataset itself, i.e., no validation split was applied. During regular use of the software, users can specify whether to use a validation split or not. By default, a validation split is not applied, in order to make full use of an input set of ground truth annotations. Depending on the chosen split size, the software reports either the overall training loss or the validation loss during training.”
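
      The optional split described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the Ais implementation: the helper names `split_training_data` and `reported_loss_name`, the fixed-seed shuffle, and the rounding rule are all hypothetical; only the allowed fractions (0, 10, 20, or 50%) and the default of 0% come from the text.

```python
import random

# Fractions offered in the settings, as listed in the response.
ALLOWED_FRACTIONS = (0.0, 0.1, 0.2, 0.5)


def split_training_data(samples, validation_fraction=0.0, seed=0):
    """Return (training, validation) lists. Hypothetical helper.

    With the default fraction of 0.0, every annotated sample is used
    for training and the validation list is empty.
    """
    if validation_fraction not in ALLOWED_FRACTIONS:
        raise ValueError(f"unsupported fraction: {validation_fraction}")
    shuffled = list(samples)
    # Assumption: samples are shuffled before the split is taken.
    random.Random(seed).shuffle(shuffled)
    n_validation = round(len(shuffled) * validation_fraction)
    return shuffled[n_validation:], shuffled[:n_validation]


def reported_loss_name(validation_fraction):
    """Which loss is reported during training for a given split size."""
    return "validation loss" if validation_fraction > 0 else "training loss"
```

      With 100 annotated samples and a 20% split, 80 samples remain for training, which is the cost the default of 0% is meant to avoid.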

      (8) Related to this point: how is the training of the models in the software modelled? It might be helpful to add a paragraph to the paper in which this process is described, together with indicators of what to look out for when training a model, e.g. when should one stop training? 

      We have expanded the paragraph where we write about the utility of comparing different network architectures to also include a note on how Ais facilitates monitoring the output of a model during training:

      “When taking the training and processing speeds into account as well as the segmentation results, there is no overall best architecture. We therefore included multiple well-performing model architectures in the final library, in order to allow users to select from these models to find one that works well for their specific datasets. Although it is not necessary to screen different network architectures and users may simply opt to use the default (VGGNet), these results thus show that it can be useful to test different networks in order to identify one that is best. Moreover, these results also highlight the utility of preparing well-performing models by iteratively improving training datasets and re-training models in a streamlined interface. To aid in this process, the software displays the loss value of a network during training and allows for the application of models to datasets during training. Thus, users can inspect how a model’s output changes during training and decide whether to interrupt training and improve the training data or choose a different architecture.”

      (9) Figure 1 legend: define the colours of the different segmentations. 

      Done

      (10) It may be better to colour Figure 2B with the same colours as Figure 2A. 

      We tried this, but the effect is that the underlying density is much harder to see. We think the current grayscale image paired with the various segmentations underneath is better for visually identifying which density corresponds to membranes, carbon film, or antibody platforms.

      Reviewer 2 (Public Review):

      Summary: 

      Last et al. present Ais, a new deep learning-based software package for the segmentation of cryo-electron tomography data sets. The distinguishing factor of this package is its orientation to the joint use of different models, rather than the implementation of a given approach. Notably, the software is supported by an online repository of segmentation models, open to contributions from the community. 

      The usefulness of handling different models in one single environment is showcased with a comparative study on how different models perform on a given data set; then with an explanation of how the results of several models can be manually merged by the interactive tools inside Ais. 

      The manuscript presents two applications of Ais on real data sets; one is oriented to showcase its particle-picking capacities on a study previously completed by the authors; the second one refers to a complex segmentation problem on two different data sets (representing different geometries as bacterial cilia and mitochondria in a mouse neuron), both from public databases. 

      The software described in the paper is compactly documented on its website, additionally providing links to some YouTube videos (less than an hour in total) where the authors videocapture and comment on major workflows. 

      In short, the manuscript describes a valuable resource for the community of tomography practitioners. 

      Strengths: 

      A public repository of segmentation models; easiness of working with several models and comparing/merging the results. 

      Weaknesses: 

      A certain lack of concretion when describing the overall features of the software that differentiate it from others. 

      We thank the reviewer for their kind and constructive feedback. Following the suggestion to use the Pix2pix results to illustrate the utility of Ais for analyzing results, we have added a new supplementary figure (Fig. S6) and brief discussion, showing the use of Ais in iteratively improving segmentation results. We have also expanded the online documentation and included a note in the supplementary information about how models are saved/loaded (Supplementary note 1).

      Recommendations:

      I would like to ask the authors about some concerns about the Ais project as a whole: 

      (1) The website that accompanies the paper (aiscryoet.org), albeit functional, seems to be in its first steps. Is it planned to extend it? In particular, one of the major contributions of the paper (the maintenance of an open repository of models) could use better documentation describing the expected formats to submit models. This could even be discussed in the supplementary material of the manuscript, as this feature is possibly the most distinctive one of the paper. Engaging third-party users would require giving them an easier entry point, and the superficial mention of this aspect in the online documentation could be much more generous.

      We have added a new page to the online documentation, titled ‘Sharing models’, where we include an explanation of the structure of model files and demonstrate the upload page. We also added a note to the Supplementary Information that explains the file format for models, and how they are loaded/saved (i.e., that these are standard Keras model objects).

      To make it easier to interface Ais with other tools, we have now also made some of the core functionality available (e.g. training models, batch segmentation) via the command line interface. Information on how to use this is included in the online documentation. All file formats are common formats used in cryoET, so that using Ais in a workflow with, e.g. AreTomo -> Ais -> Relion should now be more straightforward.

      (2) A different major line advanced by the authors to underpin the novelty of the software, is its claimed flexibility and modularity. In particular, the restrictions of other packages in terms of visualization and user interaction are mentioned. Although in the manuscript it is also mentioned that most of the functionalities in Ais are already available in major established packages, as a reader I am left confused about what exactly makes the offer of Ais different from others in terms of operation and interaction: is it just the two aspects developed in the manuscript (possibility of using different models and tools to operate model interaction)? If so, it should probably be stated; but if the authors want to pinpoint other aspects of the capacity of Ais to drive smoothly the interactions, they should be listed and described, instead of leaving it as an unspecific comment. As a potential user of Ais, I would suggest the authors add (maybe in the supplementary material) a listing of such features. Figure 1 does indeed carry the name "overview of (...) functionalities", but it is not clear to me which functionalities I can expect to be absent or differently solved on the other tools they mention.

      We have rewritten the part of the introduction where we previously listed the features as below. We think it should now be clearer for the reader to know what features to expect, as well as how Ais can interface with other software (i.e. what the inputs and outputs are). We have also edited the caption for Figure 1 to make it explicit that panels A to C represent the annotation, model preparation, and rendering steps of the Ais workflow and that the images are screenshots from the software.

      “In this report we present Ais, an open-source tool that is designed to enable any cryoET user – whether experienced with software and segmentation or a novice – to quickly and accurately segment their cryoET data in a streamlined and largely automated fashion. Ais comprises a comprehensive and accessible user interface within which all steps of segmentation can be performed, including: the annotation of tomograms and compiling datasets for the training of convolutional neural networks (CNNs), training and monitoring performance of CNNs for automated segmentation, 3D visualization of segmentations, and exporting particle coordinates or meshes for use in downstream processes. To help generate accurate segmentations, the software contains a library of various neural network architectures and implements a system of configurable interactions between different models. Overall, the software thus aims to enable a streamlined workflow where users can interactively test, improve, and employ CNNs for automated segmentation. To ensure compatibility with other popular cryoET data processing suites, Ais employs file formats that are common in the field, using .mrc files for volumes, tab-separated .txt or .star files for particle datasets, and the .obj file format for exporting 3D meshes.”

      “Figure 1 – an overview of the user interface and functionalities. The various panels represent sequential stages in the Ais processing workflow, including annotation (a), testing CNNs (b), visualizing segmentation (c). These images (a-c) are unedited screenshots of the software. a) […]”

      (3) Table 1 could have the names of the three last columns. The table has enough empty space in the other columns to accommodate this. 

      Done.

      (4) The comment about Pix2pix needing a larger number of training epochs (being a larger model than the other ones considered) is interesting. It also lends itself for the authors to illustrate the ability of their software to precisely do this: allow the users to flexibly analyze results and test hypotheses.

      Please see the response to Reviewer 1 comment #3. We agree that this is a useful example of the ability to iterate between annotation and training, and have added an explicit mention of this in the text:

      “Moreover, since Pix2pix is a relatively large network, it might also be improved further by increasing the number of training epochs. In a second iteration of annotation and training, we thus increased the size of the antibody platform training dataset (from 58 to 170 positive samples) and generated an improved Pix2pix model for use in the following analyses.”

      Reviewer 3 (Public Review):

      We appreciate the reviewer’s extensive and very helpful feedback and are glad to read that they consider Ais potentially quite useful for the users. To address the reviewer’s comments, we have made various edits to the text, figures, and documentation, that we think have helped improve the clarity of our work. We list all edits below. 

      Summary

      In this manuscript, Last and colleagues describe Ais, an open-source software package for the semi-automated segmentation of cryo-electron tomography (cryo-ET) maps. Specifically, Ais provides a graphical user interface (GUI) for the manual segmentation and annotation of specific features of interest. These manual annotations are then used as input ground-truth data for training a convolutional neural network (CNN) model, which can then be used for automatic segmentation. Ais provides the option of several CNNs so that users can compare their performance on their structures of interest in order to determine the CNN that best suits their needs. Additionally, pre-trained models can be uploaded and shared to an online database. 

      Algorithms are also provided to characterize "model interactions" which allows users to define heuristic rules on how the different segmentations interact. For instance, a membrane-adjacent protein can have rules where it must colocalize a certain distance away from a membrane segmentation. Such rules can help reduce false positives; as in the case above, false negatives predicted away from membranes are eliminated. 

      The authors then show how Ais can be used for particle picking and subsequent subtomogram averaging and for the segmentation of cellular tomograms for visual analysis. For subtomogram averaging, they used a previously published dataset and compared the averages of their automated picking with the published manual picking. Analysis of cellular tomogram segmentation was primarily visual. 

      Strengths:

      CNN-based segmentation of cryo-ET data is a rapidly developing area of research, as it promises substantially faster results than manual segmentation as well as the possibility for higher accuracy. However, this field is still very much in the development and the overall performance of these approaches, even across different algorithms, still leaves much to be desired. In this context, I think Ais is an interesting package, as it aims to provide both new and experienced users with streamlined approaches for manual annotation, access to a number of CNNs, and methods to refine the outputs of CNN models against each other. I think this can be quite useful for users, particularly as these methods develop. 

      Weaknesses: 

      Whilst overall I am enthusiastic about this manuscript, I still have a number of comments: 

      (1) On page 5, paragraph 1, there is a discussion on human judgement of these results. I think a more detailed discussion is required here, as from looking at the figures, I don't know that I agree with the authors' statement that Pix2pix is better. I acknowledge that this is extremely subjective, which is the problem. I think that a manual segmentation should also be shown in a figure so that the reader has a better way to gauge the performance of the automated segmentation.

      Please see the answer to Reviewer 1’s comment #3.

      (2) On page 7, the authors mention terms such as "emit" and "absorb" but never properly define them, such that I feel like I'm guessing at their meaning. Precise definitions of these terms should be provided. 

      We have expanded this paragraph to include some more detail:

      “Besides these specific interactions between two models, the software also enables pitching multiple models against one another in what we call ‘model competition’. Models can be set to ‘emit’ and/or ‘absorb’ competition from other models. Here, to emit competition means that a model’s prediction value is included in a list of competing models. To absorb competition means that a model’s prediction value will be compared to all values in that list, and that this model’s prediction value for any pixel will be set to zero if any of the competing models’ prediction value is higher. On a pixel-by-pixel basis, all models that absorb competition are thus suppressed whenever their prediction value for a pixel is lower than that of any of the emitting models.” 
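
      The emit/absorb rule quoted above can be sketched as a per-pixel comparison. This is a minimal illustration, not the Ais source code: the function name `apply_competition`, the dictionary-of-nested-lists data layout, and the example values are assumptions, and plain Python lists are used only to keep the sketch dependency-free.

```python
def apply_competition(predictions, emitters, absorbers):
    """Suppress absorbing models wherever an emitting model scores higher.

    predictions: dict mapping a model name to a 2D list of prediction
    values; all images are assumed to have the same shape.
    emitters / absorbers: lists of model names (hypothetical layout).
    """
    # Work on copies so the raw predictions stay available for comparison.
    result = {name: [row[:] for row in image]
              for name, image in predictions.items()}
    for name in absorbers:
        image = predictions[name]
        for y, row in enumerate(image):
            for x, value in enumerate(row):
                # A model's own value is never strictly higher than itself,
                # so a model that both emits and absorbs is unaffected by
                # its own entry in the list of competitors.
                highest_rival = max(predictions[e][y][x] for e in emitters)
                if highest_rival > value:
                    result[name][y][x] = 0.0
    return result


predictions = {
    "membrane": [[0.9, 0.2]],
    "carbon": [[0.4, 0.7]],
}
competed = apply_competition(
    predictions, emitters=["membrane", "carbon"], absorbers=["carbon"]
)
```

      In this example the carbon prediction is set to zero in the first pixel, where the membrane value is higher, and kept in the second; the membrane model does not absorb competition and is left unchanged.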

      (3) For Figure 3, it's unclear if the parent models shown (particularly the carbon model) are binary or not.

      The figure looks to be grey values, which would imply that it's the visualization of some prediction score. If so, how is this thresholded? This can also be made clearer in the text. 

      The figures show the grayscale output of the parent model, but this grayscale output is thresholded to produce a binary mask that is used in an interaction. We have edited the text to include a mention of thresholding at a user-specified threshold value:

      “These interactions are implemented as follows: first, a binary mask is generated by thresholding the parent model’s predictions using a user-specified threshold value. Next, the mask is then dilated using a circular kernel with a radius 𝑅, a parameter that we call the interaction radius. Finally, the child model’s prediction values are multiplied with this mask.”

      To avoid confusion, we have also edited the figure to show the binary masks rather than the grayscale segmentations. 
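
      The three steps quoted above (threshold, dilate, multiply) can be sketched as follows. This is an illustration under stated assumptions rather than the Ais implementation: the function names, the use of `>=` for thresholding, and an integer radius measured in pixels are hypothetical choices, and nested lists stand in for image arrays.

```python
def interaction_mask(parent, threshold, radius):
    """Threshold the parent prediction, then dilate with a circular kernel."""
    height, width = len(parent), len(parent[0])
    # Offsets that fall inside a disc of the given radius (the kernel).
    offsets = [(dy, dx)
               for dy in range(-radius, radius + 1)
               for dx in range(-radius, radius + 1)
               if dy * dy + dx * dx <= radius * radius]
    mask = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            if parent[y][x] >= threshold:  # assumption: inclusive threshold
                for dy, dx in offsets:
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < height and 0 <= xx < width:
                        mask[yy][xx] = 1
    return mask


def apply_interaction(child, parent, threshold, radius):
    """Multiply the child prediction values with the dilated parent mask."""
    mask = interaction_mask(parent, threshold, radius)
    return [[c * m for c, m in zip(child_row, mask_row)]
            for child_row, mask_row in zip(child, mask)]


parent = [[0.0, 0.0, 1.0, 0.0, 0.0]]
child = [[0.5, 0.5, 0.5, 0.5, 0.5]]
restricted = apply_interaction(child, parent, threshold=0.5, radius=1)
```

      Here the child prediction survives only within one pixel of the thresholded parent segmentation, which is how predictions far from the parent feature are removed.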

      (4) Figure 3D was produced in ChimeraX using the hide dust function. I think some discussion on the nature of this "dust" is in order, e.g. how much is there and how large does it need to be to be considered dust? Given that these segmentations can be used for particle picking, this seems like it may be a major contributor to false positives. 

      ‘Dust’ in segmentations is essentially unavoidable; it would require a perfect model that does not produce any false positives. However, when models are sufficiently accurate, the volume of false positives is typically smaller than that of the structures that were intended to be segmented. In these cases, discarding particles based on size is a practical way of filtering the segmentation results. Since it is difficult to generalize when to consider something ‘dust’, we decided to include this additional text in the Methods section rather than in the main text:

      “… with the use of the ‘hide dust’ function (the same settings were used for each panel, different settings used for each feature).

      This ‘dust’ corresponds to small (in comparison to the segmented structures of interest) volumes of false positive segmentations, which are present in the data due to imperfections in the used models. The rate and volume of false positives can be reduced either by improving the models (typically by including more examples of the images of what would be false negatives or positives in the training data) or, if the dust particles are indeed smaller than the structures of interest, they can simply be discarded by filtering particles based on their volume, as applied here. In particle picking a ‘minimum particle volume’ is specified – particles with a smaller volume are considered ‘dust’.”

      This text should be read in combination with the newly included text about the method of converting volumes into lists of coordinates (see Reviewer 1’s comment #6):

      “Third, a watershed transform is applied to the resulting volume, so that the sets of pixels closest to any local maximum in the distance transformed volume are assigned to one group. Fourth, groups that are smaller than a user-specified minimum volume are discarded…”

      We think it should now be clearer that (some form of) discarding ‘dust’ is a step that is typically included in the particle picking process.
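
      The volume-based filtering described above can be sketched as follows. Note the substitution: the manuscript groups voxels with a watershed transform on a distance-transformed volume, whereas this sketch uses plain 6-connected component labelling, which is simpler but does not separate touching particles. The function name `remove_dust` and the nested-list volume layout are assumptions.

```python
from collections import deque

# Face-sharing neighbours in 3D (6-connectivity); an assumption of this sketch.
NEIGHBOURS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def remove_dust(volume, min_volume):
    """Discard connected groups of voxels smaller than min_volume.

    volume: binary 3D nested list indexed as volume[z][y][x].
    Returns (filtered volume, sizes of the groups that were kept).
    """
    nz, ny, nx = len(volume), len(volume[0]), len(volume[0][0])
    seen = [[[False] * nx for _ in range(ny)] for _ in range(nz)]
    filtered = [[[0] * nx for _ in range(ny)] for _ in range(nz)]
    kept_sizes = []
    for z in range(nz):
        for y in range(ny):
            for x in range(nx):
                if not volume[z][y][x] or seen[z][y][x]:
                    continue
                # Breadth-first search collects one connected group.
                group, queue = [], deque([(z, y, x)])
                seen[z][y][x] = True
                while queue:
                    cz, cy, cx = queue.popleft()
                    group.append((cz, cy, cx))
                    for dz, dy, dx in NEIGHBOURS:
                        pz, py, px = cz + dz, cy + dy, cx + dx
                        if (0 <= pz < nz and 0 <= py < ny and 0 <= px < nx
                                and volume[pz][py][px] and not seen[pz][py][px]):
                            seen[pz][py][px] = True
                            queue.append((pz, py, px))
                # Groups below the minimum volume are treated as dust.
                if len(group) >= min_volume:
                    kept_sizes.append(len(group))
                    for gz, gy, gx in group:
                        filtered[gz][gy][gx] = 1
    return filtered, kept_sizes


volume = [[[1, 1, 0, 0, 1],
           [1, 1, 0, 0, 0],
           [0, 0, 0, 0, 0]]]
filtered, kept_sizes = remove_dust(volume, min_volume=2)
```

      In this toy volume the four-voxel block is kept and the isolated voxel is discarded; as the response notes, the choice of minimum volume strongly affects which particles are selected.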

      (5) Page 9 contains the following sentence: "After selecting these values, we then launched a batch particle picking process to determine lists of particle coordinates based on the segmented volumes." Given how important this is, I feel like this requires significant description, e.g. how are densities thresholded, how are centers determined, and what if there are overlapping segmentations? 

      Please see the response to Reviewer 1’s comment #6.

      (6) The FSC shown in Figure S6 for the auto-picked maps is concerning. First, a horizontal line at FSC = 0 should be added. It seems that starting at a frequency of ~0.045, the FSC of the autopicked map increases above zero and stays there. Since this is not present in the FSC of the manually picked averages, this suggests the automatic approach is also finding some sort of consistent features. This needs to be discussed. 

      Thank you for pointing this out. Awkwardly, this was due to a mistake made while formatting the figure. In the two separate original plots, the Y axes had slightly different ranges, but this was missed when they were combined to prepare the joint supplementary figure. As a result, the FSC values for the autopicked half maps are displayed incorrectly. The original separate plots are shown below to illustrate the discrepancy:

      Author response image 1.

      The corrected figure is Figure S9 in the manuscript. The values of 44 Å and 46 Å were not determined from the graph and remain unchanged.

      (7) Page 11 contains the statement "the segmented volumes found no immediately apparent false positive predictions of these pores". This is quite subjective and I don't know that I agree with this assessment. Unless the authors decide to quantify this through subtomogram classification, I don't think this statement is appropriate. 

      We originally included this statement and the supplementary figure because we wanted to show another example of automated picking, this time in the more crowded environment of the cell. We do agree that it requires better substantiation, but also think that the demonstration of automated picking of the antibody platforms and IgG3-C1 complexes for subtomogram averaging suffices to demonstrate Ais’ picking capabilities. Since the supplementary information includes an example of picked coordinates rendered in the Ais 3D viewer (Figure S7) that also used the pore dataset, we still include the supplementary figure (S10) but have edited the statement to read:

      “Moreover, we could identify the molecular pores within the DMV, and pick sets of particles that might be suitable for use in subtomogram averaging (see Fig. S11).”

      We have also expanded the text that accompanies the supplementary figure to emphasize that results from automated picking are likely to require further curation, e.g. by classification in subtomogram averaging, and that the selection of particles is highly dependent on the thresholds used in the conversion from volumes to lists of coordinates.

      (8) In the methods, the authors note that particle picking is explained in detail in the online documentation. Given that this is a key feature of this software, such an explanation should be in the manuscript. 

      Please see the response to Reviewer 1’s comment #6. 

      Recommendations:

      (9) The word "model" seems to be used quite ambiguously. Sometimes it seems to refer to the manual segmentations, the CNN architectures, the trained models, or the output predictions. More precision in this language would greatly improve the readability of the manuscript.

      This was indeed quite ambiguous, especially in the introduction. We have edited the text to be clearer on these differences. The word ‘model’ is now only used to refer to trained CNNs that segment a particular feature (as in ‘membrane model’ or ‘model interactions’). Where we used terms such as ‘3D models’ to describe scenes rendered in 3D, we now use ‘3D visualizations’ or similar terms. Where we previously used the term ‘models’ to refer to CNN architectures, we now use terms such as ‘neural network architectures’ or ‘architecture’. Some examples:

      … with which one can automatically segment the same or any other dataset …

      Moreover, since Pix2pix is a relatively large network, …       

      … to generate a 3D visualization of ten distinct cellular …

      … with the use of the same training datasets for all network architectures …

      In Figure 1, the text in panels D and E is illegible. 

      We have edited the figure to show the text more clearly (the previous images were unedited screenshots of the website).

      (10) Prior to the section on model interactions, I was under the impression that all annotations were performed simultaneously. I think it could be clarified that models are generated per annotation type. 

      Multiple different features can be annotated (i.e. drawn by hand by the user) at the same time, but each trained CNN only segments one feature. CNNs that output segmentations for multiple features can be implemented straightforwardly, but this introduces the need to provide training data where for every grayscale image, every feature is annotated. This can make preparing the training data much more cumbersome. Reusability of the models is also hampered. We now mention the separateness of the networks explicitly in the introduction:

      “Multiple features, such as membranes, microtubules, ribosomes, and phosphate crystals, can be segmented and edited at the same time across multiple datasets (even hundreds). These annotations are then extracted and used as ground truth labels upon which to condition multiple separate neural networks, …”

      (11) On page 6, there is the text "some features are assigned a high segmentation value by multiple of the networks, leading to ambiguity in the results". Do they mean some false features? 

      To avoid ambiguity of the word ‘features’, we have edited the sentence to read:

      “… some parts of the image are assigned a high segmentation value by multiple of the networks, leading to false classifications and ambiguity in the results.”

      (12) Figures 2 and 3 would be easier to follow if they had consistent coloring. 

      We have changed the colouring in Figure 2 to better match that of Figure 3.

      (13) For Figure 3D, I'm confused as to why the authors showed results from the tomogram in Figure 2B. It seems like the tomogram in Figure 3C would be a more obvious choice, as we would be able to see how the 2D slices look in 3D. This would also make it easier to see the effect of interactions on false negatives. Also, since the orientation of the tomogram in 2B is quite different than that shown in 3D, it's a bit difficult to relate the two.

      We chose to show this dataset because it exemplifies the effects of both model competition and model interactions better than the tomogram in Figure 3C. See Figure 3D and Author response image 2 for a comparison:

      Author response image 2.

      (14) I'm confused as to why the tomographic data shown in Figures 4D, E, and F are black on white while all other cryo-ET data is shown as white on black. 

      The images in Figure 4DEF are now inverted.

      (15) For Figure 5, there needs to be better visual cueing to emphasize which tomographic slices are related to the segmentations in Panels A and B. 

      We have edited the figure to show more clearly which grayscale image corresponds to which segmentation.

      (16) I don't understand what I should be taking away from Figures S1 and S2. There are a lot of boxes around membrane areas and I don't know what these boxes mean. 

      We have added a more descriptive text to these figures. The boxes are placed by the user to select areas of the image that will be sampled when saving training datasets.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review): 

      The study examines how pyruvate, a key product of glycolysis that influences TCA metabolism and gluconeogenesis, impacts cellular metabolism and cell size. It primarily utilizes the Drosophila liver-like fat body, which is composed of large post-mitotic cells that are metabolically very active. The study focuses on the key observations that overexpression of the pyruvate importer MPC complex (which imports pyruvate from the cytoplasm into mitochondria) can reduce cell size in a cell-autonomous manner. They find this is by metabolic rewiring that shunts pyruvate away from TCA metabolism and into gluconeogenesis. Surprisingly, mTORC and Myc pathways are also hyper-active in this background, despite the decreased cell size, suggesting a non-canonical cell size regulation signaling pathway. They also show a similar cell size reduction in HepG2 organoids. Metabolic analysis reveals that enhanced gluconeogenesis suppresses protein synthesis. Their working model is that elevated pyruvate mitochondrial import drives oxaloacetate production and fuels gluconeogenesis during late larval development, thus reducing amino acid production and thus reducing protein synthesis. 

      Strengths: 

      The study is significant because stem cells and many cancers exhibit metabolic rewiring of pyruvate metabolism. It provides new insights into how the fate of pyruvate can be tuned to influence Drosophila biomass accrual, and how pyruvate pools can influence the balance between carbohydrate and protein biosynthesis. Strengths include its rigorous dissection of metabolic rewiring and use of Drosophila and mammalian cell systems to dissect carbohydrate:protein crosstalk. 

      Weaknesses: 

      However, questions on how these two pathways crosstalk, and how this interfaces with canonical Myc and mTORC machinery remain. There are also questions related to how this protein:carbohydrate crosstalk interfaces with lipid biosynthesis. Addressing these will increase the overall impact of the study. 

      We thank the reviewer for recognizing the significance of our work and for providing constructive feedback. Our findings indicate that elevated pyruvate transport into mitochondria acts independently of canonical pathways, such as mTORC1 or Myc signaling, to regulate cell size. To investigate these pathways, we utilized immunofluorescence with well-validated surrogate measures (p-S6 and p-4EBP1) in clonal analyses of MPC expression, as well as RNAseq analyses in whole fat body tissues expressing MPC. These methods revealed surprising hyperactivation of mTORC1 and Myc signaling in Drosophila fat body cells expressing MPC, which are dramatically smaller than control cells. One explanation of these seemingly contradictory observations could be an excess of nutrients that activate mTORC1 or Myc pathways. However, our data is inconsistent with a nutrient surplus that could explain this hyperactivation. Instead, we observed reduced amino acid abundance upon MPC expression, which is very surprising given the observed hyperactivation of mTORC1. This led us to hypothesize the existence of a feedback mechanism that senses an inappropriate reduction in cell size and activates signaling pathways to promote cell growth. The best-characterized “sizer” pathway for mammalian cells is the Cyclin D/CDK4 complex, which has been well studied in the context of cell size regulation of the cell cycle (PMID 10970848, 34022133). However, the mechanisms that sense cell size in post-mitotic cells, such as fat body cells and hepatocytes, remain poorly understood. Investigating the hypothesized size-sensing mechanisms at play here is a fascinating direction for future research.

      For the current study, we conducted epistatic analyses with mTORC1 pathway members by overexpressing PI3K and knocking down the TORC1 inhibitor Tuberous Sclerosis Complex 1 (Tsc1). These manipulations increased the size of control fat body cells but not those overexpressing the MPC (Supplementary Fig. 3c, 3d). Regarding Myc, its overexpression increased the size of both control and MPC+ clones (Supplementary Fig. 3e), but Myc knockdown had no additional effect on cell size in MPC+ clones (Supplementary Fig. 3f). These results suggest that neither mTORC1, PI3K, nor Myc is epistatic to the cell size effects of MPC expression. Consequently, we shifted our focus to metabolic mechanisms regulating biomass production and cell size.

      When analyzing cellular biomolecules contributing to biomass, we observed a significant impact on protein levels in Drosophila fat body cells and mammalian MPC-expressing HepG2 spheroids. Triglyceride abundance in MPC-expressing HepG2 spheroids and whole fat body cells showed a statistically insignificant decrease compared to controls. Furthermore, lipid droplets in fat body cells were comparable in MPC-expressing clones when normalized to cell size.

      Interestingly, RNA-seq analysis revealed modestly increased expression of fatty acid and cholesterol biosynthesis pathways in MPC-expressing fat body cells. Upregulated genes included major SREBP targets, such as ATPCL (2.08-fold), FASN1 (1.15-fold), FASN2 (1.07-fold), and ACC (1.26-fold). Since mTORC1 promotes SREBP activation and MPC-expressing cells showed elevated mTOR activity and upregulation of SREBP targets, we hypothesize that SREBP is modestly activated in these cells. Nonetheless, our data on amino acid abundance and its impact on protein synthesis activity suggest that protein abundance is likely to play a prominent causal role in regulating cell size in response to increased pyruvate transport into mitochondria.

      Reviewer #2 (Public review): 

      In this manuscript, the authors leverage multiple cellular models including the Drosophila fat body and cultured hepatocytes to investigate the metabolic programs governing cell size. By profiling gene programs in the larval fat body during the third instar stage - in which cells cease proliferation and initiate a period of cell growth - the authors uncover a coordinated downregulation of genes involved in mitochondrial pyruvate import and metabolism. Enforced expression of the mitochondrial pyruvate carrier restrains cell size, despite active signaling of mTORC1 and other pathways viewed as traditional determinants of cell size. Mechanistically, the authors find that mitochondrial pyruvate import restrains cell size by fueling gluconeogenesis through the combined action of pyruvate carboxylase and phosphoenolpyruvate carboxykinase. Pyruvate conversion to oxaloacetate and use as a gluconeogenic substrate restrains cell growth by siphoning oxaloacetate away from aspartate and other amino acid biosynthesis, revealing a tradeoff between gluconeogenesis and provision of amino acids required to sustain protein biosynthesis. Overall, this manuscript is extremely rigorous, with each point interrogated through a variety of genetic and pharmacologic assays. The major conceptual advance is uncovering the regulation of cell size as a consequence of compartmentalized metabolism, which is dominant even over traditional signaling inputs. The work has implications for understanding cell size control in cell types that engage in gluconeogenesis but more broadly raises the possibility that metabolic tradeoffs determine cell size control in a variety of contexts.

      We thank the reviewer for their thoughtful recognition of our efforts, and we are honored by the enthusiasm the reviewer expressed for the findings and the significance of our research. We share the reviewer’s opinion that our work might help to unravel metabolic mechanisms that regulate biomass gain independent of the well-known signaling pathways.

      Reviewer #3 (Public review): 

      Summary: 

      In this article, Toshniwal et al. investigate the role of pyruvate metabolism in controlling cell growth. They find that elevated expression of the mitochondrial pyruvate carrier (MPC) leads to decreased cell size in the Drosophila fat body, a transformed human hepatocyte cell line (HepG2), and primary rat hepatocytes. Using genetic approaches and metabolic assays, the authors find that elevated pyruvate import into cells with forced expression of MPC increases the cellular NADH/NAD+ ratio, which drives the production of oxaloacetate via pyruvate carboxylase. Genetic, pharmacological, and metabolic approaches suggest that oxaloacetate is used to support gluconeogenesis rather than amino acid synthesis in cells over-expressing MPC. The reduction in cellular amino acids impairs protein synthesis, leading to impaired cell growth. 

      Strengths: 

      This study shows that the metabolic program of a cell, and especially its NADH/NAD+ ratio, can play a dominant role in regulating cell growth.

      The combination of complementary approaches, ranging from Drosophila genetics to metabolic flux measurements in mammalian cells, strengthens the findings of the paper and shows a conservation of MPC effects across evolution.

      Weaknesses: 

      In general, the strengths of this paper outweigh its weaknesses. However, some areas of inconsistency and rigor deserve further attention. 

      Thank you for reviewing our manuscript and offering constructive feedback. We appreciate your recognition of the significance of our work and your acknowledgment of the compelling evidence we have presented. We have carefully revised the manuscript in line with the reviewers' recommendations.

      The authors comment that MPC overrides hormonal controls on gluconeogenesis and cell size (Discussion, paragraph 3). Such a claim cannot be made for mammalian experiments that are conducted with immortalized cell lines or primary hepatocytes. 

      We appreciate the reviewer’s insightful comment. Pyruvate is a primary substrate for gluconeogenesis, and our findings suggest that increased pyruvate transport into mitochondria increases the NADH-to-NAD+ ratio, and thereby elevates gluconeogenesis. Notably, we did not observe any changes in the expression of key glucagon targets, such as PC, PEPCK2, and G6PC, suggesting that the glucagon response is not activated upon MPC expression. By the statement referenced by the reviewer, we intended to highlight that excess pyruvate import into mitochondria drives gluconeogenesis independently of hormonal and physiological regulation. 

      It seems the reviewer might also have been expressing the sentiment that our in vitro models may not fully reflect the in vivo situation, and we completely agree.  Moving forward, we plan to perform similar analyses in mammalian models to test the in vivo relevance of this mechanism. For now, we will refine the language in the manuscript to clarify this point.

      Nuclear size looks to be decreased in fat body cells with elevated MPC levels, consistent with reduced endoreplication, a process that drives growth in these cells. However, acute, ex vivo EdU labeling and measures of tissue DNA content are equivalent in wild-type and MPC+ fat body cells. This is surprising - how do the authors interpret these apparently contradictory phenotypes? 

      We thank the reviewer for raising this important issue. The size of the nucleus is regulated by DNA content and various factors, including the physical properties of DNA, chromatin condensation, the nuclear lamina, and other structural components (PMID 32997613). Additionally, cytoplasmic and cellular volume also impact nuclear size, as extensively documented during development (PMID 17998401, PMID 32473090).

      In MPC-expressing cells, it is plausible that the reduced cellular volume impacts chromatin condensation or the nuclear lamina in a way that slightly decreases nuclear size without altering DNA content. Specifically, in our whole-fat body experiments using CG-Gal4 (as shown in Supplementary Figure 2a-c), we noted that after 12 hours of MPC expression, cell size was significantly reduced (Supplementary Figure 2c and Author Response Figure 1A). However, nuclear size was only modestly reduced at 24 hours and significantly reduced at 36 hours (Author Response Figure 1B), suggesting that the reduction in cell size is a more acute response to MPC expression, followed only later by effects on nuclear size.

      In clonal analyses, this relationship was further clarified. MPC-expressing cells with a size greater than 1000 µm² displayed nuclear sizes comparable to control cells, whereas those with a drastic reduction in cell size (less than 1000 µm²) exhibited smaller nuclei (Author Response Figure 1C and 1D). These observations collectively suggest that changes in nuclear size are more likely to be downstream rather than upstream of cell size reduction. Given that DNA content remains unaffected, we focused on investigating the rate of protein synthesis. Our findings suggest that protein synthesis might play a causal role in regulating cell size, thereby reinforcing the connection between cellular and nuclear size in this context.

      Author response image 1.

      Cell Size vs. Nuclear Size in MPC-Expressing Fat Body Cells. A. Cell size comparison between control (blue, ay-GFP) and MPC+ (red, ay-MPC) fat body cells over time, measured in hours after MPC expression induction. B. Nuclear area measurements from the same fat body cells in ay-GFP and ay-MPC groups. C. Scatter plot of nuclear area vs. cell area for control (ay-GFP) cells, including the corresponding R² value. D. Scatter plot of nuclear area vs. cell area for MPC-expressing (ay-MPC) cells, with the respective R² value.

      This figure highlights the relationship between nuclear and cell size in MPC-expressing fat body cells, emphasizing the distinct cellular responses observed following MPC induction.
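
      As a rough illustration of how an R² value like those reported in panels C and D can be obtained, the sketch below fits an ordinary least-squares line of nuclear area against cell area and returns the coefficient of determination. The function name and the example measurements are hypothetical placeholders; they are not data or analysis code from this study.

```python
def r_squared(x, y):
    """Coefficient of determination for a least-squares line y = a + b*x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have equal length of at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of squares and cross-products around the means.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must each vary")
    # For simple linear regression, R^2 equals the squared Pearson correlation.
    return (sxy ** 2) / (sxx * syy)


# Hypothetical areas in square micrometres, for illustration only.
cell_area = [600.0, 900.0, 1200.0, 1800.0, 2400.0]
nuclear_area = [42.0, 53.0, 72.0, 98.0, 131.0]
print(round(r_squared(cell_area, nuclear_area), 3))
```

      Computing R² separately for the control and MPC-expressing populations, as in panels C and D, would show whether nuclear area tracks cell area equally tightly in both groups.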

      In Figure 4d, oxygen consumption rates are measured in control cells and those overexpressing MPC. Values are normalized to protein levels, but protein is reduced in MPC+ cells. Is oxygen consumption changed by MPC expression on a per-cell basis? 

      As described in the manuscript, MPC-expressing cells are smaller in size. In this context, we felt that it was most appropriate to normalize oxygen consumption rates (OCR) to cellular mass to enable an accurate interpretation of metabolic activity. Therefore, we normalized OCR with protein content to account for variations in cellular size and (probably) mitochondrial mass. 

      Trehalose is the main circulating sugar in Drosophila and should be measured in addition to hemolymph glucose. Additionally, the units in Figure 4h should be related to hemolymph volume - it is not clear that they are. 

      We appreciate this valuable suggestion. In the revised manuscript, we have quantified trehalose abundance in circulation and within fat bodies. As described in the Methods section and following the approach outlined in Ugrankar-Banerjee et al. (2023), we bled 10 larvae (either control or MPC-expressing) using forceps onto parafilm. From this, 2 microliters of hemolymph were collected for glucose measurement. The hemolymph was treated with trehalase overnight, and the resulting glucose derived from trehalose was measured. We observed that trehalose levels were also elevated in the hemolymph of fat body-specific MPC-expressing larvae, further supporting our conclusion that MPC expression in the fat body induces a hyperglycemic state. These data are now included in Figure 4h of the revised manuscript, and the details are described in the revised Materials and Methods.

      Measurements of NADH/NAD ratios in conditions where these are manipulated genetically and pharmacologically (Figure 5) would strengthen the findings of the paper. Along the same lines, expression of manipulated genes - whether by RT-qPCR or Western blotting - would be helpful to assess the degree of knockdown/knockout in a cell population (for example, Got2 manipulations in Figures 6 and S8). 

      We appreciate this suggestion, which will provide additional rigor to our study. We have already quantified NADH/NAD+ ratios in HepG2 cells under UK5099, NMN, and Asp supplementation, as presented in Figure 6k. As suggested, we have quantified Got2 expression for the manipulations shown in Figure 6j using RT-qPCR; these data are presented in revised Supplementary Figure 8f-h. In addition, Supplementary Figure 8i has been updated with western blot analysis of Got2 expression in the knock-out cells used to perform the size analysis in HepG2 cells.

      Additionally, we have analysed the efficiency of the pcb (Supplementary Figure 6a-c), pdha (Supplementary Figure 6f-h), dlat (Supplementary Figure 6f, g and i), pepck2 (Supplementary Figure 6n-p), and fbp (Supplementary Figure 6n, m, q) manipulations used to modulate the expression of these genes. These validations will ensure the robustness of our findings and strengthen the conclusions of our study.

      Reviewer #1 (Recommendations for the authors): 

      General questions: 

      (1) MPC over-expression in HepG2 cells altered the redox balance and the NADH/NAD+ ratio. This is suggested to help drive the metabolic rewiring from protein to carbohydrate biosynthesis. In line with this overexpression of Nmnat (which makes NAD+) or NDX rescues cell size and elevates protein biosynthesis. However, mechanistically it is unclear exactly how these redox NAD+ changes directly impact protein biosynthesis. Some additional explanations will strengthen this portion of the study. 

      Our data indicate that the altered redox state of the cell, particularly elevated NADH levels, affects the rate of protein synthesis. A similar relationship between redox balance and protein synthesis has been observed during embryonic development (PMID: 39879975), although the underlying mechanism remains uncharacterized. Our study suggests that increased NADH levels reprogram cellular carbohydrate metabolism, shifting it from glycolysis toward gluconeogenesis. This metabolic shift necessitates the use of oxaloacetate by PEPCK2, instead of its diversion toward GTP-mediated aspartate synthesis. Aspartate, which can be anaplerotically converted into glutamate and proline, plays a critical role in protein biosynthesis. Thus, the conversion of oxaloacetate to phosphoenolpyruvate represents a key metabolic node influencing protein synthesis under altered redox conditions. Additionally, since aspartate serves as a precursor for NAD biosynthesis, this may suggest a feedforward loop reinforcing the metabolic rewiring. Nonetheless, the precise relationship between NADH concentration and redox status and the regulation of protein synthesis warrants further investigation in future studies.

      (2) In the MPC1/2 (MPC+) over-expression background, can blocking of gluconeogenesis downstream in the carbohydrate synthesis pathway rescue the phenotype? 

      We knocked down FBPase (Drosophila fbp) using an RNAi construct, achieving approximately 60% reduction in FBPase expression in Drosophila. Notably, FBPase knockdown in fat body cells overexpressing MPC rescued the reduced cell size phenotype. These findings are presented in Figure 4o and Supplementary Figures 6n–q.

      (3) Biomass accrual and cell size are also influenced by lipogenesis. The study suggests mTORC and Myc are uncoupled to cell size determination per se, but how lipogenesis regulatory pathways like SREBP are impacted by MPC overexpression is not really explored. How lipid membrane synthesis inter-relates to this protein/carbohydrate crosstalk would add to the understanding of the system. 

      As mentioned above, when analyzing cellular biomolecules contributing to biomass, we observed a significant impact on protein levels in Drosophila fat body cells and mammalian MPC-expressing HepG2 spheroids. Triglyceride abundance in MPC-expressing HepG2 spheroids and whole fat body cells showed a statistically insignificant decrease compared to controls. Furthermore, lipid droplets in fat body cells were comparable in MPC-expressing clones when normalized to cell size.

      Interestingly, RNA-seq analysis revealed increased expression of fatty acid and cholesterol biosynthesis pathways in MPC-expressing fat body cells. Upregulated genes included major SREBP targets, such as ATPCL (2.08-fold), FASN1 (1.15-fold), FASN2 (1.07-fold), and ACC (1.26-fold). Since mTOR promotes SREBP activation and MPC-expressing cells showed elevated mTOR activity and upregulation of SREBP targets, we hypothesize that SREBP is modestly activated in these cells. Nonetheless, our data on amino acid abundance and its impact on protein synthesis activity suggest that protein abundance, rather than lipids, is likely to play a larger causal role in regulating cell size in response to increased pyruvate transport into mitochondria.

      Reviewer #2 (Recommendations for the authors): 

      I have only minor suggestions for the authors to consider. 

      Minor points 

      (1) Wherever possible, scale bars should be labeled with units or indicated comparisons (e.g., Supplementary Fig. 1). To make the data as accessible as possible, it would be helpful for the authors to include the data presented in Supplementary Figure 1 as an associated table as well.

      We have corrected this in the revised manuscript and included the table. 

      (2) To support the conclusions about TCA cycle flux (lines 280-284), it will be helpful for the authors to consider relative metabolite pool sizes (which they should have on hand) in addition to labeling rate and fraction. 

      We thank the reviewer for this suggestion. We have included the metabolite counts with fractional abundance changes side by side in Supplementary Figure 5. 

      (3) believe (?) there is a typo in lines 326-328; PEPCK KO increases (not decreases) the size of spheroids/cells. 

      We thank the reviewer for pointing out this error. We have corrected this in the revised manuscript.

      (4) Supplementary Figure 7b: PHD has 3 phospho sites that have independent regulation; the specific phosphosite queried should be listed on the figure and unless all 3 sites are probed the claims about lack of change in phosphorylation (line 337) should be removed. 

      We thank the reviewer for bringing this to our attention. We have included this in the revised manuscript.

      (5) (Optional) I appreciate the effort the authors undertook to acquire cytoplasmic and mitochondrial ratios of NADH/NAD. While I recognize that many labs perform this assay, it is difficult for this reviewer to envision how accurately these values reflect the ratios present in the intact cell given how quickly these redox couples interconvert and significant post-harvest metabolic flux (see, e.g., PMID: 31767181), even with the extremely rapid fractionation protocol described in the methods. The present data certainly support the notion that MPC+ cells are more reduced, but these ratios may reflect a capacity for reductive metabolism rather than a bona fide NADH/NAD ratio; for example, Figure 7f shows almost identical NADH/NAD ratios in the cytoplasm and mitochondria, even though these compartments are frequently considered to have (sometimes vastly) different redox states. If the authors are willing, I would support them by including a brief discussion of the caveat of this method for new readers in the field.

      We agree with the reviewer that this is an important caveat of the technique we used for these analyses. We have included a description of this caveat in the revised manuscript (lines 393 to 395).

      Reviewer #3 (Recommendations for the authors): 

      Minor points: 

      (1) Line 327 - "smaller" should be "bigger". 

      We thank the reviewer for pointing out this error. We have corrected this in the revised manuscript.

      (2) For Figure 7 - references to panels e and f in the text, and descriptions of e and f in the Figure Legend are switched with regard to the Figure itself. 

      We thank the reviewer for pointing out this error. We have corrected this in the revised manuscript.

      (3) Line 449 - "reduced" is missing its R 

      We thank the reviewer for pointing out this error. We have corrected this in the revised manuscript.

      (4) Some additional, careful proofreading is needed - several other punctuation errors were found. 

      We thank the reviewer for bringing this to our attention. We have carefully proofread the manuscript and corrected the remaining errors.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Jin, Briggs, and colleagues use light sheet imaging to reconstruct the islet three-dimensional Ca2+ network. The authors find that early/late responding (leader) cells are dynamic over time, and located at the islet periphery. By contrast, highly connected or hub cells are stable and located toward the islet center. Suggesting that the two subpopulations are differentially regulated by fuel input, glucokinase activation only influences leader cell phenotype, whereas hubs remain stable.

      Strengths:

      The studies are novel in providing the first three-dimensional snapshot of the beta cell functional network, as well as determining the localization of some of the different subpopulations identified to date. The studies also provide some consensus as to the origin, stability, and role of such subpopulations in islet function.

      We thank the reviewers for their positive assessment.

      Weaknesses:

      Experiments with metabolic enzyme activators do not take into account the influence of cell viability on the observed Ca2+ network data. Limitations of the imaging approach used need to be recognized and evaluated/discussed.

      We worked very hard to make sure the islets remained stable and healthy over the duration of the imaging time course. We imaged the islet in 3D and observed that all beta cells displayed glucose-dependent oscillations, which can only arise from functioning cells. From the raw calcium traces (displayed in the figures) we observed no detectable loss of signal over 60 min of continuous imaging regardless of drug treatment; this is because the laser excitation is below the bleach threshold for GCaMP6s, and it is bleaching that generates phototoxicity. To demonstrate this clearly, we performed a bleach test using 6x laser power; in this case calcium amplitude dropped 30% over 60 min of imaging; however, islet calcium oscillatory behavior was preserved. Light-sheet is well documented to be 1000x more gentle than other optical sectioning techniques, which is why it was chosen for this application.

      Regarding the limitations of the imaging approach, we recognize that studying islets ex vivo is necessarily performed in the absence of native surrounding tissue, as highlighted in the discussion.

      Reviewer #2 (Public Review):

      The manuscript by Erli Jin, Jennifer Briggs et al. utilizes light sheet microscopy to image islet beta cell calcium oscillations in 3D and determine where beta cell populations are located that begin and coordinate glucose-stimulated calcium oscillations. The light sheet technique allowed clear 3D mapping of beta cell calcium responses to glucose, glucokinase activation, and pyruvate kinase activation. The manuscript finds that synchronized beta-cells are found at the islet center, that leader beta cells showing the first calcium responses are located on the islet periphery, that glucokinase activation helped maintain beta cells that lead calcium responses, and that pyruvate kinase activation primarily increases islet calcium oscillation frequency. The study is well-designed, contains a significant amount of high-quality data, and the conclusions are largely supported by the results.

      It has recently been shown that beta cells within islets containing intact vasculature (such as those in a pancreatic slice) show different calcium responses compared to isolated islets (such as that shown in PMID: 35559734). It would be important to include some discussion about the potential in vitro artifacts in calcium that arise following islet isolation (this could be included in the discussion about the limitations of the study).

      Although isolated islets reproduce the slow oscillatory calcium behavior observed in vivo, we agree that missing elements such as blood flow, cholinergic innervation, and surrounding tissues may each impact islet calcium responses. Pancreatic regional blood flow also links the endocrine and exocrine signaling which can directly influence the behavior of beta cells. We have highlighted some of these issues in the discussion “In addition to α-cells, vasculature may also impact islet Ca2+ responses, and may induce additional heterogeneity in vivo.” (see line 375, Ref. 46).

      Reviewer #3 (Public Review):

      Summary:

      Jin, Briggs et al. made use of light-sheet 3D imaging and data analysis to assess the collective network activity in isolated mouse islets. The major advantage of using whole islet imaging, despite compromising on the speed of acquisition, is that it provides a complete description of the network, while 2D networks are only an approximation of the islet network. In static-incubation conditions, excluding the effects of perfusion, they assessed two subpopulations of beta cells and their spatial consistency and metabolic dependence.

      Strengths:

      The authors confirmed that coordinated Ca2+ oscillations are important for glycemic control. In addition, they definitively disproved the role of individual privileged cells, which were suggested to lead or coordinate Ca²⁺ oscillations. They provided evidence for differential regional stability, confirming the previously described stochastic nature of the beta cells that act as strongly connected hubs as well as beta cells in initiating regions (doi.org/10.1103/PhysRevLett.127.168101).

      The fact that islet cores contain beta cells that are more active and more coordinated has also been readily observed in high-frequency 2D recordings (e.g. DOI: 10.2337/db22-0952), suggesting that the high-speed capture of fast activity can partially compensate for incomplete topological information.

      They also found an increased metabolic sensitivity of mantle regions of an islet with a subpopulation of beta cells with a high probability of leading the islet activity which can be entrained by fuel input. They discuss a potential role of alpha/delta cell interaction, however relative lack of beta cells in the islet border region could also be a factor contributing to less connectivity and higher excitability.

      The Methods section contains a useful series of direct instructions on how to approach fast 3D imaging with currently available hardware and software.

      The Discussion is clear and includes most of the issues regarding the interpretation of the presented results.

      Some issues concerning inconsistencies between data presented and statements made as well as statistical analysis need to be addressed.

      Taken together it is a strong technical paper to demonstrate the stochasticity regarding the functions subpopulations of beta cells in the islets may have and how less well-resolved approaches (both missing spatial resolution as well as missing temporal resolution) led us to jump to unjustified conclusions regarding the fixed roles of individual beta cells within an islet.

      We thank the reviewers for the comments on the many strengths of the manuscript and address the specific critiques below.

      Recommendations for the authors:

      Reviewing Editor Comments:

      Essential revisions:

      (1) How useful is GK activation as a subpopulation-level perturbation, given that all beta cells would be affected? Previous studies by the authors have shown that GK gradients likely dictate subpopulation behaviour, so the concern here is that GK activation across all cells might mask the influence of such gradients i.e. a U-shaped effect. Also, does the GK activator differentially penetrate the islet such that first responders/leaders are more vulnerable than hubs?

      As we previously published, non-saturating concentrations of GK activator (as used here) have the same effect on calcium oscillations as raising glucose (PMID: 33147484). In other words, the activator boosts the activity of the endogenous GK. To the second point, recent ex vivo islet studies (PMID: 28380380) document the islet penetration of a fluorescent glucose analogue within seconds even under static conditions, and in our study the islets' calcium oscillations reached steady state, so we are not concerned about drug penetration. The real limitation with any drug study in the islet is that non-beta cells are also activated; this limitation is included in the discussion along with the recommendation that genetic tools are needed to assess the effect of GK activation in the various endocrine subpopulations.

      An additional concern with the GK activation experiment is that GK activation might push beta cells into a more stressed state such that they are more susceptible to phototoxicity. Although the authors state that photobleaching is low, they provide no data to support such a statement. Given the long duration of imaging and acquisition rate, phototoxicity might be more of an issue, especially with GK activation. Some further analysis (e.g. apoptosis) would be useful here to exclude an effect of beta cell viability versus GK activation on the observed phenotype of the different subpopulations.

      Acute GK activation (for 30 min) does not stress the islet; the drug has the same effect as raising glucose (PMID: 33147484). To determine whether photobleaching was impacted by GK activation, we examined the peak of consecutive oscillations in response to vehicle and GK activator. The average photobleaching was less than 2% of the calcium fluorescence over 30 min of continuous imaging. Furthermore, GK activation did not significantly increase photobleaching (see Author response image 1).

      Author response image 1.
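
      One way to estimate photobleaching from the peaks of consecutive oscillations is sketched below under simplifying assumptions: local maxima of a fluorescence trace are located, and the percent drop from the first peak to the last is reported. The helper names, the simple neighbour-comparison peak rule, and the synthetic trace are assumptions made for illustration; this is not the analysis code used to produce Author response image 1.

```python
import math


def find_peaks(trace, min_height):
    """Return indices of local maxima that rise above min_height."""
    return [
        i for i in range(1, len(trace) - 1)
        if trace[i] > min_height
        and trace[i] > trace[i - 1]
        and trace[i] >= trace[i + 1]
    ]


def percent_bleach(trace, min_height):
    """Percent drop in peak fluorescence from the first to the last oscillation."""
    peaks = find_peaks(trace, min_height)
    if len(peaks) < 2:
        raise ValueError("need at least two oscillation peaks")
    first, last = trace[peaks[0]], trace[peaks[-1]]
    return 100.0 * (first - last) / first


# Synthetic trace: a slow oscillation whose amplitude decays slightly over time.
trace = [
    (1.0 + 0.5 * math.sin(2 * math.pi * t / 50)) * math.exp(-1e-5 * t)
    for t in range(600)
]
print(f"estimated bleaching: {percent_bleach(trace, min_height=1.2):.2f}%")
```

      Real traces are noisy, so a smoothing step or a more robust peak detector would likely be needed before applying this kind of estimate.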

      To the reviewer’s second point, apoptosis cannot occur on the timescale of the drug treatment (30 min), and raw calcium traces are included showing that all beta cells display oscillatory behavior throughout the course of the experiment.

      (2) The authors show that glucokinase activation increases the duration of islet calcium oscillations and in some islets (3 of 15 islets) causes "a Ca2+ plateau." The authors indicate that "Glucokinase, as the 'glucose sensor' for the β-cell, controls the input of glucose carbons into glycolysis, and opens KATP channels." It would be nice to have some experimental evidence that the change in oscillation rate caused by the glucokinase activator is due to KATP activation. This could be accomplished by treating islets with subthreshold KATP activators (e.g., diazoxide) or subthreshold KATP inhibitors (e.g., tolbutamide).

      The statement that glucokinase activation opens KATP channels was a typo; glucose metabolism closes KATP channels by raising the ATP/ADP ratio. We now include additional citations that document the relationship between GK and KATP and the oscillatory behavior. See Ref 22 (PMID: 33147484) and Ref 34 (PMID: 33147484).

      The manuscript finds that "Early phase cells were maintained to a greater degree upon GKa application." Yet GKa is proposed to activate KATP. Some discussion about how the early phase is maintained in cell populations by GKa activation in the context of KATP activity would be useful.

      As discussed above, we meant to say that GKa will close KATP and apologize for the confusion. As we mentioned in the discussion, early phase cells are most likely maintained to a greater degree following GK activation as a result of an enhanced GK gradient and a reduced effect of stochastic alpha cell input.

      (3) Membrane potential depolarization precedes calcium channel activation and subsequent calcium entry. In many cases, electrical coupling across beta cells happens on millisecond timescale. It would be good to confirm that the calcium is showing the same time scale in terms of elevation following beta cell membrane potential depolarization. One concern is that the islet beta cells could be depolarizing at the same speed and lagging in terms of calcium channel activation and calcium entry.

      We thank the reviewer for making this point, which is almost certainly true, particularly since plasma membrane calcium influx is not the sole source of intracellular calcium. Previously published “simultaneous” recordings of Vm and calcium show the same phase relationship but do not have sufficient time resolution to capture depolarization of each cell. A quantification of phase lag would require the field to generate mice with voltage sensors expressed in beta cells; these tools are not yet available.

      A related issue: in the text, the authors discuss changes in membrane potential (which were not measured in this study), while in the figures they exclusively describe Ca2+ oscillations (which were measured). Examples are on lines 149, 150, 153, 154, 263. It is recommended that the silent and active phases in the Results section describe processes actually measured in this study as shown in 6A.

      To clarify, we did not use the term ‘membrane potential’ anywhere in the manuscript. We do sometimes refer to calcium influx as a proxy for membrane depolarization; we think this is valid given the abundant evidence that these processes are interdependent in beta cells.

      (4) It would be good to include the timing of the phases of calcium entry. When was the beta cell calcium entry monitored for the response time? Were the response times between the late and early phases consistent for each oscillation? It looks as if the start of the calcium upstroke was similar for many beta cells (such as for the Figure 2I traces). It would be nice to include a shorter time duration graph of calcium oscillation traces right when the upstroke starts. This would allow the community to observe the differences in the start time of calcium entry. 

      We agree this is an important point. We now include an inset showing the expanded time scale of the calcium upstroke in Fig. 2I. The response time spread between early and late phase cells is now shown in Fig. 7F (and in Author response image 2). We also quantified the coefficient of variation in the response time spread (0 = no variation and 1 = maximal variation) and found no significant differences between metabolic activators (Author response image 2).
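
      The coefficient of variation mentioned in this response can be illustrated with a minimal sketch. It assumes the conventional definition (standard deviation divided by mean) applied to a list of per-oscillation response time spreads. The function name, the example values, and the choice of the population standard deviation are assumptions made for illustration; how the authors scaled their measure to the 0 to 1 range is not stated in this response.

```python
from statistics import mean, pstdev


def coefficient_of_variation(spreads):
    """Return std/mean for a list of response time spreads (hypothetical input)."""
    avg = mean(spreads)
    if avg == 0:
        raise ValueError("mean is zero; coefficient of variation is undefined")
    # Population standard deviation is an assumption; a sample estimate
    # (statistics.stdev) would be an equally reasonable choice.
    return pstdev(spreads) / avg


# Hypothetical spreads (in seconds) measured over four oscillations.
example_spreads = [4.0, 6.0, 4.0, 6.0]
cv = coefficient_of_variation(example_spreads)
```

      A value near 0 would indicate that the spread between early and late phase cells is nearly identical from one oscillation to the next.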

      Author response image 2.

      Also, for most of the GCaMP6s traces shown, the authors indicate that they are plotted as F/F0. However, this normalization (F/F0) is not done for the actual traces shown. For example, Figure 2D shows the traces starting from what looks to be 0 to 0.3 F/F0, but the traces for an F/F0 group should all start at 1. Please change this for all representative oscillations so the start of calcium entry for example traces all line up.

      This has been corrected in Fig. 2D, Fig. 2I, and Fig. 3B. In addition, Fig. 6 should be labeled F, not F/F0.
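
      The F/F0 normalization requested by the reviewer can be sketched as follows. This is an illustrative example only, assuming F0 is taken as the mean of the first few frames of each trace; the function name, the baseline window, and the intensity values are hypothetical and are not taken from the manuscript.

```python
def normalize_to_baseline(trace, baseline_frames):
    """Divide a fluorescence trace by F0, the mean of its first baseline_frames samples."""
    if baseline_frames <= 0 or baseline_frames > len(trace):
        raise ValueError("baseline_frames must be between 1 and len(trace)")
    f0 = sum(trace[:baseline_frames]) / baseline_frames
    if f0 == 0:
        raise ValueError("baseline fluorescence is zero")
    return [value / f0 for value in trace]


raw_trace = [200.0, 200.0, 300.0, 400.0]  # hypothetical raw intensities
normalized = normalize_to_baseline(raw_trace, baseline_frames=2)
```

      With this convention every normalized trace starts near 1, so the onsets of calcium entry in different example traces line up on a common axis.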

      Reviewer #1 (Recommendations for the authors):

      (1) Line 53: "Silencing the electrical activity of these hub cells with optogenetics was found to abolish the coordination within that plane of the islet". The authors should acknowledge that studies also showed that beta cell transcription factor (Pdx1/Mafa) dosage was important for hub cell phenotype and islet function.

      Thank you, this reference to Nasteska et al. (PMID: 33514698, Ref. 16) has been added to the discussion.

      (2) Light sheet imaging is used to image the 3D islet volume. Whilst speed is undoubtedly an advantage of this technique, axial resolution is ~1.1 µm over 4 µm z-step size. How confident are the authors that single nuclei can be reliably identified given their ~6 µm size in a beta cell (e.g. do some elongated nuclei appear, which could be "doublets")?

      The axial resolution of 1.1 µm exceeds the resolution needed for the Nyquist criterion (i.e. sampling every 2-3 µm). As a practical matter, it is not possible to double-count nuclei because the software will exclude nuclei that occupy the same volume. Only a very elongated nucleus (>10 µm) would be double-counted, and this does not occur.

      (3) The authors discuss the advantages of the light sheet imaging approach used, including speed and phototoxicity. Some more balance is needed here since other approaches such as two-photon excitation achieve similar speeds with much better axial resolution (see dozens of neural circuit studies).

      We are careful to point out that two-photon excitation has better axial resolution, better tissue penetration, and often higher speeds (kHz using linescans); however, these neuronal studies are limited to the cells in a few planes, and the laser power is orders of magnitude higher than with lightsheet. For this reason, two-photon imaging has not been used to image islet calcium in three dimensions. The bottom line is that lightsheet trades axial resolution for gentle volumetric imaging.

      (4) Line 340: "Laser ablation or optogenetic inactivation of these early phase cells would be predicted to have little impact on islet function, as suggested previously by electrophysiological studies in which surface β-cells have been voltage-clamped with no impact on β-cell oscillations". This statement is slightly ambiguous since the authors showed in their previous studies that laser ablation of first responder cells/leaders was able to influence the Ca2+ network. Do the authors mean that laser ablation would only temporarily influence islet function before another cell picked up the role of a first responder/leader? As written, the sentence seems to imply that first responders/leaders are unimportant for the islet function.

      We intended to imply that the oscillatory system is sufficiently robust that a new cell takes over when leader cells are ablated. We also cite Korosak et al. (PMID: 34723613, Ref. 40) and Dwulet et al. (PMID: 33939712, Ref. 15) to make this point, although, to clarify, we are not examining first responders in this study.

      (5) Line 369: "In contrast with leader cells, we found that the highly synchronized cells are both spatially and temporally stable." The sentence needs qualifying: what would spatiotemporal stability be expected to confer on such a subpopulation?

      We believe that the spatiotemporal stability of highly synchronized cells is a consequence of beta cells in the center of the islet lacking the stochastic input of nearby alpha cells; we raise this point in the discussion: “The preponderance of α-cells on the periphery of mouse islets, which influence β-cell oscillation frequency, would be expected to disrupt β-cell synchronization on the periphery and stabilize it in the islet center – which is precisely the pattern of network activity we observed.” (see line 372). 

      (6) Line 370: "However, in conflict with the description of hub cells as intermingled with other cells throughout the islet, the location of such cells in 3D space is close to the center." The study by Johnston et al did not have the axial resolution to exclude that some cells might have been grouped together.

      We agree and have included the reviewer’s comment in the text (See line 384); that’s an important reason for conducting this 3D study.  

      (7) Line 380: "One explanation may be that paracrine communication within the islet determines which region of cells will show high or low degree. For example, more peripheral cells that are in contact with nearby δ-cells may show some suppression in their Ca2+ dynamics, and thus reduced synchronization." A potentially exciting future study. Should however probably cite DOI s41467-022-31373-6 here.

      We thank the reviewer for their input. This reference to Ren et al. (PMID:35764654) was previously included as Ref. 42 (now Ref. 45)

      Reviewer #3 (Recommendations for the authors):

      (1) There are in fact no radially oriented networks in the core of an islet (l. 130, Figure 4) apart from the fact that every hub has somewhat radially oriented edges. For radiality to have some general meaning, the normalized distance from the geometric center would need to be lower than 0.4. The networks are centrally located, which does not change the major conclusions of the study.

      Thank you for pointing out this imprecise language. We did not intend to imply that the functional network is oriented radially. We corrected the text (see lines 131 and 145) to indicate that the cells with high and low synchronization are distributed in a radial pattern.

      (2) The study would benefit from acknowledging that Ca2+ influx is not a sole mechanism to drive insulin secretion and that KATP channels are not the sole target sensitive to changes in the cytosolic (global or local) ADP and ATP concentration or that there is an absolute concentration-dependence of these ligands on KATP channels. The relatively small conductance changes that have been found to be associated with active and silent phases (closing and opening of the KATP channels as interpreted by the authors, respectively, doi: 10.1152/ajpendo.00046.2013) and should be due to metabolic factors, could be also associated to desensitization of KATP channels to ATP due to the increase in cytosolic Ca2+ changes after intracellular Ca2+ flux (DOI: 10.1210/endo.143.2.8625) as they have been found to operate also at time scales, significantly faster (DOI: 10.2337/db22-0952) than reported before (refs. 21,22). Metabolic changes influence intracellular Ca2+ flux as well.

      The reviewer is absolutely correct that there are amplifying factors and other sources of calcium beyond plasma membrane influx, and that there are other mechanisms that regulate insulin secretion beyond calcium levels. These alternative mechanisms are introduced in Refs. 1-2; however, they are not the focus of this study.

      (3) There is no explanation for why KL divergence is so different between the pre-test regional consistency of the islets used to test the vehicle compared to those where GKa and PKa have been tested.

      We thank the reviewer for their careful observation. This arises because there are larger differences between preparations than within a preparation. This has been described previously (PMID: 16306370 and 20037650) and could be expected to account for the differences in KL divergence between animals. 

      (4) Statistical analysis would profit from testing the normality of the data distribution before choosing the statistical test and then learning the difference between parametric and nonparametric tests. For example, in Figures 3CD and 5EF, the data density is lower at the calculated mean than below and above this value and there are other examples in other figures too.

      We thank the reviewer for this very important comment, and we apologize for the oversight on our part. To address this comment, we conducted two normality tests, Anderson-Darling and Kolmogorov-Smirnov, on all statistical analyses in the manuscript. If the data were not normally distributed, we changed the analysis to the Wilcoxon matched-pairs signed rank test (non-parametric version of the paired t-test) or the Friedman test (non-parametric version of ANOVA). Three results were changed based on this statistical correction: Figure 4D, Figure 5F 3D (from P = 0.01 to P = 0.0526), and Figure 5F ¼ z-depth (from P = 0.005 to P = 0.012). We have updated the manuscript methods, results, and figures accordingly. Importantly, these results did not change the main points of the paper.
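
      To make concrete what the matched-pairs signed rank test named in this response computes, the following minimal sketch calculates its rank sums for paired measurements. It is an illustration, not the authors' analysis code: the function name and data are hypothetical, zero differences are dropped, tied absolute differences share their average rank, and the p-value calculation (normally delegated to a statistics package) is omitted.

```python
def signed_rank_statistic(before, after):
    """Return (W_plus, W_minus) for paired samples, dropping zero differences."""
    if len(before) != len(after):
        raise ValueError("paired samples must have the same length")
    diffs = [a - b for a, b in zip(after, before) if a != b]
    # Order the differences by absolute value so that ranks can be assigned.
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next absolute difference ties with this one.
        while (end + 1 < len(order)
               and abs(diffs[order[end + 1]]) == abs(diffs[order[start]])):
            end += 1
        average_rank = (start + end) / 2 + 1  # ranks are 1-based
        for position in range(start, end + 1):
            ranks[order[position]] = average_rank
        start = end + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return w_plus, w_minus


# Hypothetical paired measurements from three islets.
w_plus, w_minus = signed_rank_statistic([3, 2, 1], [5, 1, 4])
```

      Because the statistic depends only on the ranks of the paired differences, it does not assume normally distributed data, which is why it is a common substitute for the paired t-test.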

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      In this manuscript the authors re-examine the developmental origin of cortical oligodendrocyte (OL) lineage cells using a combination of strategies, focussing on the question of whether the LGE generates cortical OL cells. The paper is interesting to myelin biologists, the methods used are appropriate and, in general, the study is well-executed, thorough, and persuasive, but not 100% convincing.

      Thank you very much for approving our paper.

      Strengths, weaknesses, and recommendations:

      The first evidence presented that the LGE does not generate OLs for the cortex is that there are no OL precursors 'streaming' from the LGE during embryogenesis, unlike the MGE (Figure 1A). This in itself is not strong evidence, as they might be more dispersed. In fact, in the images shown, there is no obvious 'streaming' from the MGE either. Note that in Figure 1 there is no reference to the star that is shown in the figure.

      We totally agree with you. Although the absence of an OPC migration stream is not, on its own, strong evidence that the LGE does not generate OPCs for the cortex, taken together with our additional evidence, the absence of obvious 'streaming' from the LGE to the cortex provides supplementary support for this conclusion. Finally, we have removed the star from the figure.

      The authors then electroporate a reporter into the LGE at E13.5 and examine the fate of the electroporated cells (Figures 1C-E). They find that electroporated cells became neurons in the striatum and in the cortex but no OLs for the cortex. There are two issues with this: first, there is no quantification, which means there might indeed be a small contribution from the LGE that is not immediately obvious from snapshot images. Second, it is unexpected to find labelled neurons in the cortex at all since the LGE does not normally generate neurons for the cortex. Electroporations are quite crude experiments as targeting is imprecise and variable and not always discernible at later stages. For example, in Figure 1D, one can see tdTOM+ cells near the AEP, as well as the striatum. Hence, IUE cannot on its own be taken as proof that there is no contribution of the LGE to the cortical OL population.

      Thank you for your constructive suggestions.

      (1) Following the reviewer's suggestion, we have added these statistics; please see Figure 1F.

      (2) The reviewer raised a good point. We occasionally found a very small number of electroporated cells in the MGE/AEP VZ in our IUE system. Consequently, some electroporated cells can be found in the cortex, and most of them expressed the neuronal marker NeuN. We suspect these are MGE-derived cortical interneurons. It is worth noting that these electroporated (MGE-derived) cells are not glial cells. The probable reason is that the MGE/AEP generates cortical OPCs mainly before E13.5 (in this study we performed IUE at E13.5).

      The authors then use an alternative fate-mapping approach, again with E13.5 electroporations (Figure 2). They find only a few GFP+ cells in the cortex at E18 (Figures 2C-D) and P10 (Figure 2E) and these are mainly neurons, not OL lineage cells. Again, there is no quantification.

      Thank you very much for your suggestions. In this fate-mapping approach, very few electroporated cells were found in the cortex. We analyzed four mice and found that none of the GFP-positive cells (139 GFP+ cells) expressed OLIG2, SOX10, or PDGFRA.

      Figure 3 is more convincing, but the experiments are incomplete. Here the authors generate triple-transgenic mice expressing Cre in the cortex (Emx1-Cre) and the MGE (Nkx2.1-Cre) as well as a strong nuclear reporter (H2B-GFP). They find that at P0 and P10, 97-98% of OL-lineage cells (SOX10+ or PDGFRA+) in the cortex are labelled with GFP (Figure 3). This is a more convincing argument that the LGE/CGE might not contribute significant numbers of OL lineage cells to the cortex, in contrast to the Kessaris et al. (2006) paper, which showed that Gsh2-Cre mice label ~50% of SOX10+ve cells in the motor cortex at P10. The authors of the present paper suggest that the discrepancy between their study and that of Kessaris et al. (2006) is based on the authors' previous observation (Zhang et al 2020) (https://doi.org/10.1016/j.celrep.2020.03.027) that GSH2 is expressed in intermediate precursors of the cortex from E18 onwards. If correct, then Kessaris et al. might have mistakenly attributed Gsh2-Cre+ lineages to the LGE/CGE when they were in fact intrinsic to the cortex. However, the evidence from Zhang et al 2020 that GSH2 is expressed by cortical intermediate precursors seems to rest solely on their location within the developing cortex; a more convincing demonstration would be to show that the GSH2+ putative cortical precursors co-label for EMX1 (by immunohistochemistry or in situ hybridization), or that they co-label with a reporter in Emx1-driven reporter mice. This demonstration should be simple for the authors as they have all the necessary reagents to hand. Without these additional data, the assertion that GSX2+ve cells in the cortex are derived from the cortical VZ relies partly on an act of faith on the part of the reader. Note that Tripathi et al. (2011, "Dorsally- and ventrally-derived oligodendrocytes have similar electrical properties but myelinate preferred tracts." J. Neurosci. 31, 6809-6819) found that the Gsh-Cre+ OL lineage contributed only ~20% of OLs to the mature cortex, not ~50% as reported by Kessaris et al. (2006). If it is correct that these Gsh2-derived OLs are from the cortical anlagen as the current paper claims, then it would raise the possibility that the ventricular precursors of GSH2+ intermediate progenitors are not uniformly distributed through the cortical VZ but are perhaps localized to some part of it. Then the contribution of Gsh2-derived OLs to the cortical population could depend on precisely where one looks relative to that localized source. It would be a nice addition to the current manuscript if the authors could explore the distribution of their GSH2+ intermediate precursors throughout the developing cortex. In any case, Tripathi et al. (2011) should be cited.

      Thank you for your constructive suggestions.

      (1) We used the Emx1Cre; RosaH2B-GFP mouse and found that nearly all GSX2+ cells in the cortical SVZ are derived from the Emx1+ lineage at P0 (Please see our new Figure 3-supplement 1A-C). 

      (2) According to your suggestion, we have cited this paper (Tripathi et al.) in our revised manuscript.

      (3) The study conducted by Kessaris et al. (2006) revealed that roughly 50% of cortical oligodendrocytes (OLs) originate from the Gsx2 lineage (LGE/CGE-derived). In contrast, Tripathi et al. (2011) observed that Gsx2-derived OLs contribute only around 20% to the corpus callosum (CC). To investigate the reasons behind these disparate findings, we conducted three experiments. Firstly, using Emx1Cre; RosaH2B-GFP mice, we found that approximately 89% of lateral CC (LCC) OLs originate from the Emx1 lineage, with only around 11% derived from the ventral source (refer to Author response image 1A and B below). Secondly, employing Nkx2-1Cre; RosaH2B-GFP mice, we determined that approximately 11% of LCC OLs originate from the Nkx2.1 lineage (refer to Author response image 1C and D below). Finally, we found that approximately 98.3% of LCC OLs originate from both Emx1 and Nkx2.1 lineages, with only around 1.7% possibly derived from the LGE (see Author response image 1E and F below). Taken together, our results indicate that approximately 89% of LCC OLs originate from the Emx1 lineage, while 11% of LCC OLs are derived from the medial ganglionic eminence (MGE).

      It is worth noting that OLs from Emx1 and Nkx2.1 lineages were equally distributed in the medial CC (mCC) (see Author response image 1G below). This finding suggests that MGE-derived OLs exhibit spatial heterogeneity in their distribution within the CC. These results provide evidence that the contribution of the lateral ganglionic eminence (LGE) and caudal ganglionic eminence (CGE) to CC OLs is minimal.

      Author response image 1.

      Finally, the authors deleted Olig2 in the MGE and found a dramatic reduction of PDGFRA+ and SOX10+ cells in the cortex at E14 and E16 (Figure 4A-F). This further supports their conclusion that, at least at E16, there is no significant contribution of OLs from ventral sources other than the MGE/AEP. This does not exclude the possibility that the LGE/CGE generates OLs for the cortex at later stages. Hence, on its own, this is not completely convincing evidence that the LGE generates no OL lineage cells for the cortex.

      There are three reasons why we did not analyze Olig2-NCKO mice after E16.5. (1) The expression of Nkx2.1Cre is lower within the dorsal-most region of the MGE than in other Nkx2.1-expressing regions. Even at E15.5, we can still find a small number of OPCs in the lateral cortex, and we speculate that these OPCs are derived from the dorsal MGE. (2) Considering the possibility of incomplete recombination at the Olig2 gene locus, we suspect that the OPCs (Olig2+) in the lateral cortex are derived from the MGE. Indeed, we found a few OPCs in the MGE/AEP of the Olig2-NCKO mice (Figure 4F). (3) A recent study (bioRxiv preprint doi: https://doi.org/10.1101/2024.01.23.576886) showed that the contribution of the LGE/CGE to cortical OPCs is minimal, which further supports our findings. Taken together, our results provide additional evidence supporting the limited contribution of the LGE/CGE to cortical OPCs (OLs).

      Reviewer #2 (Public Review):

      Traditional thinking has been that cortical oligodendrocyte progenitor cells (OPCs) arise in the development of the brain from the medial ganglionic eminence (MGE), lateral/caudal ganglionic eminence (LGE/CGE), and cortical radial glial cells (RGCs). Indeed a landmark study demonstrated some time ago that cortical OPCs are generated in three waves, starting with a ventral wave derived from the medial ganglionic eminence (MGE) or the anterior entopeduncular area (AEP) at embryonic day E12.5 (Nkx2.1+ lineage), followed by a second wave of cortical OLs derived from the lateral/caudal ganglionic eminences (LGE/CGE) at E15.5 (Gsx2+/Nkx2.1- lineage), and then a final wave occurring at P0, when OPCs originate from cortical glial progenitor cells (Emx1+ lineage). However, the authors challenge the idea in this paper that cortical progenitors are produced from the LGE. They have found previously that cortical glial progenitor cells were also found to express Gsx2, suggesting this may not have been the best marker for LGE-derived OPCs. They have used fate mapping experiments and lineage analyses to suggest that cortical OPCs do not derive from the LGE.

      Strengths:

      (1) The data is high quality and very well presented, and experiments are thoughtful and elegant to address the questions being raised.

      (2) The authors use two elegant approaches to lineage trace LGE derived cells, namely fate mapping of LGE-derived OPCs by combining IUE (intrauterine electroporation) with a Cre recombinase-dependent IS reporter, and Lineage tracing of LGE-derived OPCs by combining IUE with the PiggyBac transposon system. Both approaches show convincingly that labelled LGE-derived cells that enter the cortex do not express OPC markers, but that those co-labelling with oligodendrocyte markers remain in the striatum.

      (3) The authors then use further approaches to confirm their findings. Firstly they lineage trace Emx1-Cre; Nkx2.1-Cre; H2B-GFP mice. Emx1-Cre is expressed in cortical RGCs and Nkx2.1-Cre is specifically expressed in MGE/AEP RGCs. They find that close to 98% of OPCs in the cortex co-label with GFP at later times, suggesting the contribution of OPCs from LGE is minimal.

      (4) They use one further approach to strengthen the findings yet further. They cross Nkx2.1-Cre mice with Olig2 F/+ mice to eliminate Olig2 expression in the SVZ/VZ of the MGE/AEP (Figures 4A-B). The generation of MGE/AEP-derived OPCs is inhibited in these Olig2-NCKO conditional mice. They find that the number of cortical progenitors at E16.5 is reduced 10-fold in these mice, suggesting that LGE contribution to cortical OPCs is minimal.

      We thank the reviewer for summarizing the strengths of our manuscript.

      Weaknesses:

      (1) The authors use IUE in experiments mentioned in point 2 of 'Strengths' above (Figures 1 and 2) and claim that the reporter was delivered specifically into LGE VZ at E13.5 using this IUE. It would be nice to see some sort of time course of delivery after IUE to show the reporter is limited to LGE VZ at early times post-IUE.

      Thank you very much for your suggestions. Indeed, when using IUE in our system, we occasionally found a small number of electroporated cells in the MGE/AEP VZ. Thus, we can find very few electroporated (MGE/AEP-derived) cells in the cortex, and these electroporated cells are neurons (perhaps interneurons).

      (2) In the experiments mentioned in point 3 of 'Strengths' (Figure 3), statistical analysis showed that only approximately 2% of OPCs were GFP-negative cells. This 2% could possibly be derived from the LGE/CGE so does not totally rule out that LGE contributes some cortical OPCs.

      Thank you for your constructive suggestions. We apologize for any imprecise descriptions. We suspect that this 2% may originate from the MGE (considering the possibility of incomplete recombination at the Olig2 gene locus, the Olig2+ OPCs may be derived from the MGE; indeed, we found a few OPCs in the MGE/AEP of the Olig2-NCKO mice, Figure 4F) or from the dMGE (the expression of Nkx2.1Cre is lower within the dorsal-most region of the MGE than in other Nkx2.1-expressing regions). Nevertheless, we have softened the assertion throughout our revised manuscript.

      (3) In the experiments mentioned in point 4 of 'Strengths' (Figure 4), they do still find cortical OPCs at E16.5 in the Olig2-NCKO conditional mice. It is unclear whether this is due to the recombination efficiency of the CRE enzyme not being 100%, or whether there is some LGE contribution to the cortical OPCs.

      This experiment alone may not provide strong evidence that the LGE does not contribute to the cortical OPCs during development. However, when combining our other results with this result, we can confirm that the contribution of the LGE to cortical OPCs is minimal. Furthermore, a recent study reported that LGE/CGE-derived OLs make minimal contributions to the neocortex and corpus callosum, which further supports the reliability of our conclusion.

      We would like to thank the reviewers and editors for their valuable comments and suggestions again.

      Impact of Study:

      The authors show elegantly and convincingly that the contribution of the LGE to the pool of cortical OPCs is minimal. The title should perhaps be that the LGE contribution is minimal rather than no contribution at all, as they are not able to rule out some small contribution from the LGE. These findings challenge the traditional belief that the LGE contributes to the pool of cortical OPCs. The authors do show that the LGE does produce OPCs, but that they tend to remain in the striatum rather than migrate into the cortex. It is interesting to wonder why their migration patterns may be different from the MGE-derived OPCs which migrate to the cortex. The functional significance of these different sources of OPCs for adult cortex in homeostatic or disease states remains unclear though.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for The Authors):

      (1) Change the title to e.g. 'limited contribution of the LGE to cortical oligodendrocytes'. Alternatively, It might be more useful to highlight where they come from, e.g. "Cortical oligodendrocytes originate predominantly or exclusively from the MGE and cortical VZ"

      As suggested, we have changed the old title to the following: The lateral/caudal ganglionic eminence makes a limited contribution to cortical oligodendrocytes

      (2) Demonstrate using lineage tracing that GSH2+ cells in the cortex are derived from the Emx1-lineage, e.g. using immunohistochemistry for GSX2 and a reporter in Emx1-Cre mice crossed to a reporter.

      In our revised manuscript, we have added a new figure (Figure 3-supplement 1A-C) to demonstrate that the GSX2+ cells in the cortex are derived from the Emx1-lineage.

      (3) Make it clear in their discussion that they have not explored the CGE so it is possible that this region generates some OLs.

      The Emx1Cre; Nkx2.1Cre; H2B-GFP mice showed that only ~2% of cortical OLs are derived from the LGE/CGE. Considering the efficiency of Cre recombination and the relatively low Cre activity in the dMGE of Nkx2.1Cre mice, the actual contribution of the LGE/CGE to cortical OLs could be even lower than our current observation. Therefore, our results demonstrate that the LGE/CGE generate very few, possibly even no, OLs for the cortex.

      (4) Soften the assertion that the LGE does not generate any OL lineage cells that reach the cortex by e.g. changing the word 'sole' to 'predominant' (line 88) and, elsewhere in the paper, leaving open the possibility that small numbers of LGE-derived OLs might enter the cortex, depending on where exactly one looks.

      As suggested, we have softened the assertion everywhere in our manuscript.

      (5) Lines 255-260: 'First, the time window during which the MGE generates OLs is very brief, perhaps occurring before MGE neurogenesis. The high level of SHH in the MGE allows for the production of a small population of cortical OPCs around E12.5. Subsequently, multipotent intermediate progenitors begin to express DLX transcription factors resulting in ending the generation of OPCs in the MGE'. What is the evidence that OL genesis precedes neurogenesis? If there is none (as I suspect) then this statement should be removed.

      The editors raised a good point. We have no strong evidence that OL genesis precedes neurogenesis in the MGE; thus, we removed these sentences from our manuscript.

      (6) Figure 1E should show quantification of cells as a % of electroporated cells and as a % of PDGFRA+ or OLIG2+ or SOX10+ cells, so that the reader might have a clear view of the extent of labelling.

      Done.

      (7) Figure 4: This is interesting but incomplete. At E14.5 the authors show the presence of PDGFRA+ cells in the telencephalon. However, at E16.5 they show images only of the dorsal-most region of the cortex. If the LGE/CGE begins to generate OLPs for the early cortex, they would be expected to appear near the cortico-striatal boundary, as shown in Kessaris 2006 Fig1g-h. In the current manuscript, the authors do not show these regions, or the LGE and CGE, in their images. It is essential to show PDGFRA immunolabelling at the cortico-striatal boundary and also in the LGE and CGE at E16.5 in control and Olig2 mutant mice. It is also necessary to extend this analysis to E18.5, perhaps showing PDGFRA+ cells streaming from the cortical VZ/SVZ.

      There are three reasons why we did not analyze Olig2-NCKO mice after E16.5. (1) The expression of Nkx2.1Cre is lower within the dorsal-most region of the MGE than in other Nkx2.1-expressing regions. Even at E15.5, we can still find a small number of OPCs in the lateral cortex, and we suspect that these OPCs are derived from the dMGE. (2) Considering the possibility of incomplete recombination at the Olig2 gene locus, we suspect that the OPCs (Olig2+) are derived from the MGE. In fact, we found a few OPCs in the MGE/AEP of the Olig2-NCKO mice (Figure 4F). (3) A recent study (bioRxiv preprint doi: https://doi.org/10.1101/2024.01.23.576886) showed that the contribution of the LGE/CGE to cortical OPCs is minimal. Taken together, our results provide additional evidence supporting the limited contribution of the LGE/CGE to cortical OPCs (OLs).

      (8) Cite Tripathi et al. (2011) and mention the disparity between the findings of that paper and Kessaris et al. (2006) and possible reasons - see main review above.

      Done.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The manuscript "Self-inhibiting percolation and viral spreading in epithelial tissue" describes a model of the development of an infection based on a 5-state cellular automaton. The model is motivated and qualitatively justified by time-resolved measurements of expression levels of viral, interferon-producing, and antiviral genes. The model is set up in such a way that the crucial difference in outcomes (infection spreading vs. confinement) depends on the initial fraction of special virus-sensing cells. Those cells (denoted as 'type a') cannot be infected and do not support the propagation of infection, but rather inhibit it in a somewhat autocatalytic way. Presumably, such feedback makes the transition between the two outcomes very sharp: a minor variation in the concentration of 'a' cells results in a qualitative change from one outcome to another. As in any percolation-like system, the transition between propagation and inhibition of infection goes through a critical state with all its attributes: a power-law distribution of the cluster size (corresponding to the fraction of infected cells) with a fairly universal exponent and a cutoff at the upper limit of this distribution.
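
      The dependence of the outcome on the initial fraction of 'a' cells can be illustrated with a deliberately simplified sketch. It is not the manuscript's five-state automaton: it assumes plain site percolation on a square lattice, in which each site is independently blocked with probability p_a and infection spreads from the central site to unblocked nearest neighbours, with no autocatalytic feedback. The function name, lattice geometry, and parameters are hypothetical.

```python
import random
from collections import deque


def infected_cluster_size(size, p_a, seed=0):
    """Size of the cluster reached from the lattice center when each site
    is blocked independently with probability p_a."""
    rng = random.Random(seed)
    blocked = [[rng.random() < p_a for _ in range(size)] for _ in range(size)]
    start = (size // 2, size // 2)
    if blocked[start[0]][start[1]]:
        return 0  # the seed site itself cannot be infected
    visited = {start}
    queue = deque([start])
    # Breadth-first spread to unblocked nearest neighbours.
    while queue:
        row, col = queue.popleft()
        for d_row, d_col in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = (row + d_row, col + d_col)
            if (0 <= nxt[0] < size and 0 <= nxt[1] < size
                    and nxt not in visited
                    and not blocked[nxt[0]][nxt[1]]):
                visited.add(nxt)
                queue.append(nxt)
    return len(visited)


# Hypothetical scan over the blocked fraction on a 51 x 51 lattice.
sizes = {p_a: infected_cluster_size(51, p_a, seed=1) for p_a in (0.0, 0.3, 0.5, 1.0)}
```

      Repeating such runs over many seeds near the threshold would give a cluster size distribution of the kind the summary refers to, although the exponent and threshold of this toy model need not match those of the authors' model.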

      Strengths:

      The proposed model suggests an explanation for the apparent diversity of outcomes of viral infections such as COVID.

      Author response: We thank the referee for the concise and accurate summary of our work.

      Weaknesses:

      Those are not real points of weakness, though I think addressing them would substantially improve the manuscript.

      Author response: Below we will address these point by point.

      The key point in the manuscript is the reduction of actual biochemical processes to the NOVAa rules. I think more could be said about it, be it referring to a set of well-known connections between expression states of cells and their reaction to infection or justifying it as an educated guess.

      Author response: We have now improved this part in the model section. We have added a few sentences explaining how the cell state transitions are motivated by the UMAP results:

      “The cell state transitions triggered by IFN signaling or viral replication are known in viral infection, but how exactly the transitions are orchestrated for specific infections is poorly understood. The UMAP cell state distribution hints at possible preferred transitions between states. The closer two cell states are on the UMAP, the more likely transitions between them are, all else being equal. For instance, the antiviral state (𝐴) is easily established from a susceptible cell (𝑂), but not from the fully virus-hijacked cell (𝑉 ). The IFN-secreting cell state (𝑁) requires the co-presence of the viral and antiviral genes and thus the cell cluster is located between the antiviral state (𝐴) and virus-infected state (𝑉 ) but distant from the susceptible cells (𝑂).

      Inspired by the UMAP data visualization (Fig. 1a), we propose the following transitions between five main discrete cell states”

      Another aspect where the manuscript could be improved would be to look a little beyond the strange and 'not-so-relevant for a biomedical audience' focus on the percolation critical state. While the presented calculation of the precise percolation threshold and the critical exponent confirm the numerical skills of the authors, the probability that an actual infected tissue is right at the threshold is negligible. So in addition to the critical properties, it would be interesting to learn about the system not exactly at the threshold: For example, how the speed of propagation of infection depends on subcritical p_a and what is the cluster size distribution for supercritical p_a.

      Author response: We agree that further exploring the model away from the critical threshold is worthwhile. While our main focus has been on explaining the large degree of heterogeneity in outcomes – readily explained as a consequence of the sharp threshold-like behavior – we now include plots of the time evolution of the infection (as well as of the remaining states) for subcritical values of p_a. The plots can be found in Figure S4 of the supplement.

      Reviewer #2 (Public Review):

      Xu et al. introduce a cellular automaton model to investigate the spatiotemporal spreading of viral infection. In this study, the author first analyzes the single-cell RNA sequencing data from experiments and identifies four clusters of cells at 48 hours post-viral infection, including susceptible cells (O), infected cells (V), IFN-secreting cells (N), and antiviral cells (A). Next, a cellular automaton model (NOVAa model) is introduced by assuming the existence of a transient pre-antiviral state (a). The model consists of an LxL lattice; each site represents one cell. The cells change their state following the rules depending on the interaction of neighboring cells. The model introduces a key parameter, p_a, representing the fraction of pre-antiviral state cells. Cell apoptosis is omitted in the model. Model simulations show a threshold-like behavior of the final attack rate of the virus when p_a changes continuously. There is a critical value p_c, so that when p_a < p_c, infections typically spread to the entire system, while at a higher p_a > p_c, the propagation of the infected state is inhibited. Moreover, the radius R that quantifies the diffusion range of N cells may affect the critical value p_c; a larger R yields a smaller value of the critical value p_c. The structure of clusters is different for different values of R; greater R leads to a different microscopic structure with fewer A and N cells in the final state. Compared with the single-cell RNA seq data, which implies a low fraction of IFN-positive cells - around 1.7% - the model simulation suggests R=5. The authors also explored a simplified version of the model, the OVA model, with only three states. The OVA model also has an outbreak size. The OVA model shows dynamics similar to the NOVAa model. However, the change in microstructure as a function of the IFN range R observed in the NOVAa model is not observed in the OVA model.

      Author response: We thank the referee for the comprehensive summary of our work.

      Data and model simulation mainly support the conclusions of this paper, but some weaknesses should be considered or clarified.

      Author response: Thank you - we will address these point by point below.

      (1) In the automaton model, the authors introduce a parameter p_a, representing the fraction of pre-antiviral state cells. The authors wrote: “The parameter p_a can also be understood as the probability that an O cell will switch to the N or A state when exposed to the virus or IFNs, respectively.” Nevertheless, biologically, the fraction of pre-antiviral state cells does not mean the same value as the probability that an O cell switches to the N or A state. Moreover, in the numerical scheme, the cell state changes according to the deterministic rules N(O)=a and N(a)=A. Hence, the probability p_a did not apply to the model simulation. The exact meaning of the parameter p_a may need to be clarified.

      Author response: We acknowledge that this was an imprecise formulation, and have now changed it.

      What we tried to convey with that comment was that, as an alternative to having a certain fraction of cells be in the a state initially, one could instead have devised a model in which each O-state cell simply had a probability to act as an a-state cell upon exposure to the virus or to interferons, i.e. to switch to an N state (if exposed to virus) or to the A state (if exposed to interferons). In this simplified model, there would be no functional difference, since it would simply amount to whether each cell had a probability to be designated an a-cell initially (as in our model), or upon exposure. So our remark mainly served to explain that the role of the p_a parameter is simply to encode that a certain fraction of virus-naive cells behave this way (whether predetermined or not).

      (2) The current model is deterministic. However, biologically, considering the probabilistic model may be more realistic. Are the results valid when the probability update strategy is considered? By the probability model, the cells change their state randomly to the state of the neighbor cells. The probability of cell state changes may be relevant for the threshold of p_a. It is interesting to know how the random response of cells may affect the main results and the critical value of p_a.

      Author response: This is a good point - we are firm believers in the importance of stochasticity. We should note that even the current model has a level of stochasticity, since we choose the cells to be updated with a constant probability rate - we choose N cells to update in each timestep, with replacement.

      However, based on your suggestion, we simulated a version of the dynamics which included stochastic conversion, i.e. each action of a cell on a nearby cell happens only with a probability p_conv (and the original model is recovered as the p_conv=1 scenario). Of course, this slows down the dynamics (or effectively rescales time by a factor p_conv), but crucially we find that it does not appreciably affect the location of the threshold p_c. Below we include a parameter scan across p_a values for R=1 and p_conv=0.5, which shows that the threshold continues to appear at around p_a=27%.

      We now discuss these findings in the supplement and include the figure below as Fig. S5.

      Author response image 1.
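
      The stochastic conversion described above can be sketched in a few lines of Python. This is only an illustration, not the authors' C++ implementation: the transition table is an assumption reconstructed from the responses in this letter (virus turns O into V and a into N; IFN turns O into a and a into A), and the names RULES and act are hypothetical.

```python
import random

# Assumed transition table, reconstructed from the text of this response
# letter (not taken from the authors' code):
# (acting cell state, target cell state) -> new target state.
RULES = {
    ("V", "O"): "V",  # virus infects a susceptible cell
    ("V", "a"): "N",  # virus turns a pre-antiviral cell into an IFN secretor
    ("N", "O"): "a",  # IFN primes a susceptible cell
    ("N", "a"): "A",  # IFN turns a pre-antiviral cell antiviral
}


def act(source, target, p_conv, rng):
    """Return the new state of `target` after `source` acts on it.

    The conversion only happens with probability p_conv; p_conv = 1
    recovers the deterministic rules.
    """
    new_state = RULES.get((source, target))
    if new_state is None:
        return target  # this pair of states does not interact
    if rng.random() < p_conv:
        return new_state
    return target
```

      With p_conv=1 every permitted action succeeds, and with p_conv=0.5 roughly half of the attempted actions are discarded, which matches the statement above that lowering p_conv mainly rescales time.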

      (3) Figure 2 shows a critical value p_c = 27.8% following a simulation on a lattice with dimension L = 1000. However, it is unclear if dimension changes may affect the critical value.

      Author response: Re-running the simulations on a lattice four times as large (i.e. L=2000) yields a similar critical value of 27-28% for R=1, so we are confident that finite size effects do not play a major role at L=1000 and beyond. For R=5, however, we find that a minimum lattice size greater than L=1000 is necessary to determine the critical threshold. Concretely, we find that the threshold value p_c for R=5 changes somewhat when the lattice size is increased from 1000 to 2000, but is invariant under a change from 2000 to 3000, so we conclude that L=2000 is sufficient for R=5. The p_c value for R=5 cited in the manuscript (~0.4%) was determined from simulations at L=2000.

      Reviewer #3 (Public Review):

      Summary:

      This study considers how to model distinct host cell states that correspond to different stages of a viral infection: from naïve and susceptible cells to infected cells and a minority of important interferon-secreting cells that are the first line of defense against viral spread. The study first considers the distinct host cell states by analyzing previously published single-cell RNAseq data. Then an agent-based model on a square lattice is used to probe the dependence of the system on various parameters. Finally, a simplified version of the model is explored, and shown to have some similarity with the more complex model, yet lacks the dependence on the interferon range. By exploring these models one gains an intuitive understanding of the system, and the model may be used to generate hypotheses that could be tested experimentally, telling us "when to be surprised" if the biological system deviates from the model predictions.

      Author response: Thank you for the summary! We agree with the role that you describe for a model such as this one.

      Strengths:

      -  Clear presentation of the experimental findings and a clear logical progression from these experimental findings to the modeling.

      -  The modeling results are easy to understand, revealing interesting behavior and percolation-like features.

      -  The scaling results presented span several decades and are therefore compelling.

      -  The results presented suggest several interesting directions for theoretical follow-up work, as well as possible experiments to probe the system (e.g. by stimulating or blocking IFN secretion).

      Weaknesses:

      -  Since the "range" of IFN is an important parameter, it makes sense to consider lattice geometries other than the square lattice, which is somewhat pathological. Perhaps a hexagonal lattice would generalize better.

      -  Tissues are typically three-dimensional, not two-dimensional. (Epithelium is an exception). It would be interesting to see how the modeling translates to the three-dimensional case. Percolation transitions are known to be very sensitive to the dimensionality of the system.

      Author response: We agree that probing different lattice geometries (2- and 3-dimensional alike) would be interesting and worthwhile. However, for this manuscript, we prefer to confine the analysis to the current, simple case. We do agree, however, that an extensive exploration of the role of geometry is an interesting future possibility.

      -  The fixed time-step of the agent-based modeling may introduce biases. I would consider simulating the system with Gillespie dynamics where the reaction rates depend on the ambient system parameters.

      -  Single-cell RNAseq data typically involves data imputation due to the high sparsity of the measured gene expression. More information could be provided on this crucial data processing step since it may significantly alter the experimental findings.

      Justification of claims and conclusions:

      The claims and conclusions are well justified.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      It is necessary to explain what UMAP does. Is clustering done in the space of twenty-something original dimensions or 2D? How UMAP1 and UMAP2 are selected and are those the same in all plots?

      Author response: We have now added a few sentences to clarify the point raised above - the second snippet explains how clustering is performed:

      “As a dimension reduction algorithm, UMAP is a manifold learning technique that favors the preservation of local distances over global distances (McInnes et al., 2018; Becht et al., 2019). It constructs a weighted graph from the data points and optimizes the graph layout in the low-dimensional space.”

      “We cluster the cells with the principal components analysis (PCA) results from their gene expression. With the first 16 principal components, we calculate k-nearest neighbors and construct the shared nearest neighbor graph of the cells then optimize the modularity function to determine clusters. We present the cluster information on the UMAP plane and use the same UMAP coordinates for all the plots in this paper hereafter.”
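
      The neighbor-graph step in the clustering description above can be illustrated with a small, dependency-free Python sketch. It assumes the cells are already given as coordinates in principal-component space, and it assumes that shared-nearest-neighbor edges are weighted by the Jaccard overlap of the two neighborhoods; neither the weighting nor the function names knn and snn_weights come from the manuscript. The modularity optimization that follows this step is not shown.

```python
import math


def knn(points, k):
    """For each point, return the set of indices of its k nearest neighbours."""
    result = []
    for i, p in enumerate(points):
        others = [j for j in range(len(points)) if j != i]
        others.sort(key=lambda j: math.dist(p, points[j]))
        result.append(set(others[:k]))
    return result


def snn_weights(points, k):
    """Shared-nearest-neighbour graph as a dict {(i, j): weight}.

    Assumption: the weight is the Jaccard overlap of the two k-nearest
    neighbourhoods, each taken to include the point itself.
    """
    nbrs = [s | {i} for i, s in enumerate(knn(points, k))]
    weights = {}
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            shared = len(nbrs[i] & nbrs[j])
            if shared:
                weights[(i, j)] = shared / len(nbrs[i] | nbrs[j])
    return weights
```

      Cells whose neighborhoods overlap strongly receive weights close to 1, and cells with no shared neighbors are not connected, so a community-detection step applied to this graph tends to recover groups of mutually similar cells.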

      Figure 1, what do bars in the upper right corners of panels d, e, f, and g indicate? “Averaged” refers to time average? Something is missing in “Cell proportions are labeled with corresponding colors in a)”.

      Author response: Thank you - we have now modified the figure caption. The bars in the upper right corners of panels d, e, f are color keys for gene expression: the brighter the color, the higher the gene expression.

      “Averaged” gene expression refers to the mean expression of that particular gene across the cells within each indicated cluster.

      The lines in c) correspond to cell proportions in different states at different time points. The same state in a) and c) is shown in the same color.

      Line 46, “However” does not sound right in this context. Would “Also” be better?

      Author response: We agree and have corrected it in the revised manuscript.

      Line 96: “The viral genes are also partially expressed in these cells, but different from the 𝑁 cluster, the antiviral genes are fully expressed (Fig. S1 and S2).” The sentence needs to be rephrased.

      Author response: We have rephrased the sentence: “As in the N cluster, the viral gene E is barely detected in these cells, indicating incomplete viral replication. However, in contrast to the N cluster, the antiviral genes are expressed to their full extent (Fig. S1 and S2).”

      Line 126, missing "be", “large” -> “larger”.

      Author response: Thank you, we have now corrected these typos.

      Line 139-140 The logical link between ignoring apoptosis and the diffusion of IFN is unclear.

      Author response: We modified the sentence as “Here, we assume that the secretion of IFNs by the 𝑁 cells is a faster process than possible apoptosis (Wen et al., 1997; Tesfaigzi, 2006) of these cells and that the diffusion of IFNs to the neighborhood is not significantly affected by apoptosis.”

      Fig. 2a Do the yellow arrows show the effect of IFN and the purple arrows the propagation of viral infection?

      Author response: That is correct. We have added this information to the figure caption: “The straight black arrows indicate transitions between cell states. The curved yellow arrows indicate the effects of IFNs on activating antiviral states. The curved purple arrows indicate viral spread to cells with 𝑂 and 𝑎 states.”

      Fig. 3, n(s) as the axis label vs P(s) in the text? How do the curves in panel a) look when the p_a is well above or below p_c?

      Author response: Thank you. We have edited the labels in the figure to reflect the symbols used in the text.

      Boundary conditions? From Fig. 4, apparently periodic?

      Author response: Yes, we use periodic boundary conditions in the model. We clarify it in the model section now (last sentence).
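
      Periodic boundary conditions on an L×L lattice amount to wrapping neighbor coordinates with the modulo operation. The sketch below is a hedged illustration only: it assumes that the range R is measured as a Euclidean distance in lattice units and that L > 2R, neither of which is stated in this letter, and the function name neighbours is hypothetical.

```python
def neighbours(x, y, L, R=1):
    """Sites within distance R of (x, y) on an L x L lattice with wrap-around.

    Assumptions: R is a Euclidean radius in lattice units and L > 2 * R,
    so that no site is reached twice through the periodic boundary.
    """
    sites = []
    for dx in range(-R, R + 1):
        for dy in range(-R, R + 1):
            if (dx, dy) != (0, 0) and dx * dx + dy * dy <= R * R:
                # The modulo wraps coordinates that leave the lattice.
                sites.append(((x + dx) % L, (y + dy) % L))
    return sites
```

      For a corner site such as (0, 0) the neighbors include sites on the opposite edges of the lattice, so no cell has fewer neighbors than any other.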

      It will be good to see a plot with time dependences of all cell types for a couple of values of p_a, illustrating propagation and cessation of the infection.

      Author response: We agree, and have added a Figure S4 in the supplement which explores exactly that. Thank you for the suggestion.

      A verbal qualitative description of why p_a has such importance and how the infection is terminated for large p_a would help.

      Reviewer #2 (Recommendations For The Authors):

      Below are two minor comments:

      (1) In the single-cell RNA sequencing data analysis, the authors describe the cell clusters O, V, A, and N. However, showing how the clusters are identified from the data might be more straightforward.

      Author response: Technically, we cluster the cells using principal components analysis (PCA) results of their gene expression. With the first 16 principal components, we calculate k-nearest neighbors and construct the shared nearest neighbor graph of the cells and then optimize the modularity function to determine clusters. We manually annotate the clusters with O, V, A, and N based on the detected abundance of viral genes, antiviral genes, and IFNs.

      (2) In Figure 3, what does n(s) mean in Figure 3a? And what is the meaning of the distribution P(s) of infection clusters? It may be stated clearly.

      Author response: The use of n(s) was inconsistent, and we have now edited the figure to instead say P(s), to harmonize it with the text. P(s) is the distribution of cluster sizes, s, expressed as a fraction of the whole system. In other words, once a cluster has reached its final size, we record s=(N+V)/L^2 where N and V are the number of N and V state cells in the cluster (note that, by design, each simulation leads to a single cluster, since we seed the infection in one lattice point). We now indicate more clearly in the caption and the main text what exactly P(s) and s refer to.
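
      The definition s=(N+V)/L^2 given above can be written as a short Python helper. The representation of the lattice as a list of rows of single-character state labels, and the name outbreak_size, are assumptions made for this illustration.

```python
def outbreak_size(lattice):
    """Fraction of sites in the N or V state on a square lattice.

    `lattice` is assumed to be a list of L rows, each a list of L
    single-character state labels ("N", "O", "V", "A" or "a").
    """
    L = len(lattice)
    infected = sum(row.count("N") + row.count("V") for row in lattice)
    return infected / (L * L)
```

      Collecting this value over many independent runs at a fixed p_a gives the sample from which a distribution such as P(s) can be estimated.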

      Reviewer #3 (Recommendations For The Authors):

      - Would the authors kindly share the simulation code with the community? Also, the data analysis code should be shared to follow current best practices. This needs to be standard practice in all publications. I would go as far as to say that in 2024 publishing a data analysis / simulation study without sharing the relevant code should be ostracized by the community.

      Author response: We absolutely agree and have created a GitHub repository in which we share the C++ source code for the simulations and a Python notebook for plotting. The public repository can be found at https://github.com/BjarkeFN/ViralPercolation. We add this information in supplement under section “Code availability”.

      - I would avoid the use of the wording "critical" threshold since this is almost guaranteed to infuriate a certain type of reader.

      - Line 265 has a curious use of " ... " which should be replaced with something more appropriate.

      Author response: Thank you for pointing it out! We have checked the typos.

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1:

      The manuscript suggests the zebrafish homolog of ctla-4 and generates a new mutant in it. However, the locus that is mutated is confusingly annotated as both CD28 (current main annotation in ZFIN) and CTLA-4/CD152 (one publication from 2020), see: https://zfin.org/ZDB-GENE-070912-128. Both human CTLA-4 and CD28 align with relatively similar scores to this gene. There seem to be other orthologs of these receptors in the zebrafish genome, including CD28-like (https://zfin.org/ZDB-GENE-070912-309) which neighbors the gene annotated as CD28 (exhibiting similar synteny as human CD28 and CTLA-4). It would be helpful to provide more information to distinguish between this family of genes and to further strengthen the evidence that this mutant is in ctla-4, not cd28. Also, is one of these genes in the zebrafish genome (e.g. cd28l) potentially a second homolog of CTLA-4? Is this why this mutant is viable in zebrafish and not mammals? Some suggestions:

      (a) A more extensive sequence alignment that considers both CTLA-4 and CD28, potentially identifying the best homolog of each human gene, especially taking into account any regions that are known to produce the functional differences between these receptors in mammals and effectively assigns identities to the two genes annotated as "cd28" and "cd28l" as well as the gene "si:dkey-1H24.6" that your CD28 ORF primers seem to bind to in zebrafish.

      In response to the reviewer's insightful suggestions, we have conducted more extensive sequence alignment and phylogenetic analyses that consider CTLA-4, CD28, and CD28-like molecules, taking into account key regions crucial for the functions of these molecules and the functional differences between them across various species, including mammals and zebrafish.

      Identification of zebrafish Ctla-4: We identified zebrafish Ctla-4 as a homolog of mammalian CTLA-4 based on key conserved structural and functional characteristics. Structurally, the Ctla-4 gene shares similar exon organization compared to mammalian CTLA-4. Ctla-4 is a type I transmembrane protein with typical immunoglobulin superfamily features. Multiple amino acid sequence alignments revealed that Ctla-4 contains a <sup>113</sup>LFPPPY<sup>118</sup> motif and a <sup>123</sup>GNGT<sup>126</sup> motif in the ectodomain, and a tyrosine-based <sup>206</sup>YVKF<sup>209</sup> motif in the distal C-terminal region. These motifs closely resemble MYPPPY, GNGT, and YVKM motifs in mammalian CTLA-4s, which are essential for binding to CD80/CD86 ligands and molecular internalization and signaling inhibition. Despite only 23.7% sequence identity to human CTLA-4, zebrafish Ctla-4 exhibits a similar tertiary structure with a two-layer β-sandwich architecture in its extracellular IgV-like domain. Four cysteine residues responsible for the formation of two pairs of disulfide bonds (Cys<sup>20</sup>-Cys<sup>91</sup>/Cys<sup>46</sup>-Cys<sup>65</sup> in zebrafish and Cys<sup>21</sup>-Cys<sup>92</sup>/Cys<sup>48</sup>-Cys<sup>66</sup> in humans) that connect the two-layer β-sandwich are conserved. Additionally, a separate cysteine residue (Cys<sup>120</sup> in zebrafish and Cys<sup>120</sup> in humans) involved in dimerization is also present, and Western blot analysis under reducing and non-reducing conditions confirmed Ctla-4’s dimerization. Phylogenetically, Ctla-4 clusters with other known CTLA-4 homologs from different species with high bootstrap probability, while zebrafish Cd28 groups separately with other CD28s. Functionally, Ctla-4 is predominantly expressed on CD4<sup>+</sup> T and CD8<sup>+</sup> T cells in zebrafish. It plays a pivotal inhibitory role in T cell activation by competing with CD28 for binding to CD80/86, as validated through a series of both in vitro and in vivo assays, including microscale thermophoresis assays which demonstrated that Ctla-4 exhibits a significantly higher affinity for Cd80/86 than Cd28 (KD = 0.50 ± 0.25 μM vs. KD = 2.64 ± 0.45 μM). These findings confirm Ctla-4 as an immune checkpoint molecule, reinforcing its identification within the CTLA-4 family.

      Comparison between zebrafish Cd28 and "Cd28l": Zebrafish Cd28 contains an extracellular SYPPPF motif and an intracellular FYIQ motif. The extracellular SYPPPF motif is essential for binding to Cd80/86, while the intracellular FYIQ motif likely mediates kinase recruitment and co-stimulatory signaling. In contrast, the "Cd28l" molecule lacks the SYPPPF motif, which is critical for Cd80/86 binding, and its C-terminal 79 amino acids show strong similarity to Ctla-4 rather than to Cd28. Consequently, "Cd28l" resembles an atypical Ctla-4-like molecule but fails to exhibit Cd80/86 binding activity.

      We have incorporated the relevant analysis results into the main text of the revised manuscript and updated Supplementary Figure 1. Additionally, we provide key supplementary analyses here for the reviewer's convenience.  

      Author response image 1.

      Alignment of Ctla-4 (XP_005167576.1) and Ctla-4-like (XP_005167567.1, previously referred to as "Cd28l") in zebrafish, generated using ClustalX and Jalview. Conserved and partially conserved amino acid residues are highlighted in color gradients ranging from carnation to red, respectively. The B7-binding motif is boxed in red.

      (b) Clearer description in the main text of such an analysis to better establish that the mutated gene is a homolog of ctla-4, NOT cd28.

      We appreciate the reviewer's advice. Additional confirmation of zebrafish Ctla-4 is detailed in lines 119-126 of the revised manuscript.

      (c) Are there mammalian anti-ctla-4 and/or anti-cd28 antibodies that are expected to bind to these zebrafish proteins? If so, looking to see whether staining is lost (or western blotting is lost) in your mutants could be additionally informative. (Our understanding is that your mouse anti-Ctla-4 antibody is raised against recombinant protein generated from this same locus, and so is an elegant demonstration that your mutant eliminates the production of the protein, but unfortunately does not contribute additional information to help establish its homology to mammalian proteins).

      This suggestion holds significant value. However, a major challenge in fish immunology research is the limited availability of antibodies suitable for use in fish species; antibodies developed for mammals are generally not applicable. We attempted to use human and mouse anti-CTLA-4 and anti-CD28 antibodies to identify Ctla-4 and Cd28 in zebrafish, but the results were inconclusive, with no expected signals. This outcome likely arises from the low sequence identity between human/mouse CTLA-4 and CD28 and their zebrafish homologs (ranging from 21.3% to 23.7% for CTLA-4 and 21.2% to 24.0% for CD28). Therefore, developing specific antibodies against zebrafish Ctla-4 is essential for advancing this research.

      The methods section is generally insufficient and doesn't describe many of the experiments performed in this manuscript. Some examples:

      (a) No description of antibodies used for staining or Western blots (Figure1C, 1D, 1F).

      (b) No description of immunofluorescence protocol (Figure 1D, 1F).

      (c) No description of Western blot protocol (Figure 1C, 2C).

      (d) No description of electron microscopy approach (Figure 2K).

      (e) No description of the approach for determining microbial diversity (Entirety of Figure 6).

      (f) No description of PHA/CFSE/Flow experiments (Figure 7A-E).

      (g) No description of AlphaFold approach (Figures 7F-G).

      (h) No description of co-IP approach (Figure 7H).

      (i) No description of MST assay or experiment (Figure 7I).

      (j) No description of purification of recombinant proteins, generation of anti-Ctla-4 antibody, or molecular interaction assays (Figures S2 and S6).

      We apologize for this oversight. The methods section was inadvertently incomplete due to an error during the file upload process at submission. This issue has been addressed in the revised manuscript. We appreciate your understanding.

      Figure 5 suggests that there are more Th2 cells 1, Th2 cells 2, and NKT cells in ctla-4 mutants through scRNA-seq. However, as the cell numbers for these are low in both genotypes, there is only a single replicate for each genotype scRNA-seq experiment, and dissociation stress can skew cell-type proportions, this finding would be much more convincing if another method that does not depend on dissociation was used to verify these results. Furthermore, while Th2 cells 2 are almost absent in WT scRNA-seq, KEGG analysis suggests that a major contributor to their clustering may be ribosomal genes (Fig. 5I). Since no batch correction was described in the methods, it would be beneficial to verify the presence of this cluster in ctla-4 mutants and WT animals through other means, such as in situ hybridization or transgenic lines.   

      We are grateful for the insightful comments provided by the reviewer. Given that research on T cell subpopulations in fish is still in its nascent stages, the availability of specific marker antibodies and relevant transgenic strains remains limited. Our single-cell RNA sequencing (scRNA-seq) analysis revealed that a distinct Th2 subset 2 was predominantly observed in Ctla-4 mutants but was rare in wild-type zebrafish, which suggests that this subset may primarily arise under pathological conditions associated with Ctla-4 mutation. Due to the near absence of Th2 subset 2 in wild-type samples, KEGG enrichment analysis was performed exclusively on this subset from Ctla-4-deficient intestines. The ribosome pathway was significantly enriched, suggesting that these cells may be activated to fulfill their effector functions. However, confirming the presence of Th2 subset 2 using in situ hybridization or transgenic zebrafish lines is currently challenging due to the lack of lineage-specific markers for detailed classification of Th2 cell subsets and the preliminary nature of scRNA-seq predictions.

      To address the reviewer's suggestion to confirm compositional changes in Th2 and NKT cells using dissociation-independent methods, we quantified mRNA levels of Th2 (il4, il13, and gata3) and NKT (nkl.2, nkl.4, and prf1.1) cell marker genes via RT-qPCR in intestines from wild-type and mutant zebrafish. As shown in Figure S7B and S7C, these markers were significantly upregulated in Ctla-4-deficient intestines compared to wild-type controls. This indicates an overall increase in Th2 and NKT cell activity in mutant zebrafish, which aligns with our scRNA-seq analysis and supports the validity of our initial findings.

      Before analyzing the scRNA-seq data, we performed batch correction using the Harmony algorithm via cloud-based Cumulus v1.0 on the aggregated gene-count matrices. This methodological detail has been included in the “Materials and Methods” section of the revised manuscript. Moreover, the RT-qPCR results are presented in Supplementary Figures S7B and S7C.

      Quality control (e.g., no. of UMIs, no. of genes, etc.) metrics of the scRNAseq experiments should be presented in the supplementary information for each sample to help support that observed differential expression is not merely an outcome of different sequencing depths of the two samples.

      As illustrated in Fig. S5, the quality control data have been supplemented to include the effective cell number of the sample, along with pre- and post-filtering metrics such as nFeature_RNA, nCount_RNA and mitochondrial percentage (percent.mito). Furthermore, scatter plots comparing the basic information of the sample cells before and after filtering are provided.

      Some references to prior research lack citations. Examples:

      (a)"Given that Ctla-4 is primarily expressed on T cells (Figure 1E-F), and its absence has been shown to result in intestinal immune dysregulation, indicating a crucial role of this molecule as a conserved immune checkpoint in T cell inhibition."

      The references were incorporated into line 71 of the revised manuscript.

      (b) Line 83: Cite evidence/review for the high degree of conservation in adaptive immunity.

      The references were incorporated into line 93 of the revised manuscript.

      (c) Lines 100-102: Cite the evidence that MYPPPY is a CD80/86 binding motif.

      The references were incorporated into line 117 of the revised manuscript.

      The text associated with Figure 8 (Lines 280-289) does not clearly state that rescue experiments are being done in mutant zebrafish.

      We have provided a clear explanation of the rescue experiments conducted in Ctla-4-deficient zebrafish. This revision has been incorporated into line 319.

      Line 102: Is there evidence from other animals that LFPPPY can function as a binding site for CD80/CD86? Does CD28 also have this same motif?

      The extracellular domains of CTLA-4 and CD28, which bind to CD80/CD86, are largely conserved across various species. This conservation is exemplified by a central PPP core motif, although the flanking amino acids exhibit slight variations. In mammals, both CTLA-4 and CD28 feature the conserved MYPPPY motif. By contrast, in teleost fish, such as rainbow trout, CTLA-4 contains an LYPPPY motif, while CD28 has an MYPPPI motif (Ref. 1). Grass carp CTLA-4 displays an LFPPPY motif, whereas its CD28 variant bears an IYPPPF motif. Yeast two-hybrid assays confirm that these motifs facilitate interactions between grass carp CTLA-4 and CD28 with CD80/CD86 (Ref. 2). Similarly, zebrafish Ctla-4 contains the LFPPPY motif observed in grass carp, while Cd28 exhibits a closely related SYPPPF motif.

      References:

      (1) Bernard, D et al. (2006) Costimulatory Receptors in a Teleost Fish: Typical CD28, Elusive CTLA-4. J Immunol. 176: 4191-4200.

      (2) Lu T Z et al. (2022) Molecular and Functional Analyses of the Primordial Costimulatory Molecule CD80/86 and Its Receptors CD28 and CD152 (CTLA-4) in a Teleost Fish. Frontiers in Immunology. 13:885005.

      Line 110-111: Suggest adding citation of these previously published scRNAseq data to the main text in addition to the current description in the Figure legend.

      The reference has been added in line 129 in the main text.

      Figure 3B: It would be helpful to label a few of the top differentially expressed genes in Panel B?

      The top differentially expressed genes have been labeled in Figure 3B.

      Figure 3G: It's unclear how this analysis was conducted, what this figure is supposed to demonstrate, and in its current form it is illegible.

      Figure 3G displays a protein-protein interaction network constructed from differentially expressed genes. The densely connected nodes, representing physical interactions among proteins, provide valuable insights for basic scientific inquiry and biological or biomedical applications. As proteins are crucial to diverse biological functions, their interactions illuminate the molecular and cellular mechanisms that govern both healthy and diseased states in organisms. Consequently, these networks facilitate the understanding of pathogenic and physiological processes involved in disease onset and progression.

      To construct this network, we first utilized the STRING database (https://string-db.org) to generate an initial network diagram using the differentially expressed genes. This diagram was subsequently imported into Cytoscape (version 3.9.1) for visualization and further analysis. Node size and color intensity reflect the density of interactions, indicating the relative importance of each protein. Figure 3G illustrates that IL1β was a central cytokine hub in the disease process of intestinal inflammation in Ctla-4-deficient zebrafish.

      Expression scale labeling:

      (a) Most gene expression scales are not clearly labeled: do they represent mean expression or scaled expression? Has the expression been log-transformed, and if so, which log (natural log? Log10? Log2?). See: Figure 3E, 3I, 4D, 4E, 5B, 5G, 5H, 6I.

      The gene expression scales are detailed in the figure legends. Specifically, Figures 3E, 3I, and 6I present heatmaps depicting row-scaled expression levels for the corresponding genes. In contrast, Figures 4D and 4E display heatmaps illustrating the mean expression of these genes. Additionally, the dot plots in Figures 5B, 5G, and 5H visualize the mean expression levels of the respective genes.
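
      The difference between the two scales can be shown with a short sketch. It assumes that "row-scaled" means a per-gene z-score across groups computed with the population standard deviation; the exact scaling used for the figures is not stated here, and the gene names and values below are hypothetical.

```python
from statistics import mean, pstdev


def row_scale(values):
    """Z-score one gene's values across groups (population SD assumed)."""
    mu = mean(values)
    sd = pstdev(values)
    if sd == 0:
        # A gene with identical values everywhere carries no contrast.
        return [0.0 for _ in values]
    return [(v - mu) / sd for v in values]


# Hypothetical mean expression of two genes in three clusters.
mean_expression = {
    "gene_a": [1.0, 2.0, 3.0],
    "gene_b": [10.0, 20.0, 30.0],
}
scaled = {gene: row_scale(vals) for gene, vals in mean_expression.items()}
print(scaled)
```

      After row scaling, the two genes become indistinguishable even though their mean expression differs tenfold, which is why the legend has to say which scale a heatmap uses.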

      (b) For some plots, diverging color schemes (i.e. with white/yellow in the middle) are used for non-diverging scales and would be better represented with a sequential color scale. See: 4D, 4E, and potentially others (not fully clear because of the previous point).

      The color schemes in Figures 4D and 4E have been updated to a sequential color scale. The gene expression data depicted in these figures represent mean expression values and have not undergone log transformation. This information has been incorporated into the figure legend for clarity.

      Lines 186-187: Though it is merely suggested, apoptotic gene expression can be upregulated as part of the dissociation process for single-cell RNAseq. This would be much stronger if supported by a staining, such as anti-Caspase 3.

      Following the reviewer's insightful recommendations, we conducted a TUNEL assay to evaluate apoptosis in the posterior intestinal epithelial cells of both wild-type and Ctla-4-deficient zebrafish. As expected, our results demonstrate a significant increase in epithelial cell apoptosis in Ctla-4-deficient zebrafish compared with wild-type fish. The corresponding data are presented in Figure S6D and have been incorporated into the manuscript. Detailed protocols for the TUNEL assay have also been included in the Materials and Methods section.

      Author response image 2.

      Quantification of TUNEL-positive cells per 1 × 10<sup>4</sup> μm<sup>2</sup> in the posterior intestines of wild-type (WT) and ctla-4<sup>⁻/⁻</sup> zebrafish (n = 5). The data compare apoptotic cell density between the two genotypes.

      Lines 248-251: This manuscript demonstrates gut inflammation and also changes in microbial diversity, but I don't think it demonstrates an association between them, which would require an experiment that for instance rescues one of these changes and shows that it ameliorates the other change, despite still being a ctla-4 mutant.

      We appreciate the valuable comments from the reviewer. Recently, the relationship between inflammatory bowel disease (IBD) and gut microbial diversity has garnered considerable attention, with several key findings emerging from human IBD studies. For instance, patients with IBD (including ulcerative colitis and Crohn's disease) exhibit reduced microbial diversity, which is correlated with disease severity. This decrease in microbial richness is thought to stem from the loss of normal anaerobic bacteria, such as Bacteroides, Eubacterium, and Lactobacillus (Refs. 1-6). Research using mouse models has shown that inflammation increases oxygen and nitrate levels within the intestinal lumen, along with elevated host-derived electron acceptors, thereby promoting anaerobic respiration and overgrowth of Enterobacteriaceae (Ref. 7). Consistent with these findings, our study observed a significant enrichment of Enterobacteriaceae in the inflamed intestines of Ctla-4-deficient zebrafish, which supports the observations in mice. Despite this progress, the zebrafish model for intestinal inflammation remains under development, with limitations in available techniques for manipulating intestinal inflammation and reconstructing gut microbiota. These challenges hinder investigations into the association between intestinal inflammation and changes in microbial diversity. We plan to address these issues through ongoing technological advancements and further research. We thank the reviewer for their understanding.

      References:

      (1) Ott S J, Musfeldt M, Wenderoth D F, Hampe J, Brant O, Fölsch U R et al. (2004) Reduction in diversity of the colonic mucosa associated bacterial microflora in patients with active inflammatory bowel disease. Gut 53:685-693.

      (2) Manichanh C, Rigottier-Gois L, Bonnaud E, Gloux K, Pelletier E, Frangeul L et al. (2006) Reduced diversity of faecal microbiota in Crohn's disease revealed by a metagenomic approach. Gut 55:205-211.

      (3) Qin J J, Li R Q, Raes J, Arumugam M, Burgdorf K S, Manichanh C et al. (2010) A human gut microbial gene catalogue established by metagenomic sequencing. Nature 464:59-U70.

      (4) Sha S M, Xu B, Wang X, Zhang Y G, Wang H H, Kong X Y et al. (2013) The biodiversity and composition of the dominant fecal microbiota in patients with inflammatory bowel disease. Diagn Micr Infec Dis 75:245-251.

      (5) Ray K. (2015) IBD. Gut microbiota in IBD goes viral. Nat Rev Gastroenterol Hepatol 12:122.

      (6) Papa E, Docktor M, Smillie C, Weber S, Preheim S P, Gevers D et al. (2012) Non-Invasive Mapping of the Gastrointestinal Microbiota Identifies Children with Inflammatory Bowel Disease. Plos One 7: e39242-39254.

      (7) Hughes E R, Winter M G, Duerkop B A, Spiga L, de Carvalho T F, Zhu W H et al. (2017) Microbial Respiration and Formate Oxidation as Metabolic Signatures of Inflammation-Associated Dysbiosis. Cell Host Microbe 21:208-219.

      Lines 270-272 say that interaction between Cd28/ctla-4 and Cd80/86 was demonstrated through bioinformatics, flow-cytometry, and Co-IP. Does this need to reference Fig S6D for the flow data? Figures 7F-G are very hard to read or comprehend as they are very small. Figure 7H is the most compelling evidence of this interaction and might stand out better if emphasized with a sentence referencing it on its own in the manuscript. 

      In this study, we utilized an integrated approach combining bioinformatics prediction, flow cytometry, and co-immunoprecipitation (Co-IP) to comprehensively investigate and validate the interactions between Cd28/Ctla-4 and Cd80/86. Flow cytometry analysis, as depicted in Supplementary Figure 6D (revised as Supplementary Figure 8F), demonstrated the surface expression of Cd80/86 on HEK293T cells and quantified their interactions with Cd28 and Ctla-4. These experiments not only validated the interactions between Cd80/86 and Cd28/Ctla-4 but also revealed a dose-dependent relationship, providing robust supplementary evidence for the molecular interactions under investigation. Furthermore, in Figure 7F-G, the axis font sizes were enlarged to improve readability. Additionally, in response to reviewers' feedback, we have emphasized Figure 7H, which presents the most compelling evidence for molecular interactions, by including a standalone sentence in the text to enhance its prominence.

      For Figure 7A-E, for non-immunologists, it is unclear what experiment was performed here - it would be helpful to add a 1-sentence summary of the assay to the main text or figure legend.

      We apologize for this oversight. Figures 7A–E illustrate the functional assessment of the inhibitory role of Ctla-4 in Cd80/86 and Cd28-mediated T cell activation. A detailed description of the methodologies associated with Figures 7A–E is provided in the ‘Materials and Methods’ section of the revised manuscript.

      For Figure 7F-G, it is extremely hard to read the heat map legends and the X and Y-axis. Also, what the heatmaps show and how that fits the overall narrative can be elaborated significantly.

      We regret this oversight. To enhance clarity, we have increased the font size of the heatmap legends and the X and Y-axes, as shown in the following figure. Additionally, a detailed analysis of these figures is provided in lines 299–306 of the main text.

      In general, the main text that accompanies Figure 7 should be expanded to more clearly describe these experiments/analyses and their results.

      We have conducted a detailed analysis of the experiments and results presented in Figure 7. This analysis is described in lines 278-314.

      Reviewer #2:

      The scRNASeq assay is missing some basic characterization: how many WT and mutant fish were assayed in the experiment? how many WT and mutant cells were subject to sequencing? Before going to the immune cell types, are intestinal cell types comparable between the two conditions? Are there specific regions in the tSNE plot in Figure 4A abundant of WT or ctla-4 mutant cells?

      In the experiment, we analyzed 30 wild-type and 30 mutant zebrafish for scRNA-seq, with an initial dataset comprising 8,047 cells in the wild-type group and 8,321 cells in the mutant group. Sample preparation details are provided on lines 620-652. Due to the relatively high expression of mitochondrial genes in intestinal tissue, quality control filtering yielded 3,263 cells in the wild-type group and 4,276 cells in the mutant group. Given that the intestinal tissues were dissociated using identical protocols, the resulting cell types are comparable between the two conditions. Both the wild-type and Ctla-4-deficient groups contained enterocytes, enteroendocrine cells, smooth muscle cells, neutrophils, macrophages, B cells, and a cluster of T/NK/ILC-like cells. Notably, no distinct regions were enriched for either condition in the tSNE plot (Figure 4A).
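
      As a quick check on the filtering described above, the cell counts quoted in this response give the following retention rates. The sketch uses only those four numbers; the helper name is illustrative.

```python
# Cell counts before and after quality-control filtering, as quoted above.
cells = {
    "wild-type": {"before": 8047, "after": 3263},
    "mutant": {"before": 8321, "after": 4276},
}


def retention(before, after):
    """Fraction of cells that survive filtering."""
    return after / before


for group, n in cells.items():
    rate = retention(n["before"], n["after"])
    print(f"{group}: {rate:.1%} of cells retained")
```

      Roughly 40% of wild-type cells and 51% of mutant cells pass filtering, so the two groups lose cells at somewhat different rates.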

      The cell proliferation experiment using PHA stimulation assay demonstrated the role of Ctla-4 in cell proliferation, while the transcriptomic evidence points towards activation rather than an overall expansion of T-cell numbers. This should be discussed towards a more comprehensive model of how subtypes of cells can be differentially proliferating in the disease model.

      In the PHA-stimulated T cell proliferation assay, we aimed to investigate the regulatory roles of Ctla-4, Cd28, and Cd80/86 in T cell activation, focusing on validating Ctla-4's inhibitory function as an immune checkpoint. While our study examined general regulatory mechanisms, it did not specifically address the distinct roles of Ctla-4 in different T cell subsets. We appreciate the reviewer's suggestion to develop a more comprehensive model that elucidates differential T cell activation across various subsets in disease models. However, due to the nascent stage of research on fish T cell subsets and limitations in lineage-specific antibodies and transgenic strains, such investigations are currently challenging. We plan to pursue these studies in the future. Despite these constraints, our single-cell RNA sequencing data revealed an increased proportion of Th2 subset cells in Ctla-4-deficient zebrafish, consistent with the elevated expression of Th2 markers (Il4, Il13, and Gata3) measured by RT-qPCR (see Figure S7B). Notably, studies in mouse models have shown that naïve T cells from CTLA-4-deficient mice tend to differentiate into Th2 cells post-proliferation, with activated Th2 cells secreting higher levels of cytokines like IL-4, IL-5, and IL-13, thereby exerting their effector functions (Refs. 1-2). Consequently, our findings align with observations in mice, suggesting conserved CTLA-4 functions across species. We have expanded the "Discussion" section to clarify these points.

      References:

      (1) Bour-Jordan H, Grogan J L, Tang Q Z, Auger J A, Locksley R M, Bluestone J A et al. (2003) CTLA-4 regulates the requirement for cytokine-induced signals in T<sub>H</sub>2 lineage commitment. Nature Immunology 4: 182-188.

      (2) Khattri Roli, Auger, Julie A, Griffin Matthew D, Sharpe Arlene H, Bluestone Jeffrey A et al. (1999) Lymphoproliferative Disorder in CTLA-4 Knockout Mice Is Characterized by CD28-Regulated Activation of Th2 Responses. The Journal of Immunology 162:5784-5791.

      It would be nice if the authors could also demonstrate whether other tissues in the zebrafish have an inflammation response, to show whether the model is specific to IBD.

      In addition to intestinal tissues, we also performed histological analysis on the liver of Ctla-4-deficient zebrafish. The results showed that Ctla-4 deficiency led to mild edema in a few hepatocytes, and lymphocyte infiltration was not significant. Compared to the liver, we consider intestinal inflammation to be more pronounced.

      Some minor comments on terminology

      (a) "multiomics" usually refers to omics experiments with different modalities (e.g. transcriptomics, proteomics, metabolomics etc), while the current paper only has transcriptomics assays. I wouldn't call it "multiomics" analysis.

      We appreciate the reviewer's attention to this issue. The "multi-omics" has been revised to "transcriptomics".

      (b) In several parts of the figure legend the author mentioned "tSNE nonlinear clustering" (Figures 4A and 5A). tSNE is an embedding method rather than a clustering method.

      The "tSNE nonlinear clustering" has been revised to "tSNE embedding".

      (c) Figure 1E is a UMAP rather than tSNE.

      The "tSNE" has been revised to "UMAP" in the figure legend in line 1043.

      Reviewer #3: 

      Line 28: The link is not directly reflected in this sentence describing CTLA-4 knockout mice.

      We appreciate the reviewer for bringing this issue to our attention. We have expanded our description of CTLA-4 knockout mice on lines 77-84.

      Line 80-83: There is a lack of details about the CTLA-4-deficient mice. The factor that Th2 response could be induced has been revealed in mouse model. See the reference entitled "CTLA-4 regulates the requirement for cytokine-induced signals in TH2 lineage commitment" published in Nature Immunology.

      We thank the reviewer for providing valuable references. We have added descriptions detailing the differentiation of T cells into Th2 cells in CTLA-4-deficient mice on lines 78–81, and the relevant references have been cited in the revised manuscript.

      To better introduce the CTLA-4 immunobiology, the paper entitled "Current Understanding of Cytotoxic T Lymphocyte Antigen-4 (CTLA-4) Signaling in T-Cell Biology and Disease Therapy" published in Molecules and Cells should be referred.

      We have provided additional details on CTLA-4 immunology (lines 75-84) and have included the relevant reference in the revised manuscript.

      In current results, there are many sentences that should be moved to the discussion, such as lines 123-124, lines 152-153, lines 199-200, and lines 206-207. So, the result sections just describe the results, and the discussions should be put together in the discussion.

      We have relocated these sentences to the 'Discussion' section and refined the writing.

      In the discussion, the zebrafish enteritis model, such as DSS/TNBS and SBMIE models, should also be compared with the current CTLA-4 knockout model. Also, the comparison between the current fish IBD model and the previous mouse model should also be included, to enlighten the usage of CTLA-4 knockout zebrafish IBD model.

      We compared the phenotypes of our current Ctla-4-knockout zebrafish IBD model with other models, including DSS-induced IBD models in zebrafish and mice, as well as TNBS- and SBM-induced IBD models in zebrafish. The details are included in the "Discussion" section (lines 353-365).

      As to the writing, the structure of the discussion is poor. The paragraphs are very long and hard to follow. Many findings from current results were not yet discussed. I just can't find any discussion about the alteration of intestinal microbiota.

      In response to the reviewers' constructive feedback, we have revised and enhanced the discussion section. Furthermore, we have integrated the most recent research findings relevant to this study into the discussion to improve its relevance and comprehensiveness.

      In the discussion, the aerobic-related bacteria in 16s rRNA sequencing results should be focused on echoing the histopathological findings, such as the emptier gut of CTLA-4 knockout zebrafish.

      As mentioned above, the discussion section has been revised and expanded to provide a better understanding of the potential interplay among intestinal inflammatory pathology, gut microbiota alterations, and immune cell dysregulation in Ctla-4-deficient zebrafish. Furthermore, promising avenues for future research that warrant further investigation were also discussed.

      In the current method, there are no descriptions for many used methods, which already generated results, such as WB, MLR, MST, Co-IP, AlphaFold2 prediction, and how to make currently used anti-zfCTLA4 antibody. Also, there is a lack of description of the method of the husbandry of knockout zebrafish line.

      We regret these flaws. The methods section was inadvertently incomplete due to an error during the file upload process at submission. This issue has been rectified in the revised manuscript. Additionally, Ctla-4-deficient zebrafish were reared under the same conditions as wild-type zebrafish, and the rearing methods are now described in the "Generation of Ctla-4-deficient zebrafish" section of the Materials and Methods.

      Line 360: the experimental zebrafish with different ages could be a risk for unstable intestinal health. See the reference entitled "The immunoregulatory role of fish-specific type II SOCS via inhibiting metaflammation in the gut-liver axis" published in Water Biology and Security. The age-related differences in zebrafish could be observed in the gut.

      We appreciate the reviewers' reminders. The Ctla-4 mutant zebrafish used in our experiments were 4 months old, while the wild-type zebrafish ranged from 4 to 6 months old. These experimental fish were relatively young and uniformly distributed in age. During our study, we examined the morphological structures of the intestines in zebrafish aged 4 to 6 months and observed no significant abnormalities. These findings align with previous research indicating no significant difference in intestinal health between 3-month-old and 6-month-old wild-type zebrafish (Ref. 1). Consequently, we conclude that there is no notable aging-related change in the intestines of zebrafish aged 4 to 6 months. This reduces the risk associated with age-related variables in our study. We have added an explanation stating that the Ctla-4 mutant zebrafish used in the experiments were 4 months old (Line 449) in the revised manuscript.

      Reference

      (1) Shan Junwei, Wang Guangxin, Li Heng, Zhao Xuyang et al. (2023) The immunoregulatory role of fish-specific type II SOCS via inhibiting metaflammation in the gut-liver axis. Water Biology and Security 2: 100131-100144.

      Section "Generation of Ctla-4-deficient zebrafish": There is a lack of description of PCR condition for the genotyping.

      The target DNA sequence was amplified at 94 °C for 4 min, followed by 35 cycles of 94 °C for 30 s, 58 °C for 30 s, and 72 °C for 30 s, with a final extension at 72 °C for 10 min. The polymerase chain reaction (PCR) conditions are described in lines 458-460.
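
      For readers who want to set up the same run, the cycling conditions above can be written as a small data structure. The sketch below only restates the temperatures and times given in this response; the variable and function names are illustrative, and ramp times between steps are not included.

```python
# Genotyping PCR program as stated above: (temperature in deg C, hold in seconds).
INITIAL_DENATURATION = (94, 240)
CYCLE_STEPS = [(94, 30), (58, 30), (72, 30)]  # denature, anneal, extend
N_CYCLES = 35
FINAL_EXTENSION = (72, 600)


def total_hold_seconds():
    """Sum the programmed hold times, ignoring ramping between steps."""
    per_cycle = sum(seconds for _, seconds in CYCLE_STEPS)
    return INITIAL_DENATURATION[1] + N_CYCLES * per_cycle + FINAL_EXTENSION[1]


print(total_hold_seconds() / 60, "minutes of programmed hold time")
```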

      How old of the used mutant fish? There should be a section "sampling" to provide the sampling details.

      The "Sampling" information has been incorporated into the "Materials and Methods" section of the revised manuscript. Wild-type and Ctla-4-deficient zebrafish of different ages were housed in separate tanks, each labeled with its corresponding birth date. Experiments used Ctla-4-deficient zebrafish aged 4 months and wild-type zebrafish aged between 4 and 6 months.

      Line 378-380: The index for the histopathological analysis should be detailed, rather than just provide a reference. I don't think these indexes are good enough to specifically describe the pathological changes of intestinal villi and mucosa. It is suggested to improve with detailed parameters. As described in the paper entitled "Pathology of Gastric Intestinal Metaplasia: Clinical Implications" published in Am J Gastroenterol., histochemical, normal gastric mucins are pH neutral, and they stain magenta with periodic acid-Schiff (PAS). In an inflamed gut, acid mucins replace the original gastric mucins and are stained blue with Alcian blue (AB). So, to reveal the pathological changes of goblet cells and involved mucin components, AB staining should be added. Also, for the number of goblet cells in the inflammatory intestine, combining PAS and AB staining is the best way to reveal all the goblet cells. In Figure 2, there were very few goblet cells. The infiltration of lymphocytes and the empty intestinal lumen could be observed. Thus, the ratio between the length of intestinal villi and the intestinal ring radius should calculated.

      In response to the reviewers’ valuable suggestions, we have augmented the manuscript by providing additional parameters related to the pathological changes observed in the Ctla-4-deficient zebrafish intestines, including the mucin component changes identified through PAS and AB-PAS staining, the variations in the number of goblet cells evaluated by AB-PAS staining, and the ratio of intestinal villi length to the intestinal ring radius, as illustrated in the following figures. These new findings are detailed in the "Materials and Methods" (lines 563-566) and "Results" (lines 143-146) sections, along with Supplementary Figure S3 of the revised manuscript.

      Section "Quantitative real-time PCR": What's the machine used for qPCR? How about the qPCR validation of RNA seq data? I did not see any related description of data and methods for qPCR validation. In addition, beta-actin is not a stable internal reference gene, to analyze inflammation and immune-related gene expression. See the reference entitled "Actin, a reliable marker of internal control?" published in Clin Chim Acta. Other stable housekeeping genes, such as EF1alpha and 18s, could be better internal references.

      RT-qPCR experiments were conducted using a PCR thermocycler device (CFX Connect Real-Time PCR Detection System with Precision Melt Analysis<sup>TM</sup> Software, Bio-Rad, Cat. No. 1855200EM1). This information has been incorporated into lines 608-610 of the "Materials and Methods" section. In these experiments, key gene sequences of interest, including il13, mpx, and il1β, were extracted from RNA-seq data for RT-qPCR validation. To ensure accurate normalization, potential internal controls were evaluated, and β-actin was identified as a suitable candidate due to its consistent expression levels in the intestines of both wild-type and Ctla-4-deficient zebrafish. The use of β-actin as an internal control is further supported by its application in recent studies on intestinal inflammation (Refs 1–2).

      References:

      (1) Tang Duozhuang, Zeng Ting, Wang Yiting, Cui Hui et al. (2020) Dietary restriction increases protective gut bacteria to rescue lethal methotrexate-induced intestinal toxicity. Gut Microbes 12: 1714401-1714422.

      (2) Malik Ankit, Sharma Deepika et al. (2023) Epithelial IFNγ signaling and compartmentalized antigen presentation orchestrate gut immunity. Nature 623: 1044-1052.
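
      To make the role of the internal control in the RT-qPCR response above concrete, here is a minimal sketch of reference-gene normalization. It assumes the widely used 2^-ΔΔCt calculation with roughly 100% primer efficiency; this response does not state which quantification method was applied, and the Ct values below are hypothetical.

```python
def relative_expression(ct_target_sample, ct_ref_sample,
                        ct_target_control, ct_ref_control):
    """Fold change by the 2^-ddCt method (assumed, ~100% primer efficiency)."""
    d_ct_sample = ct_target_sample - ct_ref_sample      # normalize to reference gene
    d_ct_control = ct_target_control - ct_ref_control
    dd_ct = d_ct_sample - d_ct_control                  # compare sample to control
    return 2.0 ** (-dd_ct)


# Hypothetical Ct values: target gene and reference gene, mutant vs wild-type.
fold = relative_expression(24.0, 18.0, 26.0, 18.0)
print(fold)
```

      The calculation is only valid if the reference gene's Ct is stable between groups, which is the property the reviewer asks the authors to verify for β-actin.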

      How to generate sCtla-4-Ig, Cd28-Ig and Cd80/86? No method could be found.

      We apologize for the omission of these methods. The detailed protocols have now been added to the "Materials and Methods" section of the revised manuscript (lines 464-481).

      Figure 5: As reviewed in the paper entitled "Teleost T and NK cell immunity" published in Fish and Shellfsh Immunology, two types of NK cell homologues have been described in fish: non-specific cytotoxic cells and NK-like cells. There is no NKT cell identified in the teleost yet. Therefore, "NKT-like" could be better to describe this cell type.

      We now refer to "NKT" cells as "NKT-like" cells, as suggested.

      For the supplementary data of scRNA-seq, there lacks the details of expression level.

      The expression levels of the corresponding genes are provided in Supplemental Table 4.

      Supplemental Table 1: There are no accession numbers of amplified genes.

      The accession numbers of the amplified genes are included in Supplemental Table 1.

      The English needs further editing.

      We have made efforts to enhance the English to meet the reviewers' expectations.

      Line 32: The tense should be the past.

      This tense error has been corrected.

      Line 363-365: The letter of this approval should be provided as an attachment.

      The approval document is provided as an attachment.

      Line 376: How to distinguish the different intestinal parts? Were they judged as the first third, second third, and last third parts of the whole intestine?

      The differences among the three segments of zebrafish intestine are apparent. The intestinal tube narrows progressively from the anterior to the mid-intestine and then to the posterior intestine. Moreover, the boundaries between the intestinal segments are well-defined, facilitating the isolation of each segment.

      Line 404: Which version of Cytoscape was used?

      The version of Cytoscape used in this study is 3.9.1. Information about the Cytoscape version is provided on line 603.

      The product information of both percoll and cell strainer should be provided.

      The information regarding Percoll and cell strainers has been added on lines 626 and 628, respectively.

      Line 814: Here should be a full name to tell what is MST.

      The acronym MST stands for "Microscale Thermophoresis", a technique that has been referenced on lines 1157-1158.

    1. Author Response

      The following is the authors’ response to the original reviews.

      In this manuscript, Xie et al report the development of SCA-seq, a multiOME mapping method that can obtain chromatin accessibility, methylation, and 3D genome information at the same time. This method is highly relevant to a few previously reported long read sequencing technologies. Specifically, NanoNome, SMAC-seq, and Fiber-seq have been reported to use m6A or GpC methyltransferase accessibility to map open chromatin, or open chromatin together with CpG methylation; Pore-C and MC-3C have been reported to use long read sequencing to map multiplex chromatin interactions, or together with CpG methylation. Therefore, as a combination of NanoNome/SMAC-seq/Fiber-seq and Pore-C/MC-3C, SCA-seq is one step forward. The authors tested SCA-seq in 293T cells and performed benchmark analyses testing the performance of SCA-seq in generating each data module (open chromatin and 3D genome). The QC metrics appear to be good and the methods, data and analyses broadly support the claims. However, there are some concerns regarding data analysis and conclusions, and some important information seems to be missing.

      1. The chromatin accessibility tracks from SCA-seq seem to be noisy, with higher background than DNase-seq and ATAC-seq (Fig. 2f, Fig. 4a and Fig. S5). Also, SCA-seq is much less sensitive than both DNase-seq and ATAC-seq (Figs. 2a and 2b). This and other limitations of SCA-seq (high background, high sequencing cost, requirement of specific equipment, etc) need to be carefully discussed.

      We thank the reviewer for the important comment about the noisy GpC methylation signal in SCA-seq. We acknowledge that the SCA-seq signal presented in Fig. 2f, Fig. 4a, and Fig. S5 in our first draft was indeed noisy, as we presented the raw 1D genomic signal. In this revision, we have taken steps to reduce the noise in the GpC methylation signal by identifying the accessible regions on each segment of every single molecule. For each segment, we performed a sliding-window analysis (50 bp window sliding by a 10 bp step) with a binomial test to identify accessible windows that deviate significantly from the background GpC methylation ratio. Overlapping accessible windows (p < 0.05 in the binomial test and containing at least two GpC sites) on a single fragment are merged into an accessible region. We then retain the GpC methylation signal inside the accessible regions to reduce the background noise (Sfig 5ab). The details of the noise filtering steps are described in the Methods section (page 22 lines 13-23).
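
      The filtering steps described above can be sketched as follows. This is an illustration under stated assumptions, not the SCA-seq implementation: the test is taken to be one-sided (more methylation than background), the background ratio is passed in as a known number, and the function names, toy GpC calls, and segment length are hypothetical.

```python
from math import comb


def binom_sf(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))


def accessible_regions(sites, seg_len, background, window=50, step=10,
                       alpha=0.05, min_sites=2):
    """Merge significant windows on one segment into [start, end) regions.

    sites: list of (position, is_methylated) GpC calls on the segment.
    """
    regions = []
    last_start = max(seg_len - window, 0)
    for start in range(0, last_start + 1, step):
        end = start + window
        calls = [m for pos, m in sites if start <= pos < end]
        n, k = len(calls), sum(calls)
        # Skip windows with too few GpC sites or no significant enrichment.
        if n < min_sites or binom_sf(k, n, background) >= alpha:
            continue
        if regions and start < regions[-1][1]:
            regions[-1][1] = max(regions[-1][1], end)  # overlaps previous window
        else:
            regions.append([start, end])
    return [tuple(r) for r in regions]


# Toy segment: methylated GpCs cluster between 60 and 90 bp.
sites = [(10, False), (30, False), (60, True), (70, True),
         (80, True), (90, True), (150, False), (170, False)]
regions = accessible_regions(sites, seg_len=200, background=0.1)
# Keep only the GpC calls that fall inside an accessible region.
kept = [(pos, m) for pos, m in sites if any(s <= pos < e for s, e in regions)]
print(regions, kept)
```

      Requiring at least two GpC sites per window keeps a single methylated call from being counted as an accessible region by itself.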

      Visually, we can observe from the updated exemplary view of the 1D signal track that the noise is dramatically reduced in the filtered SCA-seq GpC methylation signal compared to the raw signal (Sfig5c). The clean SCA-seq GpC methylation 1D signals were also updated (Fig2f and Fig4a). We have observed an increase in the TSS enrichment score, which is a commonly used metric for assessing the signal-to-noise ratio in ATAC-seq data quality control. Specifically, the TSS enrichment score increased to 2.74 when using the filtered signal, compared to 1.93 when using the raw signal (Sfig5d). After noise filtering, 80% of SCA-seq 1D peaks overlap with peaks called by ATAC-seq and/or DNase-seq (Fig2ab), compared to 74% from the raw signal in the first draft.

      We thank the reviewer for raising the concern about the sequencing cost and the requirement for specific equipment. The sequencing cost is approximately 1300 USD per sample to sequence a human sample at 30X depth and obtain a saturated GpC methylation signal (Sfig4d) as well as a loop signal similar to that of NGS-based Hi-C (Fig3gh). Considering that SCA-seq simultaneously provides higher-order chromatin structure and chromatin accessibility at single-molecule resolution, we believe the cost is acceptable. However, it is worth noting that SCA-seq requires a regular Oxford Nanopore sequencer with an R9.4.1 chip, which is currently available but might be discontinued by Oxford Nanopore in the future. We have addressed all these concerns in the discussion section.

      1. In Fig. 2f, many smaller peaks are present besides the major peaks. Are they caused by baseline DNA methylation? How many of the small methylation signals are called peaks? In Fig. 4a, it seems that the authors define many more enhancers from SCA-seq data than what will be defined from ATAC-seq or DHS. Are those additional enhancers false positives? Also, it is difficult to distinguish the gray "inaccessible segments" from the light purple "accessible segments".

      We thank the reviewer for bringing up these concerns.

      Regarding the smaller peaks in the 1D genomic GpC methylation signal, we have addressed this issue by implementing the noise filtering in this revision, and the small peaks on the 1D tracks are greatly reduced (Fig2f, Sfig5c). It is important to note that SCA-seq generates accessibility signals specifically on ligation junctions, which differs from the one-dimensional (1D) signals obtained through ATAC-seq or DNase-seq. The remaining small peaks in the SCA-seq data can be attributed to varied sequencing depth, which is influenced by the enriched spatial interactions occurring in regions of the genome that are enriched with ligation junctions. In general, the SCA-seq 1D peaks are well correlated with the high-confidence peaks from the 1D tracks of ATAC-seq and DNase-seq (Fig2b).

      We apologize for the lack of clarity in our enhancer annotation. The enhancer regions were obtained from The Ensembl Regulatory Build (PMID: 25887522). We have now included this information in the method section (page 24 line 16).

      We thank the reviewer for pointing out this visualization problem. The color scheme has been revised, with purple now representing the inaccessible segments and yellow representing the accessible segments.

      1. For 3D genome analysis, it is important to provide information about data yield from SCA-seq. With 30X sequencing depth, how many contacts are obtained (with long-read sequencing, this should be the number of ligation junctions)? How is the number compared to Hi-C.

      We thank the reviewer for raising this crucial point about the sequencing yield, which we had missed. We have now included this information in the revised result section (page 11, lines 11-14).

      We have checked the public data of a successful HEK293T Hi-C run (PMID: 34400762). The Hi-C experiment produced 699,464,541 reads (105G base), and we obtained 388,031,859 contacts.

      From 100G bases of HEK293T SCA-seq data, we obtained 81,229,369 ligation junctions and 378,848,187 virtual pairwise contacts (3.8M pairwise contacts per Gb). The SCA-seq performance of virtual pairwise contact number per Gb is similar to that of PORE-C (PMID: 35637420).
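
      As a quick arithmetic check of the yield figures quoted above, the per-Gb rate can be recomputed from the stated counts. This is only a sketch; the variable names are ours, and "Gb" is taken to mean 10^9 sequenced bases.

```python
# Counts as stated in the response for the HEK293T SCA-seq run.
sca_gigabases = 100            # sequenced bases, in Gb
sca_junctions = 81_229_369     # ligation junctions
sca_pairwise = 378_848_187     # virtual pairwise contacts

# Yield normalised by sequencing output.
sca_contacts_per_gb = sca_pairwise / sca_gigabases

# Average number of virtual pairwise contacts derived per ligation junction.
pairwise_per_junction = sca_pairwise / sca_junctions

print(f"{sca_contacts_per_gb / 1e6:.2f}M pairwise contacts per Gb")
print(f"{pairwise_per_junction:.2f} pairwise contacts per junction")
```

      The first value rounds to the 3.8M pairwise contacts per Gb reported above.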

      1. Fig 3j. Because SCA-seq only do GpC methylation, the capability to detect the footprint at individual CTCF peaks depends on the density of GpC nearby. Have the authors taken GpC density into account when defining CTCF sites with or without footprint?

      We thank the reviewer for bringing up the concern about the GpC site density at CTCF sites. We would like to highlight that Battaglia et al. have demonstrated the feasibility of identifying transcription factor binding events using GpC labeling (PMID: 36195755). In our study, we have implemented a high-resolution sliding window approach to enhance the sensitivity of CTCF binding detection. We have taken GpC density into account by performing a sliding window (50 bp window, 10 bp step) binomial test on every single molecule overlapping a CTCF site to call accessible regions. The detailed steps for calling accessible regions are described in our answer to the first question. Based on the pattern in Fig3j, we identify a CTCF footprint if accessible regions are called near the CTCF site (at least 20 bp away from the center of the CTCF site) but not on the CTCF site itself.

      To ensure that the GpC site density is sufficient for binomial test of each sliding window of the regions around CTCF site genome-wide, we examined the number of GpC sites in each window. Our analysis revealed that GpC sites are evenly distributed, and over 87% of the windows contain at least 2 GpC sites, which qualifies them for a binomial test (Author response image 1). This indicates that we are able to detect the CTCF footprint at most of the CTCF sites, taking into consideration the GpC density.
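
      The density check described above can be sketched as follows. This is an illustrative sketch, not the analysis code behind Author response image 1: the function names are hypothetical, a GpC site is counted as a "GC" dinucleotide lying fully inside the window, and the example sequence is synthetic.

```python
def gpc_counts(seq, window=50, step=10):
    """Number of GpC ("GC") dinucleotides in each sliding window of seq."""
    seq = seq.upper()
    counts = []
    for start in range(0, len(seq) - window + 1, step):
        w = seq[start:start + window]
        counts.append(sum(1 for i in range(len(w) - 1) if w[i:i + 2] == "GC"))
    return counts


def fraction_testable(counts, min_sites=2):
    """Fraction of windows with at least min_sites GpC sites."""
    if not counts:
        return 0.0
    return sum(c >= min_sites for c in counts) / len(counts)


# Synthetic 100 bp sequence: GpC-rich first half, GpC-free second half.
toy = "GC" * 25 + "A" * 50
counts = gpc_counts(toy)
print(counts, fraction_testable(counts))
```

      Applied to the sequence around every CTCF site, the second function would give the share of windows that qualify for the binomial test.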

      Author response image 1.

      Genome wide GpC site density at CTCF site centered region. Distribution of the number of GpC sites (y-axis) at each 50 bp sliding window region (x-axis) was presented in violin plots.

      1. This study only performs higher resolution chromatin interaction analysis based on individual read concatenates. It is unclear to me if the data have enough depth to perform loop analysis with Hi-C pipelines.

      We thank the reviewer for highlighting this important concern about the depth of data for performing loop analysis. We have performed Aggregate peak analysis for SCA-seq and Hi-C side-by-side using hiccups function in Juicer (v1.9.9) (PMID: 27467249). We acknowledge that the level of loop signal enrichment is relatively weaker (one-fold less) in SCA-seq compared to Hi-C (Fig3h). This difference can be attributed to the lower sequencing yield per Gb in SCA-seq, which resulted in 4.93M pairwise contacts per Gb, compared to the 7M contacts per Gb in Hi-C. Despite this discrepancy, we were still able to observe the clear genome-wide loop enrichment pattern in SCA-seq (Fig3gh).

      1. It appears that SCA-seq is of low efficiency in detecting chromatin interactions. As shown in Fig. S7a, 65.4% of sequenced reads contained only one restriction enzyme (RE) fragment/segment (with no genomic contact), which is much higher than that reported in published PORE-C methods. In addition, Fig. S7g is very confusing and in conflict with Fig. S7a. For example, in Fig. S7g, 21.4% and 22.2% of SCA-seq concatemers contain one and two segments, whereas the numbers are 65.4% and 14.7% in Fig. S7a, respectively. Please explain.

      We apologize for the confusion in sfig7a and sfig7g.

      Sfig7a was intended to illustrate the cardinality count of concatemers with only chr7 segments included, representing the intra-chromosome cardinality instead of the genome-wide cardinality. We have revised sfig7a and its corresponding figure legend to clarify that the figure describes segments of intra-chromosome interactions.

      On the other hand, sfig7g shows the concatemers including both intra-chromosome and inter-chromosome segments, which explains the differences in the percentages of different cardinality ranges compared to Figure S7a. Moreover, the percentages reported in Figure S7g are similar to what is typically reported in PORE-C methods when considering both intra- and inter-chromosome interactions.

      To provide a comprehensive view of the genome-wide concatemer cardinality distribution, we have also included a histogram in Fig3k, which demonstrates the detailed distribution of cardinality for genome-wide concatemers.

      1. I disagree with the rationale of the entire Fig. S9. Biologically there is no evidence that chromatin accessibility will change due to genome interactions (the opposite is more likely), therefore the definition of "expected chromatin accessibility" is hard to believe. If the authors truly believe this is possible, they will need to test their hypothesis by deleting cohesin and check if the chromatin accessibility driven by "power center" are truly abolished. The math in Fig. S9 is also confusing. Firstly, the dimension of the contact matrix in Fig. S9 appears to be wrong, it should have 8 rows. Secondly, I don't understand why the interaction matrix is not symmetric. Third, if I understand correctly the diagonal of the matrix should be all 1, it is also hard to understand why the matrix only has 1, 0 or -1. It appears that the authors assume that the observed accessibility is a simple sum of the expected accessibility of all its interacting regions; this is wrong. In my opinion, the whole Fig. S9 should be deleted unless the authors can make sense of it and ideally also provide more evidence.

      I apologize for any confusion caused by the rationale and figures in Fig. S9. The purpose of the hypothesis presented in the figure is to explore the potential relationship between chromatin accessibility and genome interactions. While there is currently no direct biological evidence supporting this hypothesis, it is a possibility that warrants further investigation.

      Regarding the suggestion to delete Fig. S9 unless more evidence is provided, it is important to note that this paper primarily focuses on the methodology and theoretical framework. Experimental validation of the hypothesis falls outside the scope of this particular study.

      We have made corrections to the schematic matrix in Fig. S9 to accurately represent the dimensions and symmetry. The numbers in the matrix represent mean accessible values of the contacts. Specifically, accessible-accessible contacts are represented by 2, accessible-inaccessible contacts are represented by 0, and inaccessible-inaccessible contacts are represented by -2.

      Minor concerns:

      1. The authors may want to clearly demonstrate the specificity and sensitivity of the ATAC part and the efficiency of the Hi-C part of SCA-seq.

      We appreciate the reviewer’s suggestion to demonstrate the specificity and sensitivity of the ATAC-seq part and the efficiency of the Hi-C part in SCA-seq.

      We considered the non-peak region genomic bins shared by ATAC-seq and DNase-seq as true negatives and the overlapping peaks of ATAC-seq and DNase-seq as true positives. Based on these criteria, the specificity of SCA-seq 1D peaks is calculated as TN / N, where TN represents the number of true negatives (89107) and N represents the sum of true negatives and false positives (89107 + 9345). The resulting specificity is 0.91. The sensitivity of SCA-seq 1D peaks is calculated as TP / P, where TP represents the number of true positives (33190) and P represents the sum of true positives and false negatives (33190 + 11758). The resulting sensitivity is 0.73.

      We evaluate the efficiency of spatial interaction by the restriction enzyme digested fragments recovered in the pairwise contacts that contain ligation junctions. In SCA-seq, the efficiency is calculated as the number of dpnII digested fragments recovered by pairwise contacts (5625908) divided by the total number of in silico dpnII digested fragments (7127633). The resulting efficiency is 0.79.

      We have now included this information in the revised result section (page 8 lines 15-18).
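
      For transparency, the three metrics defined above can be recomputed directly from the stated counts. The snippet below is a small sketch with our own variable names; it only restates the formulas TN / (TN + FP), TP / (TP + FN), and recovered / total fragments.

```python
# Counts as stated in the response.
tn, fp = 89107, 9345       # true negatives, false positives
tp, fn = 33190, 11758      # true positives, false negatives
recovered_fragments = 5625908   # DpnII fragments recovered in pairwise contacts
total_fragments = 7127633       # in silico DpnII-digested fragments

spec = tn / (tn + fp)      # specificity of SCA-seq 1D peaks
sens = tp / (tp + fn)      # sensitivity of SCA-seq 1D peaks
eff = recovered_fragments / total_fragments  # spatial-interaction efficiency

print(f"specificity={spec:.3f} sensitivity={sens:.3f} efficiency={eff:.3f}")
```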

      1. Fig 4g, colors with apparent differences might be used to clearly discriminate the three types of interactions (I-I, I-A and A-A).

      We appreciate the reviewer for bringing up the issue regarding the visualization in Fig 4g. The color scheme has been revised, with purple now representing I-I interactions, orange representing I-A interactions, and red representing A-A interactions. We believe that these modifications have significantly improved the clarity.

      1. Fig. 4c, when fitting an unknown curve, R-square becomes meaningless.

      We appreciate the reviewer for pointing out the issue regarding the interpretation of R-square. We have removed the R-square value from Fig. 4c.

      1. Fig 5a, "oCGIs comprised 65% CGIs that did not directly contact enhancers or promoters". Should it be "oCGIs comprised 65% of all CGIs"?

      We appreciate the reviewer for pointing out the clarification needed in Fig 5a. We have revised the phrase in the figure legend to accurately state that “oCGIs comprised 65% of all CGIs”. Thank you for bringing this to our attention.

      1. Page 15 lines 5-8, "By examining the methylation status on reads, as expected, these read segments demonstrated lower CpG methylation and higher chromatin accessibility (GpC methylation), which further supports their roles in gene activation (Fig 5b)". This statement seems to be inconsistent with the figure legend.

      We appreciate the reviewer for pointing out the inconsistency in the legend of Fig 5b. We have revised the legend of Fig 5b to accurately highlight the low CpG methylation on oCGI regions. Thank you for bringing this to our attention.

      1. Language editing and proof reading are needed.

      I apologize for any errors or mistakes in the language. We have carefully reviewed the manuscript and made the necessary language editing and proofreading revisions to ensure its quality for publication.

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1:

      (1) The mechanism by which fenofibrate rescues memory loss in Kallistatin-transgenic mice is unclear. As a PPARalpha agonist, does fenofibrate target the Kallistatin pathway directly or indirectly? Please provide a discussion based on literature supporting either possibility.

      Thank you for your important suggestion. Fenofibrate is indeed a PPARα agonist. Fenofibrate has been shown to protect memory and cognitive function by downregulating α- and β-secretases[1]. Activation of PPARα can reduce Aβ plaques by upregulating ADAM10, thereby protecting memory and cognition[2]. However, Fenofibrate can also act through a PPARα-independent pathway[3]. In our previous study, we showed that Fenofibrate can directly down-regulate the expression of Kallistatin in hepatocytes[4]. Here, our findings showed that Kallistatin induces cognitive memory deterioration by increasing amyloid-β plaque accumulation and tau protein hyperphosphorylation (Fig. 1-3), and that Fenofibrate can directly down-regulate the serum level of Kallistatin (Fig. 8G). In addition, the expression of PPARα in the hippocampal tissue of Kallistatin-transgenic (KAL-TG) mice showed no significant difference compared to the WT group (Author response image 1A-B). Therefore, we think Fenofibrate may improve memory and cognitive function at least in part through a PPARα-independent effect, which provides a new mechanism of Fenofibrate in AD with elevated Kallistatin levels.

      Author response image 1.

      (A-B) Protein levels of PPARα in hippocampal tissue were tested by western blot analysis, and the results were statistically analyzed.

      (2) The current study exclusively investigated the hippocampus. What about other cognitive memory-related regions, such as the prefrontal cortex? Including data from these regions or discussing the possibility of their involvement could provide a more comprehensive understanding of the role of Kallistatin in memory impairment.

      Thank you for your suggestion. In addition to hippocampal tissue analysis, we performed immunohistochemical detection of Aβ and phosphorylated Tau levels in the prefrontal cortex. Our findings revealed that KAL-TG mice exhibited significantly elevated Aβ and phosphorylated Tau levels in the prefrontal cortex compared to WT mice. These observations align with the pathological patterns observed in hippocampal tissues, demonstrating consistent neurodegenerative pathology across both the hippocampus and prefrontal cortex. The data are shown below.

      Author response image 2.

      (A-B) Immunofluorescence staining of Aβ and phosphorylated tau (p-tau T231) was carried out in the prefrontal cortex tissue of KAL-TG and WT mice. Error bars represented the Standard Error of Mean (SEM); **p < 0.01. Scale bar, 100 μm.

      (3) Fenofibrate rescued phenotypes in Kallistatin-transgenic mice while rosiglitazone, a PPARgamma agonist, did not. This result contradicts the manuscript's emphasis on a PPARgamma-associated mechanism. Please address this inconsistency.

      Thank you for the reminder. In fact, our results showed a trend towards improved memory and cognitive function in KAL-TG mice treated with Rosiglitazone, although its effect was not as significant as that of Fenofibrate. Several studies have reported that Rosiglitazone has a beneficial effect on memory and cognitive function in mouse models of dementia, but these studies involved treatment periods of 3 to 4 months[5, 6], whereas our treatment period was only one month. Extending the treatment period with Rosiglitazone may result in a more pronounced improvement. In addition, as discussed above, Fenofibrate may act through a PPAR-independent pathway by downregulating Kallistatin directly, and therefore show stronger effects.

      (4) Most of the immunohistochemistry images are unclear. Inserts have similar magnification to the original representative images, making judgments difficult. Please provide larger inserts with higher resolution.

      According to your suggestion, we provided larger inserts with higher resolution in Fig 3A and Fig 4B, as follows:

      (5) The immunohistochemistry images in different figures were taken from different hippocampal subregions with different magnifications. Please maintain consistency, or explain why CA1, CA3, or DG was analyzed in each experiment.

      Thank you for your advice. The trends of changes in the different brain regions (including CA1, CA3, and DG) are consistent. Following your suggestion, we have now replaced the images of the different hippocampal subregions with images of the DG region and re-conducted the statistical analysis in Fig 5I & 6C, as follows. Because significant Aβ deposition was found only in the CA1 region, Fig 2A was not replaced.

      (6) Figure 5B is missing a title. Please add a title to maintain consistency with other graphs.

      Thanks for your suggestion. We have added a title to Figure 5B, as follows:

      (7) Please list statistical methods used in the figure legends, such as t-test or One-way ANOVA with post-hoc tests.

      Thanks for your suggestion. We have listed the statistical methods used in the figure legends.

      Reviewer #2:

      (1) It was suggested that Kallistatin is primarily produced by the liver. The study demonstrates increased Kallistatin levels in the hippocampus tissue of AD mice. It would be valuable to clarify if Kallistatin is also increased in the liver of AD mice, providing a comprehensive understanding of its distribution in disease states.

      Thank you for your suggestion. We extracted liver tissue from APP/PS1 mice, and the Western blot results indicated that the expression of Kallistatin in the liver of APP/PS1 mice was elevated, as follows:

      Author response image 3.

      (A-B) Protein levels of Kallistatin in the liver tissue were tested by western blot analysis, and the results were statistically analyzed. Error bars represent the Standard Error of Mean (SEM); **p < 0.01.

      (2) Does Kallistatin interact directly with Notch1 ligands? Clarifying this interaction mechanism would enhance understanding of how Kallistatin influences Notch1 signaling in AD pathology.

      Thank you for your suggestion. This study reveals that Kallistatin directly binds to Notch1 and contributes to the activation of the Notch1-HES1 signaling pathway. Whether Kallistatin can bind to the ligands of Notch1 requires further investigation in future studies. Our preliminary data showed that Jagged1 was upregulated in the hippocampal tissues of KAL-TG mice by qPCR and Western blot analyses.

      Author response image 4.

      Kallistatin promoted Notch ligand Jagged1 expression to activate Notch1 signaling. (A) QPCR analysis of Notch ligands (Dll1, Dll3, Jagged1, Jagged2) expression in the 9 months hippocampus tissue. (B) Western blotting analysis of Notch ligand Jagged1 expression in the hippocampus tissue. (C) Western blotting analysis of Notch ligand Jagged1 expression in the hippocampus primary neuron. β-actin served as the loading control. Error bars represented the Standard Error of Mean (SEM); *p < 0.05.

      (3) Is there any observed difference in AD phenotype between male and female Kallistatin-transgenic (KAL-TG) mice? Including this information would address potential gender-specific effects on cognitive decline and pathology.

      Thank you for your suggestion. Actually, we have previously used female mice for Morris Water Maze experiments, and the results showed that both male and female KAL-TG mice exhibited a phenotype of decreased memory and cognitive function compared to the gender-matched WT group, while there was no significant difference between male and female KAL-TG mice as follows:

      Author response image 5.

      (A-D) Behavioral performance was assessed through the Morris water maze test. (A) The escape latency time over days 1-5. (B-D) Cognitive function was evaluated by the spatial probe test on day 6, analyzing for each group of mice the number of platform crossings (B), the percentage of time spent in the target area (C), and the path trace heatmap (D). Error bars represent the Standard Error of Mean (SEM); F represents Female, M represents Male, and TG refers to KAL-TG; *p < 0.05.

      (4) It is recommended to include molecular size markers in Western blots for clarity and accuracy in protein size determination.

      Thank you for your reminder. We have shown the molecular weight on each blot.

      (5) The language should be revised for enhanced readability and clarity, ensuring that complex scientific concepts are communicated effectively to a broader audience.

      Following your suggestion, we have polished the article to enhance readability and clarity.

      Reviewer #3:

      (1) The authors did not illustrate whether the protective effect of fenofibrate against AD depends on Kallistatin.

      Thank you for your important suggestion. Fenofibrate is indeed a PPARα agonist. Fenofibrate has been shown to protect memory and cognitive function by downregulating α- and β-secretases[1]. Activation of PPARα can reduce Aβ plaques by upregulating ADAM10, thereby protecting memory and cognition[2]. However, Fenofibrate can also act through a PPARα-independent pathway[3]. In our previous study, we showed that Fenofibrate can directly down-regulate the expression of KAL in hepatocytes[4]. Here, our findings showed that Kallistatin induces cognitive memory deterioration by increasing amyloid-β plaque accumulation and tau protein hyperphosphorylation (Fig. 1-3), and that Fenofibrate can directly down-regulate the serum level of Kallistatin (Fig. 8G). In addition, the expression of PPARα in the hippocampal tissue of Kallistatin-transgenic (KAL-TG) mice showed no significant difference compared to the WT group (Author response image 1A-B). Therefore, we think Fenofibrate may improve memory and cognitive function at least in part through downregulating Kallistatin. To conclusively determine whether fenofibrate’s therapeutic effects depend on Kallistatin, future studies should employ Kallistatin-knockout AD animal models to evaluate fenofibrate’s impact on cognitive and memory functions. These investigations will further clarify the mechanistic underpinnings of fenofibrate in AD therapy.

      (2) The conclusions are supported by the results, but the quality of some results should be improved.

      Thank you for your kind suggestion. We have updated the magnified images in the immunohistochemistry section of the article, ensuring that the fields of view for the immunohistochemistry are within the same brain region, and have shown the molecular weights on each blot. Additionally, we have conducted a quantitative analysis of the protein levels in the Western blot results presented in Fig 6 & 8.

      (3) Figures 2c, 3c, and 4a present the Western blot results of p-tau from mice of different ages on one membrane, showing age-dependent expression. The authors analyzed the results of mice of different ages in one statistical chart, which will create ambiguity with the results of the representative images. For example, the expression of p-tau 396 in the blot was lower in the WT-12 M group than in the WT-9 M group (Figure 3c), which is contradictory to the statistical analysis.

      Thank you for your reminder. The statistical presentation here does not match the figure. At that time, the WB experiments for the hippocampal tissue at each age group were conducted separately, and it was not appropriate to compare different age groups together. This graph cannot illustrate age dependency. We have replaced the statistical graph in Figure 3B&D, as follows:

      (4) Figure 4b shows that KAL-TG-9 M had greater BACE1 expression than KAL-TG-12 M. Furthermore, the nuclei are not uniformly colored. Please provide more representative figures.

      Thank you for your reminder. Due to the fact that these sets of data were not processed in a single batch, the ages in the graph are not comparable. Regarding the issue of inconsistent nuclear staining, we have provided another representative image from this group, as follows:

      (5) Unclear why the BACE1 and Aβ levels seems less with KAL+shHES1 treatment than GFP+shNC treatment (Fig 6H)? This finding contradicts the conclusion.

      Thank you for your reminder. This experiment was repeated three times, and here we have presented the representative results along with the corresponding statistical data. There is no difference between the KAL+shHES1 treatment and the GFP+shNC treatment. We have updated Fig. 6H.

      (6) The Western blot results in figure 6e-h, 8h-i, and S3-S5 were not quantified.

      Thank you for your reminder. We have added statistical graphs and original images of the pictures in figure 6e-h, 8h-i, and S3-S5.

      (7) The authors did not provide the detection range of the Aβ42 ELISA kit.

      Thank you for your suggestion. The Aβ42 ELISA kit is from the IBL, with the product number 27721. Its standard range is 1.56 - 100 pg/mL, and the sensitivity is 0.05 pg/mL.

      (8) The authors did not specify the sex of the mice. This is important since sex could have had a dramatic impact on the results.

      Thank you for your suggestion. The results we present in the text are all statistically obtained from male mice. Actually, we have previously used female mice for Morris Water Maze experiments, and the results showed that both male and female KAL-TG mice exhibited a phenotype of decreased memory and cognitive function compared to the gender-matched WT group, while there was no significant difference between male and female KAL-TG mice (Author response image 5).

      Minor:

      (1) In Figure 2b, there are no units for the vertical coordinates of the statistical graph.

      Thank you for your reminder. We have added units for the vertical coordinates in Figure 2b.

      (2) In Figure 2c, the left Y-axis title is lacking in the statistic chart.

      Thank you for your reminder. We have added the left Y-axis title in the statistic chart.

      References:

      (1) Assaf N, El-Shamarka ME, Salem NA, Khadrawy YA, El Sayed NS. Neuroprotective effect of PPAR alpha and gamma agonists in a mouse model of amyloidogenesis through modulation of the Wnt/beta catenin pathway via targeting alpha- and beta-secretases. Progress in Neuro-Psychopharmacology and Biological Psychiatry 2020, 97: 109793.

      (2) Rangasamy SB, Jana M, Dasarathi S, Kundu M, Pahan K. Treadmill workout activates PPARα in the hippocampus to upregulate ADAM10, decrease plaques and improve cognitive functions in 5XFAD mouse model of Alzheimer’s disease. Brain, Behavior, and Immunity 2023, 109: 204-218.

      (3) Yuan J, Tan JTM, Rajamani K, Solly EL, King EJ, Lecce L, et al. Fenofibrate Rescues Diabetes-Related Impairment of Ischemia-Mediated Angiogenesis by PPARα-Independent Modulation of Thioredoxin-Interacting Protein. Diabetes 2019, 68(5): 1040-1053.

      (4) Fang Z, Shen G, Wang Y, Hong F, Tang X, Zeng Y, et al. Elevated Kallistatin promotes the occurrence and progression of non-alcoholic fatty liver disease. Signal Transduct Target Ther 2024, 9(1): 66.

      (5) Nelson ML, Pfeifer JA, Hickey JP, Collins AE, Kalisch BE. Exploring Rosiglitazone's Potential to Treat Alzheimer's Disease through the Modulation of Brain-Derived Neurotrophic Factor. Biology (Basel) 2023, 12(7).

      (6) Pedersen WA, McMillan PJ, Kulstad JJ, Leverenz JB, Craft S, Haynatzki GR. Rosiglitazone attenuates learning and memory deficits in Tg2576 Alzheimer mice. Exp Neurol 2006, 199(2): 265-273.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Weiler, Teichert, and Margrie systematically analyzed long-range cortical connectivity, using a retrograde viral tracing strategy to identify layer and region-specific cortical projections onto the primary visual, primary somatosensory, and primary motor cortices. Their analysis revealed several hundred thousand inputs into each region, with inputs originating from almost all cortical regions but dominated in number by connections within cortical sub-networks (e.g. anatomical modules). Generally, the relative areal distribution of contralateral inputs followed the distribution of corresponding ipsilateral inputs. The largest proportion of inputs originated from layer 6a cells, and this layer 6 dominance was more pronounced for contralateral than ipsilateral inputs, which suggests that these connections provide predominantly feedback inputs. The hierarchical organization of input regions was similar between ipsi- and contralateral regions, except for within-module connections, where ipsilateral connections were much more feed-forward than contralateral. These results contrast earlier studies which suggested that contralateral inputs only come from the same region (e.g. V1 to V1) and from L2/3 neurons. Thus, these results provide valuable data supporting a view of interhemispheric connectivity in which layer 6 neurons play an important role in providing modulatory feedback.

      The conclusions of this paper are mostly well-supported by the data and analysis, but additional consideration of possible experimental biases is needed.

      We thank the reviewer for their positive feedback on our manuscript.

      Further discussion or analysis is needed about possible biases in uptake efficiency for different cell types. Is it possible that the nuclear retro-AAV has a tropism for layer 6 axons? Quantitative comparisons with results obtained with alternative methods such as rabies virus (Yao et al., 2023) or anterograde tracing (Harris et al., 2019) may be helpful for this.

      We appreciate this technical comment. For the reasons indicated below, we are confident that our AAV approach successfully and rather comprehensively labels inputs to the three target areas. Firstly, in the brains in which we injected our retrograde nuclear-AAV tracer into VISp, SSp-bfd or MOp, we found several instances where layer 5 and/or layer 2/3 was the dominant cortical projection layer (please see e.g. Figure 3 heatmaps). This was true for both ipsilateral and contralateral projections.

      Secondly, by way of comparison, Yao et al., 2023 injected rabies virus into VISp (but not into SSp-bfd or MOp), and their results show notable similarities to ours: 1) They show that contralateral inputs to VISp (and higher visual areas) were mainly located in Layers 5 and 6. 2) Retrogradely labelled neurons in higher visual areas revealed an anatomical hierarchy that reflects the known functional hierarchy of the mouse cortical visual system and that shown by our retro-AAV approach. Thus, as AAV- and rabies-based tracing lead to similar results, this is further evidence against bias via tropism of our AAV tracer. That said, direct comparisons between our study and the Yao et al., 2023 study should be viewed with some caution, since Yao et al. injected rabies virus into specific Cre-driver lines in which the rabies virus targets individual genetically defined cell types in specific layers. Importantly, because no specific Cre-driver line is available for them, L6 cortico-cortical (L6 CC) cells could not be targeted by their approach. Thus, the dataset in Yao et al., 2023 overlooks the contribution of L6 CCs.

      Thirdly, in a recent study (Weiler et al., 2024) we found that in a specific pathway (SSp-bfd → VISp) both retro-AAV and the more traditional non-viral tracer cholera toxin subunit B (CTB) identified neurons in Layer 6 as the main source of projection neurons. The same result for the same pathway was shown by Bieler et al., 2017 using Fluorogold for retrograde tracing. Thus, the described dominance of Layer 6 projection neurons in specific pathways is likely not the result of a tropism of retro-AAV tracers.

      We have also extended the summary of these points in the Discussion section of our revised manuscript (e.g. lines 457-463).

      Quantitative analysis of the injection sites should be included to account for possible biases. For example, L6 neurons are known to be the main target of contralateral inputs into the visual cortex (Yao et al., 2023). Thus, if the injections are biased towards or against layer 6 neurons, this may change the layer distribution of retrogradely labeled input cells. Comparison across biological replicates may help reveal sensitivity to particular characteristics of the injections.

      In response to the reviewer's feedback, we have now quantified the injection volume per cortical layer, as shown in the revised Fig. S3D. Our results indicate that the injections were not biased toward Layer 6. Instead, the injected tracer volumes in Layers 1, 4, 5, and 6 were similar across all animals and injected areas. However, the injected tracer volume in Layer 2/3 tended to be higher than in other layers. Despite this, the proportion of input neurons located in Layer 2/3 for most of the cortical projection areas was consistently lower than that from Layer 6. These findings provide strong evidence against injection bias towards L6 inputs.

      The possibility of labelling axons of passage within the white matter should be addressed. This could potentially lead to false positive connections, contributing to the broad connectivity from most cortical regions that were observed.

      For clarification, please see Fig. S2B in our revised manuscript. In this panel we plot the average percentage volume of the viral boli in the target areas and in all other nearby structures, including the white matter. The percentage of virus injected into the white matter (WM) was 0.0824 ± 0.0759% for VISp and 0.0650 ± 0.0481% for SSp-bfd injections. Notably, injections into MOp showed no leakage into white matter (0%). These minimal volumes of virus in the white matter are unlikely to significantly influence the observed profile of widespread connectivity. We have added a sentence to the Results section (lines 84-86) stating that we only used brains in which transduction of the white matter was below 0.1%.

      Reviewer #2 (Public review):

      Summary:

      Weiler et al use retrograde tracers, two-photon tomography, and automatic cell detection to provide a detailed quantitative description of the laminar and area sources of ipsi- and contralateral cortico-cortical inputs to two primary sensory areas and a primary motor area. They found considerable bilateral symmetry in the areas providing cortico-cortical inputs. However, although the same regions in both hemispheres tended to supply inputs, a larger proportion of inputs from contralateral areas originated from deeper layers (L5 and L6).

      Strengths:

      The study applies state-of-the-art anatomical methods, and the data is very effectively presented and carefully analyzed. The results provide many novel insights into the similarities and differences of inputs from the two hemispheres. While over the past decade there have been many studies quantitatively and comprehensively describing cortico-cortical connections, by directly comparing inputs from the ipsi and contralateral hemispheres, this study fills in an important gap in the field. It should be of great utility and an important reference for future studies on inter-hemispheric interactions.

      We thank the reviewer for this encouraging feedback on our manuscript.

      Weaknesses:

      Overall, I do not find any major weakness in the analyses or their interpretation. However, one must keep in mind that the study only analyses inputs projecting to three areas. This is not an inherent flaw of the study; however, it warrants caution when extrapolating the results to callosal projections terminating in other areas. As inputs to two primary sensory areas and one primary motor area are studied, some of the conclusions could potentially be different for inputs terminating in high-order sensory and motor areas. Given that primary areas were injected, there are few instances of feedforward connections sampled in the ipsilateral hemisphere. The study finds that while ipsi-projections from the visual cortex to the barrel cortex are feedforward given their fILN values, those from the contralateral visual cortex are feedback instead. One is left to wonder whether this is due to the cross-modal nature of these particular inputs and whether the same rule (that contralateral inputs consistently exhibit feedback characteristics regardless of the hierarchical relationship of their ipsilateral counterparts with the target area) would also apply to feedforward inputs within the same sensory cortices.

      We acknowledge that what we find for primary sensory and motor target areas may not hold for other functionally different areas, such as the anterior cingulate cortex, retrosplenial cortex or frontal lobe, that might be expected to receive strong feedforward cortical input. To begin to understand the organization of the global cortical input, we have, however, first explored primary sensory and motor areas. We have now added a sentence to the Discussion section of our manuscript that highlights the importance of investigating the hierarchical organization of intra- and interhemispheric input onto higher cortical areas or within subregions of a given sensory area.

      Another issue that is left unexplored is that, in the current analyses, the barrel and primary visual cortex are each analyzed as a uniform structure. It is well established that both the laminar sources of callosal inputs and their terminations differ in the monocular and binocular areas of the visual cortex (border with V2L). Similarly, callosal projections terminating at the border of S1 (a row of whiskers) differ from those terminating in other parts of S1. Thus, some of the conclusions regarding the laminar sources of callosal inputs might depend on whether one is analyzing inputs terminating or originating in these border regions.

      The aim of the present study was to analyse the global projectome to the VISp, SSp-bfd and MOp, irrespective of which subregions were included. Importantly, we purposely injected rather large bolus volumes to achieve large sample sizes of target neurons in each cortical layer.  For SSp-bfd, we utilised our previously reconstructed barrel map (Weiler et al., 2024) to precisely map our viral injection sites onto the barrels (Author response image 1). Analysis revealed that the six injection sites consistently encompassed 7–13 barrels (Author response image 1, three exemplary injection sites). Additionally, we determined the centres of mass for each injection site and mapped them onto the barrel map. Four of the injection sites were located in the lateral part of SSp-bfd, two in the central region, and none in the medial part. Notably, the injection sites within SSp-bfd exhibited significant overlap. As a result, a selective analysis of callosal projections targeting these injection sites would likely not yield distinct projection patterns, as the projectomes would inevitably include projections to surrounding barrels, leading to contamination.

      Author response image 1.

      Left: Exemplary injection sites mapped onto the 3D barrel map of SSp-bfd within the Allen Mouse Brain Atlas. Barrels were reconstructed using specialized software as described previously (Weiler et al., 2024). Right: Centres of mass of all SSp-bfd injection sites mapped onto the 3D barrel map.

      Because we covered a significant proportion of the respective target primary sensory area, further subdivision of these data is not possible and would require more tailored injections into clearly defined subareas. Investigating the separate projectomes onto these subregions (e.g. onto V1M and V1B) remains an important and interesting research question that we will, at least in part, address in a future study.

      Finally, while the paper emphasizes that projections from L6 "dominate" intra and contralateral cortico-cortical inputs, the data shows a more nuanced scenario. While it is true that the areas for which L6 neurons are the most common source of cortico-cortical projections are the most abundant, the picture becomes less clear when considering the number of neurons sending these connections. In fact, inputs from L2/3 and L5 combined are more abundant than those from L6 (Figure 3B), challenging the view that projections from L6 dominate ipsi- and contralateral projecting cortico-cortical inputs.

      We agree that, in the case of the barrel cortex, layer 5 contributes significantly in terms of the number of brain areas projecting from within the ipsilateral and contralateral hemispheres. We have replaced the term “dominates” in the title, abstract and in the manuscript where relevant.

      Recommendations for the authors:  

      Reviewer #1 (Recommendations for the authors):

      The sections analyzing the role of L6 towards feedback (pg. 11-13, Figure 6) were a bit verbose and confusing to me. Three possible models are proposed:

      (1) a decrease in L23 projections, (2) an increase in L56 projections, or (3) both.

      However, what is being quantified appears to be the fraction of inputs, with L23, L5, and L6 summing to 1. Thus, a decrease in L23 would necessarily result in an increase in L56 projections. It seems like it would make more sense to quantify the percent change in the total number of inputs (rather than fractional) from each layer so that the 3 models are actually independent possibilities.

      The issue with the suggested analysis is that, with one exception (one area projecting to MOp), the number of projection neurons in contralateral areas is always ~60-80% lower compared to their ipsilateral counterparts. Consequently, this is also true for the number of projection neurons in the different cortical layers. Thus, quantifying the percentage change from the ipsilateral to the contralateral hemisphere in the total number of inputs from each layer will always result in negative values. 

      Nevertheless, we addressed the reviewer’s issue by calculating the preservation index (1 − (ipsi − contra)/(ipsi + contra)) for the sensory-motor areas independently for the absolute number of neurons within L2/3, 5 and 6 for the cortical areas projecting to VISp, SSp-bfd and MOp (see Author response image 2). When analysing the shift from the ipsilateral to the contralateral hemisphere, we observed that significantly more projection neurons were preserved in L6 compared to L2/3 for VISp and SSp-bfd. This shows that the number of L6 projection neurons declines less from the ipsilateral to the contralateral hemisphere compared to L2/3. However, our focus was on the fraction of projection neurons within each layer relative to the other layers per hemisphere (see Fig.6 of our manuscript). This measure is critical for distinguishing between feedforward and feedback connectivity. Calculating the change for each layer independently unfortunately does not provide insights into this comparison, as it does not capture the relative distribution of projection neurons across layers, which is central to our analysis. Therefore, we chose to present the data as layer fractions normalised within each hemisphere separately, enabling a comparison of relative changes between hemispheres, as shown in Fig.6 in the manuscript. We agree that with our approach a decrease in the fraction of L2/3 neurons would necessarily lead to an increase in the fraction of L5+6 neurons. However, as we analysed the fractional change for L5 and L6 separately, we found that the fraction of projection neurons in L5 generally showed only minor changes, while the fraction of L6 projection neurons increased substantially (Fig.6C). In addition, excluding L5 from the ipsi- or contralateral default network had significant effects on the fILN in only a relatively small number of projection areas. Excluding L6 resulted in significant changes in many more projection areas than layer 5.
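
      The distinction drawn above between a per-layer preservation index and within-hemisphere layer fractions can be illustrated with a minimal Python sketch. It assumes the index is 1 − (ipsi − contra)/(ipsi + contra), so that 1 means equal counts in both hemispheres and 0 means no contralateral neurons. The function names and all cell counts are hypothetical and are not taken from the dataset.

```python
def preservation_index(ipsi: float, contra: float) -> float:
    """Assumed form: 1 - (ipsi - contra) / (ipsi + contra)."""
    total = ipsi + contra
    if total == 0:
        raise ValueError("no labelled neurons in either hemisphere")
    return 1 - (ipsi - contra) / total


def layer_fractions(counts: dict[str, float]) -> dict[str, float]:
    """Normalise per-layer counts so they sum to 1 within one hemisphere."""
    total = sum(counts.values())
    return {layer: n / total for layer, n in counts.items()}


# Hypothetical neuron counts for one projection area (not real data).
ipsi = {"L2/3": 1000, "L5": 800, "L6": 1200}
contra = {"L2/3": 200, "L5": 240, "L6": 480}

# Per-layer index: computed independently for each layer from absolute counts.
index = {layer: preservation_index(ipsi[layer], contra[layer]) for layer in ipsi}

# Layer fractions: computed relative to the other layers in the same hemisphere.
ipsi_frac = layer_fractions(ipsi)
contra_frac = layer_fractions(contra)

for layer in ipsi:
    print(layer, round(index[layer], 3),
          round(ipsi_frac[layer], 3), round(contra_frac[layer], 3))
```

      In this made-up example every layer loses neurons contralaterally, yet the L6 fraction rises while the L5 fraction barely changes. That is the kind of pattern the fractional measure is meant to expose.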

      Author response image 2.

      Preservation index for L2/3, L5 and L6 of the 24 sensory-motor areas projecting onto the three target areas VISp, SSp-bfd and MOp.

      Reviewer #2 (Recommendations for the authors):

      I feel that there are a few conclusions that could be strengthened in the paper:

      (1) The laminar sources of callosal inputs and their terminations differ in the monocular and binocular areas of the visual cortex (border with V2L). Similarly, callosal inputs are different close to the border of S1 with S2 than in the rest of the barrel cortex. From the methods section and Figure S2, it seems that some injections targeted the V1 binocular zone while others were aimed at the monocular zone. Thus, it would be of interest to compare the laminar distribution and fILN of the contra inputs to the binocular and monocular zones (and S1 border vs the rest, if possible within this dataset).

      Please see the answer for the reviewer’s second point in the public review (above).

      (2) The results are currently a bit unclear on whether the contra inputs reflect the cortical hierarchy. Figure 4E-F makes it clear that the ipsi and contra fILNs do not always match. However, it seems from the plots in Figure 4D and Figure S6 that, while the contra fILN values are always higher, there might be a correlation between the ipsi and contra fILN. This could be addressed by directly plotting contra vs ipsi fILN.

      Similarly, it would be useful to directly address if there is any hint of the visual hierarchy, as calculated in Figure S5 for the contra inputs.

      Regarding the first point of the reviewer: We appreciate this comment. We do indeed find a positive correlation between the ipsilateral and contralateral fILN across the individual cortical areas for all three targets (please see Author response image 3 below). This is an interesting observation that indicates a high degree of preservation of the rank order of the anatomical hierarchy within the input arising from both hemispheres. We have included this as new Figure 4F in the manuscript and added a sentence in the Results (lines 282-288).
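
      A comparison of this kind can be sketched in a few lines of Python. The sketch assumes that fILN is the fraction of infragranular layer neurons, (L5 + L6)/(L2/3 + L5 + L6), and uses a plain Pearson correlation. The area names and counts are hypothetical placeholders, and the statistic actually used for the manuscript figure may differ.

```python
from math import sqrt


def filn(l23: float, l5: float, l6: float) -> float:
    """Assumed definition: infragranular fraction of labelled neurons."""
    return (l5 + l6) / (l23 + l5 + l6)


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


# Hypothetical (L2/3, L5, L6) counts per projection area and hemisphere.
areas = {
    "area_A": {"ipsi": (500, 200, 300), "contra": (100, 80, 220)},
    "area_B": {"ipsi": (300, 300, 400), "contra": (40, 60, 200)},
    "area_C": {"ipsi": (700, 150, 150), "contra": (150, 50, 100)},
}

ipsi_filn = [filn(*a["ipsi"]) for a in areas.values()]
contra_filn = [filn(*a["contra"]) for a in areas.values()]

# A positive r means the rank order of areas is similar in both hemispheres.
r = pearson_r(ipsi_filn, contra_filn)
print(round(r, 3))
```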

      Regarding the second point of the reviewer: For visual hierarchy, although weaker, we find that the hierarchical ranking of the higher cortical visual areas is preserved for the contralateral hemisphere (see Author response image 3 below). 

      Author response image 3.

      Rank ordered average fILN values (± sem) of higher visual cortical areas of the ventral and dorsal visual stream for the ipsilateral and contralateral hemisphere.

      (3) I find the emphasis in the title and other parts of the paper on Layer 6 corticocortical cells dominating the anatomical organization of intra and interhemispheric feedback a bit of an overstatement. While it is true that the areas for which L6 is the most abundant source of cortico-cortical projections are the most abundant (Figure 3C), when just focusing on the number of neurons sending corticocortical connections (Figure 3B), this is less clear. Ipsi connections are roughly divided 1/3, 1/3, 1/3 between L2/3, L5 and L6. In the contra, while projections from L6 neurons are the most abundant, they are not a majority and are fewer than those of L2/3 and L5 together. I suggest revising the statement about L6 cells dominating cortico-cortical connections to more accurately reflect these nuances.

      (4) The observations from Figure 3 discussed above suggest that L6 inputs dominate in areas with less abundant projections to the injected areas. Is this the case? Is the fraction of L6 inputs inversely correlated with the number of inputs from that area?

      Please see the following correlation plots for the total number of inputs versus the fraction of L6 inputs per area for both the ipsilateral and contralateral hemisphere. We do find on the ipsilateral hemisphere a negative correlation between the total number of inputs and the L6 input fraction for VISp and to a lesser degree for SSp-bfd. Interestingly, we find the opposite correlation for the ipsilateral MOp, contralateral VISp, SSp-bfd and MOp (Author response image 4, Author response table 1). While this is an interesting finding, the correlations often appeared to be weak and often absent within the individual animals and across the three target areas (Author response table 1). Thus, these correlations are seemingly not a general feature of cortical connectivity.

      Author response image 4.

      Total number of cells versus fraction of cells within L6 per cortical brain area (average across animals) for the ipsilateral (top) and contralateral (bottom) hemisphere for the three target areas VISp, SSp-bfd and MOp.

      Author response table 1: Respective correlations between total numbers of cells and fraction of cells within L6 per cortical brain area for the ipsilateral and contralateral hemisphere for the three target areas (significant correlations highlighted with green).

      Minor issues:

      (5) Where was the mouse in Figure 3A injected?

      In this exemplary mouse the retrograde tracer was injected into VISp. We added this information in the Figure legend of Figure 3A1. 

      (6) Clarify in panel 4F that the position of the circle corresponds to the area location.

      Done as suggested. 

      References

      Bieler M, Sieben K, Cichon N, Schildt S, Röder B, Hanganu-Opatz IL. 2017. Rate and Temporal Coding Convey Multisensory Information in Primary Sensory Cortices. eNeuro 4. doi:10.1523/ENEURO.0037-17.2017

      Weiler S, Rahmati V, Isstas M, Wutke J, Stark AW, Franke C, Graf J, Geis C, Witte OW, Hübener M, Bolz J, Margrie TW, Holthoff K, Teichert M. 2024. A primary sensory cortical interareal feedforward inhibitory circuit for tacto-visual integration. Nat Commun 15:3081. doi:10.1038/s41467-024-47459-2

      Yao S, Wang Q, Hirokawa KE, Ouellette B, Ahmed R, Bomben J, Brouner K, Casal L, Caldejon S, Cho A, Dotson NI, Daigle TL, Egdorf T, Enstrom R, Gary A, Gelfand E, Gorham M, Griffin F, Gu H, Hancock N, Howard R, Kuan L, Lambert S, Lee EK, Luviano J, Mace K, Maxwell M, Mortrud MT, Naeemi M, Nayan C, Ngo N-K, Nguyen T, North K, Ransford S, Ruiz A, Seid S, Swapp J, Taormina MJ, Wakeman W, Zhou T, Nicovich PR, Williford A, Potekhina L, McGraw M, Ng L, Groblewski PA, Tasic B, Mihalas S, Harris JA, Cetin A, Zeng H. 2023. A whole-brain monosynaptic input connectome to neuron classes in mouse visual cortex. Nat Neurosci 26:350–364. doi:10.1038/s41593-022-01219-x

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      The authors analyzed the causative association between circulating immune cells and periodontitis, and reported three risk immune cells related to periodontitis. The significance of the findings is fundamental, which substantially advances our understanding of periodontitis. The strength of evidence is convincing.

      Reviewer #1 (Public Review):

      Ye et al. used Mendelian randomization method to evaluate the causative association between circulating immune cells and periodontitis and finally screened out three risk immune cells related to periodontitis. Overall, this is an important and novel piece of work that has the potential to contribute to our understanding of the causal relationship between circulating immune cells related to periodontitis. However, there are still some concerns that need to be addressed.

      We sincerely appreciate the constructive feedback from the editor and reviewers, which has been instrumental in enhancing the quality of our manuscript.

      (1) The authors used 1e-9 as the threshold to select effective instrumental variables (IVs), which should give the corresponding references. Meanwhile, the authors should test and discuss the potential impact of inconsistent thresholds for exposure (1e-9, 5e-6 were selected by the author respectively) and outcome IVs (5e-8) on the robustness of the results.

      Thank you for your insightful comments. We have selected two GWAS databases as the data sources for the exposure group: the BCC Consortium with a sample size of 563,946, and the Sardinian cohort of 3,757. The considerable disparity in sample size between them may result in variations in outcomes, primarily showcased in the differences in positive SNP numbers. We, therefore, adopted an unconventional (non 5e-8) yet rigorously controlled screening strategy, an approach that is widely accepted in MR studies (Li et al., 2022; Liu et al., 2023). We believe that the present thresholds are sufficiently rigorous to guarantee the validity of the subsequent Mendelian randomization analysis.

      However, employing two distinct methods in exposure screening is not typical, and we posit that this method can be viewed as an innovative strategy, providing a reference for future research dealing with two databases with significant discrepancies (Huang et al., 2023; Kong et al., 2023). As you perceptively noted, we acknowledge that this strategy may exert a certain influence on the research outcomes, and we have factored this potential limitation into our manuscript. “Third, the considerable variation in sample size between the two exposure databases contributes to the discrepancies in the number of positive SNPs. Despite our exploration of multiple selection thresholds for IVs, the inconsistency in screening methods and the discrepancy in the included SNPs could potentially introduce bias.” (Page 14)

      As for the "outcome IVs with 5e-8" you mentioned, we did not implement this screening threshold for the outcome IVs. Indeed, we applied the same screening criterion of 5e-6 (refer to Supplementary Table 2). Is the statement you are referring to the following: "Additionally, SNPs that displayed a direct association with the outcome would also be excluded to uphold the third MR assumption (P < 5e-8)" (Page 6)? In this context, we adopted a standard criterion in the IV screening process to remove SNPs directly associated with the outcome.
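
      The two screening steps discussed above (a P-value threshold on the exposure association, followed by removal of SNPs directly associated with the outcome) can be sketched as follows. This is a simplified illustration only: the `Snp` class, the function name and the rsIDs are hypothetical, and other steps of instrument selection, such as linkage disequilibrium clumping, are omitted.

```python
from dataclasses import dataclass


@dataclass
class Snp:
    rsid: str          # hypothetical identifier
    p_exposure: float  # P-value of the SNP-exposure association
    p_outcome: float   # P-value of the SNP-outcome association


def select_instruments(snps, exposure_threshold, outcome_threshold=5e-8):
    """Keep SNPs associated with the exposure but not directly with the outcome."""
    return [
        s for s in snps
        if s.p_exposure < exposure_threshold and s.p_outcome >= outcome_threshold
    ]


# Made-up summary statistics for illustration.
snps = [
    Snp("rs_hypothetical_1", 2e-10, 0.4),
    Snp("rs_hypothetical_2", 3e-7, 0.2),
    Snp("rs_hypothetical_3", 1e-12, 1e-9),  # directly associated with the outcome
    Snp("rs_hypothetical_4", 0.01, 0.5),
]

strict = select_instruments(snps, exposure_threshold=1e-9)
lenient = select_instruments(snps, exposure_threshold=5e-6)
print([s.rsid for s in strict])
print([s.rsid for s in lenient])
```

      Running the same filter with 1e-9 and 5e-6 shows how the exposure threshold changes the number of retained SNPs, while the outcome exclusion at 5e-8 is applied identically in both cases.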

      Reference

      Huang W, Wang Z, Zou C, Liu Y, Pan Y, Lu J, Zhou K, Jiao F, Zhong S, Jiang G. 2023. Effects of metabolic factors in mediating the relationship between Type 2 diabetes and depression in East Asian populations: A two-step, two-sample Mendelian randomization study. J Affect Disorders 335:120–128. doi:10.1016/j.jad.2023.04.114

      Kong L, Ye C, Wang Y, Zheng J, Zhao Z, Li M, Xu Y, Lu J, Chen Y, Xu M, Wang W, Ning G, Bi Y, Wang T. 2023. Causal effect of lower birthweight on non-alcoholic fatty liver disease and mediating roles of insulin resistance and metabolites. Liver Int 43:829–839. doi:10.1111/liv.15532

      Li P, Wang H, Guo L, Gou X, Chen G, Lin D, Fan D, Guo X, Liu Z. 2022. Association between gut microbiota and preeclampsia-eclampsia: a two-sample Mendelian randomization study. BMC Med 20:443. doi:10.1186/s12916-022-02657-x

      Liu B, Lyu L, Zhou W, Song J, Ye D, Mao Y, Chen G-B, Sun X. 2023. Associations of the circulating levels of cytokines with risk of amyotrophic lateral sclerosis: a Mendelian randomization study. BMC Med 21:39. doi:10.1186/s12916-023-02736-7

      (2) What is the reference for selecting Smoking, Fasting plasma glucose, and BMI as covariates? They do not seem to be directly related to immune cells as confounding factors.

      The variables of Smoking, Fasting Plasma Glucose (FPG), and Body Mass Index (BMI) are commonly used as covariates in multivariable Mendelian randomization studies (Kong et al., 2023; Liu et al., 2023). The association between Smoking, FPG, and BMI with immune cells may not be immediately apparent. However, these factors have been identified as potential confounders that could impact overall health, which in turn may indirectly modulate systemic immune responses, susceptibility, and inflammation.

      (1) Smoking: It has been well-documented that smoking can cause inflammation and impair immune function, thereby increasing an individual's susceptibility to infections and diseases (Shiels et al., 2014). As such, smoking is recognized as a covariate that could potentially influence the outcomes of an investigation into immune cells.

      (2) FPG: Elevated FPG levels indicate poor glycemic control, potentially leading to conditions like diabetes (Choi et al., 2018). Consequently, studies have demonstrated that elevated FPG levels can compromise the immune system's ability to combat infections.

      (3) BMI: It is a measure of body fat that takes into account a person's weight and height. Both obesity, characterized by a high BMI, and underweight, characterized by a low BMI, have been associated with a range of health issues, including a compromised immune system (Piñeiro-Salvador et al., 2022). Consequently, BMI is factored in as a covariate in this study.

      We have thus incorporated these factors as covariates in our study to mitigate their potential confounding effects. The selection of these covariates is primarily guided by previous research and established knowledge concerning the potential influences on immune function. We appreciate your query and will ensure to clarify this point in our revised manuscript. “We have incorporated covariates, including the number of cigarettes smoked, fasting plasma glucose (FPG) levels, and body mass index (BMI) into the MVMR analysis, given that these factors could indirectly affect systemic immune responses and inflammation (Liu et al., 2023).” (Page 6-7)

      Reference

      Choi S-C, Titov AA, Abboud G, Seay HR, Brusko TM, Roopenian DC, Salek-Ardakani S, Morel L. 2018. Inhibition of glucose metabolism selectively targets autoreactive follicular helper T cells. Nat Commun 9:4369. doi:10.1038/s41467-018-06686-0

      Kong L, Ye C, Wang Y, Zheng J, Zhao Z, Li M, Xu Y, Lu J, Chen Y, Xu M, Wang W, Ning G, Bi Y, Wang T. 2023. Causal effect of lower birthweight on non-alcoholic fatty liver disease and mediating roles of insulin resistance and metabolites. Liver Int 43:829–839. doi:10.1111/liv.15532

      Liu Y, Lai H, Zhang R, Xia L, Liu L. 2023. Causal relationship between gastro-esophageal reflux disease and risk of lung cancer: insights from multivariable Mendelian randomization and mediation analysis. Int J Epidemiol 52:1435–1447. doi:10.1093/ije/dyad090

      Piñeiro-Salvador R, Vazquez-Garza E, Cruz-Cardenas JA, Licona-Cassani C, García-Rivas G, Moreno-Vásquez J, Alcorta-García MR, Lara-Diaz VJ, Brunck MEG. 2022. A cross-sectional study evidences regulations of leukocytes in the colostrum of mothers with obesity. BMC Med 20:388. doi:10.1186/s12916-022-02575-y

      Shiels MS, Katki HA, Freedman ND, Purdue MP, Wentzensen N, Trabert B, Kitahara CM, Furr M, Li Y, Kemp TJ, Goedert JJ, Chang CM, Engels EA, Caporaso NE, Pinto LA, Hildesheim A, Chaturvedi AK. 2014. Cigarette smoking and variations in systemic immune and inflammation markers. J Natl Cancer Inst 106:dju294. doi:10.1093/jnci/dju294

      (3) It is not entirely clear about the correction of P-value for the total number of independent statistical tests.

      In our study, we used the Bonferroni correction to adjust the P-values for multiple comparisons. The adjusted P-value is calculated as the original P-value times the total number of independent statistical tests. Specifically, we applied multiple corrections in the following two aspects: First, we corrected the results of the FUSION algorithm in TWAS, with a correction value of P < 6.27 × 10⁻⁶ (0.05/7,890 genes) (Page 8). Second, we performed multiple corrections on the initial results of MR (P < 0.05/17 traits = 0.003). However, none of the results met the criteria after the correction, which is one of the limitations detailed in the discussion section of our study (Page 14).
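
      As a worked illustration of the correction described above, the following Python sketch computes the Bonferroni threshold for the 17 MR traits and the adjusted P-value for a single test. The function names and the raw P-value of 0.01 are hypothetical and chosen only for illustration.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold: alpha divided by the number of tests."""
    return alpha / n_tests


def bonferroni_adjust(p_value: float, n_tests: int) -> float:
    """Adjusted P-value: raw P-value times the number of tests, capped at 1."""
    return min(1.0, p_value * n_tests)


# Threshold for the MR analysis of 17 traits, as stated in the text.
threshold = bonferroni_threshold(0.05, 17)

# Hypothetical raw P-value, nominally significant at 0.05.
raw_p = 0.01
adjusted_p = bonferroni_adjust(raw_p, 17)

# Comparing raw_p with the threshold is equivalent to comparing
# adjusted_p with the original alpha of 0.05.
significant = raw_p < threshold
print(round(threshold, 4), round(adjusted_p, 2), significant)
```

      A result with a raw P-value of 0.01 is nominally significant but does not pass the corrected threshold of about 0.003, which mirrors the situation described for the initial MR results.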

      (4) The author used whole blood data to apply FUSION algorithm. Although whole blood is a representative site, the authors should add FUSION testing of periodontally relevant tissues, such as oral mucosa.

      We appreciate your insightful comments and suggestions. We concur that employing periodontally relevant tissues, like oral mucosa, for FUSION testing might yield more precise and pertinent results. However, in the Genotype-Tissue Expression project (GTEx) database, we could not find transcriptome data related to oral tissues, such as gums, oral mucosa, and alveolar bone (Author response table 1). Owing to this limitation of the database, we primarily relied on whole blood data, given its availability and the extensive precedent documented in the literature for its utilization (Xu et al., 2023; Yuan et al., 2022).

      We acknowledge that this is a limitation of our study and will certainly consider incorporating periodontally relevant tissues in our future research. In the revised manuscript, we have explicitly stated this limitation and underscored the necessity for additional studies to corroborate our findings with periodontally relevant tissues. “Fifth, we relied on the whole blood data for the FUSION algorithm due to the lack of transcriptome data associated with oral tissues (such as gums, oral mucosa, and alveolar bone) in the GTEx database. This has led to an excessive focus on systemic immunological changes, thereby overlooking the significance of alterations in local periodontal tissue immunity. Such an oversight could potentially compromise the precision and pertinence of our research findings.” (Page 15)

      Author response table 1.

      Tissues and sample sizes in the GTEx database

      Reference

      Xu J, Si H, Zeng Y, Wu Y, Zhang S, Shen B. 2023. Transcriptome-wide association study reveals candidate causal genes for lumbar spinal stenosis. Bone Joint Res 12:387–396. doi:10.1302/2046-3758.126.BJR-2022-0160.R1

      Yuan J, Wang T, Wang L, Li P, Shen H, Mo Y, Zhang Q, Ni C. 2022. Transcriptome‐wide association study identifies PSMB9 as a susceptibility gene for coal workers’ pneumoconiosis. Environmental Toxicology 37:2103–2114. doi:10.1002/tox.23554

      (5) The authors chose gingival hyperplasia as a secondary validation phenotype of periodontitis in this study. However, gingival recession, as another important phenotype associated with periodontitis, should also be tested and discussed.

      We appreciate your insightful feedback highlighting the significance of incorporating gingival recession as a phenotype in periodontitis studies. Our emphasis on gingival hyperplasia in the study was primarily dictated by the initial study design and the data available from FinnGen R9K11. Notwithstanding the lack of gingival recession data in the available databases, we identified chronic gingivitis data in an earlier version of the Finnish database (FinnGen R5K11) as an alternative. We performed a Mendelian Randomization analysis on this dataset, with the results integrated into Supplementary Table 10. Concurrently, Table 1, Supplementary Table 1, Figure 4, and the corresponding descriptions in the manuscript were updated. We trust this adjustment can address the limitations identified in our research. We are confident that this not only augments the comprehensiveness of our study but also fosters a more holistic comprehension of periodontal disease.

      (6) This study used GLIDE data as a replicated validation, but the results were inconsistent with FinnGen's dataset.

      Thank you for your insightful comments and for bringing this issue to our attention. Indeed, it is of utmost importance to ensure the validity and reliability of our findings across various datasets. The observed inconsistency between the GLIDE data and FinnGen's dataset could be attributed to several reasons.

      Firstly, this discrepancy might originate from the differences in population composition. The former is grounded on a comprehensive meta-analysis of cohorts focusing on periodontitis, whereas the latter utilizes a dataset from a full-phenotype cohort. In the former, the ratio of periodontitis to the control groups is approximately 1:2. In contrast, the ratio in the latter seems to be minuscule. The sample size in the FinnGen data may not suffice to detect the effects observed in the GLIDE dataset, given that larger exposure sizes enhance the ability to detect genuine associations.

      Moreover, the heterogeneity of periodontitis can potentially result in variable outcomes. Phenotypic definition methods differ between the two databases. The GLIDE database bases diagnosis on the criteria of the Centers for Disease Control and Prevention/American Academy of Periodontology (CDC/AAP) and the Community Periodontal Index (CPI) for physical signs, whereas the FinnGen database adopts the International Classification of Diseases (ICD) 10 standard for a comprehensive diagnosis. The former database employs a more practical yet broader standard for periodontitis, which might encompass pseudo-periodontitis.

      Finally, the observed differences could be attributed to the variations in immune responses at distinct stages of periodontitis. During the initial stages of periodontitis, neutrophils and macrophages primarily mediate the immune response. With the progression of the disease, the involvement of T cells and B cells increases, thereby leading to a more intricate immune response (Darveau, 2010). Besides, the immune system's response to these oral health conditions is not uniform and can be influenced by multiple factors, including the individual's overall health, genetics, and lifestyle, potentially impacting the results (Hung et al., 2023).

      Reference

      Darveau RP. 2010. Periodontitis: a polymicrobial disruption of host homeostasis. Nat Rev Microbiol 8:481–490. doi:10.1038/nrmicro2337

      Hung M, Kelly R, Mohajeri A, Reese L, Badawi S, Frost C, Sevathas T, Lipsky MS. 2023. Factors Associated with Periodontitis in Younger Individuals: A Scoping Review. J Clin Med 12:6442. doi:10.3390/jcm12206442

      Reviewer #2 (Public Review):

      This manuscript presents a well-designed study that combines multiple Mendelian randomization analyses to investigate the causal relationship between circulating immune cells and periodontitis. The main conclusions of the manuscript are appropriately supported by the statistics, and the methodologies used are comprehensive and rigorous.

      These findings have significant implications for periodontal care and highlight the potential for systemic immunomodulation management on periodontitis, which is of interest to readers in the fields of periodontology, immunology, and epidemiology.

      We greatly appreciate the positive feedback and valuable insights provided by the reviewer, which have significantly contributed to the improvement of our manuscript.

      Reviewer #2 (Recommendations for The Authors):

      Abstract

      Line 30-32: "Two-sample bidirectional univariable MR followed by sensitivity testing, multivariable MR, subgroup analysis, and the Bayesian model averaging (MR-BMA) were performed to explore the causal association between them. " What does the term "them" refer to here, please clarify it. The research method here is unclear, please reorganize it.

      Line 39: "S100A9 and S100A12" here should be italic.

      We appreciate your meticulous suggestions and have revised the methods section accordingly. Additionally, the two genes have been highlighted in italics for emphasis.

      "Univariable MR, multivariable MR, subgroup analysis, reverse MR, and Bayesian model averaging (MR-BMA) were utilized to investigate the causal relationships. Furthermore, transcriptome-wide association study (TWAS) and colocalization analysis were deployed to pinpoint the underlying genes." (Page 1)

      Introduction

      Line 78-80: "As reported, the number of immune cells in periodontal tissue changes as periodontitis progresses, featuring an increase in monocytes, and B cells and a decrease in T cells." Does the author mean that both monocytes and B cells increase as periodontitis progresses?

      We are grateful for your meticulous reading and perceptive inquiries. We would like to confirm the accuracy of your understanding. In lines 78-80, our intended message was to communicate that with the progression of periodontitis, there is an increase in both monocytes and B cells in the periodontal tissue. This represents a typical immune response to the infection, where these cells play a pivotal role in counteracting periodontal pathogens. To enhance clarity, we have revised these lines in the manuscript as follows:

      "With the progression of periodontitis, there is a significant alteration in the quantity of immune cells present within the periodontal tissue. Specifically, an increase in the count of both monocytes and B cells is observed, whereas a decrease is noted in the count of T cells." (Page 3)

      Method

      Line 164-165: "As the main test, the MVMR-IVW method, offered by the MVMR-least absolute shrinkage and selection operator (MVMR-LASSO), and the MVMR-Egger method were chosen." The author's expression here is ambiguous.

      In response to your comment on the ambiguity in lines 164-165, we have revised the sentence for clarity. We hope this addresses your concern and clarifies our point more effectively.

      "The MVMR-IVW method was utilized as the primary test, supplemented by the MVMR-least absolute shrinkage and selection operator (MVMR-LASSO) and the MVMR-Egger method." (Page 7)

      Table 1: FinnGen has a greater sample size and more SNPs than GLIDE; why do authors choose the latter as the primary analysis?

      Our choice to utilize GLIDE as the primary analysis dataset, instead of FinnGen, was mainly guided by the specific research question we aimed to address. Despite FinnGen offering a larger sample size and more SNPs, GLIDE offers a more specialized and targeted dataset that suits the unique requirements of our study. Most MR studies adopt a similar strategy, in which a large disease-specific GWAS meta-analysis is used for exploration, followed by validation in a full-phenotype cohort (such as UK Biobank and FinnGen) (Liu et al., 2023; Yuan et al., 2023). To summarize, the reasons primarily include the following:

      Firstly, GLIDE offers a concentrated and targeted methodology for examining genetic data pertinent to periodontitis. This dataset is grounded in a comprehensive meta-analysis of cohorts centered on periodontitis, wherein the ratio of periodontitis cases to control groups is approximately 1:2. Conversely, the proportion in FinnGen seems to be negligible, given that it employs a dataset derived from a comprehensive phenotype cohort. Consequently, employing the GLIDE database as a primary investigative tool can generate more abundant genetic information associated with periodontitis.

      Furthermore, the methodological facets of GLIDE align more accurately with the analytical framework of our study. For instance, the diagnostic criteria methods vary between the two databases. The GLIDE database derives its basis from the Centers for Disease Control and Prevention/American Academy of Periodontology (CDC/AAP) and Community Periodontal Index (CPI) for physical indicators. In contrast, the FinnGen database employs the International Classification of Diseases (ICD) 10 standard for an exhaustive diagnosis. The former adopts a more pragmatic, yet broader, standard for diagnosing periodontitis. The latter continues to use concepts of diseases such as "chronic periodontitis", which have been replaced by "periodontitis" in the latest disease classification from the "2017 World Workshop on the Classification of Periodontal and Peri-Implant Diseases and Conditions" in the periodontal field (Caton et al., 2018).

      Reference

      Caton JG, Armitage G, Berglundh T, Chapple ILC, Jepsen S, Kornman KS, Mealey BL, Papapanou PN, Sanz M, Tonetti MS. 2018. A new classification scheme for periodontal and peri-implant diseases and conditions - Introduction and key changes from the 1999 classification. J Clin Periodontol 45 Suppl 20:S1–S8. doi:10.1111/jcpe.12935

      Liu Y, Lai H, Zhang R, Xia L, Liu L. 2023. Causal relationship between gastro-esophageal reflux disease and risk of lung cancer: insights from multivariable Mendelian randomization and mediation analysis. Int J Epidemiol 52:1435–1447. doi:10.1093/ije/dyad090

      Yuan S, Xu F, Li X, Chen J, Zheng J, Mantzoros CS, Larsson SC. 2023. Plasma proteins and onset of type 2 diabetes and diabetic complications: Proteome-wide Mendelian randomization and colocalization analyses. Cell Rep Med 4:101174. doi:10.1016/j.xcrm.2023.101174

      Result

      Line 224: "The observed significant results remained robust after removing pleiotropic SNPs." It is not clear what the authors mean by "remain robust".

      Line 229-231: "The causal relationship between neutrophils and periodontitis remained stable with no evidence of heterogeneity or pleiotropy." It is also not clear what the authors mean by "remain stable". How does the author get to the conclusion that there is no evidence of heterogeneity or pleiotropy?

      Figure S5: Please offer a brief explanation on how to investigate outlier or influential changes using scatter plots and Cochran's Q test and Cook's distance.

      Line 224: We apologize for the confusion caused by the term "remain robust". In the revised manuscript, we clarified this by stating, "The observed significant results are considered 'robust' if the effect of sensitivity analyses was identical to that of Inverse Variance Weighted (IVW) method, yielding a P-value less than 0.05." (Page 6)

      Line 229-231: We used the terms "remain stable" and "remain robust" interchangeably to express the same idea. To clarify, we have now unified the expression in the revised manuscript. As for the conclusion of "no evidence of heterogeneity or pleiotropy", it is derived from the results of Cochran's Q and Egger's intercept tests (P>0.05). We have added this explanation to the revised manuscript for better clarity.

      Figure S5: In the revised manuscript and Table, we have provided a succinct explanation regarding the investigation of outliers or influential changes as follows: " A genetic variant was defined as either an outlier or an influential variant if it possessed a q-value exceeding 10 or if its Cook's distance surpassed the median of the corresponding F-distribution. " (Page 7)

      We have made all the necessary changes in the revised manuscript based on your comments. We hope our responses and revisions adequately address your concerns.
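      The heterogeneity check referred to in the responses above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: it uses per-variant Wald ratios with first-order standard errors, a fixed-effect inverse-variance-weighted (IVW) estimate, and Cochran's Q computed around that estimate. The function name and the input values are hypothetical.

```python
def ivw_and_cochran_q(beta_exposure, beta_outcome, se_outcome):
    """Return (IVW estimate, Cochran's Q, degrees of freedom).

    Each variant contributes a Wald ratio beta_outcome / beta_exposure.
    Its standard error is approximated as se_outcome / |beta_exposure|
    (first-order approximation; an assumption of this sketch).
    """
    ratios = [by / bx for bx, by in zip(beta_exposure, beta_outcome)]
    ratio_ses = [se / abs(bx) for bx, se in zip(beta_exposure, se_outcome)]
    weights = [1.0 / s ** 2 for s in ratio_ses]

    # Fixed-effect IVW estimate: weighted mean of the Wald ratios.
    beta_ivw = sum(w * r for w, r in zip(weights, ratios)) / sum(weights)

    # Cochran's Q: weighted squared deviations from the IVW estimate.
    q_stat = sum(w * (r - beta_ivw) ** 2 for w, r in zip(weights, ratios))
    dof = len(ratios) - 1
    return beta_ivw, q_stat, dof


if __name__ == "__main__":
    # Hypothetical summary statistics for three variants.
    bx = [0.1, 0.2, 0.4]
    by = [0.05, 0.1, 0.2]
    se = [0.01, 0.01, 0.02]
    print(ivw_and_cochran_q(bx, by, se))
```

      Under the null hypothesis of no heterogeneity, Q is compared against a chi-square distribution with the returned degrees of freedom; a P-value above 0.05 is what is usually read as "no evidence of heterogeneity". The P-value itself is left out of the sketch because it needs a chi-square survival function that the Python standard library does not provide.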

      Discussion

      I have consulted several pieces of literature to ensure a thorough explanation, which may be helpful for your writing.

      (1) Hajishengallis G, Li X, Divaris K, Chavakis T. Maladaptive trained immunity and clonal hematopoiesis as potential mechanistic links between periodontitis and inflammatory comorbidities. Periodontol 2000. 2022;89(1):215-230. doi:10.1111/prd.12421

      (2) Hajishengallis G, Chavakis T. Mechanisms and Therapeutic Modulation of Neutrophil-Mediated Inflammation. J Dent Res. 2022;101(13):1563-1571. doi:10.1177/00220345221107602

      We appreciate your valuable feedback and the additional references you provided to enrich our manuscript. Upon receiving your comments, we have meticulously reviewed and incorporated the suggested literature into our revised manuscript. These references have furnished insightful information, which has been assimilated into the revised manuscript (Page 12) to enhance the explanation of the mechanisms of neutrophil-mediated inflammation and the potential association between periodontitis and inflammatory comorbidities.

      "The quantity and functionality of neutrophils both act as critical indicators of inflammation severity. The reduction in neutrophil count and inflammatory mediators, observed after successful periodontitis treatment, suggests a reduction in systemic inflammation (Hajishengallis, 2022)." (Page 12)

      "Trained myeloid cells have the potential to amplify the functionality of neutrophils, thereby fortifying the body's defense against subsequent infections. Nevertheless, within the framework of chronic inflammation, these cells could potentially intensify tissue damage (Hajishengallis and Chavakis, 2022)." (Page 12)

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This manuscript reveals important insights into the role of ipsilateral descending pathways in locomotion, especially following unilateral spinal cord injury. The study provides solid evidence that this method improves the injured side's ability to support weight, and as such the findings may lead to new treatments for stroke, spinal cord injuries, or unilateral cerebral injuries. However, the methods and results need to be better detailed, and some of the statistical analysis enhanced.

      Thank you for your assessment. We incorporated various text improvements in the final version of the manuscript to address the weaknesses you have pointed out. The specific improvements are outlined below.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This manuscript provides potentially important new information about ipsilateral cortical impact on locomotion. A number of issues need to be addressed.

      Strengths:

      The primary appeal and contribution of this manuscript are that it provides a range of different measures of ipsilateral cortical impact on locomotion in the setting of impaired contralateral control. While the pathways and mechanisms underlying these various measures are not fully defined and their functional impacts remain uncertain, they comprise a rich body of results that can inform and guide future efforts to understand cortical control of locomotion and to develop more effective rehabilitation protocols.

      Weaknesses:

      (1) The authors state that they used a cortical stimulation location that produced the largest ankle flexion response (lines 102-104). Did other stimulation locations always produce similar, but smaller responses (aside from the two rats that showed ipsilateral neuromodulation)? Was there any site-specific difference in response to stimulation location?

      We derived motor maps in each rat, akin to the representation depicted in Fig 6. In each rat, alternative cortical sites did, indeed, produce distal or proximal contralateral leg flexion responses. Distal responses were more likely to be evoked in the rostral portion of the array, similarly to proximal responses early after injury. This distribution in responses across different cortical sites is reported in this study (Fig. 6) and is consistent with our prior work. The Results section has been revised to provide additional clarification of the passage you indicated and context for the data presented in Figure 6:

      On page 4, we have clarified: “Stimulation through these channels produced a strong whole-leg flexion movement, with an evident distal component. From visual inspection, all responding electrodes in the array produced contralateral leg flexion, although with different strength of contraction for a fixed stimulation intensity (100μA). Moreover, some sites did not present a distal movement component, failing in eliciting ankle flexion and resulting in a generally weaker proximal flexion.”

      On page 12, we have further noted: “By visually inspecting the responses elicited by stimulation delivered through each of the array electrodes, we categorized movements as proximal or distal. This classification was based on whether the ankle participated in the evoked response or if the movement was restricted to the proximal hindlimb. Each leg was scored independently.”

      (2) Figure 2: There does not appear to be a strong relationship between the percentage of spared tissue and the ladder score. For example, the animal with the mild injury (based on its ladder score) in the lower left corner of Figure 2A has less than 50% spared tissue, which is less spared tissue than in any animal other than the two severe injuries with the most tissue loss. Is it possible that the ladder test does not capture the deficits produced by this spinal cord injury? Have the authors looked for a region of the spinal cord that correlates better with the deficits that the ladder test produces? The extent of damage to the region at the base of the dorsal column containing the corticospinal tract would be an appropriate target area to quantify and compare with functional measures.

      In Fig. S6 of our 2021 publication "Bonizzato and Martinez, Science Translational Medicine", we investigated the predictive value of tissue sparing in specific sub-regions of the spinal cord for ladder performance. Among others, we examined the correlation between the accuracy of left leg ladder performance in the acute state and the preservation of the corticospinal tract (CST). Our results indicated that dorsal CST sparing serves as a mild predictor for ladder deficits, confirming the results obtained in this study.

      (3) Lines 219-221: The authors state that "phase-coherent stimulation reinstated the function of this muscle, leading to increased burst duration (90±18% of the deficit, p=0.004, t-test, Fig. 4B) and total activation (56±13% of the deficit, p=0.014, t-test, Fig. 3B)." This way of expressing the data is unclear. For example, the previous sentence states that after SCI, burst duration decreased by 72%. Does this mean that the burst duration after stimulation was 90% higher than the -72% level seen with SCI alone, i.e., 90% + (-72%) = +18%? Or does it mean that the stimulation recovered 90% of the portion of the burst duration that had been lost after SCI, i.e., -72% × (100% - 90%) = -7%? The data in Figure 4 suggests the latter. It would be clearer to express both these SCI alone and SCI plus stimulation results in the text as a percent of the pre-SCI results, as done in Figure 4.

      Your assessment is correct; we intended to report that the stimulation recovered 90% of the portion of the burst duration that had been lost after SCI. This point has been clarified (see page 9):

      “…leading to increased burst duration (recovered 90±18% of the lost burst duration, p=0.004, t-test, Fig. 4B) and total activation (recovered 56±13% of the total activation, p=0.014, t-test, Fig. 3B)”
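      The arithmetic behind the "recovered X% of the deficit" wording can be sketched as below, using the 72% deficit and 90% recovery figures quoted in the exchange above. The function names are hypothetical and serve only to make the two quantities explicit.

```python
def residual_deficit(deficit_pct: float, recovered_pct: float) -> float:
    """Deficit remaining (as % of the pre-injury value) after recovering
    `recovered_pct` percent of an initial deficit of `deficit_pct` percent."""
    return deficit_pct * (1.0 - recovered_pct / 100.0)


def level_relative_to_intact(deficit_pct: float, recovered_pct: float) -> float:
    """Resulting value expressed as a percentage of the pre-injury value."""
    return 100.0 - residual_deficit(deficit_pct, recovered_pct)


if __name__ == "__main__":
    # Burst duration: 72% lost after SCI, 90% of that loss recovered.
    print(residual_deficit(72.0, 90.0))
    print(level_relative_to_intact(72.0, 90.0))
```

      With these inputs the remaining deficit is about 7% of the pre-injury burst duration, which matches the second reading proposed by the reviewer.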

      (4) Lines 227-229: The authors claim that the phase-dependent stimulation effects in SCI rats are immediate, but they don't say how long it takes for these effects to be expressed. Are these effects evident in the response to the first stimulus train, or does it take seconds or minutes for the effects to be expressed? After the initial expression of these effects, are there any gradual changes in the responses over time, e.g., habituation or potentiation?

      The effects are immediately expressed at the very first occurrence of stimulation. We never tested a rat completely naïve to stimuli, as each treadmill session involves prior cortical mapping to identify a suitable active site for involvement in locomotor experiments. Yet, as demonstrated in Supplementary Video 1 accompanying our 2021 publication on contralateral effects of cortical stimulation, "Bonizzato and Martinez, Science Translational Medicine," the impact of phase-dependent cortical stimulation on movement modulation is instantaneous and ceases promptly upon discontinuation of the stimulation. We did not quantify potential gradual changes in responsiveness over time, but we cannot exclude that for long stimulation sessions (e.g., 30 min or more), stimulus amplitude may need to be slightly increased over time to compensate for habituation.

      (5) Awake motor maps (lines 250-277): The analysis of the motor maps appears to be based on measurements of the percentage of channels in which a response can be detected. This analytic approach seems incomplete in that it only assesses the spatial aspect of the cortical drive to the musculature. One channel could have a just-above-threshold response, while another could have a large response; in either case, the two channels would be treated as the same positive result. An additional analysis that takes response intensity into account would add further insight into the data, and might even correlate with the measures of functional recovery. Also, a single stimulation intensity was used; the results may have been different at different stimulus intensities.

      We confirm that maps of cortical stimulation responsiveness may vary at different stimulus amplitudes. To establish an objective metric of excitability, we identified 100µA as a reliable stimulation amplitude across rats and used this value to build the ipsilateral motor representation results in Figure 6. This choice allows direct comparison with Figure 6 of our 2021 article, related to contralateral motor representation. The comparison reveals a lack of correlation with functional recovery metrics in the ipsilateral case, in contrast to the successful correlation achieved in the contralateral case.

      Regarding the incorporation of stimulation amplitudes into the analysis, as detailed in the Method section (lines 770-771), we systematically tested various stimulation amplitudes to determine the minimal threshold required for eliciting a muscle twitch, identified as the threshold value. This process was conducted for each electrode site.

      Upon reviewing these data, we considered the possibility of presenting an additional assessment of ipsilateral cortical motor representation based on stimulation thresholds. However, the representation depicted in the figure did not differ significantly from the data presented in Figure 6A. Furthermore, this representation introduced an additional weakness, as it was unclear how to represent the absence of a response in the threshold scale. We chose to arbitrarily designate it as zero on the inverse logarithmic scale, where, for reference, 100 µA is positioned at 0.2 and 50 µA at 0.5.

      In conclusion, we believe that the conclusions drawn from this analysis align substantially with those in the text. The addition of the threshold analysis, in our assessment, would not contribute significantly to improving the manuscript.

      Author response image 1.

      Threshold analysis

      Author response image 2.

      Occurrence probability analysis, for comparison.

      (6) Lines 858-860: The authors state that "All tests were one-sided because all hypotheses were strictly defined in the direction of motor improvement." By using the one-sided test, the authors are using a lower standard for assessing statistical significance than the overwhelming majority of studies in this field use. More importantly, ipsilateral stimulation of particular kinds or particular sites might conceivably impair function, and that is ignored if the analysis is confined to detecting improvement. Thus, a two-sided analysis or comparable method should be used. This appropriate change would not greatly modify the authors' current conclusions about improvements.

      Our original hypothesis, drawn from previous studies involving cortical stimulation in rats and cats, as well as other neurostimulation research for movement restoration, posited a favorable impact of neurostimulation on movement. Consistent with this hypothesis, we designed our experiments with a focus on enhancing movement, emphasizing a strict direction of improvement.

      It's important to note that a one-sided test is the appropriate match for a one-sided hypothesis, and it is not a lower standard in statistics. Each experiment we conducted was constructed around a strictly one-sided hypothesis: the inclusion of an extensor-inducing stimulus would enhance extension, and the inclusion of a flexion-inducing stimulus would enhance flexion. This rationale guided our choice of the appropriate statistical test.

      We acknowledge your concern regarding the potential for ipsilateral stimulation to have negative effects on locomotion, which might not be captured when designing experiments based on one-sided hypotheses. That is, when hypothesizing that an extensor stimulus would enhance extension (a one-sided hypothesis) in a functional task, and finding an opposite result (inhibition), statistical rigor would impose that we cannot present that result as significant. This concern is valid, and we explicitly mentioned our design choice in the Methods section, Quantification and statistical analyses:

      “All tests were one-sided, as our hypotheses were strictly defined to predict motor improvement. Specifically, we hypothesized that delivering an extension-inducing stimulus would enhance leg extension, and delivering a flexion-inducing stimulus would enhance leg flexion. Consequently, any potentially statistically significant result in the opposite direction (e.g., inhibition) would not be considered. However, no such occurrences were observed.”

      As a final note, even if such opposite observations were made, they could serve as the basis for triggering an ad-hoc follow-up study.
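      The difference between the one-sided and two-sided tests discussed in this exchange can be sketched as follows. This is a simplified illustration: the manuscript reports t-tests, whereas the sketch assumes a standard-normal test statistic so that it runs with the Python standard library alone. The function name is hypothetical.

```python
from statistics import NormalDist


def p_values_from_z(z: float):
    """Return (one-sided upper-tail P, two-sided P) for a z statistic.

    The one-sided test asks only whether the effect is in the
    hypothesized (positive) direction; the two-sided test also
    counts equally extreme effects in the opposite direction.
    """
    std_normal = NormalDist()  # mean 0, standard deviation 1
    one_sided = 1.0 - std_normal.cdf(z)
    two_sided = 2.0 * (1.0 - std_normal.cdf(abs(z)))
    return one_sided, two_sided


if __name__ == "__main__":
    # Effect in the hypothesized direction.
    print(p_values_from_z(2.0))
    # Effect of the same size in the opposite direction.
    print(p_values_from_z(-2.0))
```

      For an effect in the hypothesized direction, the one-sided P-value is half the two-sided one. For an effect in the opposite direction, the one-sided P-value approaches 1, so such a result can never be declared significant under the one-sided design, which is the trade-off both the reviewer and the authors describe.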

      Reviewer #1 also provided several detailed suggestions in the section “Recommendations for the authors”. We estimated that each of them was beneficial for the correctness or for the readability of the text, and thus all were incorporated into the final version.

      Reviewer #2 (Public Review):

      Summary:

      The authors' long-term goals are to understand the utility of precisely phased cortex stimulation regimes on recovery of function after spinal cord injury (SCI). In prior work, the authors explored the effects of contralesion cortex stimulation. Here, they explore ipsilesion cortex stimulation in which the corticospinal fibers that cross at the pyramidal decussation are spared. The authors explore the effects of such stimulation in intact rats and rats with a hemisection lesion at the thoracic level ipsilateral to the stimulated cortex. The appropriately phased microstimulation enhances contralateral flexion and ipsilateral extension, presumably through lumbar spinal cord crossed-extension interneuron systems. This microstimulation improves weight bearing in the ipsilesion hindlimb soon after injury, before any normal recovery of function would be seen. The contralateral homologous cortex can be lesioned in intact rats without impacting the microstimulation effect on flexion and extension during gait. In two rats ipsilateral flexion responses are noted, but these are not clearly demonstrated to be independent of the contralateral homologous cortex remaining intact.

      Strengths:

      This paper adds to prior data on cortical microstimulation by the laboratory in interesting ways. First, the strong effects of the spared crossed fibers from the ipsi-lesional cortex in parts of the ipsi-lesion leg's step cycle and weight support function are solidly demonstrated. This raises the interesting possibility that stimulating the contra-lesion cortex as reported previously may execute some of its effects through callosal coordination with the ipsi-lesion cortex tested here. This is not fully discussed by the authors but may represent a significant aspect of these data. The authors demonstrate solidly that ablation of the contra-lesional cortex does not impede the effects reported here. I believe this has not been shown for the contra-lesional cortex microstimulation effects reported earlier, but I may be wrong. Effects and neuroprosthetic control of these effects are explored well in the ipsi-lesion cortex tests here.

      In the revised version of the manuscript, we incorporated various text improvements to address the points you have highlighted in your review. Additionally, we have integrated the suggested discussion topic on callosal coordination related to contralateral cortical stimulation. The discussion section now incorporates:

      “Since bi-cortical interactions in sculpting descending commands are known (Brus-Ramer et al., 2009), and in light of the changes we report in ipsilesional motor cortex excitability, the role of the ipsilateral cortex in mediating or supporting functional descending commands from the contralateral cortex, particularly the immediate increase in flexion of the affected hindlimb and long-term recovery of functional control (Bonizzato & Martinez, 2021), could be further explored.”

      “The localization of the specific channels closest to the interhemispheric fissure (Fig. 7D) may suggest the involvement of transcallosal interactions in mediating the transmission of the cortical command generated in the ipsilateral motor cortex (Brus-Ramer, Carmel, & Martin, 2009). While ablation experiments (Fig. 8) refute this hypothesis for ipsilateral extension control, they do not conclusively determine whether a different efferent pathway is involved in ipsilateral flexion control in this specific case.”

      Weaknesses:

      Some data is based on very few rats: for example, N=2 for ipsilateral flexion effects of microstimulation and N=3 for homologous cortex ablation, where only ipsilateral extension is tested, it seems. There is no explicit demonstration that the ipsilateral flexion effects reported in only 2 rats can survive the contralateral cortex ablation.

      We agree with this assessment. The ipsilateral flexion representation is reported here as a rare but consistent phenomenon, which we believe we have robustly described with the Figure 7 experiments. We emphasized in the text that the ablation experiment was not conclusive regarding the unilateral-cortical nature of ipsilateral flexion effects, by replacing the sentence with the following:

      “While ablation experiments (Fig. 8) refute this hypothesis for ipsilateral extension control, they do not conclusively determine whether a different efferent pathway is involved in ipsilateral flexion control in this specific case.”

      Some improvements in clarity and precision of descriptions are needed, as well as fuller definitions of terms and algorithms.

      Likely Impacts: This data adds in significant ways to prior work by the authors, and an understanding of how phased stimulation in cortical neuroprosthetics may aid in recovery of function after SCI, especially if a few ambiguities in writing and interpretation are fully resolved.

      The manuscript text has been revised in its final version, and we sought to eliminate all ambiguity in writing and data interpretation.

      In the section “Recommendations for the authors”, Reviewer #2 also suggested better defining multiple terms throughout the manuscript. A clarification was added for each.

      The Reviewer pointed out that we might have overlooked a correlation between locomotor recovery and motor map increase in Figure 6. We re-approached this evaluation and found that the reviewer is correct. We were led to think that there was no correlation by “horizontally” looking at whether motor map size across rats would predict locomotor scores (as it did in the case of contralateral cortex mapping, Bonizzato and Martinez, 2021). However, we have now found a strong correlation between the changes that happen over time for each rat and locomotor recovery, a result that was only hinted at, with no appropriate quantification, in the previous version of the manuscript. We have now reformulated the results of Figure 6 on page 12 to include this result, and we would like to thank the reviewer for having noticed this opportunity.

      Finally, we have expanded the discussion to include the following points:

      The possibility that hemi-cortex coordination of contralesional microstimulation inputs may explain the Sci Transl Med results for contralesional cortex ICMS, which warrants further investigation.

      The recognition that the ablation experiments do not provide conclusive evidence regarding ipsilateral flexion control and whether an alternative efferent pathway might be involved in this specific case.

      Reviewer #3 (Public Review):

      Summary:

      This article aims to investigate the impact of neuroprosthesis (intracortical microstimulation) implanted unilaterally on the lesion side in the context of locomotor recovery following unilateral thoracic spinal cord injury.

      Strength:

      The study reveals that stimulating the left motor cortex, on the same side as the lesion, not only activates the expected right (contralateral) muscle activity but also influences unexpected muscle activity on the left (ipsilateral) side. These muscle activities resulted in a substantial enhancement in lift during the swing phase of the contralateral limb and improved trunk-limb support for the ipsilateral limb. They used different experimental and stimulation conditions to show the ipsilateral limb control evoked by the stimulation. This outcome holds significance, shedding light on the engagement of the "contralateral projecting" corticospinal tract in activating not only the contralateral but also the ipsilateral spinal network.

      The experimental design and findings align with the investigation of the stimulation effect of contralateral projecting corticospinal tracts. They carefully examined the recovery of ipsilateral limb control with motor maps. They also tested the effective sites of cortical stimulation. The study successfully demonstrates the impact of electrical stimulation on the contralateral projecting neurons on ipsilateral limb control during locomotion, as well as identifying important stimulation spots for such an effect. These results contribute to our understanding of how these neurons influence bilateral spinal circuitry. The study's findings contribute valuable insights to the broader neuroscience and rehabilitation communities.

      Thank you for your assessment of this manuscript. The final version of the manuscript incorporates your suggestions for improving term clarity, and we enhanced the discussion on the mechanisms of spinal network engagement, as outlined below.

      Weakness:

      The term "ipsilateral" lacks a clear definition in the title, abstract, introduction, and discussion, potentially causing confusion for the reader.

      [and later] However, in my opinion, readers can easily link the ipsilateral cortical network to the ipsilateral-projecting corticospinal tract, which is less likely to play a role in ipsilateral limb control in this study since this tract is disrupted by the thoracic spinal injury.

      To mitigate the risk of readers linking the effects of ipsilateral cortical stimulation with the ipsilateral-projecting corticospinal tract, we specified:

      In the abstract, we specify that our goal was: “to investigate the functional role of the ipsilateral motor cortex in rat movement through spared contralesional pathways.”

      In the introduction: “In most cases, this lesion also disrupts all spinal tracts descending on the same side as the cortex under investigation at the thoracic level, meaning that the transmission of cortical commands to the ipsilesional hindlimb must depend on crossed descending tracts (Fig. S1).”

      The unexpected ipsilateral (left) muscle activity is most likely due to the left corticospinal neurons recruiting not only the right spinal network but also the left spinal network. This is probably due to the joint efforts of the neuroprosthesis and activation of spinal motor networks which work bilaterally at the spinal level.

      We agree with your assessment and the discussion section now emphasizes the effects of supraspinal drive onto spinal circuits.

      In the section “Recommendations for the authors”, Reviewer #3 suggested providing an early reminder to the reader that the focus is on exploring the control of the ipsilateral limb through the corticospinal tract of the same side, projecting contralaterally. We did so in the abstract and introduction, as presented above.

      The reviewer also suggested that the discussion could be shorter. While we recognize it covers diverse subjects that may appeal to different readers, we believe omitting some sections could limit its overall scope. The manuscript underwent three revisions and a thorough dialogue with reviewers from diverse backgrounds, and we are hesitant to undo some of these improvements.

      Moreover, the section falls short of fully exploring the involvement of contralateral projecting corticospinal neurons in spinal networks for diverse motor behaviors. It could potentially delve into aspects like the potential impact of corticospinal inputs on gating the cross-extensor reflex loop and elucidating the mechanisms underlying the recruitment of the ipsilateral spinal network for generating ipsilateral limb movements. Is it a direct control on motor neurons or via existing spinal circuits?

      The discussion section now includes the potential spinal circuits through which corticospinal neurons may affect motor control and reflexes.

      Reviewer #3 also provided several detailed suggestions in the sub-section “Minor points”. We estimated that all of them were beneficial for the correctness or for the readability of the text, and thus were incorporated into the final version. Some of the questions raised were answered directly in the text (defining “% of chronic map” and rephrasing the original Line 479). We would like to answer here below two remaining questions:

      Fig. 3C I wonder what is the average latency between stimulation onset and onset of right ankle flexor activity. Is the latency fixed, or variable (which probably indicates that the Cortical activation signal is integrated with spinal CPG activity.)

      ICMS trains, unfortunately, do not allow for precise dissection of transmission timing. Single pulses at 100 µA are insufficient to generate motoneuron responses and require multiple pulses to build up cortical transmission. Alstermark et al. (Journal of Neurophysiology, 2004) used two to four stimuli with higher amplitudes to investigate forelimb transmission timing. In our 2021 Science Translational Medicine paper, we employed single pulses at 1 mA to establish transmission delays from the contralateral cortex to the ankle flexor. However, the circuits recruited at 1 mA are not directly comparable to those activated by shorter trains.

      In this study, we used cortical trains of approximately 14 pulses, typical of ICMS protocols. Each pulse could potentially be the first to generate a response volley in the ankle flexor, with delays measured at 30 to 60 ms from ICMS train onset. While we believe that cortical commands are necessarily integrated with spinal CPG activity—as indicated in Figures 1B and 3D, where timing is crucial and descending commands can be gated out if delivered off-phase—the variability in latency that we recorded could be attributed to any of the following factors: cortical activation build-up, integration within reticular relay networks, or CPG integration.

      Fig. 4A. Why is the activity of under contralateral ankle flexor intact condition is later than the stimulation condition?

      We timed the stimulation to coincide with the contralateral leg lift and did not adjust its onset relative to spontaneous walking in SCI rats. Although stimulation could induce leg lift, as shown in Fig. 4A, SCI rats exhibited a slightly earlier and stronger activation of the right (contralateral) ankle flexor muscle even during spontaneous walking. This phenomenon is attributed to the deficits observed on the left side. The stronger right leg bears the body weight, as illustrated in Fig. 3, and thus, during body advancement, the right leg is engaged sooner and more rapidly (with a shorter swing phase) to provide support (right foot forward).

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      (1) Although there are many citations acknowledging relevant previous work, there often isn't a very granular attribution of individual previous findings to their sources. In the results section, it's sometimes ambiguous when the paper is recapping established background and when it is breaking new ground. For example, around equation 8 in the results (sv = r - rho*t), it would be good to refer to previous places where versions of this equation have been presented. Offhand, McNamara 1982 (Theoretical Population Biology) is one early instance and Fawcett et al. 2012 (Behavioural Processes) is a later one. Line 922 of the discussion seems to imply this formulation is novel here.

      We would like to clarify that equation 8 of the original manuscript, sv = r - rho*t, as we derive it, is not new: it is similarly expressed in prior foundational work by McNamara (1982), and we thank the reviewer for drawing our attention to the extension of this form by Fawcett, McNamara, and Houston (2012).

      We now properly acknowledge this foundational work and its extension in the results section…

      “This global reward-rate equivalent immediate reward (see Figure 4) is the subjective value of a pursuit, svPursuit (or simply, sv, when the referenced pursuit can be inferred), as similarly expressed in prior foundational work (McNamara 1982), and subsequent extensions (see Fawcett, McNamara, & Houston 2012).”

      …and in the Discussion section at the location referenced by the reviewer:

      “From it, we re-expressed the pursuit’s worth in terms of its global reward rate-equivalent immediate reward, i.e., its ‘subjective value’, reprising McNamara’s foundational formulation (McNamara 1982).”

      (2) The choice environments that are considered in detail in the paper are very simple. The simplicity facilitates concrete examples and visualizations, but it would be worth further consideration of whether and how the conclusions generalize to more complex environments. The paper considers "forgo" scenario in which the agent can choose between sequences of pursuits like A-B-A-B (engaging with option B at all opportunities, which are interleaved with a default pursuit A) and A-A-A-A (forgoing option B). It considers "choice" scenarios where the agent can choose between sequences like A-B-A-B and A-C-A-C (where B and C are larger-later and smaller-sooner rewards, either of which can be interleaved with the default pursuit). Several forms of additional complexity would be valuable to consider. [A] One would be a greater number of unique pursuits, not repeated identically in a predictable sequence, akin to a prey-selection paradigm. It seems to me this would cause t_out and r_out (the time and reward outside of the focal prospect) to be policy-dependent, making the 'apportionment cost' more challenging to ascertain. Another relevant form of complexity would be if there were [B] variance or uncertainty in reward magnitudes or temporal durations or if [C] the agent had the ability to discontinue a pursuit such as in patch-departure scenarios.

      A) We would like to note that the section “Deriving Optimal Policy from Forgo Decision-making worlds” addresses the reviewer’s scenario of an n-number of pursuits, each occurring at its own frequency, as in prey selection, not repeating identically in a predictable sequence. Within our subsection “Parceling the world…”, we introduce the concept of dividing a world (such as that) into the considered pursuit type and everything outside of it. ‘Outside’ would include any number of other pursuits currently part of any policy, as the reviewer intuits, thus making t<sup>out</sup> and r<sup>out</sup> policy dependent. Nonetheless, a process of excluding (forgoing) pursuits by comparing the ‘in’ to the ‘out’ reward rate (section “Reward-rate optimizing forgo policy…”) or its equivalent sv (section “The forgo decision can also be made from subjective value”) would iteratively lead to the global reward rate maximizing policy. This manner of parceling into ‘in’ and ‘out’ thus simplifies visualization of what can be complex worlds. Simpler cases that resemble common experimental designs are given in the manuscript to enhance intuition.

      We thank the reviewer for this keen suggestion. We now include example figures (Supplemental 1 & 2) for multi-pursuit worlds which have the same (Supplemental 1) and different pursuit frequencies (Supplemental 2), which illustrate how this evaluation leads to reward-rate optimization. This addition demonstrates how an iterative policy would lead to reward rate maximization and emphasizes how parcellating a world into ‘in’ and ‘out’ of the pursuit type applies and is a useful device for understanding the worth of any given pursuit in more complex worlds. The policy achieving the greatest global reward rate can be realized through an iterative process where pursuits with lower reward rates than the reward rate obtained from everything other than the considered pursuit type are sequentially removed from the policy.
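
      The iterative exclusion process described above can be sketched in a few lines of Python. This is a minimal illustration, not the manuscript's implementation: the `Pursuit` class, the function names, the numeric values, and the assumption that a mandatory default pursuit is always part of the policy and that removing one pursuit leaves the frequencies of the others unchanged are all hypothetical choices made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pursuit:
    name: str
    reward: float     # average reward per occurrence
    time: float       # average time per occurrence
    frequency: float  # occurrences per traversal of the world


def reward_rate(pursuits):
    """Total reward divided by total time over a collection of pursuits."""
    total_reward = sum(p.frequency * p.reward for p in pursuits)
    total_time = sum(p.frequency * p.time for p in pursuits)
    return total_reward / total_time


def forgo_iteratively(default, options):
    """Repeatedly forgo the worst pursuit whose 'in' rate is below its 'out' rate."""
    policy = list(options)
    while True:
        rejected = [
            p for p in policy
            # 'out' is everything in the current policy other than p
            if p.reward / p.time
            < reward_rate([default] + [q for q in policy if q is not p])
        ]
        if not rejected:
            return policy
        policy.remove(min(rejected, key=lambda p: p.reward / p.time))


# Hypothetical world: a rewardless default pursuit plus three optional pursuits.
default = Pursuit("default", reward=0.0, time=4.0, frequency=1.0)
options = [
    Pursuit("A", reward=6.0, time=2.0, frequency=1.0),
    Pursuit("B", reward=1.5, time=2.0, frequency=1.0),
    Pursuit("C", reward=1.0, time=4.0, frequency=2.0),
]
policy = forgo_iteratively(default, options)
print([p.name for p in policy], reward_rate([default] + policy))
```

      In this toy world, pursuit B passes the comparison while C is still in the policy but fails it once C has been removed, which is why the comparison has to be repeated: the ‘out’ reward rate is itself policy dependent.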

      B) We would also emphasize that the formulation here contends with variance or uncertainty in the reward magnitudes or temporal durations. The ‘in’ pursuit is the average reward and the average time of the considered pursuit type, as is the ‘out’ the average reward and average time outside of the considered pursuit type.

      C) In this work, we consider the worth of initiating one-or-another pursuit (from having completed a prior one), and not the issue of continuing within a pursuit (having already engaged it), as in patch/give-up. Handling worlds in which the agent may depart from within a pursuit, which is to say ‘give-up’ (as in patch foraging), is outside the scope of this work.

      (3) I had a hard time arriving at a solid conceptual understanding of the 'apportionment cost' around Figure 5. I understand the arithmetic, but it would help if it were possible to formulate a more succinct verbal description of what makes the apportionment cost a useful and meaningful quality to focus on.

      We thank the reviewer for pressing for a succinct and intuitive verbal description.

      We added the following succinct verbal description of apportionment cost… “Apportionment cost is the difference in reward that can be expected, on average, between a policy of taking versus a policy of not taking the considered pursuit, over a time equal to its duration.” This definition appears in new paragraphs (as below) describing apportionment cost in the results section “Time’s cost: opportunity & apportionment costs determine a pursuit’s subjective value”, and is accompanied by equations for apportionment cost, and a figure giving its geometric depiction (Figure 5). We also expanded original figure 5 and its legend (so as to illustrate the apportionment scaling factor and the apportionment cost), and its accompanying main text, to further illustrate and clarify apportionment cost, and its relationship to opportunity cost, and time’s cost.

      “What, then, is the amount of reward by which the opportunity cost-subtracted reward is scaled down to equal the sv of the pursuit? This amount is the apportionment cost of time. The apportionment cost of time (height of the brown vertical bar, Figure 5F) is the global reward rate after taking into account the opportunity cost (slope of the magenta-gold dashed line in Figure 5F) times the time of the considered pursuit. Equally, the difference between the inside and outside reward rates, times the time of the pursuit, is the apportionment cost when scaled by the pursuit’s weight, i.e., the fraction that the considered pursuit is to the total time to traverse the world (Equation 9, right hand side). From the perspective of decision-making policies, apportionment cost is the difference in reward that can be expected, on average, between a policy of taking versus a policy of not taking the considered pursuit, over a time equal to its duration (Equation 9 center, Figure 5F).

      Equation 9. Apportionment Cost.

      While this difference is the apportionment cost of time, the opportunity cost of time is the amount that would be expected from a policy of not taking the considered pursuit over a time equal to the considered pursuit’s duration. Together, they sum to Time’s Cost (Figure 5G). Expressing a pursuit’s worth in terms of the global reward rate obtained under a policy of accepting the pursuit type (Figure 5 left column), or from the perspective of the outside reward and time (Figure 5 right column), are equivalent. However, the latter expresses sv in terms that are independent of one another, conveys the constituents giving rise to global reward rate, and provides the added insight that time’s cost comprises an apportionment as well as an opportunity cost.”

      The above definition of apportionment cost adds to other stated relationships of apportionment cost found throughout the paper (original lines 434,435,447,450).
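
      The verbal definitions above can be checked numerically. The following Python sketch assumes the forms implied by the text: the opportunity cost is the outside reward rate times the pursuit's time, sv is the opportunity cost-subtracted reward scaled by t<sup>out</sup>/(t<sup>in</sup> + t<sup>out</sup>), and the apportionment cost is what that scaling removes. The function name and the numeric values are illustrative and do not come from the manuscript.

```python
import math


def time_costs(r_in, t_in, r_out, t_out):
    """Decompose a pursuit's reward into sv, opportunity cost and apportionment cost."""
    rho_out = r_out / t_out                       # reward rate outside the pursuit
    rho_global = (r_in + r_out) / (t_in + t_out)  # rate under a policy of taking it
    opportunity = rho_out * t_in
    sv = (r_in - opportunity) * t_out / (t_in + t_out)
    apportionment = (r_in - opportunity) - sv
    return {
        "rho_out": rho_out,
        "rho_global": rho_global,
        "opportunity": opportunity,
        "apportionment": apportionment,
        "sv": sv,
    }


c = time_costs(r_in=4.0, t_in=2.0, r_out=3.0, t_out=6.0)

# Policy view: taking versus not taking the pursuit, over a time equal to its duration.
assert math.isclose(c["apportionment"], (c["rho_global"] - c["rho_out"]) * 2.0)
# Weight view: (rho_in - rho_out) * t_in, scaled by the pursuit's weight t_in / (t_in + t_out).
assert math.isclose(c["apportionment"], (4.0 / 2.0 - c["rho_out"]) * 2.0 * 2.0 / 8.0)
# Time's cost is the sum of both costs and equals the global reward rate times t_in.
assert math.isclose(c["opportunity"] + c["apportionment"], c["rho_global"] * 2.0)
# sv is the reward less time's cost.
assert math.isclose(c["sv"], 4.0 - c["rho_global"] * 2.0)
print(c)
```

      Under these assumptions, the three descriptions of apportionment cost given in the quoted paragraph (the scaled-down remainder, the policy difference, and the weighted rate difference) yield the same number.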

      I think Figure 6C relates to this, but I had difficulty relating the axis labels to the points, lines, and patterned regions in the plot.

      We thank the reviewer for pointing out that this figure can be made to be more easily understood.

      We have done so by breaking its key features over a greater number of plots so that no single panel is overloaded. We have also changed text in the legend to clarify how apportionment and opportunity costs add to constitute time’s cost, and also correspondingly in the main text.

      I also was a bit confused by how the mathematical formulation was presented. As I understood it, the apportionment cost essentially involves scaling the rest of the SV expression by t<sup>out</sup>/(t<sup>in</sup> + t<sup>out</sup>).

      The reviewer’s understanding is correct: the amount of reward of the pursuit that remains after subtracting the opportunity cost, when so scaled, is equivalent to the subjective value of that pursuit. The amount by which that scaling decreases the rest of the SV expression is equal to the apportionment cost of time.

      The way this scaling factor is written in Figure 5C, as 1/(1 + (1/t<sup>out</sup>) t<sup>in</sup>), seems less clear than it could be.

      To be sure, we present the formula in original Figure 5C in this manner to emphasize the opportunity cost subtraction as separable from the apportionment rescaling, expressing the opportunity cost subtraction and the apportionment scaling component of the equation as their own terms in parentheses.

      But we understand the reviewer to be referring to the manner in which we chose to express the scaling term. We presented it in this way in the original manuscript (rather than in the more elegant form recognized by the reviewer) to make a direct connection to the temporal discounting literature. In this literature, discounting commonly takes the same mathematical form as our apportionment cost scaling, but whereas the steepness of discounting in this literature is controlled by a free fit parameter, k, we show that for a reward rate maximizing agent, the equivalent k term isn’t a free fit parameter, but rather is the reciprocal of the time spent outside the considered pursuit type.

      We take the reviewer’s advice to heart, and now first express subjective value in the format that emphasizes opportunity cost subtraction followed by an apportionment downscaling, identifying the apportionment scaling term, t<sup>out</sup>/(t<sup>out</sup> + t<sup>in</sup>), i.e., the outside weight. Figure 5 now shows the geometric representation of apportionment scaling and apportionment cost. Only subsequently, in the discounting function section of the revised manuscript, do we rearrange this subjective value expression to resemble the standard discounting function form.
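
      For readers following the algebra, the rearrangement from the apportionment-scaling form to the discounting-function form can be sketched as below. The notation follows the symbols used in this response (r<sup>in</sup>, t<sup>in</sup>, t<sup>out</sup>, and rho<sup>out</sup> = r<sup>out</sup>/t<sup>out</sup>); the symbol k is borrowed from the discounting literature mentioned above, and the layout is illustrative rather than taken from the manuscript.

```latex
\begin{align*}
sv &= \left(r^{in} - \rho^{out}\, t^{in}\right)\frac{t^{out}}{t^{out} + t^{in}} \\
   &= \frac{r^{in} - \rho^{out}\, t^{in}}{1 + \frac{1}{t^{out}}\, t^{in}} \\
   &= \frac{r^{in} - \rho^{out}\, t^{in}}{1 + k\, t^{in}}, \qquad k = \frac{1}{t^{out}}
\end{align*}
```

      The second line is obtained by dividing the numerator and denominator of the scaling term by t<sup>out</sup>, which is what makes the reciprocal of the outside time appear in the position of the discounting parameter.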

      Also, the apportionment cost is described in the text as being subtracted from sv rather than as a multiplicative scaling factor.

      What we describe in the original text is how apportionment cost is a component of time’s cost, and how sv is the reward less time’s cost. It would be correct to say that apportionment cost and opportunity cost are subtracted from the pursuit’s reward to yield the subjective value of the pursuit. This is what we show in the original Figure 5D graphically. Original Figure 5 and accompanying formulas at its bottom show the equivalence of expressing sv in terms of subtracting time’s cost as calculated from the global reward rate under a policy of accepting the considered pursuit, or, of subtracting opportunity cost and then scaling the opportunity cost subtracted reward by the apportionment scaling term, thereby accounting for the apportionment cost of time.

      The revision of original figure 5, its figure legend, and accompanying text now make clear the meaning of apportionment cost, how it can be considered a subtraction from the reward of a pursuit, or, equivalently, how it can be thought of as the result of scaling down of opportunity cost subtracted reward.

      It could be written as a subtraction, by subtracting a second copy of the rest of the SV expression scaled by t_in/(t_in + t_out). But that shows the apportionment cost to depend on the opportunity cost, which is odd because the original motivation on line 404 was to resolve the lack of independence between terms in the SV expression.

      On line 404 of the original manuscript, we point out that the simple equation―which is a reprisal of McNamara’s insight―is problematic in that its terms on the RHS are not independent: the global reward rate is dependent on the considered pursuit’s reward (see Fig5B). The alternative expression for subjective value that we derive expresses sv in terms that are all independent of one another. We may have unintentionally obscured that fact by having already defined rho<sup>in</sup> as r<sup>in</sup>/ t<sup>in</sup> and rho<sup>out</sup> as r<sup>out</sup>/t<sup>out</sup> on lines 306 and 307.

      Therefore, in the revision, Ap 8 is expressed so as to keep clear that it uses terms that are all independent of one another, and only subsequently do we express this formula with the simplifying substitution, rho<sup>out</sup>.

      That all said, we understand the reviewer’s point to be that the parenthetical terms relating the opportunity cost and the apportionment rescaling both contain within them the parameter t<sup>out</sup>, and in this way the concepts we put forward to understand the alternative equation are non-independent. That is correct, but it isn’t at odds with our objective to express SV in terms that are independent of one another (which we do). Our motivation in introducing these concepts is to provide insight and intuition into the cost of time (especially now with a clear and simple definition of apportionment cost stated). We go to lengths to demonstrate their relationship to each other.

      (4) In the analysis of discounting functions (line 664 and beyond), the paper doesn't say much about the fact that many discounting studies take specific measures to distinguish true time preferences from opportunity costs and reward-rate maximization.

      We understand the reviewer’s comment to suggest that temporal decision-making worlds in which delay time does not preclude reward from outside the current pursuit are a means to distinguish time preference from the impact of opportunity cost. One contribution of this work is to demonstrate that, from a reward-rate maximization framework, an accounting of opportunity cost is not sufficient to understand apparent time preferences as distinguishable from reward-rate maximization. The apportionment cost of time must also be considered to have a full appreciation of the cost of time. For instance, let us consider a temporal decision-making world in which there is no reward received outside the considered pursuit. In such a world, there is no opportunity cost of time, so apparent temporal discounting functions would appear as if purely hyperbolic as a consequence of the apportionment cost of time alone. Time preference, as revealed experimentally by the choices made between a SS and a LL reward, then, seems confounding, as preference can reverse from a SS to a LL option as the displacement of those options (maintaining their difference in time) increases (Green, Fristoe, and Myerson 1994; Kirby and Herrnstein 1995). While this shift, the so-called “Delay effect”, could potentially arise as a consequence of some inherent time preference bias of an agent, we demonstrate that a reward-rate maximal agent exhibits hyperbolic discounting, and therefore it would also exhibit the Delay effect, even though it has no time preference.

      In the revision we now make reference to the Delay Effect (in abstract, results new section “The Delay Effect” with new figure 14, and in the discussion), which is taken as evidence of time preference in human and animal literature, and note explicitly how a reward-rate maximizing agent would also exhibit this behavior as a consequence of apparent hyperbolic discounting.
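
      The preference reversal described above can be reproduced with a small numeric sketch. It assumes the subjective value form discussed in this response, sv = (r<sup>in</sup> - rho<sup>out</sup> t<sup>in</sup>)/(1 + t<sup>in</sup>/t<sup>out</sup>), a world with no outside reward, and an outside time that stays fixed as both options are displaced; the reward sizes, delays, and function names are hypothetical.

```python
def subjective_value(r_in, t_in, r_out, t_out):
    """sv of a pursuit: opportunity cost subtracted, then scaled by the outside weight."""
    rho_out = r_out / t_out
    return (r_in - rho_out * t_in) / (1 + t_in / t_out)


def preferred(added_delay, r_out=0.0, t_out=2.0):
    """Return which option has the larger sv once both are displaced by added_delay."""
    ss = subjective_value(2.0, 1.0 + added_delay, r_out, t_out)  # smaller-sooner
    ll = subjective_value(3.0, 3.0 + added_delay, r_out, t_out)  # larger-later
    return "SS" if ss > ll else "LL"


# The difference in time between the options (2 units) is the same in both calls.
print(preferred(0.0), preferred(10.0))
```

      With these values, the smaller-sooner option has the larger sv when the options are near, and the larger-later option has the larger sv once both are displaced by 10 time units, even though the agent in the sketch applies the same valuation rule throughout and has no time preference.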

      In many of the human studies, delay time doesn't preclude other activities.

      Our framework is generalizable to worlds in which being in pursuit does not preclude an agent from receiving reward during that time at the outside reward rate. Original Ap 13 solves for such a condition, and shows that in this context, the opportunity cost of time drops out of the SV equation, leaving only the consequences of the apportionment cost of time. We made reference to this case on lines 1032-1034 of the original manuscript: “In this way, such hyperbolic discounting models [models that do not make an accounting of opportunity cost] are only appropriate in worlds with no “outside” reward, or, where being in a pursuit does not exclude the agent from receiving rewards at the rate that occurs outside of it (Ap. 13).”

      The note and reference are fleeting in the original work. We take the reviewer’s suggestion and now add paragraphs in the discussion on the difference between humans and animals in apparent discounting, making specific note of human studies in which delay time doesn’t preclude receiving outside reward while engaged in a pursuit. Relatedly, hyperbolic discounting is often considered to be less steep in humans than in animals. As the reviewer points out, these assessments are frequently made under conditions in which being in a pursuit does not preclude receiving reward from outside the pursuit. When humans are tested under conditions in which outside rewards are precluded, they exhibit far steeper discounting. We now include a citation to that observation (Jimura et al. 2009). We handle such conditions in original Ap 13, and show how, in such worlds, the opportunity cost of time drops out of the equation. The consequence of this is that the apparent discounting function would become less steep (the agent would appear as if more patient), consistent with reports.

      “Relating to the treatment of opportunity cost, we also note that many investigations into temporal discounting do not make an explicit distinction between situations in which 1) subjects continue to receive the usual rewards from the environment during the delay to a chosen pursuit, and 2) situations in which during a chosen pursuit’s delay no other rewards or opportunities will occur (Kable & Glimcher, 2007; Kirby & Maraković, 1996; McClure, Laibson, Loewenstein, & Cohen, 2004). Commonly, human subjects are asked to answer questions about their preferences between options for amounts they will not actually earn after delays they will not actually have to wait, during which it is unclear whether they are really investing time away from other options or not (Rosati et al., 2007). In contrast, in most animal experiments, subjects actually receive reward after different delays during which they do not receive new options or rewards. By our formulation, when a pursuit does not exclude the agent from receiving rewards at the rate that occurs outside, the opportunity cost of time drops out of the subjective value equation (Ap 12).

      Equation 10. The value of initiating a pursuit when pursuit does not exclude receiving rewards at the outside rate (Ap 12)

      Therefore, the reward-rate maximizing discounting function in these worlds is functionally equivalent to the situation in which the outside reward rate is zero, and will, lacking an opportunity cost, be less steep. This rationalizes why human discounting functions are often reported to be longer (gentler) than animal discounting functions: they are typically tested in conditions that negate opportunity cost, whereas animals are typically tested in conditions that enforce opportunity costs. Indeed, when humans are made to wait for actually received reward, their observed discounting functions are much steeper (Jimura et al. 2009).”
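
      The difference in steepness between the two kinds of world can be illustrated as follows. This sketch assumes, as the quoted passage states, that the non-exclusive case behaves like a world whose outside reward rate is zero, so that only the apportionment scaling remains; the exact form of Equation 10 is not reproduced in this response, and the function names and numeric values are hypothetical.

```python
def sv_exclusive(r_in, t_in, rho_out, t_out):
    """Pursuit precludes outside reward: opportunity cost and apportionment both apply."""
    return (r_in - rho_out * t_in) / (1 + t_in / t_out)


def sv_nonexclusive(r_in, t_in, t_out):
    """Outside reward keeps arriving during the pursuit: only apportionment applies."""
    return r_in / (1 + t_in / t_out)


r_in, rho_out, t_out = 10.0, 0.5, 4.0
delays = [1.0, 2.0, 3.0, 4.0, 5.0]
excl = [sv_exclusive(r_in, t, rho_out, t_out) for t in delays]
nonexcl = [sv_nonexclusive(r_in, t, t_out) for t in delays]

for t, e, n in zip(delays, excl, nonexcl):
    # Fraction of the nominal reward retained at each delay in the two worlds.
    print(t, round(e / r_in, 3), round(n / r_in, 3))
```

      At every positive delay the exclusive-world value falls below the non-exclusive one, which is the sense in which enforcing the opportunity cost makes the apparent discounting function steeper.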

      In animal studies, rate maximization can serve as a baseline against which to measure additional effects of temporal discounting. This is an important caveat to claims about discounting anomalies being rational under rate maximization (e.g., line 1024).

      We agree that the purpose of this reward-rate maximizing framework is to serve as a point of comparison in which effects of temporal intervals and rewards that define the environment can be analyzed to better understand the manner in which animals and humans deviate from this ideal behavior. Our interest in this work is in part motivated by a desire to have a deeper understanding of what “true” time preference means. Using the reward-rate maximizing framework here provides a means to speak about time preferences (i.e., biases) in terms of deviation from optimality. From this perspective, a reward-rate maximal agent doesn’t exhibit time preference: its actions are guided solely by reward-rate optimizing valuation. Therefore, one contribution of this work is to show that purported signs of time preference (hyperbolic discounting, magnitude, sign, and (now) delay effect) can be explained without invoking time preference. Whatever errors from optimality remain following a proper accounting of reward-rate maximizing behavior should then, and only then, be considered through the lens of time preference (bias).

      (5) The paper doesn't feature any very concrete engagement with empirical data sets. This is ok for a theoretical paper, but some of the characterizations of empirical results that the model aims to match seem oversimplified. An example is the contention that real decision-makers are optimal in accept/reject decisions (line 816 and elsewhere). This isn't always true; sometimes there is evidence of overharvesting, for example.

      We would like to note that the scope of this paper is limited to examining the value of initiating a pursuit, rather than the value of continuing within a pursuit. The issue of continuing within a pursuit constitutes a third fundamental topology, which could be called give-up or patch-foraging; it is complex and warrants its own paper. In Give-up topologies, which are distinct from Forgo and Choice topologies, the reviewer is correct in pointing out that the preponderance of evidence demonstrates that animals and humans behave as if overpatient, adopting a policy of investing more time within a pursuit than is warranted. In Forgo instances, however, the evidence supports near optimality.

      (6) Related to the point above, it would be helpful to discuss more concretely how some of this paper's theoretical proposals could be empirically evaluated in the future. Regarding the magnitude and sign effects of discounting, there is not a very thorough overview of the several other explanations that have been proposed in the literature. It would be helpful to engage more deeply with previous proposals and consider how the present hypothesis might make unique predictions and could be evaluated against them.

      We appreciate the reviewer’s point that there are many existing explanations for these various ‘anomalous’ effects. We hold that the point of this work is to demonstrate that these effects are consistent with a reward-rate maximizing framework and so do not require additional assumptions, such as separate processes for small and large rewards or the inclusion of a utility function.

      Nonetheless, there is a diversity of explanations for the sign and magnitude effects and (now explicitly included in the revision) the delay effect. We therefore now also reference additional work that proffers alternative explanations for the sign and magnitude effects (as reviewed by Kalenscher and Pennartz 2008; Frederick et al. 2002), as well as a scalar timing account of non-stationary time preference (Gibbon, 1977).

      With respect to predictions, this framework makes the following with regard to the magnitude, sign, and (now in the revision) delay effects. In the Discussion, Magnitude effect subsection: “The Magnitude Effect should be observed, experimentally, to diminish when 1) increasing the outside time while holding the outside reward constant, (thus decreasing the outside reward rate), or when 2) decreasing the outside reward while holding the outside time constant (thus decreasing the outside reward rate). However, 3) the Magnitude Effect would exaggerate as the outside time increased while holding the outside reward rate constant.” In the Sign effect subsection: “…we then also predict that the size of the Sign effect would diminish as the outside reward rate decreases (and as the outside time increases), and in fact would invert should the outside reward rate turn negative (become net punishing), such that punishments would appear to discount more steeply than rewards.” In the Delay effect subsection: “...a sign of irrationality is that a preference reversal occurs at delays greater than what a reward-rate-maximizing agent would exhibit.”
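
      The magnitude and sign predictions quoted above can be made concrete with a minimal numerical sketch. It assumes that the subjective value of a pursuit for a reward-rate-maximizing agent takes the form sv = (r_in - rho_out * t_in) / (1 + t_in / t_out), with rho_out = r_out / t_out; this expression, the function names, and all parameter values are illustrative assumptions rather than quotations from the manuscript.

```python
def subjective_value(r_in, t_in, r_out, t_out):
    """Assumed reward-rate-maximizing subjective value of a pursuit.

    r_in, t_in: reward and time inside the considered pursuit.
    r_out, t_out: average reward and time outside the considered pursuit.
    """
    rho_out = r_out / t_out  # outside reward rate
    return (r_in - rho_out * t_in) / (1.0 + t_in / t_out)


def discount(r_in, t_in, r_out, t_out):
    """Magnitude-normalized subjective value (apparent discounting function)."""
    return subjective_value(r_in, t_in, r_out, t_out) / r_in


def magnitude_effect(t_in, r_out, t_out, small=1.0, large=10.0):
    """How much less steeply a large reward appears to discount than a small one."""
    return discount(large, t_in, r_out, t_out) - discount(small, t_in, r_out, t_out)


def sign_effect(t_in, r_out, t_out, magnitude=5.0):
    """How much less steeply a punishment appears to discount than an equal reward."""
    return discount(-magnitude, t_in, r_out, t_out) - discount(magnitude, t_in, r_out, t_out)


# Hypothetical baseline world: outside reward 2 over outside time 10, delay 4.
base = magnitude_effect(t_in=4.0, r_out=2.0, t_out=10.0)
longer_outside_same_reward = magnitude_effect(t_in=4.0, r_out=2.0, t_out=20.0)  # prediction 1
smaller_outside_reward = magnitude_effect(t_in=4.0, r_out=1.0, t_out=10.0)      # prediction 2
longer_outside_same_rate = magnitude_effect(t_in=4.0, r_out=4.0, t_out=20.0)    # prediction 3

print(base, longer_outside_same_reward, smaller_outside_reward, longer_outside_same_rate)
print(sign_effect(4.0, 2.0, 10.0), sign_effect(4.0, 1.0, 10.0), sign_effect(4.0, -2.0, 10.0))
```

      Under these assumptions, the magnitude effect shrinks in cases 1 and 2 and grows in case 3, and the sign effect shrinks with the outside reward rate and changes sign when the outside reward rate is negative.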

      A similar point applies to the 'malapportionment hypothesis' although in this case there is a very helpful section on comparisons to prior models (line 1163). The idea being proposed here seems to have a lot in common conceptually with Blanchard et al. 2013, so it would be worth saying more about how data could be used to test or reconcile these proposals.

      We thank the reviewer for finding the section on model comparisons very helpful. We believe the text previously dedicated to this issue to be sufficient in this regard. We have, however, added substantively to the Malapportionment Hypothesis section (Discussion) and its accompanying figure, to make explicit a number of predictions of the Malapportionment Hypothesis as they relate to Hyperbolic discounting, the Delay Effect, and the Sign and Magnitude Effects.

      Reviewer #1 Recommendations

      (1) As a general note about the figures, it would be helpful to specify, either graphically or in the caption, what fixed values of reward sizes and time intervals are being assumed for each illustration.

      Thank you for the suggestion. We attempted to keep the graphs as uncluttered as possible, but agree that for original figures 4, 5, 16, and 17, which didn’t have numbered axes, we should provide the amounts in the captions of the revised figures (4, 5, and now 17, 18). These figures lacked numerics because they are meant to illustrate the form of the relationship between vectors, which is general to the values they may take.

      We now include in the captions for these figures the parameter amounts used.

      (2) Should Equation 2 have t in the denominator instead of r?

      Indeed. We thank the reviewer for catching this typographical error.

      We have corrected it in the revision.

      (3) General recommendation:

      My view is that in order for the paper's eLife assessment to improve, it would be necessary to resolve points 1 through 4 listed under "weaknesses" in my public review, which pertain to clarity and acknowledgement of prior work. I think a lot hinges on whether the authors can respond to point #3 by making a more compelling case for the usefulness and generality of the 'apportionment cost' concept, since that idea is central to the paper's contribution.

      We believe that these critical points (1-4) for improving the paper have now been addressed to the reviewer’s satisfaction.

      Reviewer #2 (Public review):

      While the details of the paper are compelling, the authors' presentation of their results is often unclear or incomplete:

      (1) The mathematical details of the paper are correct but contain numerous notation errors and are presented as a solid block of subtle equation manipulations. This makes the details of the authors' approach (the main contribution of the paper to the field) highly difficult to understand.

      We thank the reviewers for having detected typographical errors in three equations. They have been corrected. The first typographical error, in the original main text (line 277), concerns Equation 2, which has been corrected to appear as

      The second typo concerns the definition of the considered pursuit’s reward rate, which appeared in the original main text (line 306) and has been corrected to appear as

      The third typographical error occurred in conversion from Google Sheets to Microsoft Word, appeared in the original main text (line 703), and concerns the subjective value expression when no reward is received in an intertrial interval (ITI). It has been corrected to appear as

      (2) One of the main contributions of the paper is the notion that time’s cost in decision-making contains an apportionment cost that reflects the allocation of decision time relative to the world. The authors use this cost to pose a hypothesis as to why subjects exhibit sub-optimal behavior in choice decisions. However, the equation for the apportionment cost is never clearly defined in the paper, which is a significant oversight that hampers the effectiveness of the authors' claims.

      We thank the reviewer for pressing on this critical point. Reviewers commonly identified a need to provide a concise and intuitive definition of apportionment cost, and to explicitly solve and provide for its mathematical expression.

      We added the following succinct verbal description of apportionment cost… “Apportionment cost is the difference in reward that can be expected, on average, between a policy of taking versus a policy of not taking the considered pursuit, over a time equal to its duration.” This definition appears in new paragraphs (as below) describing apportionment cost in the results section “Time’s cost: opportunity & apportionment costs determine a pursuit’s subjective value”, and is accompanied by equations for apportionment cost, and a figure giving its geometric depiction (Figure 5). We also expanded original figure 5 and its legend (so as to illustrate the apportionment scaling factor and the apportionment cost), and its accompanying main text, to further illustrate and clarify apportionment cost, and its relationship to opportunity cost, and time’s cost.

      “What, then, is the amount of reward by which the opportunity cost-subtracted reward is scaled down to equal the sv of the pursuit? This amount is the apportionment cost of time. The apportionment cost of time (height of the brown vertical bar, Figure 5F) is the global reward rate after taking into account the opportunity cost (slope of the magenta-gold dashed line in Figure 5F) times the time of the considered pursuit. Equally, the difference between the inside and outside reward rates, times the time of the pursuit, is the apportionment cost when scaled by the pursuit’s weight, i.e., the fraction that the considered pursuit is to the total time to traverse the world (Equation 9, right hand side). From the perspective of decision-making policies, apportionment cost is the difference in reward that can be expected, on average, between a policy of taking versus a policy of not taking the considered pursuit, over a time equal to its duration (Equation 9 center, Figure 5F).

      Equation 9. Apportionment Cost.

      While this difference is the apportionment cost of time, the opportunity cost of time is the amount that would be expected from a policy of not taking the considered pursuit over a time equal to the considered pursuit’s duration. Together, they sum to Time’s Cost (Figure 5G). Expressing a pursuit’s worth in terms of the global reward rate obtained under a policy of accepting the pursuit type (Figure 5 left column), or from the perspective of the outside reward and time (Figure 5 right column), are equivalent. However, the latter expresses sv in terms that are independent of one another, conveys the constituents giving rise to global reward rate, and provides the added insight that time’s cost comprises an apportionment as well as an opportunity cost.”
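
      The three verbal readings of apportionment cost given above can be checked against one another numerically. The sketch below is a reconstruction from the quoted description only, not the manuscript's Equation 9; the function name `time_costs`, the dictionary keys, and the example values are hypothetical.

```python
import math


def time_costs(r_in, t_in, r_out, t_out):
    """Decompose time's cost for a pursuit (r_in, t_in) in a world with
    average outside reward r_out and outside time t_out."""
    rho_in = r_in / t_in                      # inside reward rate
    rho_out = r_out / t_out                   # outside reward rate
    rho_g = (r_in + r_out) / (t_in + t_out)   # global rate when taking the pursuit

    opportunity = rho_out * t_in              # expected from not taking the pursuit

    # Reading 1: global rate after subtracting opportunity cost, times pursuit time.
    app_left = (r_in - opportunity) / (t_in + t_out) * t_in
    # Reading 2: policy difference (take vs. not take) over the pursuit's duration.
    app_center = (rho_g - rho_out) * t_in
    # Reading 3: rate difference times pursuit time, scaled by the pursuit's weight.
    app_right = (rho_in - rho_out) * t_in * (t_in / (t_in + t_out))

    return {
        "opportunity": opportunity,
        "apportionment": (app_left, app_center, app_right),
        "times_cost": rho_g * t_in,
        "sv": r_in - rho_g * t_in,
    }


costs = time_costs(r_in=4.0, t_in=2.0, r_out=1.0, t_out=6.0)
print(costs)
```

      With these example values the three readings coincide, and opportunity cost plus apportionment cost equals time's cost, so that the reward minus time's cost gives the subjective value.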

      (3) Many of the paper's figures are visually busy and not clearly detailed in the captions (for example, Figures 6-8). Because of the geometric nature of the authors' approach, the figures should be as clean and intuitive as possible, as in their current state, they undercut the utility of a geometric argument.

      We endeavored to make our figures as simple as possible. In the revision, we have made changes to the figures that we believe improve their clarity. These include: 1) breaking some figures into more panels when more than one concept was being introduced (such as in revised Figures 5, 6, 7, and 8), 2) using the left-hand y-axis for the outside reward and the right-hand y-axis for the inside reward when plotting the “inside” and “outside” rewards, and indicating their respective numerics (which run in opposite directions), 3) adding a legend to the figures themselves where needed (revised Figures 10, 11, 12, 14), 4) adding the values used to the figure captions, where needed, and 5) ensuring all symbols are indicated in legends.

      (4) The authors motivate their work by focusing on previously-observed behavior in decision experiments and tell the reader that their model is able to qualitatively replicate this data. This claim would be significantly strengthened by the inclusion of experimental data to directly compare to their model's behavior. Given the computational focus of the paper, I do not believe the authors need to conduct their own experiments to obtain this data; reproducing previously accepted data from the papers the authors' reference would be sufficient.

      Our objective was not to fit experimentally observed data, as is commonly the goal of implementation/computational models. Rather, as a theory, our objective is to rationalize the broad, curious, and well-established pattern of temporal decision-making behaviors under a deeper understanding of reward-rate maximization, and from that understanding, identify the nature of the error being committed by whatever learning algorithm and representational architecture is actually being used by humans and animals. In doing so, we make a number of important contributions. By identifying and analyzing reward-rate-maximizing equations, we 1) provide insight into what composes time’s cost and how the temporal structure of the world in which a pursuit is embedded (its ‘context’) impacts the value of that pursuit, 2) rationalize a diverse assortment of temporal decision-making behaviors (e.g., Hyperbolic discounting, the Magnitude Effect, the Sign Effect, and the Delay effect), explaining them with no assumed free-fit parameter, and then, by analyzing error in parameters enabling reward-rate maximization, 3) identify the likely source of error and propose the Malapportionment Hypothesis. The Malapportionment Hypothesis identifies the underweighting of a considered pursuit’s “outside”, and not error in pursuits’ reward rates, as the source of error committed by humans and animals. It explains why animals and humans can present as suboptimally ‘impatient’ in Choice, but as optimal in Forgo. At the same time, it concords with numerous and diverse observations in decision making regarding whether to initiate a pursuit. The nature of this error also, then, makes numerous predictions. These insights inform future computational and experimental work by providing strong constraints on the nature of the algorithm and representational architecture used to learn and represent the values of pursuits. A rigorous test of the Malapportionment Hypothesis will require wholly new experiments.

      In the revision, we also now emphasize and add predictions of the Malapportionment Hypothesis, and have updated its figure (Figure 21), its legend, and its paragraphs in the discussion.

      “We term this reckoning of the source of error committed by animals and humans the Malapportionment Hypothesis, which identifies the underweighting of the time spent outside versus inside a considered pursuit but not the misestimation of pursuit rates, as the source of error committed by animals and humans (Figure 21). This hypothesis therefore captures previously published behavioral observations (Figure 21A) showing that animals can make decisions to take or forgo reward options that optimize reward accumulation (Krebs et al., 1977; Stephens and Krebs, 1986; Blanchard and Hayden, 2014), but make suboptimal decisions when presented with simultaneous and mutually exclusive choices between rewards of different delays (Logue et al., 1985; Blanchard and Hayden, 2015; Carter and Redish, 2016; Kane et al., 2019). The Malapportionment Hypothesis further predicts that apparent discounting functions will present with greater curvature than what a reward-rate-maximizing agent would exhibit (Figure 21B). While experimentally observed temporal discounting would have greater curvature, the Malapportionment Hypothesis also predicts that the Magnitude (Figure 21C) and Sign effect (Figure 21D) would be less pronounced than what a reward-rate-maximizing agent would exhibit, with these effects becoming less pronounced the greater the underweighting. Finally, with regards to the Delay Effect (Figure 21E), the Malapportionment Hypothesis predicts that preference reversal would occur at delays greater than that exhibited by a reward-rate-maximizing agent, with the delay becoming more pronounced the greater the underweighting outside versus inside the considered pursuit by the agent.”
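
      One way to see how such an error could leave Forgo decisions intact while biasing Choice is the following minimal sketch. It models malapportionment with a single hypothetical weight `w` (0 < w <= 1) that shrinks the outside time and outside reward together, so that the outside reward rate is not misestimated; this parameterization, the function names, and the example values are assumptions made for illustration and are not the manuscript's formal definition.

```python
def subjective_value(r_in, t_in, r_out, t_out, w=1.0):
    """Assumed subjective value when the outside is weighted by w.

    w = 1.0 is the reward-rate-maximizing agent; w < 1.0 underweights the
    outside while leaving the outside reward rate r_out / t_out unchanged.
    """
    t_o = w * t_out
    r_o = w * r_out
    return (r_in * t_o - r_o * t_in) / (t_in + t_o)


def choose(options, r_out, t_out, w=1.0):
    """Return the name of the option with the highest subjective value."""
    return max(options, key=lambda name: subjective_value(*options[name], r_out, t_out, w))


# Choice: a smaller-sooner (SS) versus a larger-later (LL) pursuit, given as
# (reward, time), followed by an unrewarded outside of duration 10.
options = {"SS": (1.0, 1.0), "LL": (3.0, 8.0)}
print(choose(options, r_out=0.0, t_out=10.0, w=1.0))   # optimal agent
print(choose(options, r_out=0.0, t_out=10.0, w=0.1))   # underweighting agent

# Forgo: the sign of subjective value, and hence take/forgo, does not depend on w.
poor = subjective_value(1.0, 4.0, r_out=5.0, t_out=10.0, w=0.1)
good = subjective_value(4.0, 4.0, r_out=5.0, t_out=10.0, w=0.1)
print(poor, good)
```

      In this example the underweighting agent switches from LL to SS in Choice, whereas its take/forgo decisions match those of the optimal agent, because the sign of the subjective value depends only on the comparison of inside and outside reward rates.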

      (5) While the authors reference a good portion of the decision-making literature in their paper, they largely ignore the evidence-accumulation portion of the literature, which has been discussing time-based discounting functions for some years. Several papers that are both experimentally (Cisek et al. 2009, Thura et al. 2012, Holmes et al. 2016) and theoretically (Drugowitsch et al. 2012, Tajima et al. 2019, Barendregt et al. 2022) driven exist, and I would encourage the authors to discuss how their results relate to those in different areas of the field.

      In this manuscript, we consider the worth of initiating one or another pursuit having completed a prior one, and not the issue of continuing within a pursuit having already engaged in it. The worth of continuing a pursuit, as in patch-foraging/give-up tasks, constitutes a third fundamental time decision-making topology which is outside the scope of the current work. It engages a large and important literature, encompassing evidence accumulation, and requires a paper on the value of continuing a pursuit in temporal decision making, in its own right, that can use the concepts and framework developed here. The excellent works suggested by the reviewer will be most relevant to that future work concerning patch-foraging/give-up topologies.

      Reviewer #2 Recommendations:

      (1) In Equation 1, the term rho_d is referred to as the reward rate of the default pursuit, when it should be the reward of the default pursuit.

      Equation 1 is formulated to calculate the average reward received and average time spent per unit time spent in the default pursuit. So, f<sub>i</sub> is the encounter rate of pursuit i for one unit of time spent in the default pursuit (lines 259-262). Added to the summation in the numerator, we have the average reward obtained in the default pursuit per unit time (ρ<sub>d</sub>), and in the denominator we have the time spent in the default pursuit per unit time (1).

      We have added clarifying text to convey the meaning of the equation in Appendix 1, and thank the reviewer for pointing out this need.
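
      As a minimal sketch of the bookkeeping described for Equation 1, the global reward rate can be computed per unit time spent in the default pursuit. The form used below, (rho_d + sum of f_i * r_i) / (1 + sum of f_i * t_i), is inferred from the verbal description above; the function name and example values are hypothetical.

```python
import math


def global_reward_rate(pursuits, rho_d):
    """Global reward rate of a world, per the description of Equation 1.

    pursuits: list of (f_i, r_i, t_i), where f_i is the encounter rate of
              pursuit i per unit time spent in the default pursuit, and
              r_i, t_i are its reward and duration.
    rho_d:    reward obtained per unit time spent in the default pursuit.
    """
    # Numerator: reward from the default pursuit plus reward from encountered pursuits.
    reward = rho_d + sum(f * r for f, r, _t in pursuits)
    # Denominator: one unit of default time plus time spent in encountered pursuits.
    time = 1.0 + sum(f * t for f, _r, t in pursuits)
    return reward / time


# Hypothetical world: two pursuit types and an unrewarded default pursuit.
world = [(0.2, 4.0, 2.0), (0.1, 10.0, 5.0)]
rate = global_reward_rate(world, rho_d=0.0)
print(rate)
```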

      (2) The notation for "in" and "out" of a considered pursuit type begins as being used to describe the contribution from a single pursuit (without inter-trial interval) towards global reward rate and the contribution of all other factors (other possible pursuits and inter-trial interval) towards global reward rate, respectively, but is then used to describe the pursuit's contribution and the inter-trial interval's contribution, respectively, to the global reward rate. This should be cleaned up to be consistent throughout, or at the very least, it should be addressed when this special case is considered the default.

      As understood by the reviewer, “in” and “out” of the considered pursuit type describes the general form by which a world can be cleaved into these two parts: the average time and reward received outside of the considered pursuit type for the average time and reward received within that pursuit type. A specific, simple, and common experimental instance would be a world composed of one or another pursuit and an intertrial interval.

      We now make clear that such a world, composed of a considered pursuit and an intertrial interval, is but one special case. In example cases where t<sup>out</sup> represents the special case of an intertrial interval, this is now stated clearly. For instance, we do so when discussing how a purely hyperbolic discounting function would apply in worlds in which no reward is received in t<sup>out</sup>, stating that this is often the case in experimental designs where t<sup>out</sup> represents an intertrial interval with no reward. Importantly, by the new inclusion in the revision of illustrated worlds that have n pursuits that could occur from a default pursuit at 1) equal frequency (Supplemental 1), and 2) differing frequencies (Supplemental 2), we make clearer the generalizability and utility of this t<sup>out</sup>/t<sup>in</sup> concept.
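
      The cleaving of a world into the "in" and "out" of a considered pursuit type can be sketched as follows. The sketch assumes that the outside reward and time are the reward and time accrued everywhere else (the default pursuit plus all other pursuit types) per encounter of the considered pursuit; this formula, the function name `partition`, and the example values are assumptions made for illustration.

```python
import math


def partition(pursuits, rho_d, i):
    """Split a world into the inside and outside of pursuit type i.

    pursuits: list of (f_j, r_j, t_j) with encounter rates per unit default time.
    rho_d:    reward per unit time in the default pursuit.
    Returns (r_in, t_in, r_out, t_out) per encounter of pursuit i.
    """
    f_i, r_in, t_in = pursuits[i]
    others = [p for j, p in enumerate(pursuits) if j != i]
    # Outside reward/time accumulated per unit default time, rescaled to
    # one encounter of the considered pursuit (1 / f_i units of default time).
    r_out = (rho_d + sum(f * r for f, r, _t in others)) / f_i
    t_out = (1.0 + sum(f * t for f, _r, t in others)) / f_i
    return r_in, t_in, r_out, t_out


world = [(0.2, 4.0, 2.0), (0.1, 10.0, 5.0)]
rates = []
for i in range(len(world)):
    r_in, t_in, r_out, t_out = partition(world, rho_d=0.0, i=i)
    rates.append((r_in + r_out) / (t_in + t_out))
print(rates)

# Special case: one pursuit type and an unrewarded default pursuit, so the
# outside reduces to an intertrial interval of mean duration 1 / f.
iti_case = partition([(0.25, 3.0, 2.0)], rho_d=0.0, i=0)
print(iti_case)
```

      Whichever pursuit type is treated as the considered one, the global reward rate recovered from its inside and outside is the same, which is what allows the in/out description to apply to worlds with any number of pursuits.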

      (3) Figure 5 should make clear the decomposition of time's cost both graphically and functionally. As it stands, the figure does not define the apportionment cost.

      In the revision of original Figure 5, we now further decompose the figure to effectively convey 1) what opportunity cost is, 2) (especially) what apportionment cost is, both graphically and mathematically, 3) how time’s cost is composed of them, 4) how the apportionment scaling term scales the opportunity-cost-subtracted reward by time’s allocation to equal the subjective value, and 5) the equivalence between the expression of time’s cost using terms that are not independent of one another and the expression of time’s cost using terms that are independent of one another.

      (4) Figures 6-8 do not clearly define the dots and annuli used in panels B and C.

      We have further decomposed Figures 6-8 so that the functional forms of opportunity, apportionment, and time’s cost, and their interrelationship with respect to changing outside reward and outside time, can be more clearly appreciated, and we clearly identify the symbols used in the corresponding legends.

      (5) The meaning of a negative subjective value should be specifically stated. Is it the amount a subject would pay to avoid taking the considered pursuit?

      As the reviewer intuits, negative subjective value can be considered the amount an agent ought be willing to pay to avoid taking the considered pursuit.

      We now include the following lines in “The forgo decision can also be made from subjective value” section in reference to negative subjective value…

      “A negative subjective value thus indicates that a policy of taking the considered pursuit would result in a global reward rate that is less than a policy of forgoing the considered pursuit. Equivalently, a negative subjective value can be considered the amount an agent ought be willing to pay to avoid having to take the considered pursuit.”
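
      The two readings of negative subjective value given above can be illustrated with a short sketch. It assumes sv = (r_in * t_out - r_out * t_in) / (t_in + t_out) for a reward-rate-maximizing agent; this expression, the function names, and the example values are illustrative assumptions.

```python
import math


def subjective_value(r_in, t_in, r_out, t_out):
    """Assumed immediate-reward equivalent of taking the considered pursuit."""
    return (r_in * t_out - r_out * t_in) / (t_in + t_out)


def rate_if_taken(r_in, t_in, r_out, t_out):
    """Global reward rate under a policy of taking the pursuit."""
    return (r_in + r_out) / (t_in + t_out)


def rate_if_forgone(r_out, t_out):
    """Global reward rate under a policy of forgoing the pursuit."""
    return r_out / t_out


# Hypothetical pursuit whose reward rate (0.2) is below the outside rate (0.5).
r_in, t_in, r_out, t_out = 1.0, 5.0, 4.0, 8.0
sv = subjective_value(r_in, t_in, r_out, t_out)
taken = rate_if_taken(r_in, t_in, r_out, t_out)
forgone = rate_if_forgone(r_out, t_out)

# Paying |sv| immediately, instead of taking the pursuit, yields the same rate
# as taking it, so |sv| is the most the agent should pay to avoid it.
rate_if_paying = (sv + r_out) / t_out
print(sv, taken, forgone, rate_if_paying)
```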

      (6) Why do you define the discounting function as the normalized subjective value? This choice should be justified, via literature citations or a well-described logical argument.

      The reward-magnitude-normalized subjective value-time function is commonly referred to as the temporal discounting function, as it permits comparison of the discount rate isolated from any difference in reward magnitude and/or sign, and this usage is deeply rooted in historical precedent. As the reviewer points out, the term is overloaded, however, as investigations in which comparisons between the forms of subjective value-time functions are not needed tend to refer to these functions as temporal discounting functions as well.

      We make clear in the revised text in the introduction our meaning and use of the term, the justification in doing so, and its historical roots.

      “Historically, temporal decision-making has been examined using a temporal discounting function to describe how delays in rewards influence their valuation. Temporal discounting functions describe the subjective value of an offered reward as a function of when the offered reward is realized. To isolate the form of discount rate from any difference in reward magnitude and sign, subjective value is commonly normalized by the reward magnitude when comparing subjective value-time functions (Strotz, 1956, Jimura, 2009). Therefore, we use the convention that temporal discounting functions are the magnitude-normalized subjective value-time function (Strotz, 1956).”

      Special addition. In investigating the historical roots of the discounting function prompted by the reviewer, we learned (Grüne-Yanoff 2015) that it was Mazur that simply added the “1+k” in the denominator of the hyperbolic discounting function. Our derivation for the reward-rate optimal agent makes clear why apparent temporal discounting functions ought have this general form.

      Therefore, we add the following to the “Hyperbolic Temporal Discounting Function” section in the discussion…

      “It was Ainslie (Ainslie, 1975) who first understood that the empirically observed “preference reversals” between SS and LL pursuits could be explained if temporal discounting took on a hyperbolic form, which he initially conjectured to arise simply from the ratio of reward to delay (Grüne-Yanoff 2015). This was problematic, however, on two fronts: 1) as the time nears zero, the value curve goes to infinity, and 2) there is no accommodation of differences observed within and between subjects regarding the steepness of discounting. Mazur (Mazur, 1987) addressed these issues by introducing 1 + k into the denominator, providing for the now standard hyperbolic discounting function, r/(1 + kt). Introduction of “1” solved the first issue, though “it never became fully clear how to interpret this 1” (Grüne-Yanoff 2015; interviewing Ainslie). Introduction of the free-fit parameter, k, accommodated the variability observed across and within subjects by controlling the curvature of temporal discounting, and has become widely interpreted as a psychological trait, such as patience, or willingness to delay gratification (Frederick et al., 2002).”

      …continuing later in that section to explain why the reward-rate optimal agent would exhibit this general form…

      “Regarding form, our analysis reveals that the apparent discounting function of a reward-rate-maximizing agent is a hyperbolic function…

      …which resembles the standard hyperbolic discounting function, r/(1 + kt), in the denominator, where k = 1/t<sub>out</sub>. Whereas Mazur introduced 1 + k to t in the denominator to 1) force the function to behave as t approaches zero, and 2) provide a means to accommodate differences observed within and between subjects, our derivation gives cause to the terms 1 and k, their relationship to one another, and to t in the denominator. First, from our derivation, “1” actually signifies taking t<sub>out</sub> amount of time expressed in units of t<sub>out</sub> (t<sub>out</sub>/t<sub>out</sub>=1) and adding it to t<sub>in</sub> amount of time expressed in units of t<sub>out</sub> (i.e., the total time to make a full pass through the world expressed in terms of how the agent apportions its time under a policy of accepting the considered pursuit).”
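
      The correspondence between the two denominators can be checked with a small sketch. It assumes that the reward-rate-maximizing discounting function is (1 - rho_out * t_in / r_in) / (1 + t_in / t_out), and reads k as 1/t_out, which is an inference from the passage above; the function names and values are hypothetical.

```python
import math


def rate_max_discount(t_in, r_in, r_out, t_out):
    """Assumed magnitude-normalized subjective value of a reward-rate maximizer."""
    rho_out = r_out / t_out
    return (1.0 - rho_out * t_in / r_in) / (1.0 + t_in / t_out)


def mazur_discount(t, k):
    """Standard hyperbolic discounting of a unit reward, 1 / (1 + k t)."""
    return 1.0 / (1.0 + k * t)


delays = [0.0, 1.0, 2.0, 5.0, 10.0, 50.0]
t_out = 10.0

# With no outside reward, the assumed function coincides with Mazur's form
# when k is read as 1 / t_out.
unrewarded = [rate_max_discount(t, 5.0, 0.0, t_out) for t in delays]
hyperbolic = [mazur_discount(t, 1.0 / t_out) for t in delays]

# With a positive outside reward rate, an opportunity cost is also subtracted,
# so the apparent discounting is steeper than the purely hyperbolic case.
rewarded = [rate_max_discount(t, 5.0, 1.0, t_out) for t in delays]
print(unrewarded, hyperbolic, rewarded)
```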

      Additional Correction. In revising the section, “Hyperbolic Temporal Discounting Functions” in the discussion, we also detected an error in our description of the meaning of suboptimal bias for SS. In the revision, the sentence now reads…

      “More precisely, what is meant by this suboptimal bias for SS is that the switch in preference from LL to SS occurs at an outside reward rate that is lower (and/or an outside time that is greater) than what an optimal agent would exhibit.”

      (7) Figure 15B should have negative axes defined for the pursuit's now negative reward.

      Yes, excellent point.

      To remove ambiguity regarding the valence of inside and outside reward magnitudes, we have changed all such figures so that the left hand y-axis is used to signify the outside reward magnitude and sign, and so that the right hand y-axis is used to signify the inside reward magnitude and sign.

      With respect to the revision of original Figure 15B, this change now makes clear that the inside reward label and numerics on the right-hand side of the graph run from positive (top) to negative (bottom) values, so that it can now be understood that the inside reward is negative in this figure (i.e., a punishment). The left-hand y-axis labeling the outside reward magnitude has numerics that run in the opposite direction, from negative (top) to positive (bottom). In this figure, the outside reward rate is positive whereas the inside reward rate is negative.

      (8) When comparing your discounting function to the TIMERR and Heuristic models, it would be useful to include a schematic plot illustrating the different obtainable behaviors from all models rather than just telling the reader the differences.

      We hold that the descriptions and references are sufficient to address these comparisons.

      (9) I would strongly suggest cleaning up all appendices for notation…

      The typographical errors that have been noted in these reviews have all been corrected. We believe the reviewer to be referring here to the manner in which we had cross-referenced equations in the appendices and main text, which can lead to confusion as to whether a referenced equation number refers to its occurrence in the main text or its occurrence in the appendices.

      In the revision, we eliminate numbering of equations in the appendices except where an equation that occurs in an appendix is referenced within the main text. In the main text, important equations are numbered sequentially and note the appendix from which they derive. If an equation in an appendix is referenced in the main text, this is noted within the appendix from which it derives.

      …and replacing some of the small equation manipulations with written text describing the goal of each derivation.

      To increase clarity, we have taken the reviewer’s helpful suggestion, adding helper text in the appendices where needed, and have bolded the equations of importance within the Appendices (rather than removing the equation manipulations that make clear the steps of derivation).

      (10) I would suggest moving the table in Appendix 11 to the main text where misestimation is referenced.

      So moved. This appendix now appears in the main text as table 1 “Definitions of misestimating global reward rate-enabling parameters”.

      Reviewer #3 (Public review):

      One broad issue with the paper is readability. Admittedly, this is a complicated analysis involving many equations that are important to grasp to follow the analyses that subsequently build on top of previous analyses.

      But, what's missing is intuitive interpretations behind some of the terms introduced, especially the apportionment cost without referencing the equations in the definition so the reader gets a sense of how the decision-maker thinks of this time cost in contrast with the opportunity cost of time.

      We thank the reviewer for encouraging us to formulate a succinct and intuitive verbal description of the nature of apportionment cost.

      We added the following succinct verbal description of apportionment cost… “Apportionment cost is the difference in reward that can be expected, on average, between a policy of taking versus a policy of not taking the considered pursuit, over a time equal to its duration.” This definition appears in a new paragraph (as below) describing apportionment cost in the results section “Time’s cost: opportunity & apportionment costs determine a pursuit’s subjective value”, and is accompanied by equations for apportionment cost, and a figure giving its geometric depiction (Figure 5). We also expanded original figure 5 and its legend (so as to illustrate the apportionment scaling factor and the apportionment cost), and its accompanying main text, to further illustrate and clarify apportionment cost, and its relationship to opportunity cost, and time’s cost.

      “What, then, is the amount of reward by which the opportunity cost-subtracted reward is scaled down to equal the sv of the pursuit? This amount is the apportionment cost of time. The apportionment cost of time (height of the brown vertical bar, Figure 5F) is the global reward rate after taking into account the opportunity cost (slope of the magenta-gold dashed line in Figure 5F) times the time of the considered pursuit. Equally, the difference between the inside and outside reward rates, times the time of the pursuit, is the apportionment cost when scaled by the pursuit’s weight, i.e., the fraction that the considered pursuit is to the total time to traverse the world (Equation 9, right hand side). From the perspective of decision-making policies, apportionment cost is the difference in reward that can be expected, on average, between a policy of taking versus a policy of not taking the considered pursuit, over a time equal to its duration (Equation 9 center, Figure 5F).

      Equation 9. Apportionment Cost.

      While this difference is the apportionment cost of time, the opportunity cost of time is the amount that would be expected from a policy of not taking the considered pursuit over a time equal to the considered pursuit’s duration. Together, they sum to Time’s Cost (Figure 5G). Expressing a pursuit’s worth in terms of the global reward rate obtained under a policy of accepting the pursuit type (Figure 5 left column), or from the perspective of the outside reward and time (Figure 5 right column), are equivalent. However, the latter expresses sv in terms that are independent of one another, conveys the constituents giving rise to global reward rate, and provides the added insight that time’s cost comprises an apportionment as well as an opportunity cost.”
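
      As an illustrative sketch of the verbal definitions above (not code from the manuscript), the two components of time's cost can be computed for a single considered pursuit. The function names, argument names, and numerical values are assumptions chosen for illustration; the only relationships used are that the global reward rate is total reward over total time, and that apportionment cost is the difference between the taking and not-taking policies over the pursuit's duration.

```python
import math

def time_costs(r_in, t_in, r_out, t_out):
    """Split time's cost for one considered pursuit into its two parts.

    r_in, t_in   : reward and duration of the considered pursuit
    r_out, t_out : average reward and time spent outside it, per encounter
    (argument names are illustrative assumptions)
    """
    rho_out = r_out / t_out                    # outside reward rate
    rho_g = (r_in + r_out) / (t_in + t_out)    # global rate under a policy of taking
    opportunity = rho_out * t_in               # expected from not taking, over t_in
    apportionment = (rho_g - rho_out) * t_in   # taking minus not taking, over t_in
    sv = r_in - opportunity - apportionment    # reward remaining after time's cost
    return opportunity, apportionment, sv

def apportionment_by_weight(r_in, t_in, r_out, t_out):
    """Same cost, written with the pursuit's weight t_in / (t_in + t_out)."""
    weight = t_in / (t_in + t_out)
    return (r_in / t_in - r_out / t_out) * t_in * weight

opp, app, sv = time_costs(r_in=4.0, t_in=2.0, r_out=1.0, t_out=8.0)
print(opp, app, sv)  # 0.25 0.75 3.0
```

      In this example the opportunity and apportionment costs sum to the global reward rate times the pursuit's duration, and the weight-based form returns the same apportionment cost, which is one way to check that the verbal descriptions agree with one another.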

      The above definition of apportionment cost adds to other stated relationships of apportionment cost found throughout the paper (original lines 434,435,447,450).

      Re-analysis of some existing empirical data through the lens of their presented objective functions, especially later when they describe sources of error in behavior.

      Our objective was not to fit experimentally observed data, as is commonly the goal of implementation/computational models. Rather, as a theory, our objective is to rationalize the broad, curious, and well-established pattern of temporal decision-making behaviors under a deeper understanding of reward-rate maximization, and from that understanding, identify the nature of the error being committed by whatever learning algorithm and representational architecture is actually being used by humans and animals. In doing so, we make a number of important contributions. By identifying and analyzing reward-rate-maximizing equations, we 1) provide insight into what composes time’s cost and how the temporal structure of the world in which it is embedded (its ‘context’) impacts the value of a pursuit, 2) rationalize a diverse assortment of temporal decision-making behaviors (e.g., Hyperbolic discounting, the Magnitude Effect, the Sign Effect, and the Delay effect), explaining them with no assumed free-fit parameter, and then, by analyzing error in parameters enabling reward-rate maximization, 3) identify the likely source of error and propose the Malapportionment Hypothesis. The Malapportionment Hypothesis identifies the underweighting of a considered pursuit’s “outside”, and not error in pursuit’s reward rates, as the source of error committed by humans and animals. It explains why animals and humans can present as suboptimally ‘impatient’ in Choice, but as optimal in Forgo. At the same time, it concords with numerous and diverse observations in decision making regarding whether to initiate a pursuit. The nature of this error also, then, makes numerous predictions. These insights inform future computational and experimental work by providing strong constraints on the nature of the algorithm and representational architecture used to learn and represent the values of pursuits. Rigorous test of the Malapportionment Hypothesis will require wholly new experiments.

      In the revision, we also now emphasize and add predictions of the Malapportionment Hypothesis, augmenting its figure (Figure 21), its legend, and its paragraphs in the discussion.

      “We term this reckoning of the source of error committed by animals and humans the Malapportionment Hypothesis, which identifies the underweighting of the time spent outside versus inside a considered pursuit but not the misestimation of pursuit rates, as the source of error committed by animals and humans (Figure 21). This hypothesis therefore captures previously published behavioral observations (Figure 21A) showing that animals can make decisions to take or forgo reward options that optimize reward accumulation (Krebs et al., 1977; Stephens and Krebs, 1986; Blanchard and Hayden, 2014), but make suboptimal decisions when presented with simultaneous and mutually exclusive choices between rewards of different delays (Logue et al., 1985; Blanchard and Hayden, 2015; Carter and Redish, 2016; Kane et al., 2019). The Malapportionment Hypothesis further predicts that apparent discounting functions will present with greater curvature than what a reward-rate-maximizing agent would exhibit (Figure 21B). While experimentally observed temporal discounting would have greater curvature, the Malapportionment Hypothesis also predicts that the Magnitude (Figure 21C) and Sign effect (Figure 21D) would be less pronounced than what a reward-rate-maximizing agent would exhibit, with these effects becoming less pronounced the greater the underweighting. Finally, with regards to the Delay Effect (Figure 21E), the Malapportionment Hypothesis predicts that preference reversal would occur at delays greater than that exhibited by a reward-rate-maximizing agent, with the delay becoming more pronounced the greater the underweighting outside versus inside the considered pursuit by the agent.”

      Reviewer #3 Recommendations:

      As mentioned above, the readability of this paper should be improved so that the readers can follow the derivations and your analyses better. To this end, careful numbering of equations, following consistent equation numbering formats, and differentiating between appendix referencing and equation numbering would have gone a long way in improving the readability of this paper. Some specific questions are noted below.

      To increase clarity, in the revision we eliminated numbering of equations in the appendices except where an equation occurs in an appendix that is referenced within the main text. In the main text, important equations are thus numbered sequentially as they appear, with a note of the appendix from which they derive. If an equation in an appendix is referenced in the main text, this is noted within the appendix from which it derives.

      (1) In general, it is unclear what the default pursuit is. From the schematic on the left (forgo decision), it appears to be the time spent in between reward-giving pursuits. However, this schematic also allows for smaller rewards to be attained during the default pursuit as do subsequent equations that reference a default reward rate. Here is where an example would have really benefited the authors in getting their point across as to what the default pursuit is in practice in the forgo decisions and how the default reward rate could be modulated.

      (1) The description of the default pursuit has been modified in section “Forgo and Choice decision topologies” to now read… “After either the conclusion of the pursuit, if accepted, or immediately after rejection, the agent returns to a pursuit by default (the “default” pursuit). This default pursuit effectively can be a waiting period over which reward could be received, and reoccurs until the next pursuit opportunity becomes available.” (2) Additionally, helper text has been added to Ap1 regarding the meaning of time and reward spent in the default pursuit. Finally, (3) new figures concerning n pursuits occurring at the same (Supplement 1) or different (Supplement 2) frequencies from a default pursuit have now been added, providing examples as suggested by the reviewer.

      (2) I want to clarify my understanding of the topologies in Figure 1. In the forgo, do they roam in the "gold" pursuit indefinitely before they are faced with the purple pursuit? In general, comparing the 2 topologies, it seems like in the forgo decision, they can roam indefinitely in the gold topology or choose the purple but must return to the gold.

      The reviewer’s understanding of the topology is correct. The agent loops across one unit time in the default gold pursuit indefinitely, though the purple pursuit (or any pursuit that might exist in that world) occurs on exit from gold at its frequency per unit time. The default gold pursuit will then itself have an average duration in units of time spent in gold. As the reviewer states, the agent can re-enter into gold from having exited gold, and can enter gold from having exited purple, but cannot re-enter purple from having exited purple; rather, it must enter into the default pursuit.

      …Another point here is that this topology is highly simplified (only one considered pursuit). So it may be helpful to either add a schematic for the full topology with multiple pursuits or alternatively, provide the corresponding equations (at least in appendix 1 and 2) for the simplified topology so you can drive home the intuition behind derived expressions in these equations.

      We understand the reviewer to be noting that, while the illustrated example is of the simple topology, the mathematical formulation handles the case of n pursuits, and that illustrating a world in which there are a greater number of pursuits, corresponding to original appendices 1&2, would assist readers in understanding the generality of these equations.

      An excellent suggestion. We have now added to the manuscript n-pursuit world illustrations in which each pursuit occurs at the same frequency (Supplemental Figure 1) or at different frequencies (Supplemental Figure 2), and have added text, in the main text and in the appendices, to assist in understanding the form of the equation and its relationship to unit time in the default pursuit.

      (3) In Equation and Appendix 1, there are a few things that are unclear. Particularly, why is the expected time of the default option E(t_default )= 1/(∑_(i=1)^n f_i )? Similarly, why is the E(r_default )= ρ_d/(∑_(i=1)^n f_i )? Looking at the expression for E(r_default ), it implies that across all pursuits 1 through n, the default option is encountered only once. Ultimately, in Equation 1.4, (and Equation 1), the units of the two terms in the numerator don't seem to match. One is a reward rate (ρ_d) and the other is a reward value. This is the most important equation of the paper since the next several equations build upon this. Therefore, the lack of clarity here makes the reader less likely to follow along with the analysis in rigorous detail. Better explanations of the terms and better formatting will help alleviate some of these issues.

      The equation is formulated to calculate the average reward received and average time spent per unit time spent in the default pursuit. So, f_i is the encounter rate of pursuit i for one unit of time spent in the default pursuit. Added to the summation in the numerator we have the average reward obtained in the default pursuit per unit time (ρ_d), and in the denominator we have the time spent in the default pursuit per unit time (1).

      Text explaining the above equation has been added to Ap 1.
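
      The explanation above can be sketched numerically. The following is a minimal illustration, assuming each pursuit is described by a hypothetical tuple (f_i, r_i, t_i) of encounter rate per unit default time, reward, and duration; the function name and example values are not from the manuscript. Note that ρ_d enters the numerator multiplied by the implicit one unit of default time, so both numerator terms are rewards and both denominator terms are times.

```python
import math

def global_reward_rate(rho_d, pursuits):
    """Average reward per average time, per unit time spent in the default pursuit.

    rho_d    : reward rate of the default pursuit
    pursuits : iterable of (f_i, r_i, t_i) tuples (illustrative layout)
    """
    default_time = 1.0                                 # one unit of default time
    reward = rho_d * default_time + sum(f * r for f, r, _ in pursuits)
    time = default_time + sum(f * t for f, _, t in pursuits)
    return reward / time

# Two pursuit types encountered from the default pursuit (made-up values).
example_pursuits = [(0.2, 4.0, 2.0), (0.1, 1.0, 5.0)]
rate = global_reward_rate(rho_d=0.1, pursuits=example_pursuits)
print(rate)
```

      Here the numerator is 0.1 + 0.8 + 0.1 = 1.0 units of reward and the denominator is 1 + 0.4 + 0.5 = 1.9 units of time.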

      (4) In equation and appendix 2, I'm trying to relate the expressions for t_out and r_out to the definitions "average time spent outside the considered pursuit". If I understand the expression in Equation 2.4 on the right-hand side, the numerator is the total time spent in all of the pursuits in the environment and the denominator refers to the number of times the considered pursuit is encountered. It is unclear as to why this is the average time spent outside the considered pursuit. In my mind, the expression for average time spent outside the considered pursuit would look something like t_out=1+ ∑_(i≠in)〖p_i t_i 〗= 1+ ∑_(i≠in)〖f_i/(∑_(j=1)^n f_j ) * t_i 〗. It is unclear how these expressions are then equivalent.

      Regarding the following equation,

      f_i is the probability that pursuit i will be encountered during a single unit of time spent in the default pursuit. The numerator of the expression is the average amount of time spent across all pursuits, excepting the considered pursuit, per unit time spent in the default pursuit. Note that the + 1 in the numerator accounts for the unit of time spent in the default pursuit and is added outside of the sum. Since f_in is the probability that the considered pursuit will be encountered per unit of time spent in the default pursuit, 1/f_in is the average amount of time spent in the default pursuit between encounters of the considered pursuit. By multiplying the average time spent across all outside pursuits per unit of time in the default pursuit by the average amount of time spent in the default pursuit between encounters of the considered pursuit, we get the average amount of time spent outside the considered pursuit per encounter of the considered pursuit. This is calculated as if the pursuit encounters are mutually exclusive within a single unit of time spent within the default pursuit, as is the case when the length of our unit time (delta t) approaches zero.

      The above text explaining the equation has been added to Ap 2.
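
      A small numerical sketch may help connect this description to the global reward rate. It assumes the same hypothetical (f_i, r_i, t_i) tuple layout for pursuits, and it assumes that the outside reward is built by analogy with the outside time (default reward rate plus the encounter-weighted rewards of the other pursuits, divided by f_in); these names and that analogy are illustrative assumptions rather than the manuscript's notation.

```python
import math

def outside_reward_and_time(rho_d, pursuits, considered):
    """Average reward and time outside the considered pursuit, per encounter of it.

    pursuits   : list of (f_i, r_i, t_i) tuples (illustrative layout)
    considered : index of the considered pursuit in that list
    """
    f_in = pursuits[considered][0]
    others = [p for k, p in enumerate(pursuits) if k != considered]
    # Per unit of default time: the unit itself (+1) plus time in other pursuits.
    time_per_unit = 1.0 + sum(f * t for f, _, t in others)
    reward_per_unit = rho_d + sum(f * r for f, r, _ in others)
    # 1 / f_in units of default time elapse, on average, between encounters.
    return reward_per_unit / f_in, time_per_unit / f_in

pursuits = [(0.2, 4.0, 2.0), (0.1, 1.0, 5.0)]
r_out, t_out = outside_reward_and_time(rho_d=0.1, pursuits=pursuits, considered=0)

# Consistency check: inside plus outside should reproduce the global reward rate.
_, r_in, t_in = pursuits[0]
rate_from_in_out = (r_in + r_out) / (t_in + t_out)
rate_direct = (0.1 + sum(f * r for f, r, _ in pursuits)) / (1.0 + sum(f * t for f, _, t in pursuits))
print(t_out, r_out, rate_from_in_out, rate_direct)
```

      With these made-up values the outside time is 7.5 and the outside reward is 1.0 per encounter of the considered pursuit, and the rate computed from the inside/outside split matches the rate computed directly over all pursuits.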

      (5) In Figure 3, one huge advantage of this separation into in-pursuit and out-of-pursuit patches is that the optimal reward rate maximizing rule becomes one that compares ρ_in and ρ_out. This contrasts with an optimal foraging rule which requires comparing to the global reward rate and therefore a circularity in solution. In practice, however, it is unclear how ρ_out will be estimated by the agent.

      How, in practice, a human or animal estimates the reward rates―be they the outside and/or global reward rate under a policy of accepting a pursuit―is the crux of the matter. This work identifies equations that would enable a reward-rate maximizing agent to calculate and execute optimal policies and emphasizes that the effective reward rates and weights of pursuits must be accurately appreciated for global reward rate optimization. In so doing, it makes a reckoning of behaviors commonly but erroneously treated as suboptimal. Then, by examining the consequences of misestimation of these enabling parameters, it identifies mis-weighting pursuits as the nature of the error committed by whatever algorithm and representational architecture is being used by humans and animals (the Malapportionment Hypothesis). This curious pattern identified and analyzed in this work thus provides a clue into the nature of the learning algorithm and means of representing the temporal structure of the environment that is used by humans and animals―the subject of future work.

      We note, however, that we do discuss existing models that grapple with how, in practice, a human or animal may estimate the outside reward rate. Of particular importance is the TIMERR model, which estimates the outside reward rate from its past experience, and can make an accounting of many qualitative features widely observed. However, while appealing, it would mix prior ‘in’ and ‘outside’ experiences within that estimate, and so would fail to perform forgo tasks optimally. Something is still amiss, as this work demonstrates.
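
      The forgo rule raised in this comment, comparing the inside and outside reward rates, can be illustrated with a short sketch. It assumes an idealized agent that already knows both rates exactly, which is precisely what the discussion above says is hard in practice; the function names and values are hypothetical.

```python
def should_accept(r_in, t_in, r_out, t_out):
    """Accept the considered pursuit only if its rate exceeds the outside rate."""
    return r_in / t_in > r_out / t_out

def rate_if_accept(r_in, t_in, r_out, t_out):
    """Global reward rate under a policy of taking the pursuit."""
    return (r_in + r_out) / (t_in + t_out)

def rate_if_forgo(r_out, t_out):
    """Global reward rate under a policy of never taking the pursuit."""
    return r_out / t_out

# A rich pursuit (rate 2.0) and a poor pursuit (rate 0.05) against an outside rate of 0.125.
for r_in in (4.0, 0.1):
    accept = should_accept(r_in, 2.0, 1.0, 8.0)
    better = rate_if_accept(r_in, 2.0, 1.0, 8.0) > rate_if_forgo(1.0, 8.0)
    print(r_in, accept, better)
```

      In both cases the rate comparison agrees with the comparison of global reward rates under the two policies, without the global rate having to be known in advance.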

      (6) The apportionment time cost needs to be explained a little bit more intuitively. For instance, it is clear that the opportunity cost of time is the cost of not spending time in the rest of the environment relative to the current pursuit. But given the definition of apportionment cost here in lines 447- 448 "The apportionment cost relates to time's allocation in the world: the time spent within a pursuit type relative to the time spent outside that pursuit type, appearing in the denominator." The reference to the equation (setting aside the confusion regarding which equation) within the definition makes it a bit harder to form an intuitive interpretation of this cost. Please reference the equation being referred to in lines 447-448, and again, an example may help the authors communicate their point much better

      We thank the reviewer for pressing on this critical point.

      Action: We added the following succinct verbal description of apportionment cost… “Apportionment cost is the difference in reward that can be expected, on average, between a policy of taking versus a policy of not taking the considered pursuit, over a time equal to its duration.” This definition appears in a new paragraph (as below) describing apportionment cost in the results section “Time’s cost: opportunity & apportionment costs determine a pursuit’s subjective value”, and is accompanied by equations for apportionment cost, and a figure giving its geometric depiction (Figure 5).

      “What, then, is the amount of reward by which the opportunity cost-subtracted reward is scaled down to equal the sv of the pursuit? This amount is the apportionment cost of time. The apportionment cost of time (height of the brown vertical bar, Figure 5F) is the global reward rate after taking into account the opportunity cost (slope of the magenta-gold dashed line in Figure 5F) times the time of the considered pursuit. Equally, the difference between the inside and outside reward rates, times the time of the pursuit, is the apportionment cost when scaled by the pursuit’s weight, i.e., the fraction that the considered pursuit is to the total time to traverse the world (Equation 9, right hand side). From the perspective of decision-making policies, apportionment cost is the difference in reward that can be expected, on average, between a policy of taking versus a policy of not taking the considered pursuit, over a time equal to its duration (Equation 9 center, Figure 5F).

      Equation 9. Apportionment Cost.

      While this difference is the apportionment cost of time, the opportunity cost of time is the amount that would be expected from a policy of not taking the considered pursuit over a time equal to the considered pursuit’s duration. Together, they sum to Time’s Cost (Figure 5G). Expressing a pursuit’s worth in terms of the global reward rate obtained under a policy of accepting the pursuit type (Figure 5 left column), or from the perspective of the outside reward and time (Figure 5 right column), are equivalent. However, the latter expresses sv in terms that are independent of one another, conveys the constituents giving rise to global reward rate, and provides the added insight that time’s cost comprises an apportionment as well as an opportunity cost.”

      (7) The analyses in Figures 6 and 7 give a nice visual representation of how the time costs are distributed as a function of outside reward and time spent. However, without an expression for apportionment cost it is hard to intuitively understand these visualizations. This also relates to the previous point of requiring a more intuitive explanation of apportionment costs in relation to the opportunity cost of time. Based on my quick math, it seems that an expression for apportionment cost would be as follows: (r_in- ρ_out*t_in)*(t_in⁄t_out )/(t_in⁄t_out +1 ). The condition described in Figure 7 seems like the perfect place to compute the value of just apportionment cost when the opportunity cost is zero. It would be helpful to introduce the equation here.

      We designed original figure 7, as the reviewer appreciates, to emphasize that time has a cost even when there is no opportunity cost, being due entirely to the apportionment cost of time.

      We now provide the mathematical expression of apportionment cost and apportionment scaling in Figure 5, the point in the main text of its first occurrence.

      …and have expanded original figure 5, its legend (so as to illustrate the apportionment scaling factor and the apportionment cost), and its accompanying main text, to further illustrate and clarify apportionment cost, and its relationship to opportunity cost, and time’s cost.

      (8) The analysis regarding choice decisions is relatively straightforward, pending the concerns for the main equations listed above for the forgo decisions. Legends certainly would have helped me grasp Figures 10-12 better.

      We believe the reviewer is referring to missing labels for the Sooner Smaller Pursuit and the Larger Later Pursuit in these figures. We used the same conventions as in Figure 9, but we see now that adding these labels to these figures would be helpful, and we have added them in the revision.

      We have now added legends to the figures themselves indicating the Sooner Smaller Pursuit and the Larger Later Pursuit. We have also added to the main text to emphasize the points made in these figures regarding the impact of opportunity cost and apportionment cost.

      (9) The derivation of the temporal discounting function from subjective reward rate is much appreciated as it provides further evidence for potential equivalence between reward rate optimization and hyperbolic discounting, which is known to explain a slew of decision-making behaviors in the economics literature.

      We thank the reviewer and greatly appreciate this recognition.

      In response to the reviewer’s comment, we have added text that further relates reward rate optimization to hyperbolic discounting…

      (1) We add discussion of how our normative derivation gives explanation to Mazur’s ad hoc addition of 1 + k to Ainslie’s reward/time hyperbolic discounting conception. See new first paragraph under “Hyperbolic Temporal Discounting Functions” for the historical origins of the standard hyperbolic equation (which are decidedly not normatively derived). And then see our discussion (new second paragraph in sections “The apparent discounting function of global….”) of how our normative derivation gives explanation to “1”, “k”, and their relationship to each other.

      (2) We add explicit treatment of the Delay Effect in a new “The Delay Effect” section of the results along with a figure, and in its corresponding Discussion section.
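
      As a hedged illustration of how a "1" and a "k" can arise from reward-rate reasoning, the sketch below assumes that subjective value is the pursuit's reward minus time's cost (the global reward rate times the pursuit's duration), as described earlier in this response. Rearranging that assumption gives a hyperbolic form with 1 + t_in/t_out in the denominator, which reduces to Mazur's standard hyperbola V = A / (1 + kD) when the outside reward is zero and k is read as 1/t_out. The function names, and this mapping of k, are illustrative assumptions and are not quoted from the manuscript.

```python
import math

def subjective_value(r_in, t_in, r_out, t_out):
    """Reward minus time's cost, with time's cost = global reward rate * t_in."""
    rho_g = (r_in + r_out) / (t_in + t_out)
    return r_in - rho_g * t_in

def hyperbolic_form(r_in, t_in, r_out, t_out):
    """Algebraically equivalent form: opportunity-cost-subtracted reward, scaled down."""
    rho_out = r_out / t_out
    return (r_in - rho_out * t_in) / (1.0 + t_in / t_out)

def mazur(amount, delay, k):
    """Standard hyperbolic discounting, V = A / (1 + k * D)."""
    return amount / (1.0 + k * delay)

sv_general = subjective_value(4.0, 2.0, 1.0, 8.0)
sv_no_outside_reward = subjective_value(4.0, 2.0, 0.0, 8.0)
print(sv_general, hyperbolic_form(4.0, 2.0, 1.0, 8.0))
print(sv_no_outside_reward, mazur(4.0, 2.0, k=1.0 / 8.0))
```

      Under these assumptions the two forms agree in general, and with no outside reward the value coincides with the standard hyperbola evaluated at k = 1/t_out.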

      Minor comments:

      (1) Typo in equation 2, should be t_i in the denominator within the summation, not r_i .

      We thank the reviewer for catching this typo, and have corrected it in the revision.

      (2) Before equation 6, typo when defining ρ_in= r_in/(t_in.). Should be t_in in the denominator, not r_out.

      We thank the reviewer for catching this typo, and have corrected it in the revision.

      (3) Please be consistent with equation numbers, placement of equation references, and the reason for placing appendix numbers. This will improve readability immensely.

      To increase clarity, in the revision we eliminated numbering of equations in the appendices except where an equation occurs in an appendix that is referenced within the main text. In the main text, important equations are thus numbered sequentially, with a note of the appendix from which they derive. If an equation in an appendix is referenced in the main text, this is noted within the appendix from which it derives.

      (4) Line 505 - "dominants" should be dominates.

      Typo fixed as indicated.

      (5) Figures 10-12: add legends to the figures.

      Now so included.

      (6) Lines 701-703: please rewrite the equation separately. It is highly unclear what rt is here.

      We thank the reviewer for bringing attention to this error. The error arose in converting from Google Sheets to Microsoft Word.

      The equation has now been corrected.

      Additional citations noted in reply and appearing in Main text

      Ainslie, George. 1975. “Specious Reward: A Behavioral Theory of Impulsiveness and Impulse Control.” Psychological Bulletin 82: 463–96.

      Frederick, Shane, George Loewenstein, and Ted O’Donoghue. 2002. “Time Discounting and Time Preference: A Critical Review.” Journal of Economic Literature 40: 351–401.

      Gibbon, John. 1977. “Scalar Expectancy Theory and Weber’s Law in Animal Timing.” Psychological Review 84: 279–325.

      Green, Leonard, Nathanael Fristoe, and Joel Myerson. 1994. “Temporal Discounting and Preference Reversals in Choice between Delayed Outcomes.” Psychonomic Bulletin & Review 1: 383–89.

      Grüne-Yanoff, Till. 2015. “Models of Temporal Discounting 1937-2000: An Interdisciplinary Exchange between Economics and Psychology.” Science in Context 28 (4): 675–713.

      Jimura, Koji, Joel Myerson, Joseph Hilgard, Todd S. Braver, and Leonard Green. 2009. “Are People Really More Patient than Other Animals? Evidence from Human Discounting of Real Liquid Rewards.” Psychonomic Bulletin & Review 16: 1071–75.

      Kalenscher, Tobias, and Cyriel M. A. Pennartz. 2008. “Is a Bird in the Hand Worth Two in the Future? The Neuroeconomics of Intertemporal Decision-Making.” Progress in Neurobiology 84 (3): 284–315.

      Kirby, Kris N., and R. J. Herrnstein. 1995. “Preference Reversals Due to Myopic Discounting of Delayed Reward.” Psychological Science 6 (2): 83–89.

      Mazur, James E. 1987. “An Adjusting Procedure for Studying Delayed Reinforcement.” In The Effect of Delay and of Intervening Events on Reinforcement Value, 55–73. Quantitative Analyses of Behavior, Vol. 5. Hillsdale, NJ, US: Lawrence Erlbaum Associates, Inc.

      McNamara, John. 1982. “Optimal Patch Use in a Stochastic Environment.” Theoretical Population Biology 21 (2): 269–88.

      Rosati, Alexandra G., Jeffrey R. Stevens, Brian Hare, and Marc D. Hauser. 2007. “The Evolutionary Origins of Human Patience: Temporal Preferences in Chimpanzees, Bonobos, and Human Adults.” Current Biology: CB 17: 1663–68.

      Strotz, R. H. 1956. “Myopia and Inconsistency in Dynamic Utility Maximization.” The Review of Economic Studies 23: 165–80.

    1. Author response:

      The following is the authors’ response to the original reviews.

      This valuable study combines multidisciplinary approaches to examine the role of insulin-like growth factor 2 mRNA-binding protein 2 (IGF2BP2) as a potential novel host dependency factor for Zika virus. The main claims are partially supported by the data, but remain incomplete. The evidence would be strengthened by improving the immunofluorescence analyses, addressing the role of IGF2BP2 in "milder" infections, and elucidating the role of IGF2BP2 in the biogenesis of the viral replication organelle. With the experimental evidence strengthened, this work will be of interest to virologists working on flaviviruses.

      We thank the reviewers for their feedback and constructive suggestions. In this revised version of the manuscript, we have addressed the reviewers’ comments to the best of our ability as detailed below. We believe that the newly incorporated data strengthen our study and conclusions. We hope that this revised manuscript will satisfy the reviewers and will be of high interest to flavivirologists.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This study investigated the co-option of IGF2BP2, an RNA-binding protein, by ZIKV proteins. Designed experiments evaluated if IGF2BP2 co-localized to sites of viral RNA replication, interacted with ZIKV proteins, and how ZIKV infection changed the IGF2BP2 interactome.

      Strengths:

      The authors have used multiple interdisciplinary techniques to address several questions regarding the interaction of ZIKV proteins and IGF2BP2.

      The findings could be exciting, specifically regarding how ZIKV infection alters the interactome of IGF2BP2.

      We thank the reviewer for acknowledging the multidisciplinary approach of our study and its exciting potential.

      Weaknesses:

      Significant concerns regarding the current state of the figures, descriptions in the figure legends, and the quality of the immunofluorescence and electron microscopy exist.

      In this new version of the manuscript, we have improved the quality of the microscopy data and included the requested information in the figure legends as described below in the Recommendations section.

      Reviewer #2 (Public Review):

      Clément Mazeaud et al. identified the insulin-like growth factor 2 mRNA-binding protein 2 (IGF2BP2) as a proviral cellular protein that regulates Zika virus RNA replication by modulating the biogenesis of virus-induced replication organelles.

      The absence of IGF2BP2 specifically dampens ZIKV replication without having a major impact on DENV replication. The authors show that ZIKV infection changes IGF2BP2 cellular distribution, which relocates to the perinuclear viral replication compartment. These assays were conducted by infecting cells with an MOI of 10 for 48 hours. Considering the ZIKV life cycle, it is noteworthy that at this time there may be a cytopathic effect. One point of concern arises regarding how the authors can ascertain that the observed change in localization is a consequence of the infection rather than of the cytopathic effect. To address this concern, shorter infection periods (e.g., 24 hours post-infection) or additional controls, such as assessing cellular proteins that do not change their localization or infecting with another flavivirus lacking the IGF2BP2 effect, could be incorporated into their experiments.

      We thank the reviewer for these relevant comments regarding the specificity of IGF2BP2 relocalization to the ZIKV replication compartment.

      It is noteworthy that we chose the 2-day post-infection time point for our analyses because it corresponds to the peak of replication, with much higher titers produced than at 24 hours post-infection (generally ~10^6 PFU/mL vs. ~10^4 PFU/mL). Consistently, viral replication factories are more abundant and more readily observed at this time point. An MOI of 5-10 was chosen to maximize the percentage of infected cells. That said, as suggested by the reviewer, we have analyzed the distribution of IGF2BP2 in ZIKV-infected cells at one day post-infection, and we provide evidence in Figure S1 that IGF2BP2 relocalizes to the dsRNA-containing compartment at this time point.

      Importantly, we now show in Figure S5 that, in contrast to IGF2BP2, other host RNA-binding proteins such as LARP1 and DDX5 do not accumulate in the ZIKV replication compartment at 2 days post-infection. LARP1 actually seems to be excluded from it, while DDX5 remains nuclear. Of note, consistent with the ZIKV-induced decrease in expression observed in western blots (Fig 4A), the intensity of the DDX5 signal decreases in infected cells. Altogether, this demonstrates that the IGF2BP2 relocalization phenotype is specific and is not due to ZIKV-induced cell death.

      By performing co-immunoprecipitation assays on mock and infected cells that express HA-tagged IGF2BP2, the authors propose that the observed change in IGF2BP2 localization results from its recruitment to the replication compartment by the viral NS5 polymerase and associated with the viral RNA. Given that both IGF2BP2 and NS5 are RNA-binding proteins, it is plausible that their interaction is mediated indirectly through the RNA molecule. Notably, the authors do not address the treatment of lysates with RNase before the IP assay, leaving open the possibility of this indirect interaction between IGF2BP2 and NS5.

      We agree with the hypothesis of the reviewer. As suggested, we have performed co-immunoprecipitation assays following RNase A treatment of the cell lysates. As shown in new Fig S6, the abundance of ZIKV NS5 co-immunoprecipitating with IGF2BP2-HA is drastically decreased upon RNase A treatment compared to the untreated condition. This demonstrates that the IGF2BP2/NS5 interaction is mostly RNA-dependent, which is not surprising as RNA is often a structural component of ribonucleoprotein complexes. Of note, the same is observed with ATL2. This new set of data allows us to refine our model of Figure 11 and the discussion as they strongly suggest that the direct binding of IGF2BP2 to viral RNA (evidenced in vitro; Fig 5D) is required for subsequent association with NS5 and ER-shaping protein ATL2. This is in line with the fact that viral RNA is a co-factor in the biogenesis of ER-derived ZIKV vesicle packets (PMID: 32640225). However, we cannot exclude a contribution of cellular RNA in these processes as discussed.

      In in vitro binding assays, the authors demonstrate that the RNA-recognition motifs of the IGF2BP2 protein specifically bind to the 3' nontranslated region (NTR) of the ZIKV genome, excluding binding to the 5' NTR. However, they cannot rule out the possibility of this host protein associating with other regions of the viral genome. Using a reporter ZIKV subgenomic replicon system in IGF2BP2 knock-down cells, they additionally demonstrate that IGF2BP2 enhances viral genome replication. Despite its proviral function, the authors note that the "overexpression of IGF2BP2 had no impact on total vRNA levels." However, the authors do not delve into a discussion of this latter statement.

      We agree with the reviewer’s comments. We now mention in the discussion that we cannot exclude the possibility that IGF2BP2 associates with RNA motifs within the coding region of the viral genomic RNA, especially considering that it contains N6A-methylated sequences (PMID: 27773535; 27773536; 29373715). Moreover, we discuss the observation that IGF2BP2 overexpression has no impact on vRNA levels (as well as titers). We believe that this is because endogenous IGF2BP2 is highly expressed in cancer cells such as the Huh7.5 and JEG-3 cells used here and is presumably not limiting for viral replication in our system (PMID: 38320625; 35111811; 34309973; 35023719; 37088822; 33224879; 35915142).

      In this study, the authors extend their findings by illustrating that ZIKV infection triggers a remodeling of IGF2BP2 ribonucleoprotein complex. They initially evaluate the impact of ZIKV infection on IGF2BP2's interaction with its endogenous mRNA ligands. Their results reveal that viral infection alters the binding of specific mRNA ligands, yet the physiological consequences of this loss of binding in the cell remain unexplored. 

      We acknowledge that it would be of interest to further study the physiological relevance of the modulation of the IGF2BP2 ribo-interactome. Since we have focused here on the role of IGF2BP2 in viral replication, we feel that this will be the focus of future studies, notably involving a larger omic-centered approach to identify the most impacted IGF2BP2 mRNA ligands. Of note, Gokhale and colleagues have already reported that the CIRBP, TNRC6A and PUM2 proteins regulate the replication of Flaviviridae (PMID: 31810760).

      Additionally, the authors demonstrate that ZIKV infection modifies the IGF2BP2 interactome. Through proteomic assays, they identified 62 altered partners of IGF2BP2 following ZIKV infection, with proteins associated with mRNA splicing and ribosome biogenesis being the most represented. In particular, the authors focused their research on the heightened interaction between IGF2BP2 and Atlastin 2, an ER-shaping protein reported to be involved in flavivirus vesicle packet formation. The validation of this interaction by Western blot assays prompted an analysis of the effect of ZIKV on organelle biogenesis using a newly described replication-independent vesicle packet induction system. Consequently, the authors demonstrate that IGF2BP2 plays a regulatory role in the biogenesis of ZIKV replication organelles.

      Based on these findings and previously published data, the authors propose a model outlining the role of IGF2BP2 in ZIKV infectious cycle, detailing the changes in IGF2BP2 interactions with both cellular and viral proteins and RNAs that occur during viral infection.

      The conclusions drawn in this paper are generally well substantiated by the data.

      We thank the reviewer for these encouraging general comments on our study.

      However, it is worth noting that the majority of infections were conducted at a high MOI for 48 hours, spanning more than one infectious cycle. To enhance the robustness of their findings and mitigate potential cell stress, it would be valuable to observe these effects at shorter time intervals, such as 24 hours post-infection.

      As explained above, IGF2BP2 relocalization to the (dsRNA-enriched) replication compartment was also observed in ZIKV infected cells at one day post-infection.

      Furthermore, the assertion regarding the association of IGF2BP2 with NS5 could be strengthened through additional immunoprecipitation (IP) assays. These assays, performed in the presence of RNAse treatment, would help exclude the possibility of an indirect interaction between IGF2BP2 and NS5 (both RNA-binding proteins) through viral RNA, thus providing more confidence in the observed association.

      See above for our answer and the description of the new data of Fig. S7.

      Reviewer #3 (Public Review):

      Summary:

      The manuscript by Mazeaud and colleagues pursued a small-scale screen of a targeted RNAi library to identify novel players involved in Zika (ZIKV) and dengue (DENV) virus replication. Loss-of-function of IGF2BP2 resulted in reduced titers for ZIKV of the Asian and African lineages in hepatic Huh7.5 cells, but not for either of the four DENV serotypes nor West Nile virus (WNV). The phenotype was further confirmed in two additional cell lines and using a ZIKV reporter virus. In addition, using immunoprecipitation assays the interaction between IGF2BP2 and ZIKV NS5 protein and RNA genome was detected. The work addressed the role of IGF2BP2 in the infected cell combining confocal microscopy imaging, and proteomic analysis. The approach indicated an altered distribution of IGF2BP2 in infected cells and changes in the protein interactome including disrupted association with partner mRNAs and modulation of the abundance of a specific set of protein partners in IGF2BP2 immunoprecipitated ribonucleoprotein (RNP) complexes. Finally, based on the changes in IGF2BP2 interactome and specifically the increment in the abundance of Atlastin 2, the biogenesis of ZIKV replication organelles (vRO) is investigated using a genetic system that allows virus replication-independent assembly of vRO. Electron microscopy showed that knockdown of IGF2BP2 expression reduced the number of cells with vRO.

      Strengths:

      The role of IGF2BP2 as a proviral factor for ZIKV replication is novel. The study follows a logical flow of experiments that altogether support the assembly of a specialized RNP complex containing IGF2BP2 and ZIKV NS5 and RNA genome.

      We thank the reviewer for their positive feedback on our study and its novelty.

      Weaknesses:

      The statistical analysis should clearly indicate the number of biological replicates of experiments to support statistical significance.

      This information has been included in all figure legends.

      The claim that IGF2BP2 knockdown impairs de novo viral organelle biogenesis and viral RNA synthesis is built upon data that show a reduction in RNA synthesis <0.5-fold as assessed using a reporter replicon, thus suggesting a limited impact of the knockdown on RNA replication.

      We agree that a 50% decrease in the replication of our reporter replicon might be considered mild. However, we want to point out that in an infectious set-up, the phenotypes were stronger, as demonstrated by an 80% decrease in viral particle production even though IGF2BP2 levels were never depleted by more than 80% compared to endogenous levels. Moreover, our findings were validated through the analysis of de novo vRO biogenesis by electron microscopy in a replication-independent set-up. Together, these experiments provide compelling evidence for a role for IGF2BP2 in the early stages of viral genome replication.

      Validation of IGF2BP2 partners that are modulated upon ZIKV infection (i.e. virus yield in knocked down cells) can be relevant especially for partners such as Atlastin 2, as the hypothesis of a role for IGF2BP2 RNP in vRO biogenesis is based on the observed increase in the abundance of Atlastin 2 in the RNP complex precipitated from infected cells.

      First, we would like to emphasize that the proviral role of ATL2 in flavivirus replication, including links to vRO biogenesis, was already reported in two independent studies notably by one of the co-authors (PMID: 31636417; 31534046). Therefore, we have chosen to discuss these previous studies in the manuscript rather than repeating published experiments.  Second, we agree that it would be interesting to further interrogate the role of modulated IGF2BP2 protein partners in ZIKV replication. However, these experiments would constitute a new project per se involving fastidious RNAi-based phenotypic screening and subsequent functional characterization of the identified hits. Therefore, this will be the focus of follow-up studies.  

      Recommendations for the Authors:

      Reviewer #1 (Recommendations For The Authors):

      All IFAs claimed that showing co-localization is minimal, this needs to be addressed.

      We have performed colocalization analyses for relevant images in the revised manuscript (see below and Figs. 4B, 5A, S4A-C and S5A-D). Although this quantification increases confidence in our analysis, we were still cautious in our conclusions, stating that colocalization was partial and that IGF2BP2 accumulates in the replication compartment.

      Western blots and IPs need to be quantified.

      As requested, we have included WB quantification in Figs. 2A, 4A, 4D, 8B-D, S6C and S7D.

      Figure 1: What is the strain background for the ZIKV reporter virus?

      As indicated in the legend of Figure 1E of the primary submission, the Rluc-expressing ZIKV reporter virus (ZIKV-R2A) was based on the FSS13025 isolate (Asian lineage)(PMID: 27198478). To clarify this, we have also indicated the strain background in the main text of the Results and Material & Methods sections.

      Figure 2A: If shIGF2BP2 reduces viral titer, the NS3 should show a reduction in 2A, but it doesn't.

      We agree with the reviewer. Although NS3 seems not to be decreased upon IGF2BP2 knockdown in the experiment initially shown in Figure 2A, it should be noted that our homemade rat anti-NS3 antibody is highly sensitive, leading to signal saturation that makes it challenging to distinguish changes in NS3 expression without substantially diluting the lysate sample before SDS-PAGE. The initial reason for including Fig 2A was not to make a statement about viral protein expression but to validate IGF2BP2 knock-down efficiency. Conclusions about NS3 levels in the initial figure are further complicated by the high MOI of ZIKV used in Huh7.5 cells, conditions that are not quantitative for viral replication measurements. To address this issue, we assessed the impact of IGF2BP2 knockdown on viral protein abundance (as a read-out of overall viral replication) with a lower MOI of ZIKV. The results of the repeat experiment (seen in the new Fig. 2A) show that IGF2BP2 knockdown leads to a decrease in the abundance of NS4A, NS5 and NS3, which is consistent with the titer decrease phenotypes.

      Figure S3: The re-localization claimed is minimal and does not show overlap with NS3. The dsRNA is difficult to see here. Suggest improving the immunofluorescence images and reducing the claim for "strong" co-option of RNP complexes.

      In addition to replication complexes, NS3 labels convoluted membranes, which are devoid of dsRNA and IGF2BP2 and surround the cage-like replication compartment as large puncta (PMID: 27545046; 33432690; 28249158). The signal overlap is more obvious between IGF2BP2 and NS3/dsRNA-containing areas, which is reflected by the Manders’ coefficients that have been included in the revised version (Fig. S5C-D). We have also adjusted the text to conclude that the colocalization was partial and that IGF2BP2 accumulated in the replication compartment. We acknowledge that the dsRNA signal is weak, and we have updated the images (and others, when relevant) to better visualize this viral component. Moreover, we have rephrased the sentence to remove the word “strongly”.

      Figure 4A: Western blot needs quantification.

      This is now included in the figure.

      Figure 4B: As in many of the IFAs, the co-localization is only partial. Additionally, the dsRNA is not visible. So the images need to be improved. The colocalization should be quantified across the cell diameter.

      We changed the color and intensity of the dsRNA staining to make it more visible. Manders’ colocalization coefficients have been determined and included in Figures 4B and S5C-D.
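
      For readers unfamiliar with this metric, the sketch below shows one common way of computing Manders-type coefficients from two channels. It is an illustration only: the function name, the flattened pixel lists and the zero thresholds are assumptions, and the response does not state which software or thresholding was used for the reported values.

```python
def manders_coefficients(ch1, ch2, thr1=0.0, thr2=0.0):
    """Return (M1, M2) for two equally sized lists of pixel intensities.

    M1: fraction of total channel-1 intensity found in pixels where
        channel 2 is above thr2.
    M2: fraction of total channel-2 intensity found in pixels where
        channel 1 is above thr1.
    Thresholds are hypothetical defaults, not values from the study.
    """
    if len(ch1) != len(ch2):
        raise ValueError("channels must have the same number of pixels")
    total1 = sum(ch1)
    total2 = sum(ch2)
    if total1 == 0 or total2 == 0:
        raise ValueError("each channel needs a non-zero total intensity")
    # Channel-1 signal that sits on top of above-threshold channel-2 signal.
    overlap1 = sum(a for a, b in zip(ch1, ch2) if b > thr2)
    # Channel-2 signal that sits on top of above-threshold channel-1 signal.
    overlap2 = sum(b for a, b in zip(ch1, ch2) if a > thr1)
    return overlap1 / total1, overlap2 / total2


# Toy four-pixel example (made-up intensities).
m1, m2 = manders_coefficients([10, 20, 0, 30], [0, 5, 5, 0])
```

      The two coefficients are deliberately asymmetric, so a partial overlap of the kind described in this response can yield a low M1 together with a higher M2, or the reverse.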

      Figure 4C: It is difficult to understand what the +/- is on the blots for the cell extracts and the anti-HA IP samples. It is not described in the figure legend or the text.

      As already indicated on the right of the panel, the +/- indicates whether or not IGF2BP2-HA was overexpressed in the cells. In the revised version, this is clarified in the figure legend.

      Figure 5A: Once again similar to other IFAs, the co-localization is only minimal and thus difficult to claim as "co-localization" is actually happening. It would be good to either improve the images or discuss this observation in the text and reduce the claim of colocalization. Specifically, since the two proteins might be co-localizing in specific regions which would make it a very interesting observation. Also, quantification of co-localizing regions would be beneficial.

      We have included the requested colocalization analysis. We have been cautious to indicate that colocalization was only partial. It is noteworthy that, despite many efforts in the optimization of the cell permeabilization procedure, we noticed that the FISH probes were not very efficient in accessing the perinuclear area of the infected cells, where replication complexes accumulate. In that respect, it is likely that this imaging approach “misses” some of the IGF2BP2/vRNA complexes and that the determined colocalization factor is underestimated. This explains why the confirmation of the vRNA/IGF2BP2 complex with a biochemical approach (Fig. 5B) was very relevant.

      Figure 5D: It is unclear what the blue squares represent. Clearer figure legends and text would be beneficial.

      As stated in the initial figure, the blue squares indicate values obtained with the ZIKV 5’ UTR probe while the green circles involve a 3’ UTR probe. We have further emphasized this information in the figure legend to make it clearer.

      Figure 6B. The graph is missing the data and X-axis label for shIGF2BP2.

      We had initially omitted the values of the conditions with shIGF2BP2 and the replication-dead GAA replicon, since this viral system does not allow accumulation of viral genomes or proteins and was not relevant at the 48h time point. We thought that the inclusion of the shNT/GAA condition was a sufficient internal negative control of viral replication since values for shIGF2BP2/GAA did not exceed background. Nevertheless, we have now included this condition in the revised figure.

      Figure 7D: It is unclear what the -/+ signs are in the cell extracts and the IP blots. Specifically, since there is an NS5 signal in the (-) lanes.

      As explained above, the +/- indicates whether IGF2BP2-HA was overexpressed. The meaning of these symbols is now further clarified in the figure legend.

      Figure 8C: The circles with the different colors are not clearly described. What does it mean?

      As indicated in the figure (left part), the red and green circles identify the partners of the STRING network whose association with IGF2BP2 is decreased and increased during infection, respectively. We have included this information in the figure legend.

      Figure 9: The electron microscopy to quantify vesicles should be carried out using whole-cell tomography in order to get the most accurate quantification of the vesicles following different treatments. This is because if you only look at one cell profile (slice), the number of vesicles might be less in that profile and more in another below or above it. It is unclear how many cell profiles were used for the quantification and how the calculations were carried out.

      We agree with the reviewer that ideally, one should perform 3D electron tomography to precisely assess the morphology of VPs. Regardless of the fact that we do not possess the imaging infrastructure to perform that type of analysis, such an approach would represent a tremendous amount of work if one would like to process at least 200-400 vesicles from > 50 cells and their whole cytoplasm (as we did). Despite not having 3D images, this number of data points is sufficient to see general changes in viral replication vesicle morphology, especially considering that Huh7-Lunet cells are relatively flat cells (PMID: 32640225; 36700643; 34696522; 31636417). Furthermore, since IGF2BP2 knockdown decreases the abundance of VPs and does not impact their diameter, we believe that the addition of sophisticated 3D analysis would not bring any new and relevant information and that the TEM data stand by themselves for the conclusion we made. A more refined morphological analysis to determine how IGF2BP2 is structurally involved in virus-mediated membrane reorganization could be the focus of a future study.

      We feel that we have already provided sufficient information about the quantification in the Material & Methods section of the first version of the manuscript: “Quantification was performed by systematically surveying cells and evaluating the presence of VPs. Only cells with >2 VPs were considered as positive. For each condition, >50 cells were surveyed over 4 biological replicas. All observed VPs were imaged, and VP diameters were determined using ImageJ by measuring the distance across two axes and averaging”.
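
      The quoted rule can be restated as a short sketch. The function names, the nanometre unit and the example numbers are hypothetical; only the ">2 VPs" criterion and the averaging of two axes come from the Methods text quoted above.

```python
def vp_diameter(axis_a_nm, axis_b_nm):
    """Diameter of one vesicle packet (VP): mean of two measured axes."""
    return (axis_a_nm + axis_b_nm) / 2


def is_vp_positive(vp_count, min_vps=2):
    """A cell counts as positive only if it contains more than min_vps VPs."""
    return vp_count > min_vps


def fraction_positive(vp_counts):
    """Fraction of surveyed cells that are VP-positive."""
    if not vp_counts:
        raise ValueError("no cells surveyed")
    positives = sum(1 for count in vp_counts if is_vp_positive(count))
    return positives / len(vp_counts)


# Made-up survey of four cells; the cell with exactly 2 VPs is not positive.
example_fraction = fraction_positive([0, 3, 5, 2])
example_diameter = vp_diameter(80.0, 90.0)
```

      Note that the strict inequality matters: under the quoted criterion a cell with exactly two VPs is scored as negative.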

      Reviewer #2 (Recommendations For The Authors):

      The inclusion of a control in the knock-down and infection assays with the reporter virus could enhance the validity of the findings. Introducing STAT2 knockdown, a recognized antiviral protein for ZIKV, as a control would provide a valuable benchmark to evaluate the extent of viral enhancement in the experiments. This additional control not only supports the proposed function of LARP1 in virus assembly/release but also strengthens the overall interpretation of the results.

      We agree that adding a positive control could have been relevant for assessing the extent of replication modulation, especially for increases such as that observed with shLARP1. However, finding such control proteins in our system was a challenge. Indeed, STAT2 would not have been a good control for these experiments since we used Huh7.5 cells for the RNAi mini-screening, which do not express a functional RIG-I protein, and generally do not produce type I and III interferons. Thus, STAT2 knockdown is not expected to result in an increase in replication. That said, we feel that it was unnecessary to include a control for replication inhibition here given that only a few statistically reliable candidates were obtained. Instead, we have opted for an extensive secondary validation approach by assessing the proviral role of IGF2BP2 for multiple viruses - DENV1-2-3-4, WNV and SARS-CoV-2, and 3 ZIKV strains in three relevant cell types.

      Additionally, in Figure S4, the authors employ an antibody against NS5 that specifically recognizes ZIKV NS5 but not DENV NS5. Given the objective of highlighting distinctions between these two viruses, it is advisable to use an antibody that detects DENV NS5 as well. This approach would contribute to a more comprehensive comparison, ensuring a balanced representation of both viruses in the experimental analysis.

      We thank the reviewer for this relevant suggestion. We have repeated the co-immunoprecipitation assays using antibodies specific to DENV NS5 (Author response image 1). While we specifically pulled down ZIKV NS5 with IGF2BP2-HA as expected, this was not the case for DENV NS5 when using extracts from DENV-infected cells despite our multiple attempts. Indeed, the amount of pulled-down DENV NS5 with IGF2BP2-HA was always comparable to that in the negative control (“empty” pWPI lentivirus-transduced cells, “-“ condition), which corresponds to non-specific binding to the HA-resin. Thus, while the antibody was very efficient at detecting DENV NS5 in the cell extracts, no specific binding between DENV NS5 and IGF2BP2-HA could be evidenced. Consistent with our different replication phenotypes between DENV and ZIKV, this strongly supports that the NS5/IGF2BP2 interaction is specific to ZIKV. The specificity of the IGF2BP2 interaction with ZIKV NS5 compared to DENV NS5 is discussed in the updated manuscript.

      Author response image 1.

      DENV NS5 is not specifically co-immunoprecipitated with IGF2BP2-HA in contrast to ZIKV NS5. Huh7.5 cells stably expressing IGF2BP2-HA (+) and control cells (-) were infected with ZIKV H/PF/2013 at a MOI of 10 or left uninfected. Two days later, cell extracts were prepared and subjected to RNase A treatment (+) or not (-) before anti-HA immunoprecipitations. The resulting complexes were analyzed by western blotting for their abundance in the indicated proteins.

      Reviewer #3 (Recommendations For The Authors):

      (1) Statistical analysis. Please clearly indicate what columns and error bars represent for bar graphs such as those presented in Figures 1A-D and F, Figures 2B-C, and bottom panels in DE, Figure 3, Figure 5B, Figure 6B-C, and Figures 9B-D and F. For instance, the mean of n independent experiments and standard deviation.

      Information about the number of replicates, error bars, and statistical tests has been added for all figures in the legends. 

      (2) What is the scale in the Y-axis of Figure 2C? As shown, it is difficult to know what is the virus titer in knocked-down cells. Please use a linear scale or a log scale.

      This is a linear scale of viral titers, which we have modified to make it clearer for the reader.

      (3) Throughout the manuscript (e.g. Figures 1, 2, and 3) the fold reduction in titer is presented instead of the actual virus titers. I suggest showing the titer as it may be much more informative for the reader.

      We prefer showing the data as fold reduction as they better reflect the IGF2BP2 knockdown-induced phenotypes across the independent biological replicates. Indeed, from one experiment to another, the reference titers in the control condition sometimes vary because of the cell passage or the lentiviral transduction efficiency for instance, especially when low multiplicities of infection are used. However, the reduction phenotype in fold change observed upon IGF2BP2 knockdown was always consistent regardless of the titer value. Of note, all considered experiments had reference titers above 10<sup>5</sup> PFU/mL.
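
      The reasoning can be made concrete with a minimal sketch in which each knockdown titer is normalized to the control titer of the same biological replicate. The function name and the titers are invented for illustration and are not data from the study.

```python
def relative_titers(control_titers, knockdown_titers):
    """Normalize each knockdown titer to the control of the same replicate.

    Both arguments are lists of titers (e.g. PFU/mL), one entry per
    independent biological replicate, in matching order.
    """
    if len(control_titers) != len(knockdown_titers):
        raise ValueError("one control and one knockdown titer per replicate")
    if any(ctrl <= 0 for ctrl in control_titers):
        raise ValueError("control titers must be positive")
    return [kd / ctrl for ctrl, kd in zip(control_titers, knockdown_titers)]


# Hypothetical replicates whose control titers differ four-fold.
controls = [2e6, 5e5]
knockdowns = [4e5, 1e5]
ratios = relative_titers(controls, knockdowns)
```

      In this invented example the absolute titers differ between replicates, yet both ratios equal 0.2 (a five-fold reduction), which is the kind of consistency the normalization is meant to expose.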

      (4) Is it possible to perform a colocalization analysis of confocal images showing overlapping signals?

      This has been done and the results of these analyses are included in the updated figures 4B, 5A, S4 and S5.

      (5) Assessing the effect of Atlastin2 knockdown in virus yield and showing co-immunoprecipitation of Atlastin 2 with NS5 can add relevant information.

      As mentioned in the discussion and above, ATL2 was already reported to be required for DENV and ZIKV replication in two independent studies (including one by one of the co-authors) (PMID: 31636417; 31534046). We have not tested whether ATL2 associates with NS5. However, new Fig. S7 of the revised manuscript shows that the IGF2BP2/ATL2 association is RNA-dependent. This suggests that, as initially depicted in our model, IGF2BP2 associates with the ER (and thus, ATL2) after its binding to the viral RNA. Further interrogation into the role of atlastins in the flavivirus replication cycle is the focus of another ongoing IGF2BP2-unrelated study from one of the co-authors which will be reported elsewhere.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The authors investigate ligand and protein-binding processes in GPCRs (including dimerization) by the multiple walker supervised molecular dynamics method. The paper is interesting and it is very well written.

      Strengths:

      The authors' method is a powerful tool to gain insight into the structural basis for the pharmacology of G protein-coupled receptors.

      Weaknesses:

      Cholesterol may play a fundamental role in GPCR dimerization (as cited by the authors, Prasanna et al, "Cholesterol-Dependent Conformational Plasticity in GPCR Dimers"). Yet they do not use cholesterol in their simulations of the dimerization.

      We thank Reviewer #1 for the positive comment on mwSuMD.

      In the revised version of the manuscript, the section about A<sub>2A</sub>/D2 receptor dimerization has been removed because it was largely speculative. We agree that the lack of cholesterol in those simulations added uncertainty to the presented results.

      Reviewer #2 (Public Review):

      The study by Deganutti and co-workers is a methodological report on an adaptive sampling approach, multiple walker supervised molecular dynamics (mwSuMD), which represents an improved version of the previous SuMD.

      Case-studies concern complex conformational transitions in a number of G protein Coupled Receptors (GPCRs) involving long time-scale motions such as binding-unbinding and collective motions of domains or portions. GPCRs are specialized GEFs (guanine nucleotide exchange factors) of heterotrimeric Gα proteins of the Ras GTPase superfamily. They constitute the largest superfamily of membrane proteins and are of central biomedical relevance as privileged targets of currently marketed drugs.

      MwSuMD was exploited to address:

      (1) Binding and unbinding of the arginine-vasopressin (AVP) cyclic peptide agonist to the V2 vasopressin receptor (V2R);

      (2) Molecular recognition of the β2-adrenergic receptor (β2-AR) and heterotrimeric GDP-bound Gs protein;

      (3) Molecular recognition of the A1-adenosine receptor (A1R) and palmitoylated and geranylgeranylated membrane-anchored heterotrimeric GDP-bound Gi protein;

      (4) The whole process of GDP release from membrane-anchored heterotrimeric Gs following interaction with the glucagon-like peptide 1 receptor (GLP1R), converted to the active state following interaction with the orthosteric non-peptide agonist danuglipron;

      (5) The heterodimerization of D2 dopamine and A2A adenosine receptors (D2R and A2AR, respectively) and binding to a bi-valent ligand.

      The mwSuMD method is solid and valuable, has wide applicability, and is compatible with the most world-widely used MD engines. It may be of interest to the computational structural biology community.

      The huge amount of high-resolution data on GPCRs makes those systems suitable, although challenging, for method validation and development.

      While the approach is less energy-biased than other enhanced sampling methods, knowledge, at the atomic detail, of binding sites/interfaces and conformational states is needed to define the supervised metrics, the higher the resolution of such metrics is the more accurate the outcome is expected to be. The definition of the metrics is a user- and system-dependent process.

      The too many and ambitious case-studies undermine the accuracy of the output and reduce the important details needed for a methodological report. In some cases, the available CryoEM structures could have been exploited better.

      The most consistent example concerns AVP binding/unbinding to V2R. The consistency with CryoEM data decreases with an increase in the complexity of the simulated process and involved molecular systems (e.g. receptor recognition by membrane-anchored G protein and the process of nucleotide exchange starting from agonist recognition by an inactive-state receptor). The last example, GPCR hetero-dimerization, and binding to a bi-valent ligand, is the most speculative one as it does not rely on high-resolution structural data for metrics supervision.

      We thank Reviewer #2 for the detailed comment on the manuscript. In this revised version, the hetero-dimerization between A<sub>2A</sub>R and D<sub>2</sub>R has been removed. Also, results about GPCR case studies other than GLP-1R have been reduced and downgraded in importance to focus on the fundamental key points of the adaptive sampling method. We agree that the consistency with cryoEM data tends to decrease with an increase in the complexity of the simulated process and involved molecular systems. While it is possible to approximate cryoEM results, our unbiased adaptive sampling technique finds its most interesting application in mechanistically unknown out-of-equilibrium processes rather than in reproducing known experimental data perfectly. The simulated case studies we present showcase the versatility, speed and consistency of our adaptive method to explore energetically unbiased transitions.

      Reviewer #3 (Public Review):

      Summary:

      In the present work, Deganutti et al. report a structural study on GPCR functional dynamics using a computational approach called supervised molecular dynamics.

      Strengths:

      The study has the potential to provide novel insight into GPCR functionality. An example is the interaction between loops of GPCR and G proteins, which are not resolved experimentally, or the interaction between D344 and R385 identified during the Gs coupling by GLP-1R. However, validation of the findings, even computationally through for instance in silico mutagenesis study, is advisable.

      Weaknesses:

      In its current form, the manuscript seems immature and in particular, the described results grasp only the surface of the complex molecular mechanisms underlying GPCR activation. No significant advance of the existing structural data on GPCR and GPCR/G protein coupling is provided. Most of the results are a reproduction of the previously reported structures.

      We thank Reviewer #3 for the positive comment on the work. The revised manuscript focuses more on the GLP-1R and Gs case studies. We believe it addresses the weaknesses raised by showing the behaviour of key structural motifs and providing new hypotheses about GDP release.  

      Reviewer #2 (Recommendations For The Authors):

      In this methodological report, Deganutti and co-workers propose an improved version of supervised molecular dynamics (SuMD), named multiple walker SuMD (mwSuMD). Such an adaptive sampling method was challenged in simulations of complex transitions involving GPCRs, which are out of reach by classical MD.

      Although less energy-biased than other enhanced sampling methods, mwSuMD requires knowledge of the atomic detail of the ligand-protein or protein-protein binding site/interfaces and the structural hallmarks of the states whose conversion the method is going to address. Such knowledge is, indeed, necessary to define the supervised metrics (e.g. distances, RMSD, etc), which is a user- and system-dependent process.

      We classify mwSuMD as an adaptive, rather than enhanced, sampling method as it does not use any energy bias. We agree with the Reviewer that some knowledge of the system is required to productively set up the simulations, but this is the case for almost any advanced MD method.

      The text requires improvement in the essential methodological details and cleaning of those parts is not properly instrumental in method validation.

      While attempting to prove the widest possible applicability of the method, the authors exaggerated the number of examples, which, in spite of the increasing complexity were only summarily described. Please, limit the case studies to AVP binding/unbinding to V2R and the whole process of GDP release from membrane-anchored Gs following activation of GLP1R by danuglipron. The latter case, indeed, involves small ligand binding (danuglipron), small ligand dissociation (GDP), receptor activation, and activated receptor binding to membrane-anchored G protein and G protein conformational transition instrumental to nucleotide depletion, which is already too much. In this framework, the cases of Gs-β2AR and Gi-A2R recognition are redundant. Most importantly, the case of D2R-A2AR heterodimerization and binding to a bi-valent ligand must be eliminated. The reason is that the case is not entirely based on the mwSuMD and the biased protein-protein interface does not rely on high-resolution data (i.e. no structural model of D2R-A2AR dimer has been determined so far). Last but not least, the high intrinsic flexibility of the bi-valent ligand adds further indetermination to the computational experiment. Being too speculative, the case-study does not serve to model validation.

      We thank the Reviewer for the suggestion. In the current revised form, the manuscript focuses on AVP binding/unbinding to V2R and the GLP-1R activation, Gs recognition and GDP release.

      While eliminating the three case studies mentioned above, the remaining ones should be described more extensively and clearly, highlighting the most productive setup for each system. Incidentally, listing the performance parameters (e.g. distribution mode and minimum RMSD) of each simulation setting in Table S1 is worth doing.

      More accuracy in the methodological description is needed.

      As for the supervised metrics, the rationale behind the choice of a particular index and whether it is the outcome of a number of trials must be declared and the selected indices must be better defined. Here there are a few examples.

      AVP-V2R case. It is not clear why the AVP centroids were computed on residues C1-Q4 (I suppose the Cα-atoms) and not on the Cα-atoms of the whole cyclic part (C1-C6). Along the same line, the choice of the Cα-atoms of four amino acid residues to compute the receptor binding-site centroids requires justification.

      We have amended the text to clarify that all the heavy atoms of AVP residues C1-Q4, which are anticipated to bind deep into V<sub>2</sub>R, were considered alongside V<sub>2</sub>R residues part of the peptide binding site (Cα atoms only). From our experience, the choice of including side chains or not for the definition of centroids usually does not affect the supervision output. It should only affect the output of mwSuMD simulations based on the RMSD which considers the specific relative distance from the reference. However, a benchmark of the differences produced by divergent selections is beyond the scope of the present work.

      GLP1R case. The statement: "Since the opening of TM1-ECL1 was observed in two replicas out of four, we placed the ligand in a favorable position for crossing that region of GLP-1R" is rather weak as a strategy to manually (?) define the input position of the ligand.

      As stated in the manuscript, placing the agonist in that position was driven by 8 μs of preliminary classic MD simulations that pointed out the possible path for binding. We agree with the Reviewer that there is still some degree of arbitrariness in it and, for this reason, we have not presented structural details of the PF06882961 binding path.

      As for the supervised metrics, what does it mean "the distance between the ligand and GLP-1R TM7 residues L379<sup>7.34</sup>-F381<sup>7.36</sup>"? Was the distance computed between ligand and L379-F381 centroids? Also: "In the supervised stages, the distance between residues M386-L394 Gαs of helix 5 (α5) and the GLP-1R intracellular residues R176<sup>2.46</sup>, R348<sup>6.37</sup>, S352<sup>6.41</sup>, and N405<sup>7.60</sup> was monitored" was it an inter-centroid distance? Furthermore, "supervising the distance between AHD residues G70-R199 Gαs and K300-L394 Gαs" was it the distance between the centroid of the AHD and the centroid of the C-terminal half of the Ras-like domain? In general, when more than two atoms are involved in distance calculation, please, specify if the distance is inter-centroid.

      Also: "During the third phase, the RMSD of PF06882961, as well as the RMSD of ECL3 (residues A368<sup>6.57</sup>-T378<sup>7.33</sup>, Cα atoms), were supervised" was the RMSD computed without superimposing the ligand to estimate its roto-translations?

      We have added details about the selections used for computing centroids throughout the methods section. For example, all the heavy atoms of PF06882961 and the Cα atoms of L379-F381 were considered. RMSD values during GLP-1R activation were computed after superimposition on TM2, ECL1, and TM3 residues 170-240 (Cα atoms). This has now been specified in the text.
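      As an illustration of what an inter-centroid distance means in this context, the sketch below computes the distance between the geometric centres of two atom selections. It is a minimal example, not the mwSuMD code: the function names and the coordinates are hypothetical, and the centroids are unweighted (no mass weighting), which is an assumption.

```python
import math

def centroid(coords):
    """Return the unweighted geometric centre of a list of (x, y, z) tuples."""
    n = len(coords)
    if n == 0:
        raise ValueError("selection is empty")
    return tuple(sum(c[i] for c in coords) / n for i in range(3))

def centroid_distance(selection_a, selection_b):
    """Distance between the centroids of two atom selections."""
    return math.dist(centroid(selection_a), centroid(selection_b))

# Hypothetical coordinates in Å, for illustration only:
# "ligand heavy atoms" versus "binding-site Cα atoms".
ligand_heavy_atoms = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
site_ca_atoms = [(1.0, 3.0, 4.0), (1.0, 5.0, 4.0), (1.0, 4.0, 4.0)]

print(f"inter-centroid distance: {centroid_distance(ligand_heavy_atoms, site_ca_atoms):.2f} Å")
```

      Stating the two selections explicitly (which residues, which atoms) is what makes such a metric reproducible, which is the point raised by the Reviewer.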

      The authors considered the 7LCJ GLP1R-danuglipron complex as a fully active reference state instead of considering the receptor from a ternary complex with Gs. The ternary complex (7LCI) was indeed considered as a reference only in simulations of receptor-G protein recognition. 

      7LCJ and 7LCI are both fully active states. The main difference is that in 7LCJ, Gs coordinates were not deposited. Indeed, the RMSDs between the two structures, computed on the TMD Cα atoms and on PF06882961, are 0.63 Å and 0.54 Å, respectively.

      Most importantly, the ternary complex chosen by the authors is not adequate as a reference for simulating the "opening" of the AHD because it bears a miniGs, hence, missing the AHD. In that framework, such an opening is rather vague and was not properly supervised by mwSuMD. The authors must repeat metrics supervisions by using, as a reference, the 6X1A ternary complex, which bears a displaced AHD. This would likely lead to a different path of GDP release.

      To the best of our knowledge, there is no evidence that a specific open conformation of the AHD is linked to GDP release. In support, we note that in GPCR ternary complexes, the AHD is usually not modelled because of its high flexibility. The only body of evidence we are aware of is that AHD must open up to allow GDP release. For this reason, we decided to supervise the distance between AHD and the Ras domain without using a reference.

      In the statement: "The AHD opening was simulated starting from the best GLP-1R:Gs binding mwSuMD replica" the definition "best binding" requires clarification.

      This has been amended, specifying that Replica 2 was considered the “best replica” because it showed the lowest deviation from the cryoEM structure.

      As for the case study on β2-AR-Gs recognition, I strongly suggest to eliminate it. However, I'd like to make some comments. The sentence: "the adrenergic β2 receptor (b2 AR) in an intermediate active state was downloaded from GPCRdb (https://gpcrdb.org/)" is vague as it does not indicate what intermediate active state structure was used. Since the goal of the case study was to probe the method in simulating receptor-G protein binding, it would have been better to start with a fully active state of the receptor like the 4LDO structure, employed by the authors only to extract epinephrine.

      mwSuMD is designed to provide insights into structural transitions. We started from an intermediate active state of β2-AR in complex with adrenaline because it resembles the most populated state stabilised by a full agonist according to NMR studies (DOI:10.1016/j.cell.2015.08.045); the fully-active β2-AR conformation is stabilized only after Gs binding. However, following the Reviewer’s suggestion, we have reduced the presented results for the β2-AR-Gs recognition.

      Also in this case, it is not clear if the supervised receptor-G protein distance is between the centroid of the whole 7-helix bundle and the centroid of Gs α5. It is not clear why the TM6 RMSD concerned only the cytosolic end of the helix and did not include the kink region. With that selection, to estimate the outward displacement, RMSD should have been computed without superimposing the considered portion (once all remaining Cα-atoms of the receptors are superimposed).

      As the Reviewer pointed out above, some knowledge of the system is required to set up mwSuMD. Using more generic metrics as we did in this case, like the distance between the whole TMD and Gs α5 represents a general approach applicable to other GPCRs, that should allow orthogonal metrics to evolve independently from the supervision.

      As now specified in the text, the superimposition for RMSD calculation was performed on the Cα atoms of residues 40 to 140, hence not considering TM6.

      As for the A1R-Gi recognition, as already stated, I strongly suggest eliminating it. However, I'd like to add some comments. I would discourage the employment of an AlphaFold model for simulations deputed to model validation in general and, in particular, when high-resolution structures are available. In this case, the authors would have used the 1GP2 structure of heterotrimeric Gi no matter if from the rat species.

      Following the Reviewer’s suggestion, we have dramatically reduced the results presented for the A1R-Gi recognition. We considered 1GP2 for the simulations, but H5 lacks the C-terminal six residues and therefore some extent of modelling was still necessary. However, we take the Reviewer’s comment on board and will consider it for future work.

      Also, the palmitoylation and geranylgeranylation process is quite tortuous and it is not clear why the NVT ensemble was employed in the second stage of equilibration. This is reflected also on the GLP1R case study.

      We have amended the text to clarify this passage. The second NVT stage is required for stabilizing the G protein and its orientation in the simulation box. The figure below shows that a plateau of the Cα RMSD during the NVT step was reached after 700 ns for both Gi (black) and Gs (orange).

      Author response image 1.

      Here, it is not clear if the RMSD of α5 of Gi was computed with or without superposition.

      The RMSD of α5 was computed after superimposing on the Cα atoms of A<sub>1</sub>R residues 40-140 (the less flexible region of the receptor). We have now amended the text to report this information.

      Reviewer #3 (Recommendations For The Authors):  

      Points to address:

      (1) Root Mean Square Deviation (RMSD) data are often reported as minimum values. It would be useful to provide the average value along the stable part of the trajectories. From the plots in Figure 2ab, it seems that the minimum values reported in the paper are very far from the average ones and thus represent special cases that are seldom reached during simulation. The authors should clarify this point;

      For the revised manuscript, we moved Figure 2 to the supplementary material and added average RMSD values for the most notable replicas in Figures 4e and S8a,b. As a reference, in the text, we now report RMSDs from our previous classic MD simulations (https://doi.org/10.1038/s41467-021-27760-0) of Gs:GLP-1R cryoEM structure (G<sub>α</sub> = 6.18 ± 2.40 Å; G<sub>β</sub> = 7.22 ± 3.12 Å; G<sub>γ</sub> = 9.30 ± 3.65 Å) which show how flexible G proteins bound to GPCRs are and give better context to the RMSD values we measured during mwSuMD simulations.
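      The difference between a minimum RMSD and an average over the stable part of a trajectory can be made concrete with a short sketch. The function name, the `stable_from` cut-off, and the RMSD values below are all hypothetical and chosen only to show how the two summaries can diverge; this is not the analysis script used in the manuscript.

```python
import statistics

def summarize_rmsd(rmsd_series, stable_from=0):
    """Summarize an RMSD time series.

    The minimum is taken over the whole series, while the mean and the
    sample standard deviation are taken from frame `stable_from` onwards.
    """
    stable = rmsd_series[stable_from:]
    if len(stable) < 2:
        raise ValueError("need at least two frames in the stable region")
    return {
        "min": min(rmsd_series),
        "mean": statistics.mean(stable),
        "sd": statistics.stdev(stable),
    }

# Hypothetical RMSD values in Å; the first two frames are treated as equilibration.
rmsd = [8.0, 6.0, 4.0, 6.0, 2.0, 6.0]
summary = summarize_rmsd(rmsd, stable_from=2)
print(f"min = {summary['min']:.2f} Å, mean ± SD = {summary['mean']:.2f} ± {summary['sd']:.2f} Å")
```

      A single low-RMSD frame can sit well below the mean, so reporting both values gives the reader a sense of how often the near-reference state is visited.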

      (2) The RMSD values reported in the paper always refer to single molecules or proteins. It would be useful to also report the RMSD computed over the whole complexes (ligand/GPCR or GPCR/G protein). It would provide a better metric for understanding the general distance between the results and the reference experimental structures;

      We have now removed the results sections for A<sub>1</sub>R and β<sub>2</sub> AR to focus on GLP-1R, whose RMSD is analyzed in detail in Figures 2, 3 and 4.

      (3) A number of computational works investigated the GPCR/G protein interaction and these studies should be cited and discussed. Examples are the works from Mafi et al. 2023 (doi: 10.1038/s41557-023-01238-6), Fleetwood et al. 2020 (doi: 10.1021/acs.biochem.9b00842), Calderon et al. 2023 and 2024 (doi: 10.1021/acs.jcim.3c00805 and doi: 10.1021/acs.jcim.3c01574), Maria-Solano and Choi 2023 (doi: 10.7554/eLife.90773.1), Mitrovic et al. 2023 (doi: 10.1021/acs.jpcb.3c04897), and D'Amore et al. 2023 (doi: 10.1101/2023.09.14.557711). Many of these works focused on the activation of B2AR and the interaction with its G protein. In addition, Maria-Solano and Choi 2023 and D'Amore et al. 2023 also characterized the rotation of TM6 during the A1R and A2AR activation. Therefore, the claim "To the best of our knowledge, this is the first time an MD simulation captures the TM6 rotation upon receptor activation as results reported so far are largely limited to the TM6 opening and kinking55." is untimely;

      We thank the Reviewer for the suggested references. We have added them to the introduction as examples of energy-biased (Calderon et al. 2023 and 2024, Maria-Solano and Choi, Mitrovic et al., D'Amore et al.) or adaptive sampling (Fleetwood et al.) approaches to GPCRs. Since the above articles focus on β<sub>2</sub> AR and A<sub>1</sub>R, we do not discuss them in detail because the results sections for A<sub>1</sub>R and β<sub>2</sub> AR have been drastically reduced in the manuscript.

      We note that among the suggested references, only Mafi et al. report a simulated G protein (in a pre-formed complex) and none of the works sampled TM6 rotation without input of energy. However, we have removed the claim from the text.

      (4) In the discussion section, the authors claim that a distance-based approach can be employed when the structural data of the endpoints is limited. However, the results obtained from the distance-based protocol during the validation of the approach, which was done using V2R as a reference, are unsatisfying, as acknowledged by the authors themselves. For instance, the RMSD mode value reported for the AVP C alpha atoms with respect to 7DW9 is high, 0.7 nm, whereas the minimum value is 0.38 nm. In addition, some side chains are not oriented in the experimental conformation and might have a different interaction pattern with the receptor if compared with the experimental structure. Considering that in this case the endpoint is known, it is plausible that the performance of the method would degrade even further when data about the target structure is limited. In a real case scenario, the ligand binding mode is unknown and in such a case no RMSD matrix can be used. This represents the major concern of this study that is no prediction is provided, but only - rather inaccurate - reproduction of the known structural data;

      The goal of the first part of the work was to compare mwSuMD to SuMD to justify its application on ligand binding using a challenging case study like vasopressin. The general validation of the parent method SuMD as a predictive tool for ligand binding mode has been extensively reported over the years (a few examples: https://doi.org/10.1021/ci400766b ; https://doi.org/10.1021/acs.jcim.5b00702 ; https://doi.org/10.1038/s41598-020-77700-z) and fell beyond the scope of this work. 

      (5) In the discussion, the authors write "A complete characterization of the possible interfaces between GPCR monomers, which falls beyond the goal of the present work, should be achieved by preparing different initial unbound states characterized by divergent relative orientations between monomers to dynamically dock." It would be useful for the reader to refer to and cite here advanced computational approaches that allow a comprehensive sampling of GPCR dimerization independently from the starting conformation of the receptors. One example is coarse-grained metadynamics as shown in doi: 10.1038/s41467-023-42082-z;

      The A<sub>2A</sub>/D<sub>2</sub> receptor dimerization has been removed from the manuscript.

      (6) In many cases, it is not reported how residues missing from the experimental structures used to model the proteins were reconstructed. This information is important, considering that the authors comment on the results of their calculations on addressing these regions, such as in the case of B2AR. Furthermore, the authors did not report how their initial models were validated. The authors should also explain why they did not model the IC loops of A2AR and D2R;

      In the current version of the manuscript, for V2R ECL2 and GLP-1R, we specify that we produced 10 solutions with Modeller and considered the best one in terms of the DOPE score. 

      The only receptor model used, β<sub>2</sub> AR, is now presented as preliminary data focusing on Gs and avoiding any structural detail of the Gs recognition.

      As reported above the A2A-D2 dimerization has been removed from the manuscript.

      (7) In several cases, the authors state that residues never investigated before play an important role in the interaction between different proteins. An example is provided on page 6 for the B2AR/G protein association. Since this claim is quite significant, it would benefit from validation, at least for further calculations such as in silico mutagenesis studies. Another example is at the end of page 10 where the authors report a hidden interaction between D344 and R385 that is pivotal for Gs coupling by GLP-1R. Is there other evidence supporting this result (previously reported literature data, conservation rate of these residues, etc.)?;

      We have removed the supplementary table reporting B2AR/G protein interactions to reduce speculations and added a reference that reports GLP-1 EC50 reduction upon mutation of position 344 to Ala (https://doi.org/10.1021/acscentsci.3c00063).

      (8) The authors should provide a deeper discussion about the conformational rearrangement of GPCR and G protein during the coupling. In detail, the conformational changes of microswitch amino acids of GPCR (e.g., PIF, NPxxY, inactivating ionic lock) and alpha helix 5 of G proteins should be discussed in relation to the literature data and experimental structures;

      We have removed the A<sub>1</sub>R and β<sub>2</sub> AR results to focus on GLP-1R. Key structural motifs in the polar central network and TM6 kink are analyzed in more detail in Figure 3.

      (9) The chronology of the conformational changes of GLP-1R is arbitrarily chosen. During the simulation, the RMSD values reported in Fig. 3 are high and do not demonstrate the full accomplishment of the simulation of the activation process of the receptor;

      We agree with the Reviewer that the GLP-1R inactive to active transition was not fully accomplished, compared to other work on class A GPCRs. Unlike class A, class B GPCRs represent a challenging system to work with in silico because inactive starting conformations (e.g. 6LN2) are extremely distant from the active one (e.g. 7LCJ, 7LCI or 6X18), as demonstrated in Figure S6 for GLP-1R. Here we report the first attempt to model a class B GPCR activation mechanism starting from the inactive state, and even if not fully achieved, we believe it represents state-of-the-art simulations for this class of receptors.

      (10) It would be helpful for the reader not familiar with the employed technique that the authors explain in one sentence in the main text the pros and cons of using multiple walkers instead of single walker SuMD;

      We thank the Reviewer for the excellent suggestion. In the Discussion, we have now commented that: “more extensive sampling obtainable by seeding multiple parallel short simulations instead of a single simulation for batch”, while in the Methods we explain that “mwSuMD is designed to increase the sampling from a specific configuration by seeding user-decided parallel replicas (walkers) rather than one short simulation as per SuMD. Since one replica for each batch of walkers is always considered productive, mwSuMD gives more control than SuMD on the total wall-clock time used for a simulation. On the flip side, mwSuMD requires multiple GPUs to be the most effective, although any multi-threaded GPU can run more walkers on the same hardware keeping the sampling variety.”.
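      The multiple-walker idea described above can be sketched as a simple loop: each batch seeds several short walkers from the current configuration, and the walker that scores best on the supervised metric seeds the next batch. The sketch below is a toy illustration under stated assumptions, not the mwSuMD implementation: `supervise`, `propagate`, and `metric` are hypothetical names, the "state" is a single number rather than a molecular configuration, and the selection rule (lowest metric wins) is a simplification.

```python
from typing import Callable, List, TypeVar

State = TypeVar("State")

def supervise(
    start: State,
    n_walkers: int,
    n_batches: int,
    propagate: Callable[[State, int], State],
    metric: Callable[[State], float],
) -> List[State]:
    """Run `n_batches` batches of `n_walkers` short simulations.

    In every batch, all walkers start from the same state; the walker
    with the lowest metric value is kept and seeds the next batch.
    """
    if n_walkers < 1:
        raise ValueError("at least one walker is required")
    trajectory = [start]
    current = start
    for _ in range(n_batches):
        # Seed the walkers from the same configuration.
        walkers = [propagate(current, i) for i in range(n_walkers)]
        # One walker per batch is always considered productive.
        current = min(walkers, key=metric)
        trajectory.append(current)
    return trajectory

# Toy deterministic example: walker i moves the state by (i - 1),
# and the supervised metric is the distance from zero.
path = supervise(5, n_walkers=3, n_batches=3,
                 propagate=lambda state, i: state + (i - 1),
                 metric=abs)
print(path)
```

      With `n_walkers=1` the loop reduces to a single short simulation per batch, which in this toy setting has no choice of outcome; with more walkers, each batch has several outcomes to choose from, at the cost of more parallel compute.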

      Minor points to address:

      (11) Page 3: the following sentence is duplicated (also found on page 2) "GPCRs preferentially couple to very few G proteins out of 23 possible counterparts";

      (12) Page 20: Figure S13 refers to the QM validation of PF06882961 torsional angle, not to the image of the receptor conformational changes, which is instead Figure S14 (please correct figure caption).

      We thank the Reviewer for the accurate reading of the manuscript. These typos have been corrected.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This important study convincingly shows that the less common D-serine stereoisomer is transported in the kidney by the neutral amino acid transporter ASCT2 and that it is a noncanonical substrate for sodium-coupled monocarboxylate transporter SMCTs. With a multihierarchical approach, this important study further shows that Ischemia-Reperfusion Injury in the kidney causes a specific increment in renal reabsorption carried out, in part, by ASCT2.

      Public Reviews:

      Reviewer #1 (Public Review):

      Most amino acids are stereoisomers in the L-enantiomer, but natural D-serine has also been detected in mammals and its levels shown to be connected to a number of different pathologies. Here, the authors convincingly show that D-serine is transported in the kidney by the neutral amino acid transporter ASCT2 and as a non-canonical substrate for the sodium-coupled monocarboxylate transporter SMCTs. Although both transport D-serine, this important study further shows in a mouse model for acute kidney injury that ASCT2 has the dominant role.

      Strengths:

      The paper combines proteomics, animal models, ex vivo transport analyses, and in vitro transport assays using purified components. The exhaustive methods employed provide compelling evidence that both transporters can translocate D-serine in the kidney.

      Weakness:

      In the model for acute kidney injury, the SMCT proteins did not show a significant change in expression levels and were instead analysed based on other, circumstantial evidence. Although it is clear that SMCTs can transport D-serine, their physiological role is less obvious compared to ASCT2.

      We greatly value the reviewer's efforts and feedback in reviewing our manuscript. We acknowledge the reviewer's observation that the changes indicated by our proteomic results are not markedly pronounced. To reinforce our findings, we have incorporated an analysis of gene alterations at the single-cell level (snRNA-seq) from the publicly accessible IRI mouse model data (Figure supplement 7). The snRNA-seq data align with our proteomic data in terms of the general trend of gene/protein alterations, but reveal more substantial changes in both ASCT2 and SMCTs. These discrepancies might stem from the different quantification methods used, suggesting a possible underestimation in our label-free proteomic quantification. The differences we see between the functional changes in transporters and their quantification in proteomics can be explained by the unique challenges posed by membrane proteins. Post-translational modifications and the complex nature of multiple transmembrane domains often impact the accurate measurement of these proteins in proteomic studies. This complexity can lead to a mismatch between the actual functional changes occurring in the transporters and their perceived abundance or alterations as detected by proteomic methods (Figure 4A) (Schey KL et al. Biochemistry 2015, doi: 10.1021/bi301604j). However, this label-free quantitative proteomics approach is well-suited for our study, given its screening efficiency, compatibility with animal models, and the absence of a labeling requirement. We may consider incorporating alternative quantitative proteomic methods in future for a more thorough comparison. We have included these considerations in lines 351-356 of the revised manuscript.

      Manuscript lines 351-356

      “When evaluating the extent of gene/protein alterations between the control and IRI conditions, we observed that the gene alterations of both Asct2 and Smcts, as revealed by snRNA-sequencing, are more pronounced than the protein alteration ratios obtained from proteomics. This discrepancy may stem from difficulty in the quantification method, especially for membrane transport proteins in label-free quantitative proteomics.”

      Regarding the roles of ASCT2 and SMCTs in renal D-serine transport, snRNA-seq showed that ASCT2 expression in the controls is less than 10% of the cell population. We suggest that ASCT2 contributes to D-serine reabsorption because of its high affinity and SMCTs (SMCT1 and SMCT2) would play a role in D-serine reabsorption in the cells without ASCT2 expression. In addition, we included other factors (the turnover rate and the presence of local canonical substrates) that may determine the capability of D-serine reabsorption. We have included this suggestion in the Discussion lines 386-404.

      Manuscript lines 386-404

      “Kinetics analysis of D-serine transport revealed the high affinity by ASCT2 (Km 167 µM) (Foster et al., 2016) and low affinity by SMCT1 (Km 3.39 mM; Figure 5E). In addition to transport affinity, the expression levels and co-localization of multiple transporters within the same cells are critical for elucidating the physiological roles of transporters or transport systems (Sakaguchi et al., 2024). In our proteome data, the chromatogram intensities of Smct1 (2.9 × 10<sup>9</sup> AU) and Smct2 (1.6 × 10<sup>8</sup> AU) were significantly higher than that of Asct2 (1.5 × 10<sup>7</sup> AU) in control mice (Table 1: abundance in Sham). While direct intensity comparisons between different proteins in mass spectrometry analyses are not precise, they can provide a general indication of relative protein amounts. This finding aligns with the snRNA-seq data, where Asct2 expression was found to be minimal, present in less than 10% of cell populations under both control and IRI conditions, suggesting that many cells do not express Asct2. Conversely, Smct1 and Smct2 show high and ubiquitous expression in control conditions, but their levels are markedly reduced in IRI conditions (Figure supplement 7). Our ex vivo assays demonstrate that both ASCT2 and SMCTs mediate D-serine transport (Figure 7B). Consequently, Asct2 may contribute to D-serine reabsorption due to its high affinity, whereas Smcts, owing to their abundance, particularly in cells lacking Asct2, likely play a significant role in D-serine reabsorption. Moreover, factors such as transport turnover rate (Kcat) and the presence of local canonical substrates are also vital in defining the overall contribution of D-serine transport systems.”
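      To give a feel for what the two Km values imply, the sketch below evaluates the fractional transport rate v/Vmax = [S] / (Km + [S]) for each transporter. It assumes simple Michaelis-Menten kinetics with no competing substrates, and the D-serine concentrations are hypothetical values chosen for illustration; only the Km values (167 µM for ASCT2, 3.39 mM for SMCT1) come from the text above.

```python
def fractional_rate(substrate_mM: float, km_mM: float) -> float:
    """Fraction of Vmax reached at a given substrate concentration,
    assuming simple Michaelis-Menten kinetics."""
    if substrate_mM < 0 or km_mM <= 0:
        raise ValueError("concentration must be >= 0 and Km must be > 0")
    return substrate_mM / (km_mM + substrate_mM)

KM_ASCT2_MM = 0.167  # 167 µM, from the text
KM_SMCT1_MM = 3.39   # 3.39 mM, from the text

# Hypothetical D-serine concentrations (mM), for illustration only.
for s in (0.01, 0.167, 3.39):
    asct2 = fractional_rate(s, KM_ASCT2_MM)
    smct1 = fractional_rate(s, KM_SMCT1_MM)
    print(f"[S] = {s:5.3f} mM  ASCT2 v/Vmax = {asct2:.2f}  SMCT1 v/Vmax = {smct1:.2f}")
```

      The fraction of Vmax says nothing about absolute flux, which also depends on transporter abundance and turnover rate, so a low-affinity but abundant transporter can still carry a substantial share of the transport.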

      Reviewer #2 (Public Review):

      Summary:

      The manuscript "A multi-hierarchical approach reveals D-serine as a hidden substrate of sodium-coupled monocarboxylate transporters" by Wiriyasermkul et al. is a resubmission of a manuscript, which focused first on the proteomic analysis of apical membrane isolated from mouse kidney with early Ischemia-Reperfusion Injury (IRI), a well-known acute kidney injury (AKI) model. In the second part, the transport of D-serine by Asct2, Smct1, and Smct2 has been characterized in detail in different model systems, such as transfected cells and proteoliposomes.

      Strengths:

      A major problem with the first submission was the explanation of the link between the two parts of the manuscript: it was not very clear why the focus on Asct2, Smct1, and Smct2 was a consequence of the proteomic analysis. In the present version of the manuscript, the authors have focused on the expression of membrane transporters in the proteome analysis, thus making the reason for studying Asct2, Smct1, and Smct2 transporters more clear. In addition, the authors used 2D-HPLC to measure plasma and urinary enantiomers of 20 amino acids in plasma and urine samples from sham and Ischemia-Reperfusion Injury (IRI) mice. The results of this analysis demonstrated the value of D-serine as a potential marker of renal injury. These changes have greatly improved the manuscript and made it more convincing.

      We deeply appreciate the reviewer’s comments on the manuscript. We have responded to the recommendations one by one in the later section.

      Reviewer #3 (Public Review):

      Summary:

      The main objective of this work has been to delve into the mechanisms underlying the increment of D-serine in serum, as a marker of renal injury.

      Strengths:

      With a multi-hierarchical approach, the work shows that Ischemia-Reperfusion Injury in the kidney causes a specific increment in renal reabsorption of D-serine that, at least in part, is due to the increased expression of the apical transporter ASCT2. In this way, the authors revealed that SMCT1 also transports D-serine.

      The experimental approach and the identification of D-serine as a new substrate for SMCT1 merit publication in eLife.

      The manuscript also supports that increased expression of ASCT2, even together with the parallel decreased expression of SMCT1, in renal proximal tubules underlies the increased reabsorption of D-serine responsible for the increment of this enantiomer in serum in a murine model of Ischemia-Reperfusion Injury.

      Weaknesses:

      It remains to be clarified whether ASCT2 has substantial stereospecificity in favor of D- versus L-serine to sustain a ~10-fold decrease in the ratio D-serine/L-serine in the urine of mice under Ischemia-Reperfusion Injury (IRI).

      It is not clear how the increment in the expression of ASCT2, in parallel with the decreased expression of SMCT1, results in increased renal reabsorption of D-serine in IRI.

      We sincerely appreciate the reviewer’s comment on the manuscript. The alteration of D-/L-serine ratios depends on several factors, including protein expression levels at both apical and basolateral sides, properties of the transporters (e.g. transport affinities, substrate stereoselectivities), and the expression of DAAO (D-amino acid oxidase), which selectively degrades D-amino acids. Moreover, the mechanism becomes more complicated when the transport systems of L- and D-enantiomers are different and have distinct stereoselectivities, as in the case of serine. Future studies are required to complete the mechanism. However, we would like to explore the mechanism based on the current knowledge.

      From this study, we identified ASCT2 and SMCTs (SMCT1 and SMCT2) as D-serine transport systems. We showed that SMCT1 prefers D-serine. Although we did not analyze ASCT2 stereoselectivity, previous studies indicate that ASCT2 recognizes both D- and L-serine with high affinities and slightly prefers the L-enantiomer: Km of 18.4 µM for L-serine in an oocyte expression system (Utsunomiya-Tate et al. J Biol Chem 1996) and 167 µM for D-serine in an oocyte expression system (Foster et al. PLOS ONE 2016), and IC50 of 0.7 mM for L-serine and 4.9 mM for D-serine in HEK293 expression systems (Foster et al. PLOS ONE 2016). The proteomics showed an increase of ASCT2 (1.6-fold increase) and a decrease of SMCTs (1.7-fold decrease in SMCT1, and 1.3-fold decrease in SMCT2) in IRI conditions. The table below summarizes D-serine transport by ASCT2 and SMCTs.

      In the case of L-serine, ASCT2 and B0ATs (in particular B0AT3) have been revealed as L-serine transport systems in the kidneys (Bröer et al. Physiol Rev 2008; Singer et al. J Biol Chem 2009). Proteomics showed that B0ATs have higher expression levels than ASCT2 supporting the idea that B0ATs are the main L-serine transport system (Table S1: Abundance of B0AT1 = 1.34E+09, B0AT3 = 2.13E+08, ASCT2 = 1.46E+07). In IRI conditions, B0AT3 decreased 1.8 fold and B0AT1 decreased 1.1 fold. From these results, we included the contribution of B0ATs in L-serine transport in Author response table 1.

      Author response table 1.

      Taken together, we suggest that high ratios of D-/L-serine in IRI conditions are a combinational result of 1) increase of D-serine reabsorption by ASCT2 enhancement and SMCTs reduction and 2) decrease of L-serine reabsorption by B0ATs. We have included this suggestion in the Discussion lines 438-451.

      Manuscript lines 438-451

      “The enantiomeric profiles of serine revealed distinct plasma D/L-serine ratios, with low ratios in the normal control but elevated ratios in IRI, despite the weak stereoselectivity of ASCT2 (Figure 1B). This observation suggested differential renal handling of D-serine compared to L-serine. While we identified SMCTs as a D-serine transport system, it has been reported that L-serine reabsorption is mediated by B0AT3 (Singer et al., 2009). We propose that the alterations in plasma and urinary D/L-serine ratios are the combined outcomes of: 1) transport systems for L-serine, and 2) transport systems for D-serine. In normal kidneys, the low plasma D/L-serine ratios could result from the efficient reabsorption of L-serine by B0AT3, coupled with the DAAO activity that degrades intracellular D-serine reabsorbed by SMCTs. In IRI conditions, our enantiomeric amino acid profiling revealed low plasma L-serine and high urinary L-serine (Figure supplements 1B, 2B). Additionally, the proteomic analysis indicated a reduction in B0AT3 levels (4h IRI/sham = 0.56 fold; 8h IRI/sham = 0.65 fold; Table S1). These observations suggest that the low L-serine reabsorption in IRI is a result of B0AT3 reduction.”

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      This is a thorough study that was reviewed previously under the old system. I think the authors have strengthened their findings and have no further suggestions.

      We thank Reviewer 1 for their effort and comments, which greatly contributed to improving this manuscript.

      Reviewer #2 (Recommendations For The Authors):

      The experiments seem to me to have been well performed and the data are readily available.

      Weaknesses:

      Rather than weaknesses, I would speak of discussion points: I have a few suggestions that may help to make the paper more accessible to a general audience.

      (1) In the Introduction, when the authors introduce the term "micromolecules", it would be beneficial to provide a precise definition or clarification of what they mean by this term. Adding a brief explanation may help the reader to better understand the context.

      Following the reviewer’s comment, we have included the explanation of the micromolecule and membrane transport proteins in lines 41-43.

      Manuscript lines 41-43

      “Membrane transport proteins function to transport micromolecules such as nutrients, ions, and metabolites across membranes, thereby playing a pivotal role in the regulation of micromolecular homeostasis.”

      (2) In line 91, I suggest specifying that this is a renal IRI model.

      Following the reviewer’s comment, we have added the information that it is a renal IRI model of AKI (lines 90-92).

      Manuscript lines 90-92

      “We applied 2D-HPLC to quantify the plasma and urinary enantiomers of 20 amino acids of renal ischemia-reperfusion injury (IRI) mice, a model of AKI and AKI-to-CKD transition (Sasabe et al., 2014; Fu et al., 2018).”

      (3) Lines 167-168 state that Asct2 is localised to the apical side of the renal proximal tubules. Is there any expression of Asct2 in other nephron segments?

      To our knowledge, there is no report of ASCT2 expression in other nephron segments. Our immunofluorescence data for ASCT2 staining in the whole kidney at low magnification and in another region of Figure 3 (below), as well as immunohistochemistry from the Human Protein Atlas (updated Jun 9th, 2023), did not show a strong ASCT2 signal in regions other than the proximal tubules. Thus, we conclude that ASCT2 is mainly expressed in proximal tubules and not in other nephron regions.

      Author response image 1.

      (4) Lines 225-226: Have the authors expressed the candidate genes in HEK293 cells with ASCT2 knockdown?

      This experiment was done by expressing the candidate genes in the presence of endogenous ASCT2. We have added the information in lines 225-227 to emphasize this process.

      Manuscript lines 225-227

      “Based on this finding, we utilized cell growth determination assay as the screening method even in the presence of endogenous ASCT2 expression. HEK293 cells were transfected with human candidate genes without ASCT2 knockdown.”

      (5) Lines 254-255: why was D-serine transport enhanced by ASCT2 knockdown in FlpInTRSMCT1 or 2 cells?

      We thank the reviewer for pointing out this result, and we apologize for the confusion in the text. The total amount of D-serine uptake in the cells was not enhanced, but the net uptake (total uptake minus background) was increased. This enhancement results from the lower background caused by ASCT2 knockdown. We have revised the text and explained this result in more detail (lines 256-258).

      Manuscript lines 256-258

      “In the cells with ASCT2 knockdown, the background level was lower, thereby enhancing the D-[3H]serine transport contributed by both SMCT1 and SMCT2 (the net uptake after subtracted with background) (Figure 5C).”

      (6) Line 265: The low affinity of SMCT1 for D-serine alone makes it an unlikely transporter for urinary D-serine.

      We acknowledge the reviewer’s concern about the low affinity of SMCT1. However, a Km in the mM range is widely accepted for several low-affinity amino acid transporters, such as the proton-coupled amino acid transporter PAT1 (Km = 2 – 5 mM; Miyauchi et al. Biochem J 2010), the cationic amino acid transporter CAT2A (Km = 3 – 4 mM; Closs et al. Biochem 1997), and the large-neutral amino acid transporter LAT4 (Km = 17 mM; Bodoy et al. J Biol Chem 2005). In the kidneys, many compounds are well known to be reabsorbed by low-affinity but high-capacity (high-expression) transporters. Similarly, D-serine was reported to be reabsorbed by a low-affinity transporter (Kragh-Hansen and Sheikh, J Physiol 1984; Shimomura et al. BBA 1988; Silbernagl et al. Am J Physiol Renal Physiol 1999). Moreover, the amino acid profile showed urinary D-serine in the range of 100 – 200 µM (Figure supplement 2). This concentration range could drive SMCT1 function (Figure 5). Combined with the high and ubiquitous expression of SMCT1, we propose that SMCT1 is a low-affinity but high-capacity D-serine transporter in the kidneys.

      snRNA-seq is a method that can directly compare the expression levels of different genes within the same cells. In Figure supplement 7, expression of SMCT1 is much more abundant than that of ASCT2. ASCT2 was present in less than 10% of the cell population. It is possible that the 90% of cells that do not express ASCT2 use SMCT1 to reabsorb D-serine.

      We have revised the Discussion regarding this comment (lines 386-404).

      Manuscript lines 386-404

      “Kinetics analysis of D-serine transport revealed the high affinity by ASCT2 (Km 167 µM) (Foster et al., 2016) and low affinity by SMCT1 (Km 3.39 mM; Figure 5E). In addition to transport affinity, the expression levels and co-localization of multiple transporters within the same cells are critical for elucidating the physiological roles of transporters or transport systems (Sakaguchi et al., 2024). In our proteome data, the chromatogram intensities of Smct1 (2.9 × 10^9 AU) and Smct2 (1.6 × 10^8 AU) were significantly higher than that of Asct2 (1.5 × 10^7 AU) in the control mice (Table 1: abundance in Sham). While direct intensity comparisons between different proteins in mass spectrometry analyses are not precise, they can provide a general indication of relative protein amounts. This finding aligns with the snRNA-seq data, where Asct2 expression was found to be minimal, present in less than 10% of cell populations under both control and IRI conditions, suggesting that many cells do not express Asct2. Conversely, Smct1 and Smct2 show high and ubiquitous expression in control conditions, but their levels are markedly reduced in IRI conditions (Figure supplement 7). Our ex vivo assays demonstrate that both ASCT2 and SMCTs mediate D-serine transport (Figure 7B). Consequently, Asct2 may contribute to D-serine reabsorption due to its high affinity, whereas Smcts, owing to their abundance, particularly in cells lacking Asct2, likely play a significant role in D-serine reabsorption. Moreover, factors such as transport turnover rate (Kcat) and the presence of local canonical substrates are also vital in defining the overall contribution of D-serine transport systems.”
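      To make the affinity argument concrete, the following minimal sketch estimates how far each transporter would be from saturation at the urinary D-serine concentrations mentioned in this response (100 – 200 µM). It assumes simple single-substrate Michaelis-Menten kinetics with no competing substrates, which is an idealization rather than the analysis used in the study; the function and variable names are illustrative only.

```python
def fractional_saturation(substrate_um: float, km_um: float) -> float:
    """Return v / Vmax for simple Michaelis-Menten kinetics: S / (Km + S)."""
    return substrate_um / (km_um + substrate_um)


# Km values quoted in the text, converted to µM (3.39 mM = 3390 µM)
KM_UM = {"ASCT2": 167.0, "SMCT1": 3390.0}

# Urinary D-serine range quoted in the response, in µM
for substrate in (100.0, 200.0):
    for transporter, km in KM_UM.items():
        fraction = fractional_saturation(substrate, km)
        print(f"[D-serine] = {substrate:.0f} µM, {transporter}: v/Vmax = {fraction:.3f}")
```

      Under these assumptions SMCT1 would operate far below saturation in this concentration range, so its overall contribution would depend on abundance and turnover rate, neither of which this sketch models.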

      (7) Line 316: The authors state that there is a high tubular D-serine reabsorption in IRI and in line 424 there is an inactivation of DAAO during the pathology. This suggests that there is a reabsorption of D-serine mediated by a transport system in the basolateral membrane domain of proximal tubular cells. Do the authors have any information about this transporter?

      We agree with the reviewer that transporters at the basolateral membrane are important to complete the D-serine reabsorption in the kidney, and have included this issue in the original manuscript. We stated that transport systems at the basolateral side are necessary to be analyzed in order to complete the picture of D-serine transport systems in the kidney (lines 481-483 of the revised manuscript). However, we did not have any strong candidates for basolateral D-serine transport systems. Because we analyzed the proteome of BBMV, which concentrates on the apical membrane proteins, the analysis did not detect several transporters at the basolateral side.

      (8) In lines 462-463, the authors state: "It is suggested that PAT1 is less active at the apical membrane where the luminal pH is neutral". However, the pH of urine in the proximal tubules is normally acidic due to the high activity of NH3. I suggest rewording this sentence.

      Thank you for your comment. The proximal tubule (PT) is the first and the main region that maintains acid-base homeostasis in the kidney. In PT cells, NH3 secretes H+ to titrate luminal HCO3- and creates CO2, which is absorbed into PT cells and produces "new intracellular HCO3-", which is subsequently reabsorbed into the blood. Although ion fluxes in the PT serve to maintain pH homeostasis, pH regulation in both the lumen and the intracellular compartment of the PT is highly dynamic. We agree with the reviewer and have accordingly revised the text by emphasizing the pH around PT segments, rather than the final urine pH, and by leaving the discussion open to the possibility of PAT1 function in the PT of normal kidneys (lines 474-481).

      Manuscript lines 474-481

      “PAT1, a low-affinity proton-coupled amino acid transporter (Km in mM range), has been found at both sub-apical membranes of the S1 segment and inside of the epithelia (The Human Protein Atlas: https://www.proteinatlas.org; updated on Dec 7th, 2022) (Sagné et al., 2001; Vanslambrouck et al., 2010). PAT1 exhibits optimum function at pH 5 - 6 but very low activity at pH 7 (Miyauchi et al., 2005; Bröer, 2008b). Future research is required to address the significance of PAT1 on D-serine transport in the proximal tubule segments where pH regulation is known to be highly dynamic (Boron, 2006; Nakanishi et al., 2012; Bouchard and Mehta, 2022; Imenez Silva and Mohebbi, 2022).”

      Reviewer #3 (Recommendations For The Authors):

      The authors proposed that the increased expression of ASCT2, even together with the decreased expression of SMCT1/2, causes the increased renal reabsorption of D-serine that occurs in IRI. In the discussion, the main argument to sustain this hypothesis is the higher apparent affinity for D-serine of ASCT2 (<200 uM Km) versus SMCT1 (3.4 mM Km). In the Discussion section (page 18, first complete paragraph), the authors indicate that the Mass Spec intensities of SMCT1 and 2 are two and one order of magnitude higher, respectively, than that of ASCT2. This suggests that SMCT1 is clearly more expressed than ASCT2 in control conditions. IRI increases ASCT2 protein expression in kidney brush-border membrane vesicles 1.6-fold and decreases that of SMCT1 0.6-fold. How would these fold changes, even taking into account the lower Km of ASCT2 versus SMCT1, explain the dramatic changes in the D-/L-serine ratios in plasma and urine in IRI? The authors might discuss whether other transport characteristics, even unknown ones (e.g., a higher turnover rate of ASCT2 vs SMCT1), would also contribute to the higher D-serine reabsorption in IRI.

      SMCT1 shows some enantiomer selectivity for D- vs L-serine. At 50 uM concentration the transport is almost double for D- vs L-serine, but is ASCT2 stereoselective between the two enantiomers of serine? Some of the authors of this manuscript showed in a previous paper that the basolateral transporter Asc1 also participates in the accumulation of D-serine in serum caused by renal tubular damage. (Serum D-serine accumulation after proximal renal tubular damage involves neutral amino acid transporter Asc-1. Suzuki M et al. Sci Rep. 2019 Nov 13;9(1):16705 (PMID: 31723194)). Asc1 shows no stereoselectivity between L- and D-serine. Can the authors discuss possible mechanisms resulting in greater renal reabsorption of D-serine than L-serine in IRI with the participation of transporters with modest stereoselectivity for D- vs L-serine?

      We appreciate the reviewer’s comments on the degree of protein alteration in proteomics, the functional contributions of ASCT2 and SMCTs, and the alteration of D/L ratios. We have included the possibilities of the technical concerns and the discussion on the roles of ASCT2 and SMCTs as follows.

      • Regarding the expression levels, proteomics and snRNA-seq showed the same tendency: ASCT2 increases and SMCTs decrease in IRI conditions. However, the degree of alteration is more pronounced in snRNA-seq. This may be due to the difference in quantification methods and probably points to underestimated quantification of membrane transport proteins in label-free proteomics. The accuracy of protein quantification in label-free proteomics is often affected by the presence of post-translational modifications and multiple trans-membrane domains, as in the case of membrane transport proteins (Schey KL et al. Biochemistry 2015, doi: 10.1021/bi301604j). Alternative methods of quantitative proteomics may be added in the future for a more thorough comparison. We have added this issue in lines 351-356 of the revised version.

      Manuscript lines 351-356

      “When evaluating the extent of gene/protein alterations between the control and IRI conditions, we observed that the gene alterations of both Asct2 and Smcts, as revealed by snRNA-sequencing, are more pronounced than the protein alteration ratios obtained from proteomics. This discrepancy may stem from difficulty in the quantification method, especially for membrane transport proteins in label-free quantitative proteomics.”

      • For the functional contributions of ASCT2 and SMCTs in the kidney, we acknowledge the reviewer’s concern about the low affinity of SMCT1. Following the reviewer’s comment, we have included other factors besides transport affinities, e.g. expression levels and turnover rates of the transporters. From the results of both proteomics and snRNA-seq, ASCT2 expression is significantly lower than that of SMCTs in normal conditions. snRNA-seq showed that ASCT2 was present in less than 10% of the cell population (Figure supplement 7). We propose that most of the cells that do not express ASCT2 may use SMCT1 to reabsorb D-serine. This topic was included in the revised manuscript lines 386-404.

      Manuscript lines 386-404

      “Kinetics analysis of D-serine transport revealed the high affinity by ASCT2 (Km 167 µM) (Foster et al., 2016) and low affinity by SMCT1 (Km 3.39 mM; Figure 5E). In addition to transport affinity, the expression levels and co-localization of multiple transporters within the same cells are critical for elucidating the physiological roles of transporters or transport systems (Sakaguchi et al., 2024). In our proteome data, the chromatogram intensities of Smct1 (2.9 × 10^9 AU) and Smct2 (1.6 × 10^8 AU) were significantly higher than that of Asct2 (1.5 × 10^7 AU) in the control mice (Table 1: abundance in Sham). While direct intensity comparisons between different proteins in mass spectrometry analyses are not precise, they can provide a general indication of relative protein amounts. This finding aligns with the snRNA-seq data, where Asct2 expression was found to be minimal, present in less than 10% of cell populations under both control and IRI conditions, suggesting that many cells do not express Asct2. Conversely, Smct1 and Smct2 show high and ubiquitous expression in control conditions, but their levels are markedly reduced in IRI conditions (Figure supplement 7). Our ex vivo assays demonstrate that both ASCT2 and SMCTs mediate D-serine transport (Figure 7B). Consequently, Asct2 may contribute to D-serine reabsorption due to its high affinity, whereas Smcts, owing to their abundance, particularly in cells lacking Asct2, likely play a significant role in D-serine reabsorption. Moreover, factors such as transport turnover rate (Kcat) and the presence of local canonical substrates are also vital in defining the overall contribution of D-serine transport systems.”

      • As for the dramatic alterations of D/L-serine ratios juxtaposed with minimal changes in ASCT2 and SMCT expression levels, we cautiously refrain from drawing a definitive conclusion regarding the entire mechanism. A complete explanation would require comprehensive elucidation of both L-serine and D-serine transport systems at both the apical and basolateral membranes. Nevertheless, we would like to suggest a mechanism at the apical membrane based on current knowledge.

      For D-serine transport systems, we found contributions of ASCT2 and SMCTs in this study. Meanwhile, L-serine transport was previously reported to be mediated mainly by the neutral amino acid transporters B0ATs (in particular B0AT3; Bröer et al. Physiol Rev 2008; Singer et al. J Biol Chem 2009). Hence, the mechanism behind the alterations of D/L-serine ratios should include B0AT3 function as well. In IRI conditions, B0AT3 decreased 1.8 fold. We suggest that the high D-/L-serine ratios in IRI conditions are a combined outcome of 1) an increase in D-serine reabsorption through ASCT2 enhancement and SMCT reduction, and 2) a decrease in L-serine reabsorption by B0AT3. We have included this suggestion in the Discussion lines 438-451.

      Manuscript lines 438-451

      “The enantiomeric profiles of serine revealed distinct plasma D/L-serine ratios, with low ratios in the normal control but elevated ratios in IRI, despite the weak stereoselectivity of ASCT2 (Figure 1B). This observation suggested the differential renal handling of D-serine compared to L-serine. While we identified SMCTs as a D-serine transport system, it has been reported that L-serine reabsorption is mediated by B0AT3 (Singer et al., 2009). We propose that the alterations in plasma and urinary D/L-serine ratios are the combined outcomes of: 1) transport systems for L-serine, and 2) transport systems for D-serine. In normal kidneys, the low plasma D/L-serine ratios could result from the efficient reabsorption of L-serine by B0AT3, coupled with the DAAO activity that degrades intracellular D-serine reabsorbed by SMCTs. In IRI conditions, our enantiomeric amino acid profiling revealed low plasma L-serine and high urinary L-serine (Figure supplements 1B, 2B). Additionally, the proteomics analysis indicated a reduction in B0AT3 levels (4h IRI/sham = 0.56 fold; 8h IRI/sham = 0.65 fold; Table S1). These observations suggest that the low L-serine reabsorption in IRI is a result of B0AT3 reduction.”

      • In the case of Asc-1, it was reported to be a D-serine transporter in the brain (Rosenberg et al. J Neurosci 2013). Suzuki et al. 2019 showed the increase of Asc-1 in cisplatin-induced tubular injury. Notably, the mRNA of Asc-1 is predominantly found in Henle’s loop, distal tubules, and collecting ducts but not in proximal tubules, and its protein expression level is dramatically low in the kidney (Human Protein Atlas: update on Jun 19, 2023). Furthermore, in this study, Asc-1 expression was not detected in the brush border membrane proteome. Consequently, we have decided not to include Asc-1 in the Discussion of this study, which primarily focuses on the proximal tubules.
    1. Author response:

      The following is the authors’ response to the original reviews.

      We thank the reviewers and editor for their constructive criticism. Based on these suggestions, we have thoroughly reworked the manuscript. Specifically, among other changes:

      (1) We have corrected the mistakes mentioned by the reviewers on a point-by-point basis.

      (2) We have provided additional experimental evidence to explain the rationale behind selecting five miRNAs for q-PCR validation. Furthermore, we have elaborated on the reasons for focusing primarily on research related to cartilage.

      (3) In response to concerns regarding overinterpretation in the manuscript, we have made more precise descriptions and revisions. Furthermore, we have added some details in our methods, including the addition of results showing the conservation of miR-199b-5p sequences between human and mouse species.

      (4) We have provided additional details on the experiments, including the process for predicting target genes, timing of chondrocyte culture and other experimental operations.

      (5) Finally, we have made additional revisions to the details of the figures to avoid any distortions and enhance the precision of the language.

      Below please find our point-by-point responses to the reviewers’ comments. You can also track the changes in the modified manuscript. We believe that the manuscript has been substantially improved by this revision.

      eLife assessment

      The manuscript provides interesting evidence that miR-199b-5p regulates osteoarthritis and as such it may be considered as a potential therapeutic target. This finding may be useful to further advance the field.

      Thank you for your positive comments.

      Although the study is considered potentially clinically relevant, the evidence provided was deemed insufficient and incomplete to support the conclusions drawn by the authors.

      Thank you for your critical comments and constructive advice. We have responded point by point to the reviewers’ questions and thoroughly reworked our manuscript. We hope the revised manuscript meets the criteria for publication in eLife.

      Reviewer #1 (Public Review):

      Summary:

      In this manuscript, the authors observed that miR-199b-5p is elevated in osteoarthritis (OA) patients. They also found that overexpression of miR-199b-5p induced OA-like pathological changes in normal mice and inhibiting miR-199b-5p alleviated symptoms in knee OA mice. They concluded that miR-199b-5p is not only a potential micro-target for knee OA but also provides a potential strategy for the future identification of new molecular drugs.

      Thanks for your comment.

      Strengths:

      The data are generated from both human patients and animal models.

      Thanks for the positive comment.

      Weaknesses:

      The data presented in this manuscript is not solid enough to support their conclusions. There are several questions that need to be addressed to improve the quality of this study.

      The following questions need to be addressed to improve the quality of the study.

      (1) Exosomes were characterized by electron microscopy and western blot analysis (for CD9, CD63, and CD81). However, Figure S1 shows WB results for only two samples, there are no positive or negative controls, and the WB figure is unclear.

      Thank you for your suggestion. We acknowledge that a comprehensive identification of extracellular vesicles should include both positive and negative samples. However, in some of the initial studies we referenced, positive and negative controls were not mentioned (refs. 1, 2). In our study, we identified extracellular vesicles using a combination of electron microscopy, nanoparticle tracking analysis, and detection of exosome markers. We agree that having negative samples would make our results more convincing, and we will include a negative control group in our future experiments. Additionally, we have provided clearer images in the revised version (supplemental fig1 A).

      Reference

      (1) Ying W, Riopel M, Bandyopadhyay G, et al. Adipose Tissue Macrophage-Derived Exosomal miRNAs Can Modulate In Vivo and In Vitro Insulin Sensitivity. Cell. 2017;171(2).

      (2) Fang T, Lv H, Lv G, et al. Tumor-derived exosomal miR-1247-3p induces cancer-associated fibroblast activation to foster lung metastasis of liver cancer. Nature Communications. 2018;9(1):191.

      (2) The sequencing of miRNAs in serum exosomes showed that 88 miRNAs were upregulated and 89 miRNAs were downregulated in KOA patients compared with the control group based on fold change > 1.5 and p < 0.05. Figure 2 legend did not clearly elucidate what those represent and why the authors chose those five miRNAs to further validate although they did mention it with several words in line 108 'based on the p-value and exosomal'.

      In fact, our study included two additional groups: the acupuncture treatment group (4 weeks of continuous acupuncture treatment) and the waiting treatment group (no intervention, followed by acupuncture treatment after 4 weeks), in addition to the healthy control and knee osteoarthritis (OA) patient groups. After comparing these four groups, we found that 11 miRNAs (hsa-miR-504-3p, hsa-miR-1915-3p, hsa-miR-103a-2-5p, hsa-miR-887-3p, hsa-miR-1228-5p, hsa-miR-34c-3p, hsa-miR-3168, hsa-miR-518e-3p, hsa-miR-1296-5p, hsa-miR-338-3p, and hsa-miR-199b-5p) were upregulated in KOA patients but downregulated after acupuncture treatment, with no change in the waiting treatment group. Additionally, 7 miRNAs (hsa-miR-448, hsa-miR-514a-3p, hsa-miR-4440, hsa-let-7f-5p, hsa-let-7a-5p, hsa-let-7d-5p, and hsa-miR-15b-3p) were downregulated in KOA patients but upregulated after acupuncture treatment, with no change in the waiting treatment group. Considering the improvement in clinical symptoms of KOA patients after acupuncture treatment, we believe that these 18 miRNAs are of significant value. Based on overall expression abundance and species specificity, we finally selected 5 of them, namely the 5 miRNAs mentioned in this article. We have included this result in Supplementary Figure 5 (Fig. S5).

      Author response image 1.

      Venn diagram showing differentially expressed miRNAs in the OA group compared with healthy patients and patients who recovered after acupuncture treatment.
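      The selection logic described in this response (changed in KOA, reversed after acupuncture, unchanged in the waiting group) can be sketched as a simple filter. This is a hypothetical illustration, not the pipeline used in the study: the miRNA names and values below are placeholders, the thresholds are the ones quoted by the reviewer (fold change > 1.5, p < 0.05), and treating a fold change below 1/1.5 as "down" is an assumption.

```python
FC_CUTOFF = 1.5   # fold-change threshold quoted by the reviewer
P_CUTOFF = 0.05   # significance threshold quoted by the reviewer


def direction(fold_change: float, p_value: float) -> str:
    """Classify one comparison as 'up', 'down', or 'unchanged'."""
    if p_value >= P_CUTOFF:
        return "unchanged"
    if fold_change > FC_CUTOFF:
        return "up"
    if fold_change < 1 / FC_CUTOFF:  # assumed symmetric cutoff for decreases
        return "down"
    return "unchanged"


def reversed_by_treatment(data: dict) -> set:
    """Keep miRNAs changed in KOA, reversed after acupuncture, stable while waiting."""
    selected = set()
    for name, c in data.items():
        koa = direction(*c["koa_vs_healthy"])
        acupuncture = direction(*c["post_vs_pre_acupuncture"])
        waiting = direction(*c["post_vs_pre_waiting"])
        if waiting == "unchanged" and {koa, acupuncture} == {"up", "down"}:
            selected.add(name)
    return selected


# Placeholder (fold_change, p_value) pairs; NOT data from the study
example = {
    "miR-A": {"koa_vs_healthy": (2.4, 0.01), "post_vs_pre_acupuncture": (0.5, 0.02), "post_vs_pre_waiting": (1.1, 0.60)},
    "miR-B": {"koa_vs_healthy": (0.4, 0.01), "post_vs_pre_acupuncture": (2.0, 0.03), "post_vs_pre_waiting": (0.9, 0.40)},
    "miR-C": {"koa_vs_healthy": (2.0, 0.01), "post_vs_pre_acupuncture": (1.0, 0.80), "post_vs_pre_waiting": (1.0, 0.90)},
    "miR-D": {"koa_vs_healthy": (2.2, 0.01), "post_vs_pre_acupuncture": (0.5, 0.01), "post_vs_pre_waiting": (0.5, 0.01)},
}

print(sorted(reversed_by_treatment(example)))
```

      In this toy input, miR-C is excluded because acupuncture does not reverse it, and miR-D because it also changes in the waiting group.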

      (3) In Figure 3 legend and methods, the authors did not mention how they performed the cell viability assay. What cell had been used? How long were they treated and all the details? Other figure legends have the same problem without detailed information.

      Thank you for your suggestions. In Figure 3, cell viability was determined using the CCK-8 assay. We used second-generation chondrocytes for this analysis. The chondrocytes were obtained from young mice aged 3-5 days after birth. The cartilage tissues were extracted, and the cells were cultured in complete medium after digestion with collagenase. The detailed description of the cell viability assay, cell culture procedures, specific timing, and treatment methods of the cells used can be found in our revised manuscript. (page14-15,line304-313)

      Besides, we have made thorough revisions to all figure legends to provide a clearer explanation of the relevant content.

      (4) The authors claimed that Gcnt2 and Fzd6 are two target genes of miR-199b-5p. However, there is no convincing evidence such as western blot to support their bioinformatics prediction.

      In the current study, we first identified six potential target genes by intersecting the predicted targets obtained from six bioinformatics websites. Subsequently, q-PCR was employed to test all six genes, revealing two genes with significant changes, namely Fzd6 and Gcnt2. We then predicted the binding sites of these genes and validated their existence through luciferase assays. Moreover, we examined the expression of these two potential targets in human KOA samples using a human database and found them to be expressed specifically in the samples. These results suggest that Fzd6 and Gcnt2 are potential target genes for KOA. However, we did not perform western blot assays to verify the results. Based on your suggestions, we have further discussed this limitation of our study and proposed future research strategies.

      (5) To verify the binding site on 3'UTR of two potential targets, the authors designed a mouse sequence for luciferase assay, but not sure if it is the same when using a human sequence.

      Thank you for your great advice. We carried out a comparative analysis of sequence conservation between human and mouse, and found that the binding site on the 3'UTR matches the human sequence very well. The sequence conservation between hsa_miR-199b-5p and mmu_miR-199b-5p was as high as 95.65%. We added the methods and results to the revised manuscript (page9, line181-184; page17, line361-365) (supplemental fig6).

      In detail: Firstly, the sequence information of mmu_miRNA-199b-5p was used to locate the human homologous sequence in the UCSC database. The homologous sequence was found to be located in the human genome at chr9:128244721-128244830 (supplemental fig6 A). Based on this positional information and the source gene, a further comparison was conducted in miRbase to identify the nearest miRNA at the position of the human genome. It was discovered that hsa_miR-199b-5p is positionally conserved and located at chr9:128244721-128244830 (supplemental fig6 B). The sequence of hsa_miR-199b-5p was obtained from the miRbase database (supplemental fig6 C), and a comparative analysis was performed between the sequences of humans and mouse (supplemental fig6 D). Besides being positionally conserved, the sequence conservation between hsa_miR-199b-5p and mmu_miR-199b-5p was as high as 95.65%, indicating a good sequence conservation.

      Author response image 2.

      (A) By using the sequence information of mmu_miRNA-199b-5p, we located the position of its human homologous sequence in the UCSC database. (B) Based on the positional information and the source gene, we further aligned this position with the closest miRNA in miRbase. (C) We compared the sequences of hsa_miR-199b-5p and mmu_miR-199b-5p. (D) Conservation analysis was performed to compare the sequence conservation of miR-199b-5p.
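      For readers who want to see how a percentage such as 95.65% can arise, the sketch below computes position-by-position identity for two equal-length sequences. The sequences are hypothetical 23-nt placeholders that differ at one position, not the actual hsa_miR-199b-5p or mmu_miR-199b-5p sequences, and the tool actually used for the conservation analysis is not stated in the response.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percentage of identical positions between two equal-length, ungapped sequences."""
    if not seq_a or len(seq_a) != len(seq_b):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(1 for a, b in zip(seq_a, seq_b) if a == b)
    return 100.0 * matches / len(seq_a)


# Hypothetical 23-nt placeholders with a single mismatch (position 17)
placeholder_human = "ACGUACGUACGUACGUACGUACG"
placeholder_mouse = "ACGUACGUACGUACGUUCGUACG"

identity = percent_identity(placeholder_human, placeholder_mouse)
print(f"{identity:.2f}% identity over {len(placeholder_human)} nt")
```

      With 22 of 23 positions matching, the identity rounds to 95.65%, which is consistent with the figure reported above if the two mature sequences differ at a single nucleotide.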

      Reviewer #2 (Public Review):

      Summary:

      The authors identified miR-199b-5p as a potential OA target gene using serum exosomal small RNA-seq from human healthy and OA patients. Their RNA-seq results were further compared with publicly available datasets to validate their finding of miR-199b-5p. In vitro chondrocyte culture with miR-199b-5p mimic/inhibitor and in vivo animal models were used to evaluate the function of miR-199b-5p in OA. The possible genes that were potentially regulated by miR-199b-5p were also predicted (i.e., Fzd6 and Gcnt2) and then validated by using Luciferase assays.

      We greatly appreciate Reviewer #2's constructive comments.

      Strengths:

      (1) Strong in vivo animal models including pain tests.

      (2) Validates the binding of miR-199b-5p with Fzd6 and binding of miR-199b-5p with Gcnt2.

      Thanks for the positive comments.

      Weaknesses:

      (1) The authors may overinterpret their results. The current work shows the possible bindings between miR-199b-5p and Fzd6 as well as bindings between miR-199b-5p and Gcnt2. However, whether miR-199b-5p truly functions through Fzd6 and/or Gcnt2 requires genetic knockdown of Fzd6 and Gcnt2 in the presence of miR-199b-5p.

      In this study, we employed a comprehensive approach by integrating data from six bioinformatics databases to identify potential target genes for miR-199b-5p. Subsequent qPCR analysis revealed significant changes in two genes, Fzd6 and Gcnt2. We then utilized luciferase assays to validate the predicted binding sites and confirmed the interaction between miR-199b-5p and these genes. Additionally, we examined the expression profiles of these potential target genes in human KOA samples using a human database, which unveiled distinct expression patterns.

      While our findings suggest that Fzd6 and Gcnt2 may serve as potential target genes for miR-199b-5p, we acknowledge the necessity for further experimental validation and in-depth functional characterization. Building upon your insightful recommendations, we have thoroughly addressed the research limitations and proposed potential research strategies for future investigations in our discussion. (page11,line227-231)

      (2) In vitro chondrocyte experiments were conducted in a 2D manner, which led to chondrocyte de-differentiation and thus may not represent the chondrocyte response to the treatments.

      We acknowledge that a 3D culture system would be more accurate and reliable. However, 2D culture systems have also been used successfully, for example by Liu et al. (3). In addition, the second-passage primary mouse chondrocytes used in the current study did not exhibit a markedly dedifferentiated morphology. Given the experimental conditions in our laboratory, we therefore used second-passage primary mouse chondrocytes throughout the cell experiments. To demonstrate the reliability of the cells, we have provided additional images in supplementary Fig. 7 (Fig. S7). In future studies, we will adopt a 3D culture system. Thank you for your advice; we have added this limitation to the revised manuscript. (page 11, lines 237-240)

      Author response image 3.

      Primary mouse chondrocytes that we cultured (P1) and the second-passage cells (P2) that we used in the subsequent experiments.

      Reference that used 2D culture:

      (3) Liu Q, Zhai L, Han M, et al. SH2 Domain-Containing Phosphatase 2 Inhibition Attenuates Osteoarthritis by Maintaining Homeostasis of Cartilage Metabolism via the Docking Protein 1/Uridine Phosphorylase 1/Uridine Cascade. Arthritis & Rheumatology (Hoboken, NJ). 2022;74(3):462-474.

      (3) There is a lack of description for bioinformatic analysis.

      We apologize for this oversight. We have added the relevant descriptions and details. (page 14, lines 299-303)

      (4) There are several errors in figure labeling.

      We have revised the figure labels. (Fig. 3, Fig. 4, Fig. 5 and Fig. 7)

      Recommendations for the authors:

      We appreciate the reviewers' feedback as we believe it has significantly contributed to the refinement of our manuscript. We are confident that our revisions have strengthened the quality and impact of our study, and we agree that the suggestions presented by the reviewers are valuable and appropriate for publication.

      Reviewer #2 (Recommendations For The Authors):

      I would like to thank the authors for investigating the functional role of miR-199b-5p in knee OA. While this study has the potential to provide valuable knowledge to the fields of miRNAs and joint diseases, significant improvements in several areas are required.

      We appreciate your constructive comments, and we have made substantial improvements to the manuscript. We thank all the reviewers for their advice as well as their criticisms.

      Major concerns:

      (1) According to the Authors, miR-199b-5p is identified by the results from their own miRNA-sequencing as well as comparison with other publicly available datasets (both synovium and cartilage datasets). It is unclear to me why the synovium dataset was used here as it appears that the entire manuscript was mainly focused on chondrocytes.

      Thank you for your question. Cartilage degradation is the initial pathological change in knee osteoarthritis (KOA), and it subsequently leads to other pathological changes such as synovial inflammation (4). These factors are interrelated, and current research on KOA encompasses cartilage, synovium, and systemic inflammation, among others. Therefore, when we identified a large number of dysregulated miRNAs in extracellular vesicles isolated from serum, it was crucial to determine whether these dysregulated miRNAs were also altered in cartilage or synovium. To address this, we compared our findings with publicly available databases and found a higher overlap with the cartilage cell dataset, including miRNA-199b. Consequently, we decided to focus our subsequent investigations on cartilage-related research.

      Reference

      (4) Hunter D, Bierma-Zeinstra S. Osteoarthritis. Lancet (London, England). 2019;393(10182):1745-1759.

      (2) Also, 169 of 177 differentially expressed exosome miRNAs were intersected with differentially expressed miRNAs from OA cartilage datasets. It is surprising that in the 5 selected miRNAs for further qRT-PCR validation, 3 out of 5 were not in the exosome miRNA dataset (i.e., hsa-mir-1296-5p, hsa-mir-15b-3p, and hsa-mir-338-3p; page 5, line 109 and Fig. 1B). Isn't that selecting the miRNAs that both differently expressed in exosome and cartilage datasets for validation more essential? Furthermore, from the Authors' exosome miRNA dataset, only 5 out of 15 KOA patients actually exhibited up-regulated miR-199b-5p vs. health controls. Please elaborate on how the target was determined.

      In fact, in addition to the healthy control and knee osteoarthritis (OA) patient groups, our study included two further groups: the acupuncture treatment group (4 weeks of continuous acupuncture treatment) and the waiting treatment group (no intervention, followed by acupuncture treatment after 4 weeks). After comparing these four groups, we found that 11 miRNAs (hsa-miR-504-3p, hsa-miR-1915-3p, hsa-miR-103a-2-5p, hsa-miR-887-3p, hsa-miR-1228-5p, hsa-miR-34c-3p, hsa-miR-3168, hsa-miR-518e-3p, hsa-miR-1296-5p, hsa-miR-338-3p, and hsa-miR-199b-5p) were upregulated in KOA patients but downregulated after acupuncture treatment, with no change in the waiting treatment group. Additionally, 7 miRNAs (hsa-miR-448, hsa-miR-514a-3p, hsa-miR-4440, hsa-let-7f-5p, hsa-let-7a-5p, hsa-let-7d-5p, and hsa-miR-15b-3p) were downregulated in KOA patients but upregulated after acupuncture treatment, with no change in the waiting treatment group. Considering the improvement in the clinical symptoms of KOA patients after acupuncture treatment, we believe that these 18 miRNAs are of significant value. Based on overall expression abundance and species specificity, we finally selected 5 of them, namely the 5 miRNAs mentioned in this article. We have included this result in supplementary Fig. 5 (Fig. S5).

      Author response image 4.

      Venn diagram showing differentially expressed miRNAs in the OA group compared with healthy patients and patients who recovered after acupuncture treatment.
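
      The selection rule described in this response (altered in KOA, reversed after acupuncture, unchanged in the waiting group) can be sketched as a simple filter. This is only an illustration of the logic, not the analysis code used in the study: the dictionary layout, the key names, and the two "hypothetical" entries are assumptions introduced here, and the direction labels are placeholders rather than measured values.

```python
# Direction of change per comparison: +1 = up, -1 = down, 0 = no change.
# Entries are illustrative placeholders; the "hypothetical-*" names are made up.
changes = {
    "hsa-miR-199b-5p": {"koa_vs_healthy": 1, "after_acupuncture": -1, "waiting": 0},
    "hsa-let-7a-5p": {"koa_vs_healthy": -1, "after_acupuncture": 1, "waiting": 0},
    "hypothetical-miR-A": {"koa_vs_healthy": 1, "after_acupuncture": 0, "waiting": 0},
    "hypothetical-miR-B": {"koa_vs_healthy": 1, "after_acupuncture": -1, "waiting": -1},
}


def reversed_by_treatment(change):
    """True if the KOA change is reversed by acupuncture but not by waiting."""
    return (
        change["koa_vs_healthy"] != 0
        and change["after_acupuncture"] == -change["koa_vs_healthy"]
        and change["waiting"] == 0
    )


selected = sorted(name for name, c in changes.items() if reversed_by_treatment(c))
print(selected)
```

      Under these assumptions, only the two miRNAs whose change is reversed specifically by treatment pass the filter; the abundance and species-specificity criteria mentioned above would be applied afterwards.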

      (3) There is also a lack of description for bioinformatic analysis regarding how miRNA sequencing datasets were analyzed. What R/python packages or algorithms were used? What were the QC criteria?

      We apologize for any confusion caused. We have now included a clear description of the method employed; R was used for this data analysis (revised on page 14, lines 301-305). To ensure consistency, we compared our findings with publicly available human serum data from the database (GSE105027) using a fold change threshold of > 1.5 and a significance level of p < 0.05. In the cartilage data (GSE175961), we observed a list of miRNAs with shared expression patterns, yet the precise differential values could not be determined.
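
      For readers who want to see the thresholds in concrete form, the following is a minimal sketch of the filtering step, written in Python rather than the R used in the study. The record layout and the miRNA names are hypothetical, and treating a fold change below 1/1.5 as the matching cutoff for downregulation is an assumption that the response does not state.

```python
# Hypothetical records; names and values are placeholders, not study data.
records = [
    {"mirna": "miR-A", "fold_change": 2.1, "p_value": 0.01},
    {"mirna": "miR-B", "fold_change": 1.2, "p_value": 0.001},
    {"mirna": "miR-C", "fold_change": 0.5, "p_value": 0.03},
    {"mirna": "miR-D", "fold_change": 3.0, "p_value": 0.20},
]


def is_differential(record, fc_cutoff=1.5, p_cutoff=0.05):
    """Apply the fold-change and p-value thresholds to one record."""
    fc = record["fold_change"]
    # Assumption: downregulation is judged symmetrically (fc < 1 / cutoff).
    changed = fc > fc_cutoff or fc < 1 / fc_cutoff
    return changed and record["p_value"] < p_cutoff


hits = [r["mirna"] for r in records if is_differential(r)]
print(hits)
```

      Both criteria must hold at once: miR-B is significant but changes too little, and miR-D changes strongly but is not significant, so neither is retained.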

      (4) Another major concern is the chondrocyte culture method. Chondrocytes should be cultured in a 3D manner (i.e., a 3D pellet culture system or a micro mass culture method). 2D cultured chondrocytes tend to de-differentiate into MSC-like cells and thus lose their chondrocyte phenotype. This is evident from Fig. 3B and C. Cells started to spread out and only a few cells were positive for COL2A1 with a deep brown staining color. Thus, the results from the in vitro studies may not be representative of chondrocyte response to the treatments.

      We acknowledge that a 3D culture system would be more accurate and reliable. However, 2D culture systems have also been used successfully, for example by Liu et al. (3). In addition, the second-passage primary mouse chondrocytes used in the current study did not exhibit a markedly dedifferentiated morphology. Given the experimental conditions in our laboratory, we therefore used second-passage primary mouse chondrocytes throughout the cell experiments. To demonstrate the reliability of the cells, we have provided additional images in supplementary Fig. 7 (Fig. S7). In future studies, we will adopt a 3D culture system. Thank you for your advice; we have added this limitation to the revised manuscript. (page 11, lines 237-240)

      Author response image 5.

      Primary mouse chondrocytes that we cultured (P1) and the second-passage cells (P2) that we used in the subsequent experiments.

      Reference that used 2D culture:

      (3) Liu Q, Zhai L, Han M, et al. SH2 Domain-Containing Phosphatase 2 Inhibition Attenuates Osteoarthritis by Maintaining Homeostasis of Cartilage Metabolism via the Docking Protein 1/Uridine Phosphorylase 1/Uridine Cascade. Arthritis & Rheumatology (Hoboken, NJ). 2022;74(3):462-474.

      (5) Page 7, lines 148-149: "The cartilage of mice injected with the miR-199b-5p mimic was slightly degraded (p=0.02) (Fig. 4E, F)". However, there was no significance between the groups found in Fig. 4F. Also, from the histological images of Fig. 4E, it looks like mice with inhibitor injection had more cartilage damage than miR-199b-5p mimic.

      We apologize for any confusion caused. Figures 4E and 4F show the Safranin Fast Green staining of the joint after administration of the miR-199b-5p inhibitor and mimic under physiological conditions. As you can see, there is minimal difference among these four images, and no statistically significant difference. However, in Figures 5E and 5F, where the MIA-induced KOA model was used, noticeable differences can be observed after administration of the inhibitor and mimic. In the revised version, we have emphasized that Figures 4E and 4F represent the results under physiological conditions, not under the MIA-induced model. (page 7, lines 146-151)

      (6) Page 7, lines 149-150: "Additionally, the articular surface showed insect erosion (Fig. 4G)." It is also unclear how micro-CT analysis will be able to demonstrate the erosion of cartilage. Or the authors actually indicate the trochlear groove. However, this could also be observed in the control group and the results were not quantified. It is also unclear if the cross-section images of micro-CT shown here are helpful at all without any further explanation in the manuscript.

      Figure 4G shows the control, vehicle control, inhibitor, and mimic groups, while Figure 5G shows the model, model+vehicle control, model+inhibitor, and model+mimic groups. In Figure 4G, the mimic group showed the most obvious erosion, while the inhibitor group did not exhibit this phenomenon (5). In Figure 5G, the model group and the model+mimic group exhibited the most pronounced erosion, while the model+inhibitor group showed the best recovery. To highlight the pathological changes in erosion, we have marked the typical locations with red arrows in the images for easy comparison (Fig. 4G; Fig. 5G). We have also made corresponding changes to the text of the manuscript to address these findings (page 7, lines 150-151; page 8, lines 160-161). In addition, the 3D reconstruction of micro-CT is based on the synthesis of these cross-sectional images.

      References

      (5) Tao Y, Wang Z, Wang L, et al. Downregulation of miR-106b attenuates inflammatory responses and joint damage in collagen-induced arthritis. Rheumatology (Oxford, England). 2017;56(10):1804-1813.

      (7) Page 17, line 309-310: "Before model establishment and at 3, 7, 10, 14, 21, and 28 days after model establishment." Please re-write this as this is not clear regarding the experimental procedure.

      Thank you. We have rewritten the sentence as follows: Baseline testing of behavioral pain thresholds was conducted prior to model establishment, followed by behavioral pain threshold testing on days 3, 7, 10, 14, 21, and 28 after model establishment. (page 15, lines 322-324)

      (8) Fig. 5A. The M + inhibitor and Model images are not at the same plane as M + mimic and M + RNAnc images.

      Thank you. We have modified the images accordingly.

      (9) Fig. 5B. There are two lines both with circle markers (Control and M+inhibitor). Please correct.

      We have corrected this.

      (10) Fig. 5F. Missing * sign.

      We have added the * sign.

      (11) Please elaborate on how the potential binding sites between miR-199b-5p and Gcnt2 and between miR-199b-5p and Fzd6 were determined.

      We apologize for any lack of clarity in the original text. In fact, we utilized targets to predict potential binding sites. Specifically, for the mouse species, we predicted that the 3'UTR of Fzd6 binds with miR-199b-5p at positions 2483-2490, 3244-3251, 3303-3309, and 3854-3860, while the 3'UTR of Gcnt2 binds with miR-199b-5p at positions 2755-2762 and 4144-4151. In the revised version, we provide a detailed description of the methodology used for predicting these sites and offer an elaborate explanation of the results. (page 16, line 352)

      Additionally, to demonstrate consistency with the human binding sites, we not only predicted the binding sites of human miR-199b-5p in these two target genes but also found a high conservation of up to 95.65% between the human and mouse sequences of miR-199b-5p. We have included this information in the supplementary materials (Fig. S6). In Fig. 6E-F, we present the potential binding sites between miR-199b-5p and Gcnt2, as well as between miR-199b-5p and Fzd6. In addition, we provide the predicted binding for the human sequences to illustrate the binding sites; the predicted binding of human miR-199b-5p with FZD6 and GCNT2 showed a high degree of consistency. (The fluorescent labeling in the following text indicates the potential predicted binding sites.) (Supplement file 8)

      hsa-miR-199b-5p MIMAT0000263

      CCCAGUGUUUAGACUAUCUGUUC

      NCBI Gene ID 8323 GenBank Accession NM_001164615

      Gene Symbol FZD6 3' UTR Length 1368

      Gene Description frizzled class receptor 6

      3' UTR Sequence: agaacattttctctcgttactcagaagcaaatttgtgttacactggaagtgacctatgcactgttttgtaagaatcactgttacattcttcttttgcacttaaagttgcattgcctactgttatactggaaaaaatagagttcaagaataatatgactcatttcacacaaaggttaatgacaacaatatacctgaaaacagaaatgtgcaggttaataatatttttttaatagtgtgggaggacagagttagaggaatcttccttttctatttatgaagattctactcttggtaagagtattttaagatgtactatgctattttacttttttgatataaaatcaagatatttctttgctgaagtatttaaatcttatccttgtatctttttatacatatttgaaaataagcttatatgtatttgaacttttttgaaatcctattcaagtatttttatcatgctattgtgatattttagcactttggtagcttttacactgaatttctaagaaaattgtaaaatagtcttcttttatactgtaaaaaaagatataccaaaaagtcttataataggaatttaactttaaaaacccacttattgataccttaccatctaaaatgtgtgatttttatagtctcgttttaggaatttcacagatctaaattatgtaactgaaataaggtgcttactcaaagagtgtccactattgattgtattatgctgctcactgatccttctgcatatttaaaataaaatgtcctaaagggttagtagacaaaatgttagtcttttgtatattaggccaagtgcaattgacttcccttttttaatgtttcatgaccacccattgattgtattataaccacttacagttgcttatattttttgttttaacttttgttttttaacatttagaatattacattttgtattatacagtacctttctcagacattttgtagaattcatttcggcagctcactaggattttgctgaacattaaaaagtgtgatagcgatattagtgccaatcaaatggaaaaaaggtagttttaataaacaagacacaacgtttttatacaacatactttaaaatattaaggagttttcttaattttgtttcctattaagtattattctttgggcaagattttctgatgcttttgattttctctcaatttagcatttgcttttggtttttttctctatttagcattctgttaaggcacaaaaactatgtactgtatgggaaatgttgtaaatattaccttttccacattttaaacagacaactttgaatacaaaaactttgttttgtgtgatcttttcattaataaaattatctttgtataagaaaaaaaaaaaaaa

      hsa-miR-199b-5p MIMAT0000263

      CCCAGUGUUUAGACUAUCUGUUC

      NCBI Gene ID 2651 GenBank Accession NM_001491

      Gene Symbol GCNT2 3' UTR Length 2780

      Gene Description glucosaminyl (N-acetyl) transferase 2 (I blood group)

      3' UTR Sequence: gctattcatgagctactcatgactgaagggaaactgcagctgggaagaggagcctgtttttgtgagagacttttgccttcgtaatgttaaccgtttcaggaccacgtttatagcttcaggacctggctacgtaattatacttaaaatatccactggacactgtgaaatacactaacaggatggctgggtagagcaatctgggcactttggccaattttagtcttgctgtttcttgatgctcacctctatattagtttattgttaggatcaatgataaatttaaatgacctcagatctttgcaccagatactcatcatatacaaatgttttagtaaaaaagagaattgtagataatactgtctaggaaaataagaattaggtttctttgaagaaggaatcttttataacaccttaacagtcaccactgtgctcaaccagacagatagtgaaacagctttctgggtaattcaccaatttcctttaaaacataagctacctgaatggagaatacatcttgtttctgagtttcaacactagcatttttggcttactcatggacaaagttctgtatatagtataaagtcattaacaagaaacaggatatgctttaagacagaattcactgtctgttgcttcagtaaaaggacctcggggaataaaacatttctctcttatatgccagaatgtaggctggtccctatgtcatgtcttccattaagaacactaaaaagtccttgcaagaatggagatatgcattcaagagaggtgctatcacatagatctagtctgaagtctggaacactttcctcttctatgacccctctctccccagtattatcttacttgcaaaatggagaccaaattctatcctgtgaggcttttaattgcaccatagtatgctctgagtagctttacactgcctggtactgatagtagtggctcgatttttaagagccttcaattgtagatgaacatctctgttatttatccctcattcatccatccgttcattcattcagccttcaatcaacatctcttgagtgtctattatgtacaggacatgtactgagacaaaaaggaaacataagagctttttcactctaaaaatcttggcaataatgtcaacaccagaaagcctcctctggagaatcttacagagtgattgtagtttaatacaggaacacacagggctgtgtagcatgataccaggcccaggagatcagtaattacaaattaagggttaaatcagagattattcaacagagagggagaaaggaggagacagagggaggacctgttgtgttccagccattctggtattcctttatgtatctaatttcattcaaacctcacaacagtcttgtgaggcccttatataattactcccattttgcagatgaagtaactgaggcttagaaaggttaatagcaccggggaacaatttctctgggtgagaattgggactctgttgctggtcttctcagttcatttcctgaggtggatttactgagagaaggtgaaataaagccatatttagtataccagagaaggtagattttaagaatggtctcagtgttaatactgagaaaaagtcctgtcagttcagaaaaaatgtgaagtctactttagtattcctgtaatactaaaccgttgagtttctaaatatttatttattctaacaaaaagcaattactacaaatggatgacacatttaatgaacacaattttattttttttctgtaactgtgcttgttgaatgtcaatcatatttaaagggaatgactttgaagtaaaaccttttttcttgctactgaaaaaaatggagttgttttgggtggtaaagtgttaaggaatagggacagctggtcacacaaggaactcttgaaggccacatgtgaaaacctgtcacttgcacagaggccagtcccactaaggtgaccagagtgggctccaagcacaaactgccattggctatag
atgggactgtgtccccccaaaattcatgtgttggagccttaaccctcaatgtgatggtatttgagatggggcctttggtaagggaagtttagatgaggtcacgagggtaggaccctcatgatgggatgagtccccttacaagacctctggcttgggccgggcgtggtggctcacacctgtaatcccaacactttgggaggccaaggcaggtagatcacttgatgccaggagttccagaccaggctggccgacatggtgaaaccccatctctactaaaaaatataaaaattagccgggctttgtggcatgtgcctgtaatcccagctatttggcaggctgaggcatgagaatcgcttgaacccaggaggtggaggttacagtgagctgagagtgccccactgcactccagcctgggtgacagagcgagactttgtcccaaaacaaaataggtgaggggatagcgaatgcactcagggtcagcagtggagtttaaaaattgtctcttttcaacttatttaaatgacagcacctgagaagaggaaccgttttacactggatgtttctcatgtagaacaagaaatctttctggaattgatgtttacatgtctgttgttggtcatctctcctgtgtcttaaatactttaatgttggaagagcatagtgtttgggctagtgggtttctgacagcccatgggaatgccctgaaactactgtatctgatgtttgttttcgatgaggttccatgttttgttttcttgggaataaattaatatattgttttccaaaaaaaaaaaaaaaaaaaa
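
      To make the sequences above easier to interpret, the following sketch shows one simplified way to locate candidate sites: searching a 3' UTR for the reverse complement of the miRNA seed (nucleotides 2-8). This is an assumption-laden illustration, not the prediction method used in the study; prediction tools apply further criteria. The function names are introduced here, the UTR excerpt is a short fragment copied from the FZD6 3' UTR listed above, and the one-nucleotide variant used for the identity calculation is hypothetical.

```python
def seed_site(mirna: str) -> str:
    """DNA motif in a 3' UTR that pairs with miRNA nucleotides 2-8."""
    seed = mirna.upper().replace("T", "U")[1:8]
    pair = {"A": "T", "U": "A", "G": "C", "C": "G"}
    return "".join(pair[base] for base in reversed(seed))


def find_sites(mirna: str, utr: str) -> list:
    """1-based start positions of perfect seed matches in the UTR."""
    site = seed_site(mirna)
    utr = utr.upper().replace("U", "T")
    width = len(site)
    return [i + 1 for i in range(len(utr) - width + 1) if utr[i:i + width] == site]


def percent_identity(a: str, b: str) -> float:
    """Percentage of identical positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return round(100 * matches / len(a), 2)


HSA_MIR_199B_5P = "CCCAGUGUUUAGACUAUCUGUUC"  # sequence as listed above
UTR_EXCERPT = "aatttgtgttacactggaagtgacc"   # fragment of the FZD6 3' UTR above
# Hypothetical variant with a single substitution, for illustration only.
VARIANT = HSA_MIR_199B_5P[:16] + "C" + HSA_MIR_199B_5P[17:]

print(seed_site(HSA_MIR_199B_5P))
print(find_sites(HSA_MIR_199B_5P, UTR_EXCERPT))
print(percent_identity(HSA_MIR_199B_5P, VARIANT))
```

      For a 23-nucleotide miRNA, a single mismatch corresponds to 22/23 identical positions, which is where a value of 95.65% would come from.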

      (12) Page 10-11, Line 222-223: "Our findings indicate that miR-199b-5p plays a crucial role in KOA by targeting Fzd6 and Gcnt2". This is an overstatement. The current work shows the possible bindings of miR-199b-5p and Fzd6 as well as bindings of miR-199b-5p and Gcnt2. Whether miR-199b-5p truly functions through Fzd6 and/or Gcnt2 requires genetic knockdown of Fzd6 and Gcnt2 in the presence of miR-199b-5p. Thus, please tone down this statement and the title of the manuscript.

      We agree with your assessment of our conclusion. We have therefore deleted the overstated sentences and toned down the conclusion of the manuscript. (the title; page 8, line 179; page 11, lines 227-228)

      (13) The Schematic figure (the last figure). Please remove osteophyte as this was not quantified in the study.

      We modified the schematic figure accordingly.

      Minor concerns:

      (1) Most figures were distorted.

      We have provided new versions of the figures to avoid distortion.

      (2) Providing GO term numbers in Fig. 1C is not very helpful. Maybe show the GO term and corresponding numbers in the manuscript (Page 4, lines 79 - 82).

      Thank you for your advice. We have added notes on the GO term numbers to the manuscript to explain the biological concept behind each of them. (page 4, lines 77-89; page 22, lines 515-532)

      (3) What were M-0.5 and M-1 in Fig. 2D? Different MIA concentrations?

      Yes, these are different MIA concentrations, which we illustrate in the legend. (Page 23, line 535-536)

      (4) Please follow the nomenclature of the gene symbol. For example, Fig. 3E-P should be mouse genes (?).

      We modified the relevant gene symbol.

      (5) Page 3, line 59. Not all chondrocytes are pathogenic cells in OA.

      We apologize for the mistake; it has now been corrected. (page 3, line 59)

      (6) Typo. Page 3, line 55.

      We have corrected the typo.

      (7) Page 4, line 78. These are differentially expressed miRNAs, not genes.

      We have revised the inaccurate wording. (page 4, lines 75-76)

      I wish the authors all the best with their continued work in this area.

      Thank you for your wishes.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      The manuscript by Xia et al. investigated the mechanisms underlying Glucocorticoid-induced osteonecrosis of the femoral head (GONFH). The authors observed that abnormal osteogenesis and adipogenesis are associated with decreased β-catenin in the necrotic femoral head of GONFH patients, and that the inhibition of β-catenin signalling leads to abnormal osteogenesis and adipogenesis in GONFH rats. Of interest, the deletion of β-catenin in Col2-expressing cells rather than in Osx-expressing cells leads to a GONFH-like phenotype in the femoral head of mice.

      Strengths:

      A strength of the study is that it sets up a Col2-expressing cell-specific β-catenin knockout mouse model that mimics the full spectrum of osteonecrosis phenotype of GONFH. This is interesting and provides new insights into the understanding of GONFH. Overall, the data are solid and support their conclusions.

      Reviewer #1 (Recommendations For The Authors):

      1) Fig. 1I should be quantified and presented as bar graphs to make it consistent with other data, and the significance should be shown.

      Reply: Thanks for your comments. We have provided the quantitative bar graph in the new version.

      2) Fig. 2H, beta-catenin, ALP and FABP4 should be labled below the X axis. Moreover, the pattern of Fig. 2H is different from other bar graphs and the dots for individual samples are missing, so I could not judge the N values for the experiments. N values should also be provided for Fig. 3.

      Reply: Thanks for your comments. We have added the labels for beta-catenin, ALP and FABP4 below the X axis in Fig. 2H. The format of the quantitative bar graphs was changed to show the N values in each experiment.

      3) Fig. 4 shows the fate mapping of Col2+ cells and Osx+ cells in the femoral head. In this regard, the authors presented images for Col2-expressing cells at all the indicated time points, i.e. 1, 3, 6, and 9 months, but only presented images for Osx-expressing cells for 1 month while those for 3, 6, and 9 months are missing.

      Reply: Thanks for your comments. Here, we show that the expression pattern of Osx+ cells in the femoral head was completely different from that of Col2+ cells at 3 and 6 months of age, further indicating that they are two different progenitor lineages.

      Author response image 1.

      4) Some experiments may need to be described in more detail, e.g., ABH/Orange G staining, biomechanical testing, μCT analysis, etc.

      Reply: Thanks for your comments. We have provided more information on the experimental procedures.

      5) This study proposed that Col2-expressing cells play a key role in the progression of GONFH, did the authors use Col2+ cells for the in vitro experiments?

      Reply: Because in vitro experiments cannot reflect the location of Col2-expressing cells in the femoral head, we performed an in vivo lineage tracing study instead. After up to 9 months of lineage tracing, we demonstrated the self-renewal ability and osteogenic commitment of Col2+ cells, as well as the change in their spatial distribution in the femoral head with age. Conditional knockout of β-catenin caused Col2+ cells to trans-differentiate into adipogenic cells instead of osteogenic cells, which directly clarified the mechanism by which Col2+ cells lead to the GONFH-like phenotype in mice.

      6) A few typo errors, such as Line 13, "contribute" should be "contributes"; Line 118, "reveled" should be "revealed".

      Reply: We have revised the grammar errors in the new manuscript.

      Reviewer #2 (Public Review):

      Summary:

      In this manuscript, the authors reported a study to uncover that β-catenin inhibition disrupting the homeostasis of osteogenic/adipogenic differentiation contributes to the development of Glucocorticoid-induced osteonecrosis of the femoral head (GONFH). In this study, they first observed abnormal osteogenesis and adipogenesis associated with decreased β-catenin in the necrotic femoral head of GONFH patients, but the exact pathological mechanisms of GONFH remain unknown. They then performed in vivo and in vitro studies to further reveal that glucocorticoid exposure disrupted osteogenic/adipogenic differentiation of bone marrow stromal cells (BMSCs) by inhibiting β-catenin signaling in glucocorticoid-induced GONFH rats, and specific deletion of β-catenin in Col2+ cells shifted BMSCs commitment from osteoblasts to adipocytes, leading to a full spectrum of disease phenotype of GONFH in adult mice.

      Strengths:

      This innovative study provides strong evidence supporting that β-catenin inhibition disrupts the homeostasis of osteogenic/adipogenic differentiation that contributes to the development of GONFH. This study also identifies an ideal genetically modified mouse model of GONFH. Overall, the experiment is logically designed, the figures are clear, and the data generated from humans and animals is abundant supporting their conclusions.

      Weaknesses:

      There is a lack of discussion to explain how the Wnt agonist 1 works. There are several types of Wnt ligands. It is not clear if this agonist only targets Wnt1 or other Wnts as well. Also, why Wnt agonist 1 couldn't rescue the GONFH-like phenotype in β-cateninCol2ER mice needs to be discussed.

      Reply: Thanks for your constructive comments. Wnt agonist 1 is a cell-permeable activator of the Wnt signaling pathway that induces β-catenin-dependent transcriptional activity (PMID: 25514428, 18624906). In the present study, we aimed to demonstrate that activation of β-catenin signaling could alleviate the phenotype of rat GONFH; thus, only the expression of β-catenin and its downstream targets (RUNX2, ALP, PPAR-γ, FABP4) was examined after Wnt agonist 1 intervention. Conditional knockout of β-catenin in Col2+ cells led to a GONFH-like phenotype in mice. Wnt agonist 1 could not rescue this GONFH-like phenotype, as it did not activate β-catenin signaling. We have discussed these points in the new version.

      Reviewer #3 (Public Review):

      Summary:

      In this manuscript, the authors are trying to delineate the mechanism underlying the osteonecrosis of the femoral head.

      Strengths:

      The authors provided compelling in vivo and in vitro data to demonstrate Col2+ cells and Osx+ cells were differentially expressed in the femoral head. Moreover, inducible knockout of β-catenin in Col2+ cells but not Osx+ cells lead to a GONFH-like phenotype including fat accumulation, subchondral bone destruction, and femoral head collapse, indicating that imbalance of osteogenic/adipogenic differentiation of Col2+ cells plays an important role in GONFH pathogenesis. Therefore, this manuscript provided mechanistic insights into osteonecrosis as well as potential therapeutic targets for disease treatment.

      Weaknesses:

      However, additional in-depth discussion regarding the phenotype observed in mice is highly encouraged.

      Reply: Thanks for your comments. Inducible knockout of β-catenin in Col2+ cells, but not in Osx+ cells, led to a GONFH-like phenotype. Lineage tracing data showed that Col2+ cells and Osx+ cells were different cell populations, and we have discussed the potential mechanisms underlying the different phenotypes of β-cateninCol2ER mice and β-cateninOsxER mice.

      1) Why did the authors use dexamethasone in the cellular experiments but methylprednisolone to induce the GONFH rat model?

      Reply: Thanks for the comments. Here, we applied a dexamethasone (DEX)-treated BMSC model in vitro and a methylprednisolone (MPS)-induced rat model in vivo for the GONFH study, based on the published literature (PMID: 37317020, 29662787, 29512684, 35126710, 32835568).

      2) Both bone damage and fat accumulation were observed in 3-month-old and 6-month-old β-cateninCol2ER mice, but the femoral head collapse (the feature of GONFH at the late stage) only occurred in the older β-cateninCol2ER mice. This interesting observation needs to be discussed.

      Reply: Thanks for the comments. Bone damage resulting in poor mechanical support is the key to femoral head collapse. Despite similar trabecular bone loss and fat accumulation in the 3-month-old and 6-month-old β-cateninCol2ER mice, the older mice also presented extensive subchondral bone destruction. Intact subchondral bone provides good mechanical support for femoral head morphology; therefore, femoral head collapse occurred only in the older β-cateninCol2ER mice.

      3) In the Materials and Methods, detailed information on the reagents should be provided.

      Reply: We have provided detailed information on the important reagents.

      4) As shown in Figure 4, β-cateninOsxER mice at 3 months of age did not show differences in lipid droplet area and empty lacunae rate, but there was a decrease in bone area. The authors should at least provide some necessary discussion of this phenomenon.

      Reply: Thanks for your comments. In the present study, we found few lipid droplets and empty lacunae but a significant decrease in bone mass in the femoral heads of β-cateninOsxER mice. Previous studies showed that specific knockout of β-catenin in Osx-expressing cells promoted osteoclast formation and activity, leading to bone mass loss (PMID: 29124436, 34973494). We have discussed this phenomenon in the new version.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This study presents a potentially valuable discovery which indicates that activation of the P2RX7 pathway can reduce the lung fibrosis after its establishment by inflammatory damage. If confirmed, the study could clarify the role of specific immune networks in the establishment and progression of lung fibrosis. However, the presented data and analyses are incomplete as they primarily rely on limited pharmacological treatments with modest effect sizes.

      I hope you will be convinced of the validity of our approaches by the following explanations and information, and I remain at your disposal to discuss them.

      Public Reviews:

      Reviewer #1 (Public Review):

      In this revised preprint the authors investigate whether a presumably allosteric P2RX7 activating compound that they previously discovered reduces fibrosis in a bleomycin mouse model. They chose this particular model as publicly available mRNA data indicate that the P2RX7 pathway is downregulated in idiopathic pulmonary fibrosis patients compared to control individuals. In their revised manuscript, the authors use three proxies of lung damage, Ashcroft score, collagen fibers, and CD140a+ cells, to assess lung damage following the administration of bleomycin. These metrics are significantly reduced on HEI3090 treatment. Additional data implicate specific immune cell infiltrates and cytokines, namely inflammatory macrophages and damped release of IL-17A, as potential mechanistic links between their compound and reduced fibrosis. Finally, the researchers transplant splenocytes from WT, NLRP3-KO, and IL-18-KO mice into animals lacking the P2RX7 receptor to specifically ascertain how the transplanted splenocytes, which are WT for P2RX7 receptor, respond to HEI3090 (a P2RX7 agonist). Based on these results, the authors conclude that HEI3090 enhanced IL-18 production through the P2RX7-NLRP3 inflammasome axis to dampen fibrosis.

      These findings could be interesting to the field, as there are conflicting results as to whether NLRP3 activation contributes to fibrosis and if so, at what stage(s) (e.g., acute damage phase versus progression). The revised manuscript is more convincing in that three orthogonal metrics for lung damage were quantified. However, major weaknesses of the study still include inconsistent and small effect sizes of HEI3090 treatment versus either batch effects from transplanted splenocytes or the effects of different genetic backgrounds. Moreover, the fundamental assumption that HEI3090 acts specifically and functionally through the P2RX7 pathway in this model cannot be directly tested, as the authors now provide results indicating that P2RX7 knockout mice do not establish lung fibrosis on bleomycin treatment.

      I’m particularly concerned by the assumption made by reviewer 1 concerning the fact that P2RX7 knockout mice do not establish lung fibrosis on bleomycin treatment.

      Indeed, what we showed in the point-by-point response is that BLM induces fibrosis in both WT and P2RX7 KO mice, but the intensity of the fibrosis is reduced in P2RX7 KO mice (panel A). Therefore, as discussed in our first response, our results confirm the previous publication of Riteau et al. that P2RX7 participates in BLM-induced lung fibrosis (see panel B).

      Author response image 1.

      Bleomycin-induced lung fibrosis in WT versus p2rx7 KO mice. A: Lungs from BLM-treated mice were stained with H&E and fibrosis was quantified using the Ashcroft protocol. Results showed that fibrosis induced by BLM in KO mice is reduced compared to WT mice. B: Representative images of lung sections at day 14 after BLM treatment stained with H&E, as published in Riteau et al., illustrating that fibrosis induced by BLM in KO mice is reduced compared to WT mice. WT mice vehicle (n=4) or p2rx7 KO (n=6) mice. Two-tailed Mann-Whitney test, p values: **p < 0.01.
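
      As a reading aid for the statistics quoted in this caption, the following is a minimal sketch of an exact two-tailed Mann-Whitney test, written in Python with the standard library only. It is not the authors' analysis code: the function names `midranks` and `exact_mann_whitney` are illustrative, and the group values in the note below are hypothetical. The sketch enumerates every way of assigning the pooled ranks to the first group, which is feasible only for small groups such as n=4 and n=6.

```python
from itertools import combinations
from math import comb


def midranks(values):
    """Return 1-based ranks of `values`, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def exact_mann_whitney(x, y):
    """Exact two-tailed Mann-Whitney test; returns (U for x, p value)."""
    pooled = list(x) + list(y)
    ranks = midranks(pooled)
    n1, n2 = len(x), len(y)
    mean_u = n1 * n2 / 2  # expected U under the null hypothesis

    def u_stat(indices):
        return sum(ranks[i] for i in indices) - n1 * (n1 + 1) / 2

    u_obs = u_stat(range(n1))
    dev_obs = abs(u_obs - mean_u)
    # Count rank assignments at least as extreme as the observed one.
    hits = sum(
        1
        for indices in combinations(range(n1 + n2), n1)
        if abs(u_stat(indices) - mean_u) >= dev_obs - 1e-12
    )
    return u_obs, hits / comb(n1 + n2, n1)
```

      With hypothetical, fully separated groups such as `exact_mann_whitney([5, 6, 7, 8], [1, 2, 3, 4, 4.5, 4.8])`, the function returns U = 24 and p = 2/210 ≈ 0.0095. This is the smallest two-tailed p value an exact test can return for groups of 4 and 6, so in a design of this size p < 0.01 is reached only when the two groups do not overlap at all.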

      Importantly, this lower intensity of lung fibrosis in P2RX7 KO mice does not interfere with the capacity of our molecule to attenuate lung fibrosis, as demonstrated by the adoptive transfer of IL1B KO splenocytes into P2RX7 KO mice, in which HEI3090 decreases the Ashcroft score, the % of fibrosis and the collagen fibers (see below).

      Author response image 2.

      HEI3090 activity requires P2RX7-expressing immune cells: Experimental design. p2rx7-/- mice were given 3 × 10⁶ il1β-/- splenocytes i.v. one day prior to BLM delivery (i.n. 2.5 U/kg). Mice were treated daily i.p. with 1.5 mg/kg HEI3090 or vehicle for 14 days. (C) Representative images of lung sections at day 14 after treatment stained with H&E and Sirius Red with il1β-/- splenocytes, bar = 100 µm (left) and fibrosis score assessed by the Ashcroft method, the % of fibrosis and the content of collagen fibers (right). Each point represents one mouse (n=2 in WT and NLRP3 experiments, n=1 in IL18 and IL1B experiments), data represented as violin plot or mean±SEM, two-tailed Mann-Whitney test, *p < 0.05. WT: Wildtype, KO: P2RX7 knock-out

      Importantly, in the same experimental setting, i.e., adoptive transfer of splenocytes from different genetic backgrounds, HEI3090 decreases the fibrosis intensity only with WT and IL1B KO splenocytes and not with NLRP3 KO and IL18 KO splenocytes.

      Author response image 3.

      HEI3090 activity requires P2RX7-expressing immune cells: Experimental design. p2rx7-/- mice were given 3 × 10⁶ WT, NLRP3-/-, IL18-/- or IL1β-/- splenocytes i.v. one day prior to BLM delivery (i.n. 2.5 U/kg). Mice were treated daily i.p. with 1.5 mg/kg HEI3090 or vehicle for 14 days. Fibrosis in whole lung was assessed by the % of fibrosis (upper panel) and the content of collagen fibers (lower panel). Each point represents one mouse (n=2 in WT and NLRP3 experiments, n=1 in IL18 and IL1B experiments). Data represented as violin plot or mean±SEM, two-tailed Mann-Whitney test, *p < 0.05. WT: Wildtype, KO: P2RX7 knock-out

      In order to provide clear evidence that HEI3090 functions through P2RX7, a different lung fibrosis model that does not require P2RX7 would be necessary. For example, in such a system the authors could demonstrate a lack of HEI3090-mediated therapeutic effect on P2RX7 knockout.

      BLM induces lung fibrosis in P2RX7 KO mice, as we showed in this manuscript and as already published by Riteau in 2010 (shown earlier in our response, first figure). In addition, HEI3090 is able to decrease the intensity of fibrosis in WT and IL1B-/- → P2RX7 KO mice but not in KO, NLRP3-/- → P2RX7 KO and IL18-/- → P2RX7 KO mice. We therefore believe that our data sustain the conclusions that:

      1. HEI3090 requires the expression of P2RX7 in immune cells to mediate the antifibrotic activity,

      2. IL1B is not a crucial effector mediating the antifibrotic effect of HEI3090.

      Molecularly, additional evidence on specificity, such as thermal proteome profiling and direct biophysical binding experiments, would also enhance the authors' argument that the compound indeed binds P2RX7 directly and specifically. Since all small molecules have some degree of promiscuity, the absence of an additional P2RX7 modulator, or direct recombinant IL-18 administration (as suggested by another reviewer), is needed to orthogonally validate the functional importance of this pathway. Another way the authors could probe pathway specificity would involve co-administering α-IL-18 with HEI3090 in several key experiments (similar to Figure 4L).

      At the moment we have no funds to do these experiments and given the high competition, we have decided to publish our story without these new data.

      Reviewer #2 (Public Review):

      In the study by Hreich et al, the potency of P2RX7-specific positive modulator HEI3090, developed by the authors, for the treatment of Idiopathic pulmonary fibrosis (IPF) was investigated. Recently, the authors have shown that HEI3090 can protect against lung cancer by stimulating dendritic cell P2RX7, resulting in IL-18 production that stimulates IFN-γ production by T and NK cells (DOI: 10.1038/s41467-021-20912-2). Interestingly, HEI3090 increases IL-18 levels only in the presence of high eATP. Since the treatment options for IPF are limited, new therapeutic strategies and targets are needed. The authors first show that P2RX7/IL-18/IFNG axis is downregulated in patients with IPF. Next, they used a bleomycin-induced lung fibrosis mouse model to show that the use of a positive modulator of P2RX7 leads to the activation of the P2RX7/IL-18 axis in immune cells that limits lung fibrosis onset or progression. Mechanistically, treatment with HEI3090 enhanced IL-18-dependent IFN-γ production by lung T cells leading to a decreased production of IL-17 and TGFβ, major drivers of IPF. The major novelty is the use of the small molecule HEI3090 to stimulate the immune system to limit lung fibrosis progression by targeting the P2RX7, which could be potentially combined with current therapies available. Overall, the study was well performed, and the manuscript is clear.

      We thank the reviewer for these very positive comments.

      However, there is need for more details on the description and interpretation of the adoptive transfer experiments, as well as the statistical analyses and number of replicate independent experiments.

      I’m concerned by the reviewer’s comments, and I would like to provide additional information and explanation, which I hope will convince you of the validity of our approaches.

      Author response image 4.

      Adoptive transfer experiment. Adoptive transfer experiments are classically used to document which immune cells participate in immune responses (with more than 150 publications in PubMed with the key words adoptive transfer and onco immunology), and intravenous administration is a common route to target the lungs (PMID: 23336716). To characterize the molecular effectors (P2RX7, NLRP3, IL18 and IL1B) accounting for the antifibrotic effect of HEI3090, we purified splenocytes from donor mice and administered them intravenously to P2RX7 KO mice. As shown in Author response image 4, HEI3090 has no antifibrotic activity when splenocytes isolated from p2rx7-deficient mice are injected i.v. into P2RX7 KO mice (KO in KO). By contrast, HEI3090 has antifibrotic activity when WT splenocytes expressing P2RX7 (isolated from WT mice) are transferred into P2RX7 KO mice (WT in KO).

      This experiment provides strong evidence for the efficacy of the adoptive transfer approach in identifying the molecular effectors required to mediate the antifibrotic effect of HEI3090.

      Statistical analyses and number of replicate independent experiments

      We thank the reviewer for this comment, and we apologize for not having been sufficiently clear in our previous response, which contained the misphrased statement “the experiment was stopped when significantly statistical results were observed”, when we should have written “the experiment was stopped when each experimental group contained at least 5 mice”.

      To define the size of the experimental groups we did a pilot experiment, with 4 WT mice (i.e., 4 biological replicates) in each group (as shown aside), and a statistical forecasting based on the result of the pilot experiment (40% difference, standard error: 0.9, α risk: 0.05, power: 0.8). Since we focused on the effect of HEI3090, we based our statistical analysis on a one-way ANOVA comparing, in each experiment, the vehicle and the treated group.

      The pilot experiment and statistical forecasting indicated 4 mice per group to characterize the effect of HEI3090 on BLM-induced lung fibrosis. Each experiment was started with 6 to 8 mice per group. Being aware that 30% of mice can unexpectedly die due to BLM treatment, we duplicated the experiment, when necessary, to include at least 5 mice in each group of each experiment, meaning 5 biological replicates, knowing that 4 mice are sufficient to statistically analyze the results. In each experiment we checked for the presence of outliers, using the ROUT method, and removed the outliers when necessary.
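
      The group-size reasoning above can be sketched in Python as a standard two-group power calculation under a normal approximation. This is an illustration rather than the authors' actual computation: the response does not state the baseline score to which the 40% difference applies, so `baseline_score = 5.0` is a hypothetical value, the reported 0.9 is treated here as a per-group standard deviation, and the function names are illustrative.

```python
import math
from statistics import NormalDist


def mice_per_group(delta, sd, alpha=0.05, power=0.8):
    """Animals per group for a two-sided, two-group comparison.

    Normal approximation: n = 2 * ((z_(1-alpha/2) + z_power) * sd / delta)^2.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_power) * sd / delta) ** 2)


def mice_to_enrol(n_required, attrition):
    """Animals to start with so that n_required are expected to survive."""
    return math.ceil(n_required / (1 - attrition))


baseline_score = 5.0            # hypothetical, not taken from the response
delta = 0.40 * baseline_score   # 40% difference, as stated in the response
n = mice_per_group(delta, sd=0.9, alpha=0.05, power=0.8)
```

      Under these assumptions `n` evaluates to 4, and `mice_to_enrol(4, 0.30)` and `mice_to_enrol(5, 0.30)` give 6 and 8 animals per group. A different baseline score changes the result, and because the normal approximation ignores the extra uncertainty of estimating variance from very small groups, a t-based calculation would typically ask for slightly more animals.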

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this manuscript, Yang et al report a novel regulatory role of SIRT4 in the progression of kidney fibrosis. The authors showed that in the fibrotic kidney, SIRT4 exhibited an increased nuclear localization. Deletion of Sirt4 in renal tubule epithelium attenuated the extent of kidney fibrosis following injury, while overexpression of SIRT4 aggravates kidney fibrosis. Employing a battery of in vitro and in vivo experiments, the authors demonstrated that SIRT4 interacts with U2AF2 in the nucleus upon TGF-β1 stimulation or kidney injury and deacetylates U2AF2 at K413, resulting in elevated CCN2 expression through alternative splicing of Ccn2 gene to promote kidney fibrosis. The authors further showed that the translocation of SIRT4 is through the BAX/BAK pore complex and is dependent on the ERK1/2-mediated phosphorylation of SIRT4 at S36, and consequently the binding of SIRT4 to importin α1. This fundamental work substantially advances our understanding of the progression of kidney fibrosis and uncovers a novel SIRT4-U2AF2-CCN2 axis as a potential therapeutic target for kidney fibrosis.

      Strengths:

      Overall, this is an extensive, well-performed study. The results are convincing, and the conclusions are mostly well supported by the data. The message is interesting to a wider community working on kidney fibrosis, protein acetylation, and SIRT4 biology.

      Weaknesses:

      The manuscript could be further strengthened if the authors could address a few points listed below:

      (1) In the results part 3.9, an in vitro deacetylation assay employing recombinant SIRT4 and U2AF2 should be included to support the conclusion that SIRT4 is a deacetylase of U2AF2. Similarly, an in vitro binding assay can be included to confirm whether SIRT4 and U2AF2 interact directly.

      Thank you for your insightful comments and suggestions for improving our manuscript. We appreciate your recommendation to include an in vitro deacetylation assay employing recombinant SIRT4 and U2AF2 to support our conclusion regarding the deacetylase activity of SIRT4 on U2AF2.

      We would like to clarify that the data demonstrating the effect of SIRT4 on U2AF2 acetylation were already included in our original submission. Specifically, Figure 5C illustrates that the TGF-β1-caused decreased acetylation of U2AF2 is attenuated by Sirt4 knockdown. Conversely, overexpression of SIRT4 (SIRT4 OE) enhances the deacetylation process of U2AF2 in the presence of TGF-β1. These results support that SIRT4 is a deacetylase for U2AF2.

      Furthermore, we have already provided evidence of the direct interaction between SIRT4 and U2AF2 through a co-immunoprecipitation (CoIP) assay, which was shown in Figure 5B. This assay confirms the physical interaction between SIRT4 and U2AF2.

      We believe that the existing data sufficiently address the points raised in your comments. We are grateful for the opportunity to clarify these aspects of our study and hope that our response has adequately addressed your concerns.

      (2) In Figure 6D, the Western Blot data using U2AF2-K453Q is confusing and is quite disconnected from the rest of the data and not explained. This data can be removed or explained why U2AF2-K453Q is employed here.

      Thank you for your inquiry regarding the rationale behind the K453Q mutation in our study.

      In the study, we predicted several acetylation sites. U2AF2-K453Q is a mutation at another site that mimics a hyperacetylated state of U2AF2. Our results indicated that U2AF2 acetylation at K453 had little effect on CCN2 expression. Therefore, we found that only U2AF2 acetylation at K413, and not acetylation at other sites, can regulate CCN2 expression. To avoid ambiguity, we have removed the U2AF2-K453Q results from our revised manuscript.

      (3) Although ERK inhibitor U0126 blocked the nuclear translocation of SIRT4 in vivo, have the authors checked whether treatment with U0126 could affect the expression of kidney fibrosis markers in UUO mice?

      Thank you for your insightful question regarding the effects of the ERK inhibitor U0126 on the expression of kidney fibrosis markers in UUO mice.

      In our study, we indeed conducted in vivo experiments using U0126 and observed that it effectively ameliorated kidney fibrosis markers, which is consistent with its established role in inhibiting the fibrotic process. Specifically, U0126 treatment significantly suppressed the SIRT4-mediated renal fibrosis, which was evidenced by the reduced expression of fibrosis markers (Author response image 1).

      Author response image 1.

      U0126 treatment alleviates renal fibrosis in UUO mice.

      However, in the initial submission, we chose not to include these results in the main body of the manuscript for the following reasons: 1) highlighting the inhibitory effects of U0126 on ERK and its subsequent impact on kidney fibrosis might shift the focus of our study away from the central theme of SIRT4's role in renal fibrosis; 2) we aimed to maintain a clear narrative that emphasizes the novel findings related to SIRT4 and its regulation by the ERK pathway.

      Nonetheless, we recognize the importance of these findings and are willing to include the relevant data in the revised manuscript if it aligns with the journal's editorial direction and contributes to the broader understanding of renal fibrosis treatment strategies.

      We appreciate the opportunity to clarify this aspect of our research and are open to further suggestions from the editorial team.

      (4) The format of gene and protein abbreviations in the manuscript should be standardized.

      Thank you for your comment on the formatting of gene and protein abbreviations in our manuscript. We have carefully reviewed our formatting practices and confirmed that we have adhered to the standard conventions as follows:

      (1) Mouse gene names are presented with an initial capital letter and in italics.

      (2) Human gene names are written in uppercase and in italics.

      (3) Protein names are in all capital letters and not italicized.

      We understand the importance of consistency in scientific publications and have ensured that these standards are uniformly applied throughout the revised manuscript. Where there were discrepancies, we have corrected them to maintain clarity and professionalism.

      We appreciate the opportunity to refine our work and are committed to upholding the standards of scientific communication.

      (5) There are a few grammar issues throughout the manuscript. The English/grammar could be stronger, thus improving the overall accessibility of the science to readers.

      Thank you for bringing the grammar issues to our attention. We have made diligent efforts to revise and improve the manuscript's English and grammar throughout. We have also enlisted the support of a professional language editing service to ensure the clarity and accuracy of our scientific communication.

      We are confident that these revisions have significantly enhanced the manuscript's accessibility to a broader readership and have addressed the language concerns raised.

      We appreciate your guidance and are committed to delivering a manuscript of the highest quality.

      Reviewer #2 (Public Review):

      Summary:

      This manuscript presents a novel and significant investigation into the role of SIRT4 in CCN2 expression in response to TGF-β by modulating U2AF2-mediated alternative splicing and its impact on the development of kidney fibrosis.

      Strengths:

      The authors' main conclusion is that SIRT4 plays a role in kidney fibrosis by regulating CCN2 expression via pre-mRNA splicing. Additionally, the study reveals that SIRT4 translocates from the mitochondria to the cytoplasm through the BAX/BAK pore under TGF-β stimulation. In the cytoplasm, TGF-β activated the ERK pathway and induced the phosphorylation of SIRT4 at Ser36, further promoting its interaction with importin α1 and subsequent nuclear translocation. In the nucleus, SIRT4 was found to deacetylate U2AF2 at K413, facilitating the splicing of CCN2 pre-mRNA to promote CCN2 protein expression. Overall, the findings are fully convincing. The current study, to some extent, shows potential importance in this field.

      Weaknesses:

      (1) Exosomes containing anti-SIRT4 antibodies were found to effectively mitigate UUO-induced kidney fibrosis in mice. However, the protein loading capacity and loading methods were not mentioned.

      We appreciate your inquiry about the protein loading capacity and methods for the exosomes. As you have correctly noted, these details are indeed essential for a comprehensive understanding of our experimental approach. We have provided this information in the electronic supplementary material, specifically in Section 2.17, where we describe the methodology used for loading the anti-SIRT4 antibodies into the exosomes and the capacity at which this was achieved.

      We hope that this additional detail in the supplementary material addresses your concerns and enhances the clarity of our study's methodology.

      (2) The method section is incomplete, and many methods like cell culture, cell transfection, gene expression profiling analysis, and splicing analysis, were not introduced in detail.

      Thank you for your meticulous review and the feedback provided on our manuscript. We acknowledge your concern regarding the completeness of the methods section.

      We would like to clarify that in our initial submission, all text and figures were compiled into a single document, with the supplementary methods detailed at the end, separate from the main text methods. This format was chosen to adhere to submission guidelines that prioritize the concise presentation of core methods in the main text while providing additional details in the supplementary material for comprehensiveness.

      The detailed methodologies for cell culture, cell transfection, gene expression profiling analysis, and splicing analysis, which you inquired about, are now indeed included in the revised electronic supplementary material.

      We apologize for any misunderstanding caused by the initial structure of our submission and appreciate the opportunity to clarify the comprehensive nature of our methodological reporting.

      (3) The authors should compare their results with previous studies and mention clearly how their work is important in comparison to what has already been reported in the Discussion section.

      We appreciate the opportunity to discuss the significance of our findings in the broader context of renal fibrosis research. In response to your suggestion, we have further refined our discussion to explicitly compare our results with those of previous studies and to clearly articulate the importance of our work.

      (1) Novelty of SIRT4's Role in Renal Fibrosis: Our study introduces a novel concept in the field by demonstrating the nuclear translocation of SIRT4 as a key initiator of kidney fibrosis. This finding diverges from previous studies that have primarily focused on SIRT4's mitochondrial roles, highlighting a new dimension of SIRT4's function in renal pathophysiology.

      (2) Mechanistic Insights: We provide a detailed mechanistic pathway, from the release of SIRT4 from mitochondria through the BAX/BAK pore to its subsequent nuclear translocation and impact on U2AF2 deacetylation. This pathway has not been previously described, offering a fresh perspective on the regulation of fibrogenic gene expression.

      (3) Implications for Therapy: Our findings suggest potential therapeutic interventions targeting SIRT4 nuclear translocation, which could be a significant advancement over existing treatments that have shown limited efficacy in addressing the root causes of renal fibrosis.

      (4) Epigenetic Regulation: By elucidating the role of SIRT4 in regulating alternative splicing of CCN2 pre-mRNA through U2AF2 deacetylation, our study contributes to the growing understanding of epigenetic mechanisms in renal fibrosis, a field that has been understudied compared to genetic factors.

      (5) Differential Cellular Roles of SIRT4: Our work indicates that SIRT4 may have distinct roles in different cell types, which is a complex and nuanced aspect of CKD pathophysiology that has not been fully explored in previous research.

      (6) Integration with Previous Research: We have compared our findings with existing literature, noting where our work aligns with and diverges from previous studies. This comparison underscores the value of our research in expanding the current paradigm of renal fibrosis.

      In conclusion, we believe that our study provides critical insights into the pathogenesis of renal fibrosis and offers a potential therapeutic target. We have clarified these points in the discussion section of our manuscript to ensure that the significance of our work is clearly communicated to the readers.

      Reviewer #3 (Public Review):

      Summary:

      Yang et al reported in this paper that TGF-beta induces SIRT4 activation, that TGF-beta-activated SIRT4 then modulates U2AF2 alternative splicing, and that U2AF2 in turn drives CCN2 expression. The mechanism is described as follows: mitochondrial SIRT4 is transported into the cytoplasm in response to TGF-β stimulation, is phosphorylated by ERK in the cytoplasm, and then undergoes nuclear translocation by forming a complex with importin α1. In the nucleus, SIRT4 can then deacetylate U2AF2 at K413 to facilitate the splicing of CCN2 pre-mRNA to promote CCN2 protein expression. Moreover, they used exosomes to deliver Sirt4 antibodies to mitigate renal fibrosis in a mouse model. TGF-beta has been widely reported for its role in fibrosis induction.

      Strengths:

      TGF-beta induction of SIRT4 translocation from mitochondria to nuclei for epigenetics or gene regulation remains largely unknown. The findings presented here that SIRT4 is involved in U2AF2 deacetylation and CCN2 expression are interesting.

      Weaknesses:

      SIRT4 plays a critical role in mitochondria involved in respiratory chain reaction. This role of SIRT4 is critically involved in many cell functions. It is hard to rule out such a mitochondrial activity of SIRT4 in renal fibrosis. Moreover, the major concern is what kind of message mitochondrial SIRT4 proteins receive from TGF-beta. Although nuclear SIRT4 is increased in response to TNF treatment, it is likely de novo synthesized SIRT4 proteins can also undergo nuclear translocation upon cytokine stimulation. TGF-beta-induced mitochondrial calcium uptake and acetyl-CoA should be evaluated for calcium and acetyl-CoA may contribute to the gene expression regulation in nuclei.

      Recommendations for the authors:

      Reviewer #3 (Recommendations For The Authors):

      (1) SIRT4 overall is a mitochondrial enzyme that indeed can undergo shuttling between mitochondria and cytoplasm. Renal fibrosis is a complex process; SIRT4 deacetylates U2AF2 at K413.

      Thank you for your comment highlighting the known mitochondrial localization of SIRT4 and its role in renal fibrosis.

      We concur with the literature that SIRT4 is predominantly a mitochondrial enzyme. However, our study expands upon this understanding by demonstrating a novel shuttling mechanism of SIRT4 between mitochondria and the nucleus in the context of renal fibrosis. Specifically, we observed that under conditions of obstructive nephropathy and renal ischemia reperfusion injury, SIRT4 significantly accumulates in the nucleus, which is a critical event in the fibrotic response.

      Our findings reveal that upon TGF-β stimulation, a known inducer of fibrosis, SIRT4 is released from the mitochondria through the BAX/BAK pore and subsequently translocates to the nucleus. This translocation is mediated by the ERK1/2-dependent phosphorylation of SIRT4 at serine 36, which enhances its interaction with importin α1, a key component in nuclear import processes.

      Once in the nucleus, SIRT4 exerts its effects on the alternative splicing of CCN2 pre-mRNA by deacetylating U2AF2 at lysine 413. This deacetylation event promotes the formation of the U2 small nuclear ribonucleoprotein (U2 snRNP) and facilitates the splicing of CCN2 pre-mRNA, leading to increased expression of the profibrotic protein CCN2.

      Our study, therefore, not only confirms the mitochondrial association of SIRT4 but also uncovers its nuclear function in the regulation of gene expression during renal fibrosis. These findings underscore the complexity of SIRT4's role in cellular processes and its potential as a therapeutic target for fibrotic diseases.

      (2) Figure 2 and Figure 3 should be combined.

      Thank you for your suggestion to combine Figures 2 and 3 for potential improvement in presentation.

      After careful consideration, we have found that merging these figures is not feasible due to space constraints on a standard A4 page, which is necessary to maintain the clarity and detail of the data presented in both figures. Each figure contains complex data that, when combined, would compromise the readability and the integrity of the individual elements.

      We believe that the current presentation of Figures 2 and 3 provides a clear and detailed visualization of the data, which is essential for the reader's understanding of our study's findings.

      (3) In Figure 4G, the mass spectrum of U2AF2 acetylation on K413 should be included rather than the alignment among species. Moreover, endogenous HAT1 on endogenous U2AF2 rather than exogenous FLAG-U2AF2 should be examined.

      Thank you for your thoughtful comments and for the suggestion to include the mass spectrum of U2AF2 acetylation on K413 in Figure 4G.

      We appreciate the value that the mass spectrometry data would add to our study, providing a direct and definitive assessment of the acetylation status at this specific residue. However, we regret to inform you that our current facilities do not have access to the necessary mass spectrometry equipment to perform these analyses.

      While we are unable to include this data in the present manuscript, we concur with the importance of such evidence and plan to undertake these studies in the future. We are in the process of establishing collaborations with laboratories that have the required facilities to perform mass spectrometry. Our intention is to incorporate these data into a follow-up study, which will further validate and expand upon the findings presented in this manuscript.

      We believe that our current findings, although lacking the mass spectrometry confirmation, still provide valuable insights into the role of U2AF2 acetylation in [insert relevant biological process]. We have taken care to present our data rigorously and transparently, and we are committed to pursuing the highest standards of experimental validation in our future work.

      We hope you will consider the merits of our study in the context of the current limitations and appreciate the opportunity to clarify our position.

      Furthermore, regarding the examination of endogenous HAT1's effect on endogenous U2AF2 acetylation levels, we have conducted the necessary experiments. Our results demonstrate that overexpression of HAT1 leads to a significant increase in the acetylation of endogenous U2AF2 (Figure R2). This new data set has been added to the revised manuscript and supports the role of HAT1 in the regulation of U2AF2 acetylation.

      We believe that these revisions address your concerns and provide a more comprehensive understanding of the molecular mechanisms underlying the regulation of U2AF2 acetylation.

      We appreciate the opportunity to improve our manuscript based on your constructive feedback and hope that our revisions meet with your satisfaction.

      Author response image 2.

      HAT1 OE reduces the acetylation of endogenous U2AF2

      (4) Figure 6F. Does portien mean protein?

      Thank you for your careful review and insightful comments on our manuscript. You are correct in pointing out the error regarding the term "portien" in Figure 6F. It was indeed a typographical oversight on our part, and we apologize for any confusion this may have caused.

      We have made the necessary correction to ensure that "protein" is accurately used in place of "portien" in Figure 6F. We appreciate the opportunity to enhance the clarity and accuracy of our presentation.

      (5) The authors should pay attention to their writing. There are many typos and other issues with the use of the English language and grammar.

      Thank you for bringing the grammar issues to our attention. We have made diligent efforts to revise and improve the manuscript's English and grammar throughout. We have also enlisted the support of a professional language editing service to ensure the clarity and accuracy of our scientific communication.

      We are confident that these revisions have significantly enhanced the manuscript's accessibility to a broader readership and have addressed the language concerns raised.

      We appreciate your guidance and are committed to delivering a manuscript of the highest quality.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1

      Reviewer #1’s main concerns revolved around the evidential strength of the study’s conclusion that age-specific effects of birth weight on brain structure are more localized and less consistent across cohorts than age-uniform, stable effects. Specifically, the reviewer points out the evidence (or lack thereof) for age-specific effects. We have restated the reviewer’s concerns as a bullet-point summary for a clearer response (please see the reviewer’s original comments in the annexed document). We thank the reviewer for these comments.

      Concern #1: No direct statistical comparisons are conducted between samples (beyond the spin-tests).

      In the initial version of the manuscript, the spin-tests represented a key test since they compared the spatial distribution of birth weight effects across samples. In the revised manuscript, we additionally perform a replicability analysis across samples both for birth weight effects on brain characteristics and on brain change in a similar fashion as described for the within-sample analysis. The results of these analyses provide complementary evidence of robust associations of birth weight effects on cortical characteristics (for area and volume, less so for thickness) and of unreliable associations of birth weight on cortical change. These analyses are briefly mentioned in the main document and fully described as supplementary information. Briefly, the effects of birth weight on cortical area and cortical volume showed high (exploratory and confirmatory) replicability while replicability was almost nonexistent for the effects of birth weight on cortical change. See below, under Reviewer #1, concern #2, for a description of the changes in the revised manuscript.

      Concern #2: The differential composition of samples in terms of age distribution leads to the possibility that lack of results is explained by methodological differences.

      The revised version of the manuscript now provides a within-sample replicability analysis of the birth weight effects on cortical change. This analysis addresses the reviewer’s concern, as the lack of replicability in this analysis cannot be attributed to sample or methodological differences. We thank the reviewer for suggesting this analysis, which provides further quantification of the (lack of) robustness of the birth weight effects on cortical change. See below for changes in the revised version of the manuscript concerning the additional replicability analyses, which were carried out in response to reviewer #1, concerns #1 and #2.

      pp. 12-3. “Additionally, we performed replicability analyses both across and within samples to further investigate the robustness of the effects of birth weight on cortical characteristics and cortical change. Split-half analyses within datasets were performed, to investigate the replicability of significant effects 36,37 of BW on cortical characteristics within samples (refer to Figure 1). These analyses further confirmed that the significant effects were largely replicable for volume and area, but not for thickness (see Supplementary Figure 11). Split-half analyses of BW on cortical change (refer to Figure 2) showed, in general, a very low degree of replicability on the three different cortical measures. See Supplementary Table 3. Replicability across datasets showed a similar pattern, that is, replicability was high for the effect of birth weight on cortical characteristics but very low for the effects of cortical change. See Supplementary Table 4 for stats. See Supplementary statistical methods for a full description of the analyses. These analyses provide complementary evidence of robust associations of BW with cortical area and volume – but not cortical change - across and within samples.”

      p. 41. “For each dataset and cortical measure, we assessed the effects of birth weight on cortical structure and cortical change (…)”

      p. 42. “Across samples replicability was performed as described in the within-sample replicability analysis (i.e., we assessed the exploratory and confirmatory replicability) except that split-half was not performed - the three datasets were compared with each other - and the analyses were performed in the original fsaverage space.”

      pp. 54-55. “The exploratory replicability of birth weight on cortical change was negligible across datasets and measures [.00 (.00), .00 (.00), .00 (.00) for area, .02 (.09), .00 (.02), .01 (.03) for volume, and .01 (.05), .01 (.14), .00 (.01) for thickness] while confirmatory replicability was generally poor, except for the ABCD dataset [.02 (.05), .68 (.35), .00 (.00) for area, .08 (.14), .56 (.25), .00 (.02) for volume, and .37 (.26), .60 (.27), .01 (.03) for thickness] (see Supplementary Table 3).

      These results are not fully comparable to other studies assessing the replicability of brain phenotype associations due to analytical differences (e.g. sample size, multiple-comparison correction method)20,36, yet clearly show that the rate of replicability of BW associations with cortical area and volume are comparable to benchmark brain-phenotype associations such as body-mass index and age68. Lower levels of replicability in the LCBC subsample are likely attributable to higher sample variability (e.g. increased age span). Kinship may lead to inflated patterns of replicability within the ABCD cohort. Confirmatory replicability is, also, to some degree, affected by sample size, and thus the estimates of confirmatory replicability may be somewhat inflated in the ABCD dataset.

      Finally, the degree of across-sample replicability was high for the effects of birth weight on cortical area and volume (average confirmatory replicability = .96 and .93), low for thickness (.27), and negligible for the effects of birth weight on cortical change (.03, .06, and .06). See further information in Supplementary Table 4.”
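      The split-half logic described in these excerpts can be sketched as follows. This is an illustration only, not the authors' pipeline: the manuscript's exact definitions of exploratory and confirmatory replicability are given in its supplementary methods and are not reproduced here, so the function names, the same-sign requirement, and the two thresholds are assumptions.

```python
import random


def split_half(ids, seed=0):
    """Randomly split participant IDs into two halves of (nearly) equal size."""
    rng = random.Random(seed)
    shuffled = list(ids)
    rng.shuffle(shuffled)
    mid = len(shuffled) // 2
    return shuffled[:mid], shuffled[mid:]


def replication_rate(discovery, replication, alpha_discovery, alpha_replication):
    """Share of discovery-significant units that replicate with the same sign.

    Each input is a list of (beta, p_value) pairs, one per vertex or ROI,
    in the same order. Returns None when nothing is significant in discovery.
    The thresholds are placeholders (assumption), not the manuscript's values.
    """
    hits = [i for i, (_, p) in enumerate(discovery) if p < alpha_discovery]
    if not hits:
        return None
    replicated = 0
    for i in hits:
        beta_d, _ = discovery[i]
        beta_r, p_r = replication[i]
        same_sign = (beta_d > 0) == (beta_r > 0)
        if same_sign and p_r < alpha_replication:
            replicated += 1
    return replicated / len(hits)
```

      In such a scheme, a stricter replication threshold would correspond to one flavour of replicability and a more lenient one to the other; repeating the split over many random seeds and averaging would give a mean and spread like the "value (SD)" pairs quoted above.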

      Concern #3: Some datasets have a narrow age range precluding the detection of age-related effects.

      We do not believe concern #3 is a major problem since time*birth weight refers to a within-subject contrast, i.e., a longitudinal-only contrast. Birth weight, even when self-reported, is a highly reliable measure and the sample sizes are relatively large (n = 635, 1759, and 3324 unique individuals). Note that the smallest dataset does have longer follow-up times and more observations per participant, increasing the reliability of estimates of individual change. Structural MRI measures have very high reliability. Clearly, longitudinal brain change is less reliable, yet the present sample size and the high reliability of birth weight should provide enough statistical power to capture even small time-varying effects of birth weight on brain structure. Note as well that in each model age is treated as a covariate. Rather, the consistency of time*birth weight (that is, the effects of birth weight on cortical change) is assessed with split-half replications within and across samples. In this methodological pipeline, a narrow age range for a given dataset, if anything, may constitute an advantage. We have clarified the statistical model (see changes in the revised manuscript, referred to in response to reviewer #1, concern #5).

      Concern #4: The modeling strategy does not allow for non-linear interaction between age and BW, suggesting the use of spline models instead in a mega-analytical fashion.

      Indeed, we agree that some - if not most - brain structures follow non-linear trajectories throughout life. In the present study, age regressors are used only for accounting for variance in the data rather than capturing any effect of interest. Rather, it is the time*birth weight regressor that captures age-varying changes in brain structure. Time reflects within-subject follow-up time. We believe non-linear modeling of age will only account for additional variance (compared to linear models) in the LCBC dataset given the dataset’s wider age range, while it will not have any consequential effect in the ABCD and UKB datasets (as predicted in the provisional response). In any case, we recognize it as a valid concern. Consequently, we have rerun the main models in an ROI-based fashion using or not using spline models to fit age. Specifically, we have fitted the models in each of Desikan-Killiany’s ROIs using generalized additive mixed models (GAMM with age as a smooth term) or linear mixed models (LME with age as a linear regressor). The results are shown in Supplementary Figures 13 and 14. The Beta regressors are nearly identical. As expected, the differences are noticeable in the LCBC dataset while the effect of using - or not using- splines to fit age is almost null in the other two datasets. See also FDR-corrected maps below for both birth weight effects on brain structure and brain change (we opted to show Beta-maps as supplementary material as the multiple-comparisons correction in the ROI-based analysis is not fully comparable with the one used in the vertex-wise approach).

      p. 9: “Both birth weight effects on cortical characteristics and cortical change were rerun (ROIwise) using spline models that accounted for possible non-linear effects of age on cortical structure. The results were comparable to those reported above in Figures 1 and 2. See Supplementary Figures 13 and 14 for birth weight effects on cortical characteristics and cortical change, respectively.”

      Caption to Supplementary Figure 13. “Comparison between spline (GAMM) and linear (LME) models on the effect of birth weight on cortical characteristics. Age was fitted either as a smoothing spline using generalized additive mixed models (GAMM, mgcv r-package) or a linear regressor with a linear mixed models (LME, lmer r-package) framework. The analyses were performed ROI-wise using the Desikan-Killiany atlas. Significance was considered at a FDR corrected threshold of p < 0.04. All the remaining parameters were comparable to the main analyses shown in Figure 1. The viridis-yellow scale represents the lower-higher Beta regressors. Red contour displays regions showing significant effects of birth weight. Note the high correspondence with both fitting models. Differences are only noticeable in the LCBC sample due to the datasets’ wider age range (i.e., lifespan dataset).”

      Caption to Supplementary Figure 14. “Comparison between spline (GAMM) and linear (LME) models on the effect of birth weight on cortical change. Age was fitted either as a smoothing spline using generalized additive mixed models (GAMM, mgcv r-package) or a linear regressor with a linear mixed models (LME, lmer r-package) framework. The analyses were performed on ROI-based using the Desikan-Killiany atlas. Significance was considered at a FDR corrected threshold of p < 0.04. All the remaining parameters were comparable to the main analyses shown in Figure 1. The viridis-yellow scale represents the lower-higher Beta regressors. Red contour displays regions showing significant effects of birth weight. Note the high correspondence with both fitting models. Differences are only noticeable in the LCBC sample due to the datasets’ wider age range (i.e., lifespan dataset).”

      The figures below show the birth weight effects on brain characteristics (above) and change (below) using a GAMM or an LME approach; that is, using age as a smooth term or as a regressor. FDR-corrected p < 0.05 values are shown in a signed logarithmic scale. Red-yellow values represent positive associations between birth weight and brain while blue-lightblue values represent negative associations. The results are qualitatively comparable and quantitative differences exist only in the LCBC dataset. Please see Supplementary Figures 13 and 14 in the revised manuscript.
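      For readers unfamiliar with the display convention mentioned above, the following minimal sketch shows one way FDR-adjusted p-values and a signed logarithmic scale can be computed per ROI. It assumes the Benjamini-Hochberg procedure, which the response does not name, and the function names are hypothetical; it is not the authors' code.

```python
import math


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def signed_log10(p, beta):
    """-log10(p) carrying the sign of the effect estimate (requires p > 0)."""
    return math.copysign(-math.log10(p), beta)
```

      With this convention, a positive association with an adjusted p of 0.01 maps to +2 and a negative one to -2, which is what allows a single diverging colour scale to encode both direction and significance.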

      Author response image 1.

      Concern #5: Greater clarity regarding the statistical models and the provision of effect-size maps.

      The revised manuscript provides additional information regarding the statistical model, especially in the results section, to avoid misunderstanding (see below for examples of clarifications in the revised manuscript). We now provide Beta-maps, F-maps, unthresholded p-value maps, and degrees of freedom for the main univariate analyses. That is, we provide this information for both the whole-sample and the twin analyses, which correspond to Figures 1, 2, 4, and 5. We opted not to compute effect-size estimates (e.g. partial eta-squared, Cohen’s d) due to the ambiguous interpretation of these maps in the context of linear mixed models.

      p.8. “To test the effect of birth weight on cortical change we rerun the analyses with BW x time and age x time interactions. Note BW x time (i.e., within-subject follow-up time) represents the contrasts of interest while age – and age interactions – are used to account for differences in age across individuals.”

      p.11. “In contrast, the spatial correlation of the maps capturing BW-associated cortical change (i.e., BW x time contrast) …”

      p. 12. “Additionally, we performed replicability analysis both across and within samples to further investigate the robustness of the effects of birth weight on cortical characteristics and cortical change.”

      p. 14: “BW discordance analyses on twins specifically were run as described for the main analyses above, with the exception that twin scans were reconstructed using FS v6.0.1. for ABCD and the addition of the twin’s mean birth weight as a covariate.”

      p. 31. “Group-level unthresholded p-maps, F-maps, Beta-maps, and degrees of freedom for the univariate analyses accompany this manuscript as additional material.”
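      The interaction model described in the p. 8 excerpt above can be written schematically as below. This is a sketch under stated assumptions rather than the authors' full specification: $y_{ij}$ is a cortical measure for participant $i$ at timepoint $j$, $\mathrm{age}_i$ is assumed to be age at baseline, $b_i$ is an assumed participant-level random intercept, and any further covariates used in the manuscript are omitted.

```latex
y_{ij} = \beta_0
       + \beta_1\,\mathrm{BW}_i
       + \beta_2\,\mathrm{age}_i
       + \beta_3\,\mathrm{time}_{ij}
       + \beta_4\,(\mathrm{BW}_i \times \mathrm{time}_{ij})
       + \beta_5\,(\mathrm{age}_i \times \mathrm{time}_{ij})
       + b_i + \varepsilon_{ij}
```

      In this notation $\beta_4$ is the contrast of interest for cortical change (BW x time), while the age terms $\beta_2$ and $\beta_5$ only absorb between-participant differences in age, as the responses to concerns #3 and #4 state.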

    1. Author Response

      The following is the authors’ response to the original reviews.

      necessary clarifications on some of the reviewers' suggestions.

      Reviewer #1 (Public Review):

      Weaknesses:

      • This is a pilot study with only 24 cases and 24 controls. Because the human microbiota entails individual variability, this work should be confirmed with a higher sample size to achieve enough statistical power.

      Thank you for your suggestion. Unlike 16S rRNA data, which are highly sparse, metagenomic data have a higher data density. Based on the experience of previous research, the sample size used here broadly meets the requirements. However, your suggestion is very valuable: increasing the sample size would allow more in-depth analysis. Owing to practical constraints, it is difficult for us to increase the sample size in this study.

      • The authors do not report here the use of blank controls. The use of this type of control is important to "subtract" the potential background from plasticware, buffer or reagents from the real signal. Lack of controls may lead to microbiome artefacts in the results. This can be seen in the results presented where the authors report some bacterial contaminants (Agrobacterium tumefaciens, Aequorivita lutea, Chitinophagaceae, Marinobacter vinifirmus, etc) as part of the most common bacteria found in cervical samples.

      Thank you for your suggestion. Applying blank controls in low-biomass areas can effectively avoid contamination caused by the environment or kits. This opinion is consistent with that published by Raphael Eisenhofer et al. in Trends in Microbiology. When designing this study, we considered that it described a biomass-rich site, and that the abundance of dominant species was much higher than that of the possible 'kitome', so we did not set a blank control. In addition, the main object of discussion in this study is high-abundance species, and the species filtering threshold for some analyses was raised to 50%. Therefore, we believe that the absence of the blank control has little effect on the conclusions of this study. However, your opinion is spot on. Failure to set up a negative control will affect our future research on rare species. We will add a description in the Limitations section of the Discussion.

      • Samples used for this study were collected from the cervix. Why not collect samples from the uterine cavity and isthmocele fluid (for cases)? In their previous paper using samples from the same research protocol (IRB no. 2019ZSLYEC-005S) they used endometrial tissue from the patients, so access to the uterine cavity was guaranteed.

      Thank you for your suggestion. In Author response image 1 we show the approximate location of our cervical swab sampling. There are two main reasons for choosing cervical swabs:

      1) The adsorption of swabs allows us to obtain sufficient nucleic acid for high-depth sequencing, while the isthmocele fluid varies greatly among patients, which will introduce unnecessary batch effects.

      2) Since the female reproductive tract is a continuous whole, our sampling location is close to the lesion in the cervix, which can be effectively studied. On the other hand, the microbial biomass of the endometrium is probably two orders of magnitude lower than that of the cervix, and it is difficult to avoid contamination of the lower genital tract when sampling.

      Based on the above reasons, we selected cervical swabs for our microbial data.

      Author response image 1.

      • Through the use of shotgun genomics, results from all the genomes of the organisms present in the sample are obtained. However, the authors have only used the metagenomic data to infer the taxonomical annotation of fungi and bacteria.

      Thank you for your suggestion. The advantage of metagenomics is that it can obtain all the nucleic acid information of the entire environment. However, for the female reproductive tract, the databases for viruses and archaea are still immature, so to ensure the accuracy of the results we did not analyse them. We look forward to the emergence of mature databases in the future.

      Reviewer #1 (Recommendations For The Authors):

      • It would be interesting to use another series of functional data coming from the metagenomic analyses (not only taxonomic) to expand and reinforce the results presented.

      Thank you for your suggestion. We have dissected the functional data of microbiota in the article.

      • The authors have previously published the 16S rRNA sequencing and transcriptomic analysis of the same set of patients. It would be nice to see the integration of all the datasets produced.

      Thank you for your suggestion. There is no doubt that integrating all the data would yield results with more dimensions. In our previous study we focused on microbe-host interactions. However, there is an unanswered question: What are the characteristics of the regulatory network within microbiota? Therefore, we answered this question in this study, exploring the complex interaction processes within microbial communities. In addition to direct effects, interactions between microbiota may also be mediated by specific metabolites. Therefore, we introduced the analysis of the untargeted metabolome. However, 16S rRNA can only provide bacterial information, so we did not integrate the data. In addition, the transcriptome provides host information and is not the focus of this study. However, your suggestion is very valuable, and we will integrate all the data in the next study on the exploration of treatment methods.

      Reviewer #2 (Public Review):

      Weaknesses: Methodological descriptions are minimal.

      Some examples:

      • The CON group (line 147) has not been defined. I suppose it is the control group.

      • There are no statistics related to shotgun sequencing. How many reads have been sequenced? How many have been removed from the host? How many are left to study bacteria and fungi? Are these reads proportional among the 48 samples? If not, what method has been used to normalise the data?

      • ggClusterNet has numerous algorithms to better display the modules of the microbiome network. Which one has been used?

      Thank you for your suggestion. We have added details to the method.

      Reviewer #2 (Recommendations For The Authors):

      I think the author should take into account the points described in the "Weaknesses" section. The lack of detail extends to almost all the analyses that have been included in the manuscript. Although the results are sound, I think it is important to understand what has been analysed and how it has been analysed. It is important that all work is reproducible and this requires vital information.

      For example, what parameters have been used for bowtie2? Has a local analysis been used, or end-to-end? Some parameters like --very-sensitive are important for this kind of analysis. You can also use specific programs like kneaddata.
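      To make the question concrete, the sketch below assembles (without running) a bowtie2 command line for paired-end host-read removal, exposing the local versus end-to-end choice. It is illustrative only: the function name, file names, and thread count are hypothetical, and nothing here reflects the settings the authors actually used.

```python
def bowtie2_host_removal_cmd(index_prefix, reads_1, reads_2, out_prefix,
                             threads=4, local=False):
    """Build (but do not run) a bowtie2 command for host-read removal.

    Read pairs that fail to align concordantly to the host index are written
    to gzipped FASTQ files; bowtie2 replaces '%' in the path with 1 or 2.
    """
    cmd = ["bowtie2", "-p", str(threads), "-x", index_prefix,
           "-1", reads_1, "-2", reads_2]
    if local:
        # Local mode soft-clips read ends; use the matching preset.
        cmd += ["--local", "--very-sensitive-local"]
    else:
        # End-to-end mode requires the whole read to align.
        cmd += ["--end-to-end", "--very-sensitive"]
    cmd += ["--un-conc-gz", out_prefix + "_nonhost_R%.fastq.gz",
            "-S", "/dev/null"]  # discard the host alignments themselves
    return cmd
```

      Reporting the resulting argument list (or the equivalent kneaddata call) in the Methods is one way to make this step reproducible.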

      The Raw data preprocessing section should be more detailed.

      The same applies to the "Taxa and functional annotation" section: how have the data been normalised? Has any zero-inflated Gamma probabilistic model been taken into account? How were the zeros (no species detected) in the shallow samples treated?
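      As an example of the kind of detail being requested, the sketch below shows total-sum scaling to relative abundance and a prevalence filter, two common choices for shotgun taxonomic profiles. It is not a description of the authors' method; the taxon labels and counts in the usage note are toy values, and the interpretation of a filtering threshold as a prevalence cut-off is an assumption.

```python
def relative_abundance(counts):
    """Total-sum scaling: divide each taxon's read count by the sample total."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no classified reads")
    return {taxon: n / total for taxon, n in counts.items()}


def prevalence_filter(samples, min_prevalence):
    """Return taxa detected (count > 0) in at least min_prevalence of samples.

    samples is a list of dicts mapping taxon name to read count; a taxon
    missing from a sample is treated as a zero.
    """
    taxa = {t for s in samples for t in s}
    keep = set()
    for t in taxa:
        present = sum(1 for s in samples if s.get(t, 0) > 0)
        if present / len(samples) >= min_prevalence:
            keep.add(t)
    return keep
```

      For two toy samples `{"taxon_A": 90, "taxon_B": 10}` and `{"taxon_A": 50}`, a 0.5 prevalence threshold keeps both taxa, whereas a threshold of 1.0 keeps only `taxon_A`. Zeros are left as zeros here; a zero-inflated model would be a separate, explicit modelling decision.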

      Which algorithms have been used for LEfSe? Kruskal-Wallis -> (Wilcoxon) -> LDA?

      Which p-value has been used as the cut-off? Has this p-value been corrected for multiple testing?

      • Information on ggClusterNet should be included and explained.

      The first section of the results and Table 1 should be in the Materials and Methods.

      Thank you for your suggestion. We have added details to the method.

      In the fungi section, it is mentioned that 431 species have been found. They should be included in a supplementary table.

      How many bacteria were found? Please include them also in a supplementary table.

      Thank you for your suggestion. We have added the corresponding table.

      Reviewer #3 (Public Review):

      Major

      1. Smoke or drink conditions, as well as diseases like hypertension and diabetes are important factors that could influence the metabolism of the host, thus the authors should add them in the exclusion criteria in the Methods.

      Thanks to reviewer #3 for professional comments. We have made corresponding additions in the method section. We also followed this standard when recruiting subjects.

      2. The sample size of this study is not large enough to draw a convincing conclusion.

      Thank you for your suggestion. Unlike 16S rRNA data, which are highly sparse, metagenomic data have a higher data density. Based on the experience of previous research, the sample size used here broadly meets the requirements. However, your suggestion is very valuable: increasing the sample size would allow more in-depth analysis. Owing to practical constraints, it is difficult for us to increase the sample size in this study.

      Reviewer #3 (Recommendations For The Authors):

      Please recruit more samples.

      In addition, there are many formatting and grammatical mistakes in the manuscript.

      Minor

      1. In Line 24-25 of the "Composition and characteristics of fungal communities", the format of "Goyaglycoside A and Janthitrem E." shouldn't be italic.

      2. In Line 126 of the "Metabolite detection using liquid chromatography (LC) and mass spectrometry (MS)", the "10 ul" should be changed to "Ten ul". Beginning with arabic numerals in a sentence should be avoided.

      3. In Line 170 of the "Composition and characteristics of bacterial communities", the "162 differential species" should be "One hundred and sixty-two differential species".

      4. In Line 187 of the "Composition and characteristics of fungal communities", the "42 differential" should be "Forty-two differential".

      Thanks to reviewer #3 for professional comments. We have completely revised the language of the article.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      The authors focused on genetic variability in relation to insulin resistance. They used genetically different lines of mice and exposed them to the same diet. They found that genetic predisposition impacts the overall outcome of metabolic disturbances. This work provides a fundamental novel view on the role of genetics and insulin resistance.

      Reviewer #2 (Public Review):

      Summary:

      In the present study, van Gerwen et al. perform deep phosphoproteomics on muscle from saline or insulin-injected mice from 5 distinct strains fed a chow or HF/HS diet. The authors follow these data by defining a variety of intriguing genetic, dietary, or gene-by-diet phosphosites that respond to insulin, accomplished through the application of correlation analyses, linear mixed models, and a module-based approach (WGCNA). These findings are supported by validation experiments by intersecting results with a previous profile of insulin-responsive sites (Humphrey et al, 2013) and, importantly, mechanistic validation of Pfkfb3, where overexpression in L6 myotubes was sufficient to alter fatty acid-induced impairments in insulin-stimulated glucose uptake. To my knowledge, this resource provides the most comprehensive quantification of muscle phospho-proteins which occur as a result of diet in strains of mice where genetic and dietary effects can be quantifiably attributed in an accurate manner. Utilization of this resource is strongly supported by the analyses provided highlighting the complexity of insulin signaling in muscle, exemplified by contrasts to the "classically-used" C57BL6/J strain. As it stands, I view this exceptional resource as comprehensive with compelling strength of evidence behind the mechanism explored. Therefore, most of my comments stem from curiosity about pathways within this resource, many of which are likely well beyond the scope of incorporation in the current manuscript. These include the integration of previous studies investigating these strains for changes in transcriptional or proteomic profiles and intersections with available human phospho-protein data, many of which have been generated by this group.

      Strengths:

      Generation of a novel resource to explore genetic and dietary interactions influencing the phospho-proteome in muscle. This is accompanied by the elegant application of in silico tools to highlight the utility.

      Weaknesses:

      Some specific aspects of integration with other data among the same fixed strains could be strengthened and/or discussed.

      Reviewer #3 (Public Review):

      Summary:

      The authors aimed to investigate how genetic and environmental factors influence the muscle insulin signaling network and its impact on metabolism. They utilized mass spectrometry-based phosphoproteomics to quantify phosphosites in the skeletal muscle of genetically distinct mouse strains in different dietary environments, with and without insulin stimulation. The results showed that genetic background and diet both affected insulin signaling, with almost half of the insulin-regulated phosphoproteome being modified by genetic background on an ordinary diet, and high-fat high-sugar feeding affecting insulin signaling in a strain-dependent manner.

      Strengths:

      The study uses state-of-the-art phosphoproteomics workflow allowing quantification of a large number of phosphosites in skeletal muscle, providing a comprehensive view of the muscle insulin signaling network. The study examined five genetically distinct mouse strains in two dietary environments, allowing for the investigation of the impact of genetic and environmental factors on insulin signaling. The identification of coregulated subnetworks within the insulin signaling pathway expanded our understanding of its organization and provided insights into potential regulatory mechanisms. The study associated diverse signaling responses with insulin-stimulated glucose uptake, uncovering regulators of muscle insulin responsiveness.

      Weaknesses:

      Different mouse strains have huge differences in body weight on normal and high-fat high-sugar diets, which makes comparison between the models challenging. The proteome of muscle across different strains is bound to be different but the changes in protein abundance on phosphosite changes were not assessed. Authors do get around this by calculating 'insulin response' because short insulin treatment should not affect protein abundance. The limitations acknowledged by the authors, such as the need for larger cohorts and the inclusion of female mice, suggest that further research is needed to validate and expand upon the findings.

      Reviewer #1 (Recommendations For The Authors):

      I would suggest further discussion of the potential differences between males and females of the various strains.

      In the revised manuscript we have included a more detailed discussion of the potential differences between male and female mice in the "Limitations of this study" section on lines 455-459. In particular, a landmark study of HFD-fed inbred mouse strains found that insulin sensitivity, as inferred from the proxy HOMA-IR, was affected by interactions between sex and strain despite generally being greater in female mice (10.1016/j.cmet.2015.01.002). Furthermore, a recent phosphoproteomics study of human induced pluripotent stem-cell derived myoblasts identified groups of insulin-regulated phosphosites affected by donor sex, and by interactions between sex and donor insulin sensitivity (10.1172/JCI151818). Based on these results, we anticipate that both soleus insulin sensitivity and phosphoproteomic insulin responses would differ between male and female mice through interactions with strain and diet, adding yet another layer of complexity to what we observed in this study. This will be an important avenue for future research to explore.

      Reviewer #2 (Recommendations For The Authors):

      The following are comments to authors - many, if not all are suggestions for extended discussion and beyond the scope of the current elegant study.

      In the discussion section (line 428) the authors make a key point in that the genetic, dietary, and interacting patterns of variation of phosphosites could be due to changes in total protein and/or transcript levels across strains. For example, given that increased expression of Pfkfb3 was sufficient to impact glucose uptake, the transcript levels of the gene might also show a similar correlation with insulin responsiveness as in Fig 6b. Undoubtedly, phosphoproteomics analyses will provide unique information on top of more classical omics layers, and uncovering it would be an important future direction. Therefore, I would suggest adding to the discussion some guidance on performing similar applications to datasets from at least some of the strains used where RNA-seq and proteomics are available.

      We thank the reviewer for this suggestion. To address this, we mined recently published total proteomics data collected from soleus muscles of seven CHOW or HFD-fed inbred mouse strains, three of which were in common with our study (C57Bl6J, BXH9, BXD34; 10.1016/j.cmet.2021.12.013). In this study ex vivo soleus glucose uptake was measured and correlation analysis was performed, so we directly extracted the resulting glucose uptake-protein associations and compared them to the glucose uptake-phosphoprotein associations identified in our study. Indeed, we found that only a minority of proteins correlated at both the phosphosite and total protein levels, highlighting the utility of phosphoproteomics to provide orthogonal information to more classical omics layers. We have included this analysis in lines 303-311.

      Relevant to this, the authors might want to consider depositing scripts to analyze some aspects of the data (ex. WGCNA on P-protein data or insulin-regulated anova) in a repository such as github so that these can be applied easily to other datasets.

      We refer the reviewer to the section "Code availability" on lines 511-513, where we deposited all code used to analyse the data on github.

      In contrast to the points above, I feel that the short time-course of insulin stimulation was one important aspect of the experimental design that was not emphasized enough as a strength. It was mentioned as a limitation in that other time points could provide more info, yes. But given that the total abundance of proteins and transcripts likely doesn't shift tremendously in this time frame, this provides an important appeal to the analysis of phosphoproteomic data. I would suggest highlighting the insulin-stimulated response analysis here as something that leverages the unique nature of phosphoproteomics.

      We are grateful for the reviewer's positivity regarding this aspect of our experimental design. We have reiterated the value of the 10min insulin stimulation - that it temporally segregates phosphoproteomic and total proteomic changes - in the "Limitations of this study" section on lines 477-481.

      While I recognize the WGCNA analysis as an instrumental way to highlight global patterns of phosphopeptide abundance co-regulation, the analysis currently seems somewhat underdeveloped. For example, Fig 5f-h shows a lot of overlap between kinase substrates and pathways among modules. Clearly, there are informative differences based on the intersection with Humphrey et al, 2013 and the correlation with Pfkfb3. To highlight the specific membership of these modules, most people rank-order module members by correlation with eigen-gene (or P-peptide) and then perform pathway enrichments on these. Alternatively, it looks like all data was used to generate modules across conditions. One consideration would be to perform WGCNA on relevant comparison data separately (ex. chow mice only and HFHS only) and then compare modules whose membership is retained or shifts between the two. Or even look at module representation for genes that show large correlations with insulin-responsiveness. This might also be a good opportunity to suggest readers intersect module members with muscle eQTLs which colocalize to glucose or insulin to prioritize some potential key drivers.

      We thank the reviewer for their helpful suggestions, which we feel have substantially improved the WGCNA analysis. To probe specific functional differences between subnetworks, we performed rank-based enrichment using phosphopeptide module membership scores. Interestingly, this did reveal pathways that were enriched only in certain modules. However, we found that after p-value adjustment, virtually all enriched pathways lost statistical significance, hence we interpret these results as suggestive only. We have made this analysis available to readers in Fig S4b-d and lines 263-265: "To further probe functional differences we analysed phosphopeptide subnetwork membership scores, which revealed additional pathways enriched in individual subnetworks. However, these results were not significant after p-value adjustment and hence are suggestive only (Fig. S4b-d)". We also visualised module representation for glucose-uptake correlated phosphopeptides. This agreed with our existing analysis in Fig. 6f, where the eigenpeptides of modules V and I were correlated with glucose uptake. We have incorporated this new analysis in Fig. S6b-c and lines 324-325: "Examining the subnetwork membership scores for glucose-uptake correlated phosphopeptides also revealed a preference for clusters V and I, supporting this analysis (Fig. S6b-c)." Finally, in the discussion we have presented the integration of genetic data, such as muscle-specific eQTLs, as a future direction (lines 398-401): "Alternatively, one could overlap subnetworks with genetic information, such as genes associated with glucose homeostasis and other metabolic traits in human GWAS studies, or muscle-specific eQTLs or pQTLs genetically colocalised with similar traits, to further prioritise subnetwork-associated phenotypes and identify potential drivers within subnetworks."
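
      To illustrate how nominally enriched pathways can lose significance after p-value adjustment, the following is a minimal sketch. It assumes a Benjamini-Hochberg correction, which the response does not specify, and the p-values are hypothetical placeholders rather than results from the study.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order.

    Assumption: the response only says "p-value adjustment"; BH is used
    here purely as a common example of such a procedure.
    """
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw enrichment p-values for five pathways in one subnetwork.
raw_p = [0.012, 0.04, 0.03, 0.20, 0.50]
adjusted_p = benjamini_hochberg(raw_p)

n_raw_significant = sum(p < 0.05 for p in raw_p)
n_adjusted_significant = sum(p < 0.05 for p in adjusted_p)
print(n_raw_significant, n_adjusted_significant)
```

      In this toy case three pathways pass a nominal 0.05 threshold but none do after adjustment, which is the pattern the authors describe as "suggestive only".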

      Have the authors considered using their heritability and GxE estimates for module eigenpeptides? To my knowledge, this has never been performed and might provide some useful information as the co-regulated P-protein structure occurs as a result of relevant contexts.

      In the revised manuscript we have now analysed eigenpeptides with the same statistical tests used to identify Strain and Diet effects in insulin-regulated phosphopeptides. We have displayed the statistical results in Fig. S4a, and have explicitly mentioned examples of StrainxDiet effects on lines 245-247: "For example, HFD-feeding attenuated the insulin response of subnetwork I in CAST and C57Bl6J strains (t-test adjusted p = 0.0256, 0.0365), while subnetwork II was affected by HFD-feeding only in CAST and NOD (Fig. 5e, Fig. S4a, t-test adjusted p = 0.00258, 0.0256)."

      The integration of modules with adipocyte phosphoproteomic data from the authors 2013 Cell metab paper seems like an important way to highlight the integration of this resource to define critical cellular signaling mechanisms. To assess the conservation of signaling mechanisms and relationships to additional key contexts (ex. exercise), the intersection of the insulin-stimulated P-peptides with human datasets generated by this group (ex. cell metab 2015, nature biotech 2022) seems like an obvious future direction to prioritize targets. Figure S3B shows a starting point for these types of integrations.

      To demonstrate the value of integrating our results with related phosphoproteomics data, we have incorporated the reviewer's advice of comparing insulin-regulated phosphosites to exercise-regulated phosphosites from Needham et al. Nature Biotech 2022 and Hoffman et al. Cell Metabolism 2015. We identified a small subset of commonly regulated phosphosites (8 across all three studies). Given insulin and exercise both promote GLUT4 translocation, these sites may represent conserved regulatory mechanisms. This analysis is presented in Fig. S3d, Table S2, and lines 129-135: "In addition to insulin, exercise also promotes GLUT4 translocation in skeletal muscle. We identified a small subset of phosphosites regulated by insulin in this study that were also regulated by exercise in two separate human phosphoproteomics studies (Fig. S3d, Table S2, phosphosites: Eef2 T57 and T59, Mff S129 and S131, Larp1 S498, Tbc1d4 S324, Svil S300, Gys1 S645), providing a starting point for exploring conserved signalling regulators of GLUT4 translocation."

      For the Pfkfb3 overexpression system, are there specific P-peptides that are increased/decreased upon insulin stimulation? This might be an interesting future direction to mention in order to link signaling mechanisms.

      We assessed whether canonical insulin signalling was affected by Pfkfb3 overexpression by immunoblotting. Insulin-stimulated phosphorylation of Akt S473, Akt T308, Gsk3a/b S21/S9, and PRAS40 T246 differed little across conditions, with only a weak, statistically insignificant trend towards increased pT308 Akt, pS21/S9 Gsk3a/b, and pT246 PRAS40 in palmitate-treated Pfkfb3-overexpressing cells. Hence, as the reviewer has suggested, an interesting future direction will be to perform phosphoproteomics to characterise more deeply the effects of palmitate and Pfkfb3 overexpression on insulin signalling. We have modified the manuscript to reflect these findings and suggested future directions on lines 362-365: "immunoblotting of canonical insulin-responsive phosphosites on Akt and its substrates GSK3α/β and PRAS40 revealed minimal effect of palmitate treatment and Pfkfb3 overexpression (Fig. S7e-f), hence more detailed phosphoproteomics studies are needed to clarify whether Pfkfb3 overexpression restored insulin action by modulating insulin signalling."

      Reviewer #3 (Recommendations For The Authors):

      This remarkable contribution by the esteemed research group has significantly enriched the field of metabolism. The extensive dataset, intertwined with a sophisticated research design, promises to serve as an invaluable resource for the scientific community. I offer a series of suggestions aimed at potentially elevating the manuscript to an even higher standard.

      Mouse Weight Variation and Correlation Analysis: The pronounced variances in mouse body weights pose a challenge to meaningful comparisons (Fig S1). Could the disparities in the phosphoproteome between basal and insulin-stimulated conditions be attributed to differences in body weight? Consider performing a correlation analysis. Furthermore, does the phosphoproteome of these mouse strains evolve comparably over time? Do these mice age similarly? Kindly incorporate this information.

      We thank the reviewer for the suggested analysis. We found there was a significant correlation between the phosphopeptide insulin response and mouse body weight, either in CHOW-fed mice (Strain effects) or across both diets (Diet effects), for ~ 25% of phosphopeptides that exhibited a Strain or Diet effect. Hence, while there is a clear effect of body weight on insulin signalling, this influences only a small proportion of the entire insulin-responsive phosphoproteome. Notably, insulin was dosed according to mouse lean mass to ensure equivalent dosage received by the soleus muscle, hence any insulin signalling differences associated with body weight are unlikely due to differences in dosing. As the reviewer also alludes to, different strains could have different lifespans. This may result in mice having different biological ages at the time of experimentation, and this in turn could influence insulin signalling. This possibility is challenging to assess in a quantitative manner because lifespan data is not available for most strains used. However, it is worth noting that female CAST mice live 77% as long as C57Bl6J mice (median age of 671 vs 866 (10.1073/pnas.1121113109); data is not available for male mice nor the other three strains), and substantial differences in insulin signalling were observed between these two strains. Ultimately, regardless of whether body weight and/or lifespan altered insulin signalling, such differences would still have arisen solely from the distinct genetic backgrounds and diets of the mice, hence we believe they are meaningful results that should not be dismissed. We have added this analysis to the revised manuscript in the "Limitations of this study" section on lines 471-477: "We were also unable to determine the extent to which signalling changes arose from muscle-intrinsic or extrinsic factors. 
      For instance, body weight varied substantially across mice and correlated significantly with 25% of Strain and Diet-affected phosphopeptides (Fig. S8c), suggesting obesity-related systemic factors likely impact a subset of the muscle insulin signalling network. Furthermore, genetic differences in lifespan could alter the “biological age” of different strains and their phosphoproteomes, though we could not assess this possibility since lifespan data are not available for most strains used."

      Soleus Muscle Data and Bias Considerations: Were measurements taken for lean mass and soleus muscle weight? If so, please present the corresponding data.

      Measurements for lean mass and the mass of soleus muscle after grinding have been included in Supplementary Figure S1 (panels c-d).

      As outlined in the methods section, the variation in protein yield from the soleus muscle across each strain is substantial. Notably, the distinct peptide input for phospho enrichment introduces biases, given that muscles with lower input may exhibit reduced identification (Fig S2). This bias might also manifest in the PCA plot (S2C). Ideally, adopting a uniform protein/peptide input would have been advantageous. Address this concern and contemplate moving the PCA plot to the main figure. It's prudent to reconsider the sentence stating, "Samples from animals of the same strain and diet were highly correlated and generally clustered together, implying the data are highly reproducible (Fig. S2b-d)," particularly if the input and total IDs were not matched.

      The reviewer highlights an important point. As the reviewer comments, it would have been our preference to use the same amount of protein material for all samples. However, as there was a wide range in the mass of the soleus muscle across mouse strains (in particular much lower in CAST mice), it was not appropriate to use the same amount of material for all strains. This is indeed evident in the PCA plot (Figure S2c), whereby samples cluster in the second component (PC2) based on the amount of protein material. However, this clustering is not observed in the hierarchical clustering (Figure S2d), nor is the number of phosphopeptides quantified in each sample substantially impacted by these differences (Figure S2a) as implied by the reviewer. Indeed, the number of phosphopeptides quantified did not noticeably vary when comparing BXH9/BXD34 to C57Bl6J/NOD despite 32.3% less material used, and there were only 12.4% fewer phosphopeptides (average #13891.56 vs 15851.29) in CAST compared to C57Bl6J/NOD strains, despite 51.8% less material used. To further emphasise the minimal effect that input material had on phosphopeptide quantification, we have additionally plotted the number of phosphopeptides quantified in each sample following the filtering steps we employed prior to statistical analysis of the dataset (i.e. ANOVA). This plot (Author response image 1) shows that there is even less variation in the number of quantified phosphopeptides between strains, with only 9.12% fewer phosphopeptides quantified and filtered on average in CAST compared to C57Bl6J/NOD (average #9026.722 vs 9932.711). From a quantitative perspective, in both the PCA (Principal Component 1) and hierarchical clustering analyses, samples are additionally clustered by individual strains, and in the latter they also cluster generally by diet, implying that biological variation between samples remains the primary variation captured in our data.
      We have modified the manuscript so that these observations are at the forefront (lines 103-106): "Furthermore, while different strains clustered by the amount of protein material used in the second component of the PCA (Figure S2c), samples from animals of the same strain and diet were highly correlated and generally clustered together, indicating that our data are highly reproducible". To ensure that readers are aware of our decision to alter protein starting material and its implications, we have moved the description of this from the methods to the results, and we have highlighted the impact on phosphopeptide quantification in CAST mice (lines 99-103): "Due to the range in soleus mass across strains (Fig. S1D) we altered the protein material used for EasyPhos (C57Bl6J and NOD: 755 µg, BXH9 and BXD34: 511 µg, CAST: 364 µg), though phosphopeptide quantification was minimally affected, with only 12.4% fewer phosphopeptides quantified on average in CAST compared to C57Bl6J/NOD (average 13891.56 vs 15851.29; Fig. S2a)."
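
      The percentages quoted in this response can be reproduced from the stated protein inputs and average phosphopeptide counts. The short sketch below does only that arithmetic; the helper name `percent_fewer` is ours, and all numbers are taken from the response text.

```python
def percent_fewer(value, reference):
    """Percentage by which `value` falls short of `reference`."""
    return 100.0 * (1.0 - value / reference)


# Protein input for EasyPhos, in micrograms, as stated in the response.
input_c57_nod = 755
input_bxh9_bxd34 = 511
input_cast = 364

# Average phosphopeptides quantified per sample, before and after filtering.
quantified_cast, quantified_c57_nod = 13891.56, 15851.29
filtered_cast, filtered_c57_nod = 9026.722, 9932.711

less_material_bx = percent_fewer(input_bxh9_bxd34, input_c57_nod)
less_material_cast = percent_fewer(input_cast, input_c57_nod)
fewer_quantified_cast = percent_fewer(quantified_cast, quantified_c57_nod)
fewer_filtered_cast = percent_fewer(filtered_cast, filtered_c57_nod)

print(f"BXH9/BXD34 input vs C57Bl6J/NOD: {less_material_bx:.1f}% less")
print(f"CAST input vs C57Bl6J/NOD: {less_material_cast:.1f}% less")
print(f"CAST quantified phosphopeptides: {fewer_quantified_cast:.1f}% fewer")
print(f"CAST filtered phosphopeptides: {fewer_filtered_cast:.2f}% fewer")
```

      The comparison makes the authors' point concrete: roughly halving the input in CAST reduced the number of quantified phosphopeptides by a much smaller fraction.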

      Author response image 1.

      Phosphopeptide quantification following filtering. a) The number of phosphopeptides quantified in each sample after filtering prior to statistical analysis.

      Phosphosite Quantification Filtering: The quantified phosphosites have been dropped from 23,000 to 10,000. Could you elucidate the criteria employed for filtering and provide a concise explanation in the main text?

      We thank the reviewer for drawing this ambiguity to our attention. Before testing for insulin regulation, we performed a filtering step requiring phosphopeptides to be quantified well enough for comparisons across strains and diets. Specifically, phosphopeptides were retained if they were quantified well enough to assess the effect of insulin in more than eight strain-diet combinations (≥ 3 insulin-stimulated values and ≥ 3 unstimulated values in each combination). We have now included this explanation of the filtering in the main text on lines 108-114.
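
      The filtering rule described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the data layout, the function name `passes_filter`, the use of `None` for a missing quantification, and the four replicates per arm are all hypothetical. Only the thresholds (at least 3 stimulated and 3 unstimulated values, in more than eight strain-diet combinations) come from the response.

```python
MIN_VALUES_PER_ARM = 3  # >= 3 insulin-stimulated and >= 3 unstimulated values
MIN_COMBINATIONS = 9    # "more than eight" strain-diet combinations

STRAINS = ["C57Bl6J", "NOD", "BXH9", "BXD34", "CAST"]
DIETS = ["CHOW", "HFD"]


def passes_filter(measurements):
    """Decide whether one phosphopeptide is retained.

    `measurements` maps (strain, diet) -> {"basal": [...], "insulin": [...]},
    where None marks a missing value (hypothetical layout).
    """
    usable = 0
    for arms in measurements.values():
        n_basal = sum(v is not None for v in arms["basal"])
        n_insulin = sum(v is not None for v in arms["insulin"])
        if n_basal >= MIN_VALUES_PER_ARM and n_insulin >= MIN_VALUES_PER_ARM:
            usable += 1
    return usable >= MIN_COMBINATIONS


def fully_quantified():
    # Hypothetical intensities: four replicates per arm in every combination.
    return {
        (strain, diet): {"basal": [1.0] * 4, "insulin": [2.0] * 4}
        for strain in STRAINS
        for diet in DIETS
    }


# One combination falls below three stimulated values: 9 of 10 remain usable.
one_gap = fully_quantified()
one_gap[("CAST", "HFD")]["insulin"] = [2.0, None, None, 2.0]

# A second combination also drops out: only 8 of 10 remain usable.
two_gaps = fully_quantified()
two_gaps[("CAST", "HFD")]["insulin"] = [2.0, None, None, 2.0]
two_gaps[("NOD", "CHOW")]["basal"] = [None, None, 1.0, 1.0]

print(passes_filter(one_gap), passes_filter(two_gaps))
```

      With five strains and two diets there are ten combinations, so under this reading a phosphopeptide can be poorly quantified in at most one of them and still be retained.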

      ANOVA Choice Clarification: In Figure 4, there's a transition from one-way ANOVA in B to two-way ANOVA in C. Could you expound on the rationale for selecting these distinct methods?

      In panel B, we first focussed on kinase regulation differences between strains in the absence of a dietary perturbation. Hence, we performed one-way ANOVAs only within the CHOW-fed mice. In panel C, we then consider the effect of perturbation with the HFD. We perform two-way ANOVAs, allowing us to identify effects of the HFD that are uniform across strains (Diet main effect) or variable across strains (Strain-by-diet interaction).

      Cell Line Selection for Functional Experiments: Could you elucidate the rationale behind opting for L6 cells of rat origin over C2C12 mouse cells for functional experiments?

      We acknowledge that C2C12 cells have the benefit of being of mouse origin, which aligns with our mouse-derived phosphoproteomics data. However, they are unsuitable for glucose uptake experiments as they lack an insulin-responsive vesicular compartment even upon GLUT4 overexpression, and undergo spontaneous contraction when differentiated resulting in confounding non-insulin dependent glucose uptake (10.1152/ajpendo.00092.2002, 10.1007/s11626-999-0030-8). In contrast, L6 cells readily express insulin-responsive GLUT4, and cannot contract (10.1113/JP281352, 10.1007/s11626-999-0030-8). Therefore they are a superior model for studying insulin-dependent glucose transport. We have added a justification of L6 cells over C2C12 cells in the revised manuscript, on lines 352-354: "While L6 cells are of rat origin, they are preferable to the popular C2C12 mouse cell line since the latter lack an insulin-responsive vesicular compartment and undergo spontaneous contraction, resulting in confounding non-insulin dependent glucose uptake."

      It's intriguing that while a phosphosite was modulated on Pfkfb2, functional assays were conducted on a different isoform (Pfkfb3) wherein the phosphosite was not detected.

      The correlation between Pfkfb2 S469 phosphorylation and insulin-stimulated glucose uptake suggests that F2,6BP production, and subsequent glycolytic activation, positively regulate insulin responsiveness. There are several ways of testing this: 1) Knock down endogenous Pfkfb2, and re-express either wild-type protein or a S469A phosphomutant. If S469 phosphorylation positively regulates insulin responsiveness, then knockdown should decrease insulin responsiveness and re-expression of wild-type Pfkfb2, but not S469A, should restore it. 2) Induce insulin resistance (e.g. through palmitate treatment), and overexpress phosphomimetic S469D or S469E Pfkfb2 to enhance F2,6BP production. Under our hypothesis, this should reverse insulin resistance. 3) There is some evidence that dual phosphorylation of S469 and S486, another activating phosphosite on Pfkfb2, enhances F2,6BP production through 14-3-3 binding (10.1093/emboj/cdg363). Hence, we may expect that introduction of an R18 sequence into Pfkfb2, which causes constitutive 14-3-3 binding (10.1074/jbc.M603274200), would increase Pfkfb2-driven F2,6BP production, and under our hypothesis this should reverse insulin resistance. 4) The paralog Pfkfb3 lacks Akt regulatory sites and has substantially higher basal activity than Pfkfb2. Thus, overexpression of Pfkfb3 should mimic the effect of phosphorylated Pfkfb2, and hence reverse insulin resistance under our hypothesis. While approaches 1), 2), and 3) directly target Pfkfb2, they have drawbacks. For example, 1) may not work if Pfkfb2 knockdown is compensated for by other Pfkfb isoforms, 2) may not work since D/E phosphomimetics often do not recapitulate the molecular effects of S/T phosphorylation (10.1091/mbc.E12-09-0677), and 3) may not work if S469 phosphorylation does not operate through 14-3-3 binding. Hence we performed 4) as it seemed to be the most robust and cleanest experiment to test our hypothesis. 
      We have revised the manuscript to further clarify the challenges of directly targeting Pfkfb2 and the benefits of targeting Pfkfb3 on lines 342-349: "Since Pfkfb2 requires phosphorylation by Akt to produce F2,6BP substantially, increasing F2,6BP production via Pfkfb2 would require enhanced activating site phosphorylation, which is difficult to achieve in a targeted fashion, or phosphomimetic mutation of activating sites to aspartate/glutamate, which often does not recapitulate the molecular effects of serine/threonine phosphorylation. By contrast, the paralog Pfkfb3 has high basal production rates and lacks an Akt motif at the corresponding phosphosites. We therefore rationalised that overexpressing Pfkfb3 would robustly increase F2,6BP production and enhance glycolysis regardless of insulin stimulation and Akt signalling."

      Insulin-Independent Action of Pfkfb3: The functionality of Pfkfb3 unfolds in an insulin-independent manner, yet it restores insulin action (Fig 6h). Could you shed light on the mechanism underpinning this phenomenon? Consider measuring F2,6BP concentrations or assessing kinase activity upon overexpression.

      Pfkfb3 overexpression increased the glycolytic capacity of L6 myotubes in the absence of insulin stimulation, as inferred by extracellular acidification rate (Fig. S7c). This is indeed consistent with Pfkfb3 enhancing glycolysis through increased F2,6BP concentration in an insulin-independent manner. To shed light on the mechanism connecting this to insulin action, we performed immunoblotting experiments to assess the kinase activity of Akt, a master regulator of the insulin response. Indeed, this experimental direction has precedent as we previously observed that Pfkfb3 overexpression enhanced insulin-stimulated Akt signalling in HEK293 cells, while small-molecule inhibition of Pfkfb kinase activity reduced Akt signalling in 3T3-L1 adipocytes (10.1074/jbc.M115.658815). However, insulin-stimulated phosphorylation of Akt S473, Akt T308, Gsk3a/b S21/S9, and PRAS40 T246 differed little across conditions, with only a weak, statistically insignificant trend towards increased pT308 Akt, pS21/S9 Gsk3a/b, and pT246 PRAS40 in palmitate-treated Pfkfb3-overexpressing cells. Hence, a more detailed phosphoproteomics study will be needed to assess whether Pfkfb3 restores insulin action by modulating insulin signalling. We have described these immunoblotting experiments in lines 361-365 and Fig. S7e-f. We also discussed potential mechanisms through which Pfkfb3-enhanced glycolysis could connect to insulin action in the discussion (lines 427-434).

      Figure 6h Statistical Analysis: For the 2DG uptake in Figure 6h, a conventional two-way ANOVA might be more appropriate than a repeated measures ANOVA.

      On reflection, we agree that a conventional ANOVA is more appropriate. Furthermore, for simplicity and conciseness we have decided to analyse and present only insulin-stimulated/unstimulated 2DG uptake fold change values in Figure 6h. We have presented all unstimulated and insulin-stimulated values in Figure S7d.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this manuscript, Janssens et al. addressed the challenge of mapping the location of transcriptionally unique cell types identified by single nuclei sequencing (snRNA-seq) data available through the Fly Cell Atlas. They identified 100 transcripts for head samples and 50 transcripts for fly body samples allowing the identification of every unique cell type discovered through the Fly Cell Atlas. To map all of these cell types, the authors divided the fly body into head and body samples and used the Molecular Cartography (Resolve Biosciences) method to visualize these transcripts. This approach allowed them to build spatial tissue atlases of the fly head and body, to identify the location of previously unknown cell types and the subcellular localization of different transcripts. By combining snRNA-seq data from the Fly Cell Atlas with their spatially resolved transcriptomics (SRT) data, they demonstrated an automated cell type annotation strategy to identify unknown clusters and infer their location in the fly body. This manuscript constitutes a proof-of-principle study to map the location of the cells identified by ever-growing single-cell transcriptomic datasets generated by others.

      Strengths:

      The authors used the Molecular Cartography (Resolve Biosciences) method to visualize 100 transcripts for head samples and 50 transcripts for fly body samples in high resolution. This method achieves high resolution by multiplexing a large number of transcript visualization steps and allows the authors to map the location of unique cell types identified by the Fly Cell Atlas. 

      We thank this reviewer for appreciating the quality of our spatial data. We do not know what caused the technical problem (grayscale version of PDF) for this reviewer (the PDF figures are in color on the eLife website and on bioRxiv). We are surprised that the eLife discussion session did not resolve this issue.

      Weaknesses:

      Combining single-nuclei sequencing (snRNA-seq) data with spatially resolved transcriptomics (SRT) data is challenging, and the methods used by the authors in this study cannot reliably distinguish between cells, especially in brain regions where the processes of different neurons are clustered, such as in neuropils. This means that a grid that the authors mark as a unique cell may actually be composed of processes from multiple cells. 

      The small size of an individual fly is one of the most challenging aspects of performing spatial transcriptomics. While the resolution of Molecular Cartography is rather high (< 200 nm), in the brain challenges remain as noted by the reviewer. Drosophila neuronal nuclei are notoriously small and cannot be easily resolved with the current imaging techniques. We agree that for a full atlas either expansion microscopy, 3D techniques or other super-resolution techniques will be required. 

      Reviewer #2 (Public Review):

      Summary:

      The landmark publication of the "Fly Atlas" in 2022 provided a single cell/nuclear transcriptomic dataset from 15 individually dissected tissues, the entire head, and the body of male and female flies. These data led to the annotation of more than 250 cell types. While certainly a powerful and data-rich approach, a significant step forward relies on mapping these data back to the organism in time and space. The goal of this manuscript is to map 150 transcripts defined by the Fly Atlas by FISH and in doing so, provide, for the first time, a spatial transcriptomic dataset of the adult fly. Using this approach (Molecular Cartography with Resolve Biosciences), the authors, furthermore, distinguish different RNA localizations within a cell type. In addition, they seek to use this approach to define previously unannotated clusters found in the Fly Atlas. As a resource for the community at large interested in the computational aspects of their pipeline, the authors compare the strengths and weaknesses of their approach to others currently being performed in the field.

      Strengths:

      (1) The authors use Resolve Biosciences and a novel bioinformatics approach to generate a FISH-based spatial transcriptomics map. To achieve this map, they selected 150 genes (50 body; 100 head) that were highly expressed in the single nuclear RNA sequencing dataset and were used in the 2022 paper to annotate specific cell types; moreover, the authors chose several highly expressed genes characteristic of unannotated cell types. Together, the approach and generated data are important next steps in translating the transcriptomic data to spatial data in the organism.

      We thank the reviewer for this comment, as it reminded us that we need to be clearer in the text about how we chose the genes to investigate. The statement that we selected “150 genes (50 body; 100 head) that were highly expressed in the single nuclear RNA sequencing dataset” is not correct. We have chosen genes with widely differing expression levels (log-scale range of 3.95 in body, 5.76 in head; we show this now in the new Figure 1 – figure supplement 1B, D). Many of the chosen genes are also transcription factors. In fact, the method introduced here is more sensitive than the single cell atlas: the tinman positive cells were readily located (even non-heart cells were found to express tinman), whereas in the single cell FCA data tinman expression is often not detected in the cardiomyocytes (tinman is detected in 273 cells in the entire FCA (mean expression of 1.44 UMI in positive cells), and in 71 cells out of 273 cardiac cells (26%)).

      (2) Working with Resolve, the authors developed a relatively high throughput approach to analyze the location of transcripts in Drosophila adults. This approach confirmed the identification of particular cell types suggested by the FlyAtlas as well as revealed interesting subcellular locations of the transcripts within the cell/tissue type. In addition, the authors used co-expression of different RNAs to unbiasedly identify "new cell types". This pipeline and data provide a roadmap for additional analyses of other time points, female flies, specific mutants, etc.

      (3) The authors show that their approach reveals interesting patterns of mRNA distribution (e.g alpha- and beta-Trypsin in apical and basal regions of gut enterocytes or striped patterns of different sarcomeric proteins in body muscle). These observations are novel and reveal unexpected patterns. Likewise, the authors use their more extensive head database to identify the location of cells in the brain. They report the resolution of 23 clusters suggested by the single-cell sequencing data, given their unsupervised clustering approach. This identification supports the use of spatial cell transcriptomics to characterize cell types (or cell states).

      (4) Lastly, the authors compare three different approaches --- their own described in this manuscript, Tangram, and SpaGE - which allow integration of single cell/nuclear RNA-seq data with spatial localization FISH. This was a very helpful section as the authors compared the advantages and disadvantages (including practical issues, like computational time).

      Weaknesses:

      (1) Experimental setup. It is not clear how many and, for some of the data, the sex of the flies that were analyzed. It appears that for the body data, only one male was analyzed. For the heads, methods say male and female heads, but nothing is annotated in the figures. As such, it remains unclear how robust these data are, given such a limited sample from one sex. As such, the claims of a spatial atlas of the entire fly body and its head ("a rosetta stone") are overstated. Also, the authors should clearly state in the main text and figure legends the sex, the age, how many flies, and how many replicates contributed to the data presented (not just the methods). What also adds to the confusion is the use of "n" in para 2 of the results. " ... we performed coronal sections at different depths in the head (n=13)..." 13 sections in total from 1 head or sections from 13 heads? Based on the body and what is shown in the figure, one assumes 13 sections from one head. Please clarify.

      While we agree that sex differences are indeed an interesting opportunity for study with spatial transcriptomics, our goal was not to define male/female differences but rather to establish the technology to go into this detail if wanted in the future. In the revised version, we have provided an additional supplementary table with a more detailed description of the head sections (Table S3). We have added the number of animals (12 for the head sections, mixed sex; and 1 male for the body sections) to the main text. We would like to point out that we verified the specificity of our MC method on all the 5 body sections (Figure 2A, TpnC4 & Act88F and text) and not only on one. Furthermore, we also would like to state that the idea of “a Rosetta stone” was mentioned as a future prospect that clearly goes beyond our presented work. We have rewritten the discussion to make this clearer and to avoid any overstatements.

      (2) Probes selected: Information from the methods section should be put into the main text so that it is clear what and why the gene lists were selected. The current main text is confusing. If the authors want others to use their approach, then some testing or, at the very least, some discussion of lower expressed genes should be added. How useful will this approach be if only highly expressed genes can be resolved? In addition, while it is understood that the company has a proprietary design algorithm for the probes, the authors should comment on whether the probes for individual genes detect all isoforms or subsets (exons and introns?), given the high level of splicing in tissues such as muscle.

      As stated above, while there is a slight bias to higher expressed genes (as expected for marker genes), we have also used low expressed genes like salm, CG32121, tinman (body) or sens (head). This is now shown in new Figure 1 – figure Supplement 1B, D. This shows that our method is more sensitive than single-cell data, as all cardiomyocytes can be identified by tinman expression and not only some are positive, as is the case in the FCA data. In fact, the method cannot resolve too highly expressed genes due to optical crowding of the signal leading to a worse quantification. For this reason, ninaE was removed from the analysis (as mentioned in Spatial transcriptomics allows the localization of cell types in the head and brain and in Methods).

      As mentioned in the Methods, the probes are designed on gene level targeting all isoforms, but favoring principal isoforms (weighted by APPRIS level). The high level of splicing is indeed interesting and we expect that in the future spatial transcriptomics can help to generate more insight into this by designing isoform-specific probes.

      (3) Imaging: it isn't clear from the text whether the repeated rounds of imaging impacted data collection. In many of what appear to be "stitched" images, there are gradients of signal (eg, figure 2F); please comment. Also, since this is a new technique, could a before and after comparison of the original images and the segmented images be shown in the supplemental data so that the reader can better appreciate how the authors assessed/chose/thresholded their data? More discussion of the accuracy of spot detection would be helpful.

      High-resolution imaging (pixel size = 138 nm) of a large field of view (>1mm) for spatial transcriptomics uses a stitching method to combine several individual images to reconstruct a large field of view. This does not generate signal gradients, apart from lower signal at the extreme edges of each of the individual images, as seen in our images, too. The spot detection algorithm was written and used by Resolve Biosciences and benchmarked for human (HeLa) and mouse (NIH-3T3) cell lines in Groiss et al. 2021 (Highly resolved spatial transcriptomics for detection of rare events in cells, bioRxiv). The specificity of the decoded probes was found to lie between 99.45 and 99.9% here, matching the results we found for specific detection of TpnC4 and Act88F (99.4 and 99.8%).
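
      A specificity figure such as the 99.4% quoted above can be read as the fraction of decoded spots for a gene that fall inside the tissue where that gene is expected. The sketch below illustrates that calculation only; the input format (a list of spot coordinates and a boolean tissue mask) and the function name are assumptions for illustration, not part of the Resolve Biosciences pipeline.

```python
def fraction_inside(spots, mask):
    """Return the fraction of (x, y) spots that land on True pixels of mask.

    spots: list of integer (x, y) pixel coordinates (hypothetical input format).
    mask:  list of rows of booleans, indexed as mask[y][x]; True marks the
           tissue in which the transcript is expected.
    """
    if not spots:
        raise ValueError("no spots given")
    inside = sum(1 for x, y in spots if mask[y][x])
    return inside / len(spots)


# Toy example: a 10 x 10 image whose left half stands in for flight muscle.
mask = [[x < 5 for x in range(10)] for y in range(10)]
spots = [(0, 0), (1, 3), (4, 9), (7, 2)]  # three spots inside, one outside
specificity = fraction_inside(spots, mask)
print(f"{specificity:.1%} of spots fall inside the mask")
```

      In practice the mask would come from a tissue segmentation, and the spots from the decoded transcript table for one gene.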

      (4) The authors comment on how many RNAs they detected (first paragraph of results). How do these numbers compare to the total mRNA present as detected by single-cell or single-nuclear sequencing?

      We can compare the numbers, but the different methodologies make the interpretation of such a comparison difficult. FCA used single nucleus sequencing, so only nuclear pre-mRNAs are detected. The total amount of counts per single cell sample strongly depends on how many cells were sequenced in an experiment. MC detects all mRNAs present in the section. Here, the size of the sample and hence the size or the number of cells analyzed determines how many mRNAs are detected. In Author response image 1, we have compared our MC results versus FCA data, comparing the genes investigated here in MC per section vs per sequencing experiment. Numbers for MC are slightly lower for the brain (not all cell types are on all sections) and much higher for the larger body samples. However, we feel a direct comparison is questionable, so we prefer to not include this figure in our manuscript.

      Author response image 1:

      Barplots showing total number of mRNA molecules detected in Molecular Cartography (MC, Resolve, spatial spots) and in snRNA-seq data from the Fly Cell Atlas (10x Genomics, UMIs). Individual black dots show individual experiments, counts are only shown for the chosen gene panel for each sample. Bar shows the mean, with error bars representing the standard error.

      (5) Using this higher throughput method of spatial transcriptomics, the authors discern different cell types and different localization patterns within a tissue/cell type.

      a. The authors should comment on the resolution provided by this approach, in terms of the detection of populations of mRNAs detected by low throughput methods, for example, in glia, motor neuron axons, and trachea that populate muscle tissue. Are these found in the images? Please show.

      We did not add any markers for trachea in our gene panel, but we do detect sparse spots of repo (glia) and elav/VGlut in the muscle tissues (Gad1/VAChT are hardly detected in the muscle tissue). This is consistent with the glutamatergic nature of motor neurons in Drosophila as described previously (Schuster CM (2006), Glutamatergic synapses of Drosophila neuromuscular junctions: a high-resolution model for the analysis of experience-dependent potentiation. Cell Tissue Res 326: 287–299). We present these new data in new Figure 2 – figure supplement 1.

      b. The authors show interesting localization patterns in muscle tissue for different sarcomere protein-coding mRNAs, including enrichment of sls in muscle nuclei located near the muscle-tendon attachment sites. As this high throughput approach is newly being applied to the adult fly, it would increase confidence in these data, if the authors would confirm these data using a low throughput FISH technique. For example, do the authors detect such alternating "stripes" (Act88F, TpnC4, and Mhc) or enriched localization (sls) using FISH that doesn't rely on the repeated colorization, imaging, decolorization of the probes?

      We thank the reviewer for the interest in the localization patterns in muscle tissue. We show that Act88F, TpnC4 are not detected outside of flight muscle cells (99.4% and 99.8% of the single molecular signal in flight muscles only), giving us confidence in the specificity of the MC method. Following the suggestion of the reviewer, we have adapted an HCR-FISH method to Drosophila adult body sections for the revised version of the manuscript. Using this method, we were able to confirm the higher expression/localization of sls transcripts to and around the adult flight muscle nuclei, with an enrichment in nuclei close to the muscle-tendon attachment sites (new Figure 4D-F and new Figure 4 – figure supplement 1). We have also been able to confirm some complementarity in the localization patterns of Act88F and TpnC4 in longitudinal stripes in adult flight muscles, however for Mhc we could not confirm this pattern with HCR-FISH (new Figure 5C-F and new Figure 5 – figure supplement 1). While we could confirm most of the pattern seen, we do not know the exact reason for the slight discrepancies. Thus, we now recommend that insights found with SRT should be confirmed with more classical FISH methods.

      (6) The authors developed an unbiased method to identify "new cell types" which relies on coexpression of different transcripts. Are these new cell types or a cell state? While expression is a helpful first step, without any functional data, the significance of what the authors found is diminished. The authors need to soften their statements.

      The term “new cell types” only appeared in the old title. We agree that with the current spatial map we cannot be sure to have found “new cell types”, instead we show where unannotated/uncharacterized clusters from the scRNA-seq atlas are located, based on their gene expression. Therefore, we have updated the title in the revised version (Spatial transcriptomics in the adult Drosophila brain and body) and thank the reviewer for this valuable suggestion.

      Appraisal:

      The authors' goal is to map single cell/nuclear RNAseq data described in the 2022 Fly Atlas paper spatially within an organism to achieve a spatial transcriptomic map of the adult fly; no doubt, this is a critical next step in our use of 'omics approaches. While this manuscript does the hard work of trying to take this next step, including developing and testing a new pipeline for high throughput FISH and its analysis, it falls short, in its present form, in achieving this goal. The authors discuss creating a robust spatial map, based on one male fly. Moreover, they do not reveal principles of mRNA localization, as stated in the abstract; they show us patterns, but nothing about the logic or function of these patterns. This same criticism can be said of the identification of "new cell types", just based on RNA colocalization. In both cases (mRNA subcellular localization or cell type identification), further data in the form of validation with traditional low throughput FISH and genetic manipulations to assess the relation to cell function are required for the authors to make such claims.

      We have indeed used one male fly for the adult male body data. This is mainly due to the cost of the sample processing. We used 12 individuals for the head samples (from 1 individual we acquired 2 sections, a total of 13 sections). We show that the body samples show a high correlation with each other, while the head samples cover multiple depths of the head. Still, even in the head, we find that sections at similar depths show a high similarity to each other in terms of gene-gene coexpression and expression patterns. Although obtaining sections from more animals would be valuable, we do not believe it to be necessary for our current goals. Additional replicates beyond the ones we already provide would require significant amounts of extra time and budget, while they would very likely produce similar results as we already show. Following the reviewer’s suggestion, we have tested several genes with HCR-FISH and could readily confirm the localization pattern of sls mRNA close to the terminal nuclei of the flight muscles. This pattern is likely due to a higher expression of sls in these nuclei, as a large amount of sls mRNA signal is detected within the nuclei (Figure 4). A detailed dissection of the mechanism that establishes this pattern is beyond the scope of this manuscript, which is the first one on applying spatial transcriptomics to adult Drosophila.

      The usage of the term “new cell types” was indeed ambiguous and we removed this from the revised version. We now clarified that we map the spatial location of unannotated clusters in the brain. This may or may not include uncharacterized cell types. We now further specify that we have only inferred the location of the nuclei; thus, neuronal function or the location of their axonal processes are still unknown. As such, our data provides a starting point to identify uncharacterized cell types, since their marker genes and nuclear location are now determined. The next step to identify “new cell types” would indeed be to acquire genetic access to these cell types and characterize them in more detail. This is beyond the scope of this manuscript, and therefore we have toned down the title in the revised version and thank the reviewer for this valuable suggestion. 

      Discussion of likely impact:

      If revised, these data, and importantly the approach, would impact those working on Drosophila adults as well as those working in other model systems where single cell/nuclear sequencing is being translated to the spatial localization within the organism. The subcellular localization data - for example, the size of transcripts and how that relates to localization or the patterns of sarcomeric protein localization in muscle - are intriguing, and would likely impact our thinking on RNA localization, transport, etc if confirmed. Lastly, the authors compare their computational approaches to those available in the field; this is valuable as this is a rapidly evolving field and such considerations are critical for those wishing to use this type of approach.

      We thank this reviewer for appreciating the impact of our findings and approach to the Drosophila field and beyond. We here provide the groundwork for a full Drosophila adult spatial atlas, similar to how early scRNA-seq datasets provided a framework for the Fly Cell Atlas. In the manuscript we provide both experimental information on how to successfully perform spatial transcriptomics (treating slides for optimal attachment) and the data serves as a benchmark for future experiments to improve upon (similar to how early Drop-seq datasets were compared to later 10x datasets in single-cell transcriptomics). In addition, it also provides proof of principle methods on how to integrate the FCA data with these spatial data and it identifies localized mRNA species in large adult muscle cells, showing the complementarity of spatial techniques with single-cell RNA-seq. For a small number of genes, we have confirmed the mRNA patterns using HCR-FISH in the revised version of this manuscript. To conclude, this is the first spatial adult Drosophila transcriptomics paper, locating 150 mRNA species with easy data access in our user portal (https://spatialfly.aertslab.org/).

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) All figures in the manuscript were in grayscale, which made it difficult to interpret the results because the data could only be interpreted by distinguishing different colors to visualize different transcripts. This is likely a technical problem. The manuscript must contain colored images.

      We apologize to the reviewer for this technical issue. The manuscript was uploaded in color to bioRxiv and to eLife. We therefore do not understand the reason for this problem. We are surprised that this issue was not resolved in the reviewers’ discussion since color is obviously essential to appreciate the beauty of this manuscript.

      (2) In Figure 2a, the authors comment on the subcellular localization of trypsin isoforms, but the figure does not indicate the cell borders or the apical and basal regions of the cell. These must be indicated in the figure to help readers understand the results. 

      We thank the reviewer for pointing this out; we have now indicated the outlines of the single-cell layer epithelium on the figure. While we have no marker for cell borders, we have a nuclear marker showing that it is a single cell layer. We hope this allows the reader to appreciate the subcellular localization of the trypsin isoforms.

      (3) All figures (including the data on the authors' website) contain background staining, which I assume is labeling nuclei. This is not indicated in the manuscript, and should be clarified.

      We again thank the reviewer for pointing this out; the background staining indeed labels nuclei (using DAPI). We have indicated this better in the revised version.

      (4) In Figure 5c, the authors claim that neuronal and muscular genes are grouped into the same cluster, but they do not indicate which transcripts are neuronal and which ones are muscular. This must be indicated in the figure.

      We thank the reviewer for this comment. Indeed, there was only one gene, acj6, present in the muscle cluster. So, we decided to delete this statement in the revised version.

      (5) The authors utilized and compared three different approaches to integrate single nuclei sequencing data from the Fly Cell Atlas to their spatially resolved transcriptomics (SRT) data. I was wondering if it is possible to generate a virtual expression explorer using this integrated data, similar to the dataset published in the 2017 Science article by Karaiskos et al., where they combined publicly available in situ hybridization data of fly embryos and their single-cell sequencing data. This virtual expression explorer would be useful to visualize the expression pattern of transcripts that the authors of this paper did not use for their SRT.

      We thank the reviewer for this interesting comment. Using Tangram, we indeed infer gene expression for all genes from the Fly Cell Atlas. To make this visible we have created a Scope session (https://scope.aertslab.org/#/Spatial_Fly/*/welcome), with which users can browse inferred gene expression levels (note that this is on a segmented cell level). We do notice that the inferred gene expression levels contain many false positives and should therefore be used with caution. The spatial data themselves can be browsed through the spatial portal at https://spatialfly.aertslab.org/ .

      Reviewer #2 (Recommendations For The Authors):

      Suggestions for improved or additional experiments, data, or analyses:

      The authors have used a new high throughput approach to examine the location of 150 RNAs in adult Drosophila heads or one body. It is unclear whether the fixation/repeated imaging etc is accurately reflecting the patterns of expression in vivo. The authors should confirm these data using low throughput established techniques for the RNA patterns in muscle for example.

      The authors should clarify their experimental approaches and include additional samples if they indeed want to establish the rosetta stone of fly adults. These data are from only a male fly (and as such is not a complete analysis of the adult fly). To be a map of the adult fly, data from both sexes need to be included.

      Unless functional data that complement the descriptive data shown here are included, the authors have to soften their conclusions. For example, while spatial transcriptomics has mapped RNA expression to a location, without some functional data, it is difficult to conclude that these are indeed "new cell types". Same with the RNA localization principles.

      Recommendations for improving the writing and presentation:

      (1) The manuscript should be heavily revised: in many places, important details are left out or should be moved from the methods to the main text. In addition, the authors often overstate their findings throughout the manuscript. As an example, it appears that the data presented is only from 1 fly, so this doesn't increase the reader's confidence in the data or the applicability of the approach. Also, it isn't clear how many flies were analyzed for the heads (one male fly too?) nor what variability is present from fly to fly. For the approach and data to be used by others, this is important to know.

      We moved some text from the methods section to the main text to be clearer. We now also state how many animals were used for the MC method. While the data for the body has been generated from 1 male only, the data for the head was generated from 12 flies; for both cases, similar slices show very similar gene expression patterns. Furthermore, in the body we used widely known and published marker genes that all showed expected expression patterns, indicating robustness. We agree that this is not a full spatial atlas of the fly; this was also not our goal, and we have removed such general statements from the revised version. We aimed to generate a spatial transcriptomics dataset covering the entire fly (head and body) as a proof-of-principle, tackling data generation and analysis, and highlighting challenges in both.

      (2) The grammar and word choice throughout are challenging often making the text difficult to follow. This reads like an early draft of the paper.

      We apologize to the reviewer for any difficulties. We have revised the text and hope it is now easier to read, while still being accurate on the technical details of the various methods used in our manuscript.

      Minor corrections to the text and figures.

      See the weaknesses mentioned above. Also:

      Figure S1 is unreadable.

      There is no simple way to describe the expression values of 100 genes in 100 cell types on a single page. The resolution of the PDF is high enough that after zooming in, all the information can be read easily.

      Figure S2, in a, please include the axes so that the reader can better understand the sections shown.

      In b, it is unclear what the pink boxes mean. In c, the labels are barely legible.

      In Figure 1 – figure supplement 2 (head sections), we have ordered the head sections from anterior to posterior. The boxes in (B) represent boxplots. We have updated this plot for clarity to better display the number of mRNA molecules detected for each gene. We have increased the font size in (C).

      Figure S3, in a, please include axes. In b, the meaning of the pink box

      In Figure 1 – figure supplement 3 (the body sections) we have added the anterior to posterior and dorso-ventral axis, and ordered the sections that stem from the same animal. The boxes in (B) represent boxplots. We have updated this plot for clarity to better display the number of mRNA molecules detected for each gene. We have added an explanation to the figure legend.  

      Figure S4, the text in the axes of the heatmap should have a darker typeface

      We have changed it to black, thanks.

      Figure S5c, are the colors in the dendrogram supposed to match the spatial location on the right?

      The purple of the muscles is barely visible.

      Yes, they do match. Colors were modified in the revised version for better visibility.

    1. Author response:

      The following is the authors’ response to the original reviews.

      We would like to thank the reviewer and the editor for carefully reading our manuscript, and acknowledging the strength of combining quantitative analysis with semi-naturalistic experiments on mice social behavior. Please find below our response to both the public review and the recommendation to the authors. As a summary, we have included additional figures and texts such as 

      - a new Results subsection “Choosing timescales for analysis ” (page 6)

      - a new Materials and Methods subsection “Maximum entropy model with triplet interactions” (page 17)

      - new supplementary figures, which have current labels of:

      - Figure 2 - figure supplement 5

      - Figure 2 - figure supplement 6

      - Figure 2 - figure supplement 7

      - Figure 4 - figure supplement 1

      - Figure 4 - figure supplement 2    

      Public Reviews: 

      Reviewer #1 (Public Review): 

      Summary: 

      In this manuscript, Chen et al. investigate the statistical structure of social interactions among mice living together in the ECO-Hab. They use maximum entropy models (MEM) from statistical physics that include individual preferences and pair-wise interactions among mice to describe their collective behavior. They also use this model to track the evolution of these preferences and interactions across time and in one group of mice injected with TIMP-1, an enzyme regulating synaptic plasticity. The main result is that they can explain group behavior (the probability of being together in one compartment) by a MEM that only includes pair-wise interactions. Moreover, the impact of TIMP-1 is to increase the variance of the couplings J_ij, the preference for the compartment containing food, as well as the dissatisfaction triplet index (DTI). 

      Strengths: 

      The ECO-Hab is a really nice system to ask questions about the sociability of mice and to tease apart sociability from individual preference. Moreover, combining the ECO-Hab with the use of MEM is a powerful and elegant approach that can help statistically characterize complex interactions between groups of mice -- an important question that requires fine quantitative analysis. 

      Weaknesses: 

      However, there is a risk in interpreting these models. In my view, several of the comparisons established in the current study would require finer and more in-depth analysis to be able to establish firmer conclusions (see below). Also, the current study, which closely resembles previous work by Shemesh et al., finds a different result but does not provide the same quantitative model comparison included there, nor a conclusive explanation of why their results are different. In total, I felt that some of the results required more solid statistical testing and that some of the conclusions of the paper were not entirely justified. In particular, the results from TIMP-1 require proper interaction tests (group x drug) which I couldn't find. This is particularly important when the control group has a smaller N than the drug groups.  

      We would like to thank the reviewer and the editor for carefully reading our manuscript, and acknowledging the strength of combining quantitative analysis with semi-naturalistic experiments on mice social behavior. Thanks to the reviewer’s suggestion, we have improved our manuscript by 

      (1) A proper comparison with Shemesh et al., especially to include maximum entropy models with triplet interactions. We show that triplet models overfit even given the entire 10 day dataset, which limits our study to look at pairwise interactions.

      (2) Results on cross-validation for both triplet interaction models and pairwise interaction models, completed on data aggregated over various numbers of days. This analysis showed that pairwise models overfit for single-day data, and led us to learn pairwise models only on 5-day aggregation of data. We have updated the manuscript (both the text and the figures) to present these results.

      (3) New results that subsample the drug groups to the same size as the control group. The conclusions about TIMP-1 treated mice hordes hold when we compare groups of the same size. 

      Recommendations for the authors:  

      Reviewer #1 (Recommendations For The Authors): 

      (1) COMPARISON WITH PREVIOUS WORK. The comparison with the cited previous work of Shemesh et al. 2013 takes novelty away from the use of ME models in characterizing social interactions between groups of mice and casts doubt on the main claim of the manuscript, namely that second-order correlations are sufficient to describe the joint distribution of occupancies of all mice (in particular triplets; there is no quantification of the variance explained by model in panel Fig. 2D). In my view, to make the claim "These results show that pairwise interaction among mice are sufficient to assess the observed collective behavior", the authors should compare models with 2nd and 3rd order interactions and quantify how much of the total correlation can be explained by pair-wise interactions, triplet interactions, and so on. Without a proper model comparison, it is unclear how the authors can make such a claim. One thing observed by Shemesh et al. is that, on average, J_ij are negative. This does not seem to be the case in the current study and the authors should discuss why.

      Finally, the explanation provided in the Discussion about this discrepancy (spatial resolution and different group size) are not completely satisfactory. With more animals, one would imagine that the impact of higher order correlations would increase (and not decrease) as the number of terms of 3rd, 4th, ... order will be very big. I would also think that the same could be true for the spatial scale: assessing interactions with a coarser spatial grid (whole cages in the case of the ECO-Hab) would allow for simultaneous interactions among more mice to happen compared with a situation in which the spatial grid is so small that only a few animals can fit in each subdivision. 

      We thank the reviewer for the recommendation. In the updated version of the manuscript, we explicitly learn the triplet interaction model. We show that because the number of mice in our experiment is much larger than Shemesh et al., a triplet model runs into the problem of overfitting.

      In particular, we found that the test set likelihood increases monotonically when the L2 regularization strength increases, which corresponds to a suppression of the triplet interaction strength (see additional supplementary figure, now Figure 2 - figure supplement 5). More specifically, for the range of regularization strength (β<sub>G</sub>) we tested (10<sup>-1</sup> < β<sub>G</sub> < 10<sup>1</sup>), the maximum test set likelihood is achieved at β<sub>G</sub> = 10<sup>1</sup>, which corresponds to . Notice that those learned triplet interactions are very close to zero. This means we should select a model with pairwise interactions over a model with triplet interactions.
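
      The role of the regularization strength can be made explicit with a schematic objective. The notation below is an assumption for illustration (a state σ_i giving the compartment occupied by mouse i, with fields h, pairwise couplings J and triplet couplings G); the manuscript's own sign conventions and parametrization may differ.

```latex
% Assumed notation, not taken from the manuscript:
% \sigma_i \in \{1,\dots,4\} is the compartment occupied by mouse i,
% and \sigma^{(t)} is the configuration observed in time bin t.
\begin{align}
P(\sigma) &= \frac{1}{Z}\exp\Big(\sum_i h_{i,\sigma_i}
  + \sum_{i<j} J_{ij}\,\delta_{\sigma_i\sigma_j}
  + \sum_{i<j<k} G_{ijk}\,\delta_{\sigma_i\sigma_j}\delta_{\sigma_j\sigma_k}\Big),\\
(\hat h,\hat J,\hat G)(\beta_G) &= \arg\max_{h,J,G}\;
  \frac{1}{T}\sum_{t=1}^{T}\log P\big(\sigma^{(t)}\big)
  \;-\; \beta_G \sum_{i<j<k} G_{ijk}^{2}.
\end{align}
```

      Under this form, a larger β_G penalizes the triplet terms more heavily and drives every G_ijk toward zero, so the fitted model approaches the pairwise one. A held-out likelihood that keeps improving up to the largest tested β_G is therefore evidence in favor of the pairwise model.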

      We have added the above reasoning on page 5, line 166-169 of the Results section with the sentence “Moreover, models with triplet interactions show signs of overfitting under cross-validation, which is mitigated when the triplet interactions are suppressed close to zero using L2 regularization”, and added a new subsection “Maximum entropy model with triplet interactions” in Materials and Methods (page 16-17, line 548 - 563) to describe the protocols of learning and cross-validation for these triplet interaction models.

      Furthermore, we extended the discussion about the difference between Shemesh et al. and our results in the Discussion section. In addition to the difference of spatial scales (chamber vs. location in the chamber), and the difference of group size and its impact on data analysis (N = 15 in our largest cohort and N = 4 in theirs), we added a discussion about the difference of experimental arena, which in Eco-HAB contains connected chambers that mimic the naturalistic environment, and in Shemesh et al. contains a single chamber. The change in the text is on page 12, between line 390 and line 394.

      We thank the reviewers for pointing out that the mean 2nd order interaction in Shemesh et al. is negative. One possibility is that the labeled areas in Shemesh et al. are much smaller than in our Eco-HAB setup, which could suggest that mice do have the space to stay in the same area, which will lead to a negative mean 2nd order interaction.

      (2) ASSESSMENT OF THE TEMPORAL EVOLUTION OF THE INTERACTIONS. The analysis of the stability of the social structure is not conclusive. First, I don't think the authors can conclude that "These results suggest that the structure of social interactions in a cohort as a whole is consistent across all days." If anything is preserved, they would be the statistics of that structure but not the structure itself (i.e., there is no evidence for that). The comparison of the stability of the mean <h_i> and the mean <J_ik> would also require a statistical test to be able to state that "Delta h_i changed more strongly from day to day (Fig. 3D, top panel) relative to the interaction measured as the Jij's." The same is true for the assessment of the TIMP: the differences found in the variability in J_ij and in the mean and variance of the h_i's, look noisy and would require a proper statistical test. The traces look quite variable across days in the control condition, so assessing differences may be difficult. Finally, it would be good to know if the variability in individual J_ij is because they truly vary from day to day or because estimating them within one day is difficult (statistical error). If the reason is the latter, one could decrease the temporal resolution to 2-3 days and see whether the estimated J_ijs are more stable. Perhaps, also for that reason, the summed interaction strength J_i is also more stable, simply because it aggregates more data and has a smaller statistical error.

      We thank the reviewer for pointing out the necessity of assessing the temporal evolution of the interactions. The concern that shorter data durations lead to noisier estimates, together with the reviewer’s Comment 4 about the risk of overfitting, led us to add a new Results subsection “Choosing timescales for analysis” (page 6, line 171 to line 189). Specifically, we assess whether the pairwise maximum entropy model overfits data from K-day aggregates by computing the log-likelihood of both the training sets and the test sets, where each test set is chosen to be 1 hour from the 6-hour data window of each day. We found that for single-day data, the pairwise maximum entropy model overfits. In contrast, for aggregates of 4 or more days of data, the pairwise model does not overfit. This new result is supported by an additional supplementary figure, now Figure 2 - figure supplement 6.
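
      The train/test comparison described above can be sketched as follows. This is a minimal illustration under stated assumptions: one hour of each day's 6-hour window is held out, and a simple categorical model of one mouse's compartment occupancy stands in for the pairwise maximum entropy model, which is not reproduced here. All names, the bin counts and the toy data are hypothetical.

```python
import math
import random


def split_by_hour(hours, test_hour):
    """Return (train, test) index lists, holding out one hour of every day."""
    train = [t for t, h in enumerate(hours) if h != test_hour]
    test = [t for t, h in enumerate(hours) if h == test_hour]
    return train, test


def fit_categorical(states, n_states, pseudocount=1.0):
    """Estimate occupancy probabilities from counts (stand-in for model fitting)."""
    counts = [pseudocount] * n_states
    for s in states:
        counts[s] += 1
    total = sum(counts)
    return [c / total for c in counts]


def mean_log_likelihood(states, probs):
    """Average log-likelihood per time bin under the fitted probabilities."""
    return sum(math.log(probs[s]) for s in states) / len(states)


# Toy data: 5 aggregated days, a 6-hour window per day, 20 time bins per hour.
random.seed(0)
n_days, hours_per_day, bins_per_hour = 5, 6, 20
hours = [h for _ in range(n_days)
         for h in range(hours_per_day)
         for _ in range(bins_per_hour)]
states = [random.randrange(4) for _ in hours]  # compartment index 0..3

train_idx, test_idx = split_by_hour(hours, test_hour=5)
probs = fit_categorical([states[t] for t in train_idx], n_states=4)
ll_train = mean_log_likelihood([states[t] for t in train_idx], probs)
ll_test = mean_log_likelihood([states[t] for t in test_idx], probs)
print(ll_train, ll_test)
```

      A test log-likelihood that falls well below the training value as the aggregation window shrinks is the signature of overfitting that the analysis looks for.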

      To be consistent with later approaches in the manuscript where we consider the effects of TIMP-1, we choose the analysis windows to be data aggregates from 5 days. This means for the experiment that collects a total of 10 days of data, there are only two time points, thus a study of the temporal evolution is limited to comparison between the first 5 days and the last 5 days of the experiment. We describe these results in the Results subsection “Stability of sociability over time” (page 6, line 190 - 220). An additional supplementary figure, now Figure 2 - figure supplement 7, shows in detail the comparison of the inferred interaction strength J and the chamber preference between the first 5 days and the last 5 days for the 4 cohorts of male C57BL6/J mice, which shows the inferred interactions have a consistent variability across first and last 5 days, and across all cohorts. The small value of Pearson’s correlation coefficient shows that the exact structure (pair-specific J<sub>ij</sub>) is not stable. At the end of the Results subsection “Stability of sociability over time”, we explicitly say that “This implies that the maximum entropy model does not infer a social structure that is stable over time.”
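
      The stability check described above amounts to correlating the pair-specific couplings inferred from the two 5-day windows. The sketch below shows one way to do this, assuming each window yields a symmetric coupling matrix; the toy matrices and function names are hypothetical and do not come from the manuscript.

```python
from math import sqrt


def upper_triangle(matrix):
    """Flatten the entries above the diagonal (one value per pair of mice)."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


def coupling_correlation(J_first, J_last):
    """Correlate pairwise couplings inferred from two time windows."""
    return pearson(upper_triangle(J_first), upper_triangle(J_last))


# Toy couplings for three mice in the first and last 5 days (made-up values).
J_first = [[0.0, 0.5, -0.2], [0.5, 0.0, 0.1], [-0.2, 0.1, 0.0]]
J_last = [[0.0, 0.1, 0.3], [0.1, 0.0, -0.4], [0.3, -0.4, 0.0]]
r = coupling_correlation(J_first, J_last)
print(r)
```

      A coefficient near zero means the identity of strongly and weakly coupled pairs is not preserved between windows, even if the overall spread of the couplings is.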

      (3) EFFECT OF TIMP-1. The reported effects of TIMP-1 on the variance of the J_ij seem very small and possibly caused by a few outlier J_ijs (perhaps from one or two animals) which are not present in the control group which seems to have fewer animals (N = 9 minus two mice that died after the surgery vs. N = 14 in the drug group), so the lack of a significant difference in the sigma[J_ij] could simply be due to a smaller N (a test for the interaction group x drug was not done). 

      The clearest effect of TIMP-1 seems to be a change in place preference (h_i) and not the interaction terms (J_ij) (Fig. 3F bottom). But this could be explained by a number of factors that have nothing to do with sociability such as that recovery from surgery makes them eat more/less. The fact that it seems to be present, as recognized by the authors, in the control group with no TIMP-1 and that this effect was not observed in the female group F1, puts into question the specificity and reproducibility of the result. 

      Finally, the effect of TIMP-1 in the DTI would require more statistics (testing the interaction group x drug). The fact that the control group has fewer animals (N = 9 vs. 15 and 13 in the drug groups), and that there is a weaker trend in the DTI of the control group to start high and then decrease, makes this test necessary.  

      Now, after we select a proper timescale to learn the pairwise maximum entropy model, we update the manuscript to present results only on 5-day aggregation of data (see updated Figure 3, updated supplementary figures, Figure 3 - figure supplement 1 and 2). For the variance of the J<sub>ij</sub>, the F-test between different 5-day aggregates before and after TIMP for the male drug group now shows a nonsignificant p-value after applying the Bonferroni correction. For the female drug group, the difference of the J<sub>ij</sub> variance is still significant. 

      To test the effect of different group size on DTI, we subsampled the drug groups by 1) subsampling the inferred interactions learned from the original N = 15 or N = 13 data, or 2) subsampling the mice colocalization data and then inferring the pairwise interactions.  In both cases, the resulting DTI for the subsampled drug group still exhibits the same global pattern as before, i.e. after TIMP-1 injection, DTI significantly increases, which after 5 days falls back to the baseline level. The results are supported by two additional supplementary figures, Figure 4 - figure supplement 1 and 2. This result is referred to in the text in the Results subsection “Impaired neuronal plasticity in the PL affects the structure of social interactions” (page 10, line 333 - 336): “Notably, the difference of the DTI is not due to the control group M4 has less mice, as subsampling both on the level of the inferred interactions (Figure 4 - figure supplement 1) and on the level of the mice locations (Figure 4 - figure supplement 2) give the same DTI for cohorts M1 and F1.”

      (4) MODEL COMPARISON. Any quantitative measure of "goodness" of the model (i.e., comparison of the predictions of the model with triplet frequency as well as the distribution of p(K)) should be cross-validated. In particular, Fig. S2 needs to be cross-validated for the goodness of fit to be properly quantified. Is the analysis shown in Fig. 3F cross-validated? Because otherwise, there is an expected increase in the likelihood simply explained by an increase in the number of parameters of the model (i.e., adding the J_ij's). 

      As discussed in our responses to Comments 1 and 2, we have added cross-validation results in the new supplementary figures, Figure 2 - figure supplement 5 and 6, for which we computed the test-set and training-set likelihood for maximum entropy models with pairwise interactions and also for models with triplet interactions. Figure 2 - figure supplement 6 shows that the pairwise model does not overfit when we consider aggregated data from 4 or more days. 

      (5) EFFECT OF SLEEP. The comparison of p(K) between the data and the model requires a bit more investigation: the model underestimates instances in which almost all mice were in the same compartment (i.e., for K >= 13. p(K)_data >> p(K)_MEM; btw where is the pairwise point p(15) in Fig. 2E and Fig. S4?). Could this be because there were still short periods during the dark cycle in which all mice were asleep in one of the cages? As explained by the authors, sleep introduces very strong higher order correlations between animals as they like sleeping altogether. Knowing whether removing light periods was enough to remove this "sleep contamination" or not, would be important in order to interpret discrepancies between the pairwise model and the data. 

      Figure 2E shows that the pairwise maximum entropy model (in black) overestimates, rather than underestimates, the data (blue circles) for P(K) at large K. In the data, we never observe all 15 mice in the same box; hence P<sub>data</sub>(15) = 0, which does not show up in the log-scaled figure (the same holds for Figure 2 - figure supplement 3). A possible explanation for the pairwise model overestimating P(K) at large K is that the finite size of the box limits the number of mice that can comfortably stay in it. It can also be due to the fact that the number of time points at which K >= 13 is small, which causes an underestimation due to finite data. We have added this interpretation of the discrepancy in P(K) to the section “Pairwise interaction model explains the statistics of social behavior” on page 6, line 160. 

      We thank the Reviewer for raising the point of “sleep contamination”. Indeed, Eco-HAB data, as do data from other 24h-testing behavioral systems, demonstrate distinct differences in activity levels during the light and dark phases of the light-dark cycle (Rydzanicz et al., EMBO Mol. Med., 2024). During the light phases, mice primarily sleep and, as noted, they huddle, so many individuals within the cohort tend to remain in close proximity for extended periods. We acknowledge that including such periods in the analysis could potentially introduce confounding effects to the model due to limited movement and interactions, and this is why we decided not to use this data. However, during the dark phases, mice are highly active, with individuals rarely staying in the same compartment for long periods. Specifically, in the dark phases, while there are occasional instances where a few mice may remain in the same compartment for over 1 hour, the majority exhibit considerable mobility, actively exploring and transitioning between compartments. We see no compelling reason to exclude these periods from our analysis, as such activity aligns with the natural behavioral repertoire of the mice and provides robust data for our model. Furthermore, it is well-established that mammals, including nocturnal species such as mice, are most active shortly after waking, typically at the onset of their active phase (i.e., the beginning of the dark phase). To ensure a conservative approach, we specifically analyzed the first 6 hours of the dark phase when the cumulative number of box visits is at its peak, indicating heightened activity levels. In our view, this period offers an optimal window for studying natural behaviors, including social interactions.

      Additionally, prior studies using the Eco-HAB system have consistently demonstrated that mice engage in social interactions both within the compartments and in the connecting tubes during the dark phase (Puścian et al., eLife, 2016, Winiarski et al. in press). Given this evidence and the observed behavioral dynamics in our data, the likelihood of mice being asleep during the analyzed periods of the dark phase is very low.

      We hope this clarification addresses the reviewer’s concerns and highlights the rationale underpinning our analysis choices. Thank you for raising this important point, which allowed us to provide additional context for our approach.

      (6) COMPARTMENT PREFERENCES. The differences between p(K) across compartments also would require a bit more attention: if a MEM with non-spatially dependent pairwise interactions shows differences across compartments, it must be because of the h_{i,r} terms, which contain a compartment index, right? Wouldn't this imply that the independence model, which always underrepresents data events with large K, already contains the difference in goodness of fit between compartments (1, 3) and (2, 4)? In the plots, it does not look like the goodness of the independent model depends on the compartment (the authors could compare directly the models' predictions between compartments). Moreover, when looking at Fig. 2C, it does not look like the value of h_{i,r} in compartments (1,3) is higher than in (2,4) (if anything, it would be the other way around). How can this be explained? It would be good to know if the difference across compartments comes from differences in the empirical p(K) or in the models' prediction? If the difference is in the data p(K), could it be that the compartments 2-4 showing higher p(K=15) (i.e., larger difference with the pairwise MEM prediction) are those chosen by mice to sleep during the light cycle? If not, what could explain these differences across compartments? Could the presence of food and water explain this difference? 

      The reviewer is correct: in the pairwise MEM, the differences across compartments enter through the box preference h<sub>ir</sub>. A greater h<sub>ir</sub> means that compartment r is more attractive to mouse i. Because boxes 2 and 4 contain food and water, we expect mice to be more attracted to boxes 2 and 4, and this is what we see in Figure 2C, bottom subpanels. To reduce the number of parameters to look at, we introduce an index Δh<sub>i</sub> = h<sub>i2</sub> + h<sub>i4</sub> - h<sub>i1</sub> - h<sub>i3</sub>. This index Δh<sub>i</sub> is found to be mostly positive (see updated Figure 3C), which makes sense because mice are attracted to food and water. 
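
      The index defined above is simple enough to sketch directly. In the snippet below, the function name `delta_h`, the dictionary layout, and all numerical values are illustrative assumptions, not the authors' code or data.

```python
def delta_h(h_i):
    """Food/water preference index for one mouse.

    h_i maps compartment number (1-4) to the inferred preference h_ir.
    Compartments 2 and 4 hold food and water, as stated in the response.
    """
    return h_i[2] + h_i[4] - h_i[1] - h_i[3]

# Made-up preferences for two hypothetical mice.
h = {
    "mouse_a": {1: -0.2, 2: 0.5, 3: -0.1, 4: 0.3},
    "mouse_b": {1: 0.1, 2: 0.0, 3: 0.2, 4: -0.1},
}

for name, h_i in h.items():
    print(name, round(delta_h(h_i), 3))
```

      A positive value means the mouse is, on balance, more attracted to the compartments with food and water than to the other two.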

      Next, we analyze the difference of P(K) across compartments (Figure 2 - figure supplement 3). There is already a difference in the P(K) calculated from empirical data. For example, P(K) in compartment 2 has a maximum at K = 5, while P(K) in compartment 1 has a maximum at K = 3.

      One interesting observation from Figure 2 - figure supplement 3 is that the pairwise model seems to explain P(K) in compartments 1 and 3 better than in compartments 2 and 4. In compartments 2 and 4, the pairwise MEM overestimates P(K) for large K. An alternative MEM could include compartment-specific interaction strengths, but it would also introduce 315 new parameters for a cohort of N = 15 mice.
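
      The figure of 315 can be checked with a short calculation. The sketch below assumes that the current model has one coupling per pair of mice and that the alternative would have one coupling per pair per compartment (4 compartments); the function names are hypothetical.

```python
def n_pairs(n_mice):
    """Number of unordered mouse pairs, i.e. couplings J_ij in the pairwise model."""
    return n_mice * (n_mice - 1) // 2

def extra_couplings(n_mice, n_compartments):
    """Additional parameters if every pair gets one coupling per compartment."""
    return n_pairs(n_mice) * (n_compartments - 1)

print(n_pairs(15))             # 105 couplings in the current pairwise model
print(extra_couplings(15, 4))  # 315 additional couplings
```

      Under these assumptions the total would rise from 105 to 420 couplings, which makes overfitting on 5-day aggregates more likely.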

      MINOR

      (1) A more quantitative comparison between in-cohort sociability and couplings J_ij as well as mean rates and parameters h_i is required. The matrices in Fig. 2C do look similar. So it is not clear how the comparison between these values is contributing to characterizing the correlation structure of the data. 

      The comparison between in-cohort sociability and the couplings J<sub>ij</sub> is given in Figure 2 - figure supplement 2. The key point, that the model with the learned J<sub>ij</sub> reproduces the in-cohort sociability, is shown in Figure 2 - figure supplement 1.

      (2) Analysis of "in-state" probability is not explained. To me, it wasn't obvious what Fig. S5 is showing. I was assuming that this analysis was comparing the prediction of the MEM about the position of each animal at each time point, given its preference (h), pairwise interactions (J_ij), and the position of all other animals and the true position of the animal. But it seems like it is comparing the shape of the distribution of this prob across time between the data and the model (I guess the data had to be temporally binned in coarser temporal periods to yield prob values other than 0s and 1s). Also, not clear whether this analysis was done for each compartment separately and then averaged. This needs explanation. 

      The in-state probability is comparing the prediction of the MEM about the position of each animal at each time point, given its preference (h), pairwise interactions (J<sub>ij</sub>), and the position of all other animals and the true position of the animal. To achieve values between 0s and 1s, we bin the data temporally according to the model-predicted in-state probability. 

      We have added the explanation of in-state probability on page 6, line 163-166. We have also improved the description of in-state probability in Materials and Methods (subsection “Comparing in-state probability between model prediction and data”, line 493 - 503, page 15), and added a pointer from the main text to it. 
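
      The binning step can be illustrated with a minimal sketch. It assumes equal-width bins over the model-predicted probability, which is a choice made here for illustration; the binning scheme actually used is the one described in the Materials and Methods. The function name `binned_in_state` and the numbers are hypothetical.

```python
def binned_in_state(predicted, observed, n_bins=5):
    """Group time points by model-predicted probability and compare with data.

    predicted: model probability that a mouse is in a given box at each time point.
    observed:  1 if the mouse was actually in that box at that time point, else 0.
    Returns one (mean predicted, observed fraction, count) tuple per non-empty bin.
    """
    bins = [[] for _ in range(n_bins)]
    for p, x in zip(predicted, observed):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls in the last bin
        bins[idx].append((p, x))
    summary = []
    for members in bins:
        if members:
            mean_p = sum(p for p, _ in members) / len(members)
            frac = sum(x for _, x in members) / len(members)
            summary.append((mean_p, frac, len(members)))
    return summary

# Made-up predictions and observations for six time points.
predicted = [0.1, 0.15, 0.5, 0.55, 0.9, 0.95]
observed = [0, 0, 1, 0, 1, 1]
for mean_p, frac, count in binned_in_state(predicted, observed):
    print(f"predicted {mean_p:.3f}  observed {frac:.2f}  (n={count})")
```

      If the model describes the data well, the observed fraction in each bin should lie close to the mean predicted probability of that bin.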

      (3) Looks like Fig. S3 is not cited in the text. 

      We added a pointer to Fig. S3 (now Figure 2 - figure supplement 2) in line 154. 

      (4) The authors say that "TIMP-1 release from the TIMP-1-loaded nanoparticles diminishes after 5 days." Does that mean from the day of the injection (4-5 days before the "After Day 1") or five days after reintroduced in the ECO-Hab? 

      It means five days after the mice were re-introduced in the ECO-Hab. We have updated the text in Results/Effects of impairing neuronal plasticity in the PL on subterritory preferences and sociability (the end of the first paragraph of this subsection) to 

      “The choice of five-day aggregated data for analysis is in line both with the proper timescales needed for the pairwise maximum entropy model to not overfit, and with the literature that TIMP-1 release from the TIMP-1-loaded nanoparticles is stable for 7-10 days after injection (Chaturvedi et al., 2014)  (i.e. 2-5 days after the mice are reintroduced to Eco-HAB).” (line 272 - 276, page 9)

      (5) In Methods, the authors should report the final N of each of the three groups. 

      The final N of each group is reported in Table 1 (page 13). In the updated version, we have added a pointer to Table 1 in Materials and Methods/Animals, and in Materials and Methods/Exclude inactive and dead mice from analysis. We have also expanded the caption of Table 1 to clarify the difference between final N and initial N, and added a pointer to Materials and Methods/Exclude inactive and dead mice from analysis.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The authors attempt to fully characterize the immunoglobulin (Ig) heavy (H) chain repertoire of tumor-infiltrating B cells from three different cancer types by identifying the IgH repertoire overlap between these, their corresponding draining lymph nodes (DLNs), and peripheral B cells. The authors claim that B cells from tumors and DLNs have a closer IgH profile than those in peripheral blood and that DLNs are differentially involved with tumor B cells. The claim that tumor-resident B cells are more immature and less specific is made based on the characteristics of the CDR-H3 they express.

      Strengths:

      The authors show great expertise in developing in-house bioinformatics pipelines, as well as using tools developed by others, to explore the IgH repertoire expressed by B cells as a means of better characterizing tumor-associated B cells for the future generation of tumor-reactive antibodies as a therapy.

      Weaknesses:

      This paper needs major editing, both of the text and the figures, because as it stands it is convoluted and extremely difficult to follow. The conclusions reached are often not obvious from the figures themselves. Sufficient a priori details describing the framework for their analyses are not provided, making the outcome of their results questionable and leaving the reader wondering whether the findings are on solid ground.

      The authors are encouraged to explain in more detail the premises used in their algorithms, as well as the criteria they follow to define clonotypes, clonal groups, and clonal lineages, which are currently poorly defined and are crucial elements that may influence their results and conclusions.

      In response to this comment, we significantly expanded the paragraph dedicated to the tumor and non-tumor repertoire overlap and isotype composition. The following sections were added:

      First, we characterized the relative similarity of IGH repertoires derived from tumors, DLN, and PBMC on the individual CDR-H3 clonotype level. We define a clonotype as an instance with an identical CDR-H3 nucleotide sequence and identical V- and J-segment attribution (isotype attribution may be different). Unlike other authors, here we do not pool together similar CDR-H3 sequences to account for hypermutation. (Hypermutation analysis is done separately and is defined as clonal group analysis.)
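
      A minimal sketch of this clonotype definition is given below. The record fields (`cdr3_nt`, `v`, `j`, `isotype`, `count`), the function name, and the toy sequences are assumptions made for illustration; they do not reflect the authors' pipeline or data format.

```python
from collections import defaultdict

def collapse_to_clonotypes(records):
    """Sum read counts per clonotype, keyed by (CDR-H3 nt, V segment, J segment).

    Isotype is deliberately not part of the key, so one clonotype may carry
    several isotypes; these are collected in a set per clonotype.
    """
    counts = defaultdict(int)
    isotypes = defaultdict(set)
    for rec in records:
        key = (rec["cdr3_nt"], rec["v"], rec["j"])
        counts[key] += rec["count"]
        isotypes[key].add(rec["isotype"])
    return dict(counts), dict(isotypes)

# Toy records (made up). The third differs by one nucleotide and the fourth by
# V segment, so neither is pooled with the first clonotype.
records = [
    {"cdr3_nt": "TGTGCGAAAGAT", "v": "IGHV3-23", "j": "IGHJ4", "isotype": "IGHG1", "count": 10},
    {"cdr3_nt": "TGTGCGAAAGAT", "v": "IGHV3-23", "j": "IGHJ4", "isotype": "IGHA1", "count": 4},
    {"cdr3_nt": "TGTGCGAAAGAC", "v": "IGHV3-23", "j": "IGHJ4", "isotype": "IGHM", "count": 2},
    {"cdr3_nt": "TGTGCGAAAGAT", "v": "IGHV3-30", "j": "IGHJ4", "isotype": "IGHG1", "count": 1},
]

counts, isotypes = collapse_to_clonotypes(records)
print(len(counts), "clonotypes")
```

      Because the key is an exact match, sequences that differ by a single nucleotide stay separate here; grouping such variants is left to the clonal group analysis.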

      As overlap metrics are dependent on overall repertoire richness, we normalized the comparison using the same number of the most frequent clonotypes of each isotype from each sample (N = 109). Repertoire data for each sample were split according to the immunoglobulin isotype, and the F2 metric was calculated for each isotype separately and plotted as an individual point.

      We also analyzed the D metric, which represents the relative overlap diversity uninfluenced by clonotype frequency (Dij = dij/(di × dj), where dij is the number of clonotypes present in both samples, while di and dj are the diversities of samples i and j respectively). The results for the D metric are not shown, as they indicate a similar trend to that of the F2 metric. This observation allows us to conclude that tumor IGH repertoires are more similar to the repertoires of lymph nodes than to those of peripheral blood, both when clonotype frequency is taken into account and when it is not.
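
      The D metric as written above can be sketched in a few lines. The snippet assumes that the "diversity" of a sample is its number of distinct clonotypes and that clonotypes are represented by hashable identifiers; the function name and the toy samples are hypothetical.

```python
def overlap_d(clonotypes_i, clonotypes_j):
    """D_ij = d_ij / (d_i * d_j): shared clonotypes over the product of diversities."""
    set_i, set_j = set(clonotypes_i), set(clonotypes_j)
    if not set_i or not set_j:
        raise ValueError("both samples must contain at least one clonotype")
    d_ij = len(set_i & set_j)
    return d_ij / (len(set_i) * len(set_j))

# Toy samples, already truncated to the same number of clonotypes (made up).
tumor = ["a", "b", "c", "d"]
lymph_node = ["b", "c", "d", "e"]
pbmc = ["d", "f", "g", "h"]

print(overlap_d(tumor, lymph_node))  # 3 shared / (4 * 4) = 0.1875
print(overlap_d(tumor, pbmc))        # 1 shared / (4 * 4) = 0.0625
```

      Clonotype frequencies never enter the calculation, which is why D serves as a check on conclusions drawn from the frequency-weighted F2 metric.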

      Having excluded the IGHD gene segment from some of their analyses (at least those related to clonal lineage inference and phylogenetic trees), it is not well explained which region of CDR-H3 is responsible for the charge, interaction strength, and Kidera factors, since in some cases the authors mention that the central part of CDR-H3 consists of five amino acids and in others of seven amino acids.

      We considered different ways of calculating amino acid properties of CDR3 and used different parameters for sample-average and individual-sequence CDR3s. The plots in Fig S6 C are now updated for consistency, and the parameters depicted there are now calculated using the 5 central amino acids, as in other sections.

      How can the authors justify that the threshold for CDR-H3 identity varies according to individual patient data? 

      The ideal similarity threshold may depend on several factors, such as sampling and sequencing depth. For example, imagine one sample that picks up 100% of the sequences of a clonal lineage, each differing by only 1 amino acid from its neighbor, and a lower-quality sample or sequencing run that picks up only every other sequence. The minimal threshold required to gather these sequences into a cluster/clonal group would differ between the two cases (1 aa for the former and ~2 aa for the latter, for single-linkage clustering). In other words, the greater the sequencing depth, the denser the clusters will be. The method of individual threshold tailoring relies on the following: https://changeo.readthedocs.io/en/latest/examples/cloning.html

      Although the individual Kidera factors that are significant in the context of our analysis are described in the text one by one on their first appearance, we have now also added a sentence to describe Kidera factor analysis in general (page 8):
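
      The thought experiment above can be reproduced with a toy single-linkage clustering on Hamming distance. This is a minimal sketch written for illustration only: it is not the Change-O implementation linked above, and the sequences, function names, and the use of an absolute mismatch count as the threshold are assumptions.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))

def single_linkage(seqs, threshold):
    """Cluster sequences, linking any pair with at most `threshold` mismatches."""
    parent = list(range(len(seqs)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps the trees shallow
            i = parent[i]
        return i

    for i in range(len(seqs)):
        for j in range(i + 1, len(seqs)):
            if hamming(seqs[i], seqs[j]) <= threshold:
                parent[find(i)] = find(j)

    clusters = {}
    for i, seq in enumerate(seqs):
        clusters.setdefault(find(i), []).append(seq)
    return list(clusters.values())

# A toy lineage in which each sequence differs from the previous one by 1 residue.
lineage = ["AAAAA", "CAAAA", "CCAAA", "CCCAA", "CCCCA"]
sparse = lineage[::2]  # a shallower sample that captures only every other sequence

print(len(single_linkage(lineage, 1)))  # 1 cluster: the full lineage is connected
print(len(single_linkage(sparse, 1)))   # 3 clusters: the lineage falls apart
print(len(single_linkage(sparse, 2)))   # 1 cluster: a looser threshold reconnects it
```

      The same lineage therefore needs a threshold of 1 when densely sampled and 2 when sparsely sampled, which is the motivation for tailoring the threshold to each dataset.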

      Kidera factors are a set of scores which quantify physicochemical properties of protein sequences (Nakai et al. 1988). 188 physical properties of the 20 amino acids are encoded using dimension reduction techniques.

      Throughout the analyses, the reasons for choosing one type of cancer over another sometimes seem subjective and are not well justified in the text.

      Whenever possible, we pooled all patients with all cancer types together, because the number of available samples did not allow us to draw any significant conclusions comparing between individual cancer types. When analyzing and showing individual patient data, we also did not attempt to depict any cancer-type-specific findings, but it is inevitable that we name a specific cancer type when labelling a sample coming from a specific tumor.

      Overall, the narrative is fragmented. There is a lack of well-defined conclusions at the end of the results subheadings.

      In addition to the changes described above, a conclusion was added to the paragraph describing hypermutation analysis:

      IGHG clonotypes from lung cancer samples show a higher number of hypermutations, possibly reflecting the high mutational load found in lung cancer tissue. For melanoma, another cancer known for high mutational load, no statistically significant difference was found. This may be due to higher variance between melanoma samples, which hinders the analysis, or due to the small sample size.

      The exact same paragraph is repeated twice in the results section.

      Corrected.

      The authors have also failed to synchronise the actual number of main figures with the text, and some panels are included in the main figures that are neither described nor mentioned in the text  (Venn diagram Fig. 2A and phylogenetic tree Fig. 5D). Overall, the manuscript appears to have been rushed and not thoroughly read before submission.

      Corrected.

      Reviewers are forced to wade through, unravel, and validate poorly explained algorithms in order to understand the authors' often bold conclusions.

      We hope that the aforementioned additions to the text, together with the addition to Figure 1, make the narrative easier to understand.

      Reviewer #2 (Public Review):

      Summary:

      The authors sampled the B cell receptor repertoires of Cancers, their draining lymph nodes, and blood. They characterized the clonal makeup of all B cells sampled and then analyzed these clones to identify clonal overlap between tissues and clonal activation as expressed by their mutation level and CDR3 amino acid characteristics and length. They conclude that B cell clones from the Tumor interact more with their draining lymph node than with the blood and that there is less mutation/expansion/activation of B cell clones in Tumors. These conclusions are interesting but hard to verify due to the under-sampling and short sequencing reads as well as confusion as to when analysis is across all individuals or of select individuals.

      Strengths:

      The main strength of their analysis is that they take into account multiple characteristics of clonal expansion and activation and their different modes of visualization, especially of clonal expansion and overlap. The triangle plots once one gets used to them are very nice.

      Weaknesses:

      The data used appears inadequate for the conclusions reached. The authors' sample size of B cells is small and they do not address how it could be sufficient. At such low sampling rates, compounded by the plasmablast bias they mention, it is unclear if the overlap trends they observe show real trends. Analyzing only top clones by size does not solve this issue. As it could be that the top 100 clones of one tissue are much bigger than those of another and that all overlap trends are simply because the clones are bigger in one tissue or the other. i.e there is equal overlap of clones with blood but blood is not sufficiently sampled given its greater diversity and smaller clones.

      Regarding the number of clonotypes to be taken into account, we were limited by the B cell infiltration of tumor samples and our ability to capture their repertoire. However, we use technical replicates at the level of cell suspension to ensure that at least the top clonotypes are consistently sampled. So, this is how the data should be interpreted: as describing the most abundant clones in the repertoire (which may also be considered the most functionally relevant in the case of tumor-infiltrating lymphocytes).

      To analyze the repertoire overlap, we generally use the F2 metric that takes clone size into account - because we think that clone size is an important functional factor. However, we have now added the description of using D metric (does not include clone frequency as a parameter) - which shows exactly the same trend as F2 metric. So, both F2 and D overlap metrics support our conclusion of higher overlap between tumor and LN.

      The following text was added:

      We also analyzed the D metric, which represents the relative overlap diversity uninfluenced by clonotype frequency (Dij = dij/(di × dj), where dij is the number of clonotypes present in both samples, while di and dj are the diversities of samples i and j respectively). The results for the D metric are not shown, as they indicate a similar trend to that of the F2 metric. This observation allows us to conclude that tumor IGH repertoires are more similar to the repertoires of lymph nodes than to those of peripheral blood, both when clonotype frequency is taken into account and when it is not.

      All in all, of course, the deeper the better, but given the data we were able to generate from the samples, this was the best approach to normalization that could be used.

      Similarly, the read length (150bp X2) is too short, missing FWR1 and CDR1 and often parts of FWR2 if CDR3 is long. As the authors themselves note (and as was shown in Zhang 2015 - PMC4811607), this makes mutation analysis difficult.

      Indeed, we are aware of this problem, and therefore only a small part of the manuscript is dedicated to the hypermutation analysis. However, as the CDR-H3 region is the most mutated part, we can still capture a significant diversity of mutations. To address the question of the applicability of our data to the hypermutation phylogeny analysis, we compare the distribution of physicochemical properties along the trees of hypermutation using the 150+150 and 300+300 data from the same donor and the same set of samples. The main conclusion is that for neither the long nor the short datasets could any correlation be found between the physicochemical properties of the CDR-H3 region and the rank of the clonotype on the tree.

      It also makes the identification of V genes and thus clonal identification ambiguous. This issue becomes especially egregious when clones are mutated.

      Again, this would be important for clonotype phylogeny analysis. However, for the simple questions that we address with our clonal group analysis, such as clonal group overlap between tissues etc, we consider this data acceptable, because if any mislabelling of V segment occurs, it is a) rare and b) is equally frequent in all types of samples. Therefore, any conclusions made are still valid despite this technical drawback.

      To directly address the question of mislabelling of V-genes in our data, we looked at the average number of different V-genes attributed to the same nucleotide sequence of the CDR-H3 region in the short (150+150) and long (300+300) datasets from the same donor. Indeed, some ambiguity of V-gene labelling is observed (see below), but we think that it is unlikely to influence any of our cautious conclusions.

      Author response image 1.
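
      The quantity compared between the short and long datasets can be sketched as follows. The function name, the input layout (pairs of CDR-H3 nucleotide sequence and V-gene call), and the toy records are assumptions for illustration and contain no real data.

```python
from collections import defaultdict

def mean_v_genes_per_cdr3(records):
    """Average number of distinct V genes assigned to each CDR-H3 nt sequence."""
    v_by_cdr3 = defaultdict(set)
    for cdr3_nt, v_gene in records:
        v_by_cdr3[cdr3_nt].add(v_gene)
    if not v_by_cdr3:
        raise ValueError("no records")
    return sum(len(v) for v in v_by_cdr3.values()) / len(v_by_cdr3)

# Made-up calls: in the short-read set one CDR-H3 receives two different V calls.
short_reads = [
    ("TGTGCGAAAGAT", "IGHV3-23"),
    ("TGTGCGAAAGAT", "IGHV3-30"),
    ("TGTGCGAGAGGG", "IGHV1-69"),
    ("TGTGCGAGATTT", "IGHV4-34"),
]
long_reads = [
    ("TGTGCGAAAGAT", "IGHV3-23"),
    ("TGTGCGAGAGGG", "IGHV1-69"),
    ("TGTGCGAGATTT", "IGHV4-34"),
]

print(round(mean_v_genes_per_cdr3(short_reads), 3))
print(round(mean_v_genes_per_cdr3(long_reads), 3))
```

      A value of exactly 1 means every CDR-H3 sequence has a single V-gene call; values above 1 indicate ambiguous labelling.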

      Finally, it is not completely clear when the analysis is of single individuals or across all individuals. If it is the former the authors did not explain how they chose the individuals analyzed and if the latter then it is not clear from the figures which measurements belong to which individual (i.e they are mixing measurements from different people).

      We addressed this issue by adding a comment to each figure caption, describing whether a particular figure or panel describes individual or pooled data, and also whether the analysis is done on individual clonotype or clonal group level.

      Also, in case pooled data were used, we added the number of patients that was pooled for a particular type of analysis. This number differs from one type of analysis to the other, because not all the patients had a complete set of tissues, and also not all samples passed a quality check for a particular analysis.

      Here are the numbers listed:

      Fig 2A: N=6 (only those who had all three tissues were considered)

      Fig 2C: N=14 (all)

      Fig 2D: N=14 (all)

      Fig 2E: N=7 (have both tumor and PBMC)

      Fig 2F: N=9 (have both tumor and PBMC)

      Fig 2G: N=9 (have both tumor and PBMC)

      Fig 2H: N=7 (have both tumor and LN)

      Fig 3A: N=14 (all)

      Fig 3B: N=11 (only those with tumor)

      Fig 3E: N=14

      Fig 7F: N=11 (all that have tumor)

      Reviewer #3 (Public Review):

      In multiple cancers, the key roles of B cells are emerging in the tumor microenvironment (TME). The authors of this study appropriately introduce that B cells are relatively under-characterised in the TME and argue correctly that it is not known how the B cell receptor (BCR) repertoires across tumors, lymph nodes, and peripheral blood relate. The authors therefore supply a potentially useful study evaluating the tumor, lymph node, and peripheral blood BCR repertoires and site-to-site as well as intra-site relationships. The authors employ sophisticated analysis techniques, although the description of the methods is incomplete. Among other interesting observations, the authors argue that the tumor BCR repertoire is more closely related to that of draining lymph node (dLN) than the peripheral blood in terms of clonal and isotype composition. Furthermore, the author's findings suggest that tumor-infiltrating B cells (TIL-B) exhibit a less mature and less specific BCR repertoire compared with circulating B cells. Overall, this is a potentially useful work that would be of interest to both medical and computational biologists working across cancer. However, there are aspects of the work that would have benefitted from further analysis and areas of the manuscript that could be written more clearly and proofread in further detail.

      Major Strengths:

      (1) The authors provide a unique analysis of BCR repertoires across tumor, dLN, and peripheral blood. The work provides useful insights into inter- and intra-site BCR repertoire heterogeneity. While patient-to-patient variation is expected, the findings with regard to intra-tumor and intra-dLN heterogeneity with the use of fragments from the same tissue are of importance, contribute to the understanding of the TME, and will inform future study design.

      (2) A particular strength of the study is the detailed CDR3 physicochemical properties analysis which leads the authors to observations that suggest a less-specific BCR repertoire of TIL-B compared to circulating B cells.

      Major Weaknesses:

      The study would have benefitted from a deeper biological interpretation of the data. While given the low number of patients one can plausibly understand a reluctance to speculate about clinical details, there is limited discussion about what may contribute to observed heterogeneity.

      We indeed do not want to overinterpret our data, especially when it comes to differences between types of cancer. On the other hand, extracting similar patterns across different cancer types allows us to pinpoint mechanisms that are more general and do not depend on cancer type. As for the potential source of the intratumoral heterogeneity that we observe, we think that it may come from the selective sampling of tertiary lymphoid structures. We include IHC data for TLS detection in supplementary Fig. 5. Also, tumor mutation clonality may correlate with differential antibody response (i.e. different IGH clonotypes developing to recognize different antigens), as has been previously described for TCRs by the lab of B. Chain in https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6890490/.

      For example, for the analysis of three lymph nodes taken per patient which were examined for inter-LN heterogeneity, there is a lack of information regarding these lymph nodes.

      Unfortunately no clinical information about the lymph nodes was available.

      'LN3' is deemed as exhibiting the most repertoire overlap with the tumor but there is no discussion as to why this may be the case.

      The following sentences in the “LN-to-LN heterogeneity in colorectal cancer” paragraph address this:

      Similarly, an unequal interaction of tumors with DLNs was observed at the level of hypermutating clonal groups.

      Functionally, this may again indicate that within a group of DLNs, nodes are unequal in terms of access to tumor antigens, and this inequality shapes the BCR repertoires within these lymph nodes.

      (2) At times the manuscript is difficult to follow. In particular, the 'Intra-LN heterogeneity' section follows the 'LN-LN heterogeneity in colorectal cancer' section and compares the overlap of LN fragments (LN11, LN21, LN31) with the tumor in two separate patients (Fig 6A). In the previous section (LN-LN), LN11, LN21, LN31 are names given to separate lymph nodes from the same patient. The fragments are referred to as 'LN2' and the nodes in the previous section are referred to similarly. This conflation of naming for nodes and fragments is confusing.

      We corrected this.

      (3) There is a duplicated paragraph in 'Short vs long trees' and the following section 'Productive involvement in hypermutation lineages depends on CDR3 characteristics'.

      Corrected.

      Reviewer #1 (Recommendations For The Authors):

      - Figures:

      Figure 1A lacks resolution

      Corrected

      Figure 2A, Venn diagram: What do the colors indicate?

      Corrected

      Figure 5D, why include this tree when there is no mention of it in the text?

      Described

      Figures 8, 9, and 10 are not to be found. One should not have to figure out that they became supplementary in the end.

      Corrected

      Regarding the physicochemical properties of CDR-H3, what do the authors mean by "the central part"? Do the authors refer to the CDR-H3 loop, and if so, how is that defined when the IGHD gene segment is excluded from the analyses? Is it 5 amino acids (Productive involvement in hypermutating lineages depends on CDR3 characteristics, Page 21/39 in merged document) and (CDR3 properties, Page 8/39 in merged document), or 7 amino acids (Short vs long trees phylogeny analysis, Page 19/39 in merged document)? Please clarify.  

      We considered different ways of calculating the amino acid properties of CDR3 and used different parameters for sample-average and individual-sequence CDR3s. The plots in Fig. S6C are now updated for consistency. The IGHD segment was not excluded from the analysis. The reviewer might be confused by our description of phylogenetic inference, in which an artificial outgroup with the D segment deleted is added to the clonal group to facilitate the inference process. All other sequences were analyzed in their original form with the D segment. This way, we could avoid biases in phylogeny introduced by misassignment of the D gene germline to the outgroup.

      What was the threshold for CDR-H3 identity in their analyses? How can the authors justify that this value changes according to individual patient datasets? (Materials & methods, Clonal lineage inference Page 29/39 in merged document).

      As described earlier, the ideal similarity threshold may depend on several factors, such as sampling and sequencing depth. For example, imagine one sample that picks up 100% of the clonal lineage sequences, which differ by only 1 amino acid from each other, and a lower-quality sample or sequencing run that picks up only every other sequence. The minimal threshold required to accumulate these into a clonotype would differ between the two cases (1 aa for the former and ~2 aa for the latter under single-linkage clustering). The method of individual threshold tailoring relies on this: https://changeo.readthedocs.io/en/latest/examples/cloning.html
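
      The sampling argument above can be illustrated with a toy example. The following is a minimal sketch, not the Change-O implementation: the sequences are invented, and `hamming` and `count_clusters` are hypothetical helper names. It assumes equal-length sequences and treats single-linkage clusters as connected components of the "distance <= threshold" graph.

```python
from itertools import combinations
from typing import Sequence


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def count_clusters(seqs: Sequence[str], threshold: int) -> int:
    """Number of single-linkage clusters at the given distance threshold.

    Two sequences are linked when their Hamming distance is <= threshold;
    clusters are the connected components, tracked with union-find.
    """
    parent = list(range(len(seqs)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(seqs)), 2):
        if hamming(seqs[i], seqs[j]) <= threshold:
            parent[find(i)] = find(j)

    return len({find(i) for i in range(len(seqs))})


# Invented lineage: each sequence differs from the previous one by one residue.
full = ["AAAAA", "CAAAA", "CCAAA", "CCCAA", "CCCCA"]
# Poorer sampling: only every other member of the lineage is observed.
sparse = full[::2]

print(count_clusters(full, 1))    # densely sampled lineage, threshold 1
print(count_clusters(sparse, 1))  # sparse sampling, same threshold
print(count_clusters(sparse, 2))  # sparse sampling, looser threshold
```

      In this toy case the densely sampled lineage forms one cluster at a threshold of 1, whereas the sparsely sampled lineage splits at that threshold and is only rejoined at a threshold of 2.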

      What is the difference between tumor-induced and tumor-infiltrating B cells? How can the authors discriminate between the two? Page 6/39 in the merged document.

      corrected to tumor-infiltrating

      "Added nucleotides" meaning N additions? Page 3/39 in the merged document.

      yes

      How many cancer patients were enrolled? 17 or 14 (Materials & methods page 27/39 in the merged document)? Please clarify.

      In the current project 14 patients were enrolled. The appropriate changes have been introduced in the final text. Supplementary table 2 has been added with the patient data.

      Abbreviations are used without full descriptions.

      Following the reviewer’s recommendation, a list of abbreviations was added to the manuscript, and full descriptions were added to the text at the first mention of each term.

      Use either CDR3 or CDR-H3

      We corrected the text to use the CDR-H3 abbreviation throughout.

      Reviewer #2 (Recommendations For The Authors):

      I would like to start by apologizing for the time it took me to review.

      As I mentioned above there are issues with the clonal sampling of the sequencing length and the statistics in this paper. From reading the paper I am not sure if they are fixable but there are some things that could be tried.

      (1) The authors mention the diversity of their individual analysis - 17 individuals across 3 cancer types, but do not then systematically show us how the different things they measure track across the different individuals and cancer types. It is possible that some trends would be more convincing if we saw them happening again and again across all individuals. But, as I said above, the authors do not identify individuals clearly across all their types of analysis nor do they explain why sometimes they show analysis of specific individuals.

      For the overlap analysis (Fig. 2 except panel B), the CDR3 properties analysis (Fig. 3, Fig. S7), and the clonal group analysis (Fig. 4), we used pooled data from all cancers unless indicated otherwise on the panel. For the overlap analysis, we used a Cytoscape graph (Fig. 2B) for one patient, mp3, to illustrate the findings made on pooled data. For other types of analysis, such as the overlap between individual lymph nodes or tumor fragments (Fig. 5, 6, 7 except panel F), pooled analysis is not possible due to the individual nature of the processes in question.

      (2) The authors do not address how lacking their sampling is nor the distribution of clone sizes in different tissues/ individuals/ subsets. Without such a discussion it is not clear how tenuous or convincing their conclusions are.

      (3) The short sequencing lengths limit the ability to exactly identify V and thus the germline root of clones, whose positions are mutated and clonal association of sequences. The authors appear to be aware of this as they often use the most common ancestor as the start of their analysis... however, again there are inconsistencies that are not clearly described in the text. in creating trees with change they defined roots as the putative germline and at least in most cases also in clone association although in some analyses potentially similar clones were collapsed into clonotypes. Again it is not clear when one method was used or the other and how the choice was made what to choose.

      Here we can only state that we consistently used the approach described in the Methods section, which was the following:

      First, the repertoires were clustered into clonal lineages using the criteria described in “Methods: Clonal lineage inference”. Assuming that each clonotype sequence in a clonal lineage originated from the same ancestor, we then tried to recover the phylogeny. Please note that we refer to individual BCR sequences as “clonotypes”, and to a group of clonotypes that presumably share a common ancestor as a “clonal lineage” or “clonal group”.

      The phylogeny of B-cell hypermutations was inferred for each clonal lineage of size five or more using the maximum likelihood method and the GTR GAMMA nucleotide substitution model. To find the most recent common ancestor (MRCA) or “root” of the tree, we used an artificial outgroup constructed by joining the germline V and J segments defined by MIXCR and added it to the clonal lineage. The D segment was excluded from the outgroup, as there was insufficient confidence in the germline annotations due to its short length and high level of mutations. The rest of the clonotypes were still analyzed in their original form with the D segment in place. Deleting the D segment from the outgroup simply eliminates the risk of biasing the phylogeny by misassigning the D segment germline sequence to the outgroup. The MUSCLE tool was used for multiple sequence alignment and RAxML software was used to build and root phylogenetic trees.

      (4) Beyond the statistical issues mentioned above: the unclear selection of individual examples for comparison and significance testing, the mixing of individuals and cancer types without clear identification, etc. there is in general a lack of coherence in the statistical analysis performed. specifically:

      (a) the authors should choose one cutoff for significance (0.01 for instance) and then just mention when things are significant and when not. There is no need and it is confusing to add the p-value for every comparison. P-values are not good measures of effect size.

      We corrected the figures and kept p-values only where they are below the significance threshold.

      (b) the Bonferroni correction used is not well characterized. For an alpha of 0.01 in Figures 3 C and D how many tests were performed?

      The number of tests used for the Bonferroni-Holm correction equals the number of comparisons on the heatmap: 39 for each heatmap in Fig. 3C and 13 for Fig. 3D.
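
      For readers unfamiliar with the procedure, the step-down logic of a Bonferroni-Holm correction can be sketched as follows. This is an illustrative implementation, not the code used for the figures: `holm_reject` is a hypothetical name, the p-values are invented, and only the number of tests (39) and the alpha of 0.01 are taken from the exchange above.

```python
def holm_reject(p_values, alpha=0.01):
    """Holm step-down procedure.

    Returns a list of booleans in the original order (True = reject H0).
    The k-th smallest p-value (k = 1..m) is compared with alpha / (m - k + 1);
    testing stops at the first p-value that exceeds its threshold.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):  # rank = k - 1
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # all remaining (larger) p-values are not rejected
    return reject


# Invented p-values for 39 comparisons, as on one heatmap.
p_values = [0.0001, 0.0002, 0.004] + [0.5] * 36
decisions = holm_reject(p_values, alpha=0.01)
print(sum(decisions), "of", len(p_values), "comparisons remain significant")
```

      With 39 tests, the smallest p-value must fall below 0.01/39 (about 0.00026) to be called significant, which is why an uncorrected p-value of 0.004 does not survive the correction in this toy input.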

      Finally some minor issues -

      (1) Not all acronyms are described, for instance, TME and TIL. The first time any acronym is used it should be spelled out.  -> Katya B: list of abbreviations

      (2) The figure captions are not all there...

      (a) there is no caption for Figure 3E.

      corrected

      (b) there are Figure 7 F and G panels but no Figure 7E panel and Figure F is described after Figure G.

      corrected

      (3) A few problems with wording -

      (a) bottom paragraph of page 3 - instead of :

      "different lymph nodes from one draining lymph node pool may be more or less involved"

      Corrected to "different lymph nodes from one draining lymph node pool may be differentially involved"

      (b) figure caption for figure 3a: instead of:

      "CDR3 are on average significantly higher in tumor"

      Corrected to "CDR3 are on average significantly longer in tumor"

      Reviewer #3 (Recommendations For The Authors):

      - FIG1A - Suggest expanding the legend to include more information on the computational analyses.

      added

      - PAGE SIX: Suggest adding a table or some text on patient characteristics. Numbers of unique clonotypes per sample etc. Are there differences in age/sex that need to be considered? Some clonotype information is available in S1 but some summary and statistics would be appreciated.

      Added patient information as Supplementary table 2.

      - PAGE SIX: F2 Metric, suggestion to explain why this was used vs. other metrics.

      We expanded the following paragraph to include information about F2 metric and D metric, and the reason why we are using F2.

      Repertoire data for each sample were split according to the immunoglobulin isotype, and the F2 metric was calculated for each isotype separately and plotted as an individual point. We used the repertoire overlap metric F2 (clonotype-wise sum of geometric mean frequencies of overlapping clonotypes), which accounts for both the number and frequency of overlapping clonotypes (Fig. 2A). As expected, significantly lower overlaps were observed between the IGH repertoires of peripheral blood and tumors compared to LN/tumor overlaps. The LN/PBMC overlap also tended to be lower, but the difference was not statistically significant. We also analyzed the D metric, which represents the relative overlap diversity uninfluenced by clonotype frequency ($D_{ij} = d_{ij}/(d_i \cdot d_j)$, where $d_{ij}$ is the number of clonotypes present in both samples, while $d_i$ and $d_j$ are the diversities of samples $i$ and $j$, respectively). The results for the D metric are not shown, as they indicate a similar trend to that of the F2 metric. This observation allows us to conclude that tumor IGH repertoires are more similar to the repertoires of tumor-draining LNs than to those of peripheral blood, both when clonotype frequency is taken into account and when it is not.
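
      The two overlap metrics described above can be written out in a few lines. The sketch below follows the verbal definitions in the paragraph and is not the VDJtools implementation; the function names and the toy repertoires (clonotype labels and frequencies) are invented for illustration, and each repertoire is assumed to be a mapping from clonotype to relative frequency.

```python
from math import sqrt


def f2_overlap(a: dict, b: dict) -> float:
    """F2: sum over shared clonotypes of the geometric mean of their frequencies."""
    shared = a.keys() & b.keys()
    return sum(sqrt(a[c] * b[c]) for c in shared)


def d_overlap(a: dict, b: dict) -> float:
    """D: number of shared clonotypes divided by the product of the diversities."""
    shared = a.keys() & b.keys()
    return len(shared) / (len(a) * len(b))


# Invented repertoires: clonotype label -> relative frequency.
tumor = {"c1": 0.50, "c2": 0.25, "c3": 0.25}
lymph_node = {"c1": 0.50, "c2": 0.25, "c4": 0.25}
blood = {"c1": 0.01, "c5": 0.99}

print(f2_overlap(tumor, lymph_node), d_overlap(tumor, lymph_node))
print(f2_overlap(tumor, blood), d_overlap(tumor, blood))
```

      In this toy input the tumor/LN pair scores higher than the tumor/blood pair on both metrics, because F2 weights shared clonotypes by frequency while D only counts them.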

      - PAGE SIX: Make clear in the text that mp3 is a patient.

      Added “melanoma patient mp3”

      - PAGE EIGHT: Suggest explaining kidera factors at first use - not all readers will know what they are.

      We expanded the following paragraph to add more information about Kidera factors:

      To explore CDR-H3 physicochemical properties, we calculated the mean charge, hydropathy, predicted interaction strength, and Kidera factors 1-9 (kf1-kf9) for the five central amino acids of the CDR-H3 region for the 100 most frequent clonotypes of each sample using VDJtools. Kidera factors are a set of scores that quantify the physicochemical properties of protein sequences (ref. 61). They encode 188 physical properties of the 20 amino acids, using dimension reduction techniques, as 9 factors that quantitatively characterize the physicochemical properties of amino acid sequences.
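
      As a rough illustration of a per-sequence property calculated over the five central amino acids, consider the sketch below. It is not the VDJtools code: the centring rule in `central_window`, the simplified charge table (+1 for K and R, -1 for D and E, 0 otherwise), and the example sequence are all assumptions made for this example. Kidera factors would be averaged in the same way, given a table of the nine factor values per amino acid.

```python
# Simplified side-chain charges; an assumption for illustration only.
CHARGE = {"K": 1.0, "R": 1.0, "D": -1.0, "E": -1.0}


def central_window(cdr3: str, width: int = 5) -> str:
    """Return the `width` residues centred on the middle of the sequence.

    Sequences no longer than `width` are returned unchanged.
    """
    if len(cdr3) <= width:
        return cdr3
    start = (len(cdr3) - width) // 2
    return cdr3[start:start + width]


def mean_charge(cdr3: str, width: int = 5) -> float:
    """Mean simplified charge over the central window of a CDR-H3 sequence."""
    window = central_window(cdr3, width)
    return sum(CHARGE.get(aa, 0.0) for aa in window) / len(window)


example = "CARDKERDYW"  # invented CDR-H3 amino acid sequence
print(central_window(example))
print(mean_charge(example))
```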

      - Fig 5D is not referred to.

      Corrected

    1. Author response:

      The following is the authors’ response to the original reviews

      eLife assessment 

      This valuable study aims to present a mathematical theory for why the periodicity of the hexagonal pattern of grid cell firing would be helpful for encoding 2D spatial trajectories. The idea is supported by solid evidence, but some of the comparisons of theory to the experimental data seem incomplete, and the reasoning supporting some of the assumptions made should be strengthened. The work would be of interest to neuroscientists studying neural mechanisms of spatial navigation. 

      We thank the reviewers for this assessment. We have addressed the comments made by reviewers and believe that the revised manuscript has theoretical and practical implications beyond the subfield of neuroscience concerned with mechanisms underpinning spatial memory and spatial navigation. Specifically, the demonstration that four simple axioms beget the spatial firing pattern of grid cells is highly relevant for the field of artificial intelligence and neuromorphic computing. This relevance stems from the fact that the four axioms define a set of four simple computational algorithms that can be implemented in future work in grid cell-inspired computational algorithms. Such algorithms will be impactful because they can perform path integration, a function that is independent of an animal’s or agent’s location and therefore generalizable. Moreover, because of the functional organization of grid cells into modules, the algorithm is also scalable. Generalizability and scalability are two highly sought-after properties of brain-inspired computational frameworks. We also believe that the question why grid cells emerge in the brain is a fundamental one. This manuscript is, to our knowledge, the first one that provides an interpretable and intuitive answer to why grid cells are observed in the brain. 

      Before addressing each comment, we would like to point out that the first sentence of the assessment appears misphrased. The study does not aim to present a theory for why the periodicity in grid cell firing would be helpful for encoding 2D spatial trajectories. To present a theory “for why grid cell firing would be helpful for encoding 2D trajectories”, one assumes the existence of grid cells a priori. Instead of assuming the existence of grid cells and deriving a computational function from grid cells, our study derives grid cells from a computational function, as correctly summarized by reviewers #1 and #3 in their individual statements. In contrast to previous normative models, we prove mathematically that spatial periodicity in grid cell firing is implied by a sequence code of trajectories. If the brain uses cell sequences to code for trajectories, spatially periodic firing must emerge. As correctly pointed out by reviewer #1, the underlying assumptions of this study are that the brain codes for trajectories and that it does so using cell sequences. In response to comments by reviewer #1, we now discuss these two assumptions more rigorously.

      Public Reviews:

      Reviewer #1 (Public Review): 

      Rebecca R.G. et al. set to determine the function of grid cells. They present an interesting case claiming that the spatial periodicity seen in the grid pattern provides a parsimonious solution to the task of coding 2D trajectories using sequential cell activation. Thus, this work defines a probable function grid cells may serve (here, the function is coding 2D trajectories), and proves that the grid pattern is a solution to that function. This approach is somewhat reminiscent in concept to previous works that defined a probable function of grid cells (e.g., path integration) and constructed normative models for that function that yield a grid pattern. However, the model presented here gives clear geometric reasoning to its case. 

      Stemming from 4 axioms, the authors present a concise demonstration of the mathematical reasoning underlying their case. The argument is interesting and the reasoning is valid, and this work is a valuable addition to the ongoing body of work discussing the function of grid cells. 

      However, the case uses several assumptions that need to be clearly stated as assumptions, clarified, and elaborated on: Most importantly, the choice of grid function is grounded in two assumptions: 

      (1) that the grid function relies on the activation of cell sequences, and 

      (2) that the grid function is related to the coding of trajectories. While these are interesting and valid suggestions, since they are used as the basis of the argument, the current justification could be strengthened (references 28-30 deal with the hippocampus, reference 31 is interesting but cannot hold the whole case). 

      We thank this reviewer for the overall positive and constructive criticism. We agree with this reviewer that our study rests on two premises, namely that 1) a code for trajectories exists, and 2) this code is implemented by cell sequences. We now discuss and elaborate on the data in the literature supporting the two premises.

      In addition to the work by Zutshi et al. (reference 31 in the original manuscript), we have now cited additional work presenting experimental evidence for sequential activity of neurons in the medial entorhinal cortex, including sequential activity of grid cells.

      We have added the following paragraph to the Discussion section:

      “Recent studies provided compelling evidence for sequential activity of neurons representing spatial trajectories. In particular, Gardner et al. (2022) demonstrated that the sequential activity of hundreds of simultaneously recorded grid cells in freely foraging rats represented spatial trajectories. Complementary preliminary results indicate that grid cells exhibit left-right-alternating “theta sweeps,” characterized by temporally compressed sequences of spiking activity that encode outwardly oriented trajectories from the current location (Vollan et al., 2024).

      The concept of sequential grid cell activity extends beyond spatial coding. In various experimental contexts, grid cells have been shown to encode non-spatial variables. For instance, in a stationary auditory task, grid cells fired at specific sounds along a continuous frequency axis (Aronov et al., 2017). Further studies revealed that grid cell sequences also represent elapsed time and distance traversed, such as during a delay period in a spatial alternation task (Kraus et al., 2015). Similar findings were reported for elapsed time encoded by grid cell sequences in mice performing a virtual “Door Stop” task (Heys and Dombeck, 2018).

      Additionally, spatial trajectories represented by temporally compressed grid cell sequences have been observed during sleep as replay events (Ólafsdóttir et al., 2016; O’Neill et al., 2017). Collectively, these studies demonstrate that sequential activity of neurons within the MEC, particularly grid cells, consistently encodes ordered experiences, suggesting a fundamental role for temporal structure in neuronal representations.

      The theoretical underpinnings of grid cell activity coding for ordered experiences have been explored previously by Rueckemann et al. (2021) who argued that the temporal order in grid cell activation allows for the construction of topologically meaningful representations, or neural codes, grounded in the sequential experience of events or spatial locations. However, while Rueckemann et al. argue that the MEC supports temporally ordered representations through grid cell activity, our findings suggest an inverse relationship: namely, that grid cell activity emerges from temporally ordered spatial experiences. Additional studies demonstrate that hippocampal place cells may derive their spatial coding properties from higher-order sequence learning that integrates sensory and motor inputs (Raju et al., 2024) and that hexagonal grids, if assumed a priori, optimally encode transitions in spatiotemporal sequences (Waniek, 2018).

      Together, experimental and theoretical evidence demonstrate the significance of sequential neuronal activity within the hippocampus and entorhinal cortex as a core mechanism for representing both spatial and temporal information and experiences.”

      The work further leans on the assumption that sequences in the same direction should be similar regardless of their position in space, it is not clear why that should necessarily be the case, and how the position is extracted for similar sequences in different positions. 

      We thank this reviewer for giving us the opportunity to clarify this point. We define a trajectory as a path taken in space (Definition 6). By this definition, a code for trajectories is independent of the animal’s spatial location. This is consistent with the definition of path integration, which is also independent of an animal’s spatial location. If the number of neurons is finite (Axiom #4) and the space is large, sequences must eventually repeat in different locations. This results in neural sequences coding for the same directions being identical at different locations. We have clarified this point under the new Remark 6.1 in the Results section of the revised manuscript:

      “Remark 6.1. Note that a code for trajectories is independent of the animal’s spatial location, consistent with the definition of path integration. This implies that, if the number of neurons is finite (Axiom #4) and the space is large, sequences must eventually repeat in different locations, resulting in neural sequences coding for the same trajectories at different locations.”

      The formal proof was already included in the original manuscript: “Generally speaking, starting in a firing field of element i and going along any set of firing fields, some element must eventually become active again since the total number of elements is finite by axiom 4. Once there is a repeat of one element’s firing field, the whole sequence of firing fields of all elements must repeat by axiom 1. More specifically, if we had a sequence 1,2, … , k, 1, t of elements, then 1,2 and 1, t both would code for traveling in the same direction from element 1, contradicting axiom 1.”

      Further: “More explicitly, assuming axioms 1 and 4, the firing fields of trajectory-coding elements must be spatially periodic, in the sense that starting at any point and continuing in a single direction, the initial sequence of locally active elements must eventually repeat with a repeat length of at least 3”.
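
      The counting argument quoted above can be illustrated with a toy simulation. The sketch below is not part of the manuscript: it models travel in one fixed direction as a hypothetical `successor` map on a finite set of element labels, which captures the determinism attributed to axiom 1, and it additionally assumes that the map is one-to-one so that the walk returns to its starting element. The function name and the three-element example are invented.

```python
def walk_until_repeat(successor: dict, start):
    """Follow `successor` from `start` until an element becomes active again.

    Returns (path, first_index, period): the elements visited before the
    repeat, the position in `path` of the element that recurs, and the
    length of the repeating block. Terminates because the set is finite.
    """
    seen = {start: 0}
    path = [start]
    current = start
    while True:
        current = successor[current]
        if current in seen:
            return path, seen[current], len(path) - seen[current]
        seen[current] = len(path)
        path.append(current)


# Hypothetical successor map for one direction over three elements.
successor = {1: 2, 2: 3, 3: 1}
path, first_index, period = walk_until_repeat(successor, start=1)
print(path, first_index, period)
```

      Because the element set is finite, the loop must terminate, and with a one-to-one successor map the recurring element is the starting one, so the sequence of active elements is periodic from the outset.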

      Regarding the question how an animal’s position is extracted for similar sequences in different positions, we agree with this reviewer that this is an important question when investigating the contributions of grid cells to the coding of space. However, since a code for trajectories is independent of spatial location, the question of how to extract an animal’s position from a trajectory code is irrelevant for this study.

      While a trajectory code by neural sequences begets grid cells, a spatial code by neural sequences does not. Nevertheless, grid cells could contribute to the coding of space (in addition to providing a trajectory code). However, while experimental evidence from studies with rodents and human subjects and theoretical work demonstrated the importance of grid cells for path integration (Fuhs and Touretzky, 2006; McNaughton et al., 2006; Moser et al., 2017), experimental studies have shown that grid cells contribute little to the coding of space by place cells (Hales et al., 2014). Yet, theoretical work (Mathis et al., 2012) showed that coherent activity of grid cells across different modules can provide a code for spatial location that is more accurate than spatial coding by place cells in the hippocampus. Importantly, such a spatial code by coherent activity across grid cell modules does not require location-dependent differences in neural sequences.

      The authors also strengthen their model with the requirement that grid cells should code for infinite space. However, the grid pattern anchors to borders and might be used to code navigated areas locally. Finally, referencing ref. 14, the authors claim that no existing theory for the emergence of grid cell firing unifies the experimental observations on periodic firing patterns and their distortions under a single framework. However, that same reference presents exactly that - a mathematical model of pairwise interactions that unifies experimental observations. The authors should clarify this point. 

      We thank this reviewer for this valuable feedback. We agree that grid cells anchor to borders and may be used to code navigated areas locally. In fact, the trajectory code performs a local function, namely path integration, and the global grid pattern can only emerge from performing this local computation if the activity of at least one grid unit or element (we changed the wording from unit to element based on feedback from reviewer #3) is anchored to either a spatial location or a border. Yet, the trajectory code itself does not require anchoring to a reference frame to perform local path integration. Because of the local nature of the trajectory code, path integration can be performed locally without the emergence of a global grid pattern. This has been shown experimentally in mice performing a path integration task where changes in the location of a task-relevant object resulted in translations of grid patterns in single trials. Although no global grid pattern was observed, grid cells performed path integration locally within the multiple reference frames defined by the task-relevant object, and grid patterns were visible when the changes in the reference frames were accounted for in computing the rate maps (Peng et al., 2023). The data by Peng et al. (2023) confirm that the anchoring of the grid pattern to borders and the emergence of the global pattern are not required for local coding of trajectories. The global pattern emerges only when the reference frame does not change. However, this global pattern itself might not serve any function. According to the trajectory code model, the beguiling grid pattern is merely a byproduct of a local path integration function that is independent of the animal’s current location (which makes the code generalizable across space).

      The reviewer is correct that, if the reference frame used to anchor the grid pattern did not change in infinite space, the trajectory code model of grid cell firing would predict an infinite global pattern. But does the proof implicitly assume that space is infinite? The trajectory code model makes the quantitative prediction that the field size increases linearly with an increase in grid spacing (the distance between two fields). If the field size remains fixed, periodicity will emerge in finite spaces that are larger than the grid spacing. We have clarified these points in the revised manuscript:

      “Notably, the trajectory code itself does not require anchoring to a reference frame to perform local path integration. Because of the local nature of the trajectory code, path integration can be performed locally without the emergence of a global grid pattern. This has been shown experimentally in mice performing a path integration task where changes in the location of a task-relevant object resulted in translations of grid patterns in single trials (Peng et al., 2023). Although no global grid pattern was observed because the reference frame was not fixed in space, grid cells performed path integration locally within the reference frame defined by the moving task-relevant object, and grid patterns were visible when the changes in the reference frames were accounted for in computing the rate maps”.

      Regarding how the emergence of grid cells from a trajectory code relates to the theory of a local code by grid cells brought forward by Ginosar et al. (ref. 14), we argue that the local computational function suggested by Ginosar et al. is to provide a code for trajectories. The perspective article by Ginosar et al. provides an excellent review of the experimental data on grid cells that point to grid cells performing a local function (see also Kate Jeffery’s excellent review article (Jeffery, 2024) on the mosaic structure of the mammalian cognitive map.) Assuming the existence of grid cells a priori, Ginosar et al. then propose three possible functions of grid cells, all of which are consistent with the trajectory code model of grid cell firing. Yet, the perspective article remains agnostic, in our opinion, on the exact nature of the local computation that is carried out by grid cells. But without knowing the local computation underlying grid cell function, a unifying theory explaining the emergence of grid cells cannot be considered complete. In contrast, our manuscript identifies the local computational function as a trajectory code by cell sequences. We have clarified these points in the revised manuscript:

      “The influential hypothesis that grid cells provide a universal map for space is challenged by experimental data suggesting a yet to be identified local computational function of grid cells (Ginosar et al., 2023; Jeffery, 2024). Here, we identify this local computational function as a trajectory code.”

      The mathematical model of pairwise interactions described by Ginosar et al. is fundamentally different from the mathematical framework developed in our manuscript. The mathematical model by Ginosar et al. describes how pairwise interactions between already existent grid fields can explain distortions in the grid pattern caused by the environment’s geometry, reward zones, and dimensionality. However, the model does not explain why there is a grid pattern in the first place. In contrast, our trajectory model provides an explanation for why grid cells may exist by demonstrating that a grid pattern emerges from a trajectory code by cell sequences. We stand by our assessment that a unifying theory of grid cells is not complete if it takes the existence of the grid pattern for granted.

      Reviewer #2 (Public Review): 

      Summary: 

      In this work, the authors consider why grid cells might exhibit hexagonal symmetry - i.e., for what behavioral function might this hexagonal pattern be uniquely suited? The authors propose that this function is the encoding of spatial trajectories in 2D space. To support their argument, the authors first introduce a set of definitions and axioms, which then lead to their conclusion that a hexagonal pattern is the most efficient or parsimonious pattern one could use to uniquely label different 2D trajectories using sequences of cells. The authors then go through a set of classic experimental results in the grid cell literature - e.g. that the grid modules exhibit a multiplicative scaling, that the grid pattern expands with novelty or is warped by reward, etc. - and describe how these results are either consistent with or predicted by their theory. Overall, this paper asks a very interesting question and provides an intriguing answer. However, the theory appears to be extremely flexible and very similar to ideas that have been previously proposed regarding grid cell function. 

      We thank this reviewer for carefully reading the manuscript and their valuable feedback which helps us clarify major points of the study. One major clarification is that the theoretical/axiomatic framework we put forward does not assume grid cells a priori. In contrast, we start by hypothesizing a computational function that a brain region shown to be important for path integration likely needs to solve, namely coding for spatial trajectories. We go on to show that this computational function begets spatially periodic firing (grid maps). By doing so, we provide mathematical proof that grid maps emerge from solving a local computational function, namely spatial coding of trajectories. Showing the emergence of grid maps from solving a local computational function is fundamentally different from many previous studies on grid cell function, which assign potential functions to the existing grid pattern. As we discuss in the manuscript, our work is similar to using normative models of grid cell function. However, in contrast to normative models, we provide a rigorous and interpretable mathematical framework which provides geometric reasoning to its case.

      Major strengths: 

      The general idea behind the paper is very interesting - why *does* the grid pattern take the form of a hexagonal grid? This is a question that has been raised many times; finding a truly satisfying answer is difficult but of great interest to many in the field. The authors' main assertion that the answer to this question has to do with the ability of a hexagonal arrangement of neurons to uniquely encode 2D trajectories is an intriguing suggestion. It is also impressive that the authors considered such a wide range of experimental results in relation to their theory.  

      We thank this reviewer for pointing out the significance of the question addressed by our manuscript.

      Major weaknesses: 

      One major weakness I perceive is that the paper overstates what it delivers, to an extent that I think it can be a bit confusing to determine what the contributions of the paper are. In the introduction, the authors claim to provide "mathematical proof that ... the nature of the problem being solved by grid cells is coding of trajectories in 2-D space using cell sequences. By doing so, we offer a specific answer to the question of why grid cell firing patterns are observed in the mammalian brain." This paper does not provide proof of what grid cells are doing to support behavior or provide the true answer as to why grid patterns are found in the brain. The authors offer some intriguing suggestions or proposals as to why this might be based on what hexagonal patterns could be good for, but I believe that the language should be clarified to be more in line with what the authors present and what the strength of their evidence is. 

      We thank this reviewer for this assessment. While there is ample experimental evidence demonstrating the importance of grid cells for path integration, we agree with this reviewer that there may be other computational functions that may require or largely benefit from the existence of grid cells. We now acknowledge the fact that we have provided a likely teleological cause for the emergence of grid cells and that there might be other causes for the emergence of grid cells. We have changed the wording in the abstract and discussion sections to acknowledge that our study does provide a likely teleological cause. We choose “likely” because the computational function – trajectory coding – from which grid maps emerge is very closely associated with path integration, which numerous experimental and theoretical studies associate with grid cell function.

      Relatedly, the authors claim that they find a teleological reason for the existence of grid cells - that is, discover the function that they are used for. However, in the paper, they seem to instead assume a function based on what is known and generally predicted for grid cells (encode position), and then show that for this specific function, grid cells have several attractive properties. 

      We agree with this reviewer that we leveraged what is known about grid cells, in particular their importance for path integration, in finding a likely teleological cause. However, the major significance of our work is that we demonstrate that coding for spatial trajectories requires spatially periodic firing (grid cells). This is very different from assuming the existence of grid cells a priori and then showing that grid cells have attractive, if not optimal, properties for this function. If we had shown that grid cells optimized a code for trajectories, this reviewer would be correct: we would have suggested just another potential function of grid cells. Instead, we provide both proof and intuition that trajectory coding by cell sequences begets grid cells (not the other way around), thereby providing a likely teleological cause for the emergence of grid cells. As stated above, we clarified in the revised manuscript that we provide a likely teleological cause which requires additional experimental verification.

      There is also some other work that seems very relevant, as it discusses specific computational advantages of a grid cell code but was not cited here: https://www.nature.com/articles/nn.2901

      We thank this reviewer for pointing us toward this article by Sreenivasan and Fiete (2011). The revised manuscript now cites this article in the Introduction and Discussion sections. We agree that the article discusses a specific computational advantage of a population code by grid cells, namely unprecedented robustness to noise in estimating the location from the spiking information of noisy neurons. However, the work by Sreenivasan and Fiete (2011) differs from our work in that the authors assume the existence of grid cells a priori.

      In addition, we now discuss other relevant work, namely work on the conformal isometry hypothesis by Schøyen et al. (2024) and Xu et al. (2024), published as preprints after publication of the first version of our manuscript, as well as work on transition scale-spaces by Nicolai Waniek. Xu et al. (2024) and Schøyen et al. (2024) investigate conformal isometry in the coding of space by grid cells. Conformal isometry means that trajectories in neural space map trajectories in physical space. Xu et al. (2024) show that the conformal isometry hypothesis can explain the spatially periodic firing pattern of grid cells. Schøyen et al. (2024) further show that a module of seven grid cells emerges if space is encoded as a conformal isometry, ensuring equal representation in all directions. While the work by Xu et al. (2024) and Schøyen et al. (2024) arrives at conclusions very similar to those stated in the current manuscript, the conformal isometry hypothesis provides only a partial answer to why grid cells exist because it doesn’t explain why conformal isometry is important or required. In contrast, a sequence code of trajectories provides an intuitive answer to why such a code is important for animal behavior. Furthermore, we included the work by Nicolai Waniek (2018, 2020) in the Discussion, who demonstrated that the hexagonal arrangement of grid fields is optimal for coding transitions in space.

      The paragraph added to the Discussion reads as follows:

      “As part of the proof that a trajectory code by cell sequences begets spatially periodic firing fields, we proved that the centers of the firing fields must be arranged in a hexagonal lattice. This arrangement implies that the neural space is a conformally isometric embedding of physical space, so that local displacements in neural space are proportional to local displacements of an animal or agent in physical space, as illustrated in Figure 5. This property has recently been introduced in the grid cell literature as the conformal isometry hypothesis (Schøyen et al., 2024; Xu et al., 2024). Strikingly, Schøyen et al. (2024) arrive at similar if not identical conclusions regarding the geometric principles in the neural representations of space by grid cells.”

      A second major weakness was that some of the claims in the section in which they compared their theory to data seemed either confusing or a bit weak. I am not a mathematician, so I was not able to follow all of the logic of the various axioms, remarks, or definitions to understand how the authors got to their final conclusion, so perhaps that is part of the problem. But below I list some specific examples where I could not follow why their theory predicted the experimental result, or how their theory ultimately operated any differently from the conventional understanding of grid cell coding. In some cases, it also seemed that the general idea was so flexible that it perhaps didn't hold much predictive power, as extra details seemed to be added as necessary to make the theory fit with the data. 

      I don't quite follow how, for at least some of their model predictions, the 'sequence code of trajectories' theory differs from the general attractor network theory. It seems from the introduction that these theories are meant to serve different purposes, but the section of the paper in which the authors claim that various experimental results are predicted by their theory makes this comparison difficult for me to understand. For example, in the section describing the effect of environmental manipulations in a familiar environment, the authors state that the experimental results make sense if one assumes that sequences are anchored to landmarks. But this sounds just like the classic attractor network interpretation of grid cell activity - that it's a spatial metric that becomes anchored to landmarks. 

      We thank this reviewer for giving us the opportunity to clarify in what aspects the ‘sequence code of trajectories’ theory of grid cell firing differs from the classic attractor network models, in particular the continuous attractor network (CAN) model. First of all, the CAN model is a mechanistic model of grid cell firing that is specifically designed to simulate spatially periodic firing of grid cells in response to velocity inputs. In contrast, the sequence code of trajectories theory of grid cell firing resembles a normative model showing that grid cells emerge from performing a specific function. However, in contrast to previous normative models, the sequence code of trajectories model grounds the emergence of grid cell firing in a mathematical proof and in both geometric reasoning and intuition. The proof demonstrates that the emergence of grid cells is the only solution to coding for trajectories using cell sequences. The sequence code of trajectories model of grid cell firing is agnostic about the neural mechanisms that implement the sequence code in a population of neurons. One plausible implementation of the sequence code of trajectories is a CAN. In fact, the sequence code of trajectories theory predicts conformal isometry in the CAN, i.e., a trajectory in neural space is proportional to a trajectory of an animal in physical space. However, other mechanistic implementations are possible. We have clarified how the sequence code of trajectories theory of grid cells relates to the mechanistic CAN models of grid cells. 

      We added the following text to the Discussion section:

      “While the sequence code of trajectories model of grid cell firing is agnostic about the neural mechanisms that implement the sequence code, one plausible implementation is a continuous attractor network (McNaughton et al., 2006; Burak and Fiete, 2009). Interestingly, a sequence code of trajectories begets conformal isometry in the attractor network, i.e., a trajectory in neural space is proportional to a trajectory of an animal in physical space.”

      It was not clear to me why their theory predicted the field size/spacing ratio or the orientation of the grid pattern to the wall. 

      We thank this reviewer for bringing to our attention that we lacked a proper explanation for why the sequence code of trajectories theory predicts the field size/spacing ratio in grid maps. We have modified/added the following text to the Results section of the manuscript to clarify this point:

      “Because the sequence code of trajectories model of grid cell firing implies a dense packing of firing fields, the spacing between two adjacent grid fields must change linearly with a change in field size. It follows that the ratio between grid spacing and field size is fixed. When using the distance between the centers of two adjacent grid fields to measure grid spacing and a diameter-like metric to measure grid field size, we can compute the ratio of grid spacing to grid field size as √7≈2.65 (see Methods).”
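
      One way to see where a value of √7 can come from is sketched below. This is an illustration under stated assumptions, not necessarily the derivation given in the Methods: it assumes that fields are densely packed on a hexagonal lattice with nearest-neighbour distance d, that the field diameter equals d, and that fields of the same element recur at the lattice vector 2·a1 + 1·a2 (as in a repeating unit of seven elements). The function name is hypothetical.

```python
import math

def hex_lattice_vector_length(m: int, n: int, d: float = 1.0) -> float:
    """Length of m*a1 + n*a2 on a hexagonal lattice with nearest-neighbour distance d."""
    # Basis vectors 60 degrees apart (an assumed convention).
    a1 = (d, 0.0)
    a2 = (d / 2.0, d * math.sqrt(3.0) / 2.0)
    x = m * a1[0] + n * a2[0]
    y = m * a1[1] + n * a2[1]
    return math.hypot(x, y)

# Assumption: field diameter equals the nearest-neighbour distance d = 1,
# and fields of the same element repeat at the lattice vector (2, 1).
field_size = 1.0
grid_spacing = hex_lattice_vector_length(2, 1, d=field_size)
ratio = grid_spacing / field_size
print(round(ratio, 2))  # prints 2.65
```

      Under these assumptions the squared length of the vector (2, 1) is 2² + 2·1 + 1² = 7, which matches the number of elements in the repeating unit.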

      We are also grateful to this reviewer for correctly pointing out that it was not clear why the sequence code of trajectories predicts a rotation of the grid pattern relative to a set of parallel walls in a rectangular environment. We have now made explicit the underlying premise that a sequence of firing fields from multiple grid cells is aligned in parallel to a nearby wall of the environment. We cite additional experimental evidence supporting this premise. Concretely, we quote Stensola and Moser summarizing results reported in (Stensola et al., 2015): “A surprising observation, however, was that modules typically assumed one of only four distinct orientation configurations relative to the environment” (Stensola and Moser, 2016). Importantly, all of the four distinct orientations show the characteristic angular rotation. Intriguingly, this is predicted by the sequence code of trajectories model under the premise that a sequence of firing fields aligns with one of the geometric boundaries of the environment, as shown in Author response image 1 below.

      Author response image 1.

      Under the premise that a sequence of firing fields aligns with one of the geometric boundaries (walls) of a square arena, there are precisely four possible distinct configurations of orientations. This is precisely what has been observed in experiments (Stensola et al., 2015; Stensola and Moser, 2016).

      We added clarifying language to the Results section: “Under the premise that a sequence of firing fields aligns with one of the geometric boundaries of the environment, the sequence code model explains that the grid pattern typically assumes one of only four distinct orientation configurations relative to the environment (refs. 41, 46). Concretely, the four orientation configurations arise when one row of grid fields aligns with one of the two sets of parallel walls in a rectangular environment, and each arrangement can result in two distinct orientations (Figure 3B).”

      I don't understand how repeated advancement of one unit to the next, as shown in Figure 4E, would cause the change in grid spacing near a reward. 

      In familiar environments, spatial firing fields of place cells in hippocampal CA1 and CA3 tend to shift backwards with experience (Mehta et al., 2000; Lee et al., 2004; Roth et al., 2012; Geiller et al., 2017; Dong et al., 2021). This implies that the centers of place fields move closer to each other. A potential mechanism has been suggested, namely NMDA receptor-dependent long-term synaptic plasticity (Ekstrom et al., 2001). When we apply the same principle observed for place fields on a linear track to grid fields anchored to a reward zone, grid fields will “gravitate” towards the reward side. A similar idea has been presented by (Ginosar et al., 2023) who use the analogy of reward locations as “black holes”. In contrast to (Ginosar et al., 2023), who we cite multiple times, our idea unifies observations on place cells and grid cells in 1-D and 2-D environments and suggests a potential mechanism. We changed the wording in the revised manuscript and clarified the underlying premises.

      I don't follow how this theory predicts the finding that the grid pattern expands with novelty. The authors propose that this occurs because the animals are not paying attention to fine spatial details, and thus only need a low-resolution spatial map that eventually turns into a higher-resolution one. But it's not clear to me why one needs to invoke the sequence coding hypothesis to make this point. 

      We agree with this reviewer that this point needs clarification. The sequence code model adds explanatory power to the hypothesis that the grid pattern in a novel environment reflects a low-resolution mapping of space or spatial trajectories because it directly links spatial resolution to both field size and spacing of a grid map. Concretely, the spatial resolution of the trajectory code is equivalent to the spacing between two adjacent spatial fields, and the spatial resolution is directly proportional to the grid spacing and field size. If one did not invoke the sequence coding hypothesis, one would need to explain how and why both spacing and field size are related to the spatial resolution of the grid map. Lastly, as written in the manuscript text, we point out that, while the experimentally observed expansion of grid maps is consistent with the sequence code of trajectories, it is not predicted by the theory without making further assumptions. 

      The last section, which describes that the grid spacing of different modules is scaled by the square root of 2, says that this is predicted if the resolution is doubled or halved. I am not sure if this is specifically a prediction of the sequence coding theory the authors put forth though since it's unclear why the resolution should be doubled or halved across modules (as opposed to changed by another factor). 

      We agree with reviewer #2 that the exact value of the scaling factor is not predicted by the sequence coding theory. E.g., the sequence code theory does not explain why the spatial resolution doesn’t change by a factor 3 or 1.5 (resulting in changes in grid spacing by square root of 3 or square root of 1.5, respectively). We have changed the wording to reflect this important point. We further clarified in the revised manuscript that future work on multiscale representations using modules of grid cells needs to show why changing the spatial resolution across modules by a factor of 2 is optimal. Interestingly, a scale ratio of 2 is commonly used in computer vision, specifically in the context of mipmapping and Gaussian pyramids, to render images across different scales. Literature in the computer vision field describes why a scaling factor of 2 and the use of Gaussian filter kernels (compare with Gaussian firing fields) is useful in allowing a smooth and balanced transition between successive levels of an image pyramid (Burt and Adelson, 1983; Lindeberg, 2008). Briefly, larger factors (like 3) could result in excessive loss of detail between levels, while smaller factors (like 1.5) would not reduce the image size enough to justify additional levels of computation (that would come with the structural cost of having more grid cell modules in the brain). We have clarified these points in the Discussion section.
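
      The relation between a change in spatial resolution and the resulting change in grid spacing can be illustrated with a short sketch. It assumes, for illustration only, that spatial resolution is counted as the number of fields per unit area of a hexagonal lattice; the function names are hypothetical.

```python
import math

def hex_spacing_from_density(fields_per_unit_area: float) -> float:
    """Spacing of a hexagonal lattice with the given density of lattice points."""
    # A hexagonal lattice with spacing s has one point per cell of area (sqrt(3)/2) * s**2.
    return math.sqrt(2.0 / (math.sqrt(3.0) * fields_per_unit_area))

def spacing_ratio(resolution_factor: float, base_density: float = 1.0) -> float:
    """Coarse-to-fine spacing ratio when the fine module has
    resolution_factor times as many fields per unit area."""
    coarse = hex_spacing_from_density(base_density)
    fine = hex_spacing_from_density(base_density * resolution_factor)
    return coarse / fine

# Factors discussed in the text: 1.5, 2 and 3.
for factor in (1.5, 2.0, 3.0):
    print(factor, spacing_ratio(factor))
```

      Under this assumption the spacing ratio is the square root of the resolution factor, independent of the base density, so a factor of 2 gives a ratio of √2. The sketch does not address why a factor of 2 would be preferred over other factors.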

      Reviewer #3 (Public Review): 

      The manuscript presents an intriguing explanation for why grid cell firing fields do not lie on a lattice whose axes are aligned to the walls of a square arena. This observation, by itself, merits the manuscript's dissemination to eLife's audience. 

      We thank this reviewer for their positive assessment.

      The presentation is quirky (but keep the quirkiness!). 

      We kept the quirkiness.

      But let me recast the problem presented by the authors as one of combinatorics. Given repeating, spatially separated firing fields across cells, one obtains temporal sequences of grid cells firing. Label these cells by integers from $[n]$. Any two cells firing in succession should uniquely identify one of six directions (from the hexagonal lattice) in which the agent is currently moving. 

      Now, take the symmetric group $\Sigma$ of cyclic permutations on $n$ elements.  We ask whether there are cyclic permutations of $[n]$ such that 

      $$\left(\pi_{i+1} - \pi_i\right) \bmod n \neq \pm 1 \bmod n, \quad \forall i.$$

      So, for instance, $(4,2,3,1)$ would not be counted as a valid permutation of $(1,2,3,4)$, as $(2,3)$ and $(1,4)$ are adjacent. 
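
      The adjacency condition can be checked mechanically. The following is a minimal sketch with illustrative function names that are not taken from the manuscript or the review; it treats a tuple as a cyclic arrangement and reports every pair of neighbouring entries whose labels differ by ±1 modulo $n$.

```python
def preserved_adjacencies(perm):
    """Return the cyclically consecutive pairs in perm that differ by +/-1 mod n."""
    n = len(perm)
    pairs = []
    for i in range(n):
        a, b = perm[i], perm[(i + 1) % n]  # wrap around to close the cycle
        if (b - a) % n in (1, n - 1):
            pairs.append((a, b))
    return pairs

def is_valid(perm):
    """True if no adjacency of the original cyclic order is preserved."""
    return not preserved_adjacencies(perm)

print(preserved_adjacencies((4, 2, 3, 1)))  # prints [(2, 3), (1, 4)]
print(is_valid((1, 3, 5, 7, 2, 4, 6)))      # prints True
```

      The second call uses $n = 7$ with a constant step of 2 between neighbours, which is one arrangement that satisfies the condition; the sketch does not count how many such arrangements exist.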

      Furthermore, given $[n]$, are there two distinct cyclic permutations such that {\em no} adjacencies are preserved when considering any pair of permutations (among the triple of the original ordered sequence and the two permutations)? In other words, if we consider the permutation required to take the first permutation into the second, that permutation should not preserve any adjacencies. 

      {\bf Key question}: is there any difference between the solution to the combinatorics problem sketched above and the result in the manuscript? Specifically, the text argues that for $n=7$ there is only {\em one} solution. 

      Ideally, one would strive to obtain a closed-form solution for the number of such permutations as a function of $n$.  

      This is a great question! We currently have a student working on describing all possible arrangements of firing fields (essentially labelings of the hexagonal lattice) that satisfy the axioms in 2D, and we expect that results on the number of such arrangements will come out of his work. We plan to publish those results separately, possibly targeting a more mathematical audience.   

      The argument above appears to only apply in the case that every row (and every diagonal) contains all of the elements 1,...,n. However, when n is not prime, there are often arrangements where rows and/or diagonals do not contain every element from 1,...,n. For example, some admissible patterns with 9 neurons have a repeat length of 3 in all directions (horizontally and both diagonals). As a result the construction listed here will not give a full count of all possible arrangements. 

      Recommendations for the authors:  

      Reviewer #1 (Recommendations For The Authors): 

      I think the concise style of mathematical proof is both a curse and a blessing. While it delivers the message, I think the fluency and readability of the mathematical proof could be improved with longer paragraphs and some more editing. 

      We have added some clarifications in the text that we hope improve the readability.

      Reviewer #3 (Recommendations For The Authors): 

      A minor qualm I have with the nomenclature: 

      On page 7: 

      “To prove this statement, suppose that row A consists of units $1, \dots , k$ repeating in this order. Then any row that contains any unit from $1, \dots, k$ must contain the full repeat $1, \dots , k$ by axiom 1. So any row containing any unit from $1,\dots , k$ is a translation of row A, and any unit that does not contain them is disjoint from row A.”

      The last use of `unit' at the end of this paragraph instead of `row' is confusing. Technically, the authors have given themselves license to use this term by defining a unit to be “either to a single cell or a cell assembly”. Yet modern algebra tends to use `unit' as meaning a ring element that has an inverse.  

      We have renamed “unit” to “element” to avoid confusion with the terminology in modern algebra.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      The authors examine how probabilistic reversal learning is affected by dopamine by studying the effects of methamphetamine (MA) administration. Based on prior evidence that the effects of pharmacological manipulation depend on baseline neurotransmitter levels, they hypothesized that MA would improve learning in people with low baseline performance. They found this effect, and specifically found that MA administration improved learning in noisy blocks, by reducing learning from misleading performance, in participants with lower baseline performance. The authors then fit participants' behavior to a computational learning model and found that an eta parameter, responsible for scaling learning rate based on previously surprising outcomes, differed in participants with low baseline performance on and off MA.

      Questions:

      (1) It would be helpful to confirm that the observed effect of MA on the eta parameter is responsible for better performance in low baseline performers. If performance on the task is simulated for parameters estimated for high and low baseline performers on and off MA, does the simulated behavior capture the main behavioral differences shown in Figure 3?

      We thank the reviewer for this suggestion. We agree that the additional simulation provides valuable confirmation of the effect of methamphetamine (MA) on the eta parameter and subsequent choice behavior. Using individual maximum likelihood parameter estimates, we simulated task performance and confirmed that the simulated behavior reflects the observed mean behavioral differences. Specifically, the simulation demonstrates that MA increases performance later in learning for stimuli with less predictable reward probabilities, particularly in subjects with low baseline performance (mean ± SD: simPL low performance: 0.69 ± 0.01 vs. simMA low performance: 0.72 ± 0.01; t(46) = -2.00, p = 0.03, d = 0.23).

      We have incorporated this analysis into the manuscript. Specifically, we added a new figure to illustrate these findings and updated the text accordingly. Below, we detail the changes made to the manuscript.

      From the manuscript page 12, line 25:

      “Sufficiency of the model was evaluated through posterior predictive checks that matched behavioral choice data (see Figure 4D-F and Figure 5) and model validation analyses (see Supplementary Figure 2). Specifically, using individual maximum likelihood parameter estimates, we simulated task performance and confirmed that MA increases performance later in learning for stimuli with less predictable reward probabilities, particularly in subjects with low baseline performance (Figure 5A; mean ± SD: simPL low performance: 0.69 ± 0.01 vs. simMA low performance: 0.72 ± 0.01; t(46) = -2.00, p = 0.03, d = 0.23).”

      (2) In Figure 4C, it appears that the main parameter difference between low and high baseline performance is inverse temperature, not eta. If MA is effective in people with lower baseline DA, why is the effect of MA on eta and not IT?

      Thank you for raising this important point. It is correct that the primary difference between the low and high baseline performance groups in the placebo session lies in the inverse temperature (mean(SD); low baseline performance: 2.07 (0.11) vs. high baseline performance: 2.95 (0.07); t(46) = -5.79, p = 5.8442e-07, d = 1.37). However, there is also a significant difference in the eta parameter between these groups during the placebo session (low baseline performance: 0.33 (0.02) vs. high baseline performance: 0.25 (0.02); t(46) = 2.59, p = 0.01, d = 0.53).

      Interestingly, the difference in eta is resolved by MA (mean(SD); low baseline performance: 0.24 (0.02) vs. high baseline performance: 0.23 (0.02); t(46) = 0.39, p = 0.70, d = 0.08), while the difference in inverse temperature remains unaffected (mean(SD); low baseline performance: 2.16 (0.11) vs. high baseline performance: 2.99 (0.08); t(46) = -5.38, p < .001, d = 1.29). Moreover, we checked the distribution of the inverse temperature estimates on/off drug to ensure the absent drug effect is not driven by outliers. Here, we do not observe any descriptive drug effect (see Author response image 1). Additionally, non-parametric tests indicate no drug effect (Wilcoxon signed-rank test; across groups: zval = -0.59; p = 0.55; low baseline performance: zval = -0.54; p = 0.58; high baseline performance: zval = -0.21; p = 0.83).

      Author response image 1.

      Inverse temperature distributions on/off drug suggest that this parameter is not affected by the drug. Inverse temperature for low (blue points) and high (yellow points) baseline performers tended to be unaffected by the drug (Wilcoxon signed-rank test; across groups: zval = -0.59; p = 0.55; low baseline performance: zval = -0.54; p = 0.58; high baseline performance: zval = -0.21; p = 0.83).

      This pattern of results might suggest that MA specifically affects eta but not other parameters like the inverse temperature, pointing to a selective influence on a single computational mechanism. To verify this conclusion, we extended the winning model by allowing each parameter in turn to be differentially estimated for MA and placebo, while keeping other parameters fixed to the group (low and high baseline performance) mean estimates of the winning model fit to choice behaviour of the placebo session.

      These control analyses confirmed that MA does not affect inverse temperature in either the low baseline performance group or the high baseline performance group. Similarly, MA did not affect the play bias or learning rate intercept parameter. Yet, it did affect eta in the low performer group (see supplementary table 1 reproduced below).

      Taken together, our data suggest that only the parameter controlling dynamic adjustments of the learning rate based on recent prediction errors, eta, was affected by our pharmacological manipulation and that the parameters of our models did not trade off. A similar effect has been observed in a previous study investigating the effects of catecholaminergic drug administration in a probabilistic reversal learning task (Rostami Kandroodi et al., 2021). In that study, the authors demonstrated that methylphenidate influenced the inverse learning rate parameter as a function of working memory span, assessed through a baseline cognitive task. Similar to our findings, they did not observe drug effects on other parameters in their model including the inverse temperature.

      We have updated the section of the manuscript where we discuss the difference in inverse temperature between low and high performers in the task. From the manuscript (page 19, line 13):

      “While eta seemed to account for the differences in the effects of MA on performance in our low and high performance groups, it did not fully explain all performance differences across the two groups (see Figure 1C and Figure 7A/B). When comparing other model parameters between low and high baseline performers across drug sessions, we found that high baseline performers displayed higher overall inverse temperatures (2.97(0.05) vs. 2.11 (0.08); t(93) = 7.94, p < .001, d = 1.33). This suggests that high baseline performers displayed higher transfer of stimulus values to actions leading to better performance (as also indicated by the positive contribution of this parameter to overall performance in the GLM). Moreover, they tended to show a reduced play bias (-0.01 (0.01) vs. 0.04 (0.03); t(93) = -1.77, p = 0.08, d = 0.26) and increased intercepts in their learning rate term (-2.38 (0.364) vs. -6.48 (0.70); t(93) = 5.03, p < .001, d = 0.76). Both of these parameters have been associated with overall performance (see Figure 6A). Thus, overall performance difference between high and low baseline performers can be attributed to differences in model parameters other than eta. However, as described in the previous paragraph, differential effects of MA on performance on the two groups were driven by eta.

      This pattern of results suggests that MA specifically affects the eta parameter while leaving other parameters, such as the inverse temperature, unaffected. This points to a selective influence on a single computational mechanism. To verify this conclusion, we extended the winning model by allowing each parameter, in turn, to be differentially estimated for MA and PL, while keeping the other parameters fixed at the group (low and high baseline performance) mean estimates of the winning model for the placebo session. These control analyses confirmed that MA affects only the eta parameter in the low-performer group and that there is no parameter trade-off in our model (see Supplementary Table 1). A similar effect was observed in a previous study investigating the effects of catecholaminergic drug administration on a probabilistic reversal learning task (Rostami Kandroodi et al., 2021). In that study, methylphenidate was shown to influence the inverse learning rate parameter (i.e., decay factor for previous payoffs) as a function of working memory span, assessed through a baseline cognitive task. Consistent with our findings, no drug effects were observed on other parameters in their model, including the inverse temperature.”

      Additionally, we summarized the results in a supplementary table:

      Also, this parameter is noted as temperature but appears to be inverse temperature as higher values are related to better performance. The exact model for the choice function is not described in the methods.

      We thank the reviewer for bringing this to our attention. The reviewer is correct that we intended to refer to the inverse temperature. We have corrected this mistake throughout the manuscript and added information about the choice function to the methods section.

      From the manuscript (page 37, line 3):

      On each trial, this value term was transferred into a “biased” value term (𝑉<sub>𝐵</sub>(𝑋<sub>𝑡</sub>) = 𝐵<sub>𝑝𝑙𝑎𝑦</sub> + 𝑄<sub>𝑡</sub>(𝑋<sub>𝑡</sub>), where 𝐵<sub>𝑝𝑙𝑎𝑦</sub> is the play bias term) and converted into action probabilities (P(play | 𝑉<sub>𝐵,play</sub>(𝑡)(𝑋<sub>𝑡</sub>)); P(pass | 𝑉<sub>𝐵,pass</sub>(𝑡)(𝑋<sub>𝑡</sub>))) using a softmax function with an inverse temperature (𝛽):
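      The choice rule described here can be illustrated with a minimal sketch. It assumes the common two-option softmax in which the value of passing is a fixed reference (`pass_value`, set to 0 here). That reference value, the function names, and the example numbers are assumptions for illustration and are not taken from the manuscript.

```python
import math


def biased_value(q_value, play_bias):
    """Biased value term: V_B(X_t) = B_play + Q_t(X_t)."""
    return play_bias + q_value


def p_play(q_value, play_bias, beta, pass_value=0.0):
    """Probability of playing under a two-option softmax.

    beta is the inverse temperature: larger values make choices follow
    the value difference more deterministically.
    pass_value is an ASSUMED reference value for the pass action.
    """
    v_play = biased_value(q_value, play_bias)
    x = beta * (v_play - pass_value)
    # Logistic form of the two-option softmax, written to avoid overflow.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def p_pass(q_value, play_bias, beta, pass_value=0.0):
    """Probability of passing; the two probabilities sum to one."""
    return 1.0 - p_play(q_value, play_bias, beta, pass_value)
```

      With this form, a higher inverse temperature moves the play probability further from 0.5 for the same value difference, which is the sense in which it indexes the transfer of stimulus values to actions.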

      Reviewer #1 (Recommendations for the authors):

      (1) Given that the task was quite long (700+ trials), were there any fatigue effects or changes in behavior over the course of the task?

      To address the reviewer's comment, we regressed each participant's single-trial log-scaled RT and accuracy (binary variable reflecting whether a participant displayed stimulus-appropriate behavior on each trial) onto the trial number as a proxy of time on task. Individual participants’ t-values for the time-on-task regressor were then tested at the group level via two-sided t-tests against zero and compared across sessions and baseline performance groups. The results of these two regression models are shown in the supplementary table 2 and raw data splits in supplementary figure S7. Results demonstrate that the choice behavior was not systematically affected over the course of the task. This effect was not different between low and high baseline performers and not affected by the drug. In contrast, participants’ reaction time decreased over the course of the task and this speeding was enhanced by MA, particularly in the low performance group.

      We added the following section to the supplementary materials and refer to this information in the task description section of the manuscript (page 35, line 26):

      “Time-on-Task Effects

      Given the length of our task, we investigated whether fatigue effects or changes in behavior occurred over time. Specifically, we regressed each participant's single-trial log-scaled reaction times (RT) and accuracy (a binary variable reflecting whether participants displayed stimulus-appropriate behavior on each trial) onto trial number, which served as a proxy for time on task. The resulting t-values for the time-on-task regressor were analyzed at the group level using two-sided t-tests against zero and compared across sessions and baseline performance groups. The results of these regression models are presented in Supplementary Table S2, with raw data splits shown in Supplementary Figure S3.

      Our findings indicate that choice behavior was not systematically affected over the course of the task. This effect did not differ between low and high baseline performers and was not influenced by the drug. In contrast, reaction times decreased over the course of the task, with this speeding effect being enhanced by MA, particularly in the low-performance group.”
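      The two-stage logic of this analysis (a per-participant regression onto trial number, followed by a group-level test of the resulting t-values against zero) can be sketched as follows. This is an illustrative reconstruction using ordinary least squares on log RT only; the function names are hypothetical, and the treatment of the binary accuracy variable is not specified in the text and is therefore left out.

```python
import math
import statistics


def slope_t_value(y):
    """t-value of the slope when regressing y on trial number 1..n (simple OLS)."""
    n = len(y)
    x = list(range(1, n + 1))
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    # Residual sum of squares and standard error of the slope.
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    se_slope = math.sqrt(sse / (n - 2) / sxx)
    return slope / se_slope


def one_sample_t(values, mu=0.0):
    """One-sample t statistic of the per-participant t-values against mu."""
    n = len(values)
    return (statistics.fmean(values) - mu) / (statistics.stdev(values) / math.sqrt(n))
```

      A negative group-level statistic for log RT would correspond to the speeding over the course of the task described above; converting the statistic to a two-sided p-value additionally requires the t distribution with n - 1 degrees of freedom.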

      (2) Figure 5J is hard to understand given the lack of axis labels on some of the plots. Also, the scatter plot is on the left, not the right, as stated in the legend.

      We agree that this part of the figure was difficult to understand. To address this issue, we have separated it from Figure 5, added axis labels for clarity, and reworked the figure caption.

      (3) The data and code were not available for review.

      Thank you for pointing this out. The data and code are now made publicly available on GitHub: https://github.com/HansKirschner/REFIT_Chicago_public.git

      We updated the respective section in the manuscript:

      Data Availability Statement: All raw data and analysis scripts can be accessed at: https://github.com/HansKirschner/REFIT_Chicago_public.git

      Reviewer #2 (Public review):

      Summary:

      Kirschner and colleagues test whether methamphetamine (MA) alters learning rate dynamics in a validated reversal learning task. They find evidence that MA can enhance performance for low-performers and that the enhancement reflects a reduction in the degree to which these low-performers dynamically up-regulate their learning rates when they encounter unexpected outcomes. The net effect is that poor performers show more volatile learning rates (e.g. jumping up when they receive misleading feedback), when the environment is actually stable, undermining their performance over trials.

      Strengths:

      The study has multiple strengths including large sample size, placebo control, double-blind randomized design, and rigorous computational modeling of a validated task.

      Weaknesses:

      The limitations, which are acknowledged, include that the drug they use, methamphetamine, can influence multiple neuromodulatory systems including catecholamines and acetylcholine, all of which have been implicated in learning rate dynamics. They also do not have any independent measures of any of these systems, so it is impossible to know which is having an effect.

      Another limitation that the authors should acknowledge is that the fact that participants were aware of having different experiences in the drug sessions means that their blinding was effectively single-blind (to the experimenters) and not double-blind. Relatedly, it is difficult to know whether subjective effects of drugs (e.g. arousal, mood, etc.) might have driven differences in attention, causing performance enhancements in the low-performing group. Do the authors have measures of these subjective effects that they could include as covariates of no interest in their analyses?

      We thank the reviewer for highlighting this complex issue. ‘Double blind’ may refer to masking the identity of the drug before administration, or to the subjects’ stated identifications after any effects have been experienced. In our study, the participants were told that they might receive a stimulant, sedative or placebo on any session, so before the sessions their expectations were blinded. After receiving the drug, most participants reported feeling stimulant-like effects on the drug session, but not all of them correctly identified the substance as a stimulant. We note that many subjects identified placebo as ‘sedative’. Author response image 2 shows how the participants identified the substance they received.

      Author response image 2.

      Substance identification.

      We share the reviewer’s interest in the extent to which mood effects of drugs are correlated with the drugs’ other effects, including cognitive function. To address this in the present study, we compared the subjective responses to the drug in participants who were low- or high-performers at baseline on the task. The low- and high baseline performers did not differ in their subjective drug effects, including ‘feel drug’ or stimulant-like effects (see Figure 1 from the manuscript reproduced below; peak change from baseline scores for feel drug ratings on-drug: low baseline performer: 48.36 (4.29) vs. high baseline performer: 47.21 (4.44); t(91) = 0.18, p = 0.85, d = 0.03; ARCI-A score: low baseline performer: 4.87 (0.43) vs. high baseline performer: 4.00 (0.418); t(91) = 1.43, p = 0.15, d = 0.30). Moreover, task performance in the drug session was not correlated with the subjective effects (peak “feel drug” effect: r(94) = 0.09, p = 0.41; peak “stimulant like” effect: r(94) = -0.18, p = 0.07).

      We have added details of these additional analyses to the manuscript. Since there were no significant differences in subjective drug effects between low- and high-baseline performers, and these effects were not systematically associated with task performance, we did not include these measurements as covariates in our analyses. Furthermore, as both subjective measurements indicate a similar pattern, we have chosen not to report the ARCI-A effects in the manuscript.

      From the manuscript (page 6, line 5ff):

      “Subjective drug effects

      MA administration significantly increased ‘feel drug effect’ ratings compared to PL, at 30, 50, 135, 180, and 210 min post-capsule administration (see Figure 1; Drug x Time interaction F(5,555) = 38.46, p < 0.001). In the MA session, no differences in the ‘feel drug effect’ were observed between low and high baseline performers, including peak change-from-baseline ratings (rating at 50 min post-capsule: low baseline performer: 48.36 (4.29) vs. high baseline performer: 47.21 (4.44); t(91) = 0.18, p = 0.85, d = 0.03; rating at 135 min post-capsule: low baseline performer: 37.27 (4.15) vs. high baseline performer: 45.38 (3.84); t(91) = 1.42, p = 0.15, d = 0.29).”

      Reviewer #2 (Recommendations for the authors):

      I was also concerned about the distinctions between the low- and high-performing groups. It is unclear why, except for simplicity of presentation, they chose to binarize the sample into high and low performers. I would like to know if the effects held up if they analyzed interactions with individual differences in performance and not just a binarized high/low group membership. If the individual difference interactions do not hold up, I would like to know the authors' thoughts on why they do not.

      Thank you for raising this important issue. We chose a binary discretization of baseline performance to simplify the analysis and presentation. However, we acknowledge that this simplification may limit the interpretability of the results.

      To address the reviewer’s concern, we conducted additional linear mixed-effects model (LMM) analyses, focusing on the key findings reported in the manuscript. See supplementary materials section “Linear mixed effects model analyses for key findings”.

      From the manuscript (page 30, line 4ff):

      “Methamphetamine performance enhancement depends on initial task performance

      Another key finding of the current study is that the benefits of MA on performance depend on the baseline task performance. Specifically, we found that MA selectively improved performance in participants that performed poorly in the baseline session. However, it should be noted that all the drug x baseline performance interactions, including for the key computational eta parameter, did not reach the statistical threshold, and only tended towards significance. We used a binary discretization of baseline performance to simplify the analysis and presentation. To parse out the relationship between methamphetamine effects and baseline performance in finer detail, we conducted additional linear mixed-effects model (LMM) analyses using a sliding window regression approach (see supplementary results and supplementary figures S4 and S5). A key thing to notice in the sliding regression results is that, while each regression reveals that drug effects depend on baseline performance, they do so non-linearly, with most variables of interest showing a saturating effect at low baseline performance levels and the strongest slope (dependence on baseline) at or near the median level of baseline performance, explaining why our median splits were able to successfully pick up on these baseline-dependent effects. Together, these results suggest that methamphetamine primarily affects moderately low baseline performers. It is worth highlighting again that we had a separate baseline measurement from the placebo session, allowing us to investigate baseline-dependent changes while avoiding typical concerns in such analyses like regression to the mean (Barnett et al., 2004). This design enhances the robustness of our baseline-dependent effects.”

      See supplementary materials section “Linear mixed effects model analyses for key findings”

      Perhaps relatedly, in multiple analyses, the authors point out that there are drug effects for the low-performance group, but not the high-performance group. This could reflect the well-documented baseline-dependency effect of catecholaminergic drugs. However, it might also reflect the fact that the high-performance group is closer to their ceiling. So, a performance-enhancement drug might not have any room to make them better. Note that their results are not consistent with inverted-U-like effects, previously described, where high performers actually get worse on catecholaminergic drugs.

      Given that the authors have the capacity to simulate performance as a function of parameter values, they could specifically simulate how much better performance could get if their high-performance group all moved proportionally closer to optimal levels of the parameter eta. On the basis of that analysis do they have any evidence that they had the power to detect an effect in the high performance group? If not, they should just acknowledge that ceiling effects might have played a role for high performers.

      We agree with the reviewer's interpretation of the results. First, when plotting overall task performance and the probability of correct choices in the high outcome noise condition—the condition where we observe the strongest drug-induced performance enhancement—we find minimal performance variation among high baseline performers. In both testing sessions, high baseline performers cluster around optimal performance, with little evidence of drug-induced changes (see Supplementary Figure 6).

      Furthermore, performance simulations using (a) optimal eta values and (b) observed eta values from the high baseline performance group reveal only a small, non-significant performance difference (points optimal eta: 701.91 (21.66) vs. points high performer: 694.47 (21.71); t(46) = 2.84, p = 0.07, d = 0.059).

      These results suggest that high baseline performers are already near optimal performance, limiting the potential for drug-related performance improvements. We have incorporated this information into the manuscript (page 30, line 24ff).

      “It is important to note that MA did not bring performance of low baseline performers to the level of performance of high baseline performers. We speculate that high performers gained a good representation of the task structure during the orientation practice session, taking specific features of the task into account (change point probabilities, noise in the reward probabilities). This is reflected in a large signal to noise ratio between real reversals and misleading feedback. Because the high performers already perform the task at a near-optimal level, MA may not further enhance performance (see Supplementary Figure S6 for additional evidence for this claim). Intriguingly, the data do not support an inverted-u-shaped effect of catecholaminergic action (Durstewitz & Seamans, 2008; Goschke & Bolte, 2018) given that performance of high performers did not decrease with MA. One could speculate that catecholamines are not the only factor determining eta and performance. Perhaps high performers have a generally more robust/resilient decision-making system which cannot be perturbed easily. Probably one would need even higher doses of MA (with higher side effects) to impair their performance.”

      Finally, I am confused about why participants are choosing correctly at higher than 50% on the first trial after a reversal (see Figure 3)? How could that be right? If it is not, does this mean that there is a pervasive error in the analysis pipeline?

      Thank you for pointing this out. The observed pattern is an artifact of the smoothing (±2 trials) applied to the learning curves in Figure 3. Below, we reproduce the figure without smoothing.
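      To illustrate how a ±2-trial smoothing window can lift the first post-reversal point above chance, consider the sketch below. It assumes a centered moving average whose window is truncated at the edges of the curve; the edge handling and the example values are assumptions for illustration and are not the values or the implementation behind Figure 3.

```python
def smooth(curve, half_window=2):
    """Centered moving average; the window is truncated at both edges."""
    smoothed = []
    for i in range(len(curve)):
        lo = max(0, i - half_window)
        hi = min(len(curve), i + half_window + 1)
        window = curve[lo:hi]
        smoothed.append(sum(window) / len(window))
    return smoothed


# Hypothetical p(correct) on trials 1..6 after a reversal (illustrative only).
raw = [0.50, 0.60, 0.70, 0.75, 0.80, 0.80]
smoothed = smooth(raw)
# The first smoothed point averages trials 1 to 3, so it exceeds the raw
# chance-level value on trial 1 even though the raw curve starts at 0.50.
```

      Under these assumptions the first smoothed value is the mean of the first three raw values, which is why a curve that starts at chance can appear to start above it once smoothed.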

      Additionally, we confirm that the probability of choosing the correct response is not above chance level (t-test against chance):
      • All reversals: t(93) = 1.64, p = 0.10, d = 0.17, 99% CI [0.49, 0.55]
      • Reversal to low outcome noise: t(93) = 1.67, p = 0.10, d = 0.17, 99% CI [0.49, 0.56]
      • Reversal to high outcome noise: t(93) = 0.87, p = 0.38, d = 0.09, 99% CI [0.47, 0.56]

      We have amended the caption of Figure 3 accordingly. Moreover, we included an additional figure in this revision letter (Author response image 4) showing a clear performance drop to approximately 50% correct choices across all sessions, indicating random-choice behavior at the point of reversal. Notably, this performance is slightly better than expected (i.e., the inverse of pre-reversal performance). One possible explanation is that participants developed an expectation of the reversal, leading to increased reversal behaviour around reversals.

      Author response image 3.

      Learning curves after reversals suggest that methamphetamine improves learning performance in phases of less predictable reward contingencies in low baseline performers. The top panel of the figure shows learning curves after all reversals (A), reversals to stimuli with less predictable reward contingencies (B), and reversals to stimuli with high reward probability certainty (C). The bottom panel displays the learning curves stratified by baseline performance for all reversals (D), reversals to stimuli with less predictable reward probabilities (E), and reversals to stimuli with high reward probability certainty (F). Vertical black lines divide learning into early and late stages as suggested by the Bai-Perron multiple break point test. Results suggest no clear differences in the initial learning between MA and PL. However, learning curves diverged later in learning, particularly for stimuli with less predictable rewards (B) and in subjects with low baseline performance (E). Note. PL = Placebo; MA = methamphetamine; Mean/SEM = line/shading.

      Author response image 4.

      Adaptive behavior following reversals. Each graph shows participants' performance (i.e., stimulus-appropriate behavior: playing good stimuli with 70/80% reward probability and passing on bad stimuli with 20/30% reward probability) around reversals for the (A) orientation session, (B) placebo session, and (C) methamphetamine session. Trial 0 corresponds to the trial when reversals occurred, unbeknownst to participants. Participants' performance exhibited a fast initial adaptation to reversals, followed by a slower, late-stage adjustment to the new stimulus-reward contingencies, eventually reaching a performance plateau. Notably, we observe a clear performance drop to approximately 50% correct choices across all sessions, indicating random-choice behavior at the point of reversal. This performance is slightly better than expected (i.e., the inverse of pre-reversal performance). One possible explanation is that participants developed an expectation of the reversal, leading to increased reversal behaviour around reversals.

      Minor comments:

      (1) I'm unclear on what the analysis in 6E tells us. What does it mean that the marginal effect of eta on performance predicts changes in performance? Also, if multiple parameters besides eta (e.g. learning rate) are strongly related to actual performance, why should it be that only marginal adjustments to eta in the model anticipate actual performance improvements when marginal adjustments to other model parameters do not?

      We agree that these simulations are somewhat difficult to interpret and have therefore decided to omit these analyses from the manuscript. Our key point was that individuals who benefited the most from methamphetamine were those who exhibited the most advantageous eta adjustments in response to it. We believe this is effectively illustrated by the example individual shown in Figure 8D.

      (2) Does the vertical black line in Figure 1 show when the tasks were completed, as it says in the caption, or when the task starts, as it indicates in the figure itself?

      Apologies for the confusion. There was a mistake in the figure caption—the vertical line indicates the time when the task started (60 minutes post-capsule intake). We have corrected this in the figure caption.

      (3) The marginally significant drug x baseline performance group interaction does not support strong inferences about differences in drug effects on eta between groups...

      We agree and have added information on this limitation to the Discussion. Additionally, we have addressed the complex relationship between drug effects and baseline performance in the supplementary analyses, as detailed in our previous response regarding the binary discretization of baseline performance.

      (4) Should lines 10-11 on page 12 say "We did not find drug-related differences in any other model parameters..."?

      Thank you for bringing this grammatical error to our attention. We have corrected it.

      (5) It would be good to confirm that the effect of MA on p(Correct after single MFB) does not have an opposite sign from the effect of MA on p(Correct after double MFB). I'm guessing the effect after single is just weak, but it would be good to confirm they are in the same direction so that we can be confident the result is not picking up on spurious relationships after two misleading instances of feedback.

      We confirm that the direction of the effect between eta and p(Correct after single MFB) is similar to p(Correct after double MFB). First, we see a similar negative association between p(Correct after single MFB) and eta (r(94) = -.26, p = 0.01). Similarly, there was a descriptive increase in p(Correct after single MFB) for low baseline performers on- vs. off-drug (p(Correct after single MFB): low baseline performance PL: 0.71 (0.02) vs. low baseline performance MA: 0.73 (0.02); t(46) = 1.27, p = 0.20, d = 0.17).

      (6) "implemented equipped" seems like a typo on page 16, line 26

      Thank you for bringing this typo to our attention. We have corrected it.

      Reviewing Editor (Public Review):

      Summary:

      In this well-written paper, a pharmacological experiment is described in which a large group of volunteers is tested on a novel probabilistic reversal learning task with different levels of noise, once after intake of methamphetamine and once after intake of placebo. The design includes a separate baseline session, during which performance is measured. The key result is that drug effects on learning rate variability depend on performance in this separate baseline session.

      The approach and research question are important, the results will have an impact, and the study is executed according to current standards in the field. Strengths include the interventional pharmacological design, the large sample size, the computational modeling, and the use of a reversal-learning task with different levels of noise.

      (i) One novel and valuable feature of the task is the variation of noise (having 70-30 and 80-20 conditions). This nice feature is currently not fully exploited in the modeling of the task and the data. For example, recently reported new modeling approaches for disentangling two types of uncertainty (stochasticity vs volatility) could be usefully leveraged here (by Piray and Daw, 2021, Nat Comm). The current 'signal to noise ratio' analysis that is targeting this issue relies on separately assessing learning rates on true reversals and learning rates after misleading feedback, in a way that is experimenter-driven. As a result, this analysis cannot capture a latent characteristic of the subject's computational capacity.

      We thank the reviewing editor for the positive evaluation of our work and the suggestion to leverage new modeling approaches. In the light of the Piray/Daw paper, it is noteworthy that the choice behavior of the low performance group in our sample mimics the behavior of their lesioned model, in which stochasticity is assumed to be small and constant. Specifically, low performers displayed higher learning rates, particularly in high outcome noise phases in our task. One possible interpretation of this choice pattern is that they have difficulty distinguishing volatility from noise. Consistently, surprising outcomes may get misattributed to volatility instead of stochasticity resulting in increased learning rates and overadjustments to misleading outcomes. This issue particularly surfaces in phases of high stochasticity in our task. Interestingly, methamphetamine seems to reduce this misattribution. In an exploratory analysis, we fit two models to our task structure using modified code provided by the Piray and Daw paper. The control model made inference about both the volatility and stochasticity. A key assumption of the model is that the optimal learning rate increases with volatility and decreases with stochasticity. This is because greater volatility raises the likelihood that the underlying reward probability has changed since the last observation, increasing the necessity of relying on new information. In contrast, higher stochasticity reduces the relative informativeness of the new observation compared to prior beliefs about the underlying reward probability. The lesioned model assumed stochasticity to be small and constant. We show the results of these analyses in Figure 9 and Supplementary Figures S5 and S6. Interestingly, we found that the inability to make inference about stochasticity leads to misestimation of volatility, particularly for high outcome noise phases (Figure 9A-B). Consistently, this led to reduced sensitivity of the learning rate to volatility (i.e., the first ten trials after reversals). The model shows similar behaviour to our low performer group, with reduced accuracy in later learning stages for stimuli with high outcome noise (Figure 9D). Finally, when we fit simulated data from the two models to our model, we see increased eta parameter estimates for the lesioned model. Together, these results may hint towards an overinterpretation of stochasticity in low performers in our task and that methamphetamine has beneficial effects for those individuals as it reduced the oversensitivity to volatility. It should be noted, however, that we did not fit these models to our choice behaviour directly as this implementation is beyond the scope of our current study. Yet, our exploratory analyses make testable predictions for future research into the effect of catecholamines on the inference of volatility and stochasticity.
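      The stated dependence of the optimal learning rate on volatility and stochasticity can be illustrated with a textbook Kalman filter for a random-walk latent state, where the learning rate is the Kalman gain. This is a simplified sketch and not the model fitted in the exploratory analysis; the function name, the prior variance, and the example values are assumptions for illustration.

```python
def kalman_learning_rates(volatility, stochasticity, n_trials=200, prior_var=1.0):
    """Trial-by-trial Kalman gain for a random-walk state observed with noise.

    volatility    : variance of the random-walk step (process noise)
    stochasticity : variance of the observation (outcome) noise
    Returns the learning rate (Kalman gain) on each trial.
    """
    posterior_var = prior_var
    rates = []
    for _ in range(n_trials):
        # Uncertainty grows by the volatility before each new observation.
        predicted_var = posterior_var + volatility
        gain = predicted_var / (predicted_var + stochasticity)
        # Observing the outcome shrinks the uncertainty by (1 - gain).
        posterior_var = (1.0 - gain) * predicted_var
        rates.append(gain)
    return rates


def steady_state_rate(volatility, stochasticity):
    """Learning rate after the filter has converged."""
    return kalman_learning_rates(volatility, stochasticity)[-1]
```

      In this sketch the converged learning rate rises when volatility is increased with stochasticity held fixed, and falls when stochasticity is increased with volatility held fixed, which matches the qualitative assumption described above.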

      We incorporated information on these exploratory analyses into the manuscript and supplementary material.

      From the results section (page 23, line 12ff):

      “Methamphetamine may reduce misinterpretation of high outcome noise in low performers

      In our task, outcomes are influenced by two distinct sources of noise: process noise (volatility) and outcome noise (stochasticity). Optimal learning rate should increase with volatility and decrease with stochasticity. Volatility was fairly constant in our task (change points around every 30-35 trials). However, misleading feedback (i.e., outcome noise) could be misinterpreted as indicating another change point because participants do not know the volatility beforehand. Strongly overinterpreting outcome noise as change points will hinder building a correct estimate of volatility and understanding the true structure of the task. Simultaneously estimating volatility and stochasticity poses a challenge, as both contribute to greater outcome variance, making outcomes more surprising. A critical distinction, however, lies in their impact on generated outcomes: volatility increases the autocorrelation between consecutive outcomes, whereas stochasticity reduces it. Recent computational approaches have successfully utilised this fundamental difference to formulate a model of learning based on the joint estimation of stochasticity and volatility (Piray & Daw, 2021; Piray & Daw, 2024). They report evidence that humans successfully dissociate between volatility and stochasticity with contrasting and adaptive effects on learning rates, albeit to varying degrees. Interestingly, they show that hypersensitivity to outcome noise, often observed in anxiety disorders, might arise from a misattribution of the outcome noise to volatility instead of stochasticity resulting in increased learning rates and overadjustments to misleading outcomes. It is noteworthy that we observed a similar hypersensitivity to high outcome noise in low performers in our task that is partly reduced by MA. In an exploratory analysis, we fit two models to our task structure using modified code provided by Piray and Daw (2021) (see Methods for a formal description of the model). The control model inferred both the volatility and stochasticity. The lesioned model assumed stochasticity to be small and constant. We show the results of these analyses in Figure 9 and Supplementary Figures S7 and S8. We found that the inability to make inference about stochasticity leads to misestimation of volatility, particularly for high outcome noise phases (Figure 9A-B). Consistently, this led to reduced sensitivity of the learning rate to volatility (i.e., the first ten trials after reversals). The model shows similar behaviour to our low performer group, with reduced accuracy in later learning stages for stimuli with high outcome noise (Figure 9D). Finally, when we fit simulated data from the two models to our model, we see increased eta parameter estimates for the lesioned model. Together, these results may hint towards an overinterpretation of stochasticity in low performers in our task and that MA has beneficial effects for those individuals as it reduced the oversensitivity to volatility. It should be noted, however, that we did not fit these models to our choice behaviour directly as this implementation is beyond the scope of our current study. Yet, our exploratory analyses make testable predictions for future research into the effect of catecholamines on the inference of volatility and stochasticity.”
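      The contrasting effects of volatility and stochasticity on the autocorrelation of outcomes can be checked in a toy simulation. The sketch assumes continuous outcomes generated by a Gaussian random walk (volatility) observed with Gaussian noise (stochasticity), which is a simplification of the binary outcomes in the task; the function names, the seed, and the parameter values are assumptions for illustration.

```python
import random
import statistics


def simulate_outcomes(volatility, stochasticity, n_trials=2000, seed=0):
    """Outcomes from a random-walk latent state observed with additive noise.

    volatility and stochasticity are variances of the process noise and
    the outcome noise, respectively.
    """
    rng = random.Random(seed)
    state = 0.0
    outcomes = []
    for _ in range(n_trials):
        state += rng.gauss(0.0, volatility ** 0.5)
        outcomes.append(state + rng.gauss(0.0, stochasticity ** 0.5))
    return outcomes


def lag1_autocorrelation(series):
    """Sample autocorrelation between consecutive outcomes."""
    mean = statistics.fmean(series)
    num = sum((a - mean) * (b - mean) for a, b in zip(series[:-1], series[1:]))
    den = sum((a - mean) ** 2 for a in series)
    return num / den


# More volatility at fixed stochasticity: consecutive outcomes become more similar.
high_vol = lag1_autocorrelation(simulate_outcomes(1.0, 1.0))
low_vol = lag1_autocorrelation(simulate_outcomes(0.0001, 1.0))

# More stochasticity at fixed volatility: consecutive outcomes become less similar.
high_stoch = lag1_autocorrelation(simulate_outcomes(0.1, 100.0))
low_stoch = lag1_autocorrelation(simulate_outcomes(0.1, 0.01))
```

      Both sources of noise increase outcome variance, so variance alone cannot separate them; the lag-1 autocorrelation moves in opposite directions, which is the property a joint-estimation model can exploit.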

      From the discussion (page 28, line 15ff):

      “Exploratory simulation studies using a model that jointly estimates stochasticity and volatility (Piray & Daw, 2021; Piray & Daw, 2024), revealed that MA might reduce the oversensitivity to volatility.”

      See methods section “Description of the joint estimation of stochasticity and volatility model”.

      (ii) An important caveat is that all the drug x baseline performance interactions, including for the key computational eta parameter did not reach the statistical threshold, and only tended towards significance.

      We agree and have added additional analyses on the issue. See also our response to reviewer 2. There is a consistent effect for low-medium baseline performance. We toned down the reference to low baseline performance but still see strong evidence for a baseline dependency of the drug effect.

      From the manuscript (page 30, line 4ff):

      “Methamphetamine performance enhancement depends on initial task performance
      Another key finding of the current study is that the benefits of MA on performance depend on the baseline task performance. Specifically, we found that MA selectively improved performance in participants that performed poorly in the baseline session. However, it should be noted that all the drug x baseline performance interactions, including for the key computational eta parameter, did not reach the statistical threshold, and only tended towards significance. We used a binary discretization of baseline performance to simplify the analysis and presentation. To parse out the relationship between methamphetamine effects and baseline performance at a finer level of detail, we conducted additional linear mixed-effects model (LMM) analyses using a sliding window regression approach (see supplementary results and supplementary figures S4 and S5). A key thing to notice in the sliding regression results is that, while each regression reveals that drug effects depend on baseline performance, they do so non-linearly, with most variables of interest showing a saturating effect at low baseline performance levels and the strongest slope (dependence on baseline) at or near the median level of baseline performance, explaining why our median splits were able to successfully pick up on these baseline-dependent effects. Together, these results suggest that methamphetamine primarily affects moderately low baseline performers. It is worth highlighting again that we had a separate baseline measurement from the placebo session, allowing us to investigate baseline-dependent changes while avoiding typical concerns in such analyses like regression to the mean (Barnett et al., 2004). This design enhances the robustness of our baseline-dependent effects.”

      (iii) Both the overlap and the differences between the current study and previous relevant work (that is, how this goes beyond prior studies in particular Rostami Kandroodi et al, which also assessed effects of catecholaminergic drug administration as a function of baseline task performance using a probabilistic reversal learning task) are not made explicit, particularly in the introduction.

      Thank you for raising this point. We have added information on the overlap and differences between our paper and the Rostami Kandroodi et al. paper to the introduction and discussion.

      In the introduction, we added a sentence to highlight the Rostami Kandroodi et al. findings (page 3, line 24ff).

      “For example, Rostami Kandroodi et al. (2021) reported that the re-uptake blocker methylphenidate did not alter reversal learning overall, but preferentially improved performance in participants with higher working memory capacity.”

      In our Discussion, we go back to this paper, and say how our findings are and are not consistent with their findings (page 32, line 16ff).

      “Our findings can be contrasted to those of Rostami Kandroodi et al. (2021), who examined effects of methylphenidate on a reversal learning task, in relation to baseline differences on a cognitive task. Whereas Rostami Kandroodi et al. (2021) found that methylphenidate improved performance mainly in participants with higher baseline working memory performance, we found that methamphetamine improved the ability to dynamically adjust learning from prediction errors to a greater extent in participants who performed poorly-to-medium at baseline. There are several possible reasons for these apparently different findings. First, MA and methylphenidate (MPH) differ in their primary mechanisms of action: MPH acts mainly as a reuptake blocker whereas MA increases synaptic levels of catecholamines by inhibiting the vesicular monoamine transporter 2 (VMAT2) and inhibiting the enzyme monoamine oxidase (MAO). These differences in action could account for differential effects on cognitive tasks. Second, the tasks used by Rostami Kandroodi et al. (2021) and the present study differ in several ways. The Rostami Kandroodi et al. (2021) task assessed responses to a single reversal event during the session whereas the present study used repeated reversals with probabilistic outcomes. Third, the measures of baseline function differed in the two studies: Rostami Kandroodi et al. (2021) used a working memory task that was not used in the drug sessions, whereas we used the probabilistic learning task as both the baseline measure and the measure of drug effects. Further research is needed to determine which of these factors influenced the outcomes.”

      performance effects, but this is not true in the general sense, given that an accumulating number of studies have shown that the effects of drugs like MA depend on baseline performance on working memory tasks, which often but certainly not always correlates positively with performance on the task under study.

      We recognize that there is a large body of research reporting that the effects of stimulant drugs are related to baseline performance, and we have adjusted our wording in the Discussion accordingly. At the same time, numerous published studies report acute effects of drugs without considering individual differences in responses, including baseline differences in task performance.

      Reviewing Editor (Recommendations for the Authors):

      (i) Recently reported modeling approaches for disentangling two types of uncertainty (stochasticity vs volatility) (Piray and Daw, 2021, Nat Comm) might be usefully leveraged to help overcome the shortcomings of the 'signal-to-noise ratio' analysis performed here (learning rates on true reversals minus learning rates after misleading feedback), which is experimenter-driven and thus cannot capture a latent characteristic of the subject's computational capacity.

      Please see our previous response.

      (ii) To highlight more explicitly the fact that various of the key drug x baseline performance interactions did not reach the statistical threshold.

      Please see our previous responses to this issue.

      (iii) To make more explicit, in the introduction, both the overlap and the differences between the current study and previous relevant work (that is, how this goes beyond prior study in particular Rostami Kandroodi et al, which also assessed effects of catecholaminergic drug administration as a function of baseline task performance using a probabilistic reversal learning task).

      Please see our previous response.

      (iv) To revise and tone down, in the discussion section, the statement about novelty, that the existing literature has, to date, overlooked baseline performance effects.

      Please see our previous response.

      (v) It is unclear why the data from the 4th session (under some other sedative drug, which is not mentioned) are not reported. I recommend justifying the details of this manipulation and the decision to omit the report of those results. By analogy 4 other tasks were administered in the current study, but not described. Is there a protocol paper, describing the full procedure?

      Thank you for pointing this out. We added additional information to the method section. We are analysing the other cognitive measures in relation to the brain imaging data obtained on sessions 3 and 4. Therefore we argue, that these are beyond the scope of the present paper. We did not administer any sedative drug. However, participants were informed during orientation that they might receive a stimulant, sedative, or placebo on any testing session to maintain blinding of their expectations before each session.

      “Design. The results presented here were obtained from the first two sessions of a larger four-session study (clinicaltrials.gov ID number NCT04642820). During the latter two sessions of the larger study, not reported here, participants participated in two fMRI scans. During the two 4-h laboratory sessions presented here, healthy adults received methamphetamine (20 mg oral; MA) or placebo (PL), in mixed order under double-blind conditions. One hour after ingesting the capsule they completed the 30-min reinforcement reversal learning task. The primary comparisons were on acquisition and reversal learning parameters of reinforcement learning after MA vs PL. Secondary measures included subjective and cardiovascular responses to the drug.”

      “Orientation session. Participants attended an initial orientation session to provide informed consent, and to complete personality questionnaires. They were told that the purpose of the study was to investigate the effects of psychoactive drugs on mood, brain, and behavior. To reduce expectancies, they were told that they might receive a placebo, stimulant, or sedative/tranquilizer. However, participants only received methamphetamine and placebo. They agreed not to use any drugs except for their normal amounts of caffeine for 24 hours before and 6 hours following each session. Women who were not on oral contraceptives were tested only during the follicular phase (1-12 days from menstruation) because responses to stimulant drugs are dampened during the luteal phase of the cycle (White et al., 2002). Most participants (N=97 out of 113) completed the reinforcement learning task during the orientation session as a baseline measurement. This measure was added after the study began. Participants who did not complete the baseline measurement were omitted from the analyses presented in the main text. We ran the key analyses on the full sample (n=109). This sample included participants who completed the task only on the drug sessions. When controlling for session order and number (two vs. three sessions) effects, we see no drug effect on overall performance and learning. Yet, we found that eta was also reduced under MA in the full sample, which also resulted in reduced variability in the learning rate (see supplementary results for more details).”

      “Drug sessions. The two drug sessions were conducted in a comfortable laboratory environment, from 9 am to 1 pm, at least 72 hours apart. Upon arrival, participants provided breath and urine samples to test for recent alcohol or drug use and pregnancy (CLIAwaived Inc, Carlsbad, CA; Alcosensor III, Intoximeters; AimStickPBD, hCG professional, Craig Medical Distribution). Positive tests led to rescheduling or dismissal from the study. After drug testing, subjects completed baseline mood measures, and heart rate and blood pressure were measured. At 9:30 am they ingested capsules (PL or MA 20 mg, in color-coded capsules) under double-blind conditions. Oral MA (Desoxyn, 5 mg per tablet) was placed in opaque size 00 capsules with dextrose filler. PL capsules contained only dextrose. Subjects completed the reinforcement learning task 60 minutes after capsule ingestion. Drug effects questionnaires were obtained at multiple intervals during the session. They completed other cognitive tasks not reported here. Participants were tested individually and were permitted to relax, read or watch neutral movies when they were not completing study measures.”

      (vi) Some features of the model including the play bias parameter require justification, at least by referring to prior work exploring these features.

      We have added information to justify the features of the model.

      From the Methods section:

      “The base model (M1) was a standard Q-learning model with three parameters: (1) an inverse temperature parameter of the softmax function used to convert trial expected values to action probabilities, (2) a play bias term that indicates a tendency to attribute higher value to gambling behavior (Jang et al., 2019), ….

      The two additional learning rate terms—feedback confirmation and modality—were added to the model set, as these factors have been shown to influence learning in similar tasks (Kirschner et al., 2023; Schüller et al., 2020).”
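
      As a rough illustration of how an inverse temperature and a play bias enter such a model, the sketch below uses a two-action softmax (play vs. pass) and a delta-rule value update. The quoted passage elides the third parameter, so the learning rate used here, the assumption that passing has a value of zero, and all function and argument names are illustrative assumptions rather than the authors' implementation.

```python
import math


def probability_of_playing(q_value, inverse_temperature, play_bias):
    """Two-action softmax: play (valued q_value + play_bias) vs. pass (valued 0).

    A positive play_bias adds value to gambling regardless of what was learned.
    """
    return 1.0 / (1.0 + math.exp(-inverse_temperature * (q_value + play_bias)))


def update_q(q_value, reward, learning_rate):
    """Delta-rule update: move the expected value toward the observed reward."""
    prediction_error = reward - q_value
    return q_value + learning_rate * prediction_error


# Hypothetical parameter values, chosen only to show the direction of each effect.
neutral = probability_of_playing(q_value=0.0, inverse_temperature=2.0, play_bias=0.0)
biased = probability_of_playing(q_value=0.0, inverse_temperature=2.0, play_bias=0.5)
q_after_win = update_q(q_value=0.0, reward=1.0, learning_rate=0.3)

print(neutral, biased, q_after_win)
```

      With no learned value and no bias, the sketch is indifferent between playing and passing. A positive play bias shifts choices toward gambling before any learning has taken place.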

      Literature

      Doucet, A., & Johansen, A. M. (2011). A tutorial on particle filtering and smoothing: fifteen years later. Oxford University Press.

      Durstewitz, D., & Seamans, J. K. (2008). The dual-state theory of prefrontal cortex dopamine function with relevance to catechol-o-methyltransferase genotypes and schizophrenia. Biol Psychiatry, 64(9), 739-749. https://doi.org/10.1016/j.biopsych.2008.05.015

      Gamerman, D., dos Santos, T. R., & Franco, G. C. (2013). A non-Gaussian family of state-space models with exact marginal likelihood. Journal of Time Series Analysis, 34(6), 625-645. https://doi.org/10.1111/jtsa.12039

      Goschke, T., & Bolte, A. (2018). A dynamic perspective on intention, conflict, and volition: Adaptive regulation and emotional modulation of cognitive control dilemmas. In Why people do the things they do: Building on Julius Kuhl’s contributions to the psychology of motivation and volition. (pp. 111-129). Hogrefe. https://doi.org/10.1027/00540-000

      Jang, A. I., Nassar, M. R., Dillon, D. G., & Frank, M. J. (2019). Positive reward prediction errors during decision-making strengthen memory encoding. Nature Human Behaviour, 3(7), 719-732. https://doi.org/10.1038/s41562-019-0597-3

      Jenkins, D. G., & Quintana-Ascencio, P. F. (2020). A solution to minimum sample size for regressions. PLoS One, 15(2), e0229345. https://doi.org/10.1371/journal.pone.0229345

      Kirschner, H., Nassar, M. R., Fischer, A. G., Frodl, T., Meyer-Lotz, G., Froböse, S., Seidenbecher, S., Klein, T. A., & Ullsperger, M. (2023). Transdiagnostic inflexible learning dynamics explain deficits in depression and schizophrenia. Brain, 147(1), 201-214. https://doi.org/10.1093/brain/awad362

      Maris, E., & Oostenveld, R. (2007). Nonparametric statistical testing of EEG- and MEG-data. Journal of Neuroscience Methods, 164(1), 177-190. https://doi.org/10.1016/j.jneumeth.2007.03.024

      Morean, M. E., de Wit, H., King, A. C., Sofuoglu, M., Rueger, S. Y., & O'Malley, S. S. (2013). The drug effects questionnaire: psychometric support across three drug types. Psychopharmacology (Berl), 227(1), 177-192. https://doi.org/10.1007/s00213-012-2954-z

      Murphy, K., & Russell, S. (2001). Rao-Blackwellised particle filtering for dynamic Bayesian networks. In Sequential Monte Carlo methods in practice (pp. 499-515). Springer.

      Piray, P., & Daw, N. D. (2020). A simple model for learning in volatile environments. PLoS Comput Biol, 16(7), e1007963. https://doi.org/10.1371/journal.pcbi.1007963

      Piray, P., & Daw, N. D. (2021). A model for learning based on the joint estimation of stochasticity and volatility. Nature Communications, 12(1), 6587. https://doi.org/10.1038/s41467-021-26731-9

      Piray, P., & Daw, N. D. (2024). Computational processes of simultaneous learning of stochasticity and volatility in humans. Nat Commun, 15(1), 9073. https://doi.org/10.1038/s41467-024-53459-z

      Rostami Kandroodi, M., Cook, J. L., Swart, J. C., Froböse, M. I., Geurts, D. E. M., Vahabie, A. H., Nili Ahmadabadi, M., Cools, R., & den Ouden, H. E. M. (2021). Effects of methylphenidate on reinforcement learning depend on working memory capacity. Psychopharmacology (Berl), 238(12), 3569-3584. https://doi.org/10.1007/s00213-021-05974-w

      Schüller, T., Fischer, A. G., Gruendler, T. O. J., Baldermann, J. C., Huys, D., Ullsperger, M., & Kuhn, J. (2020). Decreased transfer of value to action in Tourette syndrome. Cortex, 126, 39-48. https://doi.org/10.1016/j.cortex.2019.12.027

      West, M. (1987). On scale mixtures of normal distributions. Biometrika, 74(3), 646-648. https://doi.org/10.1093/biomet/74.3.646

      White, T. L., Justice, A. J., & de Wit, H. (2002). Differential subjective effects of D-amphetamine by gender, hormone levels and menstrual cycle phase. Pharmacol Biochem Behav, 73(4), 729-741.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Editor’s summary:

      This paper by Castello-Serrano et al. addresses the role of lipid rafts in trafficking in the secretory pathway. By performing carefully controlled experiments with synthetic membrane proteins derived from the transmembrane region of LAT, the authors describe, model and quantify the importance of transmembrane domains in the kinetics of trafficking of a protein through the cell. Their data suggest affinity for ordered domains influences the kinetics of exit from the Golgi. Additional microscopy data suggest that lipid-driven partitioning might segregate Golgi membranes into domains. However, the relationship between the partitioning of the synthetic membrane proteins into ordered domains visualised ex vivo in GPMVs, and the domains in the TGN, remains at best correlative. Additional experiments that relate to the existence and nature of domains at the TGN are necessary to provide a direct connection between the phase partitioning capability of the transmembrane regions of membrane proteins and the sorting potential of this phenomenon.

      The authors have used the RUSH system to study the traffic of model secretory proteins containing single-pass transmembrane domains that confer defined affinities for liquid ordered (lo) phases in Giant Plasma Membrane derived Vesicles (GPMVs), out of the ER and Golgi. A native protein termed LAT partitioned into these lo-domains, unlike a synthetic model protein termed LAT-allL, which had a substituted transmembrane domain. The authors' experiments provide support for the idea that ER exit relies on motifs in the cytosolic tails, but that accelerated Golgi exit is correlated with lo domain partitioning.

      Additional experiments provided evidence for segregation of Golgi membranes into coexisting lipid-driven domains that potentially concentrate different proteins. Their inference is that lipid rafts play an important role in Golgi exit. While this is an attractive idea, the experiments described in this manuscript do not provide a convincing argument one way or the other. It does however revive the discussion about the relationship between the potential for phase partitioning and its influence on membrane traffic.

      We thank the editors and scientific reviewers for thorough evaluation of our manuscript and for positive feedback. While we agree that our experimental findings present a correlation between trafficking rates and raft affinity, in our view, the synthetic, minimal nature of the transmembrane protein constructs in question makes a strong argument for involvement of membrane domains in their trafficking. These constructs have no known sorting determinants and are unlikely to interact directly with trafficking proteins in cells, since they contain almost no extramembrane amino acids. Yet, the LATTMD traffics through Golgi similarly to the full-length LAT protein, but quite different from mutants with lower raft phase affinity. We suggest that these observations can be best rationalized by involvement of raft domains in the trafficking fates and rates of these constructs, providing strong evidence (beyond a simple correlation) for the existence and relevance of such domains.

      We have substantially revised the manuscript to address all reviewer comments, including several new experiments and analyses. These revisions have substantially improved the manuscript without changing any of the core conclusions and we are pleased to have this version considered as the “version of record” in eLife.

      Below is our point-by-point response to all reviewer comments.

      ER exit:

      The experiments conducted to identify an ER exit motif in the C-terminal domain of LAT are straightforward and convincing. This is also consistent with available literature. The authors should comment on whether the conservation of the putative COPII association motif (detailed in Fig. 2A) is significantly higher than that of other parts of the C-terminal domain.

      Thank you for this suggestion; this information has now been included as Supp Fig 2B. While there are other well-conserved residues of the LAT C-terminus, many regions have relatively low conservation. In contrast, the essential residues of the COPII association motif (P148 and A150) are completely conserved in LAT across all species analyzed.

      One cause of concern is that addition of a short cytoplasmic domain from LAT is sufficient to drive ER exit, and in its absence the synthetic constructs are all very slow. However, the argument presented that specific lo phase partitioning behaviour of the TMDs do not have a significant effect on exit from the ER is a little confusing. This is related to the choice of the allL-TMD as the 'non-lo domain' partitioning comparator. Previous data has shown that longer TMDs (23+) promote ER export (eg. Munro 91, Munro 95, Sharpe 2005). The mechanism for this is not, to my knowledge, known. One could postulate that it has something to do with the very subject of this manuscript- lipid phase partitioning. If this is the case, then a TMD length of 22 might be a poor choice of comparison. A TMD 17 Ls' long would be a more appropriate 'non-raft' cargo. It would be interesting to see a couple of experiments with a cargo like this.

      The basis for the claim that raft affinity has relatively minor influence on ER exit kinetics, especially in comparison to the effect of the putative COPII interaction motif, is in Fig 1G. We do observe some differences between constructs and they may be related to raft affinity, however we considered these relatively minor compared to the nearly 4-fold increase in ER efflux induced by COPII motifs.

      We have modified the wording in the manuscript to avoid the impression that we have ruled out an effect of raft affinity on ER exit.

      We believe that our observations are broadly consistent with those of Munro and colleagues. In both their work and ours, long TMDs were able to exit the ER. In our experiments, this was true for several proteins with long TMDs, either as full-length or as TMD-only versions (see Fig 1G). We intentionally did not measure shorter synthetic TMDs because these would not have been comparable with the raft-preferring variants, which all require relatively long TMDs, as demonstrated in our previous work1,2. Thus, because our manuscript does not make any claims about the influence of TMD length on trafficking, we did not feel that experiments with shorter non-raft constructs would substantively influence our conclusions.

      However, to address reviewer interest, we did complete one set of experiments to test the effect of shortening the TMD on ER exit. We truncated the native LAT TMD by removing 6 residues from the C-terminal end of the TMD (LAT-TMDd6aa). This construct exited the ER similarly to all others we measured, revealing that for this set of constructs, short TMDs did not accumulate in the ER. ER exit of the truncated variant was slightly slower than the full-length LAT-TMD, but somewhat faster than the allL-TMD. These effects are consistent with our previous measurements, which showed that this shortened construct has slightly lower raft phase partitioning than the LAT-TMD but higher than allL2. While these are interesting observations, a more thorough exploration of the effect of TMD length would be required to make any strong conclusion, so we did not include these data in the final manuscript.

      Author response image 1.

      Golgi exit:

      For the LAT constructs, the kinetics of Golgi exit as shown in Fig. 3B are surprisingly slow. About half of the protein remains in the Golgi at 1 h after biotin addition. Most secretory cargo proteins would have almost completely exited the Golgi by that time, as illustrated by VSVG in Fig. S3. There is a concern that LAT may have some tendency to linger in the Golgi, presumably due to a factor independent of the transmembrane domain, and therefore cannot be viewed as a good model protein. For kinetic modeling in particular, the existence of such an additional factor would be far from ideal. A valuable control would be to examine the Golgi exit kinetics of at least one additional secretory cargo.

      We disagree that LAT is an unusual protein with respect to Golgi efflux kinetics. In our experiments, Golgi efflux of VSVG was similar to full-length LAT (t1/2 ~ 45 min), and both of these were similar to previously reported values3. Especially for the truncated (i.e. TMD) constructs, it is very unlikely that some factor independent of their TMDs affects Golgi exit, as they contain almost no amino acids outside the membrane-embedded TMD.

      Practically, it has proven somewhat challenging to produce functional RUSH-Golgi constructs. We attempted the experiment suggested by the reviewer by constructing SBP-tagged versions of several model cargo proteins, but all failed to trap in the Golgi. We speculate that the Golgin84 hook is much more sensitive to the location of the SBP on the cargo, being an integral membrane protein rather than the lumenal KDEL-streptavidin hook. This limitation can likely be overcome by engineering the cargo, but we did not feel that another control cargo protein was essential for the conclusions we presented, thus we did not pursue this direction further.

      Comments about the trafficking model

      (1) In Figure 1E, the export of LAT-TMD from the ER is fitted to a single-exponential fit that the authors say is "well described". This is unclear and there is perhaps something more complex going on. It appears that there is an initial lag phase and then similar kinetics after that - perhaps the authors can comment on this?

      This is a good observation. This effect is explainable by the mechanics of the measurement: in Figs 1 and 2, we measure not ‘fraction of protein in ER’ but ‘fraction of cells positive for ER fluorescence’. This is because the very slow ER exit of the TMD-only constructs present a major challenge for live-cell imaging, so ER exit was quantified on a population level, by fixing cells at various time points after biotin addition and quantifying the fraction of cells with observable ER localization (rather than tracking a single cell over time).

      For fitting to the kinetic model (which attempts to describe ‘fraction in ER/Golgi’) we re-measured all constructs by live-cell imaging (see Supp Fig 5) to directly quantify relative construct abundance in the ER or Golgi. These data did not have the plateau in Fig 1E, suggesting that this is an artifact of counting “ER positive cells” which would be expected to have a longer lag than “fraction of protein in ER”. Notably however, t1/2 measured by both methods was similar, suggesting that the population measurement agrees well with single-cell live imaging.

      We have included all these explanations and caveats in the manuscript. We have also changed the wording from “well described” to “reasonably approximated”.

      (2) The model for Golgi sorting is also complicated and controversial, and while the authors' intention not to overinterpret their data in this regard must be respected, these data are in support of the two-phase Golgi export model (Patterson et al PMID:18555781).

      The reviewers are correct, our observations and model are consistent with Patterson et al and it was a major oversight that a reference to this foundational work was not included. We have now added a discussion regarding the “two phase model” of Patterson and Lippincott-Schwartz.

      Furthermore contrary to the statement in lines 200-202, the kinetics of VSVG exit from the Golgi (Fig. S3) are roughly linear and so are NOT consistent with the previous report by Hirschberg et al.

      Regarding kinetics of VSVG, our intention was to claim that the timescale of VSVG efflux from the Golgi was similar to previously reported in Hirschberg, i.e. t1/2 roughly between 30-60 minutes. We have clarified this in the text. Minor differences in the details between our observations and Hirschberg are likely attributable to temperature, as those measurements were done at 32°C for the tsVSVG mutant.

      Moreover, the kinetics of LAT export from the Golgi (Fig. 3B) appear quite different, more closely approximating exponential decay of the signal. These points should be described accurately and discussed.

      Regarding linear versus exponential fits, we agree that the reality of Golgi sorting and efflux is far more complicated than accounted for by either the phenomenological curve fitting in Figs 1-3 or the modeling in Fig 4. In addition to the possibility of lateral domains within Golgi stacks, there is transport between stacks, retrograde traffic, etc. The fits in Figs 1-3 are not intended to model specifics of transport, but rather to be phenomenological descriptors that allowed us to describe efflux kinetics with one parameter (i.e. t1/2). In contrast, the more refined kinetic modeling presented in Figure 4 is designed to test a mechanistic hypothesis (i.e. coexisting membrane domains in Golgi) and describes well the key features of the trafficking data.
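
      The one-parameter description mentioned above can be sketched as follows. The example assumes a single-exponential decay of the fraction of protein remaining in a compartment and recovers t1/2 from a log-linear least-squares fit. The time points and the 45 min half-life are synthetic illustrative values, not measurements from the manuscript, and the function name is hypothetical.

```python
import math


def fit_half_life(times, fractions):
    """Fit ln(fraction) = -k * t through the origin and return (k, t_half).

    Assumes the fraction remaining starts at 1 and decays as exp(-k * t).
    """
    numerator = sum(t * math.log(f) for t, f in zip(times, fractions))
    denominator = sum(t * t for t in times)
    rate = -numerator / denominator
    return rate, math.log(2) / rate


# Synthetic, noise-free data generated with an assumed half-life of 45 min.
times_min = [15, 30, 45, 60, 90]
true_rate = math.log(2) / 45
fractions_remaining = [math.exp(-true_rate * t) for t in times_min]

rate, t_half = fit_half_life(times_min, fractions_remaining)
print(f"rate constant: {rate:.4f} per min, t1/2: {t_half:.1f} min")
```

      A fit of this kind summarizes the curve with a single number but says nothing about mechanism. A lag phase or a roughly linear decline would show up as systematic deviations from the fitted curve.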

      Relationship between membrane traffic and domain partitioning:

      (1) Phase segregation in the GPMV is dictated by thermodynamics given its composition and the measurement temperature (at low temperatures, 4°C). However, at physiological temperatures (32-37°C) at which membrane trafficking is taking place, these GPMVs are not phase separated. Hence it is difficult to argue that a sorting mechanism based solely on the partitioning of the synthetic LAT-TMD constructs into lo domains detected at low temperatures in GPMVs provides a basis (or its lack) for the differential kinetics of traffic out of the Golgi (or ER). The mechanism in a living cell to form any lipid based sorting platforms naturally requires further elaboration, and by definition cannot resemble the lo domains generated in GPMVs at low temperatures.

      We thank the reviewers for bringing up this important point. GPMVs are a useful tool because they allow direct, quantitative measurements of protein partitioning between coexisting ordered and disordered phases in complex, cell-derived membranes. However, we entirely agree, that GPMVs do not fully represent the native organization of the living cell plasma membrane and we have previously discussed some of the relevant differences4,5. Despite these caveats, many studies have supported the cellular relevance of phase separation in GPMVs and the partitioning of proteins to raft domains therein 6-9. Most notably, elegant experiments from several independent labs have shown that fluorescent lipid analogs that partition to Lo domains in GPMVs also show distinct diffusive behaviors in live cells 6,7, strongly suggesting the presence of nanoscopic Lo domains in live cells. Similarly, our recent collaborative work with the lab of Sarah Veatch showed excellent agreement between raft preference in GPMVs and protein organization in living immune cells imaged by super-resolution microscopy10. Further, several labs6,7, including ours11, have reported nice correlations between raft partitioning in GPMVs and detergent resistance, which is a classical (though controversial) assay for raft association.

      Based on these points, we feel that GPMVs are a useful tool for quantifying protein preference for ordered (raft) membrane domains and that this preference is a useful proxy for the raft-associated behavior of these probes in living cells. We propose that this approach allows us to overcome a major reason for the historical controversy surrounding the raft field: nonquantitative and unreliable methodologies that prevented consistent definition of which proteins are supposed to be present in lipid rafts and why. Our work directly addresses this limitation by relating quantitative raft affinity measurements in a biological membrane with a relevant and measurable cellular outcome, specifically inter-organelle trafficking rates.

      Addressing the point about phase transition temperatures in GPMVs: this is the temperature at which macroscopic domains are observed. Based on physical models of phase separation, it has been proposed that macroscopic phase separation at lower temperatures is consistent with sub-microscopic, nanoscale domains at higher temperatures8,12. These smaller domains can potentially be stabilized / functionalized by protein-protein interactions in cells13 that may not be present in GPMVs (e.g. because of lack of ATP).

      (2) The lipid compositions of each of these membranes - PM, ER and Golgi are drastically different. Each is likely to phase separate at different phase transition temperatures (if at all). The transition temperature is probably even lower for Golgi and the ER membranes compared to the PM. Hence, if the reported compositions of these compartments are to be taken at face value, the propensity to form phase separated domains at a physiological temperature will be very low. Are ordered domains even formed at the Golgi at physiological temperatures?

      It is a good point that the membrane compositions and the resulting physical properties (including any potential phase behavior) will be very different in the PM, ER, and Golgi. Whether ordered domains are present in any of these membranes in living cells remains difficult to directly visualize, especially for non-PM membranes, which are not easily accessible by probes, are nanoscopic, and have complex morphologies. However, the fact that raft-preferring probes / proteins share some trafficking characteristics, while very similar non-raft mutants behave differently, argues that raft affinity plays a role in subcellular traffic.

      (3) The hypothesis of 'lipid rafts' is a very specific idea, related to functional segregation, and the underlying basis for domain formation has been also hotly debated. In this article the authors conflate thermodynamic phase separation mechanisms with the potential formation of functional sorting domains, further adding to the confusion in the literature. To conclude that this segregation is indeed based on lipid environments of varying degrees of lipid order, it would probably be best to look at the heterogeneity of the various membranes directly using probes designed to measure lipid packing, and then look for colocalization of domains of different cargo with these domains.

      This is a very good suggestion, and a direction we are currently following. Unfortunately, due to the dynamic nature and small size of putative lateral membrane domains, combined with the interior of a cell being filled with lipophilic environments that overlay each other, directly imaging domains in organellar membranes with lipid packing probes remains extremely difficult with current technology (or at least with the technology available to us). We argue that the TMD probes used in this manuscript are a reasonable alternative, as they are fluorescent probes with validated selectivity for membrane compartments with different physical properties.

      Ultimately, the features of membrane domains suggested by a variety of techniques – i.e. nanometric, dynamic, relatively similar in composition to the surrounding membrane, potentially diverse/heterogeneous – make them inherently difficult to microscopically visualize. This is one reason why we believe studies like ours, which use a natural model system to directly quantify raft-associated behaviors and relate them to cellular effects (in our case, protein sorting), are a useful direction for this field.

      We believe we have been careful in our manuscript to avoid confusing language surrounding lipid rafts, phase separation, etc. Our experiments clearly show that mammalian membranes have the capacity to phase separate, that some proteins preferentially interact with more ordered domains, and that this preference is related to the subcellular trafficking fates and rates of these proteins. We have edited the manuscript to emphasize these claims and avoid the historical controversies and confusions.

      (4) In the super-resolution experiments (by SIM, where the enhancement of resolution is around two fold or less compared to optical), the authors are able to discern a segregation of the two types of Golgi-resident cargo that have different preferences for the lo-domains in GPMVs. It should be noted that TMD-allL and the LATallL end up in the late endosome after exit from the Golgi. Previous work from the Bonifacino laboratory (PMID: 28978644) has shown that proteins (such as M6PR) destined to go to the late endosome bud from a different part of the Golgi in vesicular carriers, while those that are destined for the cell surface first (including TfR) bud with tubular vesicular carriers. Thus at the resolution depicted in Fig 5, the segregation seen by the authors could be due to an alternative explanation, that these molecules are present in different areas of the Golgi for reasons different from phase partitioning. The relatively high colocalization of TfR with the GPI probe in Fig 5E is consistent with this explanation. TfR and GPI prefer different domains in the GPMV assays yet they show a high degree of colocalization and also traffic to the cell surface.

      This is a good point. Even at microscopic resolutions beyond the optical diffraction limit, we cannot make any strong claims that the segregation we observe is due to lateral lipid domains and not several reasonable alternatives, including separation between cisternae (rather than within), cargo vesicles moving between cisternae, or lateral domains that are mediated by protein assemblies rather than lipids. We have explicitly included this point in the Discussion: “Our SIM imaging suggests segregation of raft from nonraft cargo in the Golgi shortly (5 min) after RUSH release (Fig 5B), but at this level of resolution, we can only report reduced colocalization, not intra-Golgi protein distributions. Moreover, segregation within a Golgi cisterna would be very difficult to distinguish from cargo moving between cisternae at different rates or exiting via Golgi-proximal vesicles.”

      We have also added a similar caveat in the Results section of the manuscript: “These observations support the hypothesis that proteins can segregate in Golgi based on their affinity for distinct membrane domains; however, it is important to emphasize that this segregation does not necessarily imply lateral lipid-driven domains within a Golgi cisterna. Reasonable alternative possibilities include separation between cisternae (rather than within), cargo vesicles moving between cisternae, or lateral domains that are mediated by protein assemblies rather than lipids.”

      Finally, while probes with allL TMD do eventually end up in late endosomes (consistent with the Bonifacino lab’s findings which we include), they do so while initially transiting the PM2,11.

      Minor concerns:

      (1) Generally, the quantitation is high quality from difficult experimental data. Although a lot appears to be manual, it appears appropriately performed and interpreted. There are some claims that are made based on this quantitation, however, where there are no statistics performed. For example, figure 1B. Any quantitation with an accompanying conclusion should be subject to a statistical test. I think this is particularly important for the quality of the model fits.

      We appreciate the thoughtful feedback; the quantifications and fits were not trivial, but we believe they are important. We have added statistical significance to Figure 1B and others where it was missing.

      (2) Modulation of lipid levels in Fig 4E shows a significant change in the trafficking rate for the LAT-TMD construct and a not so significant change for the allL-TMD construct. However, these data are not convincing and appear to depend on a singular data point that appears to lower the mean value. In general, the experiment with the MZA inhibitor (Fig. 4D-F) is hard to interpret because cells will likely be sick after inhibition of sphingolipid and cholesterol synthesis. Moreover, the difference in effects for LAT-TMD and allL-TMD is marginal.

      We disagree with this interpretation. Fig 4E shows the average of three experiments and demonstrates clearly that the inhibitors change the Golgi efflux rate of LAT-TMD but not allL-TMD. This is summarized in the t1/2 quantifications of Fig 4F, which show a statistically significant change for LAT-TMD but not allL-TMD. This is not an effect of a singular data point, but rather the trend across the dataset.

      Further, the inhibitor conditions were tuned carefully to avoid cells becoming “sick”: at higher concentrations, cells did adopt unusual morphologies and began to detach from the plates. We pursued only lower concentrations, which cells survived for at least 48 hrs without major morphological changes.

      (3) Line 173: 146-AAPSA-152 should read either 146-AAPSA-150 or 146-AAPSAPA-152, depending on what the authors intended.

      Thanks for the careful reading; we intended the former and it has been fixed.

      (4) What is the actual statistical significance in Fig. 3C and Fig. 3E? There is a single asterisk in each panel of the figure but two asterisks in the legend.

      Apologies, a single asterisk representing p<0.05 was intended. It has been fixed.

      (5) The code used to calculate the model is not accessible. It is standard practice to host well-annotated code on GitHub or similar, and it would be good to have this publicly available.

      We have deposited the code in a public repository (doi: 10.5281/zenodo.10478607) and added a note to the Methods.

      (1) Lorent, J. H. et al. Structural determinants and functional consequences of protein affinity for membrane rafts. Nature Communications 8, 1219 (2017). PMC5663905

      (2) Diaz-Rohrer, B. B., Levental, K. R., Simons, K. & Levental, I. Membrane raft association is a determinant of plasma membrane localization. Proc Natl Acad Sci U S A 111, 8500-8505 (2014). PMC4060687

      (3) Hirschberg, K. et al. Kinetic analysis of secretory protein traffic and characterization of Golgi to plasma membrane transport intermediates in living cells. J Cell Biol 143, 1485-1503 (1998). PMC2132993

      (4) Levental, K. R. & Levental, I. Giant plasma membrane vesicles: models for understanding membrane organization. Current Topics in Membranes 75, 25-57 (2015).

      (5) Sezgin, E. et al. Elucidating membrane structure and protein behavior using giant plasma membrane vesicles. Nat Protoc 7, 1042-1051 (2012).

      (6) Komura, N. et al. Raft-based interactions of gangliosides with a GPI-anchored receptor. Nat Chem Biol 12, 402-410 (2016).

      (7) Kinoshita, M. et al. Raft-based sphingomyelin interactions revealed by new fluorescent sphingomyelin analogs. J Cell Biol 216, 1183-1204 (2017). PMC5379944

      (8) Stone, M. B., Shelby, S. A., Nunez, M. F., Wisser, K. & Veatch, S. L. Protein sorting by lipid phase-like domains supports emergent signaling function in B lymphocyte plasma membranes. eLife 6 (2017). PMC5373823

      (9) Machta, B. B. et al. Conditions that Stabilize Membrane Domains Also Antagonize n-Alcohol Anesthesia. Biophys J 111, 537-545 (2016).

      (10) Shelby, S. A., Castello-Serrano, I., Wisser, I., Levental, I. & S., V. Membrane phase separation drives protein organization at BCR clusters. Nat Chem Biol in press (2023).

      (11) Diaz-Rohrer, B. et al. Rab3 mediates a pathway for endocytic sorting and plasma membrane recycling of ordered microdomains. Proc Natl Acad Sci U S A 120, e2207461120 (2023).

      (12) Veatch, S. L. et al. Critical fluctuations in plasma membrane vesicles. ACS Chem Biol 3, 287-293 (2008).

      (13) Wang, H. Y. et al. Coupling of protein condensates to ordered lipid domains determines functional membrane organization. Science Advances 9, eadf6205 (2023). PMC10132753

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      (1) The main hypothesis/conclusion is summarized in the abstract: "Our study presents an intriguing model of cilia length regulation via controlling IFT speed through the modulation of the size of the IFT complex." The data clearly document the remarkable correlation between IFT velocity and ciliary length in the different cells/tissues/organs analyzed. The experimental test of this idea, i.e., the knock-down of GFP-IFT88, further supports the conclusion but needs to be interpreted more carefully. While IFT particle size and train velocity were reduced in the IFT88 morphants, the number of IFT particles is even more decreased. Thus, the contributions of the reduction in train size and velocity to ciliary length are, in my opinion, not unambiguous. Also, the concept that larger trains move faster, likely because they dock more motors and/or better coordinating kinesin-2 and that faster IFT causes cilia to be longer, is to my knowledge, not further supported by observations in other systems (see below).

      Thank you for your comments. We agree with the reviewer that the final section on IFT train size, velocity, and ciliary length regulation requires additional evidence. The purpose of the knockdown experiments was to investigate the potential relationship between IFT speed and IFT train size. We hypothesize that a deficiency in IFT88 proteins may disrupt the regular assembly of IFT particles, leading to the formation of shorter IFT trains. Indeed, we observed shorter IFT particles and a slight reduction in the transport speed of IFT particles in the morphants. Certainly, it would be more convincing to distinguish these IFT trains through ultrastructural analysis. However, with current techniques, performing such analysis on the zebrafish model will be very difficult due to the limited sample size. In the revised version, we have tempered the conclusions in these sections, as suggested by other reviewers as well.

      (2) I think the manuscript would be strengthened if the IFT frequency would also be analyzed in the five types of cilia. This could be done based on the existing kymographs from the spinning disk videos. As mentioned above, transport frequency in addition to train size and velocity is an important part of estimating the total number of IFT particles, which bind the actual cargoes, entering/moving in cilia.

      Thank you. We have analyzed the entry frequency of IFT in five types of cilia, both anterior and posterior. The analysis indicates that longer cilia also exhibit a higher frequency of fluorescent particles entering the cilia. These results are presented in Figure 3J.

      (3) Here, the variation in IFT velocity in cilia of different lengths within one species is documented - the results document a remarkable correlation between IFT velocity and ciliary length. These data need to be compared to observations from the literature. For example, the velocity of IFT in the quite long (~ 100 um) olfactory cilia of mice is similar to that observed in the rather short cilia of fibroblasts (~0.6 um/s). In Chlamydomonas, IFT velocity is not different in long flagella mutants compared to controls. Probably data are also available for C. elegans or other systems. Discussing these data would provide a broader perspective on the applicability of the model outside of zebrafish.

      Thank you for your suggestions. We believe the most significant novelty of our manuscript is the discovery that IFT velocities are closely related to cilia length in an in vivo model system. Our data suggest that longer cilia may require faster IFT transport to maintain their stable length, powered by larger IFT trains. We did observe substantial variability in IFT velocities across different studies. For example, anterograde IFT transport ranges from 0.2 µm/s in mouse olfactory neurons (Williams et al, 2014) to 0.8 µm/s in 293T cells (See et al, 2016) and 0.4 µm/s in IMCD-3 cells (Broekhuis et al, 2014). Even in NIH-3T3 cells, two studies report significant differences, despite using the same IFT reporters: 0.3 µm/s versus 0.9 µm/s (Kunova Bosakova et al, 2018; Luo et al, 2017). These findings suggest that cell types and culture conditions can influence IFT velocities in vitro, which may not accurately represent in vivo conditions. Interestingly, research on mouse olfactory neurons showed a strong correlation between anterograde and retrograde IFT velocities. Additionally, IFT velocity is closely related to the cell types within the olfactory neuron population, consistent with our results (Williams et al., 2014). 

      Reviewer #2 (Public Review):

      Summary:

      In this study, the authors study intraflagellar transport (IFT) in cilia of diverse organs in zebrafish. They elucidate that IFT88-GFP (an IFT-B core complex protein) can substitute for endogenous IFT88 in promoting ciliogenesis and use it as a reporter to visualize IFT dynamics in living zebrafish embryos. They observe striking differences in cilia lengths and velocity of IFT trains in different cilia types, with smaller cilia lengths correlating with lower IFT speed. They generate several mutants and show that disrupting the function of different kinesin-2 motors and BBSome or altering post-translational modifications of tubulin does not have a significant impact on IFT velocity. They however observe that when the amount of IFT88 is reduced it impacts the cilia length, IFT velocity as well as the number and size of IFT trains. They also show that the IFT train size is slightly smaller in one of the organs with shorter cilia (spinal cord). Based on their observations they propose that IFT velocity determines cilia length and go one step further to propose that IFT velocity is regulated by the size of IFT trains.

      Strengths:

      The main highlight of this study is the direct visualization of IFT dynamics in multiple organs of a living complex multi-cellular organism, zebrafish. The quality of the imaging is really good. Further, the authors have developed phenomenal resources to study IFT in zebrafish which would allow us to explore several mechanisms involved in IFT regulation in future studies. They make some interesting findings in mutants with disrupted function of kinesin-2, BBSome, and tubulin modifying enzymes which are interesting to compare with cilia studies in other model organisms. Also, their observation of a possible link between cilia length and IFT speed is potentially fascinating.

      Weaknesses:

      The manuscript as it stands, has several issues.

      (1) The study does not provide a qualitative description of cilia organization in different cell types, the cilia length variation within the same organ, and IFT dynamics. The methodology is also described minimally and must be detailed with more care such that similar studies can be done in other laboratories.

      Thank you for your comments. We found that cilia length is generally consistent within the same cell types we examined, including those in the pronephric duct, spinal cord, and epidermal cells. However, we observed variability in cilia length within ear crista cilia. Upon comparing IFT velocities, we found no differences among these cilia, further confirming our conclusion that IFT velocity is directly related to cell type rather than cilia length. These new results are presented in Figure S4 of the revised version.

      We apologize for the lack of methodological details in the original manuscript. Following the reviewer's suggestion, we have added a detailed description of the methods used to generate the transgenic line and to perform IFT velocity analysis. These details are included in Figure S2 and are thoroughly described in the methods section of the revised manuscript.

      (2) They provide remarkable new observations for all the mutants. However, discussion regarding what the findings imply and how these observations align (or contradict) with what has been observed in cilia studies in other organisms is not comprehensive.

      Thank you for this suggestion. We initially submitted this paper as a report, which has word limits. We believe the main finding of our work is that IFT velocity is directly associated with cell type, with longer cilia requiring higher velocities to maintain their length. This association of IFT velocity with cell type has also been observed in mouse olfactory neurons (Williams et al., 2014). We have included a discussion of our findings, along with related data published in other organisms, in the revised version.

      (3) The analysis of IFT velocities, the main parameter they compare between experiments, is not described at all. The IFT velocities appear variable in several kymographs (and movies) and are visually difficult to see in shorter cilia. It is unclear how they make sure that the velocity readout is robust. Perhaps, a more automated approach is necessary to obtain more precise velocity estimates.

      Thank you for these comments. To measure the IFT velocities, we first used ImageJ software to generate a kymograph, where moving particles appear as oblique lines. The velocity of these particles can be calculated based on the slope of the lines (Zhou et al, 2001). In the initial version, most of the lines were drawn manually. To eliminate potential artifacts, we also used KymographDirect software to automatically trace the particle paths. The velocities obtained with this method were similar to those calculated manually. These new data are now shown in Figure S2 B-D. For shorter cilia, we only used particles with clear moving paths for our calculations. In the revised version, we have included a detailed description of the velocity analysis methods.
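
      The slope-based calculation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the analysis code used in the study: the function names, the choice of endpoint coordinates as input, and the calibration parameters (um_per_pixel, seconds_per_frame) are hypothetical and would need to match the actual imaging settings.

```python
def kymograph_velocity(x0_px, t0_frame, x1_px, t1_frame,
                       um_per_pixel, seconds_per_frame):
    """Velocity (um/s) of one track traced as a straight line on a kymograph.

    Positions are measured along the cilium axis in pixels and times in
    frames. Assumption: increasing x points toward the ciliary tip, so
    positive values are anterograde and negative values are retrograde.
    """
    if t1_frame == t0_frame:
        raise ValueError("track endpoints must differ in time")
    distance_um = (x1_px - x0_px) * um_per_pixel
    elapsed_s = (t1_frame - t0_frame) * seconds_per_frame
    return distance_um / elapsed_s


def mean_velocity(tracks, um_per_pixel, seconds_per_frame):
    """Average velocity over several traced tracks [(x0, t0, x1, t1), ...]."""
    velocities = [
        kymograph_velocity(x0, t0, x1, t1, um_per_pixel, seconds_per_frame)
        for (x0, t0, x1, t1) in tracks
    ]
    return sum(velocities) / len(velocities)
```

      Working from the two endpoints of each oblique line keeps the sketch independent of how the line was traced, so the same calculation would apply to manually drawn and automatically traced paths.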

      (4) They claim that IFT speeds are determined by the size of IFT trains, based on their observations in samples with a reduced amount of IFT88. If this was indeed the case, the velocity of a brighter IFT train (larger train) would be higher than the velocity of a dimmer IFT train (smaller train) within the same cilia. This is not apparent from the movies and such a correlation should be verified to make their claim stronger.

      Thank you for these excellent suggestions. We measured the particle size and fluorescence intensity of 3 dpf crista cilia using high-resolution images acquired with Abberior STEDYCON. The results showed a positive correlation between the two. These data have been added to the revised version in Figure 5I, which includes both control and ift88 morphant data.
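
      As an illustration of how such a relationship between particle size and fluorescence intensity could be quantified, a Pearson correlation coefficient can be computed per particle. The response does not state which statistic was used, so the choice of Pearson's r and the function name below are assumptions made for this sketch.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences.

    Hypothetical use: xs are particle sizes and ys are the fluorescence
    intensities of the same particles; r > 0 indicates a positive correlation.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return covariance / math.sqrt(var_x * var_y)
```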

      (5) They make an even larger claim that the cilia length (and IFT velocity) in different organs is different due to differences in the sizes of IFT trains. This is based on a marginal difference they observe between the cilia of crista and the spinal cord in immunofluorescence experiments (Figure 5C). Inferring that this minor difference is key to the striking difference in cilia length and IFT velocity is incorrect in my opinion.

      Impact:

      Overall, I think this work develops an exciting new multicellular model organism to study IFT mechanisms. Zebrafish is a vertebrate where we can perform genetic modifications with relative ease. This could be an ideal model to study not just the role of IFT in connection with ciliary function but also ciliopathies. Further, from an evolutionary perspective, it is fascinating to compare IFT mechanisms in zebrafish with unicellular protists like Chlamydomonas, simple multicellular organisms like C elegans, and primary mammalian cell cultures. Having said that, the underlying storyline of this study is flawed in my opinion and I would recommend the authors to report the striking findings and methodology in more detail while significantly toning down their proposed hypothesis on ciliary length regulation. Given the technological advancements made in this study, I think it is fine if it is a descriptive manuscript and doesn't necessarily need a breakthrough hypothesis based on preliminary evidence.

      Thanks for these comments. We agree with this reviewer that more evidence is required to explain why IFT is transported faster in longer cilia. In the revised version, we have modified and softened this section, focusing primarily on the novel findings of IFT velocity differences between cilia of varying lengths.

      Reviewer #3 (Public Review):

      Summary:

      A known feature of cilia in vertebrates and many, if not all, invertebrates is the striking heterogeneity of their lengths among different cell types. The underlying mechanisms, however, remain largely elusive. In the manuscript, the authors addressed this question from the angle of intraflagellar transport (IFT), a cilia-specific bidirectional transportation machinery essential to biogenesis, homeostasis, and functions of cilia, by using zebrafish as a model organism. They conducted a series of experiments and proposed an interesting mechanism. Furthermore, they achieved in situ live imaging of IFT in zebrafish larvae, which is a technical advance in the field.

      Strengths:

      The authors initially demonstrated that ectopically expressed Ift88-GFP through a certain heatshock induction protocol fully sustained the normal development of mutant zebrafish that would otherwise be dead by 7 dpf due to the lack of this critical component of IFT-B complex.

      Accordingly, cilia formations were also fully restored in the tissues examined. By imaging the IFT using Ift88-GFP in the mutant fish as a marker, they unexpectedly found that both anterograde and retrograde velocities of IFT trains varied among cilia of different cell types and appeared to be positively correlated with the length of the cilia.

      For insights into the possible cause(s) of the heterogeneity in IFT velocities, the authors assessed the effects of IFT kinesin Kif3b and Kif17, BBSome, and glycylation or glutamylation of axonemal tubulin on IFT and excluded their contributions. They also used a cilia-localized ATP reporter to exclude the possibility of different ciliary ATP concentrations. When they compared the size of Ift88-GFP puncta in crista cilia, which are long, and spinal cord cilia, which are relatively short, by imaging with a cutting-edge super-resolution microscope, they noticed a positive correlation between the puncta size, which presumably reflected the size of IFT trains, and the length of the cilia.

      Finally, they investigated whether it is the size of IFT trains that dictates the ciliary length. They injected a low dose (0.5 ng/embryo) of ift88 MO and showed that, although such a dosage did not induce the body curvature of the zebrafish larvae, crista cilia were shorter and contained fewer Ift88-GFP puncta. The particle size was also reduced. These data collectively suggested mildly downregulated expression levels of Ift88-GFP. Surprisingly, they observed significant reductions in both retrograde and anterograde IFT velocities. Therefore, they proposed that longer IFT trains would facilitate faster IFT and result in longer cilia.

      Weaknesses:

      The current manuscript, however, contains serious flaws that markedly limit the credibility of major results and findings. Firstly, important experimental information is frequently missing, including (but not limited to) developmental stages of zebrafish larvae assayed (Figures 1, 3, and 5), how the embryos or larvae were treated to express Ift88-GFP (Figures 3-5), and descriptions on sample sizes and the number of independent experiments or larvae examined in statistical results (Figures 3-5, S3, S6). For instance, although Figure 1B appears to be the standard experimental scheme, the authors provided results from 30-hpf larvae (Figure 3) that, according to Figure 1B, are supposed to neither express Ift88-GFP nor be genotyped because both the first round of heat shock treatment and the genotyping were arranged at 48 hpf. Similarly, the results that ovl larvae containing Tg(hsp70l:ift88 GFP) (again, because the genotype is not disclosed in the manuscript, one can only deduce) display normal body curvature at 2 dpf after the injection of 0.5 ng of ift88 MO (Fig 5D) is quite confusing because the larvae should also have been negative for Ift88-GFP and thus displayed body curvature. Secondly, some inferences are more or less logically flawed. The authors tend to use negative results on specific assays to exclude all possibilities. For instance, the negative results in Figures 4A-B are not sufficient to "suggest that the variability in IFT speeds among different cilia cannot be attributed to the use of different motor proteins" because the authors have not checked dynein-2 and other IFT kinesins. In fact, in their previous publication (Zhao et al., 2012), the authors actually demonstrated that different IFT kinesins have different effects on ciliogenesis and ciliary length in different tissues. Furthermore, instead of also examining cilia affected by Kif3b or Kif17 mutation, they only examined crista cilia, which are not sensitive to the mutations. 
      Similarly, their results in Figures 4C-G only excluded the importance of tubulin glycylation or glutamylation in IFT. Thirdly, the conclusive model is based on certain assumptions, e.g., constant IFT velocities in a given cell type. The authors, however, do not discuss other possibilities.

      Thank you for pointing out the flaws in our experiments. We apologize for any confusion caused by the lack of detail in our descriptions. Regarding Figure 2B, we want to clarify that it depicts the procedure for heat shock experiments conducted for the ovl mutants' rescue assay, not the experimental procedure for IFT imaging. In the revised version, we have included detailed methods on how to induce the expression of Ift88-GFP via heat shock and the subsequent image processing. The procedure for heat induction is also shown in Figure S2A. We have also added the sample sizes for each experiment and descriptions of the statistical tests used in the appropriate sections of the revised version.

      Regarding the comments on the relationship between IFT speed variability and motor proteins, we completely agree with the reviewer. We have revised our description of this part accordingly.

      Lastly, the results shown in Figure 5D are from a wild-type background, not ovl mutants. We aimed to demonstrate that a lower dose of ift88 morpholino (0.5 ng) can partially knock down Ift88, allowing embryos to maintain a generally normal body axis, while the cilia in the ear crista became significantly shorter.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Minor

      (I recommend adding page numbers and probably line numbers. This makes commenting easier)

      We have added page numbers and line numbers in the revised manuscript.

      Intro: Furthermore, ultra-high-resolution microscopy showed a close association between cilia length in different organs and the size of IFT fluorescent particles, indicating the presence of larger IFT trains in longer cilia.

      This correlation is not that strong and data are only available for 2 types of cilia.

      Thanks. We have modified this part.

      P5) cilia (Fig. 1D) -> (Fig. S1)

      Thanks. We have corrected this.

      P5) "These movies provide a great opportunity to compare IFT across different cilia." Rewrite: "This approach allows one to determine the velocity and frequency of IFT based on kymographs" or similar.

      Thank you for your correction, we have changed it in the revised manuscript.

      This observation suggests that cargo and motor proteins are more effectively coordinated in transporting materials, resulting in increased IFT velocity-a novel regulatory mechanism governing IFT speed in vertebrate cilia.

      This is a somewhat cryptic phrase, rewrite?

      We have modified this sentence.

      P6 and elsewhere: "IFT in the absence of Kif17 or Bbs proteins" I wonder if it would be better to provide subheadings summarizing the main observation instead of descriptive titles. This includes the title of the manuscript.

      Thanks for this suggestion. We have changed the subheading titles in the revised manuscript. We prefer to keep the current title of the manuscript, as we think this paper mainly describes IFT in different types of cilia.

      Is it known whether IFT protein and motors are alternatively spliced in the various ciliated cells of zebrafish? In this context, is it known whether the cells express IFT proteins at different levels?

      We analyzed the transcript isoforms of several ciliary genes, including ift88, ift52, ift70, ift172, and kif3a. Most of these IFT genes possess only a single transcript isoform. The Kif3a motor protein has two isoforms (long and short); however, the shorter isoform contains only the motor domain and is presumed to be nonfunctional for IFT. While we cannot completely rule out this possibility, we consider it unlikely that the variation in IFT speed is due to alternative splicing in ciliary tissues.

      P6) The relation between osm-3 and Kif17 needs to be introduced briefly.  

      Thank you for pointing this out. We have added it in the proper place of the revised manuscript.

      P6) "IFT was driven by kinesin or dynein motor proteins along the ciliary axoneme." "is driven"?

      Delete phrase and IFT to the next sentence?

      We have deleted this sentence.

      P7) "Moreover, the mutants were able to survive to adulthood and there is no difference in the fertility or sperm motility between mutants and control siblings, which is slightly different from those observed in mouse mutants(Gadadhar et al., 2021)." Could some of these data be shown? 

      Thanks for this suggestion. When crossed with wild-type females, all homozygous mutants showed no difference in fertility compared to controls. The fertilization rate in mutants was 90.5% (n = 7), similar to that of the wild type (87.2%, n = 7). We determined the trajectories of free-swimming sperm by high-speed video microscopy. The vast majority of sperm in ttll3 mutants, like wild-type sperm, swim almost entirely along a straight path, which is different from what was observed in the mouse mutant (where 86% of TTLL3-/-TTLL8-/- sperm rotate in situ). We assessed cilia motility in the pronephric ducts of 5 dpf embryos using high-speed video microscopy. The ttll3 mutant exhibited a rhythmic sinusoidal wave pattern similar to the control, and there was no significant difference in ciliary beating frequency. These new data are now included in Figure S7C-H.

      P7) "which has been shown early to reduce" earlier

      We have changed it. Thanks.

      Maybe the authors could speculate how the cells ensure the assembly of larger/faster trains in certain cells. Are the relative expression levels known or worth exploring?

      Thank you for these suggestions. We believe that longer cilia may maintain larger IFT particle pools in the basal body region, facilitating the assembly of large IFT trains. The higher frequency of IFT injection in longer cilia further supports this hypothesis. It is likely that cells with longer cilia have higher expression levels of IFT proteins. However, due to the lack of proper antibodies for IFT proteins in zebrafish, it is currently unfeasible to compare this. This experiment is certainly worth investigating in the future. We have added this discussion in the revised manuscript.

      Reviewer #2 (Recommendations for The Authors):

      Here are detailed comments for the authors:

      (1) The authors need to describe their methodology of imaging and what they observe in much greater detail. How were the different cilia types organized? Approximately how many were observed in every organ? How were they oriented? Were there length variations between cilia in the same organ? While imaging, were individual cilium mostly lying in a single focal plane of imaging or the authors often performed z-scans over multiple planes. Velocity measurement is highly variable if individual cilia are spanning over a large volume, with only part of it in focus in single plane acquisition.

      Thank you for your comments. We apologize for the lack of details in the methodology. We have added a detailed description in the 'Materials and Methods' section and illustrated the experimental paradigm in Figure S2A of the revised manuscript. In most tissues we examined, the length of cilia was relatively uniform, except in the crista. The cilia in the crista were significantly longer, with lengths varying between 5 and 30 μm, compared to those in other tissues. We categorized the cilia lengths in the crista into three groups at intervals of 10 μm and measured the anterograde and retrograde velocities of IFT in each group. The results, shown in Figure S4, revealed no significant difference in IFT velocity among the different cilia lengths within the same tissue.  Regarding the imaging, all IFT movies were captured in a single focal plane. In most cases, we did not observe significant velocity variability within the same cilium.
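
      The length grouping described above can be sketched as follows. This is a minimal illustration, assuming each cilium is recorded as a (length in μm, IFT velocity in μm/s) pair. The function names and the example values are hypothetical and are not data from the manuscript.

```python
from collections import defaultdict
from statistics import mean

BIN_WIDTH_UM = 10.0  # interval stated in the response
N_GROUPS = 3         # three length groups


def length_group(length_um):
    """Return the label of the 10 um length group a cilium falls into."""
    if length_um < 0:
        raise ValueError("length must be non-negative")
    # Clamp the index so that a 30 um cilium lands in the last group.
    # (Assumption: anything longer is also assigned to the last group.)
    index = min(int(length_um // BIN_WIDTH_UM), N_GROUPS - 1)
    low = int(index * BIN_WIDTH_UM)
    return f"{low}-{low + int(BIN_WIDTH_UM)} um"


def mean_velocity_by_group(cilia):
    """Average the velocities of (length_um, velocity) pairs per length group."""
    groups = defaultdict(list)
    for length_um, velocity in cilia:
        groups[length_group(length_um)].append(velocity)
    return {label: mean(values) for label, values in groups.items()}


# Hypothetical values, for illustration only.
example_cilia = [(6.0, 1.0), (9.5, 1.2), (14.0, 1.1), (27.0, 1.3), (30.0, 1.1)]
summary = mean_velocity_by_group(example_cilia)
```

      Comparing the per-group means (or the underlying distributions, with an appropriate statistical test) is then a direct way to ask whether velocity depends on length within one tissue.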

      (2) It is very difficult to directly observe the large differences in IFT velocity from the kymographs, especially in the case of shorter cilia and retrograde motion in them. The quality of the example kymographs could be improved and more zoomed in several cases.

      Thank you for this suggestion. We have modified this.

      (3) The authors do not describe at all, how velocity analysis was done on the kymographs? Were lines drawn manually on the kymographs? From the movies and the kymographs it is visible that the IFT motion is often variable and sometimes gets stuck. How did the authors determine the velocities of such trains? A single slope through the entire train or part of the train? Were they consistent with this? Such variable motion is not so easy to discern in the case of really short cilia. The authors could use a more automatic way of extracting velocities from kymographs using tools such as kymodirect or kymobutler. Keeping in mind that IFT velocity is the main parameter studied in this work, it is important that the analysis is robust.

      We apologize for the previous lack of detailed description. We utilized ImageJ software to generate kymographs, where particles appear as lines. For a moving particle, this line appears oblique. We manually drew lines on the kymographs, and the velocity of particles was calculated based on the slope (Zhou et al., 2001). We only analyzed particles that tracked the full length of the cilia. Following the reviewer's suggestions, we also used the automatic software KymographDirect to calculate the velocity of IFT particles. The results were similar to those calculated using the previous method. These new data are now shown in Figure S2B-D. For shorter cilia, we only used particles with clear moving paths for our calculations. In the revised version, we have included a detailed description of the velocity analysis methods.
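
      The slope-based calculation can be sketched as below. This is an illustration only: it assumes the kymograph has position along the cilium on one axis (in pixels) and time on the other (in frames). The function name, pixel size, and frame interval are hypothetical placeholders rather than the acquisition settings used in the study.

```python
def kymograph_velocity(x0_px, t0_frame, x1_px, t1_frame, um_per_px, s_per_frame):
    """Velocity (um/s) of a particle trace drawn as a straight line on a kymograph.

    (x0_px, t0_frame) and (x1_px, t1_frame) are the two end points of the
    manually drawn line. A positive result means movement toward increasing x,
    e.g. base to tip if the base of the cilium is at x = 0.
    """
    elapsed_s = (t1_frame - t0_frame) * s_per_frame
    if elapsed_s <= 0:
        raise ValueError("the second point must be later in time than the first")
    distance_um = (x1_px - x0_px) * um_per_px
    return distance_um / elapsed_s


# Hypothetical calibration: 0.1 um per pixel, 0.1 s per frame.
anterograde = kymograph_velocity(0, 0, 50, 40, um_per_px=0.1, s_per_frame=0.1)
retrograde = kymograph_velocity(50, 0, 0, 20, um_per_px=0.1, s_per_frame=0.1)
```

      With these placeholder settings, a line spanning 50 pixels over 40 frames corresponds to 5 μm in 4 s. A particle that pauses partway would need one line per segment, which is why restricting the analysis to traces with clear paths matters.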

      (4) In line with the previous point, as visible from the kymographs the velocity is significantly slower near the transition zone. Did the authors make sure they are not including the region around the transition zone while measuring the IFT velocity, especially in the case of shorter cilia?

      Thank you for the comment. In the revised manuscript, we automatically extracted the path of each particle using KymographDirect software. Quantification of each particle's velocity versus position in the crista reveals that anterograde IFT proceeds from the base to the tip at a relatively constant speed, whereas retrograde IFT accelerates slightly when returning to the base (Fig. S2E). This finding differs from observations in C. elegans, in which dynein-2 first accelerates and then decelerates back to 1.2 μm/s adjacent to the ciliary base (Yi et al, 2017). We believe it is very unlikely that the slow IFT velocity is due to the calculation of IFT only in the transition zone of shorter cilia.

      (5) There are several fascinating findings in this work that the authors do not discuss properly. Firstly, do the authors have a hypothesis as to why IFT speeds are so radically different in different cilia types, given that they are driven by the same motor proteins and have the same ATP levels? They make a big claim in this paper that IFT train sizes correlate with train velocities. IFT trains have a highly ordered structure with regular binding sites for motor proteins. So, a smaller train would have a proportional number of motors attached to them. Why (and how) are the motors moving trains so slowly in some cilia and not in others? If there is no clear answer, the authors must put forward the open question with greater clarity.

      Thank you for the comment. We hypothesize that if multiple motors drive the movement of cargoes synergistically, it could increase the speed of IFT transport. An example supporting this hypothesis is the principle of multiple-unit high-speed trains, which use multiple motors in each individual car to achieve high speeds. Of course, this is just one hypothesis, and we cannot exclude other possibilities, such as the use of different adaptors in different cell types. We have revised our conclusions accordingly in the updated manuscript.

      (6) They find that IFT speeds do not change in kif17 mutants. Are the cilia length also similar (does not appear to be the case in Figure 4 and Figure S3)? Cilia length needs to be quantified. Further, they mention that in C elegans, heterotrimeric kinesin-2 and homodimeric kinesin-2 coordinate IFT. However, from several previous studies, we know that in Chlamydomonas and in mammalian cilia IFT is driven primarily by heterotrimeric kinesin-2 with no evidence that homodimeric kinesin-2 is linked with driving IFT. It appears to be the same in zebrafish. This is an interesting finding and needs to be discussed far more comprehensively.

      Thank you for your comments. We have previously shown that the number and length of crista cilia were grossly normal in kif17 mutants (Zhao et al, 2012). The length of crista cilia displayed slight variability even in wild-type larvae. We quantified the length of cilia in both the crista and neuromast within different mutants, and our analysis revealed no significant difference (see Author response image 1). We agree with the reviewer that Kif17 may play a minor role in driving IFT in cilia. However, previous studies have shown that KIF17 exhibits robust, processive particle movement in both the anterograde and retrograde directions along the entire olfactory sensory neuron cilia in mice. This suggests that, although not essential, KIF17 may also be involved in IFT (Williams et al., 2014). We have added more discussion about Kif17 and heterotrimeric kinesin in the appropriate section of the revised manuscript.

      Author response image 1.

      Statistical significance is based on Kruskal-Wallis statistic, Dunn's multiple comparisons test. n.s., not significant, p>0.05.

      (7) Again, they find that IFT speeds do not change in BBS-4 mutants. I have the same comment about the cilia length as for kif17 mutants. Further, the discussion for this finding is lacking. The authors mention that IFT is disrupted in BBSome mutants of C elegans. Is this the case in other organisms as well? Structural studies on IFT trains reveal that BBSomes are not part of the core structure, while other studies reveal that BBSomes are not essential for IFT. So perhaps the results here are not too surprising.

      We agree with the reviewer that BBSome is possibly not essential for IFT in most cilia. However, in the cilia of olfactory sensory neurons, BBSome is involved in IFT in both mice and nematodes (Ou et al, 2005; Williams et al., 2014). We have added more discussion about BBSome in the appropriate section of the revised manuscript.

      (8) No change in IFT velocities in kif3b mutants is rather surprising. The authors suggest that Kif3C homodimerizes to carry out IFT in the absence of Kif3B. Even if that is the case, the individual homodimer constituents of heterotrimeric kinesin-2 have been shown in previous studies to have different motor properties when homodimerized artificially. Why is IFT not affected in these mutants? This should be discussed. Also, the cilia lengths should be quantified.

      We think the presence of the Kif3A/Kif3C/KAP3 trimeric kinesin may substitute for the Kif3A/Kif3B/KAP3 motors in kif3b mutants, which show normal length of cristae cilia. The Kif3A/Kif3C/KAP3 trimeric kinesin may have similar transport speeds as the Kif3A/Kif3B/KAP3 motors. We did not propose that the Kif3C homodimer can drive the cargoes alone. We apologize for this misunderstanding. Additionally, we have reevaluated the IFT velocities among different lengths of cristae cilia and found no difference between longer and shorter cilia within the same cell types.

      (9) The findings with tubulin modifications should also be discussed in comparison to what has been observed in other organisms.

      We have added further discussion about this result in the revised manuscript.

      (10) The authors find that IFT velocity is lower in ift88 morphants. They also find that the cilia length is shorter (in which cilia type?). Immunofluorescence experiments show that the IFT particle number and size are lower in the ift88 morphants. How many organisms did they look at for this data? What is the experimental variability in intensity measurements in immunofluorescence experiments? Wouldn't the authors expect much higher variability in ift88 morphants (between individual organisms) due to different amounts of IFT88 than for wildtype?

      Thank you for your comments. We apologize for the lack of information regarding the number of organisms observed in Figure 5. These numbers have been added to the figure legends in the revised manuscript. When a low dose of ift88 morpholino was injected, we observed significant shortening of cilia in the ear crista, along with reduced IFT speed. We measured the fluorescence intensity of different IFT particles and found a positive correlation between IFT particle size and fluorescence intensity (Fig 5I). Moreover, the variability of cilia length in cristae is slightly higher in ift88 morphants. These new data have been included in the revised version.

      (11) From their observations they make the claim that IFT velocity is directly proportional to IFT train size. Now within every cilium, IFT trains have large size variations, given the variable intensities for different IFT trains. The authors themselves show that they resolve far more trains when imaging with STED (possibly because they are able to visualize the smaller trains). Is the IFT velocity within the same cilium directly correlated with the intensity of the train, both for wildtype and ift88 morphants? That is the most direct way the authors can test that their hypothesis is true. Higher intensity (larger train size) results in faster velocity. From a qualitative look at their movies, I do not see any strong evidence for that.

      Thank you for your comments. We have measured the particle size and fluorescence intensity of 3dpf crista cilia using high-resolution images acquired with Abberior STEDYCON. The results, shown in Figure 5I, demonstrate a positive correlation between particle size and fluorescence intensity.
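
      A correlation of this kind can be quantified with a Pearson coefficient. The sketch below is a generic implementation, assuming each particle contributes one size and one intensity measurement. The function name is hypothetical, and the response does not state which correlation measure was used.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sums of products of deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return sxy / sqrt(sxx * syy)


# Hypothetical particle sizes (um) and intensities (a.u.), for illustration only.
sizes = [0.20, 0.25, 0.30, 0.35, 0.40]
intensities = [110.0, 150.0, 140.0, 190.0, 220.0]
r = pearson_r(sizes, intensities)
```

      The same function could be applied to per-train velocity and intensity within a single cilium, which is the comparison the reviewer asks about.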

      (12) Are the sizes of both anterograde and retrograde trains lower in ift88 morphants? It's not clear from the data. It should be clearly stated that the authors speculate this and this is not directly evident from the data.

      Because the size of IFT fluorescence particles is based on immunostaining results, not live imaging, we cannot determine whether they are anterograde or retrograde IFT particles. Therefore, we can only speculate that possibly both anterograde and retrograde trains are reduced in ift88 morphants.

      (13) The biggest claim in this paper is that the cilia lengths in different organs are different due to differences in IFT train sizes. This is based on highly preliminary data shown in Figure 5C (how many organisms did they measure?). The difference is marginal and the dataset for spinal cord cilia is really small. The internal variability within the same cilia type is larger than the difference. How is this tiny difference resulting in such a large difference in IFT speeds? I believe their conclusions based on this data are incorrect.

      From our results, we believe that IFT velocity is related to cell types rather than the length of cilia (Fig. S4), which has also been mentioned in previous studies (Williams et al., 2014).  We agree with the reviewer that the evidence for faster IFT speed due to larger train size is not very solid. We have accordingly softened our conclusion and mentioned other possibilities in the revised version.

      Minor comments:

      (1) The authors only mention the number of IFT particles for their data. They should provide the number of cilia and the number of organisms as well.

      Thank you for your suggestion. We added the number of cilia and organisms next to the number of particles in Figure 3, Figure S2-S5 and Table S1 of the revised manuscript.

      (2) Cilia and flagella are similar structurally but not the same. The authors should change the following sentence: In contrast to the localization of most organelles within cells, cilia (also known as flagellar) are microtubule-based structures that extend from the cell surface, facilitating a more straightforward quantification of their size.  

      Thank you for the detailed review. We have changed it in our revised manuscript. 

      (3) The authors should provide references here. For example, Chlamydomonas has two flagella with lengths ranging from 10 to 14 μm, while sensory cilia in C. elegans vary from approximately 1.5 μm to 7.5 μm. In most mammalian cells, the primary cilium typically measures between 3 and 10 μm.  

      We have added it in our revised manuscript. 

      (4) They should mention ovl mutants are IFT88 mutants when they introduce it in the main text.

      We have added it in our revised manuscript. 

      (5) Correct the grammar here: The velocity of IFT within different cilia also seems unchanged (Figure 4F, Movie S9, Table S1).  

      We have changed it. 

      (6) Correct the grammar here: Similarly, the IFT speeds also exhibited only slight changes in ccp5 morphants, which decreased the deglutamylase activities of Ccp5 and resulted in a hyperglutamylated tubulin

      We have changed it. 

      Reviewer #3 (Recommendations For The Authors):

      Introduction:

      1st paragraph, "flagellar" should be "flagella"; 2nd paragraph, "result a wide range of" should be "result in a...".  

      We have changed it. 

      Results and discussion:

      "...certain specialized cell types, including olfactory epithelia and pronephric duct, ...": olfactory epithelia and pronephric duct are tissues, not cells.

      "...the GFP fluorescence of the transgene was prominently enriched in the cilia (Fig 1D)" : Fig 2D?

      "The velocity of IFT within different cilia was also seems unchanged (Fig. 4 F, Movie S9, Table S1)": "was" and "seems" cannot be used together.

      "...driven by b-actin2 promotor": β-actin2?

      "...each dynein motor protein might propel multiple IFT complexes": The "protein" should be deleted.  

      Thanks. We have corrected all of these mistakes.  

      Figures:

      Figure 1: Dyes and antibodies used other than the anti-acetylated tubulin antibody should be mentioned. The developmental stages of zebrafish used for the imaging are mostly missing.

      Thanks. In the revised version, we have updated the figure legends to include descriptions of the antibodies, developmental stages, as well as N numbers.

      Figure 2B: What "hphs" means should be explained somewhere.  

      Thanks. We have added the full names of these abbreviations.

      Figures 3A-E: For clarity, the cilia whose IFT kymographs are shown should be marked. "Representative particle traces are marked with white lines in panels D and E" (legend): they are actually black lines. The authors should also clearly disclose the developmental stages of zebrafish used for the imaging.  

      Thank you for your comments. In the revised manuscript, the cilia used to generate the kymograph are marked by yellow arrows. We have updated the legend to change "white" to "black." Additionally, we have included the developmental stages of zebrafish used for imaging in Figure 3A.

      Figures 3G-K: The authors used quantification results from 4-dpf larvae and 30-hpf embryos for comparisons. Nevertheless, according to their experimental scheme in Figure 2B, 30-hpf embryos were not subjected to heat-shock treatment and genotyping. How could they express Ift88-GFP for the imaging? How could the authors choose larvae of the right genotypes? In addition, even if the authors heat-shocked them in time but forgot to mention, there are issues that need to be clarified experimentally and/or through citations, at least through discussions. Firstly, at 30 hpf, those motile cilia are probably still elongating. If this is the case, their final lengths would be longer than those presented (H; the authors need to disclose whether the lengths were measured from ciliary Ift88-GFP or another marker). In other words, the correlation with IFT velocities (H and I) might no longer exist when mature cilia were measured. Similarly, cilia undergo gradual disassembly during the cell cycle. Epidermal cells at 30-hpf are likely proliferating actively, and the average length of their cilia (H) would be shorter than that measured from quiescent epidermal cells in later stages.

      Thank you for these comments. First, we want to clarify that Figure 2B depicts the procedure for heat shock experiments conducted for the ovl mutants' rescue assay, not the experimental procedure for IFT imaging. We visualized IFT in five types of cilia using Tg(hsp70l:ift88-GFP) embryos without the ovl mutant background. In the revised manuscript, we have provided a detailed description of embryo treatment in the 'Materials and Methods' section and illustrated the experimental paradigm in Figure S2A.

      Regarding the ciliary length differences between different developmental stages, we quantified cilia length in epidermal cells at 30 hpf versus 4 dpf, and in pronephric duct cilia at 30 hpf versus 48 hpf. Our analysis found no significant difference in length between earlier and later stages. Additionally, IFT velocities were comparable between these stages. These findings suggest that slower IFT velocities may not be attributed to the selection of different embryonic stages. Furthermore, we demonstrated that longer and shorter cilia maintain similar IFT velocities in crista cilia, indicating that elongated cilia within the same cell type exhibit comparable IFT velocities. These new results are presented in Figures S4 and S5 in the revised version.

      Secondly, do IFT velocities differ between elongating and mature cilia or remain relatively constant for a given cell type? The authors apparently take the latter for granted without even discussing the possibility of the former. In addition, whether the quantification results were from cilia of one or multiple fish, an important parameter to reflect the reproducibility, and sample sizes for the length data are not disclosed. The lack of descriptions on sample sizes and the number of independent experiments or larvae examined are actually common for statistical results in this manuscript.

      Thank you for your comments. We apologize for omitting the basic description of sample sizes and the number of cilia analyzed. We have addressed these issues in the revised manuscript. The length of 4 dpf crista cilia is variable, with longer cilia reaching up to 30 µm and shorter cilia measuring only around 5 µm within the same crista. We categorized crista cilia by length into three groups at intervals of 10 µm and measured anterograde and retrograde velocities of IFT in each group. The results revealed no significant difference in IFT velocity between elongating and mature cilia within the crista. These supplementary data are now included in Figure S4.

      Figures 4A-B: When mutating neither Kif17 nor Kif3b affected the IFT of crista cilia, the data unlikely "suggest that the variability in IFT speeds among different cilia cannot be attributed to the use of different motor proteins". In fact, in the cited publication (Zhao et al., 2012), the authors used the same and additional mutants (Kif3c and Kif3cl) to demonstrate that different IFT-related kinesin motors have different effects on ciliogenesis and ciliary length in different tissues, results actually implying tissue-specific contributions of different kinesin motors to IFT. Furthermore, although likely only cytoplasmic dynein-2 is involved in the retrograde IFT, the authors cannot exclude the possibility that different combinations or isoforms of its many subunits and regulators contribute to the velocity regulation. Therefore, the authors need to reconsider their wording. This reviewer would suggest that the authors examine the IFT status of cilia that were previously reported to be shortened in the Kif3b mutant to see whether the correlation between ciliary length and IFT velocities still stands. This would actually be a critical assay to assess whether the proposed correlation is only a coincidence or indeed has a certain causality.

      Thank you for your comments. The shortened cilia observed in Kif3b mutants may be attributed to the presence of maternal Kif3b proteins, making it challenging to exclude the involvement of Kif3b motor. Regarding the relationship between IFT speed variability and motor proteins, we agree with the reviewer that we cannot entirely dismiss the possibility of different motors or adaptors being involved. We have revised our description of this aspect accordingly.

      Figures 4C-G: Similarly, when the authors found that tubulin glycylation or glutamylation has little effect on IFT, they cannot use these observations to exclude possible influences of other types of tubulin modifications on IFT. They should only stick to their observations.

      Yes, we agree. We have changed the description in the revised manuscript.

      Figure 5:

      A-C: When the authors only compared immotile cilia of crista with motile cilia of the spinal cord, it is hard to say whether the difference in particle size is correlated with ciliary length or motility. Cilia from more tissues should be included to strengthen their point, especially when the authors want to make this point the central one.

      D: The authors showed that ovl larvae containing Tg(hsp70l:ift88 GFP) (as they do not indicate the genotype, this reviewer can only deduce) display normal body curvature at 2 dpf after the injection of 0.5 ng of ift88 MO. Such a result, however, is quite confusing. According to their experimental scheme in Figure 2B, these larvae were not subjected to heat shock induction for Ift88-GFP. Do ovl larvae containing Tg(hsp70l:ift88 GFP) naturally display normal body curvature at 2 dpf? 

      Thank you for your comments. Due to technical limitations, comparing IFT particle size across different cilia using STED is challenging. We agree with this reviewer that the evidence supporting this aspect is relatively weak. Accordingly, we have modified and softened our conclusion in the revised version.

      Regarding the injection of ift88 morpholino, we want to clarify that we injected it into wild-type embryos, not ovl mutants. The lower dose of ift88 morpholino (0.5 ng) partially knocked down Ift88, allowing embryos to maintain a grossly normal body axis while resulting in shorter cilia in the ear crista.

      E: The authors need to indicate the developmental stage of the larvae examined. One piece of missing data is global expression levels of both endogenous (maternal) Ift88 and exogenous Ift88-GFP in zebrafish larvae that are either uninjected, 8-ng-ift88 MO-injected, or 0.5-ng-ift88 MO-injected, preferably at multiple time points up to 3 dpf. The results will clarify (1) the total levels of Ift88 following time; (2) the extent of downregulation the MO injections achieved at different developmental stages; and importantly (3) whether the low MO dosage (0.5 ng) indeed allowed a persistent downregulation to affect IFT trains at 3 dpf, a time the authors made the assays for Figures 5F-J to reach the model (K). It will be great to include wild-type larvae for comparison.

      Thank you for these valuable suggestions. The ift88 morpholino (MO) was designed to block the splicing of ift88 transcripts and has been used in multiple studies. This morpholino specifically blocks the expression of endogenous ift88, while the expression of the Ift88-GFP transgene remains unaffected. It would be beneficial to titrate the expression level of Ift88 in the morphants at different stages. Unfortunately, we do not have access to a zebrafish Ift88 antibody. We assessed the effects of a lower amount of MO based on our observation that the fish maintained a normal body axis while exhibiting shorter cilia. Ideally, the amount of Ift88 should be lower in the morphants, considering the presence of ciliogenesis defects. We have included additional comments regarding this limitation in the revised version.

      Movies:

      Movies 1-5: Elapsed time is not provided. Furthermore, cilia in the pronephric duct and spinal cord are known to beat rapidly. Their motilities, however, appear to be largely compromised in Movies 3 and 4. Although the quantification results in Fig 3G imply that the authors imaged 30hpf embryos for such cilia, there is no statement on real conditions.

      Thank you for your comments. We apologize for omitting the elapsed time in our movies. We have addressed this issue in the revised manuscript. Motile cilia are difficult to image because they beat rapidly. To immobilize the cilia and capture IFT movement within them, we gently pressed the embryo with a round cover glass to inhibit ciliary beating. Data from each embryo were collected within 5 minutes to avoid the impact of embryo death on the results. We have added a detailed description in the 'Materials and Methods' section.

      Materials:

      The sequence of morpholino oligonucleotide against ift88 is missing.  

      We have added the sequence of ift88 morpholino in the revised manuscript.

      References:

      Important references are missing, including (1) the paper by Leventea et al., 2016 (PMID: 27263414), which shows cilia morphologies in various zebrafish tissues with more detailed descriptions of tissue anatomies and experimental techniques; (2) papers documenting that dynein motors "move faster than Kinesin motors" in IFT of C. reinhardtii and C. elegans cilia; and (3) the paper by Li et al., 2020 (PMID: 33112235), in which the authors constructed a hybrid IFT kinesin to markedly reduce anterograde IFT velocity (~ 2.8 fold) and IFT injection rate in C. reinhardtii cilia and found only a mild reduction (~15%) in ciliary length. This paper is important because it is a pioneering one that elegantly investigated the relationship between IFT velocity and ciliary length. The findings, however, do not necessarily contradict the current manuscript due to differences in, e.g., model organisms and methodology.

      Thank you for the detailed review. We have cited these references in the appropriate places in the revised manuscript.

      Reference

      Broekhuis JR, Verhey KJ, Jansen G (2014) Regulation of cilium length and intraflagellar transport by the RCK-kinases ICK and MOK in renal epithelial cells. PLoS One 9: e108470

      Kunova Bosakova M, Varecha M, Hampl M, Duran I, Nita A, Buchtova M, Dosedelova H, Machat R, Xie Y, Ni Z et al (2018) Regulation of ciliary function by fibroblast growth factor signaling identifies FGFR3-related disorders achondroplasia and thanatophoric dysplasia as ciliopathies. Hum Mol Genet 27: 1093-1105

      Luo W, Ruba A, Takao D, Zweifel LP, Lim RYH, Verhey KJ, Yang W (2017) Axonemal Lumen Dominates Cytosolic Protein Diffusion inside the Primary Cilium. Sci Rep 7: 15793

      Ou G, Blacque OE, Snow JJ, Leroux MR, Scholey JM (2005) Functional coordination of intraflagellar transport motors. Nature 436: 583-587

      See SK, Hoogendoorn S, Chung AH, Ye F, Steinman JB, Sakata-Kato T, Miller RM, Cupido T, Zalyte R, Carter AP et al (2016) Cytoplasmic Dynein Antagonists with Improved Potency and Isoform Selectivity. ACS Chem Biol 11: 53-60

      Williams CL, McIntyre JC, Norris SR, Jenkins PM, Zhang L, Pei Q, Verhey K, Martens JR (2014) Direct evidence for BBSome-associated intraflagellar transport reveals distinct properties of native mammalian cilia. Nat Commun 5: 5813

      Yi P, Li WJ, Dong MQ, Ou G (2017) Dynein-Driven Retrograde Intraflagellar Transport Is Triphasic in C. elegans Sensory Cilia. Curr Biol 27: 1448-1461 e1447

      Zhao C, Omori Y, Brodowska K, Kovach P, Malicki J (2012) Kinesin-2 family in vertebrate ciliogenesis. Proc Natl Acad Sci U S A 109: 2388-2393

      Zhou HM, Brust-Mascher I, Scholey JM (2001) Direct visualization of the movement of the monomeric axonal transport motor UNC-104 along neuronal processes in living Caenorhabditis elegans. J Neurosci 21: 3749-3755

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      In this study, Kim et al. investigated the mechanism by which uremic toxin indoxyl sulfate (IS) induces trained immunity, resulting in augmented pro-inflammatory cytokine production such as TNF and IL6. The authors claim that IS treatment induced epigenetic and metabolic reprogramming, and the aryl hydrocarbon receptor (AhR)-mediated arachidonic acid pathway is required for establishing trained immunity in human monocytes. They also demonstrated that uremic sera from end-stage renal disease (ESRD) patients can generate trained immunity in healthy control-derived monocytes.

      These are interesting results that introduce the important new concept of trained immunity and its importance in showing endogenous inflammatory stimuli-induced innate immune memory. Additional evidence proposing that IS plays a critical role in the initiation of inflammatory immune responses in patients with CKD is also interesting and a potential advance of the field. This study is in large part well done, but some components of the study are still incomplete and additional efforts are required to nail down the main conclusions.

      Thank you very much for your positive feedback.

      Specific comments:

      (1) Of greatest concern, there are concerns about the rigor of these experiments, whether the interpretation and conclusions are fully supported by the data. (1) Although many experiments have been sporadically conducted in many fields such as epigenetic, metabolic regulation, and AhR signaling, the causal relationship between each mechanism is not clear. (2) Throughout the manuscript, no distinction was made between the group treated with IS for 6 days and the group treated with the second LPS (addressed below). (3) Besides experiments using non-specific inhibitors, genetic experiments including siRNA or KO mice should be examined to strengthen and justify central suggestions.

      We are grateful for the invaluable constructive feedback provided. 

      (1) In response to the reviewer's feedback, we conducted additional experiments employing appropriate inhibitors to investigate the causal relationship among the AhR pathway, epigenetic modifications, and metabolic rewiring in IS-induced trained immunity. Notably, metabolic rewiring, particularly the upregulation of aerobic glycolysis via the mTORC1 signaling pathway, stands as a pivotal mechanism underlying the induction of trained immunity through the modulation of epigenetic modifications (Riksen NP et al. Figure 1). Initially, we assessed the enrichment of H3K4me3 on the promoters of the TNFA and IL6 loci at day 6 after treatment with zileuton, an inhibitor of ALOX5, or 2-DG, a glycolysis inhibitor. Additionally, we evaluated the alteration in the activity of S6K, a downstream molecule of mTORC1, following zileuton treatment. Our findings indicate that AhR-dependent arachidonic acid (AA) signaling induces epigenetic modifications, albeit without inducing metabolic rewiring, in IS-induced trained immunity (Author response image 1). However, IS stimulation promotes mTORC1-mediated glycolysis in an AhR-independent manner. Notably, inhibition of glycolysis with 2-DG impacts epigenetic modifications. We have updated Figure 7 of the revised manuscript to incorporate these additional experimental findings, elucidating the correlation between the diverse mechanisms implicated in IS-induced innate immune memory (Fig. 7 in the revised manuscript). These data have been integrated into the revised manuscript as Figure 3D and 5I, and supplementary Figure 5I.

      (2) We apologize for any confusion arising from the unclear description regarding the distinction between the group treated with IS for 6 days and the group subjected to secondary lipopolysaccharide (LPS) stimulation. It is imperative to clarify that induction of trained immunity necessitates 1 day of IS stimulation followed by 5 days of rest, rendering the 6th day sample representative of a trained state. Subsequent to this, a 24-hour LPS stimulation is applied, designating the 7th day sample as a secondary LPS-stimulated cell. This clarification is now explicitly indicated throughout the entirety of Figure 1A and Figure 3A in the revised manuscript.

      (3) In accordance with your feedback, we performed siRNA knockdown of AhR and ALOX5 in primary human monocytes. AhR knockdown markedly attenuated the mRNA expression of TNF-α and IL-6, which are augmented in IS-trained macrophages. Similarly, knockdown of ALOX5 using ALOX5 siRNA abrogated the increase in TNF-α and IL-6 levels upon LPS stimulation in IS-trained macrophages (Author response image 2). Our experiments utilizing AhR siRNA corroborate the involvement of AhR in the expression of AA pathway-related molecules, such as ALOX5, ALOX5AP, and LTB4R1, in IS-induced trained immunity. These data have been incorporated into the revised manuscript as Figure 4E and 5G, and supplementary Figure 5H.  

      Author response image 1.

      Epigenetic modification is regulated by the arachidonic acid (AA) pathway and metabolic rewiring, but metabolic rewiring is not affected by the AA pathway. A-B. Monocytes were pre-treated with zileuton (ZLT), an inhibitor of ALOX5, or 2DG, a glycolysis inhibitor, followed by stimulation with IS for 24 hours. After a resting period of 5 days, the enrichment of H3K4me3 on the promoters of TNFA and IL6 loci was assessed. Normalization was performed using 2% input. C. Monocytes were pre-treated with zileuton (ZLT) and stimulated with IS for 24 hr. Cell lysates were immunoblotted for phosphorylated S6 Kinase, with β-actin serving as a normalization control. Band intensities in the immunoblots were quantified using densitometry. D. A schematic representation of the mechanistic framework underlying IS-trained immunity. Bar graphs show the mean ± SEM. * = p < 0.05, ** = p < 0.01, and *** = p < 0.001 by two-tailed paired t-test.

      Author response image 2.

      Inhibition of IS-trained immunity by knockdown of AhR or ALOX5 in human monocytes. A-C. Human monocytes were transfected with siRNA targeting AhR (siAhR), ALOX5 (siALOX5), or negative control (siNC) for 1 day, followed by stimulation with IS for 24 hours. After a resting period of 5 days, cells were re-stimulated with LPS for 24 hours. mRNA expression levels of AhR and ALOX5 at 1 day after transfection, and TNF-α and IL-6 at 1 day after LPS treatment, were assessed using RT-qPCR. D. Human monocytes were transfected with AhR siRNA or negative control (NC) siRNA for 1 day, followed by stimulation with IS for 24 hours. After resting for 5 days, mRNA expression levels of ALOX5, ALOX5AP, and LTB4R1 were analyzed using RT-qPCR. Bar graphs show the mean ± SEM. * = p < 0.05, ** = p < 0.01, and *** = p < 0.001 by two-tailed paired t-test.  

      (2) The authors showed that IS-trained monocytes showed no change in TNF or IL-6, but increased the expression levels of TNF and IL-6 in response to the second LPS (Fig. 1B). This suggests that the different LPS responsiveness in IS-trained monocytes caused altered gene expression of TNF and IL6. However, the authors also showed that IS-trained monocytes without LPS stimulation showed increased levels of H3K4me3 at the TNF and IL-6 loci, as well as highly elevated ECAR and OCR, leading to no changes in TNF and IL-6. Therefore, it is unclear why or how the epigenetic and metabolic states of IS-trained monocytes induce different LPS responses. For example, increased H3K4me3 in HK2 and PFKP is important for metabolic rewiring, but why increased H3K4me3 in TNF and IL6 does not affect gene expression needs to be explained.

      We acknowledge the constructive critiques provided by the reviewer. While epigenetic modifications in the promoters of TNF-α, IL-6, HK2, and PFKP (Figure 3B and Supplementary Figure 3C in the revised manuscript), and metabolic rewiring (Figure 2A-D in the revised manuscript) were observed in IS-trained macrophages at 6 days prior to LPS stimulation, these macrophages do not exhibit an increase in TNF-α and IL-6 mRNA and protein levels before LPS stimulation. This lack of response is attributed to a 5-day resting period, allowing the macrophages to revert to a non-activated state, as depicted in Author response image 3 and 4. This phenomenon aligns with the concept of typical trained immunity.

      Trained immunity is characterized by the long-term functional reprogramming of innate immune cells, which is evoked by various primary insults and which leads to an altered response towards a second challenge after the return to a non-activated state. Metabolic and epigenetic reprogramming events during the primary immune response persist partially even after the initial stimulus is removed. Upon a secondary challenge, trained innate immune cells exhibit a more robust and more prompt response than the initial response (Netea MG et al. Defining trained immunity and its role in health and disease. Nat Rev Immunol. 2020 Jun;20(6):375-388).

      Numerous studies have demonstrated the observation of epigenetic modifications in the promoters of TNF-α and IL-6, and metabolic rewiring prior to LPS stimulation as a secondary challenge. However, cytokine production is contingent on LPS stimulation (Arts RJ et al. Glutaminolysis and Fumarate Accumulation Integrate Immunometabolic and Epigenetic Programs in Trained Immunity. Cell Metab. 2016 Dec 13;24(6):807-819; Arts RJW et al. Immunometabolic Pathways in BCG-Induced Trained Immunity. Cell Rep. 2016 Dec 6;17(10):2562-2571; Ochando J et al. Trained immunity - basic concepts and contributions to immunopathology. Nat Rev Nephrol. 2023 Jan;19(1):23-37). The prolonged presence of higher levels of H3K4me3 on immune gene promoters, even after returning to baseline, is associated with open chromatin and results in a more rapid and stronger response, such as cytokine production, upon a secondary insult (Netea MG et al. Defining trained immunity and its role in health and disease. Nat Rev Immunol. 2020 Jun;20(6):375-388).

      The results in Figure 1B may be interpreted as indicating that different LPS responsiveness in IS-trained monocytes caused altered gene expression of TNF and IL-6. However, it is plausible that trained immune cells respond more robustly even to low concentrations of LPS. In fact, the aim of this experiment was to determine the appropriate LPS concentration.

      Author response image 3.

      The changes in mRNA and protein levels of TNF-α and IL-6 during induction of IS-trained immunity. Human monocytes were treated with or without IS (1 mM) for 24 hrs, followed by a 5-day resting period to induce trained immunity. Cells were then stimulated with LPS for 24 hrs. Protein and mRNA levels were assessed by ELISA and RT-qPCR, respectively. Bar graphs show the mean ± SEM. * = p < 0.05 and ** = p < 0.01, by two-tailed paired t-test.

      Author response image 4.

      The changes in mRNA levels of HK2 and PFKP induced by IS during induction of IS-trained immunity. Human monocytes were treated with or without IS (1 mM) for 24 hrs, followed by a 5-day resting period to induce trained immunity. mRNA levels were assessed by RT-qPCR. Bar graphs show the mean ± SEM. * = p < 0.05 by two-tailed paired t-test.

      (3) The authors used human monocytes cultured in human serum without growth factors such as MCSF for 5-6 days. When we consider the short lifespan of monocytes (1-3 days), the authors need to explain the validity of the experimental model.

      We appreciate the reviewer’s constructive critiques. As pointed out by the reviewer, human circulating CD14+ monocytes exhibit a relatively short lifespan (1-3 days) when cultured in the absence of growth factors (Patel AA et al. The fate and lifespan of human monocyte subsets in steady state and systemic inflammation. J Exp Med. 2017 Jul 3;214(7):1913-1923). In this study, purified CD14+ monocytes were subjected to adherent culture for a duration of 7 days in RPMI1640 media supplemented with 10% human AB serum, a standard in vitro culture protocol widely employed in studies focusing on trained immunity (Domínguez-Andrés J et al. In vitro induction of trained immunity in adherent human monocytes. STAR Protoc. 2021 Feb 24;2(1):100365). In response to the reviewer's suggestions, we assessed cell viability on days 0, 1, 4, and 6, utilizing the WST assay. Despite a marginal reduction in cell viability observed at day 1, attributed to detachment from the culture plate, the cultured monocytes exhibited a notable enhancement in cell viability on days 4 and 6 when compared to days 0 or 1 (Author response image 5).

      It has been demonstrated that the adhesion of human monocytes to a cell culture dish leads to their activation and induces the synthesis of substantial amounts of IL-1β mRNA as observed in monocytes adherent to extracellular matrix components such as fibronectin and collagen.

      Morphologically, human adherent monocytes cultured with 10% human serum appear to undergo partial differentiation into macrophages by day 6, potentially explaining the observed lack of decrease in monocyte viability. Notably, Safi et al. have reported that adherent monocytes cultured with 10% human serum exhibit no significant difference in cell viability over a 7-day period when compared to cultures supplemented with growth factors such as M-CSF and IL-3 (Safi W et al. Differentiation of human CD14+ monocytes: an experimental investigation of the optimal culture medium and evidence of a lack of differentiation along the endothelial line. Exp Mol Med. 2016 Apr 15;48(4):e227).

      Author response image 5.

      Viability of human monocytes during the induction of trained immunity. Purified human monocytes were seeded on plates with RPMI1640 media supplemented with 10% human AB serum. Cell viability was assessed on days 0, 1, 4, and 6 utilizing the WST assay (Left panel). Cell morphology was examined under an inverted light microscope at the indicated times (Right panel).

      (4) The authors' ELISA results clearly showed increased levels of TNF and IL-6 proteins, but it is well established that LPS-induced gene expression of TNF and IL-6 in monocytes peaked within 1-4 hours and returned to baseline by 24 hours. Therefore, authors need to investigate gene expression at appropriate time points.

      We appreciate the valuable constructive feedback provided by the reviewer. As indicated by the reviewer, the LPS-induced gene expression of TNF-α and IL-6 in IS-trained monocytes exhibited a peak within the initial 1 to 4 hours, followed by a decrease by the 24-hour time point, as illustrated in Author response image 6. Nevertheless, the mRNA expression levels of TNF-α and IL-6 were still elevated at the 24-hour mark. Furthermore, the protein levels of both TNF-α and IL-6 were clearly increased 24 hours after LPS stimulation. Due to technical constraints, sample collection had to be conducted at a single time point, and the 24-hour post-stimulation interval was deemed optimal for this purpose.

      Author response image 6.

      Kinetics of protein and mRNA expression of TNF-α and IL-6 after treatment with LPS as a secondary insult in IS-trained monocytes. IS-trained cells were re-stimulated with LPS (10 ng/ml) for the indicated times. The supernatants and lysates were collected for ELISA and RT-qPCR analysis, respectively. Bar graphs show the mean ± SEM. * = p < 0.05 and ** = p < 0.01, by two-tailed paired t-test.

      (5) It is a highly interesting finding that IS induces trained immunity via the AhR pathway. The authors also showed that the pretreatment of FICZ, an AhR agonist, was good enough to induce trained immunity in terms of the expression of TNF and IL-6. However, from this point of view, the authors need to discuss why trained immunity was not affected by kynurenic acid (KA), which is a well-known AhR ligand accumulated in CKD and has been reported to be involved in innate immune memory mechanisms (Fig. S1A).

      We appreciate the constructive criticism provided by the reviewer, and we comprehend the raised points. In our initial experiments, we hypothesized that kynurenic acid (KA), an aryl hydrocarbon receptor (AhR) ligand, might instigate trained immunity in monocytes, despite KA not being our primary target uremic toxin. However, our findings, as depicted in Fig. S1A, demonstrated that KA did not induce trained immunity. Notably, KA-treated monocytes exhibited induction of CYP1B1, an AhR-responsive gene, and elevated levels of TNF-α and IL-6 mRNA at 24 hours post-treatment, comparable to FICZ-treated monocytes. This observation underscores KA's role as an AhR ligand in human monocytes, as emphasized by the reviewer. 

      Of particular interest, proteins associated with the arachidonic acid pathway, such as ALOX5 and ALOX5AP - integral to the mechanisms underlying IS-induced trained immunity - did not exhibit an increase at day 6 following KA treatment, in contrast to the significant elevation observed with IS and FICZ treatments (Author response image 7). The rationale behind this disparity remains unknown, necessitating further investigation to elucidate the underlying factors. These data have been incorporated into the revised manuscript as Supplementary Figure 5C.

      Author response image 7.

      Divergent impact of AhR agonists, especially IS, FICZ, and KA, on the AhR-ALOX5 pathway. Purified monocytes underwent treatment with IS (1 mM), FICZ (100 nM), or KA (0.5 mM) for 1 day, followed by a 5-day resting period to induce trained immunity. Activation of AhR through ligand binding was assessed by examining the induction of CYP1B1, an AhR-responsive gene, and cytokines one day post-treatment. The expression of genes related to the arachidonic acid pathway, such as ALOX5, ALOX5AP, and LTB4R1, was analyzed via RT-qPCR six days after inducing trained immunity. Bar graphs show the mean ± SEM. * = p < 0.05, ** = p < 0.01, and *** = p < 0.001 by two-tailed paired t-test.

      Indeed, it has been demonstrated that FICZ and TCDD, two high-affinity AhR ligands, exert opposite effects on T-cell differentiation, with TCDD inducing regulatory T cells and FICZ inducing Th17 cells. This dichotomy has been attributed to ligand-intrinsic differences in AhR activation (Ho PP et al. The aryl hydrocarbon receptor: a regulator of Th17 and Treg cell development in disease. Cell Res. 2008 Jun;18(6):605-8; Ehrlich AK et al. TCDD, FICZ, and Other High Affinity AhR Ligands Dose-Dependently Determine the Fate of CD4+ T Cell Differentiation. Toxicol Sci. 2018 Feb 1;161(2):310-320). These outcomes imply the involvement of an intricate interplay involving metabolic rewiring, epigenetic reprogramming, and the AhR-ALOX5 pathway in IS-induced trained immunity within monocytes.

      (6) The authors need to clarify the role of IL-10 in IS-trained monocytes. IL-10, an anti-inflammatory cytokine that can be modulated by AhR, whose expression (Fig. 1E, Fig. 4D) may explain the inflammatory cytokine expression of IS-trained monocytes.

      We appreciate the reviewer’s valuable comment, recognizing its significant importance. IL-10, characterized by potent anti-inflammatory attributes, assumes a pivotal role in constraining the host immune response against pathogens. This function serves to mitigate potential harm to the host and uphold normal tissue homeostasis. In the context of atherosclerosis (Mallat Z et al. Protective role of interleukin-10 in atherosclerosis. Circ Res. 1999 Oct 15;85(8):e17-24.) and kidney disease (Wei W et al. The role of IL-10 in kidney disease. Int Immunopharmacol. 2022 Jul;108:108917), IL-10 exerts potent deactivating effects on macrophages and T cells, influencing various cellular processes that could impact the development and stability of atherosclerotic plaques. Additionally, it is noteworthy that IL-10-deficient macrophages exhibit an augmentation in the proinflammatory cytokine TNF-α (Smallie T et al. IL-10 inhibits transcription elongation of the human TNF gene in primary macrophages. J Exp Med. 2010 Sep 27;207(10):2081-8; Couper KN et al. IL-10: the master regulator of immunity to infection. J Immunol. 2008 May 1;180(9):5771-7). As emphasized by the reviewer, the reduced gene expression of IL-10 by IS-trained monocytes may contribute to the heightened expression of proinflammatory cytokines. We have thoroughly addressed and discussed this specific point in response to the reviewer's comment (Line 394-399 of page 18 in the revised manuscript).

      (7) The authors need to show H3K4me3 levels in TNF and IL6 genes in all conditions in one figure. (Fig. 2B). Comparing Fig. 2B and Fig. S2B, H3K4me3 does not appear to be increased at all by LPS in the IL6 region. 

      We are grateful for the constructive criticism provided by the reviewer. In response to the reviewer's comment, we endeavored to conduct an experiment demonstrating H3K4me3 enrichment on the promoters of TNF-α and IL-6 across all experimental conditions. However, due to limitations in the availability of purified human monocytes, we conducted an additional three independent experiments for ChIP-qPCR across all conditions. Despite encountering a notable variability among individuals, even within the healthy donor cohort, our results demonstrated an increase in H3K4me3 enrichment on the TNF-α and IL-6 promoters in IS-trained groups, irrespective of subsequent LPS treatment (Author response image 8).

      Author response image 8.

      Analysis of H3K4me3 enrichment on the promoters of TNFA and IL6 loci in IS-trained macrophages. ChIP-qPCR was employed to assess the enrichment of H3K4me3 on the promoters of TNFA and IL6 loci before (day 6) and after LPS stimulation (day 7) in IS-trained macrophages. The normalization control utilized 2% input. Bar graphs show the mean ± SEM. The data presented are derived from three independent experiments utilizing samples from different donors.
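      The response does not state the exact formula behind "2% input" normalization. A minimal sketch of the commonly used percent-input calculation is given below, assuming 100% PCR efficiency; the function name and the Ct values are hypothetical and are not taken from the authors' data.

```python
import math


def percent_input(ct_ip, ct_input, input_fraction=0.02):
    """Percent-input value for one ChIP-qPCR reaction.

    Assumption: the input sample is `input_fraction` (2% here) of the
    chromatin used for the IP, and amplification doubles every cycle.
    """
    # Shift the input Ct to what it would be for 100% of the chromatin.
    adjusted_input_ct = ct_input - math.log2(1 / input_fraction)
    # Each cycle of difference corresponds to a two-fold change in signal.
    return 100 * 2 ** (adjusted_input_ct - ct_ip)


# Hypothetical Ct values, for illustration only.
print(percent_input(ct_ip=26.0, ct_input=25.0))
```

      Under these assumptions, an IP that reaches threshold at the same cycle as the 2% input corresponds to a recovery of 2% of the starting chromatin.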

      (8) The authors need to address the changes of H3K4me3 in the presence of MTA.

      We appreciate the constructive criticism provided by the reviewer. In response to the reviewer's feedback, we conducted an analysis of the changes in H3K4me3 in the presence of MTA, a general methyltransferase inhibitor, using identical conditions as depicted in Figure 2C of the original manuscript. Our findings revealed that MTA exerted inhibitory effects on the levels of H3K4me3, as isolated through the acid histone extraction method, which were otherwise increased by IS-training, as illustrated in Author response image 9. 

      Author response image 9.

      The reduction of H3K4me3 by MTA treatment in IS-trained macrophages. IS-trained cells were restimulated with LPS (10 ng/ml) as a secondary challenge for 24 hrs, followed by isolation of histones and WB analysis for H3K4me3, Histone 3 (H3), and β-actin. Blot data from two independent experiments with different donors are shown.

      (9) Interpretation of ChIP-seq results is not entirely convincing due to doubts about the quality of sequencing results. First, authors need to provide information on the quality of ChIP-seq data in reliable criteria such as Encode Pipeline. It should also provide representative tracks of H3K4me3 in the TNF and IL-6 genes (Fig. 2F). And in Fig. 2F, the author showed the H3K4me3 track of replicates, but the results between replicates were very different, so there are concerns about reproducibility. Finally, the authors need to show the correlation between ChIP-seq (Fig. 2) and RNA-seq (Fig. 5).

      We appreciate the constructive criticism provided by the reviewer. 

      As suggested by the reviewer, we evaluated sample read quality against the histone ChIP-seq standard from the ENCODE project, focusing on metrics such as read depth, PCR bottleneck coefficient 1 (PBC1), PBC2, and non-redundant fraction (NRF). Five of the samples displayed moderate bottlenecking (0.5 ≤ PBC1 < 0.8, 1 ≤ PBC2 < 3) with acceptable complexity (0.5 ≤ NRF < 0.8). One sample showed mild bottlenecking (0.8 ≤ PBC1 < 0.9, 3 ≤ PBC2 < 10) with compliant complexity (0.8 ≤ NRF < 0.9). These quality metrics indicate that the ChIP-seq data meet at least the standards required for downstream analysis according to ENCODE project criteria (Author response image 10A).
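      For readers unfamiliar with these metrics, the sketch below shows how NRF, PBC1, and PBC2 are conventionally defined from the genomic positions of uniquely mapped reads. The moderate and mild bands are the ones quoted above; the outer bands ("severe"/"none", "concerning"/"ideal") reflect our understanding of the ENCODE standard and should be treated as an assumption. Function names and the toy read list are hypothetical, and this is not the authors' pipeline.

```python
from bisect import bisect_right
from collections import Counter

# Lower bounds of each band; labels has one more entry than cutoffs.
PBC1_CUTS, PBC2_CUTS, NRF_CUTS = [0.5, 0.8, 0.9], [1, 3, 10], [0.5, 0.8, 0.9]
BOTTLENECK = ["severe", "moderate", "mild", "none"]
COMPLEXITY = ["concerning", "acceptable", "compliant", "ideal"]


def library_complexity(read_positions):
    """Return (NRF, PBC1, PBC2) from uniquely mapped read positions.

    read_positions: iterable of hashable keys such as (chrom, start, strand).
    """
    counts = Counter(read_positions)
    total_reads = sum(counts.values())
    distinct = len(counts)                              # positions seen at least once
    m1 = sum(1 for c in counts.values() if c == 1)      # positions with exactly one read
    m2 = sum(1 for c in counts.values() if c == 2)      # positions with exactly two reads
    nrf = distinct / total_reads
    pbc1 = m1 / distinct
    pbc2 = m1 / m2 if m2 else float("inf")
    return nrf, pbc1, pbc2


def classify(value, cutoffs, labels):
    """Map a metric to its band, treating each cutoff as an inclusive lower bound."""
    return labels[bisect_right(cutoffs, value)]


# Toy example: ten reads over eight distinct positions.
reads = ["a", "a", "b", "c", "d", "d", "e", "f", "g", "h"]
nrf, pbc1, pbc2 = library_complexity(reads)
print(classify(nrf, NRF_CUTS, COMPLEXITY),
      classify(pbc1, PBC1_CUTS, BOTTLENECK),
      classify(pbc2, PBC2_CUTS, BOTTLENECK))
```

      Each metric is classified separately here because PBC1 and PBC2 can fall into different bands for the same library.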

      To examine the differences in H3K4me3 enrichment patterns between the two groups, we normalized the read counts within ±2 kb of the TSS of human genes to CPM. We then compared the average values of IS-treated macrophages with those of controls and displayed the differences in waterfall plots. In addition, we marked genes of interest in red, including those related to the phenotype of IS-trained macrophages (TNF and IL6), the activation of innate immune responses (XRCC5, IFI16, PQBP1), and the regulation of ornithine decarboxylase (OAZ3, PSMA3, PSMA1) (Author response image 10B and C). The H3K4me3 peak tracks of the TNF and IL6 loci and the H3K4me3 enrichment pattern were also added as supplementary Figure 3D and 3F in the revised manuscript.

      Next, to evaluate the consistency among replicates within a group, we calculated Spearman's correlation coefficients on enrichment values expressed as counts per million (CPM), computed with the edgeR R package. We analyzed two sets of regions: the 7,136 H3K4me3 peaks described in Figure 3E of the revised manuscript, and the 2 kbp regions around transcription start sites (TSS) in the hg19 human genome. The resulting Spearman's correlation coefficients and associated P-values demonstrated concordance between replicates, confirming reproducibility and consistent performance (Author response image 10D).
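      The replicate comparison can be sketched in a few lines. The example below is a plain-Python stand-in for the CPM scaling and Spearman correlation described above. It is not edgeR itself, it omits the P-value, and the count vectors are hypothetical.

```python
import math


def cpm(counts):
    """Scale raw region counts to counts per million of the library total."""
    total = sum(counts)
    return [c * 1e6 / total for c in counts]


def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def spearman(x, y):
    """Spearman's rho is the Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical counts over five regions in two replicates.
rep1 = [10, 200, 30, 4000, 50]
rep2 = [12, 180, 45, 3500, 40]
print(round(spearman(cpm(rep1), cpm(rep2)), 3))
```

      Because CPM rescales each library by a constant, it does not change the ranks within a replicate; the normalization matters for comparing magnitudes across libraries rather than for the rank correlation itself.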

      Finally, the correlation between gene expression and H3K4me3 enrichment around transcription start sites (TSS) has been reported in previous research (Reshetnikov VV et al. Data of correlation analysis between the density of H3K4me3 in promoters of genes and gene expression: Data from RNA-seq and ChIP-seq analyses of the murine prefrontal cortex. Data Brief. 2020 Oct 2;33:106365). To verify this association in our study, we applied Spearman's correlation for comparative analysis and conducted linear regression to determine whether a consistent global trend in RNA expression existed. In our analysis, count values from regions extending 2 kbp around the TSSs in the H3K4me3 ChIP-seq data were converted to counts per million (CPM) using the edgeR R package. These were then compared with the transcripts per million (TPM) values of the corresponding genes. Our results revealed a significant positive correlation, reinforcing the consistent relationship between H3K4me3 enrichment and gene expression (Author response image 10E and Supplementary Fig. 6D in the revised manuscript).

      Author response image 10.

      Information on the quality of ChIP-seq data and the correlation between ChIP-seq and RNA-seq. A, Information on the quality of ChIP-seq data. B, H3K4me3 peaks in the promoter regions of TNFA and IL6. C, The differences in H3K4me3 enrichment patterns between the control group and the IS-training group. D, The consistency among replicates within a group. E, Correlation between ChIP-seq and RNA-seq in IS-induced trained immunity.

      (10) AhR changes in the cell nucleus should be provided (Fig. 4A).

      We appreciate the constructive feedback from the reviewer. In response to the reviewer's suggestions, we investigated the nuclear translocation of AhR 6 days after the induction of IS-mediated trained immunity, as illustrated in Author response image 11. For this purpose, the lysate from IS-trained monocytes was fractionated into nuclear and cytosolic fractions, and AhR protein was subsequently immunoblotted. The results depicted in Figure X demonstrate that IS-trained monocytes exhibited a higher level of AhR protein in the nucleus compared to non-trained monocytes. Notably, the nuclear translocation of AhR was significantly attenuated in IS-trained monocytes treated with GNF351. These findings imply that the activation of AhR, facilitated by the binding of IS, persisted partially up to 6 days, indicating that IS-mediated degradation of AhR was not fully recovered even on day 6 after the induction of IS training. Consequently, we have replaced Figure 4A in the revised manuscript.

      Author response image 11.

      The activation of AhR, facilitated by IS binding, persists partially for up to 6 days during induction of trained immunity. Lysates of IS-trained cells treated with or without GNF351 were separated into nuclear and cytosolic fractions, followed by WB analysis for AhR protein (Left panel). Band intensity in immunoblots was quantified by densitometry (Right panel). β-actin was used as a normalization control. Bar graphs show the mean ± SEM. * = p < 0.05, by two-tailed paired t-test.

      (11) Do other protein-bound uremic toxins (PBUTs), such as PCS, HA, IAA, and KA, change the mRNA expression of ALOX5, ALOX5AP, and LTB4R1? In the absence of genetic studies, it is difficult to be certain of the ALOX5-related mechanism claimed by the authors.

      We are grateful for the constructive criticism provided by the reviewer. In response to the reviewer's comment, we investigated whether uremic toxins, specifically PBUTs such as PCS, HA, IAA, and KA, induce changes in the mRNA expression of ALOX5, ALOX5AP, and LTB4R1 in trained monocytes. Intriguingly, the examination revealed no discernible induction in the mRNA expression of these genes by PBUTs, with the exception of IS, as depicted in Author response image 12 of the letter. These findings once again underscore the implication of the AhR-ALOX5 pathway in the induction of trained immunity in monocytes by IS.

      Author response image 12.

      No obvious impact of PBUTs except IS on the expression of arachidonic acid pathway-related genes 6 days after treatment with PBUTs. Purified monocytes were treated with several PBUTs including IS, PCS, HA, IAA, and KA for 24 hrs, followed by a 5-day resting period to induce trained immunity. The mRNA expression of ALOX5, ALOX5AP, and LTB4R1 was quantified using RT-qPCR. Bar graphs show the mean ± SEM. * = p < 0.05, by two-tailed paired t-test.

      (12) Fig.6 is based on the correlated expression of inflammatory genes or AA pathway genes. It does not clarify any mechanisms the authors claimed in the previous figures. 

      We express our sincere appreciation for the constructive criticism provided by the reviewer, and we have taken careful note of the points raised. In response to the reviewer's feedback, we adopted two distinct approaches utilizing samples obtained from ESRD patients and IS-trained mice. Initially, we investigated the correlation between ALOX5 protein expression in monocytes and IS concentration in the plasma of ESRD patients presented in Figure 6E of the original manuscript. Despite the limited number of samples, our analysis revealed a nonsignificant correlation between IS concentration and ALOX5 expression; however, it demonstrated a positive trend (Author response image 13A). Subsequently, we examined the potential inhibitory effects of zileuton, an ALOX5 inhibitor, on the production of TNF-α and IL-6 in LPS-stimulated splenic myeloid cells derived from IS-trained mice. Our findings indicate that zileuton significantly inhibits the production of TNF-α and IL-6 induced by LPS in splenic myeloid cells from IS-trained mice (Author response image 13B). These data were added in Figure 6N of the revised manuscript (Line 350-354 of page 16 in the revised manuscript).

      Author response image 13.

      Assessment of the correlation between ALOX5 and the concentration of IS in ESRD patients, and investigation of ALOX5 effects in mouse splenic myeloid cells in IS-trained mice. A. Examination of the correlation between ALOX5 protein expression in monocytes and IS concentration in the plasma of ESRD patients. B. C57BL/6 mice were administered daily injections of 200 mg/kg IS for 5 days, followed by a resting period of another 5 days. Subsequently, IS-trained mice were sacrificed, and spleens were mechanically dissociated. Isolated splenic myeloid cells were subjected to ex vivo treatment with LPS (10 ng/ml), along with zileuton (100 µM). The levels of TNF-α and IL-6 in the supernatants were quantified using ELISA. The graphs show the mean ± SEM. * = p < 0.05, by two-tailed paired t-test between zileuton treatment group and no-treatment group.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Minor corrections to the figures

      (1) No indicators for the control group in Fig. 1B.

      We thank the reviewer for the comment. Accordingly, the control group is now indicated with (-).

      (2) The same paper is listed twice in the references section. (No. 19 and 28)

      We thank the reviewer for the comment. We deleted reference No. 28.

      Reviewer #2 (Public Review):

      Manuscript entitled "Uremic toxin indoxyl sulfate (IS) induces trained immunity via the AhR-dependent arachidonic acid pathway in ESRD" presented some interesting findings. The manuscript strengths included use of H3K4me3-CHIP-Seq, AhR antagonist, IS treated cell RNA-Seq, ALOX5 inhibitor, MTA inhibitor to determine the roles of IS-AhR in trained immunity related to ESRD inflammation and trained immunity.

      Thank you very much for your positive feedback.

      Reviewer #2 (Recommendations For The Authors):

      However, the manuscript needs to be improved by fixing the following concerns.

      There are concerns:

      (1) The experiments in Figs. 1G, 1H and 1I need to have AhR siRNA, and siRNA control to demonstrate that the results in uremic toxins-containing serum-treated experiments were related to IS;

      We extend our gratitude to the reviewer for their invaluable comment, acknowledging its significant relevance to our study. In accordance with the reviewer's suggestion, we endeavored to conduct additional experiments utilizing AhR siRNA to elucidate the direct impact of IS present in the serum of end-stage renal disease (ESRD) patients on the induction of IS-mediated trained immunity. 

      Regrettably, owing to limitations in the availability of monocytes post-siRNA transfection, we were unable to establish a direct relationship between the observed outcomes in experiments utilizing uremic toxins-containing serum and IS in AhR siRNA knockdown monocytes. However, treatment with GNF351, an AhR antagonist, resulted in the inhibition of TNF-α production in trained monocytes exposed to uremic toxins-containing serum (Author response image 14).

      In our previous studies, we have already reported that uremic serum-induced TNF-α production in human monocytes is dependent on the AhR pathway, using GNF351 (Kim HY et al. Indoxyl sulfate (IS)-mediated immune dysfunction provokes endothelial damage in patients with end-stage renal disease (ESRD). Sci Rep. 2017 Jun 8;7(1):3057). Additionally, we have provided evidence demonstrating an augmentation in the activity of the AhR pathway within monocytes derived from ESRD patients, indicative of a significant reduction in AhR protein levels (Kim HY et al. Indoxyl sulfate-induced TNF-α is regulated by crosstalk between the aryl hydrocarbon receptor, NF-κB, and SOCS2 in human macrophages. FASEB J. 2019 Oct;33(10):10844-10858). It is noteworthy that other major protein-bound uremic toxins (PBUTs), such as PCS, HA, IAA, and KA, failed to induce trained immunity in human monocytes (Supplementary Figure 1A in the revised manuscript). Nevertheless, knockdown of AhR via siRNA effectively impeded the induction of IS-mediated trained immunity in human monocytes (Figure 4E in the revised manuscript). 

      Taken collectively, our findings suggest a critical role for IS present in the serum of ESRD patients in the induction of trained immunity in human monocytes. 

      Author response image 14.

      Inhibition of uremic serum (US)-induced trained immunity by AhR antagonist, GNF351. Monocytes were pre-treated with or without GNF351 (AhR antagonist; 10 µM) for 1 hour, followed by treatment with pooled normal serum (NS) or uremic serum (US) at a concentration of 30% (v/v) for 24 hours. After a resting period of 5 days, cells were stimulated with LPS for 24 hours. The production of TNF-α and IL-6 in the supernatants was quantified using ELISA. The data presented are derived from three independent experiments utilizing samples from different donors.

      (2) Fig. 3 needs to be moved as Fig. 2

      We express appreciation for the constructive suggestion provided by the reviewer. In response to the reviewer's comment, the sequence of Figure 3 and Figure 2 was adjusted in the revised manuscript.

      (3, 4) The connection between bioenergetic metabolism pathways and H3K4me3 was missing; The connection between bioenergetic metabolism pathways and ALOX5 was missing;

      We appreciate the reviewer’s constructive criticism and fully understand the reviewer's points. In response to the reviewer's feedback, we conducted additional experiments employing appropriate inhibitors to elucidate the interrelation between bioenergetic metabolism and H3K4me3 and between bioenergetic metabolism and ALOX5. Initially, we assessed the enrichment of H3K4me3 on day 6 at the promoters of the TNFA and IL6 loci after treatment with 2-DG, a glycolysis inhibitor. Additionally, we evaluated the alteration in the activity of S6K, a downstream molecule of mTORC1, following treatment with zileuton, an inhibitor of ALOX5. Our findings indicate that AhR-dependent arachidonic acid (AA) signaling induces epigenetic modifications, albeit without inducing metabolic rewiring, in IS-induced trained immunity (Author response image 15). However, IS stimulation promotes mTORC1-mediated glycolysis in an AhR-independent manner. Notably, inhibition of glycolysis with 2-DG impacts epigenetic modifications. We have updated Figure 7 of the revised manuscript to incorporate these additional experimental findings, elucidating the correlation between the diverse mechanisms implicated in IS-induced innate immune memory (Fig. 7 in the revised manuscript).

      Author response image 15.

      Epigenetic modification is regulated by the arachidonic acid (AA) pathway and metabolic rewiring, but metabolic rewiring is not affected by the AA pathway. A-B. Monocytes were pre-treated with zileuton (ZLT), an inhibitor of ALOX5, or 2-DG, a glycolysis inhibitor, followed by stimulation with IS for 24 hours. After a resting period of 5 days, the enrichment of H3K4me3 on the promoters of TNFA and IL6 loci was assessed. Normalization was performed using 2% input. C. Monocytes were pre-treated with zileuton (ZLT) and stimulated with IS for 24 hr. Cell lysates were immunoblotted for phosphorylated S6 Kinase, with β-actin serving as a normalization control. Band intensities in the immunoblots were quantified using densitometry. D. A schematic representation of the mechanistic framework underlying IS-trained immunity. Bar graphs show the mean ± SEM. * = p < 0.05, ** = p < 0.01, and *** = p < 0.001 by two-tailed paired t-test.

      (5) It was unclear whether histone acetylations such as H3K27acetylation and H3K14 acetylation are involved in IS-induced epigenetic reprogramming or IS-induced trained immunity is highly histone methylation-specific.

      We appreciate the constructive comment provided by the reviewer. As highlighted by the reviewer, alterations in epigenetic histone markers, specifically H3K4me3 or H3K27ac, have been recognized as the underlying molecular mechanism in trained immunity. Due to limitations in the availability of trained cells, this study primarily focused on histone methylation. In response to the reviewer's inquiry, we briefly investigated the impact of histone acetylation using C646, a histone acetyltransferase inhibitor, on IS-induced trained immunity (Author response image 16). Our experiments revealed that C646 treatment effectively hinders the production of TNF-α and IL-6 by IS-trained monocytes in response to LPS stimulation, comparable to the effects observed with MTA (5’-methylthioadenosine), a non-selective methyltransferase inhibitor. This suggests that histone acetylation also contributes to the epigenetic modifications associated with IS-induced trained immunity. We sincerely appreciate the valuable input from the reviewer.

      Author response image 16.

      The role of histone acetylation in epigenetic modifications in IS-induced trained immunity. Monocytes were pretreated with MTA (methylthioadenosine, methyltransferase inhibitor) or C646 (histone acetyltransferase p300 inhibitor), followed by treatment with 1 mM IS for 24 hrs. After resting for 5 days, trained cells were re-stimulated with 10 ng/ml LPS as a secondary insult. TNF-α and IL-6 in supernatants were quantified by ELISA. Bar graphs show the mean ± SEM. * = p < 0.05 and ** = p < 0.01 by two-tailed paired t-test.

      Reviewer #3 (Public Review):

      The manuscript entitled, "Uremic toxin indoxyl sulfate induces trained immunity via the AhR-dependent arachidonic acid pathway in ESRD" demonstrates that indoxyl sulfate (IS) induces trained immunity in monocytes via epigenetic and metabolic reprogramming, resulting in augmented cytokine production. The authors conducted well-designed experiments to show that the aryl hydrocarbon receptor (AhR) contributes to IS-trained immunity by enhancing the expression of arachidonic acid (AA) metabolism-related genes such as arachidonate 5-lipoxygenase (ALOX5) and ALOX5 activating protein (ALOX5AP). Overall, this is a very interesting study that highlights that IS-mediated trained immunity may have deleterious outcomes in augmented immune responses to the secondary insult in ESRD. Key findings would help to understand accelerated inflammation in CKD or ESRD.

      We greatly appreciate your positive feedback.

      Reviewer #3 (Recommendations for The Authors):

      This reviewer, however, has the following concerns.

      Major comments:

      (1) Figure 1B: IS is known to induce the expression of TNF-a and IL-6. This reviewer wonders why these molecules were not detected in the IS (+) LPS (-) condition.

      We appreciate the constructive comment provided by the reviewer. In our prior investigation, it was observed that the expression of TNF-α and IL-6 was induced 24 hours after IS treatment in human monocytes and macrophages (Couper KN et al. IL-10: the master regulator of immunity to infection. J Immunol. 2008 May 1;180(9):5771-7). In adherence to the trained immunity protocol, the medium was replaced at 24 hours post-IS treatment to eliminate IS, with a subsequent change after a 5-day resting period. TNF-α and IL-6 would probably have accumulated and been detected in the IS (+) LPS (-) culture supernatant had the medium not been changed at these specific time points. Our primary objective, however, was to ascertain the role of IS in the induction of trained immunity, prompting an investigation into whether IS contributes to an increase in the production of TNF-α and IL-6 in response to LPS stimulation as a secondary insult. 

      (2) 1' stimulus is IS followed by 2' stimulus LPS/Pam3. It would be interesting to know what the immune profile is when other uremic toxin is used for secondary insult, this would be more relevant in clinical context of ESRD.

      The reviewer's insightful comment is greatly appreciated. To address their feedback, IS-trained macrophages were subjected to additional stimulation using protein-bound uremic toxins (PBUTs) as a secondary challenge. As illustrated in Author response image 17, the examined uremic toxins, namely p-cresyl sulfate (PCS), hippuric acid (HA), indole-3-acetic acid (IAA), and kynurenic acid (KA), failed to elicit the production of proinflammatory cytokines, specifically TNF-α and IL-6, by IS-trained monocytes.

      Author response image 17.

      No obvious effect of protein-bound uremic toxins (PBUTs) as secondary insults on the production of proinflammatory cytokines in IS-trained monocytes. IS-trained monocytes were re-stimulated with several PBUTs, such as IS (1 mM), PCS (1 mM), HA (2 mM), IAA (0.5 mM), and KA (0.5 mM), as a secondary challenge for 24 hrs. TNF-α and IL-6 in supernatants were quantified by ELISA. The data from two independent experiments with different donors are shown. ND indicates ‘not detected’.

      (3) The authors need to explain a rationale why RNA and protein data used different markers.

      We appreciate the constructive input provided by the reviewer. Given that TNF-α and IL-6 represent prototypical cytokines synthesized by trained monocytes in humans, we conducted a comprehensive analysis of their mRNA and protein levels. In human macrophages, the release of active IL-1β necessitates a second priming event, such as the presence of ATP. Consequently, we posited that assessing the mRNA levels of IL-1β would suffice to demonstrate the induction of trained immunity in our experimental protocol. Nevertheless, in response to the reviewer's comment, we proceeded to assess the protein levels of IL-1β, IL-10, and MCP-1 as illustrated in Author response image 18. These data have been incorporated into the revised manuscript as supplementary Figure 1E. 

      Author response image 18.

      Modulation of cytokine levels in IS-trained macrophages in response to secondary stimulation with LPS. Human monocytes were stimulated with IS for 24 hr, followed by a resting period of 5 days. On day 6, the cells were re-stimulated with LPS for 24 hr. The levels of each cytokine in the supernatants were quantified using ELISA. Bar graphs show the mean ± SEM. ** = p < 0.01 and *** = p < 0.001 by two-tailed paired t-test.

      (4) Epigenetic modification primarily involves histone modification and DNA methylation. The authors presented convincing data on histone modification (Figure 2), but did not provide any insights in the promoter DNA methylation status.

      We express our gratitude to the reviewer for providing valuable comments, which highlight a crucial aspect of our study. Despite the well-established primary role of DNA methylation in epigenetic modifications, recent suggestions propose that histone modifications, particularly H3K4me3 or H3K27ac, play a predominant role in the induction of trained immunity. In this context, our primary inquiry was focused on determining whether IS, as an endogenous insult, induces trained immunity in monocytes, and if so, whether IS-trained immunity is mediated through metabolic and epigenetic modifications - recognized as the major mechanisms underlying the generation of trained immunity. It is imperative to note that our study's primary objective did not encompass the identification of various epigenetic changes. In response to the reviewer's inquiry, we conducted a brief examination of the impact of DNA methylation using ZdCyd (5-aza-2’-deoxycytidine), a DNA methylation inhibitor, on IS-induced trained immunity. Our experimental findings indicate that ZdCyd treatment exerts no discernible effect on the production of TNF-α and IL-6 by IS-trained monocytes upon stimulation with LPS, as illustrated in Author response image 19. However, a recent study has shed light on the role of DNA methylation in BCG vaccine-induced trained immunity in human monocytes (Bannister S et al. Neonatal BCG vaccination is associated with a long-term DNA methylation signature in circulating monocytes. Sci Adv. 2022 Aug 5;8(31):eabn4002). Consequently, further investigations utilizing DNA methylation sequencing are warranted to elucidate whether DNA methylation is implicated in the induction of IS-trained immunity.

      Author response image 19.

      The effect of DNA methylation on IS-induced trained immunity. Monocytes were pretreated with ZdCyd (5-aza-2’-deoxycytidine, DNA methylation inhibitor), followed by treatment with IS 1 mM for 24 hrs. After resting for 5 days, cells were re-stimulated by LPS 10 ng/ml as secondary insult. TNF-α and IL-6 in supernatants were quantified by ELISA. Bar graphs show the mean ± SEM. * = p < 0.05 and **= p < 0.01 by two-tailed paired t-test.

                     

      (5) Metabolic rewiring in trained immunity cells undergo metabolic changes which involved intertwined pathways of glucose and cholesterol metabolism. The authors presented nice data on glucose pathway (Figure 3) but failed to show any changes related to cholesterol metabolism.

      We express our gratitude to the reviewer for providing valuable comments, which underscore a noteworthy observation. In the current investigation, our primary emphasis has been on glycolytic reprogramming, recognized as a principal mechanism for inducing trained immunity in monocytes. This focus stems from preliminary experiments wherein Fluvastatin, a cholesterol synthesis inhibitor, demonstrated no discernible impact on TNF-α production by IS-trained monocytes, as illustrated in Author response image 20. Intriguingly, Fluvastatin treatment exhibited a partial inhibitory effect on the production of IL-6 by IS-trained monocytes. Subsequent investigations are imperative to elucidate the role of cholesterol metabolism in the induction of IS-trained immunity.

      Author response image 20.

      The effect of cholesterol metabolism on IS-induced trained immunity. Monocytes were pretreated with Fluvastatin (cholesterol synthesis inhibitor, HMG-CoA reductase inhibitor), followed by treatment with 1 mM IS for 24 hrs. After resting for 5 days, cells were re-stimulated with 10 ng/ml LPS as a secondary insult. TNF-α and IL-6 in supernatants were quantified by ELISA. Bar graphs show the mean ± SEM. * = p < 0.05 and ** = p < 0.01 by two-tailed paired t-test.

      (6) Trained immunity involves neutrophils in addition to monocyte/macrophages. It is evident from the RNAseq data that neutrophil degranulation (Figure 5B) is the top enriched pathway. This reviewer wonders why the authors did not perform any assays on neutrophils.

      We thank the reviewer for this valuable comment. IS represents a major uremic toxin that accumulates in the serum of patients with chronic kidney disease (CKD), correlating with CKD progression and the onset of CKD-related complications, including cardiovascular diseases (CVD). Our prior investigations have demonstrated that IS promotes the production of TNF-α and IL-1β by human monocytes and macrophages. Additionally, macrophages pre-treated with IS exhibit a significant augmentation in TNF-α production when exposed to a low dose of lipopolysaccharide (LPS). Considering the pivotal role of proinflammatory macrophages and TNF-α, a principal cardiotoxic cytokine, in CVD pathogenesis, this study has primarily focused on elucidating the trained immunity of monocytes/macrophages. Consequently, all experiments were conducted using highly purified monocytes and monocyte-derived macrophages from both healthy controls and end-stage renal disease (ESRD) patients. The reviewer's observation regarding the potential involvement of neutrophils in trained immunity has been duly noted. Subsequent investigations will be needed to explore the possible role of IS-trained neutrophils in the pathogenesis of CVD. Once again, we thank the reviewer for the valuable comment.

      (7) Figure 5C (GSEA plots): This reviewer is not sure if one can present the plots assigned with groups (eg. IS(T) vs Control). More details are required in the Methods related to this.

      We apologize for any ambiguity resulting from the previously unclear description of methods concerning Gene Set Enrichment Analysis (GSEA) plots. To provide clarification, additional details pertaining to this aspect have been added to the Methods section of the revised manuscript. 

      (8) In vivo data (Figure 6 I-M): Instead of serum profile and whole set of spleen myeloid cells, it would be interesting to see changes of markers on peritoneal macrophages or bone marrow-derived macrophages since the in vitro findings are on monocyte-derived macrophages.

      We appreciate the comment and the insightful suggestion provided by the reviewer. In response to the reviewer's feedback, we conducted additional in vivo experiments to examine the production of TNF-α and IL-6 in bone marrow-derived macrophages (BMDMs) derived from IS-trained mice. Upon LPS stimulation, we observed an increase in the production of TNF-α and IL-6 in spleen myeloid cells from IS-trained mice. However, no such increase in these cytokines was noted in BMDMs derived from the same mice (Author response image 21, A and B). In fact, we had already observed that the expression of ALOX5 was not elevated in BM cells derived from IS-trained mice, as presented in Figure 6L and M of the original manuscript (Author response image 21C). 

      Recent studies have indicated that trained immunity can be induced in circulating immune cells, such as monocytes or resident macrophages (peripheral trained immunity), as well as in hematopoietic stem and progenitor cells (HSPCs) within the bone marrow (central trained immunity) (Kaufmann E et al. BCG Educates Hematopoietic Stem Cells to Generate Protective Innate Immunity against Tuberculosis. Cell. 2018 Jan 11;172(1-2):176-190.e19; Riksen NP et al. Trained immunity in atherosclerotic cardiovascular disease. Nat Rev Cardiol. 2023 Dec;20(12):799-811). It is plausible that central trained immunity in BM progenitor cells may not be elicited in our mouse model, which is relatively acute in nature. Further investigations are warranted to explore the role of IS in inducing central trained immunity, utilizing appropriate chronic disease models.

      We have included this additional data as supplementary figures in the revised manuscript (Suppl. Fig. 7, D and E, and line 355-362 of page 16 in the revised manuscript).

      Author response image 21.

      Absence of trained immunity in bone marrow-derived macrophages (BMDMs) derived from IS-trained mice. A-B. IS was intraperitoneally injected daily for 5 days, followed by training for another 5 days. Isolated BM progenitor cells and spleen myeloid cells were differentiated or treated with LPS for 24 hr. The supernatants were collected for ELISA. C. The level of ALOX5 protein in BM cells isolated from IS-trained or control mice was analyzed by western blot. The graph illustrates the band intensity quantified by densitometry. Bar graphs show the mean ± SEM. * = p < 0.05 and ** = p < 0.01, by unpaired t-test.

      (9) Figure 7: There are no data on signaling pathway(s) that links IS and epigenetic changes, the authors therefore may want to add "?" to the proposed mechanism.

      We extend our sincere appreciation to the reviewer for providing valuable feedback. In light of the constructive comments provided by three reviewers, we have undertaken a series of additional experiments. These efforts have enabled us to propose a more elucidating schematic representation of the proposed mechanism, free of any ambiguous elements (Figure 7 in the revised manuscript). We are grateful for your insightful input.

      (10) Demographic data (Table S2): ESRD patients have co-morbidities including diabetes (33% of subjects), CAD (28%). How did the authors factor out the co-morbidities in the overall context of their findings?

      We express gratitude to the reviewer for providing valuable comments, particularly on a noteworthy and significant aspect. The investigation employed an End-Stage Renal Disease (ESRD) Cohort involving approximately 60 subjects undergoing maintenance hemodialysis at Severance Hospital in Seoul, Korea. The subset of participants subjected to analysis consisted of stable individuals who provided informed consent and had not undergone hospitalization for reasons related to infection or acute events within the preceding three months.

      (11) There are no data on the purity of IS.

      According to the reviewer's suggestion, we have included information regarding the purity (99%) of IS in the Methods section.

      (12) Figure 6L: Immunoblot on b-actin were merged. This reviewer wonders how the authors analyzed these blots. 

      We express gratitude for the constructive criticism provided by the reviewer, and we acknowledge and comprehend the concerns raised. In response to the reviewer's comments, a reanalysis of the ALOX5 expression level in Figure 6M was conducted, employing immunoblot analysis on β-actin, as depicted in Figure 6L, with a short exposure time (Author response image 22).

      Author response image 22.

      ALOX5 protein exhibited an elevation in splenic myeloid cells obtained from IS-trained mice.

      (13) qPCR data throughout the manuscript have control group with no error bar. The authors may not set all controls arbitrarily equal to 1 (Example Figure 1H and I). Data should be normalized in a test standard way. The average of a single datapoint may be scaled to 1, but variation must remain within the control groups.

      We express gratitude to the reviewer for their valuable feedback, acknowledging a comprehensive understanding of their perspectives. Our qPCR assays predominantly investigated the impact of various treatments on the expression of specific target genes (e.g., TNF-α, IL-6, Alox5) within monocytes/macrophages obtained from the same donors.

      Subsequently, normalization of gene expression levels occurred relative to ACTINB expression, followed by relative fold-increase determination using the comparative CT method (ΔΔCT).

      Statistical significance was assessed through a two-tailed paired analysis in these instances. Additionally, a substantial portion of the qPCR data was validated at the protein level through ELISA and immunoblotting techniques.
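
      As a minimal illustrative sketch (not the analysis code used in the study), the comparative CT calculation described above can be written as follows. All CT values are hypothetical, and the function name `fold_changes` and its `paired` flag are assumptions introduced only for illustration. The sketch shows why calibrating each donor against its own control fixes every control value at exactly 1, whereas calibrating against the mean control ΔCT keeps the spread among controls visible.

```python
from statistics import mean


def fold_changes(ct_target_ctrl, ct_ref_ctrl, ct_target_trt, ct_ref_trt, paired=True):
    """Comparative CT (2^-ddCT) fold changes for control and treated samples.

    Each list holds one CT value per donor, in the same donor order.
    paired=True  : each donor is calibrated against its own control dCT.
    paired=False : every sample is calibrated against the mean control dCT.
    """
    # dCT = CT(target gene) - CT(reference gene), computed per donor
    d_ctrl = [t - r for t, r in zip(ct_target_ctrl, ct_ref_ctrl)]
    d_trt = [t - r for t, r in zip(ct_target_trt, ct_ref_trt)]

    if paired:
        calibrators = d_ctrl
    else:
        calibrators = [mean(d_ctrl)] * len(d_ctrl)

    # ddCT = dCT(sample) - dCT(calibrator); fold change = 2^-ddCT
    ctrl = [2 ** -(d - c) for d, c in zip(d_ctrl, calibrators)]
    trt = [2 ** -(d - c) for d, c in zip(d_trt, calibrators)]
    return ctrl, trt


# Hypothetical CT values for three donors (illustration only).
target_ctrl = [25.0, 26.0, 24.0]
ref_ctrl = [18.0, 18.0, 18.0]
target_trt = [23.0, 24.0, 23.0]
ref_trt = [18.0, 18.0, 18.0]

paired_ctrl, paired_trt = fold_changes(target_ctrl, ref_ctrl, target_trt, ref_trt)
pooled_ctrl, pooled_trt = fold_changes(
    target_ctrl, ref_ctrl, target_trt, ref_trt, paired=False
)

print(paired_ctrl, paired_trt)  # controls are all 1.0 by construction
print(pooled_ctrl, pooled_trt)  # controls now vary around a geometric mean of 1
```

      With per-donor calibration, a paired test on the ΔCT values carries the donor-to-donor variation even though the plotted control bar has no error bar; calibrating against the pooled control is one way to display that variation directly.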

      Minor Comments:

      (1) Molecular weight markers are missing in immunoblots throughout the manuscript.

      According to the reviewer's comment, molecular weight markers have been added to the immunoblots.

      (2)  ESRD should be spelled out in the title.

      According to the reviewer's comment, we spelled out ESRD in the title.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1:

      Summary:

      The experiment is interesting and well executed and describes in high detail fish behaviour in thermally stratified waters. The evidence is strong but the experimental design cannot distinguish between temperature and vertical position of the treatments.

      Strengths:

      High statistical power, solid quantification of behaviour.

      Weaknesses:

      A major issue with the experimental design is the vertical component of the experiment. Many thermal preference and avoidance experiments are run using horizontal division in shuttlebox systems or in annular choice flumes. These remove the vertical stratification component so that hot and cold can be compared equally, without the vertical layering as a confounding factor. The method chosen, with its vertical stratification, is inherently unable to control for this effect because warm water is always above, and cold water is always below. This complicates the interpretations and makes firm conclusions about thermal behaviour difficult.

      We highly appreciate this evaluation and have addressed the reviewer’s specific comments below.

      The sentence "Further, the metabolic performance (and thus functions including growth, reproduction, and locomotion) of ectotherms takes the form of a bell-shaped curve as a function of temperature6, peaking within a range of optimal temperatures (the 'preferendum') and going to zero at lower and upper temperature limits7." contains several over-simplifications and misconceptions:

      (1) Thermal performance curves are never bell-shaped.

      (2) The optimum for various traits often shows different TPCs.

      (3) The preferendum rarely lines up with the thermal optimum for various trait TPCs.

      (4) Performance for various traits rarely reaches zero at upper or lower limits, instead they can reach zero at less extreme temperatures (e.g. growth) or maintain high function all the way up to and sometimes beyond thermal limits (e.g. aerobic scope, heart rate).

      We highly appreciate this input. We have replaced that sentence with: L69-71: “Because temperature influences the rates of most physiological processes, rapid warming or cooling can affect fish performance traits, including metabolic rates, swimming ability, and thermal tolerance (Jutfelt et al. 2024).”

      The use of adaptation instead of acclimation is confusing. Adaptation should be reserved for evolutionary change. This is an issue in several parts of the manuscript.

      Thanks for this input, we have replaced the word adapt with acclimate in two instances: L79 and L398.

      It is not true that "very few quantitative studies of thermotaxis have been conducted in fish". There exists an extensive literature on thermal preference and avoidance in fish that the manuscript downplays.

      Thanks a lot for this input. We understand that thermal preference is ultimately driven by mechanistic responses to thermal gradients, and that thermotaxis and thermokinesis are the two mechanisms used by fish to navigate heterothermal environments. Our study and analysis are focused on understanding these mechanisms in vertically stratified conditions, not on understanding thermal preferences per se. We have modified our text to clarify this aspect. Our literature review was focused on the behavioral mechanisms, and our understanding is that the establishment of thermal preferences has a different goal compared to understanding how fish respond to rapid changes in water temperature. We have deleted that sentence and replaced it with (L107-110): “While the thermal preference of fish is a well-established field of research, very few quantitative studies of the behavioral mechanisms allowing fish to seek their preferendum (i.e. thermotaxis) have been conducted in fish.”

      (Methods) It is unclear why the blue dye was used in all experiments. The fish can see the differently coloured water layer and that may have affected their choices. Five control trials without dye were run but finding no difference there could also be due to low statistical power.

      We appreciate this comment. The blue dye was used to visualize the precise location of the thermal interface and was therefore necessary in all experiments (see Methods section ‘Visualization and evolution of the thermal interface’). We acknowledge that fish can perceive the colored water layer, but since the dye concentration and resulting color intensity were consistent across all treatments, we do not see how it could have acted as a confounding variable. While we recognize the possibility of some behavioral influence from the dye, the clear behavioral differences across treatments indicate that it was not a determining factor. To emphasize this we have added the following to the manuscript (L701-703): “Furthermore, because the dye concentration and resulting color intensity were consistent across all treatments, the dye did not act as a confounding variable in our statistical comparisons.”

      Regarding statistical power, our control experiment without dye (N = 16 fish, 4 replicates; see Fig. S34 and S35) provides sufficient statistical power to assess whether the dye influenced behavior. The reviewer indicated that the high statistical power was a strength of the paper, which aligns with our view that our study design enables robust statistical comparisons. It seems contradictory that statistical power is a concern for the control trials, given that our main experiments were conducted with a similar sample size. Indeed, the number of replicates used is consistent with similar studies and balances statistical rigor with the ethical goal of reducing the number of animals used in experimentation. To emphasize this, we have added the following to the manuscript (L865-868): “The number of replicates used in this study reflects a balance between statistical rigor and the ethical imperative to minimize the use of animals in experimentation. Regarding statistical power, our design (five replicates with groups of four fish each) is consistent with similar studies and represents an adequate sample size.”

      A major issue with the experimental design is the vertical component of the experiment. Many thermal preference and avoidance experiments are run using horizontal division in shuttlebox systems or in annular choice flumes. These remove the vertical stratification component so that hot and cold can be compared equally, without the vertical layering as a confounding factor. The method chosen, with its vertical stratification, is inherently unable to control for this effect because warm water is always above, and cold water is always below. This complicates the interpretations and makes firm conclusions about thermal behaviour difficult. This issue should be thoroughly discussed.

      Thank you very much for this comment. We revised the manuscript accordingly, to clearly indicate that our goal was to assess the response of fish to vertically thermally stratified water, a scenario that occurs frequently in nature. We have added the following paragraph to the discussion (L523-530): “However, a generalization of our observations to horizontally oriented thermal gradients remains elusive. Our results are inherently tied to the vertical stratification created in our experiments. As warm water was always positioned above and cold water below, we could not control for the effect of vertical position (i.e., we could not do cold over warm layer experiments). This limits our ability to directly compare our findings to those obtained from horizontally oriented thermal gradients. On the other hand, the case we addressed is of direct environmental relevance, as natural waters often experience vertical thermal stratification.”

      It is unclear why the authors assume an "optimal temperature" (undefined for which trait) of 12°C for brown trout parr, and why they assume the preference temperature would match that "optimal" temperature. The thermal biology for any fish species is more complex than a single perfect temperature, with various traits showing differing optima and often a mismatch with the preferred temperature. The literature suggests brown trout growth optima between 13 and 16°C, and preference temperature has even been suggested to be as high as 21°C. In light of this, the authors' conclusion that brown trout avoid cold and don't avoid warm water is possibly misguided. It is possible that the brown trout had a preference temperature higher than 12°C, which should be acknowledged and discussed.

      This is indeed a very important aspect, which was partly (but indeed not fully) already addressed in the discussion. To reflect these considerations, we have expanded the existing paragraph in the discussion (additions are in yellow). (L422 - L439): “We conclude from the behavior of fish when warmer water was available that their acute thermal preferendum exceeded 12 °C, departing from the acclimation temperature we had chosen based on the thermal preferendum for trout reported in literature[33]. Indeed, the thermal biology for any fish species is more complex than a single, static thermal preferendum: Many internal and external factors, such as hypoxia, satiation, time of day, and life stage[5], can influence the temperature preference of fish. For example, the level of satiation can have an impact because when fish are well fed, their growth rate increases with body temperature as metabolic performance increases[40]. This modifies the preferred temperature, as observed in Bear Lake sculpin (Cottus extensus) that ascend into warmer water after feeding to stimulate digestion and thereby achieve a three-fold higher growth rate[41]. In contrast, field studies with adult fish have observed movement from warm to cold water in summer[42,43], allowing fish to lower their metabolic rate, likely in effort to conserve energy[2,44]. We propose that the behavior of trout parr upon exposure to warmer water in our experiments served to achieve a higher body temperature to ultimately increase growth rate, which is critical for this life stage[45,46]. Indeed, growth experiments on brown trout populations have shown that optimal growth temperatures can range between 15 and 19 °C, depending on the stream of origin[46].”

      The figures are unnecessarily complex and introduce a long list of abbreviations and Greek characters for no apparent reason. There are many simpler ways for showing the results so unclear why they are so opaque.

      We appreciate the reviewer’s feedback and agree on the importance of clarity, however (in the absence of specific suggestions) we did not make changes to the figures or the use of Greek characters (which align with convention), as we believe they effectively convey the results. We highlight that the data themselves are very rich (multiple fish, multiple phases, multiple treatments, etc.) and we wanted to convey this richness in a compact and transparent manner.

      Reviewer #2:

      This paper investigates an interesting question: how do fish react to and avoid thermal disturbances from the optimum that occur on fast timescales? Previous work has identified potential strategies for warm avoidance in fish on short timescales while strategies for cold avoidance are far more elusive. The work combines a clever experimental paradigm with careful analysis to show that trout parr avoid cold water by limiting excursions across a warm-cold thermal interface. While I found the paper interesting and convincing overall, there are a few omissions and choices in the presentation that limit interpretability and clarity.

      A main question concerns the thermal interface itself. The authors track this interface using a blue dye that is mixed in with either colder or warmer water before a gate is opened that leads to gravitational flow overlaying the two water temperatures. The dye likely allows to identify convective currents which could lead to rapid mixing of water temperatures. However, it is less clear whether it accurately reflects thermal diffusion. This is problematic as the authors identify upward turning behavior around the interface which appears to be the behavioral strategy for avoiding cold water but not warm water. Without knowing the extent of the gradient across the interface, it is hard to know what the fish are sensing. The authors appear to treat the interface as essentially static, leading them to the conclusion that turning away before the interface is reached is likely related to associative learning. However, thermal diffusion could very likely create a gradient across centimeters which is used as a cue by the fish to initiate the turn. In an ideal world, the authors would use a thermal camera to track the relationship between temperature and the dye interface. Absent that, the simulation that is mentioned in passing in the methods section should be discussed in detail in the main text, and results should be displayed in Figure 1. Error metrics on the parameters used in the simulation could then be used to identify turns in subsequent figures that likely are or aren't affected by a gradient formed across the interface.

      The authors assume that the thermal interface triggers the upward-turning behavior. However, an alternative explanation, which should be discussed, is that cold water increases the tendency for upward turns. This could be an adaptive strategy since for temperatures > 4C turning swimming upwards is likely a good strategy to reach warmer water.

      The paper currently also suffers from a lack of clarity which is largely created by figure organization. Four main and 38 supplemental figures are very unusual. I give some specific recommendations below but the authors should decide which data is truly supplemental, versus supporting important points made in the paper itself. There also appear to be supplemental figures that are never referenced in the text which makes traversing the supplements unnecessarily tedious.

      The N that was used as the basis for statistical tests and plots should be identified in the figures to improve interpretability. To improve rigor, the experimental procedures should be expanded.

      Specifically, the paper uses two thermal models which are not detailed at all in the methods section.

      We appreciate these crucial comments to our paper. We have addressed these points in detail below.

      As stated above, a characterization of the thermal interface is critical. Ideally via measurement or at least by expanding on the simulation.

      We appreciate the idea of using thermal cameras and, indeed, we had initially tried to use them. However, thermal cameras generally cannot see through plexiglass or glass-like material due to the way infrared radiation interacts with these materials. While thin plastics can transmit some infrared, thicker plastics and reflective materials like glass tend to block or reflect infrared light.

      We have attempted to better characterize the thermal interface thickness, namely the spatial extent of the thermal gradient over the time period of our experiments (20 min). Indeed, our simulations in the original SI were conducted precisely to estimate the thermal interface thickness, though they were based on thermal diffusion in still water, whereas turbulence generated by the moving gravity current can smear out the interface, particularly in the initial phase. To account for this in the revised manuscript, we adopted a phenomenological approach to estimate the initial increase in thickness of the thermal interface due to turbulence and present this refined simulation in our manuscript.

      Our analysis suggests that, rather than assuming an initial interface thickness of zero (as in the original version of the manuscript), the thermal diffusion simulations should begin with an initial thickness of 2.8 mm in TR1. To incorporate this adjustment, we set the initial interface thickness to 2.8 mm and ran the simulation forward for t = 20 min, assuming diffusion. This approach resulted in a final interface thickness ranging between 4 and 6 cm (see Fig. 29 in the Supplementary Information).
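
      As a rough illustration of the calculation described above, the following minimal sketch estimates how a diffusing interface thickens over 20 min. It is not the simulation code used for the manuscript. It assumes an error-function temperature profile, a thermal diffusivity of water of about 1.4e-7 m^2/s, and a thickness defined as the layer spanning a chosen fraction of the temperature step; the function names and the `frac` parameter are hypothetical.

```python
import math

# Assumed approximate thermal diffusivity of water (m^2/s); not taken from the manuscript.
KAPPA_WATER = 1.4e-7


def erf_inv(y, tol=1e-12):
    """Invert math.erf on [0, 1) by bisection (the standard library has no inverse erf)."""
    lo, hi = 0.0, 6.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if math.erf(mid) < y:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def interface_thickness(t_s, delta0_m=2.8e-3, frac=0.9, kappa=KAPPA_WATER):
    """Thickness (m) of the layer containing `frac` of the temperature step at time t_s.

    Assumed profile: T(z, t) = T_mid + (dT / 2) * erf(z / L(t)),
    with L(t)^2 = L0^2 + 4 * kappa * t (pure diffusion of an erf profile).
    L0 is chosen so that the thickness at t = 0 equals delta0_m.
    """
    x = erf_inv(frac)                # half-width in units of L
    l0 = delta0_m / (2.0 * x)        # initial length scale
    l_t = math.sqrt(l0 ** 2 + 4.0 * kappa * t_s)
    return 2.0 * x * l_t


if __name__ == "__main__":
    for frac in (0.8, 0.9):
        thickness_cm = 100.0 * interface_thickness(20 * 60, frac=frac)
        print(f"frac={frac}: thickness after 20 min = {thickness_cm:.1f} cm")
```

      Under these assumptions, the thickness after 20 min is a few centimetres, and it depends noticeably on the chosen thickness definition (`frac`), so any comparison with the reported values should state that definition.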

      To reflect this refinement, we have added a new paragraph (L717-758), "Characterization of the thermal gradient", to the Methods section. Additionally, we have updated Fig. S29 in the Supplementary Information and included an average (over time and across treatments) gradient thickness of 5 cm in Figs. 2 and 3 of the manuscript. The revised Figs. 2 and 3 now explicitly indicate the estimated vertical extent of the thermal gradient, with an extended caption detailing these changes.

      The simulation should be detailed in the methods so that its validity can be evaluated and ideally, it should involve curved interfaces as encountered in the experiment.

      To account for the effect of turbulence during the initial, inertia-dominated phase after the gate removal, we have provided a correction for the initial thickness of the interface (see the addition to the Methods section). Thank you for your suggestion regarding the incorporation of curved interfaces in the simulations. We believe that including curved interfaces in the simulations would not significantly affect the results. As shown in the manuscript, the interface is curved primarily during the initial phase of the process (first 2 min where the flow is inertia-dominated), which is currently not included in our data analysis (phase 1 begins 2 min after the gate removal).

      In that vein, distances from the interface rather than height above the interface should be reported for the fish.

      We acknowledge the reviewer’s suggestion to report distances from the interface rather than height above or below it. However, beyond the initial phase, we do not see a strong justification for using the orthogonal distance over the vertical distance, as the choice is inherently arbitrary (e.g., one could also measure the distance along the fish’s orientation vector). We have therefore kept our assessment based on the vertical distance.

      Absent measurements, the paragraph on associative learning should be struck from the discussion as it is purely speculative.

      We agree that the original paragraph on associative learning may have sounded overly speculative. However, after updating our manuscript with additional simulations of the thermal gradient's vertical extent, we found that fish perform upward turns not only above the thermal interface, but also before entering the thermal gradient itself. This observation makes us hesitant to attribute the response solely to thermotaxis. We believe it is essential to provide a plausible, albeit speculative, explanation for how fish initiate these turns before directly encountering the cold-water gradient. To support this, we have extended the discussion in this paragraph and added Supplementary Fig. 39. The new text now reads (additions in yellow): (L487 – 499): “Our findings show that fish were able to perform upward turns while still located above the thermal interface and that is, before actually sampling the cold water below the interface. In fact, our simulation of the vertical extent of the thermal gradient revealed that a substantial fraction of upward turns occurred before fish encountered the gradient itself, that is, prior to any sensory detection of the temperature change (Supplementary Fig. 39). This finding may be evidence of associative learning, whereby fish used information regarding the presence of colder water at depth obtained at prior times. While the current data do not provide conclusive evidence in this regard, they prompt the possibility that, rather than responding solely to immediate thermal cues, fish use spatial memory or associative learning to anticipate the location of colder water based on prior experience. Indeed, fish are able to perform associative learning based on non-visual cues[53], create mental maps of their surroundings[54] and retain memory for hours[55], days[56] and months[57,58].”

      The body-temperature simulations need to be detailed in the methods.

      Thanks for this comment. We have removed the supplementary text section and have included the paragraph “Body cooling during cold-water excursions” into the methods section of our manuscript (L804 - L829).

      Constant temperature experiments could be helpful in addressing the importance of a gradient/interface for triggering upward turning

      We agree, however, we were limited (for ethical reasons) to a maximum number of fish we could use in the experiments. Hence, we focused on getting approval to run experiments focused on the responses to thermal gradients. However, occupancy during the acclimation phase in 12 °C showed that fish were much more stationary and primarily occupied the lower half of the tank.

      A lot of ease of reading could be gained by labeling the conditions according to either the second temperature or perhaps even better the delta temperature (i.e. TR[-2C] instead of TR1).

      We agree that labeling conditions by the second temperature or delta temperature could in principle improve readability. However, since T_bottom and T_top are explicitly mentioned in each main figure at least once, they can be directly associated with the respective treatment. Therefore, we have opted to retain the current labeling for consistency.

      The figure legends are often short and do not accurately label all figure elements. This is especially true for supplemental figure legends which often appear rushed (e.g., the legend for Figure S2 stops mid-sentence, the legend of Figure S3 does not indicate what Ttop or Tbottom are).

      We appreciate the reviewer’s comment and have carefully revised all figure legends to ensure clarity and completeness. Specifically, we have corrected figure labels, expanded the descriptions for supplemental figures, and ensured that all elements are accurately defined. For instance, we have completed the legend for Figure S2 and clarified the definitions of T_top and T_bottom in Figure S3. Additionally, we have systematically reviewed all figure legends to prevent inconsistencies and omissions.

      For Figure S3, to improve clarity, plotting the standard deviation at different points in the tank across the phases could be more informative than the hard-to-distinguish multi-line plots in different shades of red.

      We appreciate the reviewer’s suggestion regarding Figure S3. However, the primary goal of this figure is to illustrate how the thermal interface moves over time. While plotting the standard deviation at different points in the tank could provide additional statistical insights, it would detract from the intended visualization of the interface dynamics. For this reason, we have opted to retain the current multi-line representation. Nevertheless, we have ensured that the figure is as clear as possible by refining the color contrast and improving the legend for better readability.

      There is an inconsistency in in-text citation styles (mixture of superscript and numbers in brackets).

      Thank you for pointing this out. We have carefully reviewed the manuscript and corrected any inconsistencies in the in-text citation style to ensure uniform formatting throughout.

      While the statement in the introduction, that increases in movement frequency could be purely metabolic in nature is correct, at least for larval zebrafish it has been shown that sensory neural activity is predictive of motor neuron activity and swim rates (Haesemeyer, 2018, cited by the authors).

      This is an interesting finding. It is however unclear to us why this information is crucial in our context of brown trout parr.

      Examples of summary results from Supplementary Figures 8-10 should be bundled in a main text figure since this appears to be important information supporting the conclusions.

      We agree that Supplementary Figures 8–10 contain important information (i.e. Boxplots) on vertical occupancy and the time individuals spent in different water temperatures. However, this information is already integrated into Figure 2C, D, F, and G, which display the vertical distributions of fish across treatments and over time. Given the current length of the manuscript, adding another main-text figure could dilute rather than enhance clarity. For this reason, we have opted to keep these details in the Supplementary Materials while ensuring they are appropriately referenced in the main text.

      The distributions of excursion length for all treatments should be graphed in a main figure to support the point made in the third paragraph of the "Trout parr... do not avoid warm water" section of the results.

      We appreciate the reviewer’s suggestion. However, we do not believe that plotting excursion length is necessary to support this statement, as the key finding is already well represented in the manuscript. Specifically, the transition to bimodal depth occupancy, with fish spending comparable time above and below the interface in warm-water treatments (TR6–TR9), is clearly conveyed in Figure 2F and Supplementary Figure 8B. Additionally, this information is explicitly stated in the results section (L235): "Fish did not avoid warmer water in any of the warm-water treatments (TR6–TR9). Instead, fish transitioned to a bimodal depth occupancy, with comparable time spent above and below the interface (Fig. 2F; Supplementary Fig. 8B)." Given this, we believe that adding an additional figure would not enhance clarity but may instead introduce redundancy.

      There should be a main figure panel that statistically compares the turn biases around the interface for the different conditions and the +/- 5cm interface line mentioned in the text should be visualized in the appropriate figures - incidentally, this length scale is on par with the diffusion seen in simulations further suggesting that fish in fact sense a gradient here rather than remembering an interface.

      To address the reviewer’s comment, we have made the following updates:

      • Extended and incorporated simulations of the thermal interface thickness (see Methods and Supplementary Fig. 29).

      • Plotted the vertical locations of up-turning events relative to the phase-averaged position of the thermal interface (see Supplementary Fig. 39), which includes the estimated 5 cm vertical extent of the thermal gradient.

      • Added the thermal interface thickness to the main figures (Fig. 3F,G and Fig. 2E,H) where applicable.

      While we do not claim that memory alone explains cold-water avoidance, our data still suggest that it may contribute to the observed behavior, particularly since a substantial number of upturns occurred before the fish entered the thermal gradient (see also Author response image 1 below). Our aim is not to statistically disentangle the relative contribution of thermotaxis versus associative learning, but to propose a plausible interpretation of this observed anticipatory behavior with due caution to clarify that this is only a possibility.

      Given that the thermal gradient is now visualized and characterized in detail, we respectfully suggest that an additional statistical comparison of turn biases would not add further clarity. We believe there is evidence that vertical turning, away from the cold, occurred within and above the thermal gradient. However, we welcome the reviewer’s perspective, and to demonstrate that turning points occur outside and above the thermal interface we have plotted them against gradient growth over time (see Author response image 1 below).

      Author response image 1.

      The colored area indicates the temporal growth of thermal interface thickness.

      Reviewer #3:

      In this study, the authors measured the behavioural responses of brown trout to the sudden availability of a choice between thermal environments. The data clearly show that these fish avoid colder temperatures than the acclimation condition, but generally have no preference between the acclimation condition or warmer water (though I think the speculation that the fish are slowly warming up is interesting). Further, the evidence is compelling that avoidance of cold water is a combination of thermotaxis and thermokinesis. This is a clever experimental approach and the results are novel, interesting, and have clear biological implications as the authors discuss. I also commend the team for an extremely robust, transparent, and clear explanation of the experimental design and analytical decisions. The supplemental material is very helpful for understanding many of the methodological nuances, though I admit that I found it overwhelming at times and wonder if it could be pruned slightly to increase readability. Overall, I think the conclusions are generally well-supported by the data, and I have no major concerns.

      Minor comments

      P2 intro paragraphs 1/3 - it is not clear that thermal preference generally reflects the thermal optimum, partly because it is not clear what trait is being optimized (fitness?). Some nuance here would be helpful, and would also link nicely to the discussion on p10.

      Thank you for this comment. We have now refined this section as follows (L67–71): "As most fish species are ectotherms, their body temperature fluctuates with the surrounding water temperature. Because temperature influences the rates of most physiological processes, rapid warming or cooling can affect fish performance traits, including metabolic rates, swimming ability, and thermal tolerance[6]."

      To further clarify how thermal preference relates to thermal optimum and what trait is being optimized, we have incorporated additional nuance in this section. Specifically, we now acknowledge that thermal preference may not always align with the thermal optimum for performance or fitness.

      P2 intro paragraph 2 - "adapt physiologically" implies evolution, but here you are referring to plasticity. Suggest saving the word "adapt/adaptation" for evolutionary changes (see also p9).

      Thank you for this comment. We have revised the wording to "acclimate physiologically" (L79) to more accurately reflect plastic responses rather than evolutionary adaptation.

      P7 - "This difference in probabilities (ρup - ρdown) was particularly large in the region immediately above and below the interface (-5 cm < D < 5 cm; Fig. 3F) and is a hallmark of a thermotactic behavior." I agree that the result provides compelling evidence for thermotaxis, but would it be possible to bolster this case by statistically testing for a difference in probabilities among the treatment groups here?

      In addition to Fig. 3F, we are presenting statistical evidence that for colder water temperatures, fish penetrate less deeply into the cold lower water. The decreasing trend was statistically significant (Mann–Kendall test: p < 0.001; Supplementary Table 6) and is presented in Fig. 4C. The depth reached during each cold-water excursion is determined by the location of the vertical turning point, which redirects the fish upward toward the surface. We think this is sufficient evidence for thermotaxis.
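
      For readers unfamiliar with the test named above, a minimal sketch of a Mann–Kendall trend test follows. It is an illustration only, not the analysis code used for the manuscript: it uses the normal approximation with a continuity correction and, as a simplifying assumption, omits the tie correction to the variance. The function name and the example depths are hypothetical.

```python
import math


def mann_kendall(values):
    """Return (S, Z, two-sided p) for a monotonic-trend test on an ordered sequence.

    Simplifying assumption: the variance formula has no correction for tied values.
    """
    n = len(values)
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = values[j] - values[i]
            s += (diff > 0) - (diff < 0)   # sign of each pairwise difference
    var_s = n * (n - 1) * (2 * n + 5) / 18.0
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)     # continuity correction
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided p from the standard normal
    return s, z, p


if __name__ == "__main__":
    # Made-up excursion depths (cm), ordered from warmest to coldest treatment.
    depths = [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
    s, z, p = mann_kendall(depths)
    print(f"S={s}, Z={z:.2f}, p={p:.2g}")
```

      A negative S with a small p indicates a decreasing trend along the ordering of the input, which is the kind of result the response describes for excursion depth across treatments.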

      P9 paragraph 3 = "recent studies suggest that fish may instead respond to temporal changes of their internal body temperature." It seems like a citation is missing here. Would be useful to briefly summarize the evidence for internal temperature sensing that is the basis of this modelling exercise.

      Thanks, we have added that citation (L385).

      P10 "Our findings provide the first experimental evidence for this mode of behavioral thermoregulation in which fish navigate their heterothermal environment to achieve gradual body warming."

      I think this statement overreaches given the presented data. While there may be a trend towards fish in the warm treatment spending increasing amounts of time in the upper half of the tank, I do not see this pattern supported statistically. There is also no evidence of gradual body warming, and even if there was I disagree that this would constitute experimental evidence that this was happening "intentionally". By this reasoning, any shuttlebox experiment in which fish actively shuttle between relatively warm and cool sides to end up with a preference that is above the starting condition would also constitute evidence for gradual warming. Overall, this is an interesting pattern, but I do not think there is sufficient evidence to conclude that fish are strategically warming.

      We appreciate the reviewer’s comment and acknowledge that our original wording may have overstated the evidence. We have revised the sentence to better reflect the evidence presented (L411-415): “Our observations resemble this mode of behavioral thermoregulation, in which fish progressively favor warmer regions within a heterothermal environment. However, additional experimental evidence is required to determine the mechanisms underlying this behavior.”

      P11 "Despite the avoidance response of cold water, fish engaged in repeated cold-water excursions..."

      This is an interesting speculation, but I think it would be helpful to also point out that these fish are biased towards the bottom of the tank (based on control measurements) and this pattern may therefore simply reflect a desire to be lower in the water column.

      Thank you for this helpful comment. We have now added this point to the revised text, which reads (L475-477): “Despite the avoidance response to cold water, fish engaged in repeated cold-water excursions, potentially reflecting a behavioral strategy to map the thermal environment. This pattern may also reflect an inherent tendency to occupy the lower part of the tank, as observed during homogeneous temperature of 12 °C during the acclimation phase.”

      P13 - why was the dye always added to the right side of the tank, instead of being assigned to a side randomly? I think the control experiment is good evidence that the dye did not substantially affect behaviour, but it seems like it would have been nice to separate dye and novel temperature exposure.

      We agree that randomizing the side of dye application would have been ideal. The dye was consistently added to the right side to maintain procedural consistency, ensuring that the “incoming” or “novel” temperature was always dyed. That said, our control experiment provides strong evidence that the dye itself did not influence behavior (as discussed above and in the manuscript).

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The major result in the manuscript is the observation of the higher order structures in a cryoET reconstruction that could be used for understanding the assembly of toroid structures. The crosslinking ability of ZapD dimers results in bending of FtsZ filaments to a constant curvature. Many such short filaments are stitched together to form a toroid-like structure. The geometry of assembly of filaments - whether they form straight bundles or toroid-like structures - depends on the relative concentrations of FtsZ and ZapD.

      Strengths:

      In addition to a clear picture of the FtsZ assembly into ring-like structures, the authors have carried out basic biochemistry and biophysical techniques to assay the GTPase activity, the kinetics of assembly, and the ZapD to FtsZ ratio.

      Weaknesses:

      The discussion does not provide an overall perspective that correlates the cryoET structural organisation of filaments with the biophysical data.

      The crosslinking nature of ZapD is already established in the field. The work carried out is important to understand the ring assembly of FtsZ. However, the availability of the cryoET observations can be further analysed in detail to derive many measurements that will help validate the model, and obtain new insights.

      We thank the reviewer for these insightful comments on our work. We have edited the manuscript to resolve and clarify most of the issues raised during the review process.

      Reviewer #2 (Public Review):

      Summary:

      In this paper, the authors set out to better understand the mechanism by which the FtsZ-associated protein ZapD crosslinks FtsZ filaments to assemble a large-scale cytoskeletal assembly. For this aim, they use purified proteins in solution and a combination of biochemical and biophysical experiments and cryo-EM. The most significant finding of this study is the observation of FtsZ toroids that form at equimolar concentrations of the two proteins.

      Strengths:

      Many experiments in this paper confirm previous knowledge about ZapD. For example, it shows that ZapD promotes the assembly of FtsZ polymers, that ZapD bundles FtsZ filaments, that ZapD forms dimers and that it reduces FtsZ's GTPase activity. The most novel discovery is the observation of different assemblies as a function of ZapD:FtsZ ratio. In addition, using CryoEM to describe the structure of toroids and bundles, the paper provides some information about the orientation of ZapD in relation to FtsZ filaments. For example, they found that the organization of ZapD in relation to FtsZ filaments is "intrinsic heterogeneous" and that FtsZ filaments were crosslinked by ZapD dimers pointing in all directions. The authors conclude that it is this plasticity that allows for the formation of toroids and their stabilization. Unfortunately, a high-resolution structure of the protein organization was not possible. These are interesting findings that in principle deserve publication.

      We thank the reviewer for this valuable assessment. We have made several changes to the manuscript to improve its readability and comprehensibility. In addition, we have addressed the reviewer’s main concerns in the point-by-point response below.

      Weaknesses:

      While the data is convincing, their interpretation has some substantial weaknesses that the authors should address for the final version of this paper.

      We have addressed most of the aspects highlighted by the reviewer to improve the quality and comprehensibility of our results.

      For example, as the authors are the first to describe FtsZ-ZapD toroids, a discussion of why this has not been observed in previous studies would be very interesting, i.e. is it due to buffer conditions or sample preparation?

      Several factors may explain the absence of observed toroidal structures in other studies. FtsZ is a highly dynamic protein, and its behavior varies significantly with different environmental conditions, as detailed in the literature. These environmental factors include pH, salt concentration, protein type, GTP levels, and the purification strategy used. Previous research has employed negative stain electron microscopy (EM) to visualize ZapD-FtsZ structures. It is important to note that FtsZ is sensitive to surface effects when it is bound to or adsorbed onto membranes (Mateos-Gil et al. 2019 FEMS Microbiol Rev - DOI: 10.1093/femsre/fuy039). Therefore, the adsorption of FtsZ and ZapD onto the EM grid may influence the formation of higher order structures. In this study, we used cryo-electron microscopy (cryo-EM) and cryo-electron tomography (cryo-ET) to visualize the 3D organization of ZapD-mediated structures. This approach allows us to avoid staining artifacts and the distortion of structures caused by adsorption or drying of the grid. In addition, we can resolve single filaments. Our buffer conditions also differ slightly from those in previous studies, which may significantly impact the behavior of FtsZ, as illustrated in Supplementary Fig. 3.

      In parts of the manuscript, the authors try a bit too hard to argue for the physiological significance of these toroids. This, however, is at least very questionable, because: The typical diameter is in the range of 0.25-1.0 μm, which requires some flexibility of the filaments to be able to accommodate this. It's difficult to see how a FtsZ-ZapD toroid, which appears to be quite rigid with a narrow size distribution of 502 nm ± 55 nm (which the authors say is similar to the E. coli cell), could support cell division rather than stalling it at that cell diameter.

      The toroidal structures formed by FtsZ and ZapD, with their characteristics similar to those of the bacterial division system, are significant in physiological contexts and warrant further study. The connections mediated by Zaps are expected to play a crucial role in filament organization, which is vital for the machinery enabling cellular constriction. Therefore, characterizing these structures in vitro can provide insight into divisome stabilization, assembly and constriction mechanisms. While we acknowledge the limitations of in vitro systems and do not expect to see the same toroidal structures in vivo, the way ZapD decorates and connects FtsZ filaments in vitro may resemble the processes that occur in the division ring formed inside the cell. This study represents an initial effort to characterize these toroidal structures, which could inspire further research and potentially reveal their physiological relevance.

      Regarding flexibility, it has been previously reported that an arrangement of loosely connected filaments forms the FtsZ ring. Our model is consistent with this observation despite the heterogeneity and density observed in the toroidal structures. We anticipate differences in vivo due to the high complexity of the cytoplasm, interactions with other cellular components, and attachment to the cell membrane, all of which would influence structural outcomes. However, our novel in vitro approach, which allows us to study FtsZ filament organization and connectivity – features that are challenging to explore in vivo and have not been thoroughly investigated before – has the potential to significantly advance our understanding of these structures. Consequently, these structures can aid our understanding of complex macrostructures in vivo, even if we have merely begun to scratch the surface of their characterization.

      Regarding the size of the toroids, we hypothesize that it reflects an optimal condition based on our experimental setup in solution. In vivo, these conditions are altered by interactions with various division partners, attachment to the plasma membrane, and system contraction. 
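
      To make the size discussion concrete, the following minimal Python sketch converts the reported toroid diameter (502 ± 55 nm) into a curvature, a relative spread, and an approximate number of FtsZ subunits per turn. The ~4.4 nm subunit repeat is an assumed literature value rather than a number taken from this response, and the function name `toroid_geometry` is hypothetical.

```python
import math

def toroid_geometry(diameter_nm, sd_nm, subunit_repeat_nm=4.4):
    """Derive simple geometric quantities from a mean toroid diameter.

    subunit_repeat_nm is an ASSUMED axial spacing of FtsZ subunits
    along a filament; it is not a value reported in this response.
    """
    radius_nm = diameter_nm / 2.0
    circumference_nm = math.pi * diameter_nm
    return {
        # curvature = 1 / radius, expressed per micrometre
        "curvature_per_um": 1000.0 / radius_nm,
        # standard deviation relative to the mean diameter
        "relative_spread": sd_nm / diameter_nm,
        "circumference_nm": circumference_nm,
        # subunits needed to span one full turn at the assumed repeat
        "subunits_per_turn": circumference_nm / subunit_repeat_nm,
    }

geom = toroid_geometry(502.0, 55.0)
for key, value in geom.items():
    print(f"{key}: {value:.2f}")
```

      Under these assumptions, the spread of the diameter is about 11% of the mean, which is the "narrow size distribution" the reviewer refers to.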

      We have reformulated and edited the manuscript to better discuss the potential physiological relevance of our toroidal structures.

      For cell division, FtsZ filaments are recruited to the membrane surface via an interaction of FtsA or ZipA with the C-terminal peptide of FtsZ. As ZapD also binds to this peptide, the question arises who wins this competition or where is ZapD when FtsZ is recruited to the membrane surface? Can such a toroidal structure of FtsZ filaments form on the membrane surface? Additional experiments would be helpful, but a more detailed discussion on how the authors think ZapD could act on membrane-bound filaments would be essential.

      We appreciate this comment, which was indeed one of our main questions. The complexity of the division system raises many questions about the interaction of FtsZ with the plasma membrane. The competition between division components to interact with FtsZ and thus modulate its behavior is still largely unknown. FtsA and ZipA appear to have a greater affinity for the C-terminal domain (CTD) of FtsZ than ZapD. However, considering all FtsZ monomers forming a filament, we expect FtsZ filaments to interact with many different division partners. The ability of FtsZ to interact with many components is necessary to explain the current model of the system. According to this model, FtsZ filaments would be decorated by many different proteins, anchoring them to the membrane while crosslinking or promoting their disassembly in a spatiotemporally controlled manner. 

      We tried experiments combining FtsA, ZipA, and ZapD on supported lipid membranes and liposomes, but they proved difficult to perform. We expect results similar to those observed for ZapA (Caldas et al. 2019 Nat Commun - DOI: 10.1038/s41467-019-13702-4). Competition between proteins for interaction with the CTD of FtsZ adds an extra layer of complexity, making this issue attractive to explore in the future. Nevertheless, as pointed out by Reviewer 3, our cryo-ET data of straight bundles provide new insights into how ZapD-FtsZ structures can bind to the plasma membrane. In these straight bundles, the CTDs of two parallel FtsZ filaments are oriented upwards. They can bind the plasma membrane directly, or they can bind the ZapDs, which decorate the FtsZ filaments from above instead of from the side, as suggested previously (Schumacher et al. 2017 J Biol Chem - DOI: 10.1074/jbc.M116.773192), allowing ZapDs to interact with the membrane.

      The authors conclude that the FtsZ filaments are dynamic, which is essential for cell division. But the evidence for dynamic FtsZ filaments within these toroids seems rather weak, as it is solely the partial reassembly after addition of GTP. As ZapD significantly slows down GTP hydrolysis, I am not sure it's obvious to make this conclusion.

      FtsZ filaments are dynamic, as they can reassemble into macrostructures relatively quickly. Decreased GTPase activity is a good indicator of the formation of lateral interactions between filaments. For instance, under crowding conditions, FtsZ also reduces its GTPase activity, although the bundles disassemble very slowly over time (González et al. 2003 J. Biol. Chem - DOI: 10.1074/jbc.M305230200). We measured the GTPase activity during the first 5 minutes after GTP addition, conditions under which toroidal structures and bundles remain fully assembled. However, we expect GTPase activity to recover as the macrostructures disassemble, considering the reassembly of macrostructures after GTP resupply, which suggests that FtsZ filaments remain active and dynamic.

      On a similar note, on page 5 the authors claim that ZapD would transiently interact with FtsZ filaments. What is the evidence for this? They also say that this transient interaction could have a "mechanistic role in the functionality of FtsZ macrostructures." Could they elaborate?

      We have rephrased the whole paragraph in the revised version to clarify matters (page 10, lines 24-34):

      “These results are consistent with the observation that ZapD interacts with FtsZ through its central hub, which provides additional spatial freedom to connect other filaments in different conformations. This flexibility allows different filament organizations and contributes to structural heterogeneity. In addition, these results suggest that these crosslinkers can act as modulators of the dynamics of the ring structure, spacing filaments apart and allowing them to slide in an organized manner. The ability of FtsZ to treadmill directionally, together with the parallel or antiparallel arrangement of short, transiently crosslinked filaments, is considered essential for the functionality of the Z ring and its ability to exert constrictive force (refs. 34, 36–38, 50). Thus, Zap proteins can play a critical role in ensuring correct filament placement and stabilization, which is consistent with the toroidal structure formed by ZapD.”

      The authors should also do better at putting their findings into the context of existing knowledge. For example:

      The authors observe a straightening of filament bundles with increasing ZapD concentration. This seems consistent with what was found for ZapA, but this is not explicitly discussed (Caldas et al 2019).

      We have discussed this similarity in the revised version of this manuscript (page 12, line 40 - page 13, line 8):

      “Understanding how the associative states of ZapA (as tetramers) and ZapD (as dimers), together with membrane tethering, influence the predominant structures formed in both systems is essential. The complexity of the division system raises important questions about the interaction dynamics between FtsZ and the plasma membrane. The competitive nature of the division components to engage with FtsZ and modulate its functionality remains to be thoroughly elucidated. It is important to note that FtsA and ZipA have a greater affinity for the C-terminal domain of FtsZ than ZapD. Our cryo-ET data on straight bundles provide new perspectives on how ZapD-FtsZ structures can effectively bind to the plasma membrane; in particular, the C-terminal domains of parallel FtsZ filaments are oriented upward, allowing direct membrane binding or interaction with ZapDs that reinforce these filaments from above, rather than from the side, as previously suggested.”

      A paragraph summarizing what is known about the properties of ZapD in vivo would be essential: i.e., what has been found regarding its intracellular copy number, location and dynamics?

      We thank the reviewer for this valuable suggestion. We describe the role of Zap proteins in vivo and the previous studies of ZapD in the introduction (page 2, line 34 - page 3, line 17). Additionally, we added the estimated number of ZapD copies in the cell in the discussion (page 11, lines 2-7).

      In the introduction, the authors write that "GTP binding and hydrolysis induce a conformational change in each monomer that modifies its binding potential, enabling them to follow a treadmilling behavior". This seems inaccurate: as shown by Wagstaff et al. 2022, the conformational change of FtsZ is not associated with the nucleotide state. In addition, they write that FtsZ polymerization depends on the GTPase activity. It would be more accurate to write that polymerization depends on GTP, and disassembly on GTPase activity.

      Following the reviewer's suggestions, we have adapted and corrected these text elements as follows (page 2, lines 7-9): 

      “FtsZ undergoes treadmilling due to polymerization-dependent GTP hydrolysis, allowing the ring to exhibit its dynamic behavior.”

      On page 2 they also write that "the mechanism underlying bundling of FtsZ filaments is unknown". I would disagree, the underlying mechanism is very well known (see for example Schumacher, MA JBC 2017), but how this relates to the large-scale organization of FtsZ filaments was not clear.

      We thank the reviewer for this comment. We have corrected and clarified the related text accordingly (page 3, lines 11-12):

      “…the link between FtsZ bundling, promoted by ZapD, and the large-scale organization of FtsZ filaments remains unresolved.”

      The authors describe the toroid as a dense 3D mesh; how would this be compatible with the Z-ring and its role for cell division? I don't think this corresponds to the current model of the Z-ring (McQuillen & Xiao, 2020). Apart from the fact it's a ring, I don't think the organization of FtsZ is obviously similar to the current model of the Z-ring in the bacterial cell, in particular because it's not obvious how FtsZ filaments can bind ZapD and membrane anchors simultaneously.

      We consider that the intrinsic characteristics of toroidal structures and the bacterial division ring have points in common. As indicated in the answer above, despite the differences and limitations that might result from an in vitro approach, the structures shown after ZapD crosslinking of FtsZ filaments can demonstrate intrinsic features occurring in vivo. The current model of the division ring consists of an arrangement of filaments loosely connected by crosslinkers in the center of the cell, forming a ring. This model is compatible with our findings, although many questions remain about the structural organization of the Z-ring in the cell.

      Reviewer 3 has brought a compelling new perspective to interpreting our cryo-ET data: ZapD decorates FtsZ from above, allowing ZapD or FtsZ to bind to the plasma membrane. We have discussed this point in more detail below. In the case of straight bundles, this favors the stacking of straight FtsZ filaments, whereas in the case of toroids, ZapD can also bind FtsZ filaments laterally and diagonally, and it is this less compact arrangement that could enable FtsZ bending and toroid size adjustment. 

      We have revised the text accordingly to incorporate the interpretation proposed by Reviewer 3 (page 12, lines 24-31):

      “The current model of the division ring consists of an array of filaments loosely connected by crosslinkers at the center of the cell, forming a ring. This model is consistent with our findings, although many questions remain regarding the structural organization of the Z ring within the cell. ZapD binds to FtsZ from above, allowing either ZapD or FtsZ to interact with the plasma membrane. In straight bundles, this facilitates the stacking of straight FtsZ filaments, while for toroids, ZapD can also bind FtsZ filaments diagonally. This less compact arrangement could allow bending of the FtsZ filaments and adjustment of toroid size.”

      The authors write that "most of these modulators" interact with FtsZ's CTP, but then later that ZapD is the only Zap protein that binds CTP. This seems to be inconsistent. Why not write that membrane anchors usually bind the CTP, most Zaps do not, but ZapD is the exception?

      We thank the reviewer for this pertinent suggestion, which we have followed in the revised version of the manuscript (page 2, lines 19-22):

      “Most of these modulators interact with FtsZ through its carboxy-terminal end, which modulates division assembly as a central hub.  ZapD is the only Zap protein known to crosslink FtsZ by binding its C-terminal domain, suggesting a critical Z ring structure stabilizing function.”

      I also have some comments regarding the experiments and their analysis:

      Regarding cryoET: the filaments appear like flat bands, even in the absence of ZapD, which further elongates these bands. Is this due to an anisotropic resolution? This distortion makes the conclusion that ZapD forms bi-spherical dimers unconvincing.

      The missing wedge caused by the limited angular range of the tomography data generates an elongation of the structures by a factor of 2 along the Z axis. This feature is visible in the undecorated FtsZ filament data (Supplementary Fig. 10). The more pronounced elongation along the Z-axis observed in the presence of ZapD indicates that ZapD connects two parallel FtsZ filaments along the Z-axis (see Supplementary Figs. 8, 9 and 10). We do not have sufficient resolution to precisely resolve ZapD proteins from the FtsZ filaments in the Z-axis, but we also observed bispherical ZapDs in the XY plane (Fig. 4b-d). Unfortunately, our data do not allow for a more detailed characterization.
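
      As a rough illustration of how the missing wedge translates into elongation along Z, the sketch below evaluates a commonly used estimate for single-axis tilt series (usually attributed to Radermacher), in which the elongation depends only on the maximum tilt angle. The tilt angles in the loop are assumptions chosen for illustration; the acquisition range is not stated in this response, and the function name is hypothetical.

```python
import math

def missing_wedge_elongation(max_tilt_deg):
    """Estimated elongation factor along Z for a single-axis tilt series.

    Uses e = sqrt((a + sin(a)cos(a)) / (a - sin(a)cos(a))),
    where a is the maximum tilt angle in radians.
    """
    a = math.radians(max_tilt_deg)
    sc = math.sin(a) * math.cos(a)
    return math.sqrt((a + sc) / (a - sc))

# ASSUMED tilt ranges, for illustration only
for tilt in (45, 50, 60, 70):
    print(f"+/-{tilt} deg -> elongation ~{missing_wedge_elongation(tilt):.2f}")
```

      Under this estimate, an elongation close to 2 arises for maximum tilt angles of roughly 45-50°, and the factor approaches 1 as the tilt range approaches ±90°; the actual acquisition scheme should be taken from the manuscript's methods.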

      The authors say that the cryoET visualization provides crucial information on the length of the filaments within this toroid. How long are they? Could the authors measure it?

      Measuring the length of single filaments is not trivial, given the dense, heterogeneous mesh promoted by ZapD crosslinking. We tried to identify and track them, but the density of filaments and connections made precise measurement very difficult. Nevertheless, we could identify the formation of these toroids by an arrangement of short filaments (Supplementary Fig. 11) instead of continuous circular filaments.

      We have removed the following sentence in the revised manuscript: “Visualization of ZapD-mediated FtsZ toroidal structures by cryo-ET provided crucial information on the 3D organization, connectivity and length of filaments within the toroid.”

      Regarding the dimerization mutant of ZapD: there is actually no direct confirmation that mZapD is monomeric. Did the authors try SEC MALS or AUC? Accordingly, the statement that dimerization is "essential" seems exaggerated (although likely true).

      Unlike the wild-type ZapD protein, the mZapD mutant exists as a mixture of monomers (~15%) and dimers, as AUC assays performed at similar protein concentrations revealed. These results demonstrate that the mutant protein has a lower tendency to form dimers than the native ZapD protein. We have included the AUC data for mZapD in the supplementary material (Supp. Fig. 15a).

      What do the authors mean that toroid formation is compatible with robust persistence length? I.e. What does robust mean? It was recently shown that FtsZ filaments are actually surprisingly flexible, which matches well the fact that the diameter of the Z-ring must continuously decrease during cell division (Dunajova et al Nature Physics 2023).

      We have corrected this sentence in the revised version of the manuscript to improve clarity (page 11, lines 9-10): 

      “The persistence length and curvature of FtsZ filaments are optimized for forming bacterial-sized ring structures.”
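
      The relation between persistence length and ring size can be illustrated with the standard worm-like chain expression for bending energy, E/kT = Lp·L / (2R²), for a filament of contour length L bent to radius R. In the sketch below, the persistence length of 1000 nm and the filament length of 100 nm are illustrative assumptions (reported values for FtsZ vary widely), and only the 251 nm radius follows from the 502 nm toroid diameter quoted above.

```python
def bending_energy_kT(persistence_length_nm, contour_length_nm, radius_nm):
    """Worm-like chain bending energy, in units of kT, for uniform curvature 1/R."""
    return persistence_length_nm * contour_length_nm / (2.0 * radius_nm ** 2)

radius_nm = 502.0 / 2.0          # from the reported mean toroid diameter
assumed_lp_nm = 1000.0           # ASSUMED persistence length
assumed_filament_nm = 100.0      # ASSUMED length of one short filament

energy = bending_energy_kT(assumed_lp_nm, assumed_filament_nm, radius_nm)
print(f"Bending energy per filament: {energy:.2f} kT")
```

      With these assumed numbers, bending a short filament to the toroid radius costs less than 1 kT, which is one way to read the statement that short filaments can readily adopt this curvature; a different persistence length scales the result linearly.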

      The authors claim that their observations suggest "that crosslinkers ... allows filament sliding in an organized fashion". As far as I know there is no evidence of filament sliding, as FtsZ monomers in living cells and in vitro are static.

      Filament sliding may be one of the factors contributing to the force generation mechanisms involved in cell division (Nguyen et al. 2021 J Bacteriol - DOI: 10.1128/JB.00576-20). Our results indicate that ZapD can separate filaments, creating space between them and facilitating their organization.

      Although the molecular dynamics of cell constriction are not yet fully understood, it is possible that filament sliding plays a role. If this is the case, the crosslinking of short FtsZ filaments in multiple directions by ZapD could provide the necessary flexibility to adjust the diameter of the constriction ring during bacterial division.

      What is the "proto-ring FtsA protein"?

      The proto-ring denotes the first molecular assembly of the Z-ring, which in E. coli consists of FtsZ, FtsA and ZipA (see, for example, Ortiz et al. 2016 FEMS Microbiol Rev - DOI: 10.1093/femsre/fuv040). To simplify matters, we have deleted the term “proto-ring” in the revised version of the MS.

      The authors refer to "increasing evidence" for "alternative network remodeling mechanisms that do not rely on chemical energy consumption as those in which entropic forces act through diffusible crosslinkers, similar to ZapD and FtsZ polymers." A reference should be given; I assume the authors refer to the study by Lansky et al 2015 of PRC on microtubules. However, I am not sure how the authors reached the conclusion that this applies to FtsZ and ZapD; on what evidence is this assumption based?

      We refer to cytoskeletal network remodeling mechanisms independent of chemical energy consumption (Braun et al. 2016 Bioessays - DOI: 10.1002/bies.201500183) driven by entropic forces induced by macromolecular crowding agents or diffusible crosslinkers. The latter mechanism leads to an increase in filament overlap length and the contraction of filament networks. These mechanisms complement and act in synergy with energy-consuming processes (such as those involving nucleotide hydrolysis) to modulate actin- and microtubule-based cytoskeleton remodeling. Similarly, crosslinking proteins such as ZapD may contribute to remodeling the FtsZ division ring in the cell. 

      We have revised the corresponding text of the manuscript accordingly (page 13, lines 16-24):

      “In addition, our findings could greatly enhance the understanding of how polymeric cytoskeletal networks are remodeled during essential cellular processes such as cell motility and morphogenesis. Although conventional wisdom points to molecular motors as the primary drivers of filament remodeling through energy consumption, there is increasing evidence that there are alternative mechanisms that do not rely on such energy, instead harnessing entropic forces via diffusible crosslinkers. This approach may also be applicable to ZapD and FtsZ polymers, suggesting a promising avenue for optimizing conditions in the reverse engineering of the division ring to enhance force generation in minimally reconstituted systems aimed at achieving autonomous cell division.”

      Some inconsistencies in supplementary figure 3: The normalized absorbances in panel a do not seem to agree with the absolute absorbance shown in panel e, i.e. compare maximum intensity for ZapD = 20 µM and 5 µM in both panels.

      We have corrected these inconsistencies in the revised version.

      It's not obvious to me why the structure formed by ZapD and FtsZ disassembles after some time even before GTP is exhausted, can the authors explain? As the structures disassemble, how is the "steady-state turbidity" defined? Do the structures also disassemble when they use a non-hydrolyzable analog of GTP?

      In the presence of ZapD, FtsZ rapidly forms higher order polymers after the addition of GTP, as shown by turbidity assays at 320 nm (the formation of single- or double-stranded FtsZ filaments in the absence of ZapD does not produce a significant increase in turbidity). Macrostructures formed by FtsZ in the presence of ZapD, while more stable than FtsZ filaments (which rapidly disassemble following GTP consumption), are also dynamic. These assembly reactions are GTP-dependent and considerably modify polymer dynamics. In agreement with our results, previous studies have shown that high concentrations of macromolecular crowders (such as Ficoll or dextran) promote the formation of dynamic FtsZ polymer networks (González et al. 2003 J. Biol. Chem - DOI: 10.1074/jbc.M305230200). In this case, FtsZ GTPase activity was significantly retarded compared with FtsZ filaments, resulting in a decrease in GTPase turnover. Similar mechanisms may apply to assembly reactions in the presence of ZapD.

      Parallel assembly studies replacing GTP with a slowly hydrolyzable GTP analog remain pending. We expect ZapD-containing FtsZ macrostructures to remain assembled for longer but still disassemble upon GTP consumption, as occurs with the crowding-induced FtsZ polymer networks formed in the presence of nucleotide analogs.

      Accordingly, we have revised the corresponding text to clarify matters (page 4, line 37 – page 5 line 7). 

      Conclusion: Despite some weaknesses in the interpretation of their findings, I think this paper will likely motivate other structural studies on large scale assemblies of FtsZ filaments and their associated proteins. A systematic comparison of the effects of ZapA, ZapC and ZapD and how their different modes of filament crosslinking can result in different filament networks will be very useful to understand their individual roles and possible synergistic behavior.

      We appreciate the reviewer's remarks and comments, which provided us with valuable information and helped us considerably improve the revised manuscript.

      Reviewer #3 (Public Review):

      Summary:

      The authors provide the first image analysis by cryoET of toroids assembled by FtsZ crosslinked by ZapD. Previously toroids of FtsZ alone have been imaged only in projection by negative stain EM. The authors attempt to distinguish ZapD crosslinks from the underlying FtsZ filaments. I did not find this distinction convincing, especially because it seems inconsistent with the 1:1 stoichiometry demonstrated by pelleting. I was intrigued by one image showing straight filament pairs, which may suggest a new model for how ZapD crosslinks FtsZ filaments.

      We thank the reviewer for these valuable comments, to which we have responded in detail below. 

      Strengths:

      (1) The first image analysis of FtsZ toroids by cryoET.

      (2) The images are accompanied by pelleting assays that convincingly establish a 1:1 stoichiometry of FtsZ:ZapD subunits.

      (3) Fig. 5 shows an image of a pair of FtsZ filaments crosslinked by ZapD. This seems to have higher resolution than the toroids. Importantly, it suggests a new model for the structure of FtsZ-ZapD that resolves previously unrecognized conflicts. (This is discussed below under weaknesses, because it is so far only supported by a single image.)

      We thank the reviewer for this assessment and, in particular, for raising point 3, which provided a new perspective on the interpretation of our data. We have also included a new example of a straight bundle in Supplementary Fig. 13.

      Weaknesses:

      This paper reports a study by cryoEM of polymers and bundles assembled from FtsZ plus ZapD. Although previous studies by other labs have focused on straight bundles of filaments, the present study found toroids mixed with these straight bundles, and they focused most of their study on the toroids. In the toroids they attempt to delineate FtsZ filaments and ZapD crosslinks. A major problem here is with the stoichiometry. Their pelleting assays convincingly established a stoichiometry of 1:1, while the mass densities identified as ZapD are sparse and apparently well below the number of FtsZ (FtsZ subunits are not resolved in the reconstructions, but the continuous sheets or belts seem to have a lot more mass than the identified crosslinks.)  

      Apart from the stoichiometry I don't find the identification of crosslinks to be convincing. It is missing an important control - cryoET of toroids assembled from pure FtsZ, without ZapD.

      However, if I ignore these and jump to Fig. 5, I think there is an important discovery that resolves controversies in the present study as well as previous ones, controversies that were not even recognized. The controversy is illustrated by the Schumacher 2017 model (their Fig. 7), which is repeated in a simplified version in Fig. 1a of the present ms. That model has two FtsZ filaments in a plane facing ZapD dimers which bridge them. In this planar model the C-terminal linker and the ctd of FtsZ that binds ZapD face each other, with the ZapD in the middle. The contradiction arises because the C-terminus needs to face the membrane in order to attach and generate a bending force. The two FtsZ filaments in the planar model are facing 90° away from the membrane. A related contradiction is that Houseman et al 2016 showed that curved FtsZ filaments have the C terminus on the outside of the curve. In a toroid the C termini should all be facing the outside. If the paired filaments had the C termini facing each other, they could not form a toroid because the two FtsZ filaments would be bending in opposite directions.

      Fig. 5 of the present ms seems to resolve this by showing that the two FtsZ filaments and ZapD are not planar, but stacked. The two FtsZ filaments have their C termini facing the same direction, let's say up, toward the membrane, and ZapD binds on top, bridging the two. The spacing of the ctd binding sites on the ZapD dimer is 6.5 nm, which would fit the ~8 nm width of the paired filament complex observed in the present cryoEM (Fig S13). In the Schumacher model the width would be about 20 nm. Importantly, the stack model has the ctd of each filament facing the same direction, so the paired filaments could attach to the membrane and bend together (using ctd's not bound by ZapD). Finally, the new arrangement would also provide an easy way for the complex to extend from a pair of filaments to a sheet of three or four or more. A problem with this new model from Fig. 5 is that it is supported by only a single example of the paired FtsZ-ZapD complex. If this is to be the basis of the interpretation, more examples should be shown. Maybe examples could be found with three or four FtsZ filaments in a sheet.

      We thank the reviewer for asking interesting questions and suggesting a compelling model for how ZapD could bind FtsZ filaments. Cryo-ET of straight bundles revealed that high ZapD density promotes vertical stacking of FtsZ filaments and decoration of FtsZ filaments by ZapD from above. In toroids, FtsZ filaments are vertically decorated by ZapD, which explains the high elongation of the filament structures observed, consisting of FtsZ-ZapD(-FtsZ) units. In addition, we observed a high abundance of diagonal connections between FtsZ filaments of different heights, revealing a certain flexibility/malleability of ZapD to link filaments that are not perfectly aligned vertically. This configuration could give rise to curved filaments and the overall toroid structure.

      The manuscript proposes that ZapD can bind FtsZ filaments in different directions. However, it seems to have a certain tendency to bind to the upper part of FtsZ filaments, stacking them vertically or vertically with a lateral shift (Supplementary Fig. 9). We also observe lateral connections, although the features of the toroidal structures limit their visualization. This enables both the binding to the membrane by ZapD or FtsZ and the formation of higher order FtsZ polymer structures. 

      In summary, ZapD is capable of linking FtsZ filaments in multiple directions, including from the upper part of the filaments as well as laterally or diagonally. At high concentrations of ZapD, the filaments become more compactly arranged, primarily stacking vertically, which results in the loss of curvature. In contrast, at lower concentrations of ZapD, the FtsZ filaments are less tightly packed, leading to curved filaments and an overall toroidal structure that may resemble the in vivo ring structures.

      We have edited our manuscript to accommodate this hypothesis, including the abstract and the cryoET section (page 7, lines 5-16): 

      “The isosurface confirmed the presence of extended structures along the Z-axis, well beyond the elongation expected from the missing wedge effect for single FtsZ filaments (for comparison, see Supplementary Fig. 10). The vertically extended structures appeared to correspond to filaments that were connected or decorated by additional densities along the Z-axis (Supplementary Fig. 9b). Importantly, these densities were only observed in the presence of ZapD (Supplementary Fig. 10b), suggesting that they represent ZapD connections (Fig. 3e and Supplementary Figs. 8e and 9b). We note that the resolution of the data is not sufficient to precisely resolve ZapD proteins from the FtsZ filaments in the Z-axis.

      These results suggest that the toroids are constructed and stabilized by interactions between ZapD and FtsZ, which are mainly formed along the Z-axis but also laterally and diagonally.”

      Page 7, lines 40-42: 

      “Cryo-ET imaging of ZapD-mediated FtsZ toroidal structures revealed a preferential vertical stacking and crosslinking of short ZapD filaments, which are also crosslinked laterally and diagonally, allowing for filament curvature.”

      And in the discussion (page 12, lines 27-31): 

      “ZapD binds to FtsZ from above, allowing either ZapD or FtsZ to interact with the plasma membrane. In straight bundles, this facilitates the stacking of straight FtsZ filaments, while for toroids, ZapD can also bind FtsZ filaments diagonally. This less compact arrangement could allow bending of the FtsZ filaments and adjustment of the toroid size.”

      What then should be done with the toroids? I am not convinced by the identification of ZapD as "connectors." I think it is likely that the ZapD is part of the belts that I discuss below, although the relative location of ZapD in the belts is not resolved. It is likely that the resolution in the toroid reconstructions of Fig. 4, S8,9 is less than that of the isolated pf pair in Fig. 5c.

      We agree with the reviewer's interpretation that ZapD can attach to FtsZ filaments from both above and laterally. The data from the straight bundles, which are more clearly resolved due to their thinner structure, demonstrate that ZapD can decorate FtsZ filaments vertically. Additionally, the toroidal data supports the notion that ZapD can act as a crosslinker between filaments that are not perfectly vertical, allowing for lateral offsets (see, for example, Fig. 4d) or lateral connections (Fig. 4b). 

      We recognize that the resolution and high density of structures in our cryo-ET data make it challenging to accurately annotate proteins or connectors. Despite this difficulty, we have made efforts to label and identify the ZapD proteins and connectors. We employed an arbitrary labeling method to assist with visual interpretation. However, we acknowledge that some errors may exist and that some ZapD proteins may not have been labeled, particularly along the Z-axis, where the missing wedge limits our ability to distinguish between ZapD and FtsZ proteins (page 7, lines 8-13):

      “The vertically extended structures appeared to correspond to filaments that were connected or decorated by additional densities along the Z-axis (Supplementary Fig. 9b). Importantly, these densities were only observed in the presence of ZapD (Supplementary Fig. 10b), suggesting that they represent ZapD connections (Fig. 3e and Supplementary Figs. 8e and 9b). We note that the resolution of the data is not sufficient to precisely resolve ZapD proteins from the FtsZ filaments in the Z-axis.”

      We draw attention to the limitation of our manual segmentation in the text as follows (page 7, lines 20-24):

      “We manually labeled the connecting densities in the toroid isosurfaces to analyze their arrangement and connectivity with the FtsZ filaments. The high density of the toroids and the wide variety of conformations of these densities prevented the use of subtomogram averaging to resolve their structure and spatial arrangement within the toroids.”

      Importantly, if the authors want to pursue the location of ZapD in toroids, I suggest they need to compare their ZapD-containing toroids with toroids lacking ZapD. Popp et al 2009 have determined a variety of solution conditions that favor the assembly of toroids by FtsZ with no added protein crosslinker. It would be very interesting to investigate the structure of these toroids by the present cryoEM methods, and compare them to the FtsZ-ZapD toroids. I suspect that the belts seen in the ZapD toroids will not be found in the pure FtsZ toroids, confirming that their structure is generated by ZapD.

      The only reported toroidal structure of E. coli FtsZ can be found in the literature by Popp et al. (2009 Biopolymers – DOI: 10.1002/bip.21136). It is important to note that methylcellulose (MC) must be added to the working solution to induce the formation of these structures, as FtsZ toroids do not form in the absence of MC. The mechanisms by which MC promotes this assembly process go beyond mere excluded volume effects due to crowding, as the concentration of MC used is very low (less than 1 mg/ml), which is below the typical crowding regime. This suggests that there are additional interactions between MC and FtsZ. Such complexities and secondary interactions prevent the use of this system as a reliable control for the FtsZ toroidal structures reported here. Alternatively, we also considered the toroidal structures of FtsZ from Bacillus subtilis (Huecas et al. 2017 Biophys J - DOI: 10.1016/j.bpj.2017.08.046) and Cyanobacterium synechocystis (Wang et al. 2019 J Biol Chem – DOI: 10.1074/jbc.RA118.005200). However, these structures do not serve as appropriate controls due to the structural and molecular differences between these FtsZ proteins.

      Recommendations for the authors:  

      Reviewing Editor:

      While the three referees recognize and appreciate the importance of this work, several technical and interpretational questions have been raised. There was a prolonged discussion amongst the three expert referees, and it was felt that the current version suffers from a number of problems that the authors need to consider. These concern (1) the stoichiometry of ZapD-FtsZ, (2) the evidence for crosslinks, (3) how the cryo-ET data correlate with the biophysical data, and (4) the physiological relevance of the elucidated structures. Please take note of the public reviews (strengths and weaknesses) as well as "Recommendations to the authors" sections below, if you choose to prepare a revision.

      In reading the reviews very carefully (as well as while following the ensuing robust discussion between the referees) I noticed that all points raised are extremely important to be addressed / reconciled (with experiments and / or discussion) for this study to become an outstanding contribution to the bacterial cell biology field. I would therefore urge you to consider these carefully and revise the manuscript accordingly.

      We thank the editorial board and reviewers for their excellent work evaluating and reviewing our manuscript. Their constructive suggestions and comments have been taken into account in preparing the revised version. We have paid particular attention to the four points mentioned above by the reviewing editor. We hope that the new version and this point-by-point rebuttal letter will answer most of the questions and weaknesses raised by the reviewers.

      Reviewer #1 (Recommendations for the authors):

      Suggestions for improvement of the manuscript:

      (1) ZapD to FtsZ ratio:

      i) Page 3: Results section, paragraph 1:

      FtsZ to ZapD shows a 1:2 ratio. How does this explain cross linking by a dimeric species, as this will be equivalent to a 1:1 ratio of FtsZ and ZapD? The crystal structure in the reference cited has FtsZ peptide bound only to one side of the dimer, however a crosslinking effect can happen only if FtsZ binds to both protomers of ZapD dimer. If the decoration is not uniform as given in the toroid model based on cryoET, this should lead to a model with excess of FtsZ in the toroid?

      On page 3 of the original manuscript, we stated that the binding stoichiometry of ZapD to FtsZ was 2:1, based on estimates derived from sedimentation velocity experiments involving the unassembled GDP form of FtsZ. However, upon reanalyzing these experiments, we found that the previous characterization of the association mode was overly simplistic. We determined that there are two predominant molecular species of ZapD:FtsZ complexes in solution, which correspond to ZapD dimers bound to either one or two FtsZ monomers, resulting in stoichiometries of 2:1 and 1:1, respectively. The revised binding stoichiometry data for ZapD and GDP-FtsZ suggests the presence of 1:1 ZapD-FtsZ complexes which aligns with the idea that FtsZ polymers can be crosslinked by dimeric ZapD species. In mixtures where ZapD is present in excess over FtsZ, the crosslinking corresponds to 1:1 binding stoichiometries, leading to the formation of straight macrostructures. Conversely, when the concentration of ZapD is reduced in the reaction mixture, the resulting macrostructures take the form of toroids. In this scenario, there is an excess of FtsZ because only some of the FtsZ molecules within the polymers are crosslinked by ZapD dimers, resulting in a binding stoichiometry of approximately 0.4 ZapD molecules per FtsZ, as quantified by differential sedimentation experiments.

      We have rewritten the corresponding texts in the revised version to explain these matters (page 4 lines 14-18):

      “Sedimentation velocity analysis of mixtures of the two proteins revealed the presence of two predominant molecular species of ZapD:FtsZ complexes in solution. These complexes are compatible with ZapD dimers bound to one or two FtsZ monomers, corresponding to ZapD:FtsZ stoichiometries of 2:1 and 1:1, respectively (Supplementary Fig. 1a (III-IV)). This observation is consistent with the proposed interaction model.”
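
      The stoichiometry bookkeeping in this response can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function names are hypothetical, and it assumes that each ZapD protomer binds at most one FtsZ C-terminal domain, which is the premise of the crosslinking model rather than a measured quantity.

```python
from fractions import Fraction

ZAPD_PER_DIMER = 2  # ZapD is treated as an obligate dimer, as in the response


def zapd_to_ftsz(ftsz_bound_per_dimer: int) -> Fraction:
    """ZapD:FtsZ ratio (in monomer units) for one ZapD dimer binding 1 or 2 FtsZ."""
    if ftsz_bound_per_dimer not in (1, 2):
        raise ValueError("a ZapD dimer is assumed to bind one or two FtsZ monomers")
    return Fraction(ZAPD_PER_DIMER, ftsz_bound_per_dimer)


def dimers_per_ftsz(zapd_per_ftsz: float) -> float:
    """Convert ZapD monomers per FtsZ into ZapD dimers per FtsZ."""
    return zapd_per_ftsz / ZAPD_PER_DIMER


def max_fraction_ftsz_bound(zapd_per_ftsz: float) -> float:
    """Upper bound on the FtsZ fraction carrying a ZapD protomer.

    Assumption: one protomer occupies at most one FtsZ C-terminal domain.
    """
    return min(1.0, zapd_per_ftsz)


if __name__ == "__main__":
    print("dimer + 1 FtsZ ->", zapd_to_ftsz(1), ": 1")  # the 2:1 species
    print("dimer + 2 FtsZ ->", zapd_to_ftsz(2), ": 1")  # the 1:1 species
    toroid = 0.4  # ZapD per FtsZ quoted for toroids in the response
    print("dimers per FtsZ in toroids:", dimers_per_ftsz(toroid))
    print("max fraction of FtsZ bound:", max_fraction_ftsz_bound(toroid))
```

      Under these assumptions, the toroid value of about 0.4 ZapD per FtsZ leaves at least 60% of the FtsZ monomers without a bound ZapD protomer, in line with the statement that only some FtsZ molecules in the polymers are crosslinked.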

      ii) How does 40 - 80 uM of ZapD correspond to a molar ratio of approximately 6?

      It was a typo from previous versions. We have corrected it in the revised version. 

      iii) The ratios of ZapD to FtsZ are different when described later in page 4 in the context of the toroid. Are these ratios relevant compared to the contradicting ratios mentioned later in page 4?

      To clarify issues related to the binding of ZapD to FtsZ, we have rewritten the sections on ZapD binding stoichiometries to both FtsZ-GDP and FtsZ polymers in the presence of GTP (see page 4 lines 14-18 and page 5 lines 15-26).

      iv) Supplementary Figure 5:

      In the representative gel shown, the amount of ZapD in the pellet does not appear to be double compared to 10 and 30 uM concentrations. However, the estimated amount in the plot shown in panel (c) appears to indicate that ZapD has approximately doubled at 30 uM compared to 10 uM. Please re-check the quantification.

      Without prior staining calibration of the gels, there is no simple quantitative relationship between gel band intensities after Coomassie staining and the amount of protein in a band (Darawshe et al. 1993 Anal Biochem - DOI: 10.1006/abio.1993.1581). The latter point precludes a quantitative comparison of pelleting / SDS-PAGE data and analytical sedimentation measurements.

      v) How can a consistent ratio being maintained be explained in an irregular structure of the toroid? The number of ZapD should be much less compared to FtsZ according to the model.

      See answers to points i) and iii)

      (2) GTPase activity and assembly/disassembly of toroids:

      i) Page 3, Results section: last paragraph:

      What is the explanation or hypothesis for decrease in GTPase activity upon ZapD binding? Given that FtsZ core is not involved in the interaction of the higher order assemblies, what is the probable reason on decrease in GTPase activity upon ZapA binding?

      Excluded volume effects caused by macromolecular crowding, such as high concentrations of Ficoll or dextran, promote the formation of dynamic FtsZ polymer networks (González et al. 2003 J. Biol. Chem - DOI: 10.1074/jbc.M305230200). In these conditions, FtsZ GTPase activity is significantly slowed down compared to the activity observed in FtsZ filaments formed without crowding, leading to a decreased GTPase turnover rate. Similar mechanisms may also apply to assembly reactions in the presence of ZapD (see, for example, Durand-Heredia et al. 2012 J Bacteriol - DOI: 10.1128/JB.00176-12).

      ii) How is the decrease in GTPase activity compatible with dynamics of disassembly? Please substantiate on why disassembly is linked to transient interaction with ZapD. Shouldn't disassembly and transient interaction be linked to recovery of GTPase activity rates? 

      iii) Does the decrease in GTPase activity imply a reduced turnover of disassembly of FtsZ to monomers? Hence, how is the reduction in turbidity related to the decrease in GTPase activity? How does the GTPase activity change with time?

      iv) How can the decrease in GTPase activity with increasing ZapD be explained?

      We conducted GTPase activity assays within the first two minutes following GTP addition, a timeframe that promotes bundle formation. Previous studies, such as those by Durand-Heredia et al. (2012 J Bacteriol - DOI: 10.1128/JB.00176-12), have also indicated a reduction in GTPase activity during the initial moments of bundling. The reviewer’s suggestion that GTPase activity should recover after the disassembly of toroids is valid and warrants further investigation. To test this hypothesis, measuring GTPase activity over extended periods would be necessary. When comparing FtsZ filaments observed in vitro, we found that ZapD-containing FtsZ bundles exhibit decreased GTPase activity. Although we did not measure it directly, we anticipate a reduction in the rate of GTP exchange within the polymer, similar to the behavior of FtsZ bundles formed in the presence of crowders (González et al. 2003 J. Biol. Chem - DOI: 10.1074/jbc.M305230200), which also display a delay in GTPase activity. High levels of ZapD enhance bundling, which may explain the decrease in GTPase activity as ZapD levels increase.

      (3) Treadmilling and FtsZ filament organisation:

      If the FtsZ filaments are cross linked antiparallel, how can treadmilling behaviour be explained? Doesn't treadmilling imply a directionality of filament orientations in the FtsZ bundles?

      Our model can only suggest filament alignment, which is compatible with both parallel and antiparallel filament organization.

      The correlation between observed effects on GTPase activity, treadmilling and ZapD interaction will provide an interesting insight to the model.

      Establishing a detailed correlation among these three factors could yield valuable insights into the mechanisms and potential physiological implications of the structural organization of FtsZ polymers influenced by crosslinking proteins and ZapD. To precisely characterize these interactions, further time-resolved assays in solution and reconstituted systems would be necessary, which is beyond the scope of this study.

      (4) Toroid dimensions and intrinsic curvature:

      i) Page 4: What is the correlation between the toroid dimensions and the intrinsic curvature of the FtsZ filaments? Given the thickness of ~ 127 nm, please provide an explanation of how the intrinsic curvature of FtsZ is compatible with both the inner and outer diameters of 500 nm and 380 nm.

      We added a paragraph for clarification (page 6, lines 20-24):

      “Previous studies have shown different FtsZ structures at different concentrations and buffer conditions. FtsZ filaments are flexible and can generate different curvatures ranging from mini rings of ~24 nm to intermediate circular filaments of ~300 nm or toroids of ~500 nm in diameter (reviewed in Erickson and Osawa 2017 Subcell Biochem - DOI: 10.1007/978-3-319-53047-5_5, and Wang et al. 2019 J Biol Chem - DOI: 10.1074/jbc.RA119.009621). It is reasonable to assume that FtsZ filaments can accommodate the toroid shape promoted by ZapD crosslinking.”

      ii) For the curvature of FtsZ filaments to be similar, the length of the filaments in the inner circles of the toroid have to be smaller than those in the outer circles? Is this true? Or are the FtsZ filaments of uniform length throughout?

      Due to the limitations in the resolution of the toroidal structure, we could not accurately measure the length or curvature of the filaments. Considering the FtsZ flexibility, these filaments may exhibit various curvatures and lengths, as previously mentioned.

      iii) Is the ZapD density uniform throughout the inner and outer regions of the toroid?

      The heterogeneity found in the structures suggests a difference in ZapD binding densities; however, we lack quantitative data to confirm this. The outer regions are likely more exposed to the attachment of free ZapDs in the surrounding environment, which leads to the recruitment of more ZapDs and the formation of straight bundles. Supplementary Fig. 7b (right) features a zoomed-in image of a toroid adorned with globular densities in the outer areas, which may correspond to ZapD oligomers. Similar characteristics appear in the straight filaments illustrated in the panels of this figure. However, these features are absent or present in significantly lower quantities in toroids with a 1:1 ratio and toroids formed under a 1:6 ratio, suggesting that the external decoration is due to ZapD saturation. Unfortunately, we cannot provide further details on the characteristics of these protein associations.

      (5) Regular arrangement and toroid structure:

      i) Page 4: last section, first sentence: What is meant by 'regular' arrangement here? The word regular will imply a periodicity, which is not a feature of the bundles.

      We have rephrased the sentence in the revised manuscript as follows (page 5, lines 35-36): “Previous studies have visualized bundles with similar features using negative-stain transmission electron microscopy.”

      ii) Similarly, page 6 first sentence mentions about a conserved toroid structure. Which aspects of the toroid structure are conserved and what are the other toroids that are compared with?

      We noted several features that are conserved in the ZapD-mediated toroidal structures, including their diameter, thickness, height, and roundness, as shown in Fig. 2d-e and Supplementary Fig. 6b-c. However, the internal organization of the toroid does not exhibit a periodic or regular structure. We have rephrased this to say: “…resulting in a toroidal structure observed for the first time following the interaction between FtsZ and one of its natural partners in vitro.” (page 7, lines 42-43).

      iii) Discussion, para 1, last sentence: How is the toroid structure correlated with the bacterial cell FtsZ ring? What do the authors mean by 'structural compatibility' with the ring?

      The toroidal structures described in this work are consistent with the intermediate curved conformation of FtsZ polymers observed more generally across bacterial species and are likely to be part of the FtsZ structure responsible for constriction-force generation (Erickson and Osawa 2017 Subcell Biochem - DOI: 10.1007/978-3-319-53047-5_5). In the case of E. coli, if we assume an average of around 5000 FtsZ monomers in the polymeric form (two-thirds of the total found in dividing cells), this number of FtsZ molecules would be enough to encircle the cell around 6-8 times (considering the axial spacing between FtsZ monomers and the cell perimeter), which would be compatible with the structure adopting the form of a discontinuous toroidal assembly. 

      The term “structural compatibility” could be confusing, so we have removed it from the revised text. 
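
      The encirclement estimate above (about 5000 polymerized FtsZ monomers wrapping the cell 6-8 times) can be reproduced as follows. This is a rough sketch under stated assumptions: the response does not give the axial spacing or the cell diameter it used, so the ~4.3 nm subunit spacing and the 0.9-1.1 µm diameters below are illustrative values, and all monomers are assumed to lie end to end in a single protofilament.

```python
import math

N_POLYMERIZED = 5000       # FtsZ monomers in polymer form (from the response)
AXIAL_SPACING_NM = 4.3     # assumed subunit repeat along a protofilament


def ring_turns(n_monomers: int, spacing_nm: float, cell_diameter_nm: float) -> float:
    """Number of times an end-to-end protofilament would encircle the cell."""
    filament_length_nm = n_monomers * spacing_nm
    perimeter_nm = math.pi * cell_diameter_nm
    return filament_length_nm / perimeter_nm


if __name__ == "__main__":
    # Assumed range of E. coli cell diameters at the division site, in nm.
    for diameter_nm in (900.0, 1000.0, 1100.0):
        turns = ring_turns(N_POLYMERIZED, AXIAL_SPACING_NM, diameter_nm)
        print(f"diameter {diameter_nm:.0f} nm -> {turns:.1f} turns")
```

      With these inputs the result falls between roughly 6 and 8 turns; a different spacing or diameter shifts the number proportionally.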

      iv) Discussion, para 2:

      Resemblance with the division ring in bacterial cells is mentioned in paragraph 2, however the features that are compared to claim resemblance comes later in the discussion. It will be helpful to rearrange the sections so that these are presented together.

      We have reorganized the sections following the reviewer’s suggestion.

      (6) CryoET of toroid and interpretation of the tomogram:

      i) Supplementary figure 10: It is not convincing that the indicated densities correspond to ZapD. Is the resolution and the quality of the tomogram sufficient to comment on the localisation of ZapD? It is challenging to see any interpretable difference between FtsZ filament dimers in 10a vs FtsZ+ZapD in panel (b).

      We acknowledge that localizing ZapDs in the structure is a challenge due to the limited resolution of the cryo-ET data (page 7, lines 11-13, 21-24). We have manually labeled putative ZapDs in the data and have done our best to identify the structures reasonably while recognizing the limitations of the segmentation. We use different colors to guide the eye without clearly stating what is or is not a ZapD. However, filaments found in 1:1 and 1:6 ratio toroids have a clear difference in thickness to those observed in the absence of ZapD. The filaments in 1:0 ratio toroids provide a reasonable control for elongation due to the missing wedge and allow us to attribute the extra filament thickness to ZapD densities confidently (page 7, lines 5-12).

      ii) How is it quantified that the elongation in Z is beyond the missing wedge effect? Please include the explanation for this in the methods or the relevant data as Supplementary figure panels.

      The missing wedge effect causes an elongation by a factor of 2 along the Z-axis. This elongation is evident in the filaments of the 1:0 ratio toroids. Consequently, the elongation in the filaments of the 1:1 and 1:6 ratio toroids exceeds that observed due to the missing wedge effect. We have also added this information to the methods section (page 17, lines 31-33).

      iii) Segmentation analysis of the tomogram and many method details of analysis and interpretation of the tomography data has not been described. This is essential to understand the reliability of the interpretation of the tomography data.

      We provided thresholds for volume extraction as isosurfaces and clarified how the putative ZapDs are colored in the revised methods section (page 17, lines 24-30). However, we could not perform quantitative analysis of the segmented structures.

      (7) Quantification of structural features of the toroid:

      i) Page 5 last sentence mentions that it provides crucial information on the connectivity and length of the filaments. Is it possible to show a quantification of these features in the toroid models?

      Based on our data, we hypothesize that ZapD crosslinks filaments by creating a network of short filaments rather than long ones. These short filaments assemble to form a complete ring. However, the current resolution of the data precludes precise quantification of this process.

      In the revised version, we have changed this last sentence to put the emphasis on the crosslinking geometry instead (page 7, lines 40-43):

      “Cryo-ET imaging of ZapD-mediated FtsZ toroidal structures revealed a preferential vertical stacking and crosslinking of short ZapD filaments, which are also crosslinked laterally and diagonally, allowing for filament curvature and resulting in a toroidal structure observed for the first time following the interaction between FtsZ and one of its natural partners in vitro.”

      ii) In toroids with increasing concentrations, will it be possible to quantify the number of blobs which have been interpreted as ZapD? Is this consistent with the data of FtsZ to ZapD ratios?

      These quantifications would assist in interpreting the data. However, due to the limited resolution of the data, we are reluctant to provide estimates.

      iii) What is the average length of the filaments in the toroid? Can this be quantified from the tomography data? Similarly, can there be an estimation of curvature of the filaments from the data?

      Unfortunately, the complexity of the toroidal structure and the limited resolution we achieved prevent us from providing accurate quantification. We attempted to track and measure the length of the filaments, but this proved challenging due to the high concentration of connections. Regarding curvature, the arrangement of the filaments into toroids makes it difficult to measure the curvature of each filament. Additionally, the filaments are not perfectly aligned, which suggests that there may be various curvatures present.

      iv) What is the average distance between the FtsZ filaments in the toroid? Does this correlate with the ZapD dimensions, when a model has been interpreted as ZapD?

      We measured the spacing (not the center-to-center distance) between filaments in the toroids and showed this in Supplementary Fig. 14b (sky blue). We observed that the distances are very similar to those found for straight bundles (light blue), with a slightly greater variability. We should point out here that the distances were measured in the XY plane to simplify the measurements.

      v) What is the estimate of average inter-filament distances within the toroid? (Similar data as in Figure 13 for bundles?) When the distance between filaments is less, is the angle between ZapD and FtsZ filament axis different from 90 degrees? This might help in validation of interpretation of some of the blobs as ZapD.

      The distances between the filaments presented in Supplementary Figure 14b include those for toroids (1:1 ratio, represented in sky blue) and straight bundles (1:6 ratio, shown in light blue). We focused solely on the distance between filaments in the XY plane and did not differentiate based on the connection angle. Although the distance may vary with changes in the angles between filaments, our data does not permit us to make any quantitative measurements regarding these variations.

      vi) How does the inter filament distance in the toroids compare with the dimensions of ZapD dimers, in the toroids and bundles? Is there a role played by the FtsZ linker in deciding the spacing?

      The dimension of a ZapD dimer is ~7 nm along the longest axis. Huecas et al. (2017 Biophys J - DOI: 10.1016/j.bpj.2017.08.046) estimated an interfilament distance of ~6.5-6.7 nm for toroids of FtsZ from Bacillus subtilis. These authors also observed a difference in this spacing as a function of the linker, assuming that linker length would modulate FtsZ-FtsZ interactions. We observe a similar spacing for double filaments (5.9 ± 0.8 nm) and a longer spacing in the presence of ZapD (7.88 ± 2.1 nm). Previous studies with ZapD did not measure the distance between filaments but hypothesized that distances of 6-12 nm are allowed based on the structure of the protein (Schumacher M. 2017 J Biol Chem - DOI: 10.1074/jbc.M116.773192). Longer linkers may also provide additional freedom to spread the filaments further apart and facilitate a higher degree of variability in the connections by ZapD. This discussion has been included in the revised text (page 6, lines 10-18).
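
      As a quick consistency check, the spacings quoted in this response can be compared with the 6-12 nm window hypothesized from the ZapD structure. The sketch below uses only the numbers given above; treating each "±" value as one standard deviation is an assumption, since the response does not say how the spread was computed, and the helper names are hypothetical.

```python
# Spacings in nm as (mean, spread); values are quoted from the response.
DOUBLE_FILAMENT = (5.9, 0.8)    # FtsZ double filaments without ZapD
WITH_ZAPD = (7.88, 2.1)         # filaments in the presence of ZapD
ALLOWED_WINDOW = (6.0, 12.0)    # window hypothesized from the ZapD structure
ZAPD_DIMER_LENGTH_NM = 7.0      # approximate longest axis of a ZapD dimer


def interval(mean: float, spread: float) -> tuple:
    """Return (mean - spread, mean + spread)."""
    return (mean - spread, mean + spread)


def within(value: float, window: tuple) -> bool:
    """True if value lies inside the closed window."""
    return window[0] <= value <= window[1]


def overlaps(a: tuple, b: tuple) -> bool:
    """True if two closed intervals share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


if __name__ == "__main__":
    for label, (mean, spread) in (("double filaments", DOUBLE_FILAMENT),
                                  ("with ZapD", WITH_ZAPD)):
        print(label,
              "| mean in window:", within(mean, ALLOWED_WINDOW),
              "| mean +/- spread overlaps window:",
              overlaps(interval(mean, spread), ALLOWED_WINDOW))
    print("extra spacing with ZapD (nm):",
          round(WITH_ZAPD[0] - DOUBLE_FILAMENT[0], 2))
```

      On these numbers, the mean spacing with ZapD sits inside the hypothesized window and slightly above the ~7 nm dimer length, whereas the mean for double filaments without ZapD falls just below it.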

      (8) Crosslinking by ZapD and toroid reorganisation by transient interactions:

      i) Page 5, paragraph 2: 'Presence of putative ZapD decorating a single FtsZ': When ZapD is interacting with 2 FtsZ monomers within the same protofilament, it does not have any more valency to crosslink filaments. How do the authors propose that this can connect nearby filaments?

      We thank the reviewer for raising this interesting question. We see examples of ZapD dimers binding a filament through only one of the monomers, occupying one valency of the interaction and leaving one of the monomers available for another binding. We expect to see higher densities of ZapD in the outer regions of toroids simply because there are no longer (or not as frequent) FtsZ filaments available to be attached and join the overall toroid structure. Assuming that a ZapD dimer could bind the same FtsZ filament, this region would not be able to connect to other nearby filaments via these interactions.

      ii) Page 5: How are the authors coming up with the proposal of a reorganisation of toroid structures to a bundle? Given the extensive cross linking, a transition from a toroid to a bundle has to be a cooperative process and may not be driven by transient interactions. I would imagine that the higher concentration of ZapD will directly result in straight bundles because of the increased binding events of a dimer to one filament.

      Theoretically, this is correct. A certain degree of cooperativity linked to multivalent interactions would also favor the establishment of other ZapD connections. Furthermore, the formation of these structures occurs relatively quickly, within the first two minutes following the addition of GTP. We observed various intermediate structures, ranging from sparse filament bundles to toroids and straight filaments. However, the limited data prevents us from proposing a model that eventually explains the formation of higher-order structures over time.

      iii) Given such a highly cross-linked mesh, how can you justify transient interactions and loss of ZapD leading to disassembly? The possibility that ZapD can diffuse out of such a network seems impossible. Hence, what is the significance of a transient interaction? What is the basis of calling the interactions transient?

      We have noted that the term “transient” used to define the interaction between ZapD and FtsZ seems to generate confusion. Therefore, we have decided to replace this term to improve the readability of our manuscript, which has been edited accordingly.

      iv) Does the spacing between ZapD connections decide the curvature of the toroid?

      The FtsZ linker connected to ZapD molecules could modulate filament spacing and curvature, as previously suggested (Huecas et al. 2017 Biophys J - DOI: 10.1016/j.bpj.2017.08.046; Sundararajan and Goley 2017 J Biol Chem - DOI: 10.1074/jbc.M117.809939, and Sundararajan et al. 2018 Mol Microbiol - DOI: 10.1111/mmi.14081). In our structures, we observe a mixture of curvatures in the internal organization of the toroid. Despite the flexibility of FtsZ, filaments have a preferred curvature that is initially determined by FtsZ itself. However, the number of ZapD connections will eventually force the filament structure to adapt and align with neighboring filaments, facilitating connections with more ZapDs. Thus, it is the binding density of ZapD molecules, rather than the ZapD connections themselves, that significantly impacts FtsZ curvature. However, the molecular mechanism describing the link between ZapD binding and polymer curvature remains unsolved.

      v) What is the difference in conditions between supplementary figure 6 and 12? Why is it that toroids are not observed in 12, for the same ratios?

      Both figures show images of samples under the same conditions. At high ZapD concentrations in the sample, we observe a mixture of structures, including single filaments, bundles, toroids, and straight bundles. In Supplementary Fig. 6, we have selected images of toroids, while in Supplementary Fig. 12, we have focused on single and double filaments. We aim to compare similar structures at different ZapD concentrations.

      (9) Correlation with in vivo observations:

      What is the approximate ratio of ZapD to FtsZ concentrations in the cell? In this context, within a cell which one - a toroid or bundle - will be preferred?

      Previous studies have estimated that E. coli cells contain approximately 5,000 to 15,000 FtsZ protein molecules, resulting in a concentration of around 3 to 10 µM (Rueda et al. 2003 J Bacteriol - DOI: 10.1128/JB.185.11.3344-3351.2003). Furthermore, only about two-thirds of these FtsZ molecules participate in forming the division ring (Stricker et al. 2002 PNAS - DOI: 10.1073/pnas.052595099). In contrast, ZapD is a low-abundance protein, with only around 500 molecules per cell (Durand-Heredia et al. 2012 J Bacteriol - DOI: 10.1128/JB.00176-12), making it a relatively small fraction compared to the FtsZ molecules. Under these circumstances, toroidal structures are more likely to form than straight bundles, as the latter would require significantly higher concentrations of ZapD for proper assembly. We have added these considerations in the revised text (page 11, lines 1-7).
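
      The conversion from per-cell copy number to concentration behind these figures can be sketched as follows. This is only an illustrative calculation: the cell volume of 2.5 fL is an assumption chosen here (it is not stated in the response or taken from the cited studies), and the function name is hypothetical.

```python
AVOGADRO = 6.02214076e23  # molecules per mole


def copies_to_micromolar(copies: float, cell_volume_fl: float) -> float:
    """Convert a per-cell copy number to a micromolar concentration."""
    volume_l = cell_volume_fl * 1e-15   # femtolitres to litres
    moles = copies / AVOGADRO           # molecules to moles
    return moles / volume_l * 1e6       # mol/L to micromolar


# Assumed cell volume (illustrative only, not from the cited studies).
ASSUMED_VOLUME_FL = 2.5

for label, copies in [("FtsZ, low estimate", 5_000),
                      ("FtsZ, high estimate", 15_000),
                      ("ZapD", 500)]:
    conc = copies_to_micromolar(copies, ASSUMED_VOLUME_FL)
    print(f"{label}: {conc:.2f} µM")
```

      Under this assumed volume, the FtsZ estimates fall in the quoted 3 to 10 µM range, and 500 copies of ZapD correspond to roughly 0.3 µM, that is, between one-thirtieth and one-tenth of the FtsZ concentration.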

      (10) Interpretation of mZapD results:

      i) What is the experimental proof for weakened stability of the dimer? Rather than weakened stability, does this form a population of only monomeric ZapD or a proportion of non-functional or unfolded dimer? This requires to be shown by AUC or SEC to substantiate the claim of a weakened interface.

      We have provided new AUC results indicating that mZapD is partially monomeric, which suggests a weakened dimerization interface (page 9, lines 15-16 and Supp. Fig. 15a). The assays revealed no signs of protein aggregation.

      ii) How does a weaker dimer result in thinner bundles and not toroids? A weaker dimer would imply that the number of ZapD linked to FtsZ will be less than the wild type, leading to less cross linking, which should lead to toroid formation rather than thinner bundles.

      This observation provides the most plausible explanation. However, we did not detect any toroidal structures, even at high concentrations of mZapD. This finding indicates that a stronger dimerization interface, rather than merely the number of ZapD-FtsZ connections, is essential for promoting the formation of toroidal structures. mZapD presumably has a reduced affinity for FtsZ, which, along with a weaker binding interface, may explain mZapD's inability to facilitate toroid formation.

      iii) This observation would imply that the geometry of the dimeric interaction plays a role in the bending of the FtsZ filaments into toroids? Please comment.

      Our data suggest that the binding density of ZapD to FtsZ polymers is a crucial factor governing the transition from toroidal structures to straight bundles. Toroids form when the polymers have excess free FtsZ (that ZapD does not crosslink). Additional factors, such as the orientation of the interactions, the length of the flexible linker, and the strength of the ZapD dimerization interface, are likely to contribute to these structural reorganizations. However, our current data do not allow for further analysis, and future experiments will be necessary to address these questions.

      (11) Curvature and plasticity of toroid:

      i) What are the factors that stabilise curved protofilaments/toroid structures in the absence of a cross linker, based on earlier studies from B. subtilis. A comparison will be insightful. ii) What is the effect of the linker length between FtsZ globular domain and CTP in the toroid spacing?

      Huecas et al. 2017 (Biophys J - DOI: 10.1016/j.bpj.2017.08.046) concluded that the disordered CTL of FtsZ serves as a spacer that modulates the self-organization of FtsZ polymers. They proposed that this intrinsically disordered CTL, which spans the gap between protofilament cores, provides approximately 70 Å of lateral spacing between the curved Bacillus subtilis FtsZ (BsFtsZ), forming toroidal structures. In contrast, the parallel filaments of tailless BsFtsZ mutants, which have a reduced spacing of 50 Å, will likely stick together, resulting in the straight bundles observed. In the full-length BsFtsZ filament, the flexibility allowed by the lateral association favors the coalescence of these curved protofilaments, leading to the formation of toroidal structures. 

      The role of the C-terminal tail of FtsZ in E. coli is critical for its functionality (Buske and Levin 2012 J Biol Chem - DOI: 10.1074/jbc.M111.330324). However, its structural involvement in complex formations remains unclear. Research indicates that any disordered peptide between 43 and 95 amino acids in length can function as a viable linker, while peptides that are significantly shorter or longer impede cell division (Gardner et al. 2013 Mol Microbiol - DOI: 10.1111/mmi.12279). Studies in E. coli and B. subtilis suggest that intrinsically disordered CTLs play a role in determining FtsZ assembly and function in vivo, and this role is dependent on the length, flexibility, and disorder of the tails. These aspects still require further exploration.

      iii) How is it concluded that the concentration of ZapD is modulating the behaviour of the toroid structure? ZapD as a molecule does not have much room for conformational flexibility beyond a few angstroms, in the absence of long flexible regions. Rather, shouldn't the linker length of FtsZ to the CTP decide the plasticity of the toroid?

      The length and flexibility of the linker can significantly influence structural interactions. As previously mentioned, a longer linker will likely enhance the range of interaction distances and orientations. However, the specific interaction between ZapD and FtsZ is stronger than the non-specific electrostatic FtsZ-FtsZ interactions, and this is not solely due to the flexibility of the linker. Instead, the ZapD-FtsZ interaction can modulate the formation of either a toroidal structure or straight bundles.

      iv) "a minor free energy perturbation to bring about significant changes in the geometry of the fibers due to modifications in environmental conditions" - this sentence is not clear to me. How did the data described in the paper relate to minor free energy perturbations and how do environmental conditions affect this?

      This sentence aimed to convey the notion of polymorphism in FtsZ polymers. We acknowledge that the original version may have been unclear, so we have removed it in the new version of the manuscript (page 12, lines 1-2).

      (12) Missing controls:

      i) Supplementary Figure 2a: Interaction between ZapD and FtsZ: what was the negative control used in this experiment? Use of FtsZ with the CTP deletion or ZapD specific mutations will help in confirming that the Kd estimation is indeed driven by a specific interaction.

      Negative controls correspond to FtsZ and ZapD alone.

      ii) In a turbidity measurement, how will you distinguish between ZapD mediated bundling, ZapD independent bundling and FtsZ filaments alone? Here again, having a data with non-interacting mutational partners will make the data more reliable.

      The turbidity signal of individual proteins in the absence and presence of GTP is indistinguishable from that of the buffer. We have indicated this in the figure legend.

      iii) Control experiments to show that mZapD is folded (see point below) and to indeed prove that it is monomeric is missing.

      We have included the missing AUC data in the supplementary information (Supp Fig 15a).

      Minor points:

      -  Page 2, para 4: beta-sheet domain (instead of beta-strand)

      Done.

      -  Fig 2a and b: Why is a ratio mentioned in Figure 2a legend? I understood these images as individual proteins at 10 uM concentrations.

      That was a typing error; it corresponds to two individual proteins at 10 µM concentrations. 

      -  Fig 2. Y-axis - spelling of frequency (change in all figures where applicable)

      Corrected.

      -  Supplementary Figure 5: FtsZ 5 uM - change u to micro symbol. FtsZ - t is missing

      Corrected. 

      -  Molecular weight marker is xx. What does xx stand for?

      Corrected. 

      -  Fig 1: Units for GTPase activity on the y-axis is missing.

      Done.

      -  Suppl Fig 3: How was the normalisation carried out for the turbidity data?

      We have explained it in the revised Methods section.

      -  Page 4, line 5: p missing in ZapD

      Done. 

      -  Page 5: paragraph 1, last sentence: stabilised or established?

      Done.

      -  Page 6: 3rd sentence from last: correct the sentence (one ZapD two FtsZ)

      Corrected. 

      -  Page 14: Fluorescence microscopy and FRAP experiments have not been described in the manuscript. Hence, these are not required in the methods.

      Corrected. 

      -  Please include representative gels of purified protein samples used in the assay for sample quality control.

      Controls for each protein are shown in Supplementary Fig. 5a as “control samples” corresponding to 5 µM of each protein before centrifugation.

      Reviewer #3 (Recommendations for the authors):

      Fig. S2a confirms and quantitates the interaction of ZapD with FtsZ-GDP monomers by F.A. It shows a surprisingly high Kd of ~10 µM. This seems important but it is ignored in the overall interpretation. Fig. S2b (FCS) suggests an even weaker interaction, but this may reflect higher order aggregates.

      As the reviewer points out, the interaction between ZapD and FtsZ in the GDP form is weak, consistent with the need for high concentrations of ZapD to form FtsZ macrostructures in the presence of GTP.

      We did not observe the formation of ZapD aggregates, even at higher protein (Author response image 1A) and salt (Author response image 1B) concentrations.

      Author response image 1.

      A) Sedimentation velocity (SV) profiles of ZapD over a concentration range of 2 to 30 µM in 50 mM KCl, 5 mM MgCl2, Tris-HCl pH 7. B) SV profiles of ZapD at 10 µM at different ionic strengths (50-500 mM KCl) in 5 mM MgCl2, 50 mM Tris-HCl pH 7. Abs280 measurements were collected at 48,000 rpm and 20 °C.

      Describing their assembly of toroids the authors state "Upon adding equimolar amounts of ZapD, corresponding to the subsaturating ZapD binding densities described in the previous section". My reading of Fig. 1b and S5 is that FtsZ is almost fully saturated at 1:1 concentration; In S5a at 5:5 µM about 25% of each is in the pellet, which is near 1:1 saturation. It is certainly >50% saturated. Shouldn't this be clarified to read "slightly substoichiometric"? Of course, that undermines the identification of ZapD as such a substoichiometric number.

      We have rephrased the sentence following the reviewer’s suggestions to clarify matters (page 5, lines 39-40).

      The cryoET images in Fig. 3 are an average of five slices with a total thickness of 32 nm. The circular "short filaments..almost parallel" are therefore not single 5 nm diameter FtsZ filaments but must be alignment of filaments axially into sheets (or belts, the axial structure shown in Fig. S8e, discussed next). Importantly, the authors indicate "connections between filaments" by red arrows. This seems wrong for two reasons. (1) The "connections" are very sparse, and therefore not consistent with the near saturation of FtsZ by ZapD. (2) To show up in the 32 nm averaged slice, connections from multiple filaments would have to be aligned. Fig. 3e is a "view of the segmented toroidal structure." I think it shows sheets of filaments as noted above, and the suggested "crosslinks" are again very sparse and no more convincing.

      We thank the reviewer for pointing this out. This was an error on our part, which we have corrected in the figure legend of the revised version of the manuscript. The tomographic slice shown in Fig. 3a is an average of 5 slices, each with a pixel size of 0.86 nm, corresponding to a total thickness of 4.31 nm. It therefore corresponds to the thickness of a single FtsZ filament. The few red arrows indicate lateral connections between filaments, and as discussed earlier, ZapD also crosslinks FtsZ filaments vertically, giving rise to the elongated structures observed in the Z-direction.

      All 3-D reconstructions and segmented renditions should have a scale bar. The axial cylindrical sheets seem to be confirmed and qualified in Fig. S8e. The cylindrical sheets are not continuous, but seem to consist of belt-like filaments that are ~8-10 nm wide in the axial direction. Adjacent belts are separated axially by ~5 nm gaps, and radially by 4-20 nm. The densest filaments in the projection image Fig. 3b are probably an axial superposition of 2-3 belts, while the lighter filaments may be individual belts.

      Fig. 4 shows a higher number of crosslinks but nowhere near a 1:1 stoichiometry. Most importantly to me, the identification of crosslinks vs filaments seems completely arbitrary. For example, if one colored grey all of the densities in 4a right panel, I would have no way to duplicate the distinctions shown in red and blue. Even if we accept the authors' distinction, it does not provide much structural insight. Continuous bands or sheets are identified as FtsZ, without any resolution of substructure, and any density outside these bands is ZapD. The spots identified as ZapD seem randomly dispersed and much too sparse to include all the ~1:1 ZapD.

      We appreciate the reviewer's comments. Scale bars are present in the tomographic slices but not in the 3D views, as these are perspective views, and it would be inappropriate to include scale bars. To provide context for the images, we added the dimensions of the toroids and toroid sections to the figure legends. 

      As previously mentioned, the resolution of our data limits our ability to accurately segment ZapD densities, especially in the Z direction. In Fig. 4, we have done our best to segment the ZapD densities at the top and sides of the FtsZ filaments, but many densities have been missed. We have clarified this point in both the text and the figure legends. This preliminary annotated view is meant to help illustrate the formation of the toroids. In Fig. 3, we have labeled only a few arrows to highlight the lateral connections between the FtsZ filaments; however, there are many more connections than those indicated.

      Fig. S12 explores the effect of increasing ZapD to 1:6, and the authors conclude "the high concentration of ZapD molecules increased the number of links between filaments and ultimately promoted the formation of straight bundles." However, the binding sites on FtsZ are already nearly saturated at 10:10.

      We cannot assume that all FtsZ binding sites are present at a 1:1 ratio. Our pelleting assay confirms the presence of both proteins in the pellet, but we should be cautious about quantification due to the limitations of this technique. Based on our cryo-EM experiments, the amount of ZapD associated with these structures is much lower. We hypothesize that ZapD proteins sediment with the large FtsZ structures, acting as an external decoration for the toroids. A single ZapD monomer may be bound to multiple outer filaments of the structures, which could effectively increase the total µM concentration observed in the pelleting assay. This situation may explain the enrichment of ZapD in the pellet at high concentrations, when theoretically only a 1:1 ratio should be possible. We have observed external decorations of ZapD at high concentrations (see Supplementary Fig. 6). We believe that the pelleting assay simplifies the system and should be used to complement the cryo-EM images.

      Minor points.

      In the Intro "..to follow a treadmilling behavior, similar to that of actin filaments.9-13." These refs have little to do with treadmilling. I suggest: Wagstaff..Lowe mBio 2017; Du..Lutkenhaus PNAS 2018; Corbin Erickson BJ 2020; Ruis..Fernandez-Tornero Plos Biol 2022.

      Following the reviewer’s suggestions, we have modified the references in the revised version. 

      The authors responded to a query during review stating that the concentration of ZapD always refers to the monomer subunit. That seems certainly the case for Fig. S1, but the caption to Fig. 1a confuses the stoichiometry issue: "expecting (sic) at around 2:1 FtsZ:ZapD." Perhaps it could be clarified by stating that the Fig. shows only half the FtsZ's occupied. But in Fig. 1b the absorbance reaches its maximum at equimolar FtsZ and ZapD. That means that all FtsZ's are bound to a ZapD monomer. Why not draw the model in 1A show that? Fig. S5 is also consistent with this 1:1 stoichiometry. And this might be the place to contrast the planar model with the stacked model suggested by Fig. 5 where the two FtsZ filaments are ~8 nm apart, and the ZapD bridging them is on top.

      We have revised the legend for Fig. 1a to improve its readability. In Fig. 1b, the absorbance data indicate that most FtsZ proteins form macrostructures; however, this does not imply that all FtsZ proteins are bound to ZapDs. Our findings demonstrate that this binding only occurs in the case of straight bundles.

      It may help to note that some previous studies have expressed the concentration of ZapD as the dimer. E.g., Roach..Khursigara 2016 found maximal pelleting at FtsZ:ZapD(dimer) of 2:1 (their Fig. 3), completely consistent with the 1:1 FtsZ:ZapD(monomer) in the present study.

      We recognize this discrepancy in the literature. Therefore, throughout the manuscript, the molar concentrations of both proteins are expressed in terms of the FtsZ and ZapD monomer species.
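
      The equivalence between the two conventions can be checked with a short calculation. This is a minimal sketch, assuming only that one ZapD dimer contains two monomers; the function name is hypothetical.

```python
from fractions import Fraction


def ftsz_per_zapd_monomer(ftsz: int, zapd_dimer: int) -> Fraction:
    """Re-express an FtsZ:ZapD(dimer) molar ratio per ZapD monomer.

    Each ZapD dimer contributes two monomers, so the ZapD term doubles.
    """
    return Fraction(ftsz, 2 * zapd_dimer)


# FtsZ:ZapD(dimer) of 2:1, the ratio the reviewer quotes from the earlier study.
ratio = ftsz_per_zapd_monomer(2, 1)
print(f"FtsZ:ZapD(monomer) = {ratio}:1")
```

      A 2:1 ratio expressed per ZapD dimer therefore corresponds to 1:1 when ZapD is expressed as monomer, which is the convention used throughout the manuscript.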

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      Lodhiya et al. demonstrate that antibiotics with distinct mechanisms of action, norfloxacin, and streptomycin, cause similar metabolic dysfunction in the model organism Mycobacterium smegmatis. This includes enhanced flux through the TCA cycle and respiration as well as a build-up of reactive oxygen species (ROS) and ATP. Genetic and/or pharmacologic depression of ROS or ATP levels protect M. smegmatis from norfloxacin and streptomycin killing. Because ATP depression is protective, but in some cases does not depress ROS, the authors surmise that excessive ATP is the primary mechanism by which norfloxacin and streptomycin kill M. smegmatis. In general, the experiments are carefully executed; alternative hypotheses are discussed and considered; the data are contextualized within the existing literature. Clarification of the effect of 1) ROS depression on ATP levels and 2) ADP vs. ATP on divalent metal chelation would strengthen the paper, as would discussion of points of difference with the existing literature. The authors might also consider removing Figures 9 and 10A-B as they distract from the main point of the paper and appear to be the beginning of a new story rather than the end of the current one. Finally, statistics need some attention.

      Strengths:

      The authors tackle a problem that is both biologically interesting and medically impactful, namely, the mechanism of antibiotic-induced cell death.

      Experiments are carefully executed, for example, numerous dose- and time-dependency studies; multiple, orthogonal readouts for ROS; and several methods for pharmacological and genetic depletion of ATP.

      There has been a lot of excitement and controversy in the field, and the authors do a nice job of situating their work in this larger context.

      Inherent limitations to some of their approaches are acknowledged and discussed e.g., normalizing ATP levels to viable counts of bacteria.

      We sincerely appreciate the reviewer’s encouraging feedback.

      Weaknesses:

      The authors have shown that treatments that depress ATP do not necessarily repress ROS, and therefore conclude that ATP is the primary cause of norfloxacin and streptomycin lethality for M. smegmatis. Indeed, this is the most impactful claim of the paper. However, GSH and dipyridyl beautifully rescue viability. Do these and other ROS-repressing treatments impact ATP levels? If not, the authors should consider a more nuanced model and revise the title, abstract, and text accordingly.

      We thank the reviewer for asking this question. In the revised version of the manuscript, we have included data on the impact of the antioxidant GSH on antibiotic-induced ATP levels as a supplementary figure (S9C).

      Does ADP chelate divalent metal ions to the same extent as ATP? If so, it is difficult to understand how conversion of ADP to ATP by ATP synthase would alter metal sequestration without concomitant burst in ADP levels.

      We sincerely thank the reviewer for raising this insightful question. Indeed, ADP and AMP can also form complexes with divalent metal ions; however, these complexes tend to be less stable. According to the existing literature, ATP-metal ion complexes exhibit a higher formation constant compared to ADP or AMP complexes. This has been attributed to the polyphosphate chain of ATP, which acts as an active site, forming a highly stable tridentate structure (Khan et al., 1962; Distefano et al., 1953). An antibiotic-induced increase in ATP levels, irrespective of any changes in ADP levels or in the total pool size of purine nucleotides, could still result in the formation of more stable complexes with metal ions, potentially leading to metal ion depletion. Recent studies indicate that antibiotic treatment stimulates purine biosynthesis (Lobritz MA et al., 2022; Yang JH et al., 2019), thereby imposing energy demands and enhancing ATP production; a corresponding increase in total purine nucleotide levels (ADP+ATP) is therefore possible, as mentioned in the Discussion section. However, this hypothesis requires further investigation.

      Khan MMT, Martell AE. Metal Chelates of Adenosine Triphosphate. Journal of Physical Chemistry. 1962 Jan 1;66(1):10–5.

      Distefano V, Neuman WF. Calcium complexes of adenosinetriphosphate and adenosinediphosphate and their significance in calcification in vitro. Journal of Biological Chemistry. 1953 Feb 1;200(2):759–63.

      Lobritz MA, Andrews IW, Braff D, Porter CBM, Gutierrez A, Furuta Y, et al. Increased energy demand from anabolic-catabolic processes drives β-lactam antibiotic lethality. Cell Chem Biol. 2022 Feb 17.

      Yang JH, Wright SN, Hamblin M, McCloskey D, Alcantar MA, Schrübbers L, et al. A White-Box Machine Learning Approach for Revealing Antibiotic Mechanisms of Action. Cell. 2019 May 30.

      Reviewer #1 (Recommendations for the authors):

      (1) Some of the results in the paper diverge from what has been previously reported by some of the referenced literature. These discrepancies should be clarified.

      We apologize for any confusion, but we are uncertain about the specific discrepancies the reviewer is referring to. In the discussion section, we have addressed and analysed our results within the broader context of the existing literature, regardless of whether our findings align with or differ from previous studies.

      (a) CCCP, nigericin, BDQ, and the atpD mutant all appear to affect M. smegmatis growth (Figures S6C, S7C, S7D-E, and Figure 1B from reference 41). Could depressed growth contribute to the rescue effects of these compounds?

      We concur with the reviewer that the reagents we used (CCCP, Nigericin, and BDQ) to suppress the ATP burst in the presence of antibiotics do affect bacterial growth. This sub-inhibitory effect on growth is expected given their roles in either uncoupling the electron transport chain from oxidative phosphorylation or directly inhibiting ATP synthase, leading to reduced ATP production compared to the untreated control. However, we chose concentrations that reduce the antibiotic-induced surge in ATP levels without significantly depriving the bacteria of the ATP essential for their survival, thereby avoiding cell death.

      Consequently, all three reagents (as shown in Figures S6C, S7C, and S7D-E) were employed at non-lethal concentrations. We would like to emphasize, however, that it was not feasible to select a reagent concentration that had no impact on growth yet still suppressed the antibiotic-induced ATP burst. We recognize the possibility that growth retardation may have contributed to the observed rescue effects. To address this concern, we used multiple orthogonal methods (CCCP, Nigericin, and BDQ), each with distinct mechanisms having a common effect of reducing the ATP surge, to minimize off-target effects and support our findings.

      Also, the authors report no growth phenotype for atpD mutant (Figure S8) but only carry out the growth curve to an OD of 2, which is approximately where the growth curve from ref 41 begins to diverge.

      Additionally, to further confirm that bacterial rescue was not due to growth retardation caused by these reagents, we utilized the atpD mutant. All experiments, including those involving the atpD mutant, were conducted when the OD600nm reached 0.8 (during the exponential phase). We specifically ensured that the growth of the atpD mutant was not compromised during this phase (Figure S8) and restricted our growth curve to the early stationary phase (OD600 between 1.5 and 2). While it is possible that the atpD mutant may exhibit slower growth compared to wild-type bacteria in stationary phase at an OD600nm of 4 (as shown in ref 41), this does not impact our observations.

      (b) Reference 41 also reports that the atpD mutant is more sensitive to some antibiotics  (Figure 6). This includes isoniazid, which references 34 and 35 have both reported caused an ATP burst.

      We acknowledge the reviewer’s query regarding the phenotype of the atpD mutant against isoniazid (Reference 41). However, the cited reference does not provide clarity on why the M. smegmatis atpD mutant exhibits increased sensitivity to isoniazid and other antibiotics, nor does it explain whether this sensitivity is due to reduced ATP levels or altered cell wall properties, such as enhanced drug uptake, as observed with Nile red and ethidium bromide.

      While references 34 and 35 reported an ATP burst following isoniazid treatment in slow-growing M. bovis BCG and M. tuberculosis, it remains to be tested whether isoniazid acts similarly in the fast-growing M. smegmatis, where it is bacteriostatic rather than being bactericidal as observed in M. bovis BCG and M. tuberculosis.  

      (2) The statistics require some attention. First, the wording for almost all of the figures is something like "data points represent the mean of at least three independent replicates," is that correct? CFUs are notoriously messy so it is surprising (impressive?) that the variability between replicates is so low. Second, t-tests are not appropriate for multiple comparisons.

      We thank the reviewer for raising this important query. It is correct that all our experiments included at least three independent replicates, and many of our results exhibit a high degree of variability, as indicated by the large error bars. We would like to clarify that we did not perform multiple comparisons on our results. For all analyses, an unpaired t-test was conducted between the control group and one experimental group at a time. Consequently, statistical data were generated for each pair of results, and the comparisons were displayed on the graph relative to the control data points, as mentioned in the Methods section under the heading “Statistical analysis”.

      (3) Figures 9 and 10A-B seem tangential to the main point of the paper and, in the case of Figure 10A-B, preliminary.

      In this study, our aim was to comprehensively investigate the nature of antibiotic-induced stresses (i.e., mechanisms of action from T = 15 hrs) and leverage these insights to enhance our understanding of bacterial adaptation mechanisms, particularly antibiotic tolerance (from T = 25 hrs). While a significant portion of the manuscript focuses on the secondary consequences of antibiotic exposure, we also sought to assess the bacteria's ability to counteract these stresses, contributing to our understanding of how antibiotic tolerance phenotypes develop.

      The results presented in Figure 9 clearly demonstrate that bacteria attempt to reduce respiration by decreasing flux through the complete TCA cycle, thereby mitigating ROS and ATP production in response to antibiotics. These findings not only uncover potential metabolic pathways to downregulate respiration but also validate our observations regarding the role of increased respiration, ROS generation, and subsequent ATP production in antibiotic action.

      Importantly, bacterial responses to antibiotics were not limited to metabolic adaptations. They also included the upregulation of the intrinsic drug resistance determinant Eis (Figure 10A) and an increase in mutation frequency (Figure 10B), both of which indicate a greater likelihood of these bacteria developing antibiotic tolerance and resistance. Therefore, the data presented in Figures 9 and 10A-B are not peripheral to the central theme of the paper. Rather, they complement and strengthen it by providing a comprehensive understanding of the consequences of antibiotic exposure, which aligns with the primary objectives of our study.

      Do the various perturbations used here (especially streptomycin) affect expression and/or turnover of the genetically-encoded sensors Mrx1-roGFP2 or Peredox-mCherry?

      We appreciate the reviewer for raising this query. Since streptomycin treatment leads to mistranslation and eventually inhibits protein synthesis, it is possible that such treatment could impact the expression and/or turnover of the genetically encoded biosensors, Mrx1-roGFP2 (1) or Peredox-mCherry (2). However, we do not anticipate any effects on the readout as both biosensors provide ratiometric measurements of redox potential and NADH levels, respectively, which eliminates errors due to variations in protein abundance. Nevertheless, in our experiments with both drugs, we employed multiple time- and dose-dependent responses, ensuring that all meaningful conclusions were drawn from the overall trends seen in the data rather than an individual data point.

      (1) Bhaskar A, Chawla M, Mehta M, Parikh P, Chandra P, Bhave D, et al. (2014) Reengineering Redox Sensitive GFP to Measure Mycothiol Redox Potential of Mycobacterium tuberculosis during Infection. PLoS Pathog 10(1): e1003902. https://doi.org/10.1371/journal.ppat.1003902

      (2) Bhat SA, Iqbal IK, Kumar A (2016) Imaging the NADH:NAD+ Homeostasis for Understanding the Metabolic Response of Mycobacterium to Physiologically Relevant Stresses. Front Cell Infect Microbiol 6: 145. https://doi.org/10.3389/fcimb.2016.00145

      (4) Do the antibiotics affect permeability? Especially relevant to CellROX experiments.

      Antibiotics can impact, or even increase, bacterial membrane permeability, a phenomenon observed in the self-promoted uptake of aminoglycosides. When aminoglycosides bind to ribosomes, they induce mistranslation, including of membrane proteins, leading to the formation of membrane pores, which in turn enhances antibiotic uptake and lethality (1-2). However, whether the antibiotics used in our study (norfloxacin and streptomycin), at the concentrations applied, altered membrane permeability is not known.

      Experiments involving the CellROX dye are unlikely to be influenced by changes in membrane permeability, as the dye is freely permeable to the mycomembrane.

      References:

      (1) Davis BD Chen LL Tai PC (1986) Misread protein creates membrane channels: an essential step in the bactericidal action of aminoglycosides PNAS 83:6164–6168.

      (2) Ezraty B Vergnes A Banzhaf M Duverger Y Huguenot A Brochado AR Su SY Espinosa L Loiseau L Py B Typas A Barras F (2013) Fe-S cluster biosynthesis controls uptake of aminoglycosides in a ROS-less death pathway Science 340:1583–1587.

      (5) Figures 4E-H does GSH affect bacterial growth/viability on its own i.e. in the absence of a drug?

      We thank the reviewer for raising this query. Indeed, the 10 mM GSH used in our experiments to mitigate and rescue cells from antibiotic-induced ROS does impact bacterial growth on its own, though it does not affect viability, likely because GSH induces reductive stress on bacterial physiology. For clarification, we have included the viability measurement data in the presence of 10 mM GSH alone in the revised version of the manuscript, as a supplementary figure (S4E).

      (6) p. 2 "...antibiotic resistance involves more complex mechanisms and manifests as genotypic resistance, antibiotic tolerance, and persistence." This reads as tolerance and persistence being a subset of resistance, which is not quite accurate. There is at least one other example of similar wording in the text.

      We thank the reviewer for highlighting this point. Our intention was to convey that resistance to antibiotics can manifest in two forms: permanent or genetic resistance, and transient resilience through antibiotic tolerance and persistence.

      (7) p. 3 "...and showing no visible differences in the growth rate...". It is hard to say this as all the values appear to be 0 - possible to zoom in on the CFU counts in this region? Same comment for p. 5 "...the unaffected growth rate in the early response phase...".

      We apologize for the lack of clarity regarding the resolution of the early time points in the growth curve. Unfortunately, it was not feasible for us to zoom in on the initial time points due to the significant difference in cell viability between T=0 and T=25 hours (i.e., spanning 8 generations). To clarify the growth phenotype at early time points, please refer to Author response image 1, where CFU counts are plotted on a logarithmic scale. The y-axis spans 6-8 orders of magnitude across different conditions, making it difficult to visualize early time points on a linear scale.

      Author response image 1.

      (8) p. 5 "...data for each condition were subjected to rigorous quality control analysis (S2B)." I believe that this is the case, but how Figure S2B demonstrates this fact is not clear.

      Figures S2A and S2B present the quality assessment data for all six proteomics datasets. Figure S2A illustrates the consistency in the number of proteins identified across 10 samples (5 independent replicates for both control and drug treatment). The minimal variation in the number of identified proteins indicates reproducibility across the different runs. Similarly, Figure S2B displays the variability in Pearson correlation coefficient values of protein abundance (LFQ intensities) across the 10 samples. The closer and more consistent the Pearson correlation values, the greater the reproducibility of the quantitative data acquisition.
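
      The reproducibility check described above can be sketched in Python as pairwise Pearson correlations across samples. This is only an illustration on synthetic data: the protein count, the log-scale intensity range, and the noise level are assumptions chosen for the example, and the `pearson` helper is a plain textbook implementation rather than the authors' analysis pipeline.

```python
import itertools
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


rng = random.Random(0)  # fixed seed so the synthetic data are reproducible
n_proteins, n_samples = 50, 10  # 10 samples mirrors 5 control + 5 treated runs

# Synthetic log-scale abundances shared by all samples (assumed values).
base = [20.0 + 0.2 * i for i in range(n_proteins)]
# Each sample is the shared profile plus small run-to-run noise (assumed).
samples = [[v + rng.gauss(0.0, 0.1) for v in base] for _ in range(n_samples)]

pairwise_r = [pearson(a, b) for a, b in itertools.combinations(samples, 2)]
print(f"pairs: {len(pairwise_r)}, min r: {min(pairwise_r):.4f}, max r: {max(pairwise_r):.4f}")
```

      With 10 samples there are 45 sample pairs. A narrow spread of coefficients close to 1 is what consistent quantification across runs would look like in such a plot.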

      (9) p. 7 "To look for a shared mechanism of antibiotic action..." The wording implies an assumption. Perhaps "to test whether" would be more appropriate? Same comment for p. 12 "To further confirm whether enhanced respiration ...".

      We appreciate the reviewer’s suggestions for both sentences and have made the necessary changes in the revised version. Thank you for bringing this to our attention.

      (10) Figure S1A-B figure legend. How was this assay performed?

      The experiment for Figures S1A-B was conducted using a standard REMA assay, as described in the methods section. Cells were harvested at the 25th-hour time point, and drug MICs were compared between cells grown with and without 1/4x MBC99 of the drugs. This was done to determine whether the growth recovery observed during the recovery phase was due to the presence of drug-resistant bacteria.

      (11) p. 14 "...(CCCP), a protonophore, at non-toxic levels..." Figure S6C implies an effect on growth.

      As clarified earlier in response to query 1(a), the CCCP reagent was used at concentrations that effectively minimize the antibiotic-induced surge in ATP levels. However, at these concentrations, CCCP reduces cellular ATP production (Figure S6A), leading to bacterial growth delay (Figure S6C). By "non-toxic levels," we intended to convey that these concentrations of CCCP are non-lethal to the bacteria, as evidenced in Figure S6C.

      (12) Figure 8A y axis is this CFU/mL or OD/mL?

      The y-axis of Figure 8A depicts CFU/ml, as it measures cell survival in response to increasing concentrations of bipyridyl.

      Reviewer #2 (Public review):

      Summary:

      The authors are trying to test the hypothesis that ATP bursts are the predominant driver of antibiotic lethality of Mycobacteria.

      Strengths:

      This reviewer has not identified any significant strengths of the paper in its current form.

      Weaknesses:

      A major weakness is that M. smegmatis has a doubling time of three hours and the authors are trying to conclude that their data would reflect the physiology of M. tuberculosis which has a doubling time of 24 hours. Moreover, the authors try to compare OD measurements with CFU counts and thus observe great variabilities.

      If the authors had evidence to support the conclusion that ATP burst is the predominant driver of antibiotic lethality in mycobacteria then this paper would be highly significant. However, with the way the paper is written, it is impossible to make this conclusion.

      We have identified a new mechanism of antibiotic action in Mycobacterium smegmatis. However, as discussed extensively in the manuscript's discussion section, whether and to what extent this mechanism applies to other organisms still needs to be tested.

      In all the experiments reported, we have drawn our inferences from CFU counts, because OD600nm is not a reliable method.

      Reviewer #2 (Recommendations for the authors):

      Figure 1 needs to have an x-axis that has intervals that have 10E5 CFU to 4 x 10E8. But even 4 x 10E8 CFU/ml is a late log and not exponentially growing cells.

      Figure 1 illustrates the growth curve. We assume the reviewer meant the Y axis, which represents CFU/ml on a linear scale. As mentioned in response to reviewer #1’s query no. 7, it was not feasible to include the viability (CFU/ml) values at T=0 and a few subsequent time points. Naturally, the starting cell count was not zero; we began with approximately 600,000 CFU/ml, corresponding to an OD600nm of 0.0025/ml. For clarification, we have mentioned the initial OD as well as CFU/ml at T = 0 hr in the figure legend.
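
      The scaling problem can be made concrete with a short Python sketch. It uses the starting density of about 600,000 CFU/ml given above and the three-hour doubling time quoted by reviewer #2. It assumes idealized exponential growth with no lag phase, which is a simplification made for the example, and uses 24 hours so that the interval covers exactly 8 generations.

```python
import math

N0 = 6.0e5             # starting density in CFU/ml, as stated in the response
DOUBLING_TIME_H = 3.0  # assumed doubling time in hours (reviewer #2's figure)


def cfu_at(t_hours: float) -> float:
    """Idealized exponential growth with no lag phase (an assumption)."""
    return N0 * 2.0 ** (t_hours / DOUBLING_TIME_H)


time_points = [0, 3, 6, 12, 24]
final = cfu_at(time_points[-1])
for t in time_points:
    n = cfu_at(t)
    # Fraction of the final value shows where the point sits on a linear axis.
    print(f"t={t:>2} h  CFU/ml={n:.2e}  fraction of final={n / final:.4f}  log10={math.log10(n):.2f}")
```

      After 8 doublings the starting value is 1/256 of the final one, so on a linear axis it sits within the line width of zero, whereas on a log10 axis the same points are spread over more than two units.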

      Carefully look at Figure 1, what were you trying to show? Your x-axis goes from 0 to 10E8, of course you did not inoculate 0 cells, but if you had measured CFUs, you might not have gotten the great variability you reported in your graph.

      We assume that the reviewer is suggesting that "if we had measured OD600nm/ml instead of CFU/ml, we might not have observed the high variability we reported." While we agree with the reviewer's comment, our decision to use CFU/ml for growth measurement was to obtain more resolved and detectable data points, as an OD600nm of 0.0025/ml cannot be reliably measured with a spectrophotometer. Additionally, at around T=15 hours, where we observed an extended lag phase (referred to as the stress phase), the OD600nm was approximately 0.05, which is barely detectable. Therefore, the significant differences between the control group and the ¼ x MBC99 drug-treated group might not have been observed if we had relied on OD-based measurements. Despite the presence of high error bars and variability in the data points, we were still able to demonstrate clear differences in bacterial growth between treated and untreated samples at sub-lethal drug doses. This ultimately allowed us to capture the nature of antibiotic-induced stresses.

      There is no doubt that sublethal concentrations of antibiotics will have an effect on the bacterial cells. But it is not clear how you are concluding that ATP burst is the dominant driver of lethality. M. smegmatis can be very different from Mtb.

      Using a series of time- and dose-dependent experiments with plasmid and kit-based approaches, we demonstrated that both antibiotics generate and rely on ROS and ATP bursts to induce lethality in M. smegmatis. Careful monitoring of oxidative stress in cells, following specific quenching of the antibiotic-induced ATP burst (Figure 7, S9A-B), revealed that the ATP burst is the dominant driver of antibiotic lethality. In all tested experiments, surviving bacteria exhibited elevated levels of oxidative stress but were able to maintain their viability, suggesting that oxidative stress alone is not the dominant factor in antibiotic-induced lethality. Furthermore, quenching of ROS by glutathione also suppressed the antibiotic-induced surge in ATP levels, thus supporting the notion that ROS alone is not the dominant driver of antibiotic action as previously understood.

      All experiments reported were conducted using fast-growing M. smegmatis, and we have acknowledged the need for similar experiments in other bacterial systems, including M. tuberculosis, to assess whether our findings are applicable to other systems.

      Another point, the use of a mutant in the ATP synthase is an interesting idea, but would it be better to use something where you knock out the ATP synthase activity with siRNA or a temperature-sensitive allele?

      We appreciate the reviewer’s encouraging comment. Knocking out ATP synthase would completely halt oxidative phosphorylation and shut down aerobic respiration, leading to severe metabolic and growth defects. Such stressful and non-growing conditions are not suitable for testing the efficacy of antibiotics, as it is widely accepted that antibiotics are more effective against metabolically active bacteria.

      Lastly, the conclusion is that norfloxacin and streptomycin have common mechanisms of action, but the authors do not explain how a DNA gyrase inhibitor shows the same mechanisms of action as a ribosome inhibitor.

      The connection between antibiotic target corruption (DNA gyrase or ribosome) and the activation of respiration is indeed unclear, intriguing, and represents one of the most exciting questions in the field of antibiotic mechanisms of action. In the discussion section, we have speculated on potential pathways for this connection, including the possibility that the inhibition of cell division by both drugs may create a perception of resource scarcity (energy and biosynthetic precursors), which could subsequently trigger increased metabolism, respiration, ROS production, and ATP synthesis. However, the precise mechanisms underlying this connection require further investigation and are beyond the scope of the present study.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      This manuscript highlights single-stranded DNA exo- and endo-nuclease activities of ExoIII as a potential caveat and an underestimated source of decreased efficiency in its use in biosensor assays. The data present convincing evidence for the ssDNA nuclease activity of ExoIII and identifies residues that contribute to it. The findings are useful, but the study remains incomplete as the effect on biosensor assays was not established.

      Reviewer #1 (Public Review):

      Summary:

      In this manuscript, the authors show compelling data indicating that ExoIII has significant ssDNA nuclease activity that is posited to interfere with biosensor assays. This does not come as a surprise as other published works have indeed shown the same, but in this work, the authors provide a deeper analysis of this underestimated activity.

      Response: Thank you so much for reviewing and summarizing our work.

      Strengths:

      The authors used a variety of assays to examine the ssDNA nuclease activity of ExoIII and its origin. Fluorescence-based assays and native gel electrophoresis, combined with MS analysis clearly indicate that both commercial and laboratory purified ExoIII contain ssDNA nuclease activity. Mutational analysis identifies the residues responsible for this activity. Of note is the observation in this submitted work that the sites of ssDNA and dsDNA exonuclease activity overlap, suggesting that it may be difficult to identify mutations that affect one activity but not the other. In this regard, it is of interest the observation by the authors that the ssDNA nuclease activity depends on the sequence composition of the ssDNA, and this may be used as a strategy to suppress this activity when necessary. For example, the authors point out that a 3′ A4-protruding ssDNA could be employed in ExoIII-based assays due to its resistance to digestion. However, this remains an interesting suggestion that the authors do not test, but that would have strengthened their conclusion.

      Response: Thank you so much for the positive evaluation and insightful comments on our manuscript. In the revised version, we have modified the manuscript to address the reviewer’s concerns by providing point-by-point responses to all the comments.

      Weaknesses:

      The authors provide a wealth of experimental data showing that E. coli ExoIII has ssDNA nuclease activities, both exo- and endo-, however this work falls short in showing that indeed this activity practically interferes with ExoIII-driven biosensor assays, as suggested by the authors. Furthermore, it is not clear what new information is gained compared to the one already gathered in previously published works (e.g. references 20 and 21). Also, the authors show that ssDNA nuclease activity has sequence dependence, but in the context of the observation that this activity is driven by the same site as dsDNA Exo, how does this differ from similar sequence effects observed for the dsDNA Exo? (e.g. see Linxweiler, W. and Horz, W. (1982). Nucl. Acids Res. 10, 4845-4859).

      Response: We agree with the reviewer regarding the limitations in showing the practical influence of the ssDNase activity in the commercial detection kit. Unlike the biosensor in reference 20, our results showed a potential impact of ExoⅢ on another frequently used detection system, as the primer and probe required for the detection kit could be digested by ExoⅢ, leading to a lower detection efficiency. Since the activities of ExoⅢ on ssDNA and dsDNA share the same active center, we reason that the difference in sequence specificity of ExoⅢ on these two types of substrates might arise from two aspects. On the nuclease side, some unidentified residues of ExoⅢ might play an auxiliary role in digesting ssDNA but not dsDNA, contributing to the difference we observed. On the substrate side, without the base-pairing of a complementary sequence, the structure of ssDNA is more flexible (changeable with environmental factors such as ions and temperature) than that of dsDNA. The two aspects may collectively result in the difference in sequence specificity of ExoⅢ on ssDNA and dsDNA. We believe that cryo-electron microscopy-based structural analysis of the ExoⅢ-ssDNA complex would provide more comprehensive and direct evidence.

      Because of the claim that the underestimated ssDNA nuclease activity can interfere with commercially available assays, it would have been appropriate to test this. The authors only show that ssDNA activity can be identified in commercial ExoIII-based kits, but they do not assess how this affects the efficiency of a full reaction of the kit. This could have been achieved by exploiting the observed ssDNA sequence dependence of the nuclease activity. In this regard, the work cited in Ref. 20 showed that indeed ExoIII has ssDNA nuclease activity at concentrations as low as 50-fold less than what is tested in this work. Ref 20 also tested the effect of the ssDNA nuclease activity in Targeted Recycle Assays, rather than just testing for its presence in a kit.

      Response: Thanks so much for your comments. Logically, to evaluate the practical influence, we would need to compare the current and improved detection kits. Our results suggested that raising the temperature or using the mutant may minimize the ssDNase activity of ExoⅢ. However, the RAA or RPA-ExoⅢ detection kit is a multi-component system consisting of recombinase T4 UvsX, loading factor T4 UvsY, ssDNA binding protein T4 gp32, polymerase Bsu, and ExoⅢ (Analyst. 2018 Dec 17;144(1):31-67. doi: 10.1039/c8an01621f), which collectively determine the performance of the kit. If the temperature were increased, the activities or functions of the other proteins in the detection kit would also be affected, and the resulting change in detection efficiency would not reflect the real practical influence of the ssDNase activity of ExoⅢ. If the wild type were replaced with the mutant, the other four proteins would need to be prepared and combined at an optimized ratio to rebuild the detection system, which is challenging. The targeted recycle assay in Ref 20 is a simple system composed of ExoⅢ and corresponding nucleic acid adapters, which could be easily simulated by the researchers for evaluation. Being a much more complex system, the RAA or RPA-ExoⅢ detection kit is difficult to manipulate to display the practical influence. Thank you again for your insightful suggestions; we may conduct a systematic investigation to improve the detection kit in future studies.

      Because of the implication that the presence of ssDNA exonuclease activity may have in reactions that are supposed to only use ExoIII dsDNA exonuclease, it is surprising that in this submitted work no direct comparison of these two activities is done. Please provide an experimental determination of how different the specific activities for ssDNA and dsDNA are.

      Response: As suggested, we have compared the digestion rates of the two activities by using an equal amount of the commercial ExoⅢ (10 U/µL) on the two types of substrates (10 µM). The results below revealed that ExoⅢ required 10 minutes to digest the 30-nt single-stranded DNA (ssDNA) (A), whereas it could digest the same sequence on double-stranded DNA (dsDNA) within 1 minute (B) (in a newly produced Supplementary Figure S1). This indicated that ExoⅢ digested the dsDNA at a rate at least ten times faster than ssDNA. In conjunction with these results, a recent study has shown that the ssDNase activity of ExoⅢ surpasses that of the conventional ssDNA-specific nuclease ExoI (Biosensors (Basel), 2023, May 26; 13(6):581, doi: 10.3390/bios13060581), suggesting a potential biological significance of ExoⅢ in bacteria related to ssDNA, even though the digestion rate is not as rapid as that on dsDNA. The corresponding text has been added to the Results (Lines 200-207).

      Author response image 1.

      Reviewer #2 (Public Review):

      Summary:

      This paper describes some experiments addressing 3' exonuclease and 3' trimming activity of bacterial exonuclease III. The quantitative activity is in fact very low, despite claims to the contrary. The work is of low interest with regard to biology, but possibly of use for methods development. Thus the paper seems better suited to a methods forum.

      Response: We thank you for your time and effort in improving our work. We have revised the manuscript and provide point-by-point responses to your comments below.

      Strengths:

      Technical approaches.

      Response: Thanks for your evaluation.

      Weaknesses:

      The purity of the recombinant proteins is critical, but no information on that is provided. The minimum would be silver-stained SDS-PAGE gels, with some samples overloaded in order to detect contaminants.

      Response: As suggested, we have performed the silver-stained SDS-PAGE on the purified proteins. The result below indicated that no significant contaminant was found, except for a minor contaminant in S217A (in a newly produced Supplementary Figure S4).

      Author response image 2.

      Lines 74-76: What is the evidence that BER in E. coli generates multinucleotide repair patches in vivo? In principle, there is no need for the nick to be widened to a gap, as DNA Pol I acts efficiently from a nick. And what would control the extent of the 3' excision?

      Response: Thank you for the insightful questions. Gwangrog Lee's lab has found that ExoⅢ is capable of creating a single-stranded DNA (ssDNA) gap on dsDNA during base excision repair, followed by repair by DNA polymerase I. The gap size is determined by the rigidity of the generated ssDNA loop and the duplex stability of the dsDNA (Sci Adv. 2021 Jul 14;7(29):eabg0076. doi: 10.1126/sciadv.abg0076).

      Figure 1: The substrates all report only the first phosphodiester cleavage near the 3' end, which is quite a limitation. Do the reported values reflect only the single phosphodiester cleavage? Including the several other nucleotides likely inflates that activity value. And how much is a unit of activity in terms of actual protein concentration? Without that, it's hard to compare the observed activities to the many published studies. As best I know, Exo III was already known to remove a single-nucleotide 3'-overhang, albeit more slowly than the digestion of a duplex, but not zero! We need to be able to calculate an actual specific activity: pmol/min per µg of protein.

      Response: Yes, once even one nucleotide or phosphodiester is digested off the FQ reporter, fluorescence is generated, and the value reflects the minimum number of phosphodiesters cleaved during the period, from which the digestion rate or efficiency of the nuclease on ssDNA can be calculated. Figures 2 and 3 showed that ExoⅢ could digest the ssDNA from the 3’ end, not just a single nucleotide. Since the “unit” has been widely used in numerous studies (Nature. 2015 Sep 10;525(7568):274-7; Cell. 2021 Aug 19;184(17):4392-4400.e4; Nat Nanotechnol. 2018 Jan;13(1):34-40.), its inclusion here facilitates comparison and evaluation of the activity against these studies. The actual activity of ExoⅢ was calculated in Figure 4D.

      Figures 2 & 3: These address the possible issue of 1-nt excision noted above. However, the question of efficiency is still not addressed in the absence of a more quantitative approach, not just "units" from the supplier's label. Moreover, it is quite common that commercial enzyme preparations contain a lot of inactive material.

      Response: Thanks for your comments. In fact, numerous studies have used the commercial ExoⅢ (Nature. 2015 Sep 10;525(7568):274-7; Cell. 2021 Aug 19;184(17):4392-4400.e4; Nat Nanotechnol. 2018 Jan;13(1):34-40.). Using this universal label of “units” helps researchers easily compare or evaluate the activity and its influence. The commercial ExoⅢ is developed by New England Biolabs Co., Ltd., and its quality has been widely examined in a wide range of scientific investigations.

      Figure 4D: This gets to the quantitative point. In this panel, we see that around 0.5 pmol/min of product is produced by 0.025 µmol = 25,000 pmol of the enzyme. That is certainly not very efficient, compared to the digestion of dsDNA or cleavage of an abasic site. It's hard to see that as significant.

      Response: Thanks for your comments; the confusion may have arisen from the arrangement of the figure. Please note that based on Figure 4D, the digestion rate of 0.025 µM ExoⅢ on the substrate is approximately 5 pmol/min (as shown on the right vertical axis), rather than 0.5 pmol/min. Given that the reaction contained ExoⅢ at a concentration of 0.025 µM in a total volume of 10 µL, the quantity of ExoⅢ was 0.25 pmol (0.025 µmol/L × 10 µL), rather than 25,000 pmol, for a digestion rate of 5 pmol/min. This suggests that each molecule of ExoⅢ could digest one nucleotide in 3 seconds (5 pmol nucleotides / 0.25 pmol ExoⅢ / 60 seconds = 0.33 nucleotides/molecule/second). While it may not be as rapid as the digestion of ExoⅢ on dsDNA, a recent study has shown that the ssDNase activity of ExoⅢ surpasses that of the conventional ssDNA-specific nuclease ExoI (Biosensors (Basel), 2023, May 26; 13(6):581, doi: 10.3390/bios13060581), suggesting a potential biological significance of ExoⅢ in bacteria related to ssDNA.
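
      The unit conversion in this response can be checked with a few lines of Python. The three input values are taken from the response itself; the variable names are chosen for this illustration, and the calculation assumes that all of the enzyme in the reaction is active.

```python
enzyme_conc_uM = 0.025     # ExoIII concentration from the response (µM = µmol/L)
reaction_volume_uL = 10.0  # reaction volume from the response (µL)
rate_pmol_per_min = 5.0    # digestion rate read from Figure 4D, per the response

# µmol/L × µL = 1e-6 µmol = 1 pmol, so the product is already in pmol.
enzyme_pmol = enzyme_conc_uM * reaction_volume_uL

# Nucleotides released per enzyme molecule per second.
turnover_per_s = rate_pmol_per_min / enzyme_pmol / 60.0
seconds_per_nt = 1.0 / turnover_per_s

print(f"enzyme: {enzyme_pmol:.2f} pmol")
print(f"turnover: {turnover_per_s:.2f} nt per molecule per second")
print(f"time per nucleotide: {seconds_per_nt:.1f} s")
```

      If part of a commercial preparation were inactive, as the reviewer suggests is common, the per-molecule rate of the active fraction would be higher than this estimate.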

      Line 459 and elsewhere: as noted above, the activity is not "highly efficient". I would say that it is not efficient at all.

      Response: We respectfully disagree with this point. Supported by the outcomes from fluorescence monitoring of FQ reporters, gel analysis of the ssDNA probe, and mass spectrometry findings, the conclusion is convincing, and more importantly, our findings align with a recent study (Biosensors 2023, 13(6), 581; https://doi.org/10.3390/bios13060581).

      Reviewer #3 (Public Review):

      Overall:

      ExoIII has been described and commercialized as a dsDNA-specific nuclease. Several lines of evidence, albeit incomplete, have indicated this may not be entirely true. Therefore, Wang et al comprehensively characterize the endonuclease and exonuclease enzymatic activities of ExoIII on ssDNA. A strength of the manuscript is the testing of popular kits that utilize ExoIII and coming up with and testing practical solutions (e.g. the addition of SSB proteins ExoIII variants such as K121A and varied assay conditions).

      Response: We really appreciate the reviewer for pointing out the significance and strength of our work. Additionally, we have responded point-by-point to the comments and suggestions.

      Comments:

      (1) The footprint of ExoIII on DNA is expected to be quite a bit larger than 5-nt, see structure in manuscript reference #5. Therefore, the substrate design in Figure 1A seems inappropriate for studying the enzymatic activity and it seems likely that ExoIII would be interacting with the FAM and/or BHQ1 ends as well as the DNA. Could this cause quenching? Would this represent real ssDNA activity? Is this figure/data necessary for the manuscript?

      Response: Thanks so much for your questions. The footprint of ExoⅢ on the dsDNA appears to exceed 5 nucleotides based on the structural analysis in reference #5. However, the footprint may vary when targeting ssDNA. Mass spectrometry analysis in our study demonstrated that ExoⅢ degraded a ~20-nucleotide single-stranded DNA substrate to mononucleotides (Figure 3), suggesting its capability to digest a 5-nt single-stranded DNA into mononucleotides as well. Otherwise, the remaining reaction product would be only a 5-nt ssDNA fragment. Thus, the 5-nt FQ reporter is also a substrate for ExoⅢ. ExoⅢ could possibly interact with BHQ1 and affect its quenching efficiency on FAM to trigger the fluorescence release, as shown in Figure 1A, but this possibility has already been ruled out by the development of the RPA-ExoⅢ detection kit. As pointed out in the Introduction, the kit requires a probe labeled with fluorophore and quencher. If ExoⅢ could affect the fluorophore and quencher and cause fluorescence release, the detection kit would yield a false-positive result regardless of the presence of the target, rendering the detection system ineffective. Thus, ExoⅢ does not interfere with the fluorophore and quencher. The digestion of ExoⅢ on the ssDNA within the FQ reporter was the sole cause of fluorescence release, and the emitted fluorescence represented the ssDNA activity. The result suggested that the FQ reporter might offer an effective approach to sensitively detect or quantitatively study the ssDNase activity of a protein that has not been characterized.

      (2) Based on the descriptions in the text, it seems there is activity with some of the other nucleases in 1C, 1F, and 1I other than ExoIII and Cas12a. Can this be plotted on a scale that allows the reader to see them relative to one other?

      Response: Thanks so much for your suggestions. We attempted to adjust the figure, but due to most of the values being less than or around 0.005, it was challenging to re-arrange for presentation.

      (3) The sequence alignment in Figure 2N and the corresponding text indicates a region of ExoIII lacking in APE1 that may be responsible for their differences in substrate specificity in regards to ssDNA. Does the mutational analysis support this hypothesis?

      Response: Our results indicated that mutation of R170, located in this region (αM helix), resulted in lower digestion efficiency on ssDNA than the wild type. This showed that R170 is an important residue for the ssDNase activity and partially supports the hypothesis. Further investigation is needed to determine whether the structure of the αM helix accounts for the distinctions observed between ExoⅢ and APE1. Future research may require more residue mutations in this area for validation.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      • A significant fraction of amplitude is missing in the presented fluorescence time courses reporting on ssDNA nuclease activity (Figs 1 B, E, and H). Please indicate the dead time of mixing in these experiments, and if necessary include additional points in this time scale. It is unacceptable for the authors to simply connect the zero-time point and the first experimental point with a dashed line.

      Response: We thank the reviewer for pointing out this critical detail. We agree that simply connecting the points with a dashed line is an inappropriate way of indicating the real fluorescence generated in the initial stage. The fluorescence monitor machine needs about two minutes to initiate from the moment we place the reaction tube into the machine, but ExoⅢ can induce significant fluorescence immediately, reaching the peak within ~40 seconds, as shown in the video data. Therefore, it is difficult to record the real-time fluorescence generated initially. To avoid misleading readers, we have added a description in the legend as follows: “The dashed line used in the figure does not indicate the real-time fluorescence generated in the reaction but only represents a trend in the period for the monitor machine to initiate (~2 minutes).” The text was added in Lines 836-838.

      • The authors chose to utilize a 6% agarose electrophoresis to analyze digestion products. However, while this approach clearly shows that the substrates are being digested, it does not allow us to clearly estimate the extent. It would be appropriate to include control denaturing PAGE assays to test the extent of reaction, especially for dsDNA that contains a ssDNA extension, as in Figure 8, or for selected mutants to test whether exo activity may be limited to just a few nts, that may not be resolved with the lower resolution agarose gels.

      Response: We agree with the reviewer that denaturing PAGE assays are usually the method of choice for high-resolution analysis. We performed this experiment on the short ssDNA, but observed that the bands of digestion products frequently shifted to some degree in the gel. Of note, another independent study also showed a similar phenomenon (Nucleic Acids Res. 2007;35(9):3118-27. doi: 10.1093/nar/gkm168). Even slight band shifting would significantly interfere with our analysis of the results, especially on the short ssDNA utilized in the study. After numerous attempts, we discovered that 6% agarose gel electrophoresis could detect the digested ssDNA bands with lower resolution than PAGE, but with less shifting. Considering all the factors, the 6% agarose gel was finally selected to analyze the digestion process.

      Reviewer #2 (Recommendations For The Authors):

      Line 158: tipycal should be typical

      Response: Thank you. As the reviewer pointed out, we have corrected the typo.

      Lines 299-300: "ssD-NA" should not be hyphenated, i.e., it should be ssDNA.

      Response: Thank you for pointing this out. We have rectified the error and thoroughly reviewed the entire paper for any necessary corrections.

      Reviewer #3 (Recommendations For The Authors):

      Figure 2A should indicate the length of the substrate. The legend says omitted nucleotides - I assume they were present in the substrate and just not in the figure? The authors should be very clear about this. Moreover, the text and figure do not describe well the design differences between the three probes. Are they the same except just 23, 21, and 20 nt in length? Are the sequences selected at random?

      Response: Thank you for your questions. The lengths of the probes are given in the figure (23, 21, and 20 nt). The legend has been reworded in Line 843 as “The squiggle line represents the ~20 nucleotides of the ssDNA oligo.” The sequences of the three ssDNA substrates were randomly selected, and all the detailed information is provided in Supplementary Table S4.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer 1 (Public Review):

      Summary:

      The authors propose that the energy landscape of animals can be thought of in the same way as the fundamental versus realized niche concept in ecology. Namely, animals will use a subset of the fundamental energy landscape due to a variety of factors. The authors then show that the realized energy landscape of eagles increases with age as the animals are better able to use the energy landscape.

      Strengths:

      This is a very interesting idea that adds significantly to the energy landscape framework. They provide convincing evidence that the available regions used by birds increase with size.

      Weaknesses:

      Some of the measures used in the manuscript are difficult to follow and there is no mention of the morphometrics of birds or how these change with age (other than that they don’t change which seems odd as surely they grow). Also, there may need to be more discussion of other ontogenetic changes such as foraging strategies, home range size etc.

      We thank reviewer 1 for their interest in our study and for their constructive recommendations. We have included further discussions of these points in the manuscript and outline these changes in our responses to the detailed recommendations below.

      Reviewer 2 (Public Review):

      Summary:

      With this work, the authors tried to expand and integrate the concept of realized niche in the context of movement ecology by using fine-scale GPS data of 55 juvenile Golden eagles in the Alps. Authors found that ontogenic changes influence the percentage of area flyable to the eagles as individuals exploit better geographic uplifts that allow them to reduce the cost of transport.

      Strengths:

      Authors made insightful work linking changes in ontogeny and energy landscapes in large soaring birds. It may not only advance the understanding of how changes in the life cycle affect the exploitability of aerial space but also offer valuable tools for the management and conservation of large soaring species in the changing world.

      Weaknesses:

      Future research may test the applicability of the present work by including more individuals and/or other species from other study areas.

      We are thankful to reviewer 2 for their encouragement and positive assessment of our work. We have addressed their specific recommendations below.

      Recommendations for the authors:

      Reviewer 1 (Recommendations For The Authors):

      I found this to be a very interesting paper which adds some great concepts and ideas to the energy landscape framework. The paper is also concise and well-written. While I am enthusiastic about the paper there are areas that need clarifying or need to be made clearer. Specific comments below:

      Line 64: I disagree that competition is the fundamental driver of the realized niche. In some cases, it may be but in others, predation may be far more important (as an example).

      We agree with this point and have now clarified that competition is an example of a driver of the realized niche. We have also included predation as another example:

      "However, just as animals do not occupy the entirety of their fundamental Hutchinsonian niche in reality [1], for example due to competition or predation risk, various factors can contribute to an animal not having access to the entirety of its fundamental movement niche."

      Intro: I think the authors should emphasize that morphological changes with ontogeny will change the energy landscape for many animals. It may not be the case specifically with eagles but that won’t be true for other animals. For example, in many sharks, buoyancy increases with age.

      We agree and have now clarified that the developmental processes that we are interested in happen in addition to morphological changes:

      "In addition to morphological changes, as young animals progress through their developmental stages, their movement proficiency [2] and cognitive capabilities [3] improve and memory manifests [4]."

      Line 91-93: The idea that birds fine-tune motor performance to take advantage of updrafts is a very important one to the manuscript and should be discussed in a bit more detail. How? At the moment there is a single sentence and it doesn’t even have a citation yet this is the main crux of the changes in realized energy landscape with age. This point should be emphasized because, by the end of the introduction, it is not clear to me why the landscape should be cheaper as the birds age?

      Thank you for pointing out this missing information. We have now added examples to clarify how soaring birds fine-tune their motor performance when soaring. These include for example adopting high bank angles in narrow and weak thermals [5] and reducing gliding airspeed when the next thermal has not been detected [6]:

      "Soaring flight is a learned and acquired behavior [7, 8], requiring advanced cognitive skills to locate uplifts as well as fine-tuned locomotor skills for optimal adjustment of the body and wings to extract the most energy from them, for example by adopting high bank angles in narrow and weak thermals [5] and reducing gliding airspeed when the next thermal has not been detected [6]."

      Results:

      Line 106: explain the basics of the life history of the birds in the introduction. I have no idea what emigration refers to or the life history of these animals.

      Thank you for pointing out the missing background information. We have now added this information to the introduction:

      "We analyzed 46,000 hours of flight data collected from bio-logging devices attached to 55 wild-ranging golden eagles in the Central European Alps. These data covered the transience phase of natal dispersal (hereafter post-emigration). In this population, juveniles typically achieve independence by emigrating from the parental territory within 4-10 months after fledging. However, due to the high density of eagles and consequently the scarcity of available territories, the transience phase between emigration and settling by eventually winning over a territory is exceptionally long at well over 4 years. Our hypothesis posited that the realized energy landscape during this transience phase gradually expands as the birds age."

      What I still am having a hard time understanding is the flyability index. Is this just a measure of the area animals actively select and then the assumption that it’s a good region to fly within?

      We have modified our description of the flyability index for more clarity. In short, we built a step-selection model and made predictions using this model. The predictions estimate the probability of use of an area based on the predictors of the model. For the purpose of our study and what our predictors were (proxies for uplift + movement capacity), we interpreted the predicted values as the "flyability index". We have now clarified this in the methods section:

      "We made the predictions on the scale of the link function and converted them to values between 0 and 1 using the inverse logit function [9]. These predicted values estimated the probability of use of an area for flying based on the model. We interpreted these predicted values as the flyability index, representing the potential energy available in the landscape to support flight, based on the uplift proxies (TRI and distance to ridge line) and the movement capacity (step length) of the birds included in the model."
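      The conversion described above can be illustrated with a minimal sketch. The response cites the R package gtools for the inverse logit; the Python version below is only an illustration, not the authors' code, and the function name `inv_logit` and the example link-scale values are hypothetical.

```python
import math


def inv_logit(eta: float) -> float:
    """Map a link-scale (log-odds) prediction to a value between 0 and 1."""
    # Two algebraically equivalent branches keep exp() from overflowing
    # when eta is large and negative or large and positive.
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)
    return z / (1.0 + z)


# Hypothetical link-scale predictions for three landscape cells.
link_scale = [-2.0, 0.0, 1.5]

# Values on the 0-1 scale, interpreted in the response as the flyability index.
flyability = [inv_logit(v) for v in link_scale]
```

      A link-scale value of 0 maps to 0.5, and larger link-scale values always map to larger index values, so the ranking of cells is preserved by the transformation.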

      It might also be useful to simply show the changes in the area the animals use with age as well (i.e. a simple utilization distribution). This should increase in age for many animals but would also be a reflection of the resources animals need to acquire as they get older.

      We have now added Figure S2 to the supplementary material. This plot was created by calculating the cumulative area used by the birds in each week after emigration. This was done by extracting the commuting flights for each week, converting these to line objects, overlaying the lines on a raster with a cell size of 100 × 100 m, counting the number of overlapping cells and calculating the area that they covered. We did not calculate UDs or MCPs because the eagles seem to be responding to linear features of the landscape, e.g. preferring ridgelines and avoiding valleys. Using polygons to estimate used areas would have made it difficult to ensure that decision-making with regard to these linear features was captured.
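      The cell-counting procedure described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' workflow: coordinates are assumed to be in a projected system in metres, the line-raster overlay is approximated by sampling points densely along each segment, "cumulative" is taken to mean the union of cells used up to and including each week, and the function names and example tracks are hypothetical.

```python
import math

CELL_SIZE_M = 100.0  # raster resolution stated in the response


def cells_crossed(track, cell_size=CELL_SIZE_M):
    """Return the set of (col, row) raster cells touched by a polyline.

    Each segment is sampled at one tenth of the cell size, which
    approximates an exact line-raster overlay.
    """
    cells = set()
    step = cell_size / 10.0
    for (x0, y0), (x1, y1) in zip(track, track[1:]):
        length = math.hypot(x1 - x0, y1 - y0)
        n = max(1, math.ceil(length / step))
        for i in range(n + 1):
            t = i / n
            x = x0 + t * (x1 - x0)
            y = y0 + t * (y1 - y0)
            cells.add((math.floor(x / cell_size), math.floor(y / cell_size)))
    return cells


def cumulative_area_km2(weekly_tracks, cell_size=CELL_SIZE_M):
    """Area (km^2) of all cells used up to and including each week."""
    seen = set()
    areas = []
    for tracks in weekly_tracks:
        for track in tracks:
            seen |= cells_crossed(track, cell_size)
        areas.append(len(seen) * cell_size ** 2 / 1e6)
    return areas


# Hypothetical commuting flights: one short track per week.
weekly_tracks = [
    [[(50.0, 50.0), (450.0, 50.0)]],  # week 1: 400 m flight heading east
    [[(50.0, 50.0), (50.0, 250.0)]],  # week 2: 200 m flight heading north
]
areas = cumulative_area_km2(weekly_tracks)
```

      Counting cells along lines rather than inside a polygon keeps the estimate tied to the linear features the birds follow, which is the reason the response gives for not using UDs or MCPs.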

      In a follow-up project, a PhD student in the golden eagle consortium is exploring the individuals’ space use after emigration considering different environmental and social factors. The outcome of that study will further complete our understanding of the post-emigration behavior of juvenile golden eagles in the Alps.

      How much do the birds change in size over the ontogeny measured? This is never discussed.

      Thank you for bringing up this question. The morphometrics of juvenile golden eagles are not significantly different from the adults, except in the size of culmen and claws [10]. Body mass changes after fledging, because of the development of the pectoral muscles as the birds start flying. Golden eagles typically achieve adult-like size and mass within their natal territory before emigration, at which time we started quantifying the changes in energy landscape. Given our focus on post-emigration flight behavior, we do not expect any significant changes in size and body mass during our study period. We now cover this in the discussion:

      "Juvenile golden eagles complete their morphological development before gaining independence from their parents, with their size and wing morphology remaining stable during the post-emigration phase [10, 11]. Consequently, variations in flyability of the landscape for these birds predominantly reflect their improved mastery of soaring flight, rather than changes in their morphology."

      Discussion:

      Line 154: Could the increase in step length also be due to changes in search strategies with age? e.g. from more Brownian motion when scavenging to Levy search patterns when actively hunting?

      This is a very good point and we tried to look for evidence of this transition in the tracking data. We explored the first passage time for two individuals with a radius of 50 km to see if there is a clear transition from a Brownian to a Levy motion. The patterns that emerge are inconclusive and seem to point to seasonality rather than a clear transition in foraging strategy (Author response image 1). We have modified our statement in the discussion about the change in preference of step lengths indicating improved flight ability, to clarify that it is speculative:

      Author response image 1.

      First passage times using a 50 km radius for two randomly selected individuals.

      "Our findings also reveal that as the eagles aged, they adopted longer step lengths, which could indicate an increasing ability to sustain longer uninterrupted flight bouts."

      Methods:

      Line 229: What is the cutoff for high altitude or high speed?

      We used the Expectation-maximization binary clustering (EMbC) method to identify commuting flights. The EMbC method does not use hard cutoffs to cluster the data. Each data point was assigned to the distribution to which it most likely belonged based on the final probabilities after multiple iterations of the algorithm. Author response image 2 shows the distribution of points that were either used or not used based on the EMbC classification.

      Author response image 2.

      Golden eagle tracking points were either retained (used) or discarded (not used) for further data analysis based on the EMbC algorithm. The points were clustered based on ground speed and height above ground.

      Figure 1: The figure captions should stand on their own but in this case there is no information as to what the tests are actually showing.

      We have now updated the caption to provide information about the model:

      "Coefficient estimates of the step selection function predicting probability of use as a function of uplift proxies, week since emigration, and step length. All variables were z-transformed prior to modeling. The error bars show 95% confidence intervals."
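      For readers unfamiliar with the z-transformation mentioned in the caption, a minimal sketch follows. It assumes centring on the mean and scaling by the sample standard deviation (n - 1 denominator); the response does not say which software call was used, and the function name and step-length values are hypothetical.

```python
import statistics


def z_transform(values):
    """Centre values on their mean and scale by the sample standard deviation."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # sample SD, n - 1 in the denominator
    return [(v - mean) / sd for v in values]


# Hypothetical step lengths in metres.
step_lengths_m = [120.0, 300.0, 450.0, 800.0]
z_scores = z_transform(step_lengths_m)
```

      After this transformation every predictor has mean 0 and standard deviation 1, so the coefficient estimates in the figure are on a common scale.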

      Reviewer 2 (Recommendations For The Authors):

      First, I want to congratulate you on this fantastic work. I enjoyed reading it. The manuscript is clear and well-written, and the findings are sound and relevant to the field of movement ecology. Also, the figures are neatly presented and easy to follow.

      I particularly liked expanding the old concept of fundamental vs realized niche into a movement ecology context. I believe that adds a fresh view into these widely accepted ecological assumptions on species niche, which may help other researchers build upon them to better understand movement "realms" on highly mobile animals in a rapidly changing world.

      I made some minor comments to the manuscript since it was hard to find important weaknesses in it, given the quality of your work. However, there was a point in the discussion that I feel deserves your attention (or rather a reflection) on how major biological events such as moulting could also influence birds to master the flying and exploitation of the energy landscape. You may find my suggestion quite subjective, but I think it may help expand your idea for future works and, what is more, link concepts such as energy landscapes, ontogeny, and important life cycle events such as moulting in large soaring birds. I consider this relevant from a mechanistic perspective to understand better how individuals negotiate all three concepts to thrive and persist in changing environments and to maximise their fitness.

      Once again, congratulations on this excellent piece of research.

      We thank the reviewer for their enthusiasm about our work and for bringing up important points about the biology of the species. Our detailed responses are below.

      MINOR COMMENTS:

      (Note: Line numbers refer to those in the PDF version provided by the journal).

      Line 110: Distinguished (?)

      corrected

      Line 131: Overall, I agree with the authors’ discussion and very much liked how they addressed crucial points. However, I have a point about some aspects of bird ecology that were not discussed.

      The authors argue that morphological traits are less important in explaining birds’ mastery of flight (thus exploiting all available options in the landscape). However, I think the authors are missing some fundamental aspects of bird biology that are known to affect birds’ flying skills, such as moult.

      The moulting process affects species’ flying capacity. Although previous works have not assessed moults’ impact on movement capacity, I think it is worth including the influence of flyability on this ecologically relevant process.

      For instance, golden eagles change their juvenile plumage to intermediate, sub-adult plumage in two or three moult cycles. During this process, the moulting process is incomplete and affects the birds’ aerodynamics, flying capacity, and performance (see Tomotani et al. 2018; Hedenström 2023). Thus, one could expect this process to be somewhat indirectly linked to the extent to which birds can exploit available resources.

      Hedenström, A. (2023). Effects of wing damage and moult gaps on vertebrate flight performance. Journal of Experimental Biology, 226(9), jeb227355.

      Tomotani, B. M., Muijres, F. T., Koelman, J., Casagrande, S., & Visser, M. E. (2018). Simulated moult reduces flight performance, but overlap with breeding does not affect breeding success in a long-distance migrant. Functional Ecology, 32(2), 389-401.

      We thank the reviewer for bringing up this relevant topic. We explored the literature listed by the reviewer and also other sources. We came to the conclusion that moulting does not impact our findings. In our study, we included data for eagles that had emigrated from the natal territories, with their fully grown feathers in juvenile plumage. The moulting schedule in juvenile birds is similar to that of adults: the timing, intensity, and sequence of feathers being replaced are consistent every year (Author response image 3). For these reasons, we do not believe that moulting stage noticeably impacts flight performance at the scale of our study (hourly flights). Fine details of soaring flight performance (aerodynamics within and between thermals) could differ during moulting of different primary and secondary feathers, but this is something that would occur every time the eagle replaces these feathers and we do not expect it to be any different for juveniles. Such fine-scale investigations are outside the scope of this study.

      Author response image 3.

      Moulting schedule of golden eagles [12]

      Lines 181-182: I don’t think trophic transitions rely only on individual flying skill changes. Furthermore, despite its predominant role, scavenging is not necessarily the primary source of food acquisition in golden eagles. This also depends on prey availability, and scavenging is an auxiliary source of easy-to-catch food.

      Scavenging implies detecting carcasses. Should this carcass appearance occur in highly rugged areas, the likelihood of detection also reduces notably. This is not to say that there are not more specialized carrion consumers, such as vultures, that may outcompete eagles in searching for such resources more efficiently.

      In summary, I don’t think such a transition relies only on flying skills but on other non-discussed factors such as knowledge accumulation of the area or even the presence of conspecifics.

      Line 183: This is precisely what I meant with my earlier comment.

      Thank you for the discussion on the interaction between flight development and foraging strategy. We explored the transition from scavenging to hunting above as a response to Reviewer 1, but did not find a clear transition. This is in line with your comment that the birds probably use both scavenging and hunting methods opportunistically.

      Lines 193-195: I will locate this sentence somewhere in this paragraph. As it is now, it seems a bit out of context. It could be a better fit at the end of the first point in line 203.

      Thank you for pointing out the issue with the flow. We have now added a transitional sentence before this one to improve the paragraph. The beginning of the conclusion now reads as follows, with the new sentence shown in boldface.

      "Spatial maps serve as valuable tools in informing conservation and management strategies by showing the general distribution and movement patterns of animals. These tools are crucial for understanding how animals interact with their environment, including human-made structures. Within this context, energy landscapes play an important role in identifying potential areas of conflict between animals and anthropogenic infrastructures such as wind farms. The predictability of environmental factors that shape the energy landscape has facilitated the development of these conservation tools, which have been extrapolated to animals belonging to the same ecological guild traversing similar environments."

      References

      (1) Colwell, R. K. & Rangel, T. F. Hutchinson’s duality: The once and future niche. Proceedings of the National Academy of Sciences 106, 19651–19658. doi:10.1073/pnas.0901650106 (2009).

      (2) Corbeau, A., Prudor, A., Kato, A. & Weimerskirch, H. Development of flight and foraging behaviour in a juvenile seabird with extreme soaring capacities. Journal of Animal Ecology 89, 20–28. doi:10.1111/1365-2656.13121 (2020).

      (3) Fuster, J. M. Frontal lobe and cognitive development. Journal of Neurocytology 31, 373–385. doi:10.1023/A:1024190429920 (2002).

      (4) Ramsaran, A. I., Schlichting, M. L. & Frankland, P. W. The ontogeny of memory persistence and specificity. Developmental Cognitive Neuroscience 36, 100591. doi:10.1016/j.dcn.2018.09.002 (2019).

      (5) Williams, H. J., Duriez, O., Holton, M. D., Dell’Omo, G., Wilson, R. P. & Shepard, E. L. C. Vultures respond to challenges of near-ground thermal soaring by varying bank angle. Journal of Experimental Biology 221, jeb174995. doi:10.1242/jeb.174995 (2018).

      (6) Williams, H. J., King, A. J., Duriez, O., Börger, L. & Shepard, E. L. C. Social eavesdropping allows for a more risky gliding strategy by thermal-soaring birds. Journal of The Royal Society Interface 15, 20180578. doi:10.1098/rsif.2018.0578 (2018).

      (7) Harel, R., Horvitz, N. & Nathan, R. Adult vultures outperform juveniles in challenging thermal soaring conditions. Scientific Reports 6, 27865. doi:10.1038/srep27865 (2016).

      (8) Ruaux, G., Lumineau, S. & de Margerie, E. The development of flight behaviours in birds. Proceedings of the Royal Society B: Biological Sciences 287, 20200668. doi:10.1098/rspb.2020.0668 (2020).

      (9) Bolker, B., Warnes, G. R. & Lumley, T. Package gtools. R Package "gtools" version 3.9.4 (2022).

      (10) Bortolotti, G. R. Age and sex size variation in Golden Eagles. Journal of Field Ornithology 55, 54–66 (1984).

      (11) Katzner, T. E., Kochert, M. N., Steenhof, K., McIntyre, C. L., Craig, E. H. & Miller, T. A. Birds of the World (eds Rodewald, P. G. & Keeney, B. K.) chap. Golden Eagle (Aquila chrysaetos), version 2.0. doi:10.2173/bow.goleag.02 (Cornell Lab of Ornithology, Ithaca, NY, USA, 2020).

      (12) Bloom, P. H. & Clark, W. S. Molt and sequence of plumages of Golden Eagles and a technique for in-hand ageing. North American Bird Bander 26, 2 (2001).

    1. Author Response

      The following is the authors’ response to the original reviews.

      We thank the three reviewers and the reviewing editor for their positive evaluation of our manuscript. We particularly appreciate that they unanimously describe our work in terms such as “important contributions to the understanding of how the CAF-1 complex works”, “The large amounts of data provided in the paper support the authors' conclusion very well” and “The paper effectively addresses its primary objective and is strong”. We also thank them for their careful reading and useful comments to improve the manuscript. We have built on these comments to provide an improved version of the manuscript, and address them point by point below.

      Reviewer #1 (Public Review):

      Summary:

      This paper makes important contributions to the structural analysis of the DNA replication-linked nucleosome assembly machine termed Chromatin Assembly Factor-1 (CAF-1). The authors focus on the interplay of domains that bind DNA, histones, and replication clamp protein PCNA.

      Strengths:

      The authors analyze soluble complexes containing full-length versions of all three fission yeast CAF-1 subunits, an important accomplishment given that many previous structural and biophysical studies have focused on truncated complexes. New data here supports previous experiments indicating that the KER domain is a long alpha helix that binds DNA. Via NMR, the authors discover structural changes at the histone binding site, defined here with high resolution. Most strikingly, the experiments here show that for the S. pombe CAF-1 complex, the WHD domain at the C-terminus of the large subunit lacks DNA binding activity observed in the human and budding yeast homologs, indicating a surprising divergence in the evolution of this complex. Together, these are important contributions to the understanding of how the CAF-1 complex works.

      Weaknesses:

      1. There are some aspects of the experimentation that are incompletely described:

      In the SEC data (Fig. S1C) it appears that Pcf1 in the absence of other proteins forms three major peaks. Two are labeled as "1a" (eluting at ~8 mL) and "1b" (~10-11 mL). It appears that Pcf1 alone or in complex with either or both of the other two subunits forms two different high molecular weight complexes (e.g. 4a/4b, 5a/5b, 6a/6b). There is also a third peak in the analysis of Pcf1 alone, which isn't named here, eluting at ~14 mL, overlapping the peaks labeled 2a, 4c, and 5c. The text describing these different macromolecular complexes seems incomplete (p. 3, lines 32-33): "When isolated, both Pcf2 and Pcf3 are monomeric while Pcf1 forms large soluble oligomers". Which of the three Pcf1-alone peaks are oligomers, and how do we know? What is the third peak? The gel analysis across these chromatograms should be shown.

      We thank the reviewer for his/her careful reading of the manuscript. Indeed, we plotted two curves in Figure S1C in a color that does not match the legend, leading to confusion. Curve 1, Pcf1 alone, depicted in red, should appear in pink as indicated in the legend and in the SDS-PAGE analysis below. Curve 1 exhibits two peaks, labeled as 1a and 1b. With an elution volume of 8.5mL close to the dead volume of the column, peak 1a corresponds to soluble oligomers, while peak 1b (10.4mL) likely corresponds to monomeric Pcf1. Curve 5 (Pcf1 + Pcf2 mixture) was in pink instead of purple as indicated in the legend. This curve consists of three distinct peaks (5a, 5b, and 5c). The SDS-PAGE analysis revealed the presence of oligomers of Pcf1-Pcf2 (5a, 8.3mL), the Pcf1-Pcf2 complex (5b, 9.8mL), and Pcf2 alone (5c, 13.6 mL).

      The color has now been corrected in the revised manuscript.

      More importantly, was a particular SEC peak of the three-subunit CAF-1 complex (i.e. 4a or 4b) characterized in the further experimentation, or were the data obtained from the input material prior to the separation of the different peaks? If the latter, how might this have affected the results? Do the forms inter-convert spontaneously?

      We conducted all structural analyses and DNA/PCNA interaction experiments (Figures 1-4, S1-S4) with freshly SEC-purified samples corresponding to the 4b peak (9.7mL). Aliquots were flash-frozen with 50% glycerol for in vitro histone assembly assays (Figure 5).

      2. Given the strong structural prediction about the roles of residues L359 and F380 (Fig. 2f), these should be mutated to determine effects on histone binding.

      We are pleased that our structural predictions are considered strong. We agree that investigating the role of the L359 and F380 residues will be critical to further refine the binding interface between histone H3-H4 and CAF-1. An in vitro and in vivo analysis of such mutated forms, alongside the current Pcf1-ED mutant characterized in this article and additional potential mutated forms, has the potential to provide a better understanding of the dynamics of histone deposition by CAF-1. However, these additional approaches would constitute a further step toward deciphering this still enigmatic dynamic.

      3. Could it be that the apparent lack of histone deposition by the delta-WHD mutant complex occurs because this mutant complex is unstable when added to the Xenopus extract?

      We cannot formally exclude this possibility, which could potentially apply to all mutated forms tested. However, in the absence of available antibodies against the fission yeast CAF-1 complex, we cannot test this hypothesis for technical reasons. Nevertheless, we feel reassured by the fact that the in vitro assays of nucleosome assembly are overall consistent with the in vivo assays. Indeed, all mutated forms tested that abolished or weakened nucleosome assembly also exhibited synthetic lethality or a growth defect in the absence of a functional HIRA pathway, including the delta-WHD mutated form. This genetic synergy, which reflects defective histone deposition by CAF-1, is not specific to the fission yeast S. pombe and was previously reported in S. cerevisiae (Kaufman et al. MCB 1998; Krawitz et al. MCB 2002). This further supports the evolutionary conservation of this genetic assay as a readout for defective histone deposition by CAF-1.

      Reviewer #1 (Recommendations For The Authors):

      • p. 4: "An experimental molecular weight of 179 kDa was calculated using Small Angle X-ray Scattering (SAXS), consistent with a 1:1:1 stoichiometry (Figure S1e). These data are in agreement with a globular complex with a significant flexibility (Figure S1f)." There needs to be more description of the precision of the molecular weight measurement, and what aspects of these data indicate the flexibility.

      The molecular weight was estimated using the correlation volume (Vc) defined by (Rambo & Tainer, Nature 2013, 496, 477-481). The estimated error with this method is around 10%. We added this information together with supporting arguments for the existence of flexibility: “An experimental molecular weight of 179 kDa was calculated using Small Angle X-ray Scattering (SAXS). Assuming an accuracy of around 10% with this method (Rambo and Tainer 2013), this value is consistent with a 1:1:1 stoichiometry for the CAF-1 complex (calculated MW 167kDa) (Figure S1e). In addition, the position of the maximum for the dimensionless Kratky plot was slightly shifted to higher values in the y and x axis compared to the position of the expected maximum of the curve for a fully globular protein (Figure S1f). This shows that the complex was globular with a significant flexibility.”
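      The consistency argument above amounts to a simple relative-deviation check, sketched below with the three numbers quoted in the response. Measuring the deviation relative to the calculated mass is an assumption made for this illustration, and the variable names are hypothetical.

```python
measured_kda = 179.0    # SAXS estimate quoted in the response
calculated_kda = 167.0  # calculated MW for a 1:1:1 complex, as quoted
tolerance = 0.10        # approximate accuracy of the Vc method, as quoted

# Deviation of the measured mass, expressed relative to the calculated mass.
relative_deviation = abs(measured_kda - calculated_kda) / calculated_kda

# True when the deviation falls within the stated accuracy of the method.
consistent = relative_deviation <= tolerance
```

      The deviation is about 7%, which lies within the stated ~10% accuracy, in line with the 1:1:1 stoichiometry reported by the authors.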

      • p. 6, lines 21-22: "In contrast, a large part of signals (338-396) did not vanish anymore upon addition of a histone complex preformed with two other histone chaperones known to compete with CAF-1 for histone binding..." Given the contrast made later with the 338-351 region which is insensitive to Asf1/Mcm2, it would be clearer for the reader to describe the Asf1/Mcm2-competed regions as residues 325-338 plus 352-396. Note that the numerical scale of residues doesn't line up perfectly with the data points in Figure 2d, and this should be fixed as well.

      We thank the reviewer for spotting this typographical error; we intended to write “In contrast, a large part of signals (348-396) did not vanish anymore…”. We have modified the paragraph as suggested by the reviewer because we agree it is clearer for the reader: “In contrast, only a shorter fragment (338-347) vanished upon addition of Asf1-H3-H4-Mcm2(69-138), a histone complex preformed with two other histone chaperones, Asf1 and Mcm2, known to compete with CAF-1 for histone binding (Sauer et al. 2017) and whose histone binding modes are well established (Figure 2e) (Huang et al. 2015, Richet et al. 2015). This finding underscores a direct competition between residues (325-338) and (349-396) within the ED domain and Asf1/Mcm2 for histone binding.”

      The slight shift in the numerical scale of Figure 2d was also corrected.

      • p. 8. Lines 22-24: "EMSAs with a double-stranded 40bp DNA fragment confirmed the homogeneity of the bound complex. When increasing the SpCAF-1 concentration, additional mobility shifts suggest, a cooperative DNA binding (Figure 3a)." I agree that the migration of the population is further retarded upon the addition of more protein. However, doesn't this negate the first sentence? That is, if multiple CAF-1 complexes can bind each dsDNA molecule, can these complexes be described as homogeneous?

      We fully agree with the reviewer's comment and have removed the notion of homogeneity from the first sentence. “EMSAs with a double-stranded 40bp DNA fragment showed the formation of a bound complex.”

      • Figure S2b Legend: "1H-15N HSQC spectra of Pcf1_ED (425-496)." The residue numbers should read 325-396.

      The typo has been corrected.

      • Is the title for Figure 5 correct?: "Figure 5: Rescue using Y340 and W348 in the ED domain, the intact KER DNA binding domain and the C-terminal WHD of Pcf1 in SpCAF-1 mediated nucleosome assembly." I don't see that any point mutation rescue experiments are done here.

      The title of Figure 5 has been changed to “Efficient nucleosome assembly by SpCAF-1 in vitro requires interactions with H3-H4, DNA and PCNA, and the C-terminal WHD domain”.

      • Figure S6C. I assume the top strain lacks the Pcf2-GFP but this should be stated explicitly.

      The following sentence “The top strain corresponds to a strain expressing wild-type and untagged Pcf2 as a negative control of GFP fluorescence” is now added to the figure legend. Figure S6C has been modified accordingly to state “Pcf2 (untagged)” explicitly.

      • Regarding point #3 in the public review, a simple initial test of this idea would be to determine if similar amounts of wt and mutant complexes can be immunoprecipitated at the endpoint of the assembly reactions.

      In the absence of available antibodies against the fission yeast CAF-1 complex, we cannot test this hypothesis for technical reasons. However, the in vitro assays of nucleosome assembly are overall consistent with the in vivo assays. Indeed, all mutated forms tested that abolished or weakened nucleosome assembly also exhibited synthetic lethality or a growth defect in the absence of a functional HIRA pathway, including the delta WHD mutated form. This genetic synergy, reflecting defective histone deposition by CAF-1, is not specific to the fission yeast S. pombe, as it was previously reported in S. cerevisiae (Kaufman et al. MCB 1998; Krawitz et al. MCB 2002), further supporting the evolutionary conservation of this genetic assay as a readout for defective histone deposition by CAF-1.

      • Foundational findings that should be cited: The role of PCNA in CAF-1 activity was first recognized by pioneering studies in the Stillman laboratory (PMID: 10052459, 11089978). The earliest recombinant studies of CAF-1 showed that the large subunit is the binding platform for the other two, showed that the KER and ED domains were required for histone deposition activity, and roughly mapped the p60-binding site on the large subunit (PMID: 7600578). Another early study roughly mapped the binding site for the third subunit and showed that biological effects of impairing the PCNA binding synergized with defects in the HIR pathway (PMID: 11756556), a genetic synergy first demonstrated in budding yeast (PMID: 9671489).

      We thank the reviewer for providing these important references that are now cited in the manuscript. PMID: 10052459 and 11089978 are cited page 2 line 18 and 19, PMID: 7600578 page 19 line 5 and PMID: 11756556 and 9671489 page 18 line 2.

      Reviewer #2 (Public Review):

      Summary:

      The authors describe the structure-functional relationship of domains in S. pombe CAF-1, which promotes DNA replication-coupled deposition of histone H3-H4 dimer. The authors nicely showed that the ED domain with an intrinsically disordered structure binds to histone H3-H4, that the KER domain binds to DNA, and that, in addition to a PIP box, the KER domain also contributes to the PCNA binding. The ED and KER domains as well as the WHD domain are essential for nucleosome assembly in vitro. The ED, KER domains, and the PIP box are important for the maintenance of heterochromatin.

      Strengths:

      The combination of structural analysis using NMR and Alphafold2 modeling with biophysical and biochemical analysis provided strong evidence on the role of the different domain structures of the large subunit of SpCAF-1, spPCF-1 in the binding to histone H3-H4, DNA as well as PCNA. The conclusion was further supported by genetic analysis of the various pcf1 mutants. The large amounts of data provided in the paper support the authors' conclusion very well.

      Reviewer #2 (Recommendations For The Authors):

      The paper by Ochesenbein describes the structural and functional analysis of S. pombe CAF-1 complex critical for DNA replication-coupled histone H3/H4 deposition. By using structural, biophysical, and biochemical analyses combined with genetic methods, the authors nicely showed that a large subunit of SpCAF1, SpPCF-1, consists of 5 structured domains with four connecting IDR domains. The ED domain with IDR nature binds to histone H3-H4 dimer with the conformational change of the other domain(s). SpCAF-1 binds to dsDNA by using the KER domain, but not the WHD domain. The experiments have been done with great care and a large amount of the data are highly reliable. Moreover, the results are clearly presented and convincingly written. The conclusion in the paper is very solid and will be useful for researchers who work in the field of chromosome biology.

      Major points:

      1. DNA binding of the KER mutant shown in Figures S3h and S3i, which was measured by the EMSA, looks similar to that of wild-type control in Figure S3f, which is different from the data in Figures 3b and 3e measured by the MST. The authors need a more precise description of the EMSA result of the KER mutant shown in Figures 3 and S3. The quantification of the EMSA result would resolve the point (should be provided).

      As proposed by this reviewer, we quantified all EMSAs presented in Figure 3 and Figure S3. We quantified the signal of the free DNA band to calculate the percentage of bound DNA in each condition. All EMSA experiments were conducted in duplicate, allowing us to calculate an average value and standard deviation for each interaction. Representative curves and fitted values are reported below in the figure provided for the reviewer (panel a: data for the Pcf1_KER domain with two fitting models; panel b: the entire CAF-1 complexes and mutants; panel c: the isolated Pcf1_KER domains; panel d: all fitted values). Importantly, as illustrated in panel a, the complete model for a single interaction (complete KD model, dashed line) does not adequately fit the data. In contrast, a function incorporating cooperativity (Hill model) better accounts for the measured data (solid line). Consistently, we also used the Hill model to fit the binding curves measured with the MST technique. As now also specified in the text, the Hill model yields an EC50 value (the protein concentration at which half of the free DNA band intensity has disappeared) and a Hill coefficient (representing cooperativity during the interaction) for each curve.

      We measure a value of 3.4 ± 0.4 μM for the EC50 of SpCAF-1 WT, which is higher than the value measured by MST (0.7 ± 0.1 μM). Higher values were also calculated for all mutants and isolated Pcf1_KER domains compared to MST. These discrepancies could arise from the fact that the DNA concentrations used in the two techniques were very different (20 nM for MST experiments and 1 μM for EMSA). Unlike the complete KD model, which includes the DNA concentration (considered here as the "receptor") in the calculation, the Hill model is fitted independently of this value. This model assumes that the “receptor” concentration is low compared to the KD. Here we calculate EC50 values on the same order of magnitude as the DNA concentration (low micromolar). The quantification obtained by EMSA is thus challenging to interpret. In contrast, values fitted from the MST measurements are more reliable, since the assumption of a low “receptor” concentration holds.

      Therefore, although measurements of EC50 and Hill coefficient from EMSA are reproducible, using the EC50 to quantify apparent affinity may be misleading. Nevertheless, this quantitative analysis of EMSA, requested by the reviewer, has highlighted an interesting characteristic of the KER mutant that is consistent across both methods: even though the EMSAs pointed out by the reviewer (Figures S3h and S3i compared to the wild-type control in Figure 3d and Figure S3f) show similar EC50 values, the binding cooperativity is different. Binding of the KER mutants is no longer cooperative (Hill coefficient ~1), and this is observed for all KER curves (isolated Pcf1_KER domain and the entire SpCAF-1 complex) with both methods, EMSA and MST. We thus decided to emphasize this characteristic of the KER mutant in the text (page 9 line 30-32). “Importantly, this mutant also shows a lower binding cooperativity for DNA binding, as estimated by the Hill coefficient value close to 1, compared to values around 3 for the WT and other mutants.”

      Since the EMSA quantifications, contrary to the MST measurements, did not show a loss of “affinity” (as measured by the EC50 value) for the KER* mutants compared to the WT, and because the DNA concentration was close to the measured EC50, we consider that EC50 values calculated by EMSA do not represent a KD value. If we added this quantification, we would need to discuss this point in detail. Thus, for the sake of clarity, we prefer to present the EMSA measurements in the manuscript as illustrations and qualitative validations of the interaction, without including the quantification.
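
      The two fitting models compared above, and the reason a high DNA concentration inflates the apparent EC50, can be illustrated with a short sketch. It is not the authors' fitting code: the function names are hypothetical, the "complete KD model" is taken to be the standard 1:1 binding quadratic with explicit depletion, and the MST EC50 of 0.7 μM is used as a stand-in KD purely for illustration.

```python
import math


def hill_fraction_bound(protein_conc, ec50, hill_coeff):
    """Fraction of DNA bound under a Hill model (ignores the DNA concentration)."""
    if protein_conc <= 0:
        return 0.0
    ratio = (protein_conc / ec50) ** hill_coeff
    return ratio / (1.0 + ratio)


def kd_fraction_bound(protein_total, dna_total, kd):
    """Fraction of DNA bound for 1:1 binding, accounting for depletion.

    Solves C^2 - (P + D + KD) * C + P * D = 0 for the complex concentration C
    and returns C / D. All concentrations share one unit (here, micromolar).
    """
    total = protein_total + dna_total + kd
    complex_conc = (total - math.sqrt(total * total - 4.0 * protein_total * dna_total)) / 2.0
    return complex_conc / dna_total


def half_saturation_protein(dna_total, kd, upper=1e3, tol=1e-9):
    """Total protein concentration at which half of the DNA is bound (bisection)."""
    low, high = 0.0, upper
    while high - low > tol:
        mid = (low + high) / 2.0
        if kd_fraction_bound(mid, dna_total, kd) < 0.5:
            low = mid
        else:
            high = mid
    return (low + high) / 2.0


# Assumption for illustration only: a non-cooperative site with KD = 0.7 uM.
ILLUSTRATIVE_KD_UM = 0.7
ec50_at_20_nm_dna = half_saturation_protein(dna_total=0.02, kd=ILLUSTRATIVE_KD_UM)
ec50_at_1_um_dna = half_saturation_protein(dna_total=1.0, kd=ILLUSTRATIVE_KD_UM)
```

      For 1:1 binding the half-saturation point equals KD plus half the DNA concentration, so the same KD gives an apparent EC50 of about 0.71 μM at 20 nM DNA but about 1.2 μM at 1 μM DNA. The sketch does not attempt to reproduce the measured EMSA values, which also reflect cooperativity. On the Hill side, a coefficient of 3 takes the bound fraction from roughly 11% to 89% between half and twice the EC50, whereas a coefficient of 1 only spans 33% to 67%.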

      Author response image 1.

      Quantitative analysis of interaction with DNA by EMSA. a: quantification of the amount of bound DNA for the Pcf1_KER domain (blue points with error bars). The fit with a KD model is shown as a dashed line, and the fit with a Hill model with a solid line. b: Examples of quantifications and fits (Hill model) for reconstituted SpCAF-1 WT and mutants. c: Examples of quantifications and fits (Hill model) for Pcf1_KER domains WT and mutant. d: EC50 values and Hill coefficients obtained for all EMSA experiments presented in Figure 3 and S3.

      1. As with the cooperative DNA binding of CAF-1, it is very important to show the stoichiometry of CAF-1 to the DNA or the site size. Given a long alpha-helix of the KER domain with biased charges, it is also interesting to show a model of how the dsDNA binds to the long helix with a cooperative binding property (this is not essential but would be helpful if the authors discuss it).

      We agree that having a molecular model for the binding of the KER helix to DNA would be especially interesting, but at this point, considering the accuracy of the tools currently at our disposal for predicting DNA-protein interactions, such a model would remain highly speculative.

      1. Figure 5 shows nucleosome assembly by SpCAF-1. The SpCAF-1-PIP* mutant produced a product with faster mobility than the control at 2 h incubation. The amount of SpCAF-1 added to the reaction seems to be critical. At least a few different protein concentrations should be tested.

      The slightly faster migration of the SpCAF-1-PIP* product is not systematically reproduced: in several experiments, the band corresponding to supercoiled DNA migrated slightly above or below the one obtained upon complementation with SpCAF-1-WT (see Author response image 2 below). This indicates that, after 2 hours of incubation, the supercoiling achieved with the SpCAF-1-PIP* mutant is comparable to that achieved with SpCAF-1-WT. To further document whether the WT and the PIP* mutant differ, we compared their nucleosome assembly efficiency by testing their ability to produce supercoiled DNA over a shorter time, after 45 minutes of incubation. Under these conditions, we reproducibly detected supercoiled forms at earlier times with SpCAF-1-WT than with SpCAF-1-PIP* (see Figure 5 and Author response image 2). These observations indicate that mutation of the PIP motif of Pcf1 affects the rate of supercoiling, in a manner distinct from the other mutations, which dramatically impair the capacity of SpCAF-1 to promote supercoiling.

      Author response image 2.

      Minor points:

      1. Page 8, line 26 or Table 1 legend: Please explain what "EC50" is.

      The definition of EC50, together with a reference paper for the Hill model, has been added in the text on page 8, lines 23-26 (“The curves were fitted with a Hill model (Tso et al. 2018) with an EC50 value of 0.7 ± 0.1 µM (effective concentration at which a 50% signal is observed) and a cooperativity (Hill coefficient, h) of 2.7 ± 0.2, in line with a cooperative DNA binding of SpCAF-1.”), in the Table 1 legend and in the Methods section (page 26).

      1. Page 13, lines 9, 11: "Xenopus" should be italicized.

      This is corrected

      1. Page 14, second half: In S. pombe, the pcf1 deletion mutant is not lethal. It is helpful to mention the phenotype of the deletion mutant a bit more when the authors described the genetic analysis of various pcf1 mutants.

      This point has been added on page 15, line 1.

      1. Figure 1d and Figure S2a: Captions and labels on the X and Y axes are overlapped or misplaced.

      This is corrected

      1. Figure 5: Please add a schematic figure of the assay to explain how one can check the nucleosome assembly by looking at the form I, supercoiled DNAs.

      A new panel has been added to Figure 5. This scheme depicts the supercoiling assay where supercoiled DNA (form I) is used as an indication of efficient nucleosome assembly. The figure legend has also been modified accordingly.

      Reviewer #3 (Public Review):

      Summary:

      The study conducted by Ouasti et al. is an elegant investigation of fission yeast CAF-1, employing a diverse array of technologies to dissect its functions and their interdependence. These functions play a critical role in specifying interactions vital for DNA replication, heterochromatin maintenance, and DNA damage repair, and their dynamics involve multiple interactions. The authors have extensively utilized various in vitro and in vivo tools to validate their model and emphasize the dynamic nature of this complex.

      Strengths:

      Their work is supported by robust experimental data from multiple techniques, including NMR and SAXS, which validate their molecular model. They conducted in vitro interactions using EMSA and isothermal microcalorimetry, in vitro histone deposition using Xenopus high-speed egg extract, and systematically generated and tested various genetic mutants for functionality in in vivo assays. They successfully delineated domain-specific functions using in vitro assays and could validate their roles to a large extent using genetic mutants. One significant revelation from this study is the unfolded nature of the acidic domain, observed to fold when binding to histones. Additionally, the authors also elucidated the role of the long KER helix in mediating DNA binding and enhancing the association of CAF-1 with PCNA. The paper effectively addresses its primary objective and is strong.

      Weaknesses:

      A few relatively minor unresolved aspects persist, which, if clarified or experimentally addressed by the authors, could further bolster the study.

      1. The precise function of the WHD domain remains elusive. Its deletion does not result in DNA damage accumulation or defects in heterochromatin maintenance. This raises questions about the biological significance of this domain and whether it is dispensable. While in vitro assays revealed defects in chromatin assembly using this mutant (Figure 5), confirming these phenotypes through in vivo assays would provide additional assurance that the lack of function is not simply due to the in vitro system lacking PTMs or other regulatory factors.

      Our work demonstrates that the WHD domain is important for CAF-1 function during DNA replication. Indeed, deletion of this domain leads to synthetic lethality when combined with mutation of the HIRA complex, as observed for a null pcf1 mutant, indicating a severe loss of function in the absence of the WHD domain. We propose that these genetic interactions, previously reported in S. cerevisiae (Kaufman et al. MCB 1998; Krawitz et al. MCB 2002), are indicative of defective histone deposition by CAF-1. Moreover, our work establishes that this domain is dispensable for preventing DNA damage accumulation and for maintaining silencing at centromeric heterochromatin, indicating that the WHD domain specifies CAF-1 functions. Our work further demonstrates that, in contrast to the S. cerevisiae and human WHD domains, the S. pombe counterpart exhibits no DNA binding activity. We thus agree that the WHD domain may contribute to nucleosome assembly in vivo via PTMs or interactions with regulatory factors that may be lacking in in vitro systems. However, addressing these aspects deserves further investigation beyond the scope of this article.

      1. The observation of increased Pcf2-GFP foci in pcf1-ED cells, particularly in mono-nucleated (G2-phase) and bi-nucleated cells with septum marks (S-phase), might suggest the presence of replication stress. This could imply incomplete replication in specific regions, leading to the persistence of Caf1-ED-PCNA factories throughout the cell cycle. To further confirm this, detecting accumulated single-stranded DNA (ssDNA) regions outside of S-phase using RPA as an ssDNA marker could be informative.

      We cannot formally exclude that cells expressing the Pcf1-ED mutated form exhibit incomplete replication in specific regions, an aspect that would require careful investigation. However, the microscopy analysis (Fig. 6c and S6c) of this mutant showed no alteration in cell morphology; in particular, there were no elongated cells compared to wild type, which are a hallmark of checkpoint activation caused by ssDNA (Enoch et al. Genes & Dev 1992). Therefore, investigating the consequences of the interplay between the binding of CAF-1 to PCNA and histones on the dynamics of DNA replication is of particular interest but beyond the scope of the current manuscript.

      1. Moreover, considering the authors' strong assertion of histone binding defects in ED through in vitro assays (Figure 2d and S2a), these claims could be further substantiated, especially considering that some degree of histone deposition might still persist in vivo in the ED mutant (Figure 7d, viable though growth defective double ED*+hip1D mutants). For example, the approach, akin to the one employed in Fig. 6a (FLAG-IPs of various Pcf1-FLAG-tagged mutants), could also enable a comparison of the association of different mutants with histones and PCNA, providing a more thorough validation of their findings.

      We have provided in the current manuscript data establishing how Pcf1 mutated forms interact with PCNA (Fig. 6a, 6b). Regarding the interactions with histone H3-H4, the approach based on immunoprecipitation using various Pcf1-FLAG tagged mutants has been unsuccessful in our hands. Indeed, we were unable to obtain robust and reproducible interactions between Pcf1 or its various mutated forms and H3-H4. This is likely because Co-IP approaches do not probe for direct interactions. Indirect interactions between Pcf1 and H3-H4 are potentially bridged by additional factors, including the two other subunits of CAF-1, Pcf2 and Pcf3, or Asf1. Therefore, we are not in a position to address in vivo the direct interactions between Pcf1 and histone H3-H4.

      1. It would be valuable for the authors to speculate on the necessity of having disordered regions in CAF1. Specifically, exploring the overall distribution of these domains within disordered/unfolded structures could provide insightful perspectives. Additionally, it's intriguing to note that the significant disparities observed among mutants (ED, PIP, and KER*) in in vitro assays seem to become more generic in vivo, except for the indispensability of the WHD-domain. Could these disordered regions potentially play a crucial role in the phase separation of replication factories? Considering these questions could offer valuable insights into the underlying mechanisms at play.

      We agree that the potential mechanistic role of partial disorder in CAF-1 is particularly interesting. Disordered regions of human CAF-1 have been reported to form nuclear bodies with liquid-liquid phase separation properties to maintain HIV latency (Ma et al., EMBO J. 2021). As suggested, this raises the question of how disordered domains of Pcf1 could promote phase separation for replication factories, if such a phenomenon happens in vivo. Moreover, numerous factors of the replisome also harbor disordered regions (Bedina, A. et al, 2013. Intrinsically Disordered Proteins in Replication Process. InTech. doi: 10.5772/51673), adding complexity to disentangling such questions experimentally. We have added these elements at the end of the discussion in the revised manuscript (page 20, lines 23-29). “Such plasticity and cross-talks provided by structurally disordered domains might be key for the multivalent CAF-1 functions. Human CAF-1 has been reported to form nuclear bodies with liquid-liquid phase separation properties to maintain HIV latency (Ma et al. 2021). This raises the question of a potential role of the disordered domains of Pcf1, together with other replisome factors harbouring such disordered regions (Bedina 2013), in promoting phase separation of replication factories, if such a phenomenon happens in vivo. Further studies will be needed to tackle these questions.”

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In the manuscript by Tie et al., the authors couple the methodology they have developed to measure the LQ (localization quotient) of proteins within the Golgi apparatus with RUSH-based cargo release to quantify the speed of different cargos traveling through Golgi stacks in nocodazole-induced Golgi ministacks, in order to differentiate between the cisternal progression and stable compartment models of the Golgi apparatus. The debate between the cisternal progression model and the stable compartment model has been intense and going on for decades, and it is important for understanding the basic function and organization of the Golgi apparatus. As per the stable compartment model, cisternae are stable structures and cargo moves along the Golgi apparatus in vesicular carriers. As per the cisternal progression model, by contrast, Golgi cisternae themselves mature, acquiring new identity from the cis face to the trans face, and act as transport carriers themselves. In this work, the authors provide a missing part regarding the intra-Golgi speed of transport of different cargoes as well as the speed of TGN exit and, based on the differences in transport velocities among the cargoes tested, favor a stable compartment model. The argument the authors make is that, if there is cisternal progression, all the cargoes should have a similar intra-Golgi transport speed, which is essentially the rate at which the Golgi cisternae mature. Furthermore, using a combination of BFA and nocodazole treatments, the authors show that the compartments remain stable in cells for at least 30-60 minutes after BFA treatment.

      Strengths:

      The method to accurately measure localization of a protein within the Golgi stack is rigorously tested in the previous publications from the same authors and in combination with pulse chase approaches has been used to quantify transport velocities of cargoes through the Golgi. This is a novel aspect in this paper and differences in intra-Golgi velocities for different cargoes tested makes a case for a stable compartment model.

      Weaknesses:

      Experiments are only tested in one cell line (HeLa cells) and predominantly derived from experimental paradigm using RUSH assays where a secretory cargo is released in a wave (not the most physiological condition) and therefore additional approaches would make a more compelling case for the model.

      We have added datasets from 293T cells in the revamped manuscript.

      Reviewer #2 (Public Review):

      Summary:

      This manuscript describes the use of quantitative imaging approaches, which have been a key element of the lab's work over the past years, to address one of the major unresolved discussions in trafficking: intra-Golgi transport. The approach used has been clearly described in the lab's previous papers, and is thus clearly described. The authors clearly address the weaknesses in this manuscript and do not overstate the conclusions drawn from the data. The only weakness not addressed is the concept of blocking COPI transport with BFA, which is a strong inhibitor and causes general disruption of the system. This is an interesting element of the paper, which I think could be improved upon by using more specific COPI inhibitors instead, although I understand that this is not necessarily straightforward.

      I commend the authors on their clear and precise presentation of this body of work, incorporating mathematical modelling with a fundamental question in cell biology. In all, I think that this is a very robust body of work, that provides a sound conclusion in support of the stable compartment model for the Golgi.

      General points:

      The manuscript contains a lot of background in its results sections, and the authors may wish to consider rebalancing the text: The section beginning at Line 175 is about 90% background and 10% data. Could some data currently in supplementary be included here to redress this balance, or this part combined with another?

      In the revamped manuscript, we have moved the background information on rapid partitioning and rim progression models to the Introduction.

      Reviewer #3 (Public Review):

      The manuscript by Tie et al. provides a quantitative assessment of intra-Golgi transport of diverse cargos. Quantitative approaches using fluorescence microscopy of RUSH synchronized cargos, namely GLIM and measurement of Golgi residence time, previously developed by the authors' team (publications from 2016 to 2022), are being used here.

      Most of the results have been already published by the same team in 2016, 2017, 2020 and 2021. In this manuscript, very few new data have been added. The authors have put together measurements of intra-Golgi transport kinetics and Golgi residence time of many cargos. The quantitative results are supported by a large number of Golgi mini-stacks/cells analyzed. They are discussed with regard to the intra-Golgi transport models being debated in the field, namely the cisternal maturation/progression model and the stable compartments model. However, over the past decades, the cisternal progression model has been mostly accepted thanks to many experimental data.

      The authors show that different cargos have distinct intra-Golgi transport kinetics and that the Golgi residence time of glycosyltransferases is high. From this and the experiment using brefeldinA, the authors suggest that the rim progression model, adapted from the stable compartments model, fits with their experimental data.

      Strengths:

      The major strength of this manuscript is to put together many quantitative results that the authors previously obtained and to discuss them to give food for thought about the intra-Golgi transport mechanism.

      The analysis by fluorescence microscopy of intra-Golgi transport is tough and is a tour de force of the authors even if their approach shows limitations, which are clearly stated. Their work is remarkable with regard to the number of Golgi markers and secretory cargos which have been analyzed.

      Weaknesses:

      As previously mentioned, most of the data provided here were already published and thus accessible for the community. Is there a need to publish them again?

      The authors' discussion about the intra-Golgi transport model is rather simplistic. In the introduction, there is no mention of the most recent models, namely the rapid partitioning and the rim progression models. In my opinion, the tubular connections between cisternae and the diffusion/biochemical properties of cargos are not sufficiently taken into account to interpret the results. Indeed, tubular connections and biochemical properties of the cargos may affect their transit through the Golgi and the kinetics with which they reach the TGN for Golgi exit.

      Nocodazole is being used to form Golgi mini-stacks, which are necessary to allow intra-Golgi measurement. The use of nocodazole might affect cellular homeostasis but this is clearly stated by the authors and is acceptable as we need to perturb the system to conduct this analysis. However, the manual selection of the Golgi mini-stack being analyzed raises a major concern. As far as I understood, the authors select the mini-stacks where the cargo and the Golgi reference markers are clearly detectable and separated, which might introduce a bias in the analysis.

      The term 'Golgi residence time' is being used but it corresponds to the residence time in the trans-cisterna only, as the cargo has been accumulated in the trans-Golgi thanks to a 20 °C block. The kinetics of disappearance of the protein of interest is then monitored after a 20 °C to 37 °C switch.

      Another concern also lies in the differences that would be introduced by different expression levels of the cargo on the kinetics of their intra-Golgi transport and of their packaging into post-Golgi carriers.

      Please see below for our replies to intra-Golgi transport models, the Golgi residence time, and different expression levels of cargos.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      The data shown by the authors to measure differential intra Golgi velocities based on previously established methodology make a case for a stable compartment model, however more data is needed to make a complete story and the clarity of presentation can be improved.

      We sincerely appreciate the reviewer's insightful, detailed, and constructive feedback. Your thoughtful comments have helped us refine our analyses, clarify key points, and strengthen the overall quality of our manuscript. We are grateful for the time and effort you have dedicated to reviewing our work and providing valuable suggestions. Your input has been instrumental in improving both the scientific rigor and presentation of our findings. Thank you for your thorough and thoughtful review.

      Main points:

      (1) Along with the studies in yeast, which the authors describe in this paper, the main evidence for the cisternal maturation model in mammalian cells comes from Bonfanti et al. (https://doi.org/10.1016/S0092-8674(00)81723-7), which used EM to visualize a wave of collagen through Golgi stacks. It is therefore important that this work include collagen as one of the cargos tested. Can the authors use the RUSH-Col1AGFP (see: https://doi.org/10.1083/jcb.202005166) as a cargo to monitor intra-Golgi velocities?

      I understand that HeLa cells are not professional collagen-secreting cells, but the authors can use U2OS cells to measure collagen export and two other extreme (slow and fast) cargos to validate that the same trend in intra-Golgi transport velocities is seen in other cell lines. This will address three concerns: a. This is not a HeLa-specific phenomenon; b. Transport of large cargoes like collagen agrees with their proposal; c. To see if the same cargo has the same (similar) intra-Golgi speed and the trend between different cargoes is conserved across cell lines.

      Due to the difficulty of manipulating and imaging the procollagen-I RUSH reporter, we selected the collagenX-RUSH reporter (SBP-GFP-collagenX) instead. Our previous study (Tie et al., eLife, 2018) demonstrated that SBP-GFP-collagenX assembles into large-molecular-weight particles, each containing ~ 190 copies of SBP-GFP-collagenX. With an estimated mean size of ~ 40 nm, these aggregates are not as large as FM4 aggregates and procollagen-I (> 300 nm) and, therefore, are not excluded from conventional transport vesicles, which typically have a size of 50 – 100 nm. However, collagenX has distinct intra-Golgi transport behaviour from conventional secretory cargos: while conventional secretory cargos localize to the cisternal interior, collagenX partitions to the cisternal rim (Tie et al., eLife, 2018).

      We studied the intra-Golgi transport of SBP-GFP-collagenX in HeLa cells via GLIM and side averaging. The new results are included in Figure 3 of the revamped manuscript. CollagenX has similar intra-Golgi transport kinetics as conventional secretory cargos, displaying the first-order exponential function in LQ vs. time and velocity vs. time plots.

      The side-averaging images are consistent with previous and current results. CollagenX displays a double-punctum during the intra-Golgi transport, indicating a cisternal rim localization, as expected for large secretory cargos. Therefore, our new data demonstrate that large secretory cargos partitioned to the cisternal rim might follow intra-Golgi transport kinetics similar to those of conventional secretory cargos partitioned to the cisternal interior.

      We tried SBP-GFP-CD59 and SBP-GFP-Tac-TC, cargos with fast and slow intra-Golgi transport velocities, respectively, in 293T cells. Results are included in Figure 2, Supplementary Figure 2, and Table 1 of the revamped manuscript. We found that SBP-GFP-Tac-TC showed similar t<sub>intra</sub>s, 17 and 14 min, respectively, in HeLa and 293T cells. Considering our previous finding that glycosylation has an essential role in the Golgi exit (Sun et al., JBC, 2020), the distinct intra-Golgi transport kinetics of SBP-GFP-CD59 (t<sub>intra</sub>s, 13 and 5 min, respectively, in HeLa and 293T cells) might be due to its distinct luminal glycosylation between HeLa and 293T cells. Supporting this hypothesis, SBP-GFP-Tac-TC does not have any glycosylation sites due to the truncation of the Tac luminal domain.

      (2) RUSH assay has its own caveats which authors also refer to in the manuscript. Authors should test their model by using pulse chase approaches by SNAP tagged constructs which will allow them to do pulse chase assays without the requirement to release cargo as a wave (see: doi: 10.1242/jcs.231373). It is not necessary to test all the cargoes but the two on the ends of the spectrum (slow and fast). To avoid massive overexpression, authors could express the proteins using weaker promoters. Authors could also use this approach to simultaneously measure the two cargoes by tagging them with CLIP and SNAP tags and doing the pulse chase simultaneously (see: DOI: 10.1083/jcb.202206132). In this case it may be difficult to stain both GM130 and TGN, but authors could monitor the rate of segregation from the GM130 signal.

      During the RUSH assay, the sudden release of a large amount of secretory reporters does not occur under native secretory conditions and, consequently, might introduce artifacts. The reviewer suggests using pulse-chase labeling of SNAP (or CLIP)-tagged secretory cargos, which occurs in a steady state and hence more closely resembles native secretory transport. This is an excellent suggestion. However, we have not yet tested this method due to the following concerns.

      The standard protocol involves blocking existing reporters, pulse-labeling newly synthesized reporters, and chasing their movement along the secretory pathway. However, the typical 20-minute pulse labeling period used in the two references would be too long, as a substantial portion of the reporters would already reach the trans-Golgi or exit the Golgi before the chase begins. Conversely, reducing the pulse labeling time would significantly weaken the GLIM signal.

      (3) While the intra-Golgi velocities are different for different cargoes tested, authors should show a control that the arrival of the cargoes from ER to the cis-Golgi follows similar kinetics or if there are differences there is no correlation with the intra-Golgi velocities. In other words, do cargoes which show slow intra-Golgi velocities also take more time to reach the cis-Golgi and vice versa.

      In nocodazole-induced Golgi ministacks, the ER exit site, ERGIC, and cis-Golgi are spatially closely associated. At the earliest measurable time point, 5 minutes after biotin treatment, we observed that the secretory cargo had already reached the cis-Golgi (Figure 2 and Supplementary Figure 2). The rapid ER-to-cis-Golgi transport exceeds the temporal resolution of our current protocol, making it difficult to address the reviewer’s question (see our reply to Minor Points (2) of Reviewer #2 for more detailed discussion on this).

      (4) Were the different cargos traveling (at different speeds) through Golgi at the rims, or in the middle of ministack, or by vesicles?

      Please also refer to our reply to Question 1 of Reviewer #1. For the nocodazole-induced Golgi ministack, we previously investigated the lateral cisternal localization of RUSH secretory reporters using our en face average imaging (Tie et al., eLife, 2018). We found that small or conventional cargos (such as CD59 and E-cadherin) partition to the cisternal interior while large cargos (collagenX and FM4-CD8a) partition to the cisternal rim during their intra-Golgi transport. Using GLIM, we showed that the intra-Golgi transport kinetics of collagenX is similar to that of small cargos as both follow the first-order exponential function (Figure 3A-C). Therefore, cisternal rim partitioned large size secretory cargos might have intra-Golgi transport kinetics similar to those of cisternal interior partitioned conventional secretory cargos.

      (5) Figure 4, under both nocodazole and BFA treatment for 30mins, would the stacks have the same number (274 nm per LQ) as thickness? Or does it shrink a little? Considering extended BFA treatment reduced intact Golgi ministacks. This is important to understand the LQ numbers of those Golgi proteins. Besides, can they include one ERGIC marker in this assay, would it be approaching cis-Golgi? Images used for quantification in Figure 4 should be shown in the main figure.

      We define the axial size of the Golgi ministack as the axial distance from the GM130 to the GalT-mCherry, d<sub>(GM130-GalT-mCherry)</sub>, measured using the Gaussian centers of their line intensity profiles. As the reviewer suggested, we measured the axial size of the ministack during the nocodazole and BFA treatment. Indeed, we found a decrease in the ministack axial size from 300 ± 10 nm at 0 min to 190 ± 30 nm at 30 min of BFA treatment. This observation is further confirmed by our side average imaging. The new data is presented in Fig. 6G.

      Our study focuses on changes in the organization of the Golgi ministack. So, we didn’t include ERGIC53 in the current analysis. Instead, we quantified the axial distance between GalT-mCherry and CD8a-furin, d<sub>(GalT-mCherry-CD8a-furin)</sub>, and found that it decreased from 200 ± 20 nm at 0 min to 100 ± 30 nm at 30 min of BFA treatment, suggesting the collapse of the TGN. The collapse of the TGN is further visualized by our side average imaging. The new data is presented in Fig. 6H.

      Therefore, our new data demonstrates that the Golgi ministack shrinks, and the TGN collapses under BFA treatment.
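      The axial distances quoted above are differences between the Gaussian centers of two line intensity profiles. A minimal sketch of that idea follows, under stated assumptions: the fitting routine actually used is not described in this response, so a three-point log-parabola estimator (exact only for a noise-free Gaussian) stands in for it. The profiles, centers, and function names are hypothetical, and the 45 nm pixel size is borrowed from the Airyscan settings quoted later in this response.

```python
import math

def gaussian_profile(n, center, sigma, amplitude=1.0):
    """Synthetic, noise-free line intensity profile sampled at integer pixels."""
    return [amplitude * math.exp(-0.5 * ((i - center) / sigma) ** 2) for i in range(n)]

def gaussian_center(profile):
    """Sub-pixel center from a parabola through the log-intensities at the
    brightest interior pixel and its two neighbours (assumes positive values)."""
    i = max(range(1, len(profile) - 1), key=lambda j: profile[j])
    a, b, c = (math.log(profile[j]) for j in (i - 1, i, i + 1))
    return i + 0.5 * (a - c) / (a - 2 * b + c)

PIXEL_NM = 45.0  # assumed pixel size, in nm

# Hypothetical profiles for two markers along the Golgi axis.
cis_marker = gaussian_profile(30, center=10.2, sigma=2.5)
trans_marker = gaussian_profile(30, center=16.6, sigma=2.5)

distance_nm = (gaussian_center(trans_marker) - gaussian_center(cis_marker)) * PIXEL_NM
print(f"axial distance: {distance_nm:.1f} nm")
```

      With real, noisy profiles a least-squares Gaussian fit would be the more robust choice; the estimator here only illustrates how a center-to-center distance is converted to nanometres.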

      Minor points:

      (1) The LQ data come from confocal/airy scan images, but no such images were shown in this paper. The authors can't assume every reader to have prior knowledge of their previous work. It will be beneficial to have one example image and how the LQ was measured.

      As advised by the reviewer, we have prepared Supplementary Figure 1 to provide a brief illustration of the principle behind GLIM and image processing steps involved.

      (2) The cargos used in this paper need to be introduced: what are they, how were they used in previous literature. Especially the furin constructs come out of the blue (also see point 7).

      As suggested by the reviewer, we have included a schematic diagram in Fig. 1 of the revised manuscript to illustrate all RUSH reporters and their corresponding ER hooks. In this diagram, we also highlight the key sequence differences in the cytosolic tails of different furin mutants.

      Additionally, we have added references for each RUSH reporter at the beginning of the Results and Discussion section.

      (3) There are two categories of exocytosis, constitutive and regulated. It is important to state that the phenomenon observed is in cells predominantly showing only constitutive secretion.

      As the reviewer advised, we have added the following sentences in the section titled “Limitations of the study”.

      “Third, all RUSH reporters used in this study are constitutive secretory cargos. As a result, the intra-Golgi transport dynamics observed here might not reflect those of regulated secretion, which involves the synchronized release of a large quantity of cargo in response to a specific signal.”

      (4) All the cargoes show a progressive reduction in instantaneous velocities from cis to medial to trans. Authors should discuss how do they mechanistically explain this. Is the rate of vesicle production progressively decreasing from cis to trans and if so, why?

      As our imaging methods cannot differentiate vesicles from the cisternal rim, we could not tell if the vesicle production rate had changed during the intra-Golgi transport. We have provided an explanation of the progressive reduction of the intra-Golgi transport velocity in the Results and Discussion section. Please see the text below.

      “The progressive reduction in intra-Golgi transport of secretory cargo might result from the enzyme matrix's retention at the trans-Golgi. As the secretory cargos progress along the Golgi stack from the cis to the trans-side, more and more cargos become temporarily retained in the trans-Golgi region, gradually reducing their overall intra-Golgi transport velocity. If the release or Golgi exit of these cargos from the enzyme matrix follows a constant probability per unit time, i.e., a first-order kinetics process, the rate of cargo exiting from the Golgi should follow the first-order exponential function. Since the mechanism underlying intra-Golgi transport kinetics reflects fundamental molecular and cellular processes of the Golgi, further experimental data are essential to rigorously test this hypothesis.”
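      The first-order picture in the quoted passage can be illustrated numerically. The sketch below is not the manuscript's Equation 2 or 3, which are not reproduced in this response. It assumes a curve of the form LQ(t) = LQ0 + A(1 - exp(-ln2 · t / t_half)), with t_half standing in for t<sub>intra</sub>. The amplitude and the function names are hypothetical; the 17 min value is the HeLa t<sub>intra</sub> reported above for SBP-GFP-Tac-TC.

```python
import math

def lq(t, amplitude, t_half, lq0=0.0):
    """Assumed first-order rise of the localization quotient with time."""
    k = math.log(2) / t_half  # first-order rate constant
    return lq0 + amplitude * (1.0 - math.exp(-k * t))

def velocity(t, amplitude, t_half):
    """Time derivative of lq(): an exponentially decaying transport velocity."""
    k = math.log(2) / t_half
    return amplitude * k * math.exp(-k * t)

A = 1.0        # hypothetical plateau amplitude, in LQ units
T_HALF = 17.0  # minutes

for t in (0.0, 17.0, 34.0, 51.0):
    print(f"t = {t:4.0f} min  LQ = {lq(t, A, T_HALF):.3f}  "
          f"velocity = {velocity(t, A, T_HALF):.4f} LQ/min")
```

      Under this assumed form the velocity halves every t_half minutes, which is the progressive slowdown from cis to trans that the reviewer asks about.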

      (5) The supp file 1 nicely listed the raw data for plotting, and n for numbers of ministacks. Could the authors also show number of cells or experiment repeats?

      In the revamped version of the Supplementary File 1, we have added the cell number for each LQ measurement.

      (6) This recent work used novel multiplexing methods to show that nocodazole-treated cells had similar protein organization as in control may be cited. It also showed the effect of BFA. https://www.cell.com/cell/abstract/S0092-8674(24)00236-8.

      We have added this reference to the Introduction section to support that nocodazole-induced Golgi ministacks have a similar organization to the native Golgi. However, our BFA treatment was combined with nocodazole treatment, whereas the BFA treatment in that paper did not include nocodazole.

      (7) Figure 1G-J, authors should show a schematic to show the difference between different furin constructs. Also, LQ values in Fig 1I start from 1. Authors may need to include even earlier timepoints.

      As suggested by the reviewer, we have shown the domain organization of wild type and mutant furin RUSH reporters in Figure 1, highlighting key amino acids in the cytosolic tail. Please also see our reply to Minor Points (2) of Reviewer #1.

      In the revised manuscript, Fig. 1I (SBP-GFP-CD8a-furin-AC #1) has been updated to become Fig. 2J. In this dataset, the first time point was selected at a relatively late stage (20 min), resulting in an initial LQ value of 0.92. However, this should not pose an issue, as SBP-GFP-CD8a-furin-AC reaches a plateau of ~ 1.6. The number of data points is sufficient to capture the rising phase and fit the first-order exponential function curve with an adjusted R<sup>2</sup> = 0.99. Furthermore, we have four independent datasets in total on the intra-Golgi transport of SBP-GFP-CD8a-furin-AC (#1-4), demonstrating the consistency of our measurements.

      (8) Figure 2A need to show the data points, not just the lines.

      In the revamped manuscript, Fig. 2A has been updated to become Fig. 4A. The plot of Fig. 4A is calculated based on Equation 3, so it does not have data points. However, t<sub>intra</sub> is calculated based on the experimental LQ vs. t kinetic data.

      (9) Imaging and camera settings like exposure time, pixel size, etc should be reported in Methods.

      As suggested by the reviewer, we have supplied this information in the Materials and Methods section of the revised manuscript.

      (1) The exposure time and pixel size for the wide-field microscopy:

      “The image pixel size is 65 nm. The range of exposure time is 400 – 5000 ms for each channel.”

      (2) The exposure time and pixel size for the spinning disk confocal microscopy: “The image pixel size is 89 nm. The range of exposure time is 200 – 500 ms for each channel.”

      (3) The pixel dwelling time and pixel size for the Airyscan microscopy:

      “For side averaging, images were acquired under 63× objective (NA 1.40), zoomed in 3.5× to achieve 45 nm pixel size using the SR mode. The pixel dwelling time is 1.16 µs.”

      Reviewer #2 (Recommendations For The Authors):

      We sincerely appreciate the reviewer's insightful, detailed, and constructive feedback. Your thoughtful comments have helped us refine our analyses, clarify key points, and strengthen the overall quality of our manuscript. We are grateful for the time and effort you have dedicated to reviewing our work and providing valuable suggestions. Your input has been instrumental in improving both the scientific rigor and presentation of our findings. Thank you for your thorough and thoughtful review.

      Minor points:

      (1) Equation 2: A should be in front of the ln2. It's already resolved in equation 3, so likely only needs changing in the text

      As suggested by the reviewer, we have changed it accordingly.

      (2) Line 152: Why is there a lack of experimental data? High ER background and low Golgi signal make it difficult to select ministacks: it would be good to see examples of these images. Is 0 a relevant timepoint, as cargo is still at the ER? Instead, would a timepoint <5' better demonstrate the initial arrival of fast cargo, with 0' discarded?

      We observed that RUSH reporters typically do not exit the ER in < 5 min of biotin treatment, resulting in a high ER background and low Golgi signal. Example images of SBP-GFP-CD59 are shown below (scale bar: 10 µm). Possible reasons include: 1) the time required for biotin diffusion into the ER, 2) the time needed to displace the RUSH hook from the RUSH reporter, and 3) the time for recruitment of RUSH reporters to ER exit sites. As a result, we could not obtain LQs for time points earlier than 5 min during the biotin chase.

      Author response image 1.

      Despite the challenge in measuring LQs at early time points, 0 is still a relevant time point. At t = 0 min, RUSH reporters should be at the ER membrane near the ER exit site, a definitive pre-Golgi location along the Golgi axis, although we still don’t have a good method to determine its LQ.

      (3) Table 1 Line 474: 1-3 independent replicates: is there a better way of incorporating this into the table to make it more streamlined? It would be useful to see each cargo as a mean with error. Is there a more demonstrative way to present the table, for example (but does not have to be) fastest cargo first (Tintra) as in Table 2?

      As suggested by the reviewer, we revised Table 1. We calculated the mean and SD of t<sub>intra</sub> and arranged our RUSH reporters in ascending order based on their t<sub>intra</sub> values.

      (4) Line 264 / Fig 3B: It's unclear to me why the VHH-anti-GFP-mCherry internalisation approach was used, when the cells were expressing GFP, that could be used for imaging. Also, this introduces a question over trafficking of the VHH itself, to access the same compartments as the GFP-proteins are localised. It would be useful to describe the choice of this approach briefly in the text.

      Here, the surface-labeling approach is used to investigate if GFP-Tac-TC possesses a Golgi retrieval pathway after its exocytosis to the plasma membrane. When VHH-anti-GFP-mCherry is added to the tissue culture medium, it binds to the cell surface-exposed GFP-fused MGAT1, MGAT2, Tac, Tac-TC, CD8a, and CD8a-TC. Next, VHH-anti-GFP-mCherry traces the internalized GFP-fused transmembrane proteins. The surface-labeling approach has two advantages in this case. 1) It is much more sensitive in revealing the small number of GFP-transmembrane proteins at the plasma membrane and endosomes, which are usually drowned in the strong Golgi and ER background fluorescence in the GFP channel. 2) While the GFP fluorescence distribution has reached a dynamic equilibrium, the surface labeling approach can reveal the endocytic trafficking route and dynamics.

      As the reviewer suggested, we added the following sentence to describe the choice of the cell-surface labeling: “By binding to the cell surface-exposed GFP, VHH-anti-GFP-mCherry serves as a sensitive probe to track the endocytic trafficking itinerary of the above GFP-fused transmembrane proteins”.

      Regarding the trafficking of VHH-anti-GFP-mCherry itself, in HeLa cells that do not express GFP-fused transmembrane proteins, VHH-anti-GFP-mCherry can be internalized by fluid-phase endocytosis. However, the fluid-phase endocytosis is negligible under our experimental conditions, as we previously demonstrated (Sun et al., JCS, 2021; PMID: 34533190).

      (5) 446 Typo "internalization"

      It has been corrected.

      Reviewer #3 (Recommendations For The Authors):

      Below are my recommendations for the authors to improve their manuscript:

      We sincerely appreciate the reviewer's insightful, detailed, and constructive feedback. Your thoughtful comments have helped us refine our analyses, clarify key points, and strengthen the overall quality of our manuscript. We are grateful for the time and effort you have dedicated to reviewing our work and providing valuable suggestions. Your input has been instrumental in improving both the scientific rigor and presentation of our findings. Thank you for your thorough and thoughtful review.

      (1) Line 48: Tie et al. 2016 is cited. Please add references to original work showing that cargos transit from cis to trans Golgi cisternae.

      After reviewing the literature, we identified two references that provide some of the earliest morphological evidence of secretory cargo transit from the cis- to the trans-Golgi:

      (1) Castle et al, JCB, 1972; PMID: 5025103

      (2) Bergmann and Singer, JCB, 1983; PMID: 6315743

      The first study utilized pulse-chase autoradiographic EM imaging to track secretory protein movement, while the second employed immuno-EM imaging to observe the synchronized release of VSVGtsO45. Accordingly, we have removed Tie et al., 2016 and replaced it with these newly identified references.

      (2) I would suggest to cite earlier (in the Introduction) the rapid partitioning and rim progression models.

      As suggested, we have moved the rapid partitioning and rim progression models to the Introduction section.

      (3) Figure 1: LQ vs. time plot for SBP-GFP-CD8a-furinAC (panel I, 0.9 to 1.75 in 150 min) is different from Fig 7G of Tie et al. 2016 (LQ 0-1.5 in 100 min). Please comment on why those 2 sets of data are different.

      We appreciate the reviewer for pointing out this error. In our previous publication (Tie et al., MBoC, 2016), we presented a total of four datasets on SBP-GFP-CD8a-furin-AC. However, in the earlier version of our manuscript, we mistakenly listed only three datasets, inadvertently omitting Fig. 7G from Tie et al., MBoC, 2016.

      In the revised version, we have now included Fig. S2T (SBP-GFP-CD8a-furin-AC #4), which corresponds to Fig. 7G from Tie et al., MBoC, 2016.

      (4) As mentioned in the public review, I think measurement of the expression level of the cargos is necessary to compare their transport kinetics.

      The reviewer raises a valid concern that is challenging to address. All our data were obtained by imaging overexpressed reporters, and we assume that their overexpression does not significantly impact the Golgi or the secretory pathway. Our previous studies have demonstrated that overexpression does not substantially affect LQs (Figure S2 of Tie et al., MBoC, 2016, and Figure S1 of Tie et al., JCB, 2022).

      We acknowledge this concern as one of the limitations in our study at the end of our manuscript:

      “First, our approach relied on the overexpression of fluorescence protein-tagged cargos. The synchronized release of a large amount of cargo could significantly saturate and skew the intra-Golgi transport.” 

      (5) To my opinion, cisternal continuities would also affect retrograde transport (accelerate) (by diffusion for instance) and not only retrograde transport. Please comment on how this would affect intra-Golgi transport kinetics.

      We believe the reviewer is suggesting “cisternal continuities would also affect retrograde transport (accelerate) (by diffusion for instance) and not only anterograde transport.”

      Transient cisternal continuities have been reported to facilitate the anterograde transport of large quantities of secretory cargos (Beznoussenko et al., 2014; PMID: 24867214) (Marsh et al., 2004; PMID: 15064406) (Trucco et al., 2004; PMID: 15502824). However, we are not aware of any reports demonstrating that such continuities facilitate the retrograde transport of secretory cargo, although Trucco et al. (2004) speculated that Golgi enzymes might use these connections to diffuse bidirectionally (anterograde and retrograde direction). For this reason, we did not discuss this scenario in our manuscript.

      (6) Lines 188-190: I don't understand why the rapid partitioning model is excluded. Please detail more the arguments used for this statement.

      Below is the section from the Introduction that addresses the reviewer's question.

      “This model (rapid partitioning model) suggests that cargos rapidly diffuse throughout the Golgi stack, segregating into multiple post-translational processing and export domains, where cargos are packed into carriers bound for the plasma membrane. Nonetheless, synchronized traffic waves have been observed through various techniques, including EM (Trucco et al., 2004) and advanced light microscopy methods we developed, such as GLIM and side-averaging (Tie et al., 2016; Tie et al., 2022). These findings suggest that the rapid partitioning model might not accurately represent the true nature of the intra-Golgi transport.”

      (7) I would suggest replacing the 'Golgi residence time' by another name as it reflects mainly the time of Golgi exit if I am not mistaken.

      We believe the term “Golgi residence time” more accurately reflects the underlying mechanism – retention. The same approach to measure the Golgi residence time can also be applied to Golgi enzymes such as ST6GAL1. Its slow Golgi exit kinetics (t<sub>1/2</sub> = 5.3 hours) (Sun et al., JCS, 2021) should be primarily due to a strong Golgi retention at its steady state Golgi localization.

      In contrast, the conventional secretory cargos’ Golgi exit times are usually much shorter (t<sub>1/2</sub> < 20 min) (Table 2) due to weaker Golgi retention. In a broader sense, the Golgi exit kinetics of a secretory cargo should be influenced by its Golgi retention. Furthermore, we have consistently used the term “Golgi residence time” in our previous publications. So, we propose maintaining this terminology in the current manuscript.

      (8) Lines 300-306: I would suggest that the authors remove this part as it is highly speculative and not supported by data.

      We have relocated this discussion to the section titled "Our data supports the rim progression model, a modified version of the stable compartment model."

      Our enzyme matrix hypothesis offers a potential explanation for key observations, including the differential cisternal localization of small and large cargos and the interior localization of Golgi enzymes. Cryo-FIB-ET has shown that the interior of Golgi cisternae is enriched with densely packed Golgi enzymes (Engel et al., PNAS, 2015; PMID: 26311849), supporting this hypothesis.

      Additionally, this hypothesis helps explain the gradual reduction in intra-Golgi transport velocities of secretory cargos, as requested by Reviewer #1 (Minor Points 4). For these reasons, we propose retaining this discussion in the manuscript.

      (9) In Figure 3B, the percentage of MGAT2-GFP cells with anti-GFP signal at the Golgi is 41%, while Sun et al. 2021 reported 25%. Please comment on this difference.

      We included more cells for the quantification. The percentage of cells showing Golgi localization of VHH-anti-GFP-mCherry is now 32% (n = 266 cells). The observed difference, 32% vs. 25% (Sun et al., JCS, 2021), is likely due to uncontrollable variations in experimental conditions, which might have influenced the endocytic Golgi targeting efficiency.

      (10) The effects of brefeldinA are pleiotropic as it disassembles COPI and clathrin coats but also induces tubulation of endosomes. I would recommend using Golgicide A, which is more specific.

      We agree with the reviewer that Golgicide A might be more specific as an inhibitor of Arf1. We will certainly consider using this inhibitor next time.

    1. Author Response:

      The following is the authors' response to the original reviews.

      Reviewer #1 (Public Review):

      The authors investigated state-dependent changes in evoked brain activity, using electrical stimulation combined with multisite neural activity across wakefulness and anesthesia. The approach is novel, and the results are compelling. The study benefits from an in-depth sophisticated analysis of neural signals. The effects of behavioral state on brain responses to stimulation are generally convincing.

      It is possible that the authors' use of "an average reference montage that removed signals common to all EEG electrodes" could also remove useful components of the signal, which are common across EEG electrodes, especially during deep anesthesia. For example, it is possible (in fact from my experience I would be surprised if it is not the case) that under isoflurane anesthesia, electrical stimulation induces a generalized slow wave or a burst of activity across the brain. Subtracting the average signal will simply remove that from all channels. This does not only result in signals under anesthesia being affected more by the referencing procedure than during waking but also will have different effects on different channels, e.g. depending on how strong the response is in a specific channel.

      We thank the reviewer for the positive comments and for raising this point. We do not believe that the average reference montage is obscuring an evoked slow wave in the isoflurane-anesthetized mice. Electrical stimulation did elicit a brief activation in nearby neurons that was followed by roughly 200 ms of quiescence, but no significant changes in firing in the other regions we recorded from (Author response image 1).

      Author response image 1

      ERP and evoked population activity during isoflurane anesthesia do not show evidence of global responses. (Top). ERP (-0.2 to +0.8 s around stimulus onset) with all EEG electrode traces superimposed. Data represented is the same: red traces have been processed with the average reference montage, black traces have not. (Bottom) Population mean firing rates from the areas of interest from the same experiment as above.

      We are familiar with the work from Dasilva et al. (2021), a study similar to ours because they also performed cortical electrical stimulation in mice anesthetized with isoflurane. They show widespread evoked multi-unit activity (derived from LFP) in isoflurane-anesthetized mice in response to electrical stimulation, but critical experimental differences may underlie the conflicting results presented in our study. Both works use similar levels of isoflurane to maintain anesthesia (we use a level roughly equivalent to their “deep” level). However, our experiments use only isoflurane, whereas Dasilva et al. induced anesthesia with ketamine and medetomidine followed by isoflurane. It has been shown that isoflurane and ketamine have different effects on neural dynamics (Sorrenti et al., 2021). Typically, isoflurane causes reduced spontaneous firing rates and decreased evoked response amplitudes compared to wakefulness, whereas ketamine has been shown to increase firing rates and evoked response amplitudes (Aasebø et al., 2017; Michelson & Kozai, 2018). Perhaps a more relevant difference lies in the electrical stimulation parameters used to perturb the brain. Dasilva et al. used 1 ms pulses of 500 μA, which would have a much larger effect than the stimulation used in this work, 0.2 ms pulses of 10-100 μA.

      Additionally, we would like to clarify that the average reference montage is not impacting the main findings of this work. As the reviewer correctly pointed out, the average reference montage does change the appearance of the ERP in the butterfly plots (Top panel in Author response image 1). However, all the quantitative analyses of the EEG-ERPs are performed on the global field power, computed by taking the standard deviation across all EEG channels, which is not affected by the average reference montage.
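      The invariance claimed above can be checked numerically: subtracting the across-channel mean shifts every channel by the same amount at a given time point, and a standard deviation does not change under a common shift. The sketch below is a minimal illustration with hypothetical voltages and function names. It assumes the global field power is the population standard deviation across channels at each time point; the exact estimator used in the study is not stated in this response.

```python
import statistics

def average_reference(samples):
    """Subtract the across-channel mean from every channel at one time point."""
    mean = sum(samples) / len(samples)
    return [value - mean for value in samples]

def global_field_power(samples):
    """Population standard deviation across channels at one time point."""
    return statistics.pstdev(samples)

# Hypothetical voltages (microvolts): 3 time points x 5 EEG channels.
eeg = [
    [12.0, 15.5, 9.0, 14.0, 11.5],
    [-3.0, 2.0, -1.5, 0.5, 4.0],
    [30.0, 31.0, 29.5, 30.5, 28.0],
]

gfp_raw = [global_field_power(frame) for frame in eeg]
gfp_ref = [global_field_power(average_reference(frame)) for frame in eeg]

for raw, ref in zip(gfp_raw, gfp_ref):
    print(f"GFP raw = {raw:.4f}  GFP after average reference = {ref:.4f}")
```

      The two columns agree to floating-point precision even when a large common component is present, as in the third time point.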

      Reviewer #2 (Public Review):

      […] The conclusions regarding the thalamic contributions to the ERP components are strongly supported by the data.

      The spatiotemporal complexity is almost a side point compared to what seems to be the most important point of the paper: showing the contribution of thalamic activity to some components of the cortical ERP. Scalp ERPs have long been regarded as purely cortical phenomena, just like most EEGs, and this study shows convincing evidence to the contrary.

      The data presented seemingly contradicts the results presented by Histed et al. (2009), who assert that cortical microstimulation only affects passing fibers near the tip of the electrodes, and results in distant, sparse, and somewhat random neural activation. In this study, it is clear that the maximum effect happens near the electrodes, decays with distance, and is not sparse at all, suggesting that not only passing fibers are activated but that also neuronal elements might be activated by antidromic propagation from the axonal hillock. This appears to offer proof that microstimulation might be much more effective than it was thought after the publication of Histed 2009, as the uber-successful use of DBS to treat Parkinson's disease has also shown.

      We thank the reviewer for their positive comments and thoughtful suggestions. We appreciate and agree with the reviewer’s perspective that the thalamic contribution to the cortical ERP is one of the key points of this study. We also thank the reviewer for their comment on the apparently contradictory results reported by Histed et al. (2009). This gives us the opportunity to further highlight the important contribution of our study to the field.

      First, we would like to highlight some key experimental differences between the two studies. In our study we used single pulse stimulation with currents between 10 and 100 μA, whereas Histed et al. used trains of pulses (100 ms in duration at 250 Hz) with lower current intensities (between 2 and 50 μA). We varied the depth of stimulation, targeting superficial and deep cortical layers; Histed et al. exclusively stimulated superficial cortical layers. In addition, the two studies used recording methods that are orthogonal in nature. We used Neuropixels probes that record from neurons that span all cortical layers depth-wise while Histed et al. used two-photon calcium imaging to record from a horizontal plane of neurons (again, in the superficial cortical layers).

      Because of these important methodological differences, it is more appropriate to compare the Histed et al. results to our results from superficial stimulation at comparable current intensities. In this case, we believe the two studies show similar results: stimulation activated a small fraction of neurons even hundreds of microns away from the stimulating electrode (see Figure 4A from our manuscript). However, our study adds an important observation pointing to the critical role of the depth of the stimulating electrode. We observed significant excitation of local cortical neurons (Figure 4D) and trans-synaptic activation of the thalamus only when we delivered deep stimulation (Figure 5A). This effect is likely mediated by activation of large, myelinated cortico-thalamic fibers, which are thought to be more excitable than non-myelinated horizontal fibers (Tehovnik & Slocum, 2013).

      To summarize, Histed et al. (2009) concluded that microstimulation causes a sparse activation of a distributed set of neurons with little evidence of synaptically driven activation. In contrast, we showed that microstimulation can robustly activate local neurons and trans-synaptically activate distant neurons when stronger stimuli are directed to deep cortical layers. Based on this, we conclude that electrical stimulation is indeed highly effective and is a valid tool for probing and characterizing the cortico-thalamo-cortical network in any behavioral state.

      ----------

      Reviewer #1 (Recommendations for the authors):

      1. I am not clear how "putative pyramidal" or RS and "putative inhibitory" fast-spiking neurons were identified. Please provide some further details on that, including average spike wave shapes, and distribution of firing rates, and it would be interesting to know the proportion of "putative" RS and FS neurons in your recorded population. Obviously, caution is warranted here because, without further work, you cannot be sure that those are indeed pyramidal cells or interneurons! Is this subdivision necessary at all?

      We added details regarding the cell-type classification to the Results (lines 136-140) and the Methods section. This classification is common practice in cortical extracellular electrophysiology recordings given that cell-type specific analyses can reveal important differences between the two putative populations (Barthó et al., 2004; Bortone et al., 2014; Bruno & Simons, 2002; Jia et al., 2016; Niell & Stryker, 2008; Sirota et al., 2008). Based on our findings that the two populations respond to electrical stimulation in similar ways (excitation followed by a period of quiescence and rebound excitation), we agree the subdivision is not necessary to support our conclusions. However, we believe that some readers will appreciate seeing the two putative populations presented separately.

      2. I wonder how the authors know whether the animals were awake, specifically when they were not running. Did you observe animals falling asleep when head-fixed? Providing some analyses of spontaneous EEG/LFP signals in each state could add some reassurance that only wakefulness was included, as intended.

      While we cannot conclusively rule out that mice were asleep during the “quiet wakefulness” periods we analyzed, we believe they are likely to be awake for two main reasons: 1) all the experiments are performed during the dark phase of the light/dark cycle, when the mice are less likely to enter a sleep state (Franken et al., 1999); 2) the animals are not undergoing specific training to promote drowsiness or sleep. Indeed, many sleep-focused studies in head-fixed mice are performed during the light phase of the animal’s cycle to maximize the likelihood of capturing sleep states (Kobayashi et al., 2023; Turner et al., 2020; Yüzgeç et al., 2018; Zhang et al., 2022). We have added this note to the Discussion section (lines 402-406).

      Because we do not specifically record during sleep states and our recording does not include electromyography, which is commonly used in conjunction with EEG to classify sleep stages, we cannot accurately perform spectral comparison between “quiet wakefulness” and sleep states in our recordings.

      3. I was unsure about the meaning of some of the terminology, specifically "rebound", "rebound spiking", "rebound excitation" etc. Why do you call it "rebound"?

      “Rebound” is a term often used to describe a period of enhanced spiking following a period of prolonged silence or inhibition (Guido & Weyand, 1995; Roux et al., 2014). Grenier et al. (1998) list “postinhibitory rebound excitation” as an intrinsic property of cortical and thalamic neurons. We added this description to the text (lines 79-80).

      Reviewer #2 (Recommendations For The Authors):

      Regarding analysis, I would make three main points:

      Regarding the CSD analysis, I think the authors have done a good job of circumventing several of the known issues of this technique, especially by using ERPs rather than ongoing activity. However, although I do not immediately have access to the literature to back up this claim, I've heard that many assumptions behind CSD require a laminar structure with electrodes positioned perpendicular to these layers. In Figure 1B it seems like the neuropixels probe is not really perpendicular to the cortical layers, and I wonder if this might be an issue. I am also wondering how to interpret the thalamic CSD, as this structure is not laminar, lacks the mass of neatly stacked neuronal dipoles present in the cortex, and does not have an orderly array of synaptic inputs and outputs. I understand that CSD analysis helps minimize the contributions of volume conduction, but in this case, I also wonder if the thalamic CSD is even necessary to back up the paper's claims.

      One-dimensional CSD is computed assuming that the electrode is inserted perpendicular to cortex. This is mainly important for the interpretation of sinks and sources, since CSD can be also computed on radial voltages (e.g., EEG [Tenke & Kayser, 2012]). In general, our Neuropixels probes do not significantly deviate from perpendicular (mean deviation from perpendicular 15.3 degrees, minimum 5.2 degrees, and maximum 36.6 degrees). The probe represented in Figure 1B deviates from perpendicular by 31.2 degrees, which is an outlier compared to the rest of the insertions. Any deviation from perpendicular would result in the “effective” cortical thickness being larger by a factor of 1/cos(angle deviation from perpendicular) and thus would not affect the relative location of sources and sinks. We have added a statement to clarify this in the text (lines 126 and 454-456).
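
      The size of this geometric factor can be worked out for the angles quoted above. The following is a minimal sketch; the function name is hypothetical, and only the four angles come from the response.

```python
import math


def thickness_factor(deviation_deg):
    """Factor by which the apparent cortical thickness grows along a tilted probe."""
    return 1.0 / math.cos(math.radians(deviation_deg))


# Angles quoted in the response: mean, minimum, maximum, and the Figure 1B insertion.
angles_deg = {"mean": 15.3, "min": 5.2, "max": 36.6, "Figure 1B": 31.2}
factors = {name: thickness_factor(angle) for name, angle in angles_deg.items()}

for name, factor in factors.items():
    print(f"{name}: {angles_deg[name]} deg -> stretch factor {factor:.3f}")
```

      The stretch is roughly 4% at the mean angle and roughly 25% at the maximum. Because every depth along the probe is scaled by the same factor, the depth of a contact expressed as a fraction of cortical thickness is unchanged, which is why the relative locations of sinks and sources are preserved.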

      We agree with the statement regarding CSD analysis in the thalamus. We originally included the CSD for the thalamus in Figure 2F for completeness. As the reviewer pointed out, thalamic CSD was not used to perform any subsequent analysis and is, therefore, not necessary to back up any claims. As such, we have removed the thalamic CSD plot from Figure 2F to avoid any confusion and made a comment to this effect in the legend (lines 1175-1177).

      On the merits of using the z-score normalization for spike rates vs. other strategies like standardizing to maximum firing, I am aware that both procedures have limitations, but the z-score changes the range of the firing rate from [0, +Inf] to [-Inf, +Inf]. This does not seem correct considering that negative spiking rates do not exist. The standardization to maximum rate keeps the range within [0, 1], not creating negative rates. Another point that it will be worth discussing is the reported values of the z-scored values. For example, what does it mean to be 54 standard deviations away from the mean? 6 standard deviations is already a big distance from the mean.

      For Figure 2, we chose to represent the neural firing rates as z-scores because we found it important to report the magnitude of both the increase and decrease of the evoked firing rates in the post-stimulus period relative to the pre-stimulus rate. The normalization we used helps to visualize the magnitude of the effects of electrical stimulation on neuronal activity in both directions, which is an important result of the study. Despite the differences between the two normalization methods, the normalization based on the maximum firing rate does not significantly change the qualitative interpretation of Figure 2 in the manuscript (Author response image 2).

      Author response image 2

      Evoked firing rates for neurons in the areas of interest in response to deep stimulation in MO during the awake state. (Left) Firing rates of all neurons normalized by the average, pre-stimulus firing rate. (Right) Firing rates of all neurons normalized by the maximum post-stimulus firing rate.

      Regarding Figure 3 and the associated text, we would like to clarify that the magnitude metric is not simply a z-score value (with units of s.d.) but rather the integrated area under the z-scored response over the response window (with units of s.d.∙seconds). This can help explain why we see values of ~50 s.d.∙s. We chose to z-score firing rates, LFP, and CSD to normalize across the different signals and magnitudes of the evoked responses. We often observed the largest responses in the LFP (see Figure 3A), which may be partly due to the signal naturally having a larger dynamic range than the measured neural firing rates. We then integrated the z-scored response time series to capture the dynamics of the signal over the response window, rather than a static value such as the mean or maximum z-score. After performing a thorough literature search, we found no other ways to capture and compare the magnitudes of the different signals. We have added language to clarify the magnitude metric (lines 155-156) and added the appropriate units.
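
      The units of the magnitude metric can be illustrated with a short calculation. The following is a minimal sketch under stated assumptions: the baseline and response values are hypothetical, the integral uses the trapezoidal rule, and the response is rectified before integration so that excitation and suppression do not cancel. Whether the manuscript rectifies the signal is not stated here, so `rectify` is an illustrative parameter only.

```python
import statistics


def zscore(trace, baseline):
    """Express a trace in units of the baseline standard deviation."""
    mu = statistics.fmean(baseline)
    sd = statistics.pstdev(baseline)
    return [(x - mu) / sd for x in trace]


def integrated_magnitude(z, dt, rectify=True):
    """Trapezoidal area under a z-scored response, in s.d.*seconds."""
    values = [abs(v) for v in z] if rectify else list(z)
    return sum((a + b) / 2.0 * dt for a, b in zip(values, values[1:]))


# Hypothetical pre-stimulus baseline: mean 5, standard deviation 1.
baseline = [4.0, 6.0] * 50
# Hypothetical response: a constant value of 25 held for 0.5 s, sampled every 10 ms.
dt = 0.01
response = [25.0] * 51

z = zscore(response, baseline)          # 20 s.d. at every sample
magnitude = integrated_magnitude(z, dt)  # 20 s.d. * 0.5 s
print(magnitude)
```

      A response that sits 20 s.d. above baseline for half a second already integrates to 10 s.d.∙s, so values of several tens of s.d.∙s do not imply single samples that are tens of standard deviations from the mean for the whole window.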

      In reporting the p-values, I recommend increasing the number of significant digits to four because the p-value seems to be the same for different tests in several places (e.g.: lines 207 to 218), which seems odd. I also wonder whether this could be an artifact of the z-scoring procedure. In the figures, I would like to advise the use of 1 asterisk to denote "weak evidence to reject the null hypothesis (0.05 > p > 0.01)" and two asterisks to denote "strong evidence to reject the null hypothesis (0.01 > p)", and make a note of it accordingly in the manuscript and/or figure legends.

      According to the reviewer’s suggestion, we have changed the statistics language to “* weak evidence to reject null hypothesis (0.05 > p > 0.01), ** strong evidence to reject null hypothesis (0.01 > p > 0.001), *** very strong evidence to reject null hypothesis (0.001 > p)” throughout the manuscript.

      We have also increased the number of significant digits to four throughout the manuscript. It is true that some of the p-values reported for Figure 3 (lines 169-180) are the same for different tests. This is not an artifact of the z-scoring, but rather a consequence of performing the Wilcoxon signed-rank test (an ordinal statistical test) with small sample numbers. Because the p-value depends only on the relative ordering, not the continuous distribution of values, the small sample size (N=6-14) increases the likelihood of obtaining the exact same p-value if the relative ordering of samples is the same.
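
      The discreteness of the test can be demonstrated directly. The following is a minimal sketch of an exact two-sided Wilcoxon signed-rank test written from scratch for illustration (it assumes no zero differences and no tied absolute differences); the two data sets are hypothetical and are not values from the manuscript.

```python
from itertools import product


def exact_wilcoxon_p(diffs):
    """Exact two-sided signed-rank p-value; assumes no zeros and no tied |differences|."""
    n = len(diffs)
    # Rank the absolute differences from 1 (smallest) to n (largest).
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0] * n
    for rank, index in enumerate(order, start=1):
        ranks[index] = rank
    total = n * (n + 1) // 2
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    observed = min(w_plus, total - w_plus)
    # Under the null hypothesis every sign pattern over the ranks is equally likely.
    count = 0
    for signs in product((0, 1), repeat=n):
        w = sum(r for r, s in zip(range(1, n + 1), signs) if s)
        if min(w, total - w) <= observed:
            count += 1
    return count / 2 ** n


# Two hypothetical paired-difference samples (N = 6) with very different magnitudes
# but the same sign pattern: every difference is positive.
small_effects = [0.3, 1.2, 0.7, 2.5, 0.9, 1.8]
large_effects = [12.0, 40.0, 7.0, 95.0, 23.0, 61.0]

p_small = exact_wilcoxon_p(small_effects)
p_large = exact_wilcoxon_p(large_effects)
print(p_small, p_large)
```

      Both samples give the same p-value, 2/2^6 = 0.03125, because the statistic depends only on the signs and ranks of the differences. With N = 6 this is also the smallest two-sided p-value the exact test can return.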

      Line 202: If the magnitude corresponds to z-score data, please add "s.d." after the number, as z-scored values are expressed in standard deviation units. Please update this throughout the paper.

      As stated above, the magnitude metric is the integrated area under the z-scored response over the response window (with units of s.d.∙seconds). We have added the correct units in all places.

      Line 214: Please report how the multiple comparisons correction was performed

      We have added the test used for multiple comparisons in line 169 (formerly line 214) and in the Methods section (line 770).

      Line 462: please replace "Neuropixels activity" with "LFP and single-unit activity".

      We changed the wording to specify “LFP, and single neuron responses…” (now line 337).

      Line 475: a short explanation of the bi-stability phenomena will be helpful for the reader.

      We added the following description: “a state characterized by spontaneous alternation between bouts of activity and periods of silence” (lines 350-351).

      Line 601: It is asserted that "Electrical stimulation directly activates local cells and axons that run near the stimulation site via activation of the axon initial segment" and the paper by Histed et al. 2009 is cited. This does not seem like an appropriate citation, as Histed et al. explicitly state that electrical microstimulation does not activate local neuronal bodies near the electrode tip. See my comment above.

      Upon further reading, we believe we are seeing evidence of direct axonal activation and subsequent antidromic activation of local cell bodies, as you suggested in your comment above and as has been proposed by many, including Histed et al. (2009) and Nowak and Bullier (1998). We edited our sentence accordingly, kept the Histed et al. citation, and added other relevant citations (lines 487-490).

      References

      • Aasebø, I. E. J., Lepperød, M. E., Stavrinou, M., Nøkkevangen, S., Einevoll, G., Hafting, T., & Fyhn, M. (2017). Temporal Processing in the Visual Cortex of the Awake and Anesthetized Rat. ENeuro, 4(4), 59–76. https://doi.org/10.1523/ENEURO.0059-17.2017

      • Barthó, P., Hirase, H., Monconduit, L., Zugaro, M., Harris, K. D., & Buzsáki, G. (2004). Characterization of Neocortical Principal Cells and Interneurons by Network Interactions and Extracellular Features. Journal of Neurophysiology, 92(1), 600–608. https://doi.org/10.1152/jn.01170.2003

      • Bortone, D. S., Olsen, S. R., & Scanziani, M. (2014). Translaminar Inhibitory Cells Recruited by Layer 6 Corticothalamic Neurons Suppress Visual Cortex. Neuron, 82, 474–485. https://doi.org/10.1016/j.neuron.2014.02.021

      • Bruno, R. M., & Simons, D. J. (2002). Feedforward Mechanisms of Excitatory and Inhibitory Cortical Receptive Fields. The Journal of Neuroscience, 22(24), 10966–10975. https://doi.org/10.1523/JNEUROSCI.22-24-10966.2002

      • Dasilva, M., Camassa, A., Navarro-Guzman, A., Pazienti, A., Perez-Mendez, L., Zamora-López, G., Mattia, M., & Sanchez-Vives, M. V. (2021). Modulation of cortical slow oscillations and complexity across anesthesia levels. NeuroImage, 224, 117415. https://doi.org/10.1016/j.neuroimage.2020.117415

      • Franken, P., Malafosse, A., & Tafti, M. (1999). Genetic Determinants of Sleep Regulation in Inbred Mice. SLEEP, 22(2). https://academic.oup.com/sleep/article/22/2/155/2731698

      • Grenier, F., Timofeev, I., & Steriade, M. (1998). Leading role of thalamic over cortical neurons during postinhibitory rebound excitation. Proceedings of the National Academy of Sciences of the United States of America, 95(23), 13929–13934. https://doi.org/10.1073/pnas.95.23.13929

      • Guido, W., & Weyand, T. (1995). Burst responses in thalamic relay cells of the awake behaving cat. Journal of Neurophysiology, 74(4), 1782–1786. https://doi.org/10.1152/JN.1995.74.4.1782

      • Histed, M. H., Bonin, V., & Reid, R. C. (2009). Direct Activation of Sparse, Distributed Populations of Cortical Neurons by Electrical Microstimulation. Neuron, 63(4), 508–522. https://doi.org/10.1016/j.neuron.2009.07.016

      • Jia, X., Siegle, J., Bennett, C., Gale, S., Denman, D. R., Koch, C., & Olsen, S. (2016). High-density extracellular probes reveal dendritic backpropagation and facilitate neuron classification. Journal of Neurophysiology, 121(5), 1831–1847. https://doi.org/10.1101/376863

      • Kobayashi, G., Tanaka, K. F., & Takata, N. (2023). Pupil Dynamics-derived Sleep Stage Classification of a Head-fixed Mouse Using a Recurrent Neural Network. The Keio Journal of Medicine, 2022-0020-OA. https://doi.org/10.2302/KJM.2022-0020-OA

      • Michelson, N. J., & Kozai, T. D. Y. (2018). Isoflurane and ketamine differentially influence spontaneous and evoked laminar electrophysiology in mouse V1. Journal of Neurophysiology, 120(5), 2232. https://doi.org/10.1152/JN.00299.2018

      • Niell, C. M., & Stryker, M. P. (2008). Highly selective receptive fields in mouse visual cortex. Journal of Neuroscience, 28(30), 7520–7536. https://doi.org/10.1523/JNEUROSCI.0623-08.2008

      • Nowak, L. G., & Bullier, J. (1998). Axons, but not cell bodies, are activated by electrical stimulation in cortical gray matter. II. Evidence from selective inactivation of cell bodies and axon initial segments. Experimental Brain Research, 118(4), 489–500. https://doi.org/10.1007/S002210050305/METRICS

      • Roux, L., Stark, E., Sjulson, L., & Buzsáki, G. (2014). In vivo optogenetic identification and manipulation of GABAergic interneuron subtypes. Current Opinion in Neurobiology, 26, 88–95. https://doi.org/10.1016/j.conb.2013.12.013

      • Sirota, A., Montgomery, S., Fujisawa, S., Isomura, Y., Zugaro, M., & Buzsáki, G. (2008). Entrainment of Neocortical Neurons and Gamma Oscillations by the Hippocampal Theta Rhythm. Neuron, 60(4), 683–697. https://doi.org/10.1016/j.neuron.2008.09.014

      • Sorrenti, V., Cecchetto, C., Maschietto, M., Fortinguerra, S., Buriani, A., & Vassanelli, S. (2021). Understanding the Effects of Anesthesia on Cortical Electrophysiological Recordings: A Scoping Review. International Journal of Molecular Sciences, 22(3), 1286. https://doi.org/10.3390/IJMS22031286

      • Tehovnik, E. J., & Slocum, W. M. (2013). Two-photon imaging and the activation of cortical neurons. Neuroscience, 245(March), 12–25. https://doi.org/10.1016/j.neuroscience.2013.04.022

      • Tenke, C. E., & Kayser, J. (2012). Generator localization by current source density (CSD): Implications of volume conduction and field closure at intracranial and scalp resolutions. Clinical Neurophysiology, 123(12), 2328–2345. https://doi.org/10.1016/J.CLINPH.2012.06.005

      • Turner, K. L., Gheres, K. W., Proctor, E. A., & Drew, P. J. (2020). Neurovascular coupling and bilateral connectivity during nrem and rem sleep. ELife, 9, 1. https://doi.org/10.7554/ELIFE.62071

      • Yüzgeç, Ö., Prsa, M., Zimmermann, R., & Huber, D. (2018). Pupil Size Coupling to Cortical States Protects the Stability of Deep Sleep via Parasympathetic Modulation. Current Biology, 28(3), 392. https://doi.org/10.1016/J.CUB.2017.12.049

      • Zhang, X., Landsness, E. C., Chen, W., Miao, H., Tang, M., Brier, L. M., Culver, J. P., Lee, J. M., & Anastasio, M. A. (2022). Automated sleep state classification of wide-field calcium imaging data via multiplex visibility graphs and deep learning. Journal of Neuroscience Methods, 366, 109421. https://doi.org/10.1016/J.JNEUMETH.2021.109421

    1. Author response:

      The following is the authors’ response to the current reviews.

      Reviewer #1 (Public Review):

      Overall the authors provide a very limited data set and in fact only a proof of concept that their sensor can be applied in vivo. This is not really a research paper, but a technical note. With respect to their observation of clustered activity, they now provide an overview image, next to zoomed details. However, from these images one cannot conclude 'by eye' any clustering event. This aligns with the very low r values. All neurons in the field show variable activity and a clustering is not really evident from these examples. Even within a cluster, there is variability. The authors now confirm that expression levels are indeed variable but are independent from the ratio measurements. Further, they controlled for specificity by including DAPT treatments, but opposite to their own in vitro data (in primary neurons) the ratios increased. The authors argue that both distance and orientation can either decrease or increase ratios and that the use of this biosensor should be explored model-by-model. This doesn't really confer high confidence and may hinder other groups in using this sensor reliably.

      Secondly, there is still no physiological relevance for this observation. The experiments are performed in wild-type mice, but it would be more relevant to compare this with a fadPSEN1 KI or a PSEN1cKO model to investigate the contribution of a gain of toxic function or LOF to the claimed cell non-autonomous activations. The authors acknowledge this shortcoming but argue that this is for a follow-up study.

      For instance, they only monitor activity in cell bodies, and miss all info on g-sec activity in neurites and synapses: what is the relevance of the cell body associated g-sec and can it be used as a proxy for neuronal g-sec activity? If cells 'communicate' g-sec activities, I would expect to see hot spots of activity at synapses between neurons.

      Without some more validation and physiologically relevant studies, it remains a single observation and rather a technical note paper, instead of a true research paper.

      The effect size was small, as stated in the original and revised manuscripts and the point-by-point responses to the 1st round review. Such subtle effects will likely be challenging to detect by eye. However, our unbiased quantification allowed us to detect a statistically significant linear correlation between the 720/670 ratio in each neuron and the average ratio in neighboring neurons, which we have verified using many different approaches (Figure 3, Figure 3—figure supplement 2, and Figure 4), and the correlation was canceled by the administration of g-secretase inhibitor (Figure 5). Such objective analysis made us more confident to conclude that g-secretase affects g-secretase in neighboring neurons.
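
      The type of analysis described above can be sketched as follows. This is an illustration only and is not the analysis pipeline used in the manuscript: the neighborhood radius, the cell coordinates, and the ratio values are all hypothetical, and the toy data are deliberately clustered, so the resulting correlation is far stronger than the small effect reported in the study.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def neighbor_pairs(cells, radius):
    """Pair each cell's ratio with the mean ratio of the other cells within `radius`."""
    pairs = []
    for i, (xi, yi, ratio_i) in enumerate(cells):
        neighbors = [rj for j, (xj, yj, rj) in enumerate(cells)
                     if j != i and math.hypot(xi - xj, yi - yj) <= radius]
        if neighbors:  # cells without any neighbor are skipped
            pairs.append((ratio_i, sum(neighbors) / len(neighbors)))
    return pairs


# Hypothetical cells as (x, y, 720/670 ratio); positions in arbitrary units.
cells = [
    (0.0, 0.0, 1.0), (5.0, 0.0, 1.1), (0.0, 5.0, 0.9),
    (100.0, 100.0, 2.0), (105.0, 100.0, 2.1), (100.0, 105.0, 1.9),
]
pairs = neighbor_pairs(cells, radius=20.0)  # radius is an assumed parameter
own = [p[0] for p in pairs]
neighborhood = [p[1] for p in pairs]
r = pearson_r(own, neighborhood)
print(len(pairs), round(r, 3))
```

      In an analysis of this form, repeating the same calculation after inhibitor treatment and finding that the correlation disappears is what supports attributing the correlation to enzyme activity rather than to imaging artifacts.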

      We would also like to make clear the design of the C99 720-670 biosensor. Both C99, the sensing domain that is cleaved by g-secretase, and the anchoring domain fused to miRFP670 are integrated into the membrane (Figure 1A). Therefore, how these two domains with four transmembrane regions are embedded in the membrane should affect the orientation between the donor, miRFP670, and the acceptor, miRFP720. As noted in our point-by-point responses to the initial review, we have previously validated that pharmacological inhibition of g-secretase significantly increases the FRET ratio in various cell lines, including CHO, MEF, BV2 cells, and mouse cortical primary neurons (Maesako et al., 2020; Houser et al., 2020, and unpublished observations). On the other hand, FRET reduction by g-secretase inhibition was found in mouse primary neurons derived from the cerebellum (unpublished observations) as well as the somatosensory cortex neurons in vivo (this study). While we could not use the exact same imaging set-up between cortical primary neurons in vitro and those in vivo due to different expression levels of the biosensor, we could do it for in vitro cortical primary neurons vs. in vitro cerebellum neurons. We found by the direct comparison that 720/670 ratios are significantly higher in the cerebellum than the cortex neurons even in the presence of 1 mM DAPT (Author response image 1), a concentration that nearly completely inhibits g-secretase activity. This suggests a different integration and stabilization pattern of the sensing and anchoring domains in the C99 720-670 biosensor between the cortex and cerebellum primary neurons, and thus, orientation between the donor and acceptor varies in the two neuronal types. We expect a similar scenario between cortical primary neurons in vitro and those in vivo. 
Of note, we have recently demonstrated that the cortex and cerebellum primary neurons exhibit distinct membrane properties (Lundin and Wieckiewicz et al., 2024 in revision), suggesting the different baseline FRET could be related to the different membrane properties between the cortex and cerebellum primary neurons. On the other hand, this raises a concern that 720/670 ratios can be affected not only by g-secretase activity but also by other confounders, such as altered membrane properties. However, a small but significant correlation between the 720/670 ratio in a neuron and those ratios in its neighboring neurons is canceled by g-secretase inhibitor (Figure 5), suggesting that the correlation between the 720/670 ratio in a neuron and those in its neighboring neurons is most likely dependent on g-secretase activity. Taken together, we currently think orientation plays a significant role in our biosensor and would like to emphasize the importance of determining on a model-by-model basis whether the cleavage of the C99 720-670 biosensor by g-secretase increases or decreases 720/670 FRET ratios.

      Author response image 1.

      Furthermore, we co-expressed the C99 720-670 biosensor and visible range fluorescence reporters to record other biological events, such as changes in ion concentration, in cortex primary neurons. Interestingly, several biological events uniquely detected in the neurons with higher 720/670 ratios, which are expected to exhibit lower endogenous g-secretase activity, are recapitulated by pharmacological inhibition of g-secretase (unpublished observations), ensuring that higher 720/670 ratios are indicative of lower g-secretase activity in mouse cortex primary neurons. Such multiplexed imaging will help to further elucidate how the C99 720-670 biosensor behaves in response to the modulation of g-secretase activity.

      Lastly, the scope of this study was to develop and validate a novel imaging assay employing a NIR FRET biosensor to measure g-secretase activity on a cell-by-cell basis in live wild-type mouse brains. However, we do appreciate the reviewer’s suggestion and think employing this new platform in FAD PSEN1 knock-in (KI) or PSEN1 conditional knockout (cKO) mice would provide valuable information. Furthermore, we are keen to expand our capability to monitor g-secretase with subcellular resolution in live mouse brains in vivo, which we will explore in follow-up studies. Thank you for your thoughtful suggestions.

      Reference

      - Maesako M, Sekula NM, Aristarkhova A, Feschenko P, Anderson LC, Berezovska O. Visualization of PS/γ-Secretase Activity in Living Cells. iScience. 2020 Jun 26;23(6):101139.

      - Houser MC, Hou SS, Perrin F, Turchyna Y, Bacskai BJ, Berezovska O, Maesako M. A Novel NIR-FRET Biosensor for Reporting PS/γ-Secretase Activity in Live Cells. Sensors (Basel). 2020 Oct 22;20(21):5980.

      - Lundin B, Wieckiewicz N, Dickson JR, Sobolewski RGR, Sadek M, Armagan G, Perrin F, Hyman BT, Berezovska O, and Maesako M. APP is a regulator of endo-lysosomal membrane permeability. 2024 in revision

      Reviewer #2 (Public Review):

      Regarding the variability and spatial correlation- the dynamic range of the sensor previously reported in vitro is in the range of 20-30% change (Houser et al 2020) whereas the range of FR detected in vivo is between cells is significantly larger in this MS. This raises considerable doubts for specific detection of cellular activity.

      One direct way to test the dynamic range of the sensor in vivo, is to increase or decrease endogenous gamma-secretase activity and to ensure this experimental design allows to accurately monitor gamma-secretase activity. In the previous characterization of the reporter (Hauser et al 2020), DAPT application and inhibition of gamma-secretase activity results in increased FR (Figures 2 and 3 of Houser et al). This is in agreement with the design of the biosensor, since FR should be inversely correlated with enzymatic activity. Here, the authors repeated the experiment, and surprisingly found an opposite effect, in which DAPT significantly reduced FR.

      The authors maintain that this result could be due to differences in cell-types, However, this experiment was previously performed in cultures cortical neurons and many different cell types, as noted by the authors in their rebuttal.

      Instead, I would argue that these results further highlight the concerns of using FR in vivo, since based on their own data, there is no way to interpret this quantification. If DAPT reduces FR, does this mean we should now interpret the results of higher FR corresponds to higher g-sec activity? Given a number of papers from the authors claiming otherwise, I do not understand how one can interpret the results as indicating a cell-specific effect.

      In conclusion, without any ground truth, it is impossible to assess and interpret what FR measurements of this sensor in vivo mean. Therefore, the use of this approach as a way to study g-sec activity in vivo seems premature.

      Please find our response to reviewer 1’s similar critique above. Here, we would like to clarify again the design of our C99 720-670 biosensor. The orientation between the donor, miRFP670, and acceptor, miRFP720, is dependent on how C99, the sensing domain that is cleaved by g-secretase, and the anchoring domain are integrated into the membrane (Figure 1A). Although it was surprising to us, it is possible that g-secretase inhibition decreases 720/670 ratios if 1) the donor-acceptor orientation plays a significant role in FRET and 2) the baseline structure of the C99 720-670 biosensor is different between cell types. This appears to be the case between the cortex and cerebellum primary neurons (i.e., DAPT increases 720/670 ratios in the cortex neurons while decreasing them in the cerebellum neurons), and we expect it in cortical neurons in vitro vs. in vivo as well. Hence, we recommend that users first validate whether the cleavage of the C99 720-670 biosensor by g-secretase increases or decreases 720/670 FRET ratios in their models. If DAPT increases 720/670 ratios (as in cortex primary neurons, CHO, MEF, and BV2 cells that we have validated), higher ratios should be interpreted as lower g-secretase activity. If DAPT reduces 720/670 ratios (as in cerebellum primary neurons and the somatosensory cortex neurons in vivo), higher ratios should be interpreted as higher g-secretase activity. From a biosensing perspective, although we need to know which is the case on a model-by-model basis, we think whether g-secretase activity increases or decreases the 720/670 ratio is not critical; rather, whether it can significantly change FRET efficiency is more important. Thank you for your critical comments.
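
      The model-by-model rule stated above can be written as a small decision helper. This is a minimal sketch with hypothetical function names and hypothetical ratio values; it compares group means only, whereas a real calibration would also require an appropriate statistical test on the vehicle and inhibitor groups.

```python
import statistics


def inhibitor_direction(ratios_vehicle, ratios_inhibitor):
    """Return +1 if the inhibitor raises the mean 720/670 ratio, -1 if it lowers it."""
    delta = statistics.fmean(ratios_inhibitor) - statistics.fmean(ratios_vehicle)
    if delta == 0:
        raise ValueError("inhibitor did not change the mean ratio; direction undefined")
    return 1 if delta > 0 else -1


def meaning_of_higher_ratio(direction):
    """Translate the calibration direction into the reading of a higher ratio."""
    if direction > 0:
        return "lower g-secretase activity"
    return "higher g-secretase activity"


# Hypothetical calibration data for two models (not measurements from the study).
model_a = inhibitor_direction([1.0, 1.1, 0.9], [1.3, 1.4, 1.2])  # inhibitor raises ratio
model_b = inhibitor_direction([1.3, 1.4, 1.2], [1.0, 1.1, 0.9])  # inhibitor lowers ratio

print(meaning_of_higher_ratio(model_a))
print(meaning_of_higher_ratio(model_b))
```

      The helper makes explicit that the sign of the readout is a property of the model being imaged and has to be established before any ratio is interpreted.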

      Reviewer #3 (Public Review):

      This paper builds on the authors' original development of a near infrared (NIR) FRET sensor by reporting in vivo real-time measurements for gamma-secretase activity in the mouse cortex. The in vivo application of the sensor using state-of-the-art techniques is supported by a clear description and straightforward data, and the project represents significant progress because so few biosensors work in vivo. Notably, the NIR biosensor is detectable to ~ 100 µm depth in the cortex. A minor limitation is that this sensor has a relatively modest ΔF as reported in Houser et al, which is an additional challenge for its use in vivo. Thus, the data is fully dependent on post-capture processing and computational analyses. This can unintentionally introduce biases but is not an insurmountable issue with the proper controls that the authors have performed here.

      The following opportunity for improving the system didn't initially present itself until the authors performed an important test of the FRET sensor in vivo following DAPT treatment. The authors get credit for diligently reporting the unexpected decrease in 720/670 FRET ratio. In turn this has led to a suggestion that this sensor would benefit from a control that is insensitive to gamma-secretase activity. FRET influences that are independent of gamma-secretase activity could be distinguished by this control.

      From previous results in cultured neurons, the authors expected an increase in FRET following DAPT treatment in vivo. These expectations fit with the sensor's mode-of-action because a block of gamma-secretase activity should retain the fluorophores in proximity. When the authors observed decreased FRET, the conclusion was that the sensor performs differently in different cellular contexts. However, a major concern is that mechanistically it is unclear how this could occur with this type of sensor. The relative orientation of fluorophores indeed can contribute to FRET efficiency in tension-based sensors. However, the proteolysis expected with gamma-secretase activity would release tension and orientation constraints. Thus, the major contributing FRET factor is expected to be distance, not orientation. Alternative possibilities that could inadvertently affect readouts include an additional DAPT target in vivo sequestering the inhibitor, secondary pH effects on FRET, photo-bleaching, or an unidentified fluorophore quencher in vivo stimulated by DAPT. Ultimately this new FRET sensor would benefit from a control that is insensitive to gamma-secretase activity. FRET influences that are independent of gamma-secretase activity could be distinguished by this control.

      Given that the anchoring domain is composed of three transmembrane regions and the linker connecting the donor, miRFP670, and the acceptor, miRFP720, is highly flexible, we are still not sure whether the orientation constraint of the C99 720-670 biosensor is released by γ-secretase cleavage. This means that the orientation between the donor and acceptor in the cleaved form of the sensor can differ from model to model. As explained in our response to the similar critique of reviewer 1, we found that the 720/670 ratio is significantly higher in cerebellum neurons than in cortex neurons even in the presence of DAPT (Figure 1 for the review only). Therefore, we currently think that the donor-acceptor orientation, in both the cleaved and non-cleaved forms of the sensor, plays a role in determining whether γ-secretase activity increases or decreases the 720/670 ratio (although this view may change depending on future discoveries).
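
      For readers weighing distance against orientation, the standard Förster relations are reproduced here as background. They are textbook expressions, not results from this study, and the symbols are introduced only for illustration: r is the donor-acceptor distance, R_0 the Förster radius, Q_D the donor quantum yield, n the refractive index, J the spectral overlap integral, θ_T the angle between the donor emission and acceptor absorption transition dipoles, and θ_D and θ_A the angles between each dipole and the vector joining the two fluorophores.

```latex
E = \frac{1}{1 + \left( r / R_0 \right)^{6}},
\qquad
R_0^{6} \propto \kappa^{2} \, Q_D \, n^{-4} \, J,
\qquad
\kappa^{2} = \left( \cos\theta_T - 3 \cos\theta_D \cos\theta_A \right)^{2},
\quad 0 \le \kappa^{2} \le 4 .
```

      Because the orientation factor enters through R_0, a change in relative dipole orientation can raise or lower the efficiency at a fixed distance, which is the kind of effect the response above invokes.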

      As the reviewer pointed out, an NIR γ-secretase biosensor with no biological activity is important; however, a point mutation in the transmembrane region of the C99 sensing domain could also alter the orientation between the donor, miRFP670, and the acceptor, miRFP720, since C99 is connected to the acceptor, which may bring additional complexity. Also, as noted in our point-by-point responses to the initial review, mutation(s) that can fully block C99 processing by γ-secretase have not been established. Therefore, we asked whether the subtle but significant correlation we found between the 720/670 ratio in a neuron and the ratios in its neighboring neurons is abolished by γ-secretase inhibitor administration. Since the correlation was abolished (Figure 5), this suggests that the correlation between the 720/670 ratio in a neuron and the ratios in its neighboring neurons depends on γ-secretase activity.

      It is not fully established how γ-secretase activity is spatiotemporally regulated; therefore, developing more appropriate control biosensors and further validating our findings with complementary approaches will be crucial in our follow-up studies. Thank you for your valuable comments.


      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      (1) Overall the authors provide a very limited data set and in fact only a proof of concept that their sensor can be applied in vivo. This is not really a research paper, but a technical note. With respect to their observation of clustered activity, the images do not convince me as they show only limited areas of interest: from these examples (for instance fig 5) one sees that merely all neurons in the field show variable activity and a clustering is not really evident from these examples. Even within a cluster, there is variability. With r values between 0.23 to .36, the correlation is not that striking. The authors herein do not control for expression levels of the sensor: for instance, can they show that in all neurons in the field, the sensor is equally expressed, but FRET activity is correlated in sets of neurons? Or are the FRET activities that are measured only in positively transduced neurons, while neighboring neurons are not expressing the sensor? Without such validation, it is difficult to make this conclusion.

      We appreciate the reviewer’s comment. We agree with the reviewer that this study is not testing a new hypothesis but rather developing and validating a novel tool. However, we do believe such a “technical note” is as important as a “research paper” since advancing technique(s) is the only way to break the barrier in our understanding of complex biological events. Therefore, this study aimed to develop and validate a novel imaging assay employing a recently engineered NIR FRET biosensor to measure γ-secretase activity (Houser et al., 2020) on a cell-by-cell basis in live mouse brains, enabling us for the first time to examine how γ-secretase activity is regulated in individual neurons in vivo, and uncover that γ-secretase activity may influence γ-secretase in neighboring neurons. Like the reviewer, we found that the cell-to-cell correlation is not that striking, as we clearly stated in the original manuscript: “Although the effect size is modest, we also found a statistically significant correlation between…” 

      We were also aware that there is variability in a cluster of neurons exhibiting similar γ-secretase activities. Per the reviewer’s request, the images have been expanded to the entire imaging field of view (new Figure 3A). Although the effect size is small, our unbiased quantification showed a statistically significant linear correlation between the 720/670 ratio in each neuron and the average ratio in five neighboring neurons (Figure 3, Figure 3—figure supplement 2, and Figure 4), and the correlation was canceled by the administration of γ-secretase inhibitor (Figure 5). These findings made it impossible to conclude that γ-secretase does not affect γ-secretase in neighboring neurons.
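
      The neighbor analysis described above (the ratio in each neuron against the average ratio in its five neighboring neurons) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: it assumes neighbors are the five nearest cells by Euclidean distance between 2D centroids and that the correlation is a Pearson coefficient, and all function names and coordinates are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


def neighbor_means(positions, ratios, k=5):
    """Mean ratio of the k nearest neighbors of each cell (the cell itself excluded)."""
    n = len(ratios)
    if len(positions) != n:
        raise ValueError("positions and ratios must have the same length")
    if n <= k:
        raise ValueError("need more than k cells")
    means = []
    for i, p in enumerate(positions):
        # Sort the other cells by distance to cell i; ties fall back to cell index.
        others = sorted(
            (math.dist(p, q), j) for j, q in enumerate(positions) if j != i
        )
        nearest = [j for _, j in others[:k]]
        means.append(sum(ratios[j] for j in nearest) / k)
    return means


def neighbor_correlation(positions, ratios, k=5):
    """Correlation between each cell's ratio and its neighbors' mean ratio."""
    return pearson_r(ratios, neighbor_means(positions, ratios, k))


if __name__ == "__main__":
    # Hypothetical cells on a line: a low-ratio group next to a high-ratio group.
    positions = [(float(i), 0.0) for i in range(12)]
    ratios = [1.0] * 6 + [2.0] * 6
    print(round(neighbor_correlation(positions, ratios), 3))
```

      Running the same function on ratios recorded after inhibitor administration gives the comparison the response relies on: a correlation that is present at baseline and absent under inhibition.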

      Regarding the expression levels and pattern of the sensor, an AAV-based gene delivery approach employed in this study results in the expression of the sensor not in all but in selected neurons. We have newly performed immunohistochemistry, showing that approximately 40% of NeuN-positive neurons express the C99 720-670 biosensor (new Figure 1—figure supplement 2A and 2B).

      Reference

      - Houser MC, Hou SS, Perrin F, Turchyna Y, Bacskai BJ, Berezovska O, Maesako M. A Novel NIR-FRET Biosensor for Reporting PS/γ-Secretase Activity in Live Cells. Sensors (Basel). 2020 Oct 22;20(21):5980.

      (2) Secondly, I am lacking some more physiological relevance for this observation. The experiments are performed in wild-type mice, but it would be more relevant to compare this with a fadPSEN1 KI or a PSEN1cKO model to investigate the contribution of a gain of toxic function or LOF to the claimed cell non-autonomous activations. Or what would be the outcome if the sensor was targeted to glial cells?

      The AAV vector in this study encodes the human synapsin promoter, and our new immunohistochemistry demonstrates that nearly 100% of the cells expressing the C99 720-670 sensor are NeuN-positive; we hardly detected sensor expression in Iba-1- or GFAP-positive cells (new Figure 1—figure supplement 2A and 2C).

      The mechanism underlying the cell non-autonomous regulation of γ-secretase remains unclear. As discussed in our manuscript, one potential hypothesis is that secreted abeta42 plays a role (Zoltowska et al., 2023 eLife). Whereas this report focuses on the development and validation of a novel assay using wild-type mice, future follow-up studies employing FAD PSEN1 knock-in (KI) and PSEN1 conditional knockout (cKO) mice would allow us to test the hypothesis above, since abeta42 is known to increase in some FAD PSEN1 KI mice (Siman et al., 2000 J Neurosci; Vidal et al., 2012 FASEB J) while it decreases in PSEN1 cKO mice (Yu et al., 2001 Neuron).

      Reference

      - Siman R, Reaume AG, Savage MJ, Trusko S, Lin YG, Scott RW, Flood DG. Presenilin-1 P264L knockin mutation: differential effects on abeta production, amyloid deposition, and neuronal vulnerability. J Neurosci. 2000 Dec 1;20(23):8717-26. 

      - Vidal R, Sammeta N, Garringer HJ, Sambamurti K, Miravalle L, Lamb BT, Ghetti B. The Psen1-L166P-knock-in mutation leads to amyloid deposition in human wild-type amyloid precursor protein YAC transgenic mice. FASEB J. 2012 Jul;26(7):2899-910.

      - Yu H, Saura CA, Choi SY, Sun LD, Yang X, Handler M, Kawarabayashi T, Younkin L, Fedeles B, Wilson MA, Younkin S, Kandel ER, Kirkwood A, Shen J. APP processing and synaptic plasticity in presenilin-1 conditional knockout mice. Neuron. 2001 Sep 13;31(5):713-26. 

      - Zoltowska KM, Das U, Lismont S, Enzlein T, Maesako M, Houser MC, Franco ML, Moreira DG, Karachentsev D, Becker A, Hopf C, Vilar M, Berezovska O, Mobley W, Chávez-Gutiérrez L. Alzheimer's disease linked Aβ42 exerts product feedback inhibition on γ-secretase impairing downstream cell signaling. eLife. 2023. 12:RP90690

      (3) For this reviewer it is not clear what resolution they are measuring activity, at cellular or subcellular level? In other words are the intensity spots neuronal cell bodies? Given g-sec activity are in all endosomal compartments and at the cell surface, including in the synapse, does NIR imaging have the resolution to distinguish subcellular or surface localized activities? If cells 'communicate' g-sec activities, I would expect to see hot spots of activity at synapses between neurons: is this possible to assess with the current setup? 

      Since this study aimed to determine how γ-secretase activity is regulated on a cell-by-cell basis in live mouse brains, the FRET signal was detected in neuronal cell bodies. While our current set-up for in vivo can only record γ-secretase activity with a cellular resolution, we previously detected predominant γ-secretase activity in the endo-lysosomal compartments (Maesako et al., 2022 J Neurosci) as well as in certain spots of neuronal processes (Maesako et al., 2020 iScience) in cultured primary neurons using the same microscope set-up. Therefore, future studies will expand our capability to monitor γ-secretase with subcellular resolution in live mouse brains in vivo.

      Reference

      - Maesako M, Sekula NM, Aristarkhova A, Feschenko P, Anderson LC, Berezovska O. Visualization of PS/γ-Secretase Activity in Living Cells. iScience. 2020 Jun 26;23(6):101139. 

      - Maesako M, Houser MCQ, Turchyna Y, Wolfe MS, Berezovska O. Presenilin/γ-Secretase Activity Is Located in Acidic Compartments of Live Neurons. J Neurosci. 2022 Jan 5;42(1):145-154. 

      (4) Without some more validation and physiological relevant studies, it remains a single observation and rather a technical note paper, instead of a true research paper.

      Please find our response above to the critique (1).  

      Reviewer #2 (Public Review):

      (1) Regarding the variability and spatial correlation: the dynamic range of the sensor previously reported in vitro is in the range of a 20-30% change (Houser et al 2020), whereas the range of FR detected in vivo between cells is significantly larger (Fig. 3). This raises considerable doubts about the specific detection of cellular activity (see point 3).

      Please find our response below to the critique (2).

      (2) One direct way to test the dynamic range of the sensor in vivo is to increase or decrease endogenous gamma-secretase activity and to ensure that this experimental design allows accurate monitoring of gamma-secretase activity. In the previous characterization of the reporter (Houser et al 2020), DAPT application and inhibition of gamma-secretase activity resulted in increased FR (Figures 2 and 3 of Houser et al). This is in agreement with the design of the biosensor, since FR should be inversely correlated with enzymatic activity. Here, while the authors repeat the same manipulation and apply DAPT to block gamma-secretase activity, it seems to induce the opposite effect and reduces FR (comparing figure 8 with figures 5, 6, and 7). First, there is no quantification comparing FR with and without DAPT. Moreover, it is possible to conduct this experiment in the same animals, meaning comparing FR before and after DAPT in the same mouse and cell populations. This point is absolutely critical: if FR is indeed reduced following DAPT application, this needs to be explained, since it contradicts the basic design and interpretation of the biosensor.

      We appreciate the reviewer’s comment. In our hands, overexpression of the four γ-secretase components (PSEN, Nct, Aph1, and Pen2) is the only reliable and reproducible approach to increase the cellular activity of γ-secretase, which we have successfully employed in vitro but not yet in vivo. Therefore, a γ-secretase inhibitor was used to determine the dynamic range of our FRET biosensor in vivo. FRET efficiency depends on the proximity and orientation of the donor and acceptor fluorescent proteins. In our initial study, we engineered the original C99 EGFP-RFP biosensor (C99 R-G), and the replacement of EGFP and RFP with mTurquoise-GL and YPet, respectively, expanded the dynamic range of the sensor approximately 2 times. Moreover, extending the linker length from 20 a.a. to 80 a.a. increased the dynamic range 2.2 times (Maesako et al., 2020 iScience). Of note, the C99 720-670 NIR analog, which has the same 80 a.a. linker but miRFP670 and miRFP720 as the donor and acceptor, exhibited a slightly better dynamic range than the C99 Y-T sensor (Houser et al., 2020 Sensors). Our interpretation at that time was that cleavage of the C99 720-670 biosensor by γ-secretase results in a longer distance between the donor and acceptor, and thus the FRET ratio always increases upon γ-secretase inhibition (i.e., proximity plays a more significant role than orientation in our biosensors). As expected, a significantly increased FRET ratio was detected upon treatment with γ-secretase inhibitors in various cell types, including CHO, MEF, and BV2 cells and mouse cortical primary neurons. Moreover, to further ensure that the C99 720-670 biosensor records changes in γ-secretase activity, the multiplexing capability of the biosensor was utilized. In other words, we co-expressed the C99 720-670 biosensor and visible-range fluorescence reporters to record other biological events, such as changes in ion concentration, in cortex primary neurons. Strikingly, several biological events uniquely detected in the neurons with diminished endogenous γ-secretase activity, i.e., neurons with higher FRET ratios, are recapitulated by pharmacological inhibition of γ-secretase (unpublished observation). This approach has allowed us to ensure that increased FRET ratios are indicative of decreased endogenous γ-secretase activity in mouse cortical primary neurons.

      However, as recommended by the reviewer, we have performed a new experiment to compare the FRET ratio before and after administration of DAPT, a potent γ-secretase inhibitor, in the same mouse and cell populations. Surprisingly, we found that DAPT significantly decreases 720/670 ratios, and this result is included in our revised manuscript (Figure 2—figure supplement 2C). This unexpected FRET reduction by γ-secretase inhibition was also found in mouse primary neurons derived from the cerebellum (unpublished observation). These findings suggest that orientation plays a significant role in our γ-secretase FRET biosensor and that whether the FRET ratio is increased or decreased by γ-secretase-mediated cleavage depends on the cell type. Of note, the difference in FRET ratios with and without DAPT was comparable between primary cortex neurons (24.3%) and somatosensory cortex neurons in vivo (22.1%). Our new findings suggest that how our biosensors report γ-secretase activity (i.e., increased vs. decreased FRET ratio) must be examined on a model-by-model basis, which is clearly noted in the revised manuscript: 

      Reference

      - Houser MC, Hou SS, Perrin F, Turchyna Y, Bacskai BJ, Berezovska O, Maesako M. A Novel NIR-FRET Biosensor for Reporting PS/γ-Secretase Activity in Live Cells. Sensors (Basel). 2020 Oct 22;20(21):5980.

      - Maesako M, Sekula NM, Aristarkhova A, Feschenko P, Anderson LC, Berezovska O. Visualization of PS/γ-Secretase Activity in Living Cells. iScience. 2020 Jun 26;23(6):101139. 

      (3) For further validation, I would suggest including in vivo measurements with a sensor version with no biological activity as a negative control, for example, a mutation that prevents enzymatic cleavage and FRET changes. This should be used to showcase instrumental variability and would help to validate the variability of FR is indeed biological in origin. This would significantly strengthen the claims regarding spatial correlation within population of cells.

      We fully agree with the reviewer that having a sensor version containing a mutation that prevents enzymatic cleavage, and thus FRET changes, as a negative control is preferable. In our previous study, we developed and validated the APP-based C99 Y-T and Notch1-based N100 Y-T biosensors (Maesako et al., 2020 iScience). It is well established that Notch1 cleavage is entirely blocked by the Notch1 V1744G mutation (Schroeter et al., 1998 Nature; Huppert et al., 2000 Nature), and therefore, we introduced the mutation into the N100 Y-T biosensor and used it as a negative control. On the other hand, no comparably effective mutation has been identified for APP processing. To successfully monitor γ-secretase activity in deep tissue in vivo, we replaced Turquoise-GL and YPet in the C99 Y-T and N100 Y-T biosensors with miRFP670 and miRFP720, respectively. While the APP-based C99 720-670 biosensor allows recording of γ-secretase activity (Houser et al., 2020 Sensors), we found that the N100 720-670 sensor exhibits a very small dynamic range, which does not allow reliable measurement of γ-secretase activity. Taken together, an NIR γ-secretase biosensor with no biological activity is not currently available.

      Reference

      - Houser MC, Hou SS, Perrin F, Turchyna Y, Bacskai BJ, Berezovska O, Maesako M. A Novel NIR-FRET Biosensor for Reporting PS/γ-Secretase Activity in Live Cells. Sensors (Basel). 2020 Oct 22;20(21):5980.

      - Huppert SS, Le A, Schroeter EH, Mumm JS, Saxena MT, Milner LA, Kopan R. Embryonic lethality in mice homozygous for a processing-deficient allele of Notch1. Nature. 2000 Jun 22;405(6789):966-70. 

      - Maesako M, Sekula NM, Aristarkhova A, Feschenko P, Anderson LC, Berezovska O. Visualization of PS/γ-Secretase Activity in Living Cells. iScience. 2020 Jun 26;23(6):101139. 

      - Schroeter EH, Kisslinger JA, Kopan R. Notch-1 signalling requires ligand-induced proteolytic release of intracellular domain. Nature. 1998 May 28;393(6683):382-6. 

      (4) In general, confocal microscopy is not ideal for in vivo imaging. Although the authors demonstrate that data collected using IR imaging increases penetration depth, out-of-focus fluorescence is still evident (Figure 4). Many previous papers have primarily used FLIM-based analysis in combination with 2p microscopy for in vivo FRET imaging (some examples: Ma et al, Neuron, 2018; Massengil et al, Nature methods, 2022; Diaz-Garcia et al, Cell Metabolism, 2017; Laviv et al, Neuron, 2020). This technique does not rely on absolute photon number and therefore has several advantages in terms of quantification of FRET signals in vivo.

      It is therefore likely that use of previously developed sensors of gamma-secretase with conventional FRET pairs, might be better suited for in vivo imaging. This point should be at least discussed as an alternative.

      The reviewer notes that 2p-FLIM may provide certain advantages over our confocal spectral imaging approach for detecting in vivo FRET. In our response below, we will address both the FRET detection method (FLIM vs. spectral) and microscope modality (2p vs. confocal). 

      As noted by the reviewer, we acknowledge that 2p-FLIM has been utilized to detect FRET in vivo. On the other hand, the ratiometric spectral FRET approach has also been utilized in many in vivo FRET studies (Kuchibhotla et al., 2008 Neuron; Kuchibhotla et al., 2014 PNAS; Hiratsuka et al., 2015 eLife; Maesako et al., 2017 eLife; Konagaya et al., 2017 Cell Rep; Calvo-Rodriguez et al., 2020 Nat Commun; Hino et al., 2022 Dev Cell). We think both approaches have advantages and disadvantages, as discussed in a previous review (Bajar et al., 2016 Sensors), but they complement each other. Indeed, we regularly employ FLIM in cell culture studies (Maesako et al., 2017 eLife; McKendell et al., 2022 Biosensors; Devkota et al., 2024 Cell Rep), and our recent study also utilized 2p-FLIM for in vivo NIR imaging (although not for detecting FRET) (Hou et al., 2023 Nat Biomed Eng); therefore, we are confident that 2p-FLIM can be adapted in our follow-up studies for γ-secretase recording.

      Regarding microscope modality, we agree with the reviewer’s point that two-photon microscopy can generally achieve larger penetration depths than confocal microscopy and is therefore better suited to in vivo FRET imaging. However, since our aim in this study was to quantify γ-secretase activity in the superficial layers of the cortex (<200 microns in depth), either NIR confocal or multiphoton microscopy could be used to achieve this imaging objective. Additionally, we chose to use confocal microscopy with our NIR C99 720-670 probe because of the probe’s slightly higher sensitivity compared to our C99 Y-T probe (Houser et al., 2020 Sensors). Imaging γ-secretase activity with our NIR C99 720-670 probe has the additional advantage that it will allow us, in future studies, to multiplex with visible FRET pairs using multiphoton microscopy in the same brain region. Furthermore, our demonstration of in vivo FRET imaging using NIR confocal microscopy avoids some of the issues associated with multiphoton microscopy, including potential phototoxicity due to high average and peak laser powers and the high complexity and cost of the instrumentation. For future studies aimed at interrogating γ-secretase activity in deeper cortical regions, multiphoton microscopy could be applied for FLIM or ratiometric spectral imaging of either our NIR or visible FRET probes. Per the reviewer’s request, we have added multiphoton FRET imaging as an alternative in the discussion section.

      Reference

      - Bajar BT, Wang ES, Zhang S, Lin MZ, Chu J. A Guide to Fluorescent Protein FRET Pairs. Sensors (Basel). 2016 Sep 14;16(9):1488.  

      - Calvo-Rodriguez M, Hou SS, Snyder AC, Kharitonova EK, Russ AN, Das S, Fan Z, Muzikansky A, Garcia-Alloza M, Serrano-Pozo A, Hudry E, Bacskai BJ. Increased mitochondrial calcium levels associated with neuronal death in a mouse model of Alzheimer's disease. Nat Commun. 2020 May 1;11(1):2146.

      - Devkota S, Zhou R, Nagarajan V, Maesako M, Do H, Noorani A, Overmeyer C, Bhattarai S, Douglas JT, Saraf A, Miao Y, Ackley BD, Shi Y, Wolfe MS. Familial Alzheimer mutations stabilize synaptotoxic γ-secretase-substrate complexes. Cell Rep. 2024 Feb 27;43(2):113761. 

      - Hino N, Matsuda K, Jikko Y, Maryu G, Sakai K, Imamura R, Tsukiji S, Aoki K, Terai K, Hirashima T, Trepat X, Matsuda M. A feedback loop between lamellipodial extension and HGF-ERK signaling specifies leader cells during collective cell migration. Dev Cell. 2022 Oct 10;57(19):2290-2304.e7.

      - Hiratsuka T, Fujita Y, Naoki H, Aoki K, Kamioka Y, Matsuda M. Intercellular propagation of extracellular signal-regulated kinase activation revealed by in vivo imaging of mouse skin. eLife. 2015 Feb 10;4:e05178.  

      - Hou SS, Yang J, Lee JH, Kwon Y, Calvo-Rodriguez M, Bao K, Ahn S, Kashiwagi S, Kumar ATN, Bacskai BJ, Choi HS. Near-infrared fluorescence lifetime imaging of amyloid-β aggregates and tau fibrils through the intact skull of mice. Nat Biomed Eng. 2023 Mar;7(3):270-280.  

      - Houser MC, Hou SS, Perrin F, Turchyna Y, Bacskai BJ, Berezovska O, Maesako M. A Novel NIR-FRET Biosensor for Reporting PS/γ-Secretase Activity in Live Cells. Sensors (Basel). 2020 Oct 22;20(21):5980.

      - Konagaya Y, Terai K, Hirao Y, Takakura K, Imajo M, Kamioka Y, Sasaoka N, Kakizuka A, Sumiyama K, Asano T, Matsuda M. A Highly Sensitive FRET Biosensor for AMPK Exhibits Heterogeneous AMPK Responses among Cells and Organs. Cell Rep. 2017 Nov 28;21(9):2628-2638.  

      - Kuchibhotla KV, Goldman ST, Lattarulo CR, Wu HY, Hyman BT, Bacskai BJ. Abeta plaques lead to aberrant regulation of calcium homeostasis in vivo resulting in structural and functional disruption of neuronal networks. Neuron. 2008 Jul 31;59(2):214-25  

      - Kuchibhotla KV, Wegmann S, Kopeikina KJ, Hawkes J, Rudinskiy N, Andermann ML, Spires-Jones TL, Bacskai BJ, Hyman BT. Neurofibrillary tangle-bearing neurons are functionally integrated in cortical circuits in vivo. Proc Natl Acad Sci U S A. 2014 Jan 7;111(1):510-4  

      - Maesako M, Horlacher J, Zoltowska KM, Kastanenka KV, Kara E, Svirsky S, Keller LJ, Li X, Hyman BT, Bacskai BJ, Berezovska O. Pathogenic PS1 phosphorylation at Ser367. eLife. 2017 Jan 30;6:e19720.

      - McKendell AK, Houser MCQ, Mitchell SPC, Wolfe MS, Berezovska O, Maesako M. In-Depth Characterization of Endo-Lysosomal Aβ in Intact Neurons. Biosensors (Basel). 2022 Aug 20;12(8):663.

      (Recommendations For The Authors):

      (5) Minor issues- Figure 4 describes the analysis procedure, which seems to be standard practice in the field. This can be described in the methods section rather than in the main figure.

      Per the reviewer’s suggestion, this figure has been moved to Figure 2—figure supplement 1. 

      Reviewer #3 (Public Review):

      (1) This paper builds on the authors' original development of a near infrared (NIR) FRET sensor by reporting in vivo real-time measurements for gamma-secretase activity in the mouse cortex. The in vivo application of the sensor using state of the art techniques is supported by a clear description and straightforward data, and the project represents significant progress because so few biosensors work in vivo. Notably, the NIR biosensor is detectable to ~ 100 µm depth in the cortex. A minor limitation is that this sensor has a relatively modest ΔF as reported in Houser et al, which is an additional challenge for its use in vivo. Thus, the data is fully dependent on post-capture processing and computational analyses. This can unintentionally introduce biases but is not an insurmountable issue with the proper controls that the authors have performed here.

      We appreciate the reviewer’s overall positive evaluation. As described in our response to Reviewer 2’s critique (2), ΔF in vivo has been characterized (Figure 2—figure supplement 2C).

      (2) The observation of gamma-secretase signaling that spreads across cells is potentially quite interesting, but it can be better supported. An alternative interpretation is that there exist pre-formed and clustered hubs of high gamma-secretase activity, and that DAPT has stochastic or differential accessibility to cells within the cluster. This could be resolved by an experiment of induction, for example, if gamma-secretase activity is induced or activated at a specific locale and there was observed coordinated spreading to neighboring neurons with their sensor.

      We agree with the reviewer that the stochastic or differential accessibility of DAPT to cell clusters with different γ-secretase can be an alternative interpretation of our data, which is now included in the Discussion of the revised manuscript. Undoubtedly, the activation of γ-secretase would provide valuable information. However, as described in the response above to Reviewer 2’s critique #2, overexpressing the four components of γ-secretase (PSEN, Nct, Aph1, and Pen2) is the only reliable and reproducible approach to increasing the cellular activity of γ-secretase, which was achieved in our in vitro study but not yet in vivo. Our future study will develop and characterize the approach to induce γ-secretase activity to further perform detailed mechanistic studies.

      (3) Furthermore, to rule out the possibility that uneven viral transduction was simply responsible for the observed clustering, it would be helpful to see an analysis of 670 nm fluorescence alone.

      Our new analysis comparing 670 nm fluorescence intensity and that in five neighbor neurons shows a positive correlation (Figure 3—figure supplement 1A), suggesting that AAV was unevenly transduced. On the other hand, the 720/670 ratio (i.e., γ-secretase activity) is not correlated with 670 nm fluorescence intensity (i.e., C99 720-670 biosensor expression) (Figure 3—figure supplement 1B). This strongly suggests that, while C99 720-670 biosensor expression was not evenly distributed in the brain, the uneven probe expression did not impact the capability of γ-secretase recording.  

      Reviewer #3 (Recommendations For The Authors):

      (4) One minor suggestion might be to consider Figures 6-7 as orthogonal supporting analyses rather than "validation". It might then be helpful to present them together with Figure 5.

      We have moved the initial Figures 6 and 7 to Figure 3—figure supplement 2 and Figure 4, respectively.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This study presents a valuable conceptual advance of how Vitamin A and its derivatives contribute to atherosclerosis. There is solid evidence invoking the contributions of specialized populations of T cells in atherosclerosis resolution, including use of multiple in vivo models to validate the functional effect. The significance of the study would be strengthened with more detailed interrogation of lesions composition and consolidation with previous work on the topic from human studies.

      Answer: We thank the reviewers and editorial office for their comments and constructive criticism. Below we provide point by point responses to the comments and concerns, which include the issues of lesion composition and consolidation with human studies. We also proofread the manuscript and included information about the immunostaining procedures that were previously missing (Lines 199 – 206).

      Public Reviews

      REVIEWER #1:

      This is an interesting study by Pinos and colleagues that examines the effect of beta carotene on atherosclerosis regression. The authors have previously shown that beta carotene reduces atherosclerosis progress and hepatic lipid metabolism, and now they seek to extend these findings by feeding mice a diet with excess beta carotene in a model of atherosclerosis regression (LDLR antisense oligo plus Western diet followed by LDLR sense oligo and chow diet). They show some metrics of lesion regression are increased upon beta carotene feeding (collagen content) while others remain equal to normal chow diet (macrophage content and lesion size). These effects are lost when beta carotene oxidase (BCO) is deleted. The study adds to the existing literature that beta carotene protects from atherosclerosis in general, and adds new information regarding regulatory T-cells. However, the study does not present significant evidence about how beta-carotene is affecting T-cells in atherosclerosis. For the most part, the conclusions are supported by the data presented, and the work is completed in multiple models, supporting its robustness. However there are a few areas that require additional information or evidence to support their conclusions and/or to align with the previously published work.

      Specific additional areas of focus for the authors:

      1. The premise of the story is that b-carotene is converted into retinoic acid, which acts as a ligand of the RAR transcription factor in T-regs. The authors measure hepatic markers of retinoic acid signaling (retinyl esters, Cyp26a1 expression) but none of these are measured in the lesion, which calls into question the conclusion that Tregs in the lesion are responsible for the regression observed with b-carotene supplementation.

      Answer: We agree with the Reviewer’s comment, which prompted us to quantify the expression of the retinoic acid-sensitive marker Cyp26b1 in the atherosclerotic lesions. Cyp26b1, Cyp26a1, and Cyp26c1 all contain retinoic acid response elements (RAREs) in their promoters and are therefore highly sensitive to retinoic acid. Indeed, the mRNA/protein expression of the Cyp26 family is widely considered a surrogate marker for retinoic acid levels in cells or tissues.

      We typically use Cyp26a1 as a surrogate marker for retinoic acid signaling in the adipose tissue and the liver, as we did in this study. However, our RNA seq data in murine bone-marrow derived macrophages (mBMDMs) exposed to retinoic acid revealed that Cyp26b1 is the only Cyp26 family member responsive to retinoic acid (PMID: 36754230). Actually, Cyp26a1 or c1 were not expressed in our mBMDMs (data not shown). Unlike the M2 marker arginase 1, Cyp26b1 did not respond to IL-4 (Figure iA). Hence, Cyp26b1 is an adequate marker to evaluate retinoic acid signaling in the lesion of mice, rich in macrophages.

      Before staining the lesions, we validated the Cyp26b1 antibody by staining mBMDMs exposed to retinoic acid (Figure iB).

      Author response image 1.

      (A) mBMDMs were divided in M0 or M2 (exposed to IL-4 for 24 h), and then treated with either DMSO or retinoic acid for 6 h before harvesting for RNA seq analysis. Exploring the RNA seq dataset, we identified Cyp26b1 as a RA-sensitive gene in mBMDMs (PMID: 36754230). (B) Validation of Cyp26b1 antibody in mBMDMs exposed to retinoic acid confirms the suitability of this antibody for measuring retinoic acid signaling in our experimental settings.

      In the current version of the manuscript, we include the results of the Cyp26b1 quantifications (Figure 5H, I; Lines: 362 - 366). To put these findings in perspective with human studies, we discuss them in light of the role human CYP26B1 plays in the atherosclerotic lesion (Lines: 450 - 464).

      1. There does not appear to be a strong effect of Tregs on the b-carotene induced pro-regression phenotype presented in Figure 5. The only major CD25+ cell dependent b-carotene effect is on collagen content, which matches with the findings in Figure 1 +2. This mechanistically might be very interesting and novel, yet the authors do not investigate this further or add any additional detail regarding this observation. This would greatly strengthen the study and the novelty of the findings overall as it relates to b-carotene and atherosclerosis.

      Answer: As the Reviewer points out, the effects of β-carotene on collagen content are more pronounced than those on CD68 content in the lesion. Indeed, we have observed this in the majority of the experiments in this manuscript.

      Collagen accumulation in the lesion is a complex process, where smooth muscle cells secrete collagen and plaque macrophages (typically) degrade it. Matrix metalloproteases produced by macrophages contribute to the degradation of collagen, and studies show that retinoic acid regulates the expression of metalloproteinases in various cell types (PMID: 2324527, 24008270). We explored the expression of metalloproteases in macrophages exposed to retinoic acid in our mBMDM RNA seq, but we did not observe any significant result (data not shown).

      Interestingly, M2 macrophages can secrete collagen by upregulating arginase 1 expression. In the current version of the manuscript, we acknowledge this in the results (Lines: 358-359) and in the discussion section (Lines: 443-449).

      1. The title indicates that beta-carotene induces Treg 'expansion' in the lesion, but this is not measured in the study.

      Answer: Following the suggestion by the Reviewer, we have re-worded the title to “β-carotene accelerates the resolution of atherosclerosis in mice”

      REVIEWER #2:

      Pinos et al present five atherosclerosis studies in mice to investigate the impact of dietary supplementation with b-carotene on plaque remodeling during resolution. The authors use either LDLR-ko mice or WT mice injected with ASO-LDLR to establish diet-induced hyperlipidemia and promote atherogenesis during 16 weeks, and then they promote resolution by switching the mice for 3 weeks to a regular chow, either deficient or supplemented with b-carotene. Supplementation was successful, as measured by hepatic accumulation of retinyl esters. As expected, chow diet led to reduced hyperlipidemia, and plaque remodeling (both reduced CD68+ macs and increased collagen contents) without actual changes in plaque size. But, b-carotene supplementation resulted in further increased collagen contents and, importantly, a large increase in plaque regulatory T-cells (TREG). This accumulation of TREG is specific to the plaque, as it was not observed in blood or spleen. The authors propose that the anti-inflammatory properties of these TREG explain the atheroprotective effect of b-carotene, and found that treatment with anti-CD25 antibodies (to induce systemic depletion of TREG) prevents b-carotene-stimulated increase in plaque collagen and TREG.

      1. An obvious strength is the use of two different mouse models of atherogenesis, as well as genetic and interventional approaches. The analyses of aortic root plaque size and contents are rigorous and included both male and female mice (although the data was not segregated by sex). Unfortunately, the authors did not provide data on lesions in en face preparations of the whole aorta.

      Answer: We appreciate the positive comments on rigor. We considered displaying our data segregated by sex; however, for some experiments we did not have matching numbers of male and female mice, which could be distracting for the reader. The goal of our study was to analyze changes in plaque composition. Therefore, our experimental approach was designed to study atherosclerosis resolution (plaque composition changes, but not plaque size) instead of atherosclerosis regression (both plaque composition and size change). As expected, we did not observe differences in plaque size at the level of the aortic root in any of our experiments, which deterred us from quantifying plaque content en face in the aorta.

      2. Overall, the conclusion that dietary supplementation with b-carotene may be atheroprotective via induction of TREG is reasonably supported by the evidence presented. Other conclusions put forth by the authors (e.g., that vitamin A production favors TREG production or that BCO1 deficiency reduces plasma cholesterol), however, will need further experimental evidence to be substantiated.

      Answer: We apologize for the lack of clarity in the presentation of our results and overstating our conclusions. We have rephrased some of these conclusions in the results and discussion sections.

      3. The authors claim that b-carotene reduces blood cholesterol, but data shown herein show no differences in plasma lipids between mice fed b-carotene-deficient and -supplemented diets (Figs. 1B, 2A, and S3A).

      Answer: As Reviewer 2 points out, we did not observe changes in plasma cholesterol between mice undergoing Resolution in response to β-carotene. For clarity, we rephrased our plasma lipids results for each of our experimental designs (Lines: 230 – 236, 270 – 272, and 288-290). We also include a clarification in the discussion section about the differential effects of β-carotene on plasma lipids when mice undergo atherosclerosis progression and resolution. (Lines: 419 - 430).

      4. Also, the authors present no experimental data to support the idea that BCO1 activity favors plaque TREG expansion (e.g., no TREG data in Fig 3 using Bco1-ko mice).

      Answer: We appreciate the suggestion by Reviewer 2. In the current version of the manuscript, we stained the aortic roots from Bco1-/- mice for FoxP3. We did not observe differences between Control and β-carotene resolution groups, in agreement with the results in plaque composition (CD68 and collagen contents). These new data strengthen our manuscript, and we now include these results as Supplementary Figure 3D, E (Lines: 465 - 471).

      5.As the authors show, the treatment with anti-CD25 resulted in only partial suppression of TREG levels. Because CD25 is also expressed in some subpopulation of effector T-cells, this could potentially cloud the interpretation of the results. Data in Fig 4H showing loss of b-carotene-stimulated increase in numbers of FoxP3+GFP+ cells in the plaque should be taken cautiously, as they come from a small number of mice. Perhaps an orthogonal approach using FoxP3-DTR mice could have produced a more robust loss of TREG and further confirmation that the loss of plaque remodeling is indeed due to loss of TREG.

      Answer: We agree with the reviewer, and we rephrased the results and discussion to avoid overstating our findings. We now acknowledge that a second experimental approach would help confirm the findings we obtained with a blocking antibody targeting CD25. We favored anti-CD25 infusions over other depletion methods based on the experimental protocol carried out by our collaborators, in which they examined the effect of Tregs on atherosclerosis regression (PMID: 32336197). The utilization of FoxP3-DTR mice would nicely complement our findings. In the current version of the manuscript, we discuss this alternative approach (Lines: 491 - 501).

      Recommendations for the Authors

      All reviewers agreed that despite the claims of the title, there is no direct interrogation of Tregs or vitamin A signaling in lesions.

      The work does not consolidate well with the role of B-carotene in human heart disease. Additional discussion and synthesis are required to elaborate on the significance of the findings. For example, the idea of beta carotene supplementation for cardiovascular prevention has attracted attention for years but recent meta-analysis showed no benefit, and, if anything, an increase in cardiovascular events. The U.S. Preventive Services Task Force (USPSTF) went as far to recommend AGAINST the use of beta-carotene for the prevention of cardiovascular disease.

      In light of the above point and elife editorial policies, please revise the title to include species.

      Answer: Thanks for your feedback. Carotenoid metabolism in mammals is complex, and establishing direct parallelisms between humans and rodents must be done with caution. For example, β-carotene supplementation in humans inevitably results in the accumulation of this compound in plasma, while in rodents, β-carotene is quickly metabolized to vitamin A. Our findings over the years reveal that the effects of β-carotene in mice derive exclusively from its role as vitamin A precursor.

      In the current study, we confirm our previous work utilizing Bco1-/- mice, which are unable to produce vitamin A when fed β-carotene. Then, we observe that vitamin A promotes atherosclerosis resolution in mice independently of alterations in plasma cholesterol in two independent mouse models. Lastly, we utilized anti-CD25 blocking antibodies to deplete Tregs to establish a direct connection between dietary β-carotene/vitamin A and Tregs in the lesion. While this experimental approach failed to completely deplete Tregs, our morphometric assays indicate that these infusions were sufficient to partially mitigate the effect of β-carotene on atherosclerosis resolution.

      Regardless, in the discussion section of our manuscript, we attempt to consolidate our preclinical studies with clinical data (Lines: 374 – 376, and 461 – 464).

      We have also revised the title, as suggested by Reviewer 1. We also included “mice” in the title to align with the editorial policies of eLife.

      Reviewer #1:

      1.1. The authors need to measure retinoic acid signaling directly in the lesion and in Tregs to be able to draw the conclusion that b-carotene is directly activating Tregs to promote regression.

      Answer: Please see comments above.

      1.2. The authors need to investigate the role of beta carotene on collagen production by T-regs.

      Answer: Please see comments above.

      Reviewer #2 (Recommendations For The Authors):

      Major:

      2.1. If the authors still have frozen sections of the aortas from their Bco1-ko experiment, it should be trivial to look at plaque TREG contents to confirm that vitamin A production is indeed needed for the effect of b-carotene on plaque remodeling.

      Answer: Please see comments above.

      Minor:

      2.2. This reviewer wonders if the axis for lesion size in all figures is off by an order of magnitude. Most studies show aortic root lesions in the 10^5 um2 range, not in the 10^6 um2.

      Answer: We apologize for this error. We have corrected the units in all our quantifications.

      2.3. FPLC lipoprotein profiles would enhance the manuscript.

      Answer: We have run FPLCs for the plasmas and included them in the results (Lines: 233 – 236). Data are presented in Figure 1C, D.

      2.4. This reviewer could not cope with the thought that mice that are fed 16+ weeks a diet that is vitamin A-deficient did not become vit A-deficient (e.g., Fig. 1E). Perhaps the authors could elaborate a little on this in their discussion.

      Answer: Mice are extremely resistant to vitamin A deficiency. A common protocol to achieve deficiency in mice requires feeding a vitamin A deficient diet to dams during their pregnancy and lactation to deplete new-born pups of vitamin A stores. Even in that situation, pups display enough vitamin A stores to sustain circulating vitamin A at levels comparable to those observed in wild-type mice. In the current version of the manuscript, we have included a paragraph in the discussion to cover this “interesting” aspect (Lines: 476 – 483).

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews (consolidated):

      In the microglia research community, it is accepted that microglia change their shape both gradually and acutely along a continuum that is influenced by external factors both in their microenvironments and in circulation. Ideally, a given morphological state reflects a functional state that provides insight into a microglia's role in physiological and pathological conditions. The current manuscript introduces MorphoCellSorter, an open-source tool designed for automated morphometric analysis of microglia. This method adds to the many programs and platforms available to assess the characteristics of microglial morphology; however, MorphoCellSorter is unique in that it uses Andrew's plotting to rank populations of cells together (in control and experimental groups) and presents "big picture" views of how entire populations of microglia alter under different conditions. Notably, MorphoCellSorter is versatile, as it can be used across a wide array of imaging techniques and equipment. For example, the authors use MorphoCellSorter on images of fixed and live tissues representing different biological contexts such as embryonic stages, Alzheimer's disease models, stroke, and primary cell cultures.

      This manuscript outlines a strategy for efficiently ranking microglia beyond the classical homeostatic vs. active morphological states. The outcome offers only a minor improvement over the already available strategies that have the same challenge: how to interpret the ranking functionally.

      We would like to thank the reviewers for their careful reading and constructive comments and questions. While MorphoCellSorter currently does not rank cells functionally based on their morphology, its broad range of application, ease of use and capacity to handle large datasets provide a solid foundation. Combined with advances in single-cell transcriptomics, MorphoCellSorter could potentially enable the future prediction of cell functions based on morphology.

      Strengths and Weaknesses:

      (1) The authors offer an alternative perspective on microglia morphology, exploring the option to rank microglia instead of categorizing them with means of clusterings like k-means, which should better reflect the concept of a microglia morphology continuum. They demonstrate that these ranked representations of morphology can be illustrated using histograms across the entire population, allowing the identification of potential shifts between experimental groups. Although the idea of using Andrews curves is innovative, the distance between ranked morphologies is challenging to measure, raising the question of whether the authors oversimplify the problem.

      We have access to the distance between cells through the Andrew’s score of each cell. However, the challenge is that these distances are relative values and specific to each dataset. While we believe that these distances could provide valuable information, we have not yet determined the most effective way to represent and utilize this data in a meaningful manner.
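
      As an illustration of why such distances are relative, the following minimal sketch evaluates the standard Andrews curve for a few toy cells and ranks them. The function names, the evaluation point t, the toy feature values, and the use of a single curve value as the per-cell score are assumptions made for this example; they may not match how MorphoCellSorter derives its score.

```python
import math

def andrews_value(coeffs, t):
    """Standard Andrews curve: x1/sqrt(2) + x2*sin(t) + x3*cos(t) + x4*sin(2t) + ..."""
    total = coeffs[0] / math.sqrt(2)
    for i, x in enumerate(coeffs[1:], start=1):
        harmonic = (i + 1) // 2  # 1, 1, 2, 2, 3, 3, ...
        wave = math.sin if i % 2 == 1 else math.cos
        total += x * wave(harmonic * t)
    return total

def rank_cells(cells, t):
    """Hypothetical helper: cell indices sorted by score, plus the scores."""
    scores = [andrews_value(c, t) for c in cells]
    order = sorted(range(len(cells)), key=lambda i: scores[i])
    return order, scores

# Toy feature vectors (made-up numbers, e.g. principal component coordinates).
cells = [[0.2, 1.5, -0.3], [1.0, -0.5, 0.8], [-0.7, 0.1, 0.4]]
order, scores = rank_cells(cells, t=1.0)

# Rescaling every feature keeps the ranking but changes the gaps between
# scores, so raw score distances are only comparable within one dataset.
scaled = [[10 * x for x in c] for c in cells]
order_scaled, scores_scaled = rank_cells(scaled, t=1.0)
```

      In this toy case the ranking is identical before and after rescaling, while every gap between scores grows tenfold, which is the sense in which the distances are dataset-specific.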

      Also, the discussion about the pipeline's uniqueness does not go into the details of alternative models. The introduction remains weak in outlining the limitations of current methods (L90). Acknowledging this limitation will be necessary.

      Thank you for these insightful comments. Alternative methods were already addressed in the discussion (L586-598); nevertheless, in response to the reviewers' request, we have revised the introduction and discussion sections to address the limitations of current methods more clearly and to discuss the uniqueness of the pipeline. Additionally, we have reorganized Figure 1 to more effectively highlight the main caveats associated with clustering, the primary method currently in use.

      (2) The manuscript suffers from several overstatements and simplifications, which need to be resolved. For example:

      a)  L40: The authors talk about "accurately ranked cells". Based on their results, the term "accuracy" is still unclear in this context.

      Thank you for this comment. Our use of the term "accurately" was intended to convey that the ranking was correct based on comparison with human experts, though we agree that it may have been overstated. We have removed "accurately" and propose to replace it with "properly" to better reflect the intended meaning.

      b) L50: Microglial processes are not necessarily evenly distributed in the healthy brain. Depending on their embedded environment, they can have longer process extensions (e.g., frontal cortex versus cerebellum).

      Thank you for bringing this point to our attention. We removed “evenly” from this introductory sentence to be more inclusive of the various morphologies of microglial cells.

      c)  L69: The term "metabolic challenge" is very broad, ranging from glycolysis/FAO switches to ATP-mediated morphological adaptations, and it needs further clarification about the author's intended meaning.

      Thank you for this comment. We have clarified that we were referring to the metabolic challenge triggered by ischemia, and we added a reference.

      d) L75: Is morphology truly "easy" to obtain?

      Yes, it is in comparison to other parameters such as transcripts or metabolism, but we understand the point made by the reviewer and we found another way of writing it. As an alternative we propose: “morphology is an indicator accessible through…”

      e) L80: The sentence structure implies that clustering or artificial intelligence (AI) are parameters, which is incorrect. Furthermore, the authors should clarify the term "AI" in their intended context of morphological analysis.

      We apologize for this confusing writing, we reformulated the sentence as follows: “Artificial intelligence (AI) approaches such as machine learning have also been used to categorize morphologies (Leyh et al., 2021)”.

      f) L390f: An assumption is made that the contralateral hemisphere is a non-pathological condition. How confident are the authors about this statement? The brain is still exposed to a pathological condition, which does not stop at one brain hemisphere.

      We did not state that the contralateral hemisphere is non-pathological, but rather that its microglial cells have a non-pathological morphology, which is a slightly different claim. The contralateral side is classically used as a control in ischemia experiments (Rutkai et al 2022). Differences in transcript levels have been reported between sham-operated animals and the contralateral hemisphere of tMCAO mice (Filippenkov et al 2022, https://doi.org/10.3390/ijms23137308), showing that the contralateral side is indeed in a different state than sham controls; however, no differences in terms of morphology have been reported.

      We have removed “non-pathological” to avoid misinterpretations.

      g)  Methodological questions:

      a) L299: An inversion operation was applied to specific parameters. The description needs to clarify the necessity of this since the PCA does not require it.

      Indeed, we are sorry for this lack of explanation. Some morphological indexes rank cells from the least to the most ramified, while others rank them in the opposite order. By inverting certain parameters, we can standardize the ranking direction across all parameters, simplifying data interpretation. This clarification has been added to the revised manuscript as follows:

      “Lacunarity, roundness factor, convex hull radii ratio, processes cell areas ratio and skeleton processes ratio were subjected to an inversion operation in order to homogenize the parameters before conducting the PCA: indeed, some parameters rank cells from the least to the most ramified, while others rank them in the opposite order. By inverting certain parameters, we can standardize the ranking direction across all parameters, thus simplifying data interpretation.”
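
      The direction-alignment step described above can be sketched as follows. The source does not state which inversion operation is used, so this example assumes a min-max rescaling followed by a reflection (1 - x); a reciprocal or a sign flip would serve the same purpose. The function names and the toy values are hypothetical.

```python
def min_max(values):
    """Rescale values to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]

def align_direction(values, invert):
    """Rescale one parameter and, if requested, flip its ranking direction.

    Assumption: 'inversion' is modelled as 1 - x on the rescaled values.
    """
    scaled = min_max(values)
    return [1.0 - v for v in scaled] if invert else scaled

# Toy example: one index grows with ramification, the other shrinks with it.
grows_with_ramification = [1.0, 2.0, 3.0]
shrinks_with_ramification = [0.9, 0.5, 0.1]

aligned_a = align_direction(grows_with_ramification, invert=False)
aligned_b = align_direction(shrinks_with_ramification, invert=True)
# After alignment both columns order the three cells the same way.
```

      Because PCA is indifferent to the sign of each input, this step changes interpretation rather than the variance captured: after alignment, a high value means the same thing for every parameter.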

      b) Different biological samples have been collected across different species (rat, mouse) and disease conditions (stroke, Alzheimer's disease). Sex is a relevant component in microglia morphology. At first glance, information on sex is missing for several of the samples. The authors should always refer to Table 1 in their manuscript to avoid this confusion. Furthermore, how many biological animals have been analyzed? It would be beneficial for the study to compare different sexes and see how accurate Andrew's ranking would be in ranking differences between males and females. If they have a rationale for choosing one sex, this should be explained.

      As reported in the literature, we acknowledge the presence of sex differences in microglial cell morphology. Due to ethical considerations and our commitment to reducing animal use, we did not conduct dedicated experiments specifically for developing MorphoCellSorter. Instead, we relied on existing brain sections provided by collaborators, which were already prepared and included tissue from only one sex—either female or male—except in the case of newborn pups, whose sex is not easily determined. Consequently, we were unable to evaluate whether MorphoCellSorter is sensitive enough to detect morphological differences in microglia attributable to sex. Although assessing this aspect is feasible, we are uncertain if it would yield additional insights relevant to MorphoCellSorter’s design and intended applications.

      To address this, we have included additional references in Table 1 of the revised manuscript and clearly indicated the sex of the animals from which each dataset was obtained.

      c) In the methodology, the slice thickness has been given in a range. Is there a particular reason for this variability?

      We could not find any range stated in the text. We typically used 30 µm thick sections in order to capture entire or nearly entire microglial cells.

      Although section thickness was identical for all sections of a given dataset, only the planes containing the cells of interest were selected during imaging for both ischemic stroke models. This explains why the range of acquired planes varies depending on how each cell is distributed in Z.

      Also, the slice thickness is inadequate to cover the entire microglia morphology. How do the authors include this limitation of their strategy? Did the authors define a cut-off for incomplete microglia?

      We found that 30 µm sections provide an effective balance, capturing entire or nearly entire microglial cells (consistent with what we observe in vivo) while allowing sufficient antibody penetration to ensure strong signal quality, even at the section's center. In our segmentation process, we excluded microglia located near the section edges (i.e., cells with processes visible on the first or last plane of image acquisition, as well as those close to the field of view’s boundary). Although our analysis pipeline should also function with thicker sections (>30 µm), we confirmed that thinner sections (15 µm or less) are inadequate for detecting morphological differences, as tested initially on the AD model. Segmented, incomplete microglia lack the necessary structural information to accurately reflect morphological differences thus impairing the detection of existing morphological differences.
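
      The exclusion rule described above can be sketched as a simple completeness check. The representation of a segmented cell as a list of (z, y, x) voxel coordinates, the function name, and the `xy_margin` parameter are assumptions for illustration; the source does not specify how close to the field-of-view boundary a cell must be to be excluded.

```python
def is_complete(voxels, stack_shape, xy_margin=1):
    """Return False if a cell touches the first/last plane or the image border.

    voxels: iterable of (z, y, x) coordinates of one segmented cell.
    stack_shape: (depth, height, width) of the acquired stack.
    xy_margin: hypothetical number of border pixels treated as "too close".
    """
    depth, height, width = stack_shape
    for z, y, x in voxels:
        if z == 0 or z == depth - 1:
            return False  # processes visible on the first or last plane
        if y < xy_margin or y >= height - xy_margin:
            return False  # too close to the top or bottom border
        if x < xy_margin or x >= width - xy_margin:
            return False  # too close to the left or right border
    return True

shape = (10, 100, 100)
inner_cell = [(4, 50, 50), (5, 51, 50)]
cut_in_z = [(0, 50, 50), (1, 50, 50)]
cut_in_xy = [(5, 0, 30)]
kept = [c for c in (inner_cell, cut_in_z, cut_in_xy) if is_complete(c, shape)]
```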

      c) The manuscript outlines that the authors have used different preprocessing pipelines, which is great for being transparent about this process. Yet, it would be relevant to provide a rationale for the different imaging processing and segmentation pipelines and platform usages (Supplementary Figure 7). For example, it is not clear why the Z maximum projection is performed at the end for the Alzheimer's Disease model, while it's done at the beginning of the others.

      The same holds true for cropping, filter values, etc. Would it be possible to analyze the images with the same pipelines and compare whether a specific pipeline should be preferable to others?

      The pre-processing steps depend on the quality of the images in each dataset. For example, in the AD dataset, images acquired with a wide-field microscope were considerably noisier compared to those obtained via confocal microscopy. In this case, reducing noise plane-by-plane was more effective than applying noise reduction on a Z-projection, as we would typically do for confocal images. Given that accurate segmentation is essential for reliable analysis in MorphoCellSorter, we chose to tailor the segmentation approach for each dataset individually. We recommend future users of MorphoCellSorter take a similar approach. This clarification has been added to the discussion.
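
      The effect of the order of operations can be shown with a toy example. The source does not name the noise-reduction filter, so a 3-point median filter is assumed here, and each "plane" is reduced to a single row of pixels to keep the sketch short; all names and values are hypothetical.

```python
def median3(row):
    """3-point median filter with edge replication."""
    padded = [row[0]] + list(row) + [row[-1]]
    return [sorted(padded[i:i + 3])[1] for i in range(len(row))]

def max_project(stack):
    """Maximum intensity projection along Z."""
    return [max(column) for column in zip(*stack)]

# Three planes, each with one isolated noise pixel at a different position.
stack = [
    [0, 0, 9, 0, 0, 0, 0],
    [0, 0, 0, 9, 0, 0, 0],
    [0, 0, 0, 0, 9, 0, 0],
]

# Plane-by-plane filtering removes each isolated pixel before projection.
denoise_then_project = max_project([median3(plane) for plane in stack])

# Projecting first merges the noise pixels into a run the filter cannot remove.
project_then_denoise = median3(max_project(stack))
```

      In this toy stack, filtering each plane first yields a clean projection, whereas projecting first leaves a three-pixel artifact, consistent with the reasoning given for the noisier wide-field images.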

      On a side note, Matlab is not open-access.

      This is correct. We are currently translating the Matlab script into Python; the translation will soon be available on GitHub: https://github.com/Pascuallab/MorphCellSorter.

      This also includes combining the different animals to see which insights could be gained using the proposed pipelines.

      As explained above, a common segmentation process for very diverse types of acquisitions (magnification, resolution and type of images) is not optimal in terms of segmentation quality and accuracy of the analysis. Although we could feed MorphoCellSorter with all these data from a single segmentation pipeline, the results might be very difficult to interpret.

      d) L227: Performing manual thresholding isn't ideal because it implies the preprocessing could be improved. Additionally, it is important to consider that morphology may vary depending on the thresholding parameters. Comparing different acquisitions that have been binarized using different criteria could introduce biases.

      As noted earlier, segmentation is not the main focus of this paper, and we leave it to users to select the segmentation method best suited to their datasets. Although we acknowledge that automated thresholding would in theory be ideal, we were confronted with image acquisitions that were not uniform, even within the same sample. For instance, in ischemic brain samples, lipofuscin from cell death introduces background noise that can artificially impact threshold levels. We tested global and local algorithms to binarize the cells automatically, but these approaches often produced imperfect segmentation that was not optimized for every cell. In our experience, manually adjusting the threshold provides a more accurate, reliable, and comparable selection of cellular elements, even though it introduces some subjectivity. To ensure consistency in segmentation, we recommend that the same person perform the analysis across all conditions. This clarification has been added to the discussion.

      e) Parameter choices: L375: When using k-means clustering, it is good practice to determine the number of clusters (k) using silhouette or elbow scores. Simply selecting a value of k based on its previous usage in the literature is not rigorous, as the optimal number of clusters depends on the specific data structure. If they are seeking a more objective clustering approach, they could also consider employing other unsupervised techniques, (e.g. HDBSCAN) (L403f).

      We agree with the referee’s comment; however, the purpose of the k-means clustering we used was only to illustrate that the clusters generated are artificial and do not correspond to the reality of the continuum of microglial morphology. In the course of the study we used the elbow score to determine k, but this did not work well because no clear elbow was visible in some datasets (probably because of the continuum of microglial morphologies). In any case, the choice of k does not change the underlying problem: the clusters are artificial and their boundaries arbitrary, whether k is determined manually or mathematically.
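
      The absence of a clear elbow on continuous data can be illustrated with a small one-dimensional sketch. The k-means routine below is a simplified, deterministic Lloyd's algorithm written for this example (quantile initialization is an assumption), and both datasets are synthetic; it is not the clustering code used in the study.

```python
def kmeans_1d(values, k, iterations=50):
    """Simplified 1D Lloyd's algorithm with deterministic quantile initialization."""
    data = sorted(values)
    n = len(data)
    centroids = [data[int((j + 0.5) * n / k)] for j in range(k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for v in data:
            nearest = min(range(k), key=lambda j: abs(v - centroids[j]))
            groups[nearest].append(v)
        centroids = [sum(g) / len(g) if g else centroids[j]
                     for j, g in enumerate(groups)]
    inertia = sum(min((v - c) ** 2 for c in centroids) for v in data)
    return centroids, inertia

def elbow_drop(values):
    """Ratio of within-cluster inertia at k=2 to k=1 (small = sharp elbow)."""
    return kmeans_1d(values, 2)[1] / kmeans_1d(values, 1)[1]

# Synthetic continuum: morphologies spread evenly along one axis.
continuum = [i / 99 for i in range(100)]
# Synthetic two-state population: two tight, well-separated groups.
clumps = [i * 0.001 for i in range(50)] + [1.0 + i * 0.001 for i in range(50)]

continuum_ratio = elbow_drop(continuum)
clumps_ratio = elbow_drop(clumps)
```

      For the two-state data the inertia collapses at k=2, giving an unambiguous elbow. For the continuum the inertia only falls to about a quarter and keeps shrinking smoothly as k grows, so any chosen k cuts the continuum at arbitrary boundaries.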

      L373: A rationale for the choice of the 20 non-dimensional parameters as well as a detailed explanation of their computation such as the skeleton process ratio is missing. Also, how strongly correlated are those parameters, and how might this correlation bias the data outcomes?

      Thank you for raising this point. There is no specific rationale beyond our goal of being as exhaustive as possible, incorporating most of the parameters found in the literature, as well as some additional ones that we believed could provide a more thorough description of microglial morphology.

      Indeed, some of these parameters are correlated. Initially, we considered this might be problematic, but we quickly found that these correlations essentially act as factors that help assign more weight to certain parameters, reflecting their likely greater importance in a given dataset. Rather than being a limitation, the correlated parameters actually enhance the ranking. We tested removing some of these parameters in earlier versions of MorphoCellSorter, and found that doing so reduced the accuracy of the tool.
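
      The weighting effect of correlated parameters can be demonstrated with a minimal sketch. It uses a hand-written covariance computation and power iteration in place of a full PCA library, and the toy parameters A and B are made up: they are uncorrelated and have equal variance, and A is then included twice to mimic a pair of fully correlated parameters.

```python
def covariance_matrix(columns):
    """Sample covariance matrix of a list of equally long feature columns."""
    n = len(columns[0])
    centered = [[v - sum(col) / n for v in col] for col in columns]
    return [[sum(a * b for a, b in zip(ci, cj)) / (n - 1) for cj in centered]
            for ci in centered]

def first_component(cov, iterations=200):
    """Leading eigenvector of the covariance matrix via power iteration."""
    vec = [1.0] * len(cov)
    for _ in range(iterations):
        vec = [sum(c * v for c, v in zip(row, vec)) for row in cov]
        norm = sum(v * v for v in vec) ** 0.5
        vec = [v / norm for v in vec]
    return vec

param_a = [1.0, -1.0, 1.0, -1.0]
param_b = [1.0, 1.0, -1.0, -1.0]  # uncorrelated with param_a, same variance

# With A represented twice, the first component aligns with A and ignores B.
pc1 = first_component(covariance_matrix([param_a, param_b, param_a]))
```

      With only A and B the two directions carry equal variance and PCA has no reason to prefer either; adding the correlated copy doubles the variance along A's direction, so the first component loads on A (about 0.71 on each copy) and essentially zero on B. Whether this weighting is desirable depends on whether the duplicated information is biologically the most relevant.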

      Differences between circularity and roundness factors are not coming across and require further clarification.

      These are two distinct ways of characterizing morphological complexity, and we borrowed these parameters and kept the name from the existing literature, not necessarily in the context of microglia. In our case, these parameters are used to describe the overall shape of the cell. The advantage of using different metrics to calculate similar parameters is that, depending on the dataset, one method may be better suited to capture specific morphological features of a given dataset. MorphoCellSorter selects the parameter that best explains the greatest dispersion in the data, allowing for a more accurate characterization of the morphology. Author response image 1 shows how circularity and roundness describe cells differently.

      Author response image 1.

      Correlation between Circularity and Roundness Factor in the Alzheimer disease dataset. A second-order polynomial correlation exists between the two parameters in our dataset. Indeed, (1) a single maximum is shared between both parameters. However, Circularity and Roundness Factor are not entirely redundant, as exemplified by (2) the possible variety of Roundness Factors for a given Circularity as well as (3) the very different morphologies at the minima of these two parameters.

      One is applied to the soma and the other to the cell, but why is neither circularity nor roundness factor applied to both?

      None of the parameters concern the cell body by itself. The cell body is always considered relative to one or more other metrics. Because these parameters and what they represent do not seem to be very clear, we have added a graphic representation of the types of measurements they provide in the revised version of the manuscript (Supplemental figure 8).

      f) PCA analysis:

      The authors spend a lot of text to describe the basic principles of PCA. PCA is mathematically well-described and does not require such depth in the description and would be sufficient with references.

      Thank you for this comment. Indeed, the description of PCA may be too exhaustive; we will simplify the text.

      Furthermore, there are the following points that require attention:

      L321: PC1 is the most important part of the data could be an incorrect statement because the highest dispersion could be noise, which would not be the most relevant part of the data. Therefore, the term "important" has to be clarified.

      We are not sure that, in the case of segmented images, the noise would represent most of the data, since segmentation also removes most of the noise; but perhaps the reviewer is concerned about another type of noise? Nonetheless, we thank the reviewer for the comment and propose the following change, which should resolve this potential issue.

      “PC<sub>1</sub> is the direction in which data is most dispersed.”

      L323: As before, it's not given that the first two components hold all the information.

      Thank you for this comment. We modified this statement as follows: “The first two components represent most of the information (about 70%), hence we can consider the plane (PC<sub>1</sub>, PC<sub>2</sub>) as the principal plane, reducing the dataset to a two-dimensional space.”
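
      For reference, the two revised statements can be written compactly. This is the standard textbook formulation, not an excerpt from the manuscript: the symbols X (centered cell-by-parameter matrix), Σ (its covariance matrix), u<sub>1</sub> (the PC<sub>1</sub> direction) and λ<sub>k</sub> (eigenvalues of Σ in decreasing order) are notation assumed here, with p = 20 parameters.

```latex
u_1 = \arg\max_{\lVert u \rVert = 1} \operatorname{Var}(Xu)
    = \arg\max_{\lVert u \rVert = 1} u^{\top} \Sigma\, u,
\qquad
\frac{\lambda_1 + \lambda_2}{\sum_{k=1}^{p} \lambda_k} \approx 0.70
```

      The first expression states that PC<sub>1</sub> is the direction of greatest dispersion, without any claim about its biological importance; the second is the share of total variance captured by the first two components, which the authors report as about 70%.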

      L327 and L331 contain mistakes in the nomenclature: Mix up of "wi" should be "wn" because "i" does not refer to anything. The same for "phi i = arctan(yn/wn)" should be "phi n".

      Thanks a lot for these comments. We have made the changes in the text as proposed by the reviewer.

      L348: Spearman's correlation measures monotonic correlation, not linear correlation. Either the authors used Pearson Correlation for linearity or Spearman correlation for monotonic. This needs to be clarified to avoid misunderstandings.

      We apologize for the misunderstanding. We did use Spearman correlation, which measures monotonic association; we have therefore replaced “linear” with “monotonic” in the text. Thanks a lot for the careful reading.
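
      The distinction raised by the reviewer can be illustrated with a minimal sketch. It is not the authors' Matlab code: the helper names (`pearson`, `ranks`, `spearman`) and the toy data are assumptions made for illustration. Spearman correlation is computed as Pearson correlation on ranks, so a monotonic but non-linear relationship yields a perfect Spearman coefficient and an imperfect Pearson coefficient.

```python
import math


def pearson(x, y):
    """Pearson correlation: measures linear association."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on ranks."""
    return pearson(ranks(x), ranks(y))


# Toy data (hypothetical): y increases with x, but not linearly.
x = [1, 2, 3, 4, 5, 6]
y = [v ** 3 for v in x]
print("Pearson:", pearson(x, y))
print("Spearman:", spearman(x, y))
```

      When comparing an expert ranking with an automated ranking, both inputs are already ranks, which is one reason a rank-based coefficient is a natural choice.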

      g) If the authors find no morphological alteration, how can they ensure that the algorithm is sensitive enough to detect them? When morphologies are similar, it's harder to spot differences. In cases where morphological differences are more apparent, like stroke, classification is more straightforward.

      We are not entirely sure we fully understand the reviewer's comment. When data are similar or nearly identical, MorphoCellSorter performs comparably to human experts (see Table 1). However, the advantage of using MorphoCellSorter is that it ranks cells much faster while achieving accuracy similar to that of human experts, and it assigns each cell a value on an axis (Andrews score), which a human expert cannot do. For example, in the case of mouse embryos, MorphoCellSorter’s ranking was as accurate as that made by human experts. Based on this ranking, the distributions were similar, suggesting that the morphologies are generally consistent across samples.

      The algorithm itself does not detect anything—it simply ranks cells according to the provided parameters. Therefore, it is unlikely that sensitivity is an issue; the algorithm ranks the cells based on existing data. The most critical factor in the analysis is the segmentation step, which is not the focus of our paper. However, the more accurate the segmentation, the more distinct the parameters will be if actual differences exist. Thus, sensitivity concerns are more related to the quality of image acquisition or the segmentation process rather than the ranking itself. Once MorphoCellSorter receives the parameters, it ranks the cells accordingly. When cells are very similar, the ranking process becomes more complex, as reflected in the correlation values comparing expert rankings to those from MorphoCellSorter (Table 1).

      Moreover, MorphoCellSorter does not only provide a ranking: the morphological indexes automatically computed offer useful information to compare the cells’ morphology between groups.

      h) Minor aspects:

      % notation requires to include (weight/volume) annotation.

      This has been done in the revised version of the manuscript

      Citation/source of the different mouse lines should be included in the method sections (e.g. L117).

      The reference of the mouse line has been added (RRID:IMSR_JAX:005582) to the revised version of the manuscript.

      L125: The length of the single housing should be specified to ensure no variability in this context.

      The mice were housed individually for 24 h; this is now stated in the text.

      L673: Typo to the reference to the figure.

      This has been corrected, thank you for your thoughtful reading.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Methods

      (1) Alzheimer's disease model: was a perfusion performed and then an hour later brains extracted? Please clarify.

      This is indeed what has been done.

      (2) For in vitro microglial studies: was a percoll gradient used for the separation of immune cells? What percentage percoll was used? Was there separation of myelin and associated debris with the percoll centrifugation? Please clarify the protocol as it is not completely clear how these cells were separated from the initial brain lysate suspension. What cell density was plated?

      The protocol has been completed, as follows: “Myelin and debris were then eliminated thanks to a Percoll® PLUS solution (E0414, Sigma-Aldrich) diluted with DPBS10X (14200075, Gibco) and enriched in MgCl<sub>2</sub> and CaCl<sub>2</sub> (for 50 mL of myelin separation buffer: 90 mL of Percoll PLUS, 10 mL of DPBS10X, 90 μL of 1 M CaCl<sub>2</sub> solution, and 50 μL of 1 M MgCl<sub>2</sub> solution).” Thank you for your feedback.

      (3) How are the microglia "automatically cropped" in FIJI (for the Phox2b mutant)? Is there a function/macro in the program you used? This is very important for the workflow and needs to be clarified. The methods section of this manuscript is a guide for future users of this workflow and should be as descriptive as possible. It would be useful to give detailed information on the manual classification process, perhaps as a supplement. The authors do a nice job pointing out that these older methods are not effective in categorizing microglia that don't necessarily fit into a predefined phenotype.

      The protocol has been completed, as follows: “Briefly, the centroid of each detected object (i.e. microglia), except those on the borders, was detected, and a crop of 300x300 pixels around each object was generated. Then, the pixels belonging to neighboring cells were manually removed from each generated crop.”
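
      The cropping step described above was done in FIJI; the following Python sketch only illustrates the logic and is not the authors' macro. All function names are hypothetical, objects are assumed to be given as lists of (row, column) pixel coordinates, and the sketch assumes that a window reaching the image edge is clipped rather than padded, which the response does not specify.

```python
def centroid(pixels):
    """Mean (row, col) of an object's pixel coordinates."""
    n = len(pixels)
    return (sum(r for r, _ in pixels) / n, sum(c for _, c in pixels) / n)


def touches_border(pixels, height, width):
    """True if any pixel of the object lies on the image edge."""
    return any(r in (0, height - 1) or c in (0, width - 1) for r, c in pixels)


def crop_box(center, height, width, size=300):
    """Return (top, left, bottom, right) of a size x size window around center.

    Assumption: the window is clipped at the image edges rather than padded.
    """
    half = size // 2
    row, col = int(round(center[0])), int(round(center[1]))
    top, left = max(row - half, 0), max(col - half, 0)
    bottom, right = min(row + half, height), min(col + half, width)
    return top, left, bottom, right


def crops_for_objects(objects, height, width, size=300):
    """Crop boxes for every object that does not touch the image border."""
    return [
        crop_box(centroid(pixels), height, width, size)
        for pixels in objects
        if not touches_border(pixels, height, width)
    ]


# Hypothetical example: one object in the middle, one touching the top edge.
objects = [
    [(499, 500), (500, 500), (501, 500)],
    [(0, 10), (1, 10)],
]
boxes = crops_for_objects(objects, height=1000, width=1000)
```

      Removing the pixels of neighboring cells inside each crop remains a manual step in the authors' protocol and is not covered by this sketch.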

      (4) Please address the concern that manual tuning and thresholding are required for this method's accuracy. Is this easily reproducible?

      Yes, it is easily reproducible for a given experimenter and is better suited than automatic thresholding. Although segmentation is not the primary focus of this paper, we leave it to users to choose the segmentation method that best fits their datasets.

      To address your question, we acknowledge that automated thresholding would theoretically be ideal. However, we encountered challenges due to non-uniform image acquisitions, even within the same sample. For instance, in ischemic brain samples, lipofuscin resulting from cell death introduced background noise that could artificially influence threshold levels. We tested both global and local algorithms for automatic binarization of cells, but these approaches often produced suboptimal segmentation results for individual cells.

      Based on our experience, manually adjusting the threshold provided more accurate, reliable, and consistent selection of cellular elements, even though it introduces a degree of subjectivity. To maintain consistency, we recommend that the same individual perform the analysis across all conditions.

      This clarification has been incorporated into the discussion as follows: “Although automated thresholding would be ideal, in our case image acquisitions were not entirely uniform, even within the same sample. For instance, in ischemic brain samples, lipofuscin from cell death introduces background noise that can artificially impact threshold levels. This effect is observed even when comparing contralateral and ipsilateral sides of the same brain. In our experience, manually adjusting the threshold provides a more accurate, reliable, and comparable selection of cellular elements, even though it introduces some subjectivity. To ensure consistency in segmentation, we recommend that the same person performs the analysis across all conditions.”

      (5) How are the authors performing the PCA---what program (e.g., R)? Again, please be explicit about how these mathematical operations were computed. (lines 302-345).

      The PCA was performed in Matlab; the code can be found on GitHub (https://github.com/Pascuallab/MorphCellSorter), as stated in the discussion.

      Other:

      (1) Can the authors comment on the challenges of the in vitro microglial analyses? The correlation of the experts v. MorphoCellSorter is much less than the fixed tissue. This is not addressed in the manuscript.

      In vitro, microglial cells exhibit a narrower range of morphological diversity compared to ex vivo or in vivo conditions. A higher proportion of cells share similar morphologies or morphologies with comparable complexities, which makes establishing a precise ranking more challenging. Consequently, the rank of many cells could be adjusted without significantly affecting the overall quality of the ranking.

      This explains why the rankings tend to show slightly greater divergence between experts. Interestingly, the ranking generated by MorphoCellSorter, which is objective and not subject to human bias, lies roughly midway between the rankings of the two experts.

      (2) You point out that the MorphoCellSorter may not be suited for embryonic/prenatal microglial analysis.

      This must be a misunderstanding because it is not what we concluded; we found that the ranking was correct but that we could not spot any differences due to transgenic alteration.

      The lack of differences observed in the embryonic microglia (Figure 5) is not necessarily surprising, as embryonic microglia have diverse morphological characteristics--- immature microglia do not possess highly ramified processes until postnatal development [see Hirosawa et al. (2005) https://doi.org/10.1002/jnr.20480 -they use an Iba1-GFP transgenic mouse to visualize prenatal microglia]. Also, see Bennett et al. (2016) [https://doi.org/10.1073/pnas.1525528113] which shows mature microglia not appearing until 14 days postnatal.

      We agree with the reviewer on that point. Nonetheless, MorphoCellSorter indicates that the population is homogeneous and that the mutation has no effect on morphology.

      (3) Although a semantic issue, Figure 1's categorization of microglia shows predefined groups of microglia do not necessarily usefully bin many cells. Is it still possible to categorize the microglia without using hotly debated categorization methods? The literature review in the current manuscript correctly points out the spectrum phenomenon of microglial activation states, though some of the suggestions from Paolicelli et al. (2022) are not put into action. The use of "activated" only further perpetuates the oversimplified classification of microglia. Perhaps the authors could consider using the term "reactive", as it is recognized by the Microglial nomenclature paper cited above. Are "amoeboid microglia" not "activated microglia"? "Reactive" is a less loaded term and is a recommended descriptor. Amoeboid microglia are commonly understood to be indicative of a highly proinflammatory environment, though you could potentially use "hyper-reactive" to differentiate them from the slightly ramified "reactive" cells.

      We changed “activated microglia” to “reactive microglia” in the text, as requested by the reviewer. Thanks a lot for your comment.

      (4) The graphs in Figures 3 B-D are visually difficult to interpret. The better color contrast between the MorphoCellSorter/Expert and Expert1/Expert2 would be useful--- perhaps a color for Expert 1 and a different color for Expert 2. Is this the ranking from the same data in Figure 1 (lines 420-421)? It is unclear what the x-axis represents in 3B-D. E-G is much more intuitive.

      We believe the confusion stems more from Figure 1 than Figure 3, as both figures use similar representations for entirely different analyses (clustering vs. ranking). To address this, we have provided an updated version of Figure 1 to help clarify this distinction and avoid any potential misinterpretation.

      Regarding Figure 3B-D, we do not fully see the need for changing the colors. These panels are histograms that display the distribution of rank differences either between experts and MorphoCellSorter or between the two experts. Assigning specific colors to the experts or MorphoCellSorter would be challenging, as the histograms represent comparative distributions involving both an expert and MorphoCellSorter or the ranking differences between the two experts.

      The same reasoning applies to Figures 3E-G. In these scatter plots, each point is defined by an ordinate (ranking value for one expert) and an abscissa (ranking value for either the other expert or MorphoCellSorter). Therefore, it would not be straightforward or meaningful to assign distinct colors to these elements within this context.

      (5) Line 217: use the term "imaged" rather than "generated" ... or "images were generated of clusters of microglia located .... using MICROSCOPE and Zen software." You aren't generating microglia, rather, you are generating images.

      Thanks a lot for raising this problem. We changed the sentence as follows: “For the AD model, crops of individual microglial cells located in the secondary visual cortex were extracted from images using the Zen software (v3.5, Zeiss) and exported to the TIFF image format.”

      (6) Elaborate on how an "inversion operation" was applied to Lacunarity, roundness factor, convex hull radii ratio, processes cell areas ratio, and skeleton processes. (Lines 299-300) Furthermore, a paragraph separation would be useful if the "inversion operation" is not what is described in the text immediately after this description.

      Indeed, we are sorry for this lack of explanation. Some morphological indexes rank cells from the least to the most ramified, while others rank them in the opposite order. By inverting certain parameters, we can standardize the ranking direction across all parameters, simplifying data interpretation. This clarification has been added to the revised manuscript as follows:

      “Lacunarity, roundness factor, convex hull radii ratio, processes cell areas ratio and skeleton processes ratio were subjected to an inversion operation in order to homogenize the parameters before conducting the PCA: indeed, some parameters rank cells from the least to the most ramified, while others rank them in the opposite order. By inverting certain parameters, we can standardize the ranking direction across all parameters, thus simplifying data interpretation.”
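
      The idea of standardizing the ranking direction can be sketched as follows. The response does not state which mathematical form the inversion takes, so the formula used here (max + min - x, which reverses the order while keeping the range) is an assumption, as are the function names and the toy values; forms such as 1/x or -x would serve the same purpose.

```python
def invert(values):
    """Reverse the ordering of a parameter while keeping its range.

    Assumption: inversion is done as max + min - x. The manuscript may use
    another form (for example 1/x or -x).
    """
    lo, hi = min(values), max(values)
    return [hi + lo - v for v in values]


def homogenize(table, to_invert):
    """Invert the listed columns of a {parameter name: values} table."""
    return {
        name: invert(column) if name in to_invert else list(column)
        for name, column in table.items()
    }


# Hypothetical values for three cells; only "lacunarity" is inverted here.
table = {
    "lacunarity": [1, 2, 5],
    "other_parameter": [3, 4, 5],
}
aligned = homogenize(table, to_invert={"lacunarity"})
```

      After this step every column increases in the same morphological direction, so the sign of each parameter's contribution to the principal components can be read consistently.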

      (7) Line 560: "measureclarke" seems to be an error associated with the reference. Please correct.

      Thanks a lot, this has been corrected

      (8) Discussion: compare MorphoCellSorter to the MIC-MAC program used by Salamanca et al. (2019). They use a similar approach, albeit not Andrew's plot.

      We have added the Salamanca reference

      Reviewer #2 (Recommendations for the authors):

      While it's not expected that the authors address the significance of the morphology in relation to function here, they could help highlight the issue and produce data that would enhance the paper's significance. Therefore, I recommend a small-scale and straightforward study where the authors couple their analysis with a marker (e.g. Lysotracker or Mitotracker) to produce data that link their morphometric analysis to more functional readouts. Furthermore, I encourage the authors to elaborate on the practical applications of these morphometric tools and the implications of their measurements, as this would provide context for their work, which, as it stands, feels like just another tool.

      We would like to thank the reviewer for their thoughtful comment and suggestion. Indeed, MorphoCellSorter is simply another tool, but one that offers a more convenient and efficient approach, producing a variety of results tailored to specific research needs. We strongly believe that MorphoCellSorter should be used in conjunction with other tools, depending on the specific research question.

      In our view, MorphoCellSorter is particularly well-suited for researchers who need a quick and efficient way to determine whether their treatment, gene invalidation, or other experimental conditions affect microglial morphology. In this context, MorphoCellSorter is fast, user-friendly, and highly effective. However, for those who aim to uncover detailed differences in cell morphology, other tools requiring more time-intensive, full reconstructions of the cells would be more appropriate.

      Providing additional data on the relationship between cellular function and morphology could certainly pave the way for new questions and more robust evidence. For instance, combining single-cell transcriptomics with morphological analysis would be an excellent approach to exploring the relationship between function and morphology. However, this would involve significant time, expense, and effort, and it represents a different line of inquiry altogether.

      While it would be ideal to clearly demonstrate the link between morphology and function, we are concerned that pursuing such a goal would considerably delay the implementation and adoption of our tool, potentially raising additional questions beyond the scope of this study.

      Minor comments:

      (1) Can MorphCellSorter be adapted for use with other cell types (e.g., astrocytes)?

      Yes, it could. We have obtained fairly conclusive results on astrocytes, but some parameters need to be adapted before this is released.

      (2) What modifications would be necessary? If it is not applicable, would a name that includes "Microglia" be more descriptive?

      The modifications would be quite minor: it is mainly the parameters being considered that would change. This is why we will keep the MorphoCellSorter name. Thank you for the suggestion!

      (3) A common challenge with such tools is the technical expertise required to use them. Could a user-friendly interface be developed to better fulfill its intended purpose and benefit the community?

      This is a good point, thank you. The answer is yes: we will translate our Matlab code to Python to open it to a wider audience, and we will certainly work on a user-friendly interface!

      (4) Given that this tool relies on imaging, can users trace a cell (or group of cells) back to the original image?

      Yes, it is possible if each crop is annotated with its spatial coordinates during the segmentation step. This is not yet implemented in the current version of the software, and it mainly depends on the way segmentation is performed, which is not the topic of the paper.

      (5)  Line 36: The "biologically relevant" statement is central and needs to be expanded.

      This is not easy, as the abstract has a word limit. What we mean by this sentence is that, when classifying cells, mathematical tools force them into groups based on metrics that do not necessarily have a biological meaning. We suggest the following modification: “However, this classification may lack biological relevance, as microglial morphologies represent a continuum rather than distinct, separate groups, and do not correspond to mathematically defined clusters irrespective of microglial cell function.”

      (6) Line 49-50: Provide reference and elaborate. For example, does this apply during early life?

      We have slightly changed the sentence and added a reference.

      (7) Line 69: Provide reference.

      The reference (Hubert et al., 2021) has been added.

      (8) Lines 78-88: A table summarizing other efforts in morphometric characterization of microglia would be helpful in distinguishing your work from others.

      This has already been done in some review articles; we thus added the references to direct readers to these reviews. Here is the revised version of the sentence: “To date, the literature contains a wide variety of criteria to quantitatively describe microglial morphology, ranging from descriptive measures such as cell body surface area, perimeter, and process length to indices calculating different parameters such as circularity, roundness, branching index, and clustering (Adaikkan et al., 2019; Heindl et al., 2018; Kongsui, Beynon, Johnson, & Walker, 2014; Morrison et al., 2017; Young & Morrison, 2018).”

      (9) Lines 130, 145: Please provide complete genotype information and the sources of the animals used.

      It has been done

      (10) Materials and Methods:

      (1) Standardize the presentation of products (e.g., using # consistently).

      It has been done

      (2) Provide versions of software used.

      We have modified accordingly

      (3) Lines 372-373: A table listing the 20 parameters with brief explanations (as partially done in Materials and Methods) would greatly improve readability.

      This is done in Supplemental figure 8.

      (4) Since nomenclature is a critical issue in the literature, you used specific definitions (lines 376-383). However, please indicate (with a reference) why you use the term "activated," as it implies that the others are non-activated. Alternatively, define "activated" cluster differently.

      We changed “activated microglia” to “reactive microglia”, as requested by Reviewer #1.

      (4) Figure 1: In my opinion placing this figure as the first main figure is problematic as it confuses the message of the paper. Since the authors are introducing a new approach for morphological characterization in Figure 2, I recommend that, for the sake of readability and clarity, the latter be the first main image, while Figure 1 can move to the supplements.

      We agree with the reviewer and have therefore changed Figure 1, as explained earlier in our response to Reviewer #1. Nonetheless, because it is an important step in our reasoning, we believe it can stay as a main figure. We hope the change made to Figure 1 clarifies the message of the paper.

      (5) Figure 1: Please indicate on the figure the marker for the analysis.

      Figure 2 has been changed

      (6) No funding agencies are communicated.

      This has been corrected

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1:

      (1) Line numbers are missing.

      Added

      (2) VR classroom. Was this a completely custom design based on Unity, or was this developed on top of some pre-existing code? Many aspects of the VR classroom scenario are only introduced (e.g., how was the lip-speech synchronisation done exactly?). Additional detail is required. Also, is or will the experiment code be shared publicly with appropriate documentation? It would also be useful to share brief example video-clips.

      We have added details about the VR classroom programming to the methods section (p. 6-7), and we have now included a video-example as supplementary material.

      “Development and programming of the VR classroom were done primarily in-house, using assets (avatars and environment) sourced from pre-existing databases. The classroom environment was adapted from assets provided by Tirgames on TurboSquid (https://www.turbosquid.com/Search/Artists/Tirgames) and modified to meet the experimental needs. The avatars and their basic animations were sourced from the Mixamo library, which at the time of development supported legacy avatars with facial blendshapes (this functionality is no longer available in current versions of Mixamo). A brief video example of the VR classroom is available at: https://osf.io/rf6t8.

      “To achieve realistic lip-speech synchronization, the teacher’s lip movements were controlled by the temporal envelope of the speech, adjusting both timing and mouth size dynamically. His body motions were animated using natural talking gestures.”

      While we do intend to make the dataset publicly available for other researchers, at this point we are not making the code for the VR classroom public. However, we are happy to share it on an individual basis with other researchers who might find it useful for their own research in the future.

      (3) "normalized to the same loudness level using the software Audacity". Please specify the Audacity function and parameters.

      We have added these details (p.7)

      “All sound-events were normalized to the same loudness level using the Normalize function in the audio-editing software Audacity (theaudacityteam.org, ver 3.4), with the peak amplitude parameter set to -5 dB, and trimmed to a duration of 300 milliseconds.”
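
      As an illustration of what peak normalization to -5 dB and trimming to 300 ms amount to, here is a minimal sketch. It is not Audacity's implementation: the function names are hypothetical, samples are assumed to be floats in the range -1 to 1 (so that 0 dB corresponds to full scale), optional steps such as DC-offset removal are ignored, and trimming is assumed to keep the first 300 ms of each sound.

```python
def normalize_peak(samples, peak_db=-5.0):
    """Scale samples so the largest absolute value sits at peak_db dB re full scale."""
    peak = max(abs(s) for s in samples)
    if peak == 0:
        return list(samples)  # silence: nothing to scale
    target = 10 ** (peak_db / 20)  # convert dB to linear amplitude
    gain = target / peak
    return [s * gain for s in samples]


def trim(samples, sample_rate, duration_s=0.3):
    """Keep the first duration_s seconds (assumption: trimming from the start)."""
    return samples[: int(round(sample_rate * duration_s))]


# Hypothetical one-second sound at 44.1 kHz with a peak of 0.5.
sample_rate = 44100
sound = [0.5 if i == 100 else 0.1 for i in range(sample_rate)]
event = trim(normalize_peak(sound), sample_rate)
```

      Note that peak normalization equalizes the maximum amplitude, which is not the same as equalizing perceived loudness across sounds with different spectra or envelopes.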

      (4) Did the authors check if the participants were already familiar with some of the content in the mini-lectures?

      This is a good point. Since the mini-lectures spanned many different topics, we did not pre-screen participants for familiarity with the topics, and it is possible that some of the participants had some pre-existing knowledge.

      In hindsight, it would have been good to add some reflective questions regarding participants’ prior knowledge, as well as other questions such as their level of interest in the topic and how well they understood the content. These are elements that we hope to include in future versions of the VR classroom.

      (5) "Independent Component Analysis (ICA) was then used to further remove components associated with horizontal or vertical eye movements and heartbeats". Please specify how this selection was carried out.

      Selection of ICA components was done manually based on visual inspection of their time-course patterns and topographical distributions, to identify components characteristic of blinks, horizontal eye-movements and heartbeats. Examples of these distinct components are provided in Author response image 1 below. This is now specified in the methods section.

      Author response image 1.

      (6) "EEG data was further bandpass filtered between 0.8 and 20 Hz". If I understand correctly, the data was filtered a second time. If that's the case, please do not do that, as that will introduce additional and unnecessary filtering artifacts. Instead, the authors should replace the original filter with this one (so, filtering the data only once). Please see de Cheveigné and Nelken, Neuron, 2019 for an explanation. Also, please provide an explanation of the rationale for further restricting the cut-off bands in the methods section. Finally, further details on the filters should be included (filter type and order, for example).

      Yes, the data was indeed filtered twice. The first filter was applied as part of the preprocessing procedure, in order to remove extremely high- and low-frequency noise but retain most activity within the range of “neural” activity. This broad range is mostly important for the ICA procedure, so as to adequately separate the ocular and neural contributions to the recorded signal.

      However, since both the speech tracking responses and ERPs are typically less broadband and are comprised mostly of lower frequencies (e.g., those that make up the speech-envelope), a second narrower filter was applied to improve TRF model-fit and make ERPs more interpretable.

      In both cases we used a fourth-order zero-phase Butterworth IIR filter with 1 second of padding, as implemented in the FieldTrip toolbox. We have added these details to the manuscript.

      (7) "(~ 5 minutes of data in total), which is insufficient for deriving reliable TRFs". That is a bit pessimistic and vague. What does "reliable" mean? I would tend to agree when talking about individual subject TRFs, while 5 min per participant can be enough at the group level. Also, this depends on the specific speech material, on whether the features are univariate or multivariate, etc. Please narrow down and clarify this statement.

      We determined that the data in the Quiet condition (~5 min) was insufficient for performing reliable TRF analysis, by assessing whether its predictive-power was significantly better than chance. As shown in Author response image 2 below, the predictive power achieved using this data was not higher than values obtained in permuted data (p = 0.43). Therefore, we did not feel that it was appropriate to include TRF analysis of the Quiet condition in this manuscript. We have now clarified this in the manuscript (p. 10)

      Author response image 2.
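
      The comparison against permuted data described above can be sketched as a generic permutation test. This is a simplified illustration and not the authors' procedure, which is not detailed in the response: here the pairing between predicted and actual samples is shuffled directly, whereas TRF studies more commonly permute the pairing between stimulus and response segments. The function names, the number of permutations and the toy data are assumptions.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation between two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0


def permutation_p(predicted, actual, n_perm=1000, seed=0):
    """One-sided p-value: how often does shuffled data predict as well as real data?"""
    rng = random.Random(seed)
    observed = pearson(predicted, actual)
    shuffled = list(actual)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if pearson(predicted, shuffled) >= observed:
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return observed, (hits + 1) / (n_perm + 1)


# Toy data (hypothetical): a prediction that tracks the response closely.
predicted = [float(i) for i in range(20)]
actual = [2.0 * v + 1.0 for v in predicted]
r_obs, p_value = permutation_p(predicted, actual)
```

      A large p-value, such as the 0.43 reported for the Quiet condition, means that the observed predictive power is indistinguishable from what permuted data produce.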

      (8) "Based on previous research by our group (Kaufman & Zion Golumbic 2023), we chose to use a constant regularization ridge parameter (λ= 100) for all participants and conditions". This is an insufficient explanation. I understand that there is a previous paper involved. However, such an unconventional choice that goes against the original definition and typical use of these methods should be clearly reported in this manuscript.

      We apologize for not clarifying this point sufficiently, and have added an explanation of this methodological choice (p.11):

      “The mTRF toolbox uses a ridge-regression approach for L2 regularization of the model to ensure better generalization to new data. We tested a range of ridge parameter values (λ's) and used a leave-one-out cross-validation procedure to assess the model’s predictive power, whereby in each iteration, all but one trial are used to train the model, and it is then applied to the left-out trial. The predictive power of the model (for each λ) is estimated as the Pearson’s correlation between the predicted neural responses and the actual neural responses, separately for each electrode, averaged across all iterations. We report results of the model with the λ that yielded the highest predictive power at the group-level (rather than selecting a different λ for each participant which can lead to incomparable TRF models across participants; see discussion in Kaufman & Zion Golumbic 2023).”
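
      The selection procedure described in this passage can be sketched in a heavily simplified form. This is not the mTRF toolbox and not the authors' code: it uses a two-feature, single-electrode ridge regression without time lags, synthetic data, and hypothetical function names, purely to show leave-one-trial-out scoring followed by the choice of a single group-level λ.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation between two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0


def fit_ridge(trials, lam):
    """Two-feature ridge: solve (X'X + lam*I) w = X'y in closed form (lam > 0)."""
    a = b = d = p = q = 0.0
    for X, y in trials:
        for (x1, x2), t in zip(X, y):
            a += x1 * x1
            b += x1 * x2
            d += x2 * x2
            p += x1 * t
            q += x2 * t
    a += lam
    d += lam
    det = a * d - b * b
    return (d * p - b * q) / det, (a * q - b * p) / det


def loo_score(trials, lam):
    """Mean Pearson r between prediction and response over left-out trials."""
    scores = []
    for i, (X_test, y_test) in enumerate(trials):
        w1, w2 = fit_ridge(trials[:i] + trials[i + 1:], lam)
        predicted = [w1 * x1 + w2 * x2 for x1, x2 in X_test]
        scores.append(pearson(predicted, y_test))
    return sum(scores) / len(scores)


def group_lambda(participants, lambdas):
    """Pick the one lambda with the best predictive power averaged over participants."""
    mean_score = {
        lam: sum(loo_score(trials, lam) for trials in participants) / len(participants)
        for lam in lambdas
    }
    return max(mean_score, key=mean_score.get), mean_score


def make_participant(rng, n_trials=4, n_samples=50):
    """Synthetic trials: response = weighted features + noise (all values hypothetical)."""
    trials = []
    for _ in range(n_trials):
        X = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n_samples)]
        y = [0.8 * x1 - 0.3 * x2 + rng.gauss(0, 1) for x1, x2 in X]
        trials.append((X, y))
    return trials


rng = random.Random(0)
participants = [make_participant(rng) for _ in range(5)]
lambdas = [0.01, 1, 100, 1000]
best_lambda, scores = group_lambda(participants, lambdas)
```

      Because the same λ is applied to every participant, the resulting models share one regularization strength, which is the comparability argument made in the quoted passage. As the reviewer points out next, such a shared value is only meaningful if inputs and outputs are scaled consistently across participants.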

      Assuming that the explanation will be sufficiently convincing, which is not a trivial case to make, the next issue that I will bring up is that the lambda value depends on the magnitude of input and output vectors. While the input features are normalised, I don't see that described for the EEG signals. So I assume they are not normalized. In that case, the lambda would have at least to be adapted between subjects to account for their different magnitude.

      We apologize for omitting this detail – yes, the EEG signals were normalized prior to conducting the TRF analysis. We have updated the methods section to explicitly state this pre-processing step (p.10).
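
      For readers less familiar with why signal magnitude interacts with λ, a short worked derivation may help. It is standard ridge-regression algebra and is not taken from the manuscript; here X denotes whichever signal serves as the model input, y the model output, and c > 0 is a hypothetical scaling constant.

```latex
\hat{w}(\lambda)
  = \arg\min_{w} \lVert y - Xw \rVert_2^2 + \lambda \lVert w \rVert_2^2
  = (X^\top X + \lambda I)^{-1} X^\top y

% Rescale the input by a constant c > 0, i.e. X' = cX:
\hat{w}'(\lambda)
  = (c^2 X^\top X + \lambda I)^{-1} \, c \, X^\top y
  = \frac{1}{c} \left( X^\top X + \frac{\lambda}{c^2} I \right)^{-1} X^\top y
  = \frac{1}{c} \, \hat{w}\!\left( \lambda / c^2 \right)

% Predictions on the rescaled input:
X' \hat{w}'(\lambda) = X \, \hat{w}\!\left( \lambda / c^2 \right)
```

      A fixed λ applied to an input scaled by c therefore acts like λ/c² on the unscaled input, which is why normalizing the signals before model fitting is needed for a shared λ to have a comparable effect across participants.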

      Another clarification, is that value (i.e., 100) would not be comparable either across subjects or across studies. But maybe the authors have a simple explanation for that choice? (note that this point is very important as this could lead others to use TRF methods in an inappropriate way - but I understand that the authors might have specific reasons to do so here). Note that, if the issue is finding a reliable lambda per subject, a more reasonable choice would be to use a fixed lambda selected on a generic (i.e., group-level) model. However selecting an arbitrary lambda could be problematic (e.g., would the results replicate with another lambda; and similarly, what if a different EEG system was used, with different overall magnitude, hence the different impact of the regularisation).

      We fully agree that selecting an arbitrary lambda is problematic (especially across studies). As clarified above, the group-level lambda chosen here for the encoding model was data-driven, optimized based on group-level predictive power.

      (9) "L2 regularization of the model, to reduce its complexity". Could the authors explain what "reduce its complexity" refers to?

      Our intention here was to state that the L2 regularization constrains the model’s weights so that it can better generalize to left-out data. However, for clarity we have now removed this statement.

      (10) The same lambda value was used for the decoding model. From personal experience, that is very unlikely to be the optimal selection. Decoding models typically require a different (usually larger) lambda than forward models, which can be due to different reasons (different SNR of "input" of the model and, crucially, very different dimensionality).

      We agree with the reviewer that treatment of regularization parameters might not be identical for encoding and decoding models. Our initial search of lambda parameters was limited to λ= 0.01 - 100, with λ= 100 showing the best reconstruction correlations. However, following the reviewer’s suggestion we have now broadened the range and found that, in fact, reconstruction correlations are further improved and the best lambda is λ= 1000 (see Author response image 3 below, left panel). Importantly, the difference in decoding reconstruction correlations between the groups is maintained regardless of the choice of lambda (although the effect-size varies; see Author response image 3, right panel). We have now updated the text to reflect results of the model with λ= 1000.

      Author response image 3.

      (11) Skin conductance analysis. Additional details are required. For example, how was the linear interpolation done exactly? The raw data was downsampled, sure. But was an anti-aliasing filter applied? What filter exactly? What implementation for the CDA was run exactly?

      We have added the following details to the methods section (p. 14):

      “The Skin Conductance (SC) signal was analyzed using the Ledalab MATLAB toolbox (version 3.4.9; Benedek and Kaernbach, 2010; http://www.ledalab.de/) and custom-written scripts. The raw data was downsampled to 16Hz using FieldTrip's ft_resampledata function, which applies a built-in anti-aliasing low-pass filter to prevent aliasing artifacts. Data were inspected manually for any noticeable artifacts (large ‘jumps’), and if present were corrected using linear interpolation in Ledalab. A continuous decomposition analysis (CDA) was employed to separate the tonic and phasic SC responses for each participant. The CDA was conducted using the 'sdeco' mode (signal decomposition), which iteratively optimizes the separation of tonic and phasic components using the default regularization settings.”

      (12) "N1- and P2 peaks of the speech tracking response". Have the authors considered using the N1-P2 complex rather than the two peaks separately? Just a thought.

      This is an interesting suggestion, and we know that this has been used sometimes in more traditional ERP literature. In this case, since neither peak was modulated across groups, we did not think this would yield different results. However, it is a good point to keep in mind for future work.

      (13) Figure 4B. The ticks are missing. From what I can see (but it's hard without the ticks), the N1 seems later than in other speech-EEG tracking experiments (where is closer to ~80ms). Could the authors comment on that? Or maybe this looks similar to some of the authors' previous work?

      We apologize for this and have added ticks to the figure.

      In terms of time-course, an N1 peak at around 100ms is compatible with many of our previous studies, as well as those from other groups.

      (14) Figure 4C. Strange thin vertical grey bar to remove.

      Fixed.

      (15) Figure 4B: What about the topographies for the TRF weights? Could the authors show that for the main components?

      Yes. The topographies of the main TRF components are similar to those of the predictive power and are compatible with auditory responses. We have added them to Figure 4B.

      (16) Figure 4B: I just noticed that this is a grand average TRF. That is ok (but not ideal) only because the referencing is to the mastoids. The more appropriate way of doing this is to look at the GFP, instead, which estimates the presence of dipoles. And then look at topographies of the components. Averaging across channels makes the plotted TRF weaker and noisier. I suggest adding the GFP to the plot. Also, the colour scale in Figure 4A is deceiving, as blue is usually used for +/- in plots of the weights. While that is a heatmap, where using a single colour or even yellow to red would be less deceiving at first look. Only cosmetics, indeed. The result is interesting nonetheless!

      We apologize for this, and agree with the reviewer that it is better not to average across EEG channels. In the revised Figure, we now show the TRFs based on the average of electrodes FC1, FC2, and FCz, which exhibited the strongest activity for the two main components.

      Following the previous comment, we have also included the topographical representation of the TRF main components, to give readers a whole-head perspective of the TRF.

      We have also fixed the color-scales.

      We are glad that the reviewer finds this result interesting!

      (17) Figure 4C. This looks like a missed opportunity. That metric shows a significant difference overall. But is that underpinned but a generally lower envelope reconstruction correlation, or by a larger deviation in those correlations (so, that metric is as for the control in some moments, but it drops more frequently due to distractibility)?

      We understand the reviewer’s point here, and ideally would like to be able to address this in a more fine-grained analysis, for example on a trial-by-trial basis. However, the design of the current experiment was not optimized for this, in terms of (for example) number of trials, the distribution of sound-events and behavioral outcomes. We hope to be able to address this issue in our future research.

      (18) I am not a fan of the term "accuracy" for indicating envelope reconstruction correlations. Accuracy is a term typically associated with classification. Regression models are typically measured through errors, loss, and sometimes correlations. 'Accuracy' is inaccurate (no joke intended).

      We accept this comment and now use the term “reconstruction correlation”.

      (19) Discussion. "The most robust finding in". I suggest using more precise terminology. For example, "largest effect-size".

      We agree and have changed the terminology (p. 31).

      (20) "individuals who exhibited higher alpha-power [...]". I probably missed this. But could the authors clarify this result? From what I can see, alpha did not show an effect on the group. Is this referring to Table 2? Could the authors elaborate on that? How does that reconcile with the non-significant effect of the group? In that same sentence, do you mean "and were more likely"? If that's the case, and they were more likely to report attentional difficulties, how is it that there is no group-effect when studying alpha?

      Yes, this sentence refers to the linear regression models described in Figure 10 and in Table 2. As the reviewer correctly points out, this is one place where there is a discrepancy between the results of the between-group analysis (ADHD diagnosis yes/no) and the regression analysis, which treats ADHD symptoms as a continuum, across both groups. The same is true for the gaze-shift data, which also did not show a significant between-group effect but was identified in the regression analysis as contributing to explaining the variance in ADHD symptoms.

      We discuss this point on pages 30-31, noting that “although the two groups are clearly separable from each other, they are far from uniform in the severity of symptoms experienced”, which motivated the inclusion of both analyses in this paper.

      At the bottom of p. 31 we specifically address the similarities and differences between the between-group and regression-based results. In our opinion, this pattern emphasizes that while neither approach is ‘conclusive’, looking at the data through both lenses contributes to an overall better understanding of the contributing factors, as well as highlighting that “no single neurophysiological measure alone is sufficient for explaining differences between the individuals – whether through the lens of clinical diagnosis or through report of symptoms”.

      (21) "why in the latter case the neural speech-decoding accuracy did not contribute to explaining ASRS scores [...]". My previous point 1 on separating overall envelope decoding from its deviation could help there. The envelope decoding correlation might go up and down due to SNR, while you might be more interested in the dynamics over time (i.e., looking at the reconstructions over time).

      Again, we appreciate this comment, but believe that this additional analysis is outside the scope of what would be reliably feasible with the current dataset. However, since the data will be made publicly available, perhaps other researchers will have better ideas as to how to do this.

      (22) Data and code sharing should be discussed. Also, specific links/names and version numbers should be included for the various libraries used.

      We are currently working on organizing the data to make it publicly available on the Open Science Project.

      We have updated links and version numbers for the various toolboxes/software used, throughout the manuscript.

      Reviewer #2:

      (1) While it is highly appreciated to study selective attention in a naturalistic context, the readers would expect to see whether there are any potential similarities or differences in the cognitive and neural mechanisms between contexts. Whether the classic findings about selective attention would be challenged, rebutted, or confirmed? Whether we should expect any novel findings in such a novel context? Moreover, there are some studies on selective attention in the naturalistic context though not in the classroom, it would be better to formulate specific hypotheses based on previous findings both in the strictly controlled and naturalistic contexts.

      Yes, we fully agree that comparing results across different contexts would be extremely beneficial and important.

      The current paper serves as an important proof-of-concept demonstrating the plausibility and scientific potential of using combined EEG-VR-eyetracking to study neurophysiological aspects of attention and distractibility, but is also the basis for formulating specific hypotheses that will be tested in follow-up studies.

      In fact, a follow-up study is already ongoing in our lab, where we are looking into this point, by testing users in different VR scenarios (e.g., classroom, café, office etc.), and assessing whether similar neurophysiological patterns are observed across contexts and to what degree they are replicable within and across individuals. We hope to share these data with the community in the near future.

      (2) Previous studies suggest handedness and hemispheric dominance might impact the processing of information in each hemisphere. Whether these issues have been taken into consideration and appropriately addressed?

      This is an interesting point. In this study we did not specifically control for handedness/hemispheric dominance, since most of the neurophysiological measures used here are sensory/auditory in nature, and therefore potentially invariant to handedness. Moreover, the EEG signal is typically not very sensitive to hemispheric dominance, at least for the measures used here. However, this might be something to consider more explicitly in future studies. Nonetheless, we have added handedness information to the Methods section (p. 5): “46 right-handed, 3 left-handed”.

      (3) It would be interesting to know how students felt about the Virtual Classroom context, whether it is indeed close to the real classroom or to some extent different.

      Yes, we agree. Obviously, the VR classroom differs in many ways from a real classroom, in terms of the perceptual experience, social aspects and interactive possibilities. We did ask participants about their VR experience after the experiment, and most reported feeling highly immersed in the VR environment and engaged in the task, with a strong sense of presence in the virtual-classroom.

      We note that, in parallel to the VR studies in our lab, we are also conducting experiments in real classrooms, and we hope that the cross-study comparison will be able to shed more light on these similarities/differences.

      (4) One intriguing issue is whether neural tracking of the teacher's speech can index students' attention, as the tracking of speech may be relevant to various factors such as sound processing without semantic access.

      Another excellent point. While separating the ‘acoustic’ and ‘semantic’ contributions to the speech tracking response is non-trivial, we are currently working on methodological approaches to do this (again, in future studies) following, for example, the hierarchical TRF approach used by Brodbeck et al. and others.

      (5) There are many results associated with various metrics, and many results did not show a significant difference between the ADHD group and the control group. It is difficult to find the crucial information that supports the conclusion. I suggest the authors reorganize the results section and report the significant results first, and to which comparison(s) the readers should pay attention.

      We apologize if the organization of the results section was difficult to follow. This is indeed a challenge when collecting so many different neurophysiological metrics.

      To facilitate this, we have now added a paragraph at the beginning of the result section, clarifying its structure (p.16):

      The current dataset is extremely rich, consisting of many different behavioral, neural and physiological responses. In reporting these results, we have separated between metrics that are associated with paying attention to the teacher (behavioral performance, neural tracking of the teacher’s speech, and looking at the teacher), those capturing responses to the irrelevant sound-events (ERPs and event-related changes in SC and gaze); as well as more global neurophysiological measures that may be associated with the listeners’ overall ‘state’ of attention or arousal (alpha- and beta-power and tonic SC).

      Moreover, within each section we have ordered the analysis such that the ones with significant effects are first. We hope that this contributes to the clarity of the results section.

      (6) The difference between artificial and non-verbal humans should be introduced earlier in the introduction and let the readers know what should be expected and why.

      We have added this to the Introduction (p. 4).

      (7) It would be better to discuss the results against a theoretical background rather than majorly focusing on technical aspects.

      We appreciate this comment. In our opinion, the discussion does contain a substantial theoretical component, both regarding theories of attention and attention-deficits, and also regarding their potential neural correlates. However, we agree that there is always room for more in-depth discussion.

      Reviewer #3:

      Major:

      (1) While the study introduced a well-designed experiment with comprehensive physiological measures and thorough analyses, the key insights derived from the experiment are unclear. For example, does the high ecological validity provide a more sensitive biomarker or a new physiological measure of attention deficit compared to previous studies? Or does the study shed light on new mechanisms of attention deficit, such as the simultaneous presence of inattention and distraction (as mentioned in the Conclusion)? The authors should clearly articulate their contributions.

      Thanks for this comment.

      We would not say that this paper is able to provide a ‘more sensitive biomarker’ or a ‘new physiological measure of attention’ – in order to make those types of grand statements we would need to have much more converging evidence from multiple studies and using both replication and generalization approaches.

      Rather, from our perspective, the key contribution of this work is in broadening the scope of research regarding the neurophysiological mechanisms involved in attention and distraction.

      Specifically, this work:

      (1) Offers a significant methodological advancement of the field – demonstrating the plausibility and scientific potential of using combined EEG-VR-eyetracking to study neurophysiological aspects of attention and distractibility in contexts that ‘mimic’ real-life situations (rather than highly controlled computerized tasks).

      (2) Provides a solid basis for formulating specific mechanistic hypotheses regarding the neurophysiological metrics associated with attention and distraction, the interplay between them, and their potential relation to ADHD-symptoms. Rather than being an end-point, we see these results as a start-point for future studies that emphasize ecological validity and generalizability across contexts, that will hopefully lead to improved mechanistic understanding and potential biomarkers of real-life attentional capabilities (see also response to Rev #2 comment #1 above).

      (3) Highlights differences and similarities between the current results and those obtained in traditional ‘highly controlled’ studies of attention (e.g., in the way ERPs to sound-events differ between ADHD and controls; variability in gaze and alpha-power; and more broadly about whether ADHD symptoms do or don’t map onto specific neurophysiological metrics). Again, we do not claim to give a definitive ‘answer’ to these issues, but rather to provide a new type of data that can expand the conversation and address the ecological validity gap in attention research.

      (2) Based on the multivariate analyses, ASRS scores correlate better with the physiological measures rather than the binary deficit category. It may be worthwhile to report the correlation between physiological measures and ASRS scores for the univariate analyses. Additionally, the correlation between physiological measures and behavioral accuracy might also be interesting.

      Thanks for this. The beta-values reported for the regression analysis reflect the correlations between the different physiological measures and the ASRS scores (p. 30). From a statistical perspective, it is better to report these values rather than the univariate correlation-coefficients, since these represent the ‘unique’ relationship with each factor, after controlling for all the others.

      The univariate correlations between the physiological measures themselves, as well as with behavioral accuracy, are reported in Figure 10.

      (3) For the TRF and decoding analysis, the authors used a constant regularization parameter per a previous study. However, the optimal regularization parameter is data-dependent and may differ between encoding and decoding analyses. Furthermore, the authors did not conduct TRF analysis for the quiet condition due to the limited ~5 minutes of data. However, such a data duration is generally sufficient to derive a stable TRF with significant predictive power (Mesik and Wojtczak, 2023).

      The reviewer raises two important points, also raised by Rev #1 (see above).

      Regarding the choice of regularization parameters, we have now clarified that although we used a common lambda value for all participants, it was selected in a data-driven manner, so as to achieve an optimal predictive power at the group-level.

      See revised methods section:

      “The mTRF toolbox uses a ridge-regression approach for L2 regularization of the model to ensure better generalization to new data. We tested a range of ridge parameter values (λ's) and used a leave-one-out cross-validation procedure to assess the model’s predictive power, whereby in each iteration, all but one of the trials are used to train the model, and it is then applied to the left-out trial. The predictive power of the model (for each λ) is estimated as the Pearson’s correlation between the predicted neural responses and the actual neural responses, separately for each electrode, averaged across all iterations. We report results of the model with the λ that yielded the highest predictive power at the group level (rather than selecting a different λ for each participant, which can lead to incomparable TRF models across participants; see discussion in Kaufman & Zion Golumbic 2023).”

      Regarding whether the data in the Quiet condition were sufficient for performing TRF analysis – we are aware of the important work by Mesik & Wojtczak, and had initially used this estimate when designing our study. However, when assessing the predictive power of the TRF model trained on data from the Quiet condition, we found that it was not significantly better than chance (see Author response image 2, ‘real’ predictive power vs. permuted data). Therefore, we ultimately did not feel that it was appropriate to include TRF analysis of the Quiet condition in this manuscript. We have now clarified this in the manuscript (p. 10).

      (4) As shown in Figure 4, for ADHD participants, decoding accuracy appears to be lower than the predictive power of TRF. This result is surprising because more data (i.e., data from all electrodes) is used in the decoding analysis.

      This is an interesting point – however, in our experience it is not necessarily the case that decoding accuracy (i.e., reconstruction correlation with the stimulus) is higher than encoding predictive power. While both metrics use Pearson’s correlations, they quantify the similarity between two different types of signals (the EEG and the speech-envelope). Although the decoding procedure does use data from all electrodes, many of them don’t actually contain meaningful information regarding the stimulus, and thus could just as well hinder the overall performance of the decoding.

      (5) Beyond the current analyses, the authors may consider analyzing inter-subject correlation, especially for the gaze signal analysis. Given that the area of interest during the lesson changes dynamically, the teacher might not always be the focal point. Therefore, the correlation of gaze locations between subjects might be better than the percentage of gaze duration on the teacher.

      Thanks for this suggestion. We have tried to look into this; however, working with eye-gaze in a 3-D space is extremely complex and we are not able to calculate reliable correlations between participants.

      (6) Some preprocessing steps relied on visual and subjective inspection. For instance, " Visual inspection was performed to identify and remove gross artifacts (excluding eye movements) " (P9); " The raw data was downsampled to 16Hz and inspected for any noticeable artifacts " (P13). Please consider using objective processes or provide standards for subjective inspections.

      We are aware of the possible differences between objective methods of artifact rejection and manual visual inspection; however, we still prefer the manual (subjective) approach. As noted, in this case only very large artifacts were removed, exceeding ~4 SD of the amplitude variability, so as to preserve as many full-length trials as possible.
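
      For readers who want an objective counterpart to the ~4 SD guideline mentioned above, a minimal Python sketch is given below. It is an illustration only and was not part of the manual procedure: it assumes that "amplitude variability" means the standard deviation of the samples in a segment, and the function name and default threshold are hypothetical.

```python
from statistics import mean, pstdev


def flag_large_artifacts(signal, n_sd=4.0):
    """Return indices of samples deviating from the mean by more than n_sd SDs.

    Assumption: variability is the population SD of the whole segment,
    so very large artifacts also inflate the threshold itself.
    """
    mu = mean(signal)
    sd = pstdev(signal)
    if sd == 0:
        return []  # a flat segment has nothing to flag
    return [i for i, v in enumerate(signal) if abs(v - mu) > n_sd * sd]
```

      Flagged indices could then be reviewed by eye, which keeps the final decision manual while making the criterion reproducible.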

      (7) Numerous significance testing methods were employed in the manuscript. While I appreciate the detailed information provided, describing these methods in a separate section within the Methods would be more general and clearer. Additionally, the authors may consider using a linear mixed-effects model, which is more widely adopted in current neuroscience studies and can account for random subject effects.

      Indeed, there are many statistical tests in the paper, given the diverse types of neurophysiological data collected here. We actually thought that describing the statistics per method rather than in a separate “general” section would be easier to follow, but we understand that readers might diverge in their preferences.

      Regarding the use of mixed-effect models – this is indeed a great approach. However, it requires deriving reliable metrics on a per-trial basis, and while this might be plausible for some of our metrics, the EEG and GSR metrics are less reliable at this level. This is why we ultimately chose to aggregate across trials and use a regular regression model rather than mixed-effects.

      (8) Some participant information is missing, such as their academic majors. Given that only two lesson topics were used, the participants' majors may be a relevant factor.

      To clarify – the mini-lectures presented here actually covered a large variety of topics, broadly falling within the domains of history, science, social science and technology. Regarding participants’ academic majors, these were relatively diverse, as can be seen in Author response table 1 and Author response image 4.

      Author response table 1.

      Author response image 4.

      (9) Did the multiple regression model include cross-validation? Please provide details regarding this.

      Yes, we used a leave-one-out cross validation procedure. We have now clarified this in the methods section which now reads:

      “The mTRF toolbox uses a ridge-regression approach for L2 regularization of the model to ensure better generalization to new data. We tested a range of ridge parameter values (λ's) and used a leave-one-out cross-validation procedure to assess the model’s predictive power, whereby in each iteration, all but one of the trials are used to train the model, and it is then applied to the left-out trial. The predictive power of the model (for each λ) is estimated as the Pearson’s correlation between the predicted neural responses and the actual neural responses, separately for each electrode, averaged across all iterations. We report results of the model with the λ that yielded the highest predictive power at the group level (rather than selecting a different λ for each participant, which can lead to incomparable TRF models across participants; see discussion in Kaufman & Zion Golumbic 2023).”

      Minor:

      (10) Typographical errors: P5, "forty-nine 49 participants"; P21, "$ref"; P26, "Table X"; P4, please provide the full name for "SC" when first mentioned.

      Thanks! Corrected.

    1. Author Response

      The following is the authors’ response to the current reviews.

      Reviewer #2 (Recommendations For The Authors):

      We sincerely appreciate the time and efforts of the Reviewer.

      In light of your data showing that the IgG response is similar with and without CIN, it would be good to drop "and induce a broad, vaccination-like anti-tumor IgG response". This suggests a direct connection between CIN and the IgG response. In my opinion, the shorter title is equally strong and more correct.

      We edited this phrase in the originally submitted title for accuracy:

      Chromosomal instability induced in cancer can enhance macrophage-initiated immune responses that include anti-tumor IgG

      I agree that inducing CIN through other means can be left for a different study but in that case the abstract should more directly mention MSP1 inhibition since that is how CIN is always induced. Perhaps line 18: CIN is induced by MSP-1 inhibition in poorly immunogenic....

      Done as requested:

      “…Here, CIN is induced in poorly immunogenic B16F10 mouse melanoma cells using spindle assembly checkpoint MPS1 inhibitors…”


      The following is the authors’ response to the original reviews.

      eLife assessment

      This study highlights a valuable finding that chromosomal instability can change immune responses, in particular macrophage behaviours. The convincing results show that the use of CD47 targeting and anti-Tyrp1 IgG can overcome changes in the immune landscape in tumors and prolong survival of tumor-bearing mice. These findings reveal an exciting new dimension of how chromosomal instability can influence immune responses against tumors.

      We thank the Editors for their enthusiasm and appreciation for this work. We also want to highlight our thanks for their careful reading, support, and patience while handling this manuscript. While this work provides useful insight into potential therapeutic implications of chromosomal instability in the macrophage immunotherapy field, we also hope it elucidates some novel basic science to further explore how chromosomal instability has such interesting effects on the immune system.

      Public Reviews:

      Reviewer #1 (Public Review):

      The manuscript by Hayes et al. explored the potential of combining chromosomal instability with macrophage phagocytosis to enhance tumor clearance of B16-F10 melanoma. However, the manuscript suffers from substandard experimental design, some contradictory conclusions, and a lack of viable therapeutic effects.

      The authors suggest that early-stage chromosomal instability (CIN) is a vulnerability for tumorigenesis, CD47-SIRPa interactions prevent effective phagocytosis, and opsonization combined with inhibition of the CD47-SIRPa axis can amplify tumor clearance. While these interactions are important, the experimental methodology used to address them is lacking.

      Reviewer #1 (Recommendations For The Authors):

      First, early stages of the tumor are essentially being defined as before implantation. In all cases, the tumor cells were pre-treated with MPS1i or had a genetic knockout of CD47. This makes it difficult to see how this would translate clinically.

      We greatly appreciate the Reviewer’s interest in the topic and its potential, but our manuscript makes no claims of immediate clinical translation. Chromosomal instability (CIN) studies have to date not yet discovered or described whether and how CIN can affect macrophage function. To our knowledge, this is the first study to begin such characterizations with various MPS1i drugs to induce CIN. Many variations of the approach can be envisioned for future studies.

      Our Results include some key studies of cancer cells with wildtype levels of CD47, including in vivo tumor elimination (Fig. 3E). Nonetheless, we do conduct some of our studies in a CD47 knockout context to remove this “brake” that generally impedes phagocytosis, with our goal being to better understand how CIN affects phagocytosis. As cited to some extent in our Introduction, there are many efforts in clinical trials to disrupt this macrophage checkpoint and others focused on macrophage immunotherapy. Whether CIN can be induced by clinically translatable drugs and specifically in cancer cells is beyond the scope of our studies.

      I would like to see the amount of CIN that occurs in WT B16F10 over the course of tumorigenesis (ie longer than 5 days). This is because I would assume that CIN would eventually occur in the WT B16F10 regardless of whether MPS1i is being given. And if that's the case, then the initiation of CIN at day 10 after implantation (for example) would still be considered "early stage" CIN. If the therapy is then initiated at this point, does the effect remain? Or put differently, how would the authors propose to induce the appropriate level of CIN in an established tumor? Why is pretreatment necessary?

      Untreated B16F10 cells fail to produce micronuclei over 12 days compared to MPS1i treated cells – as shown in a newly added panel in Fig. S1:

      Author response image 1.

      This helps support our decision to pre-treat cells with MPS1i to stimulate genomic instability and is described in the first section of Results:

      “…we saw >10-fold increases of micronuclei over the cell line’s low basal level (~1% of cells), and two other MPS1 inhibitors, AZ3146 and BAY12-17389, confirm such effects (Fig. S1A). Micronuclei-positive cells can persist up to 12 days after treatment (Fig. S1B), while control cells maintain the low basal levels. The results suggest pre-treatment with MPS1i can simulate CIN in an experimental context even for 1-2 weeks, which may not typically occur at the same frequency during early tumor growth.”

      It is known that PD-1 expression inhibits tumor-associated macrophage phagocytosis (Nature, 2017). Does MSP1i (sic) treatment affect the population of PD-1+ tumor macrophages in vivo?

      We thank the Reviewer for bringing up an interesting point.

      Using the same tumor RNA-seq data that was used for Fig.1E, a heatmap of expression of PD-1 (gene Pdcd1) shows no consistent trend with MPS1i:

      Author response image 2.

      We also examined whether the secretome from CIN-afflicted cancer cells affect PD-1 expression in cultured macrophages, but we did not register any reads from our single-cell RNA-sequencing experiment for Pdcd1 in any of the macrophage clusters from Fig. 1H.

      Author response image 3.

      The Discussion section now includes a statement on this topic:

      “…B16F10 tumors are poorly immunogenic, do not respond to either anti-CD47 or anti-PD-1/PDL1 monotherapies, and show modest and variable cure rates (~20-40%; Dooling et al., 2023; Hayes et al., 2023) even when macrophages have been made maximally phagocytic according to notions above. We should note here that our whole-tumor RNA-seq data (Fig.1E) shows expression of PD-1 (gene Pdcd1) follows no consistent trend upon MPS1i treatment, and that Pdcd1 was not detected in our scRNA-seq data for macrophage cultures (Fig.1G) – motivating further study.”

      The authors must explain how the proposed therapy works since MPS1i increases tumor (cell) size, making it difficult for macrophages to phagocytose the tumor cells. It also reduces or suppresses Tyrp1 expression on the cancer cells, making it harder to opsonize. Since these were two main points for the rationale of this study, the authors need to reconcile them.

      We appreciate this comment and have re-organized this Results section to try to minimize confusion:

      CIN-afflicted, CD47-knockout tumoroids are eliminated by Macrophages

      To assess functional effects of macrophage polarization, we focused on a 3D “immuno-tumoroid” model in which macrophage activity can work (or not) over many days against a solid proliferating mass of cancer cells in non-adherent round-bottom wells (Fig. 2A) (Dooling et al., 2023). We used CD47 knockout (KO) B16F10 cells, which removes the inhibitory effect of CD47 on phagocytosis, noting that KO does not perturb surface levels of Tyrp1, which is targetable for opsonization with anti-Tyrp1 (Fig. S2A). BMDMs were added to pre-assembled tumoroids at a 3:1 ratio, and we first assessed surface protein expression of macrophage polarization markers. Consistent with our whole-tumor bulk RNA-sequencing and also single-cell RNA-sequencing of BMDM monocultures (Fig. 1E, 1I-J), BMDMs from immunotumoroids of MPS1i-treated B16F10 showed increased surface expression of M1-like markers MHCII and CD86 while showing decreased expression of M2-like markers CD163 and CD206 (Fig. 2B-C). Although these macrophages seemed poised for anticancer activity, the cancer cells showed decreased binding of anti-Tyrp1 (Fig. S2B) and ~20% larger size in flow cytometry (Fig. S2C). The latter likely reflects cytokinesis defects and polyploidy as acute effects of CIN induction (Chunduri & Storchová, 2019; Mallin et al., 2022). Such cancer cell changes might explain why standard 2D phagocytosis assays show BMDMs attached to rigid plastic engulf relatively few anti-Tyrp1 opsonized cancer cells pretreated with MPS1i versus DMSO (Fig. S2D). In such cultures, BMDMs use their cytoskeleton to attach and spread, competing with engulfment of large and poorly opsonized targets. Noting that tumors in vivo are not as rigid as plastic, our 3D immunotumoroids eliminate attachment to plastic, and large numbers of macrophages can cluster and cooperate in engulfing cancer cells in a cohesive mass (Dooling et al., 2023).

      We indeed find CIN-afflicted tumoroids are eliminated by BMDMs regardless of anti-Tyrp1 opsonization (Fig. 2D-E), whereas anti-Tyrp1 is required for clearance of DMSO control tumoroids (Fig. 2D, S3B). Imaging also suggests that cancer CIN stimulates macrophages to cluster (compare Day-4 in Fig. 2D), which favors cooperative phagocytosis of tumoroids (Dooling et al., 2023), and occurs despite the lack of cancer cell opsonization and their larger cell size. The 3D immunotumoroid results with induced CIN are thus consistent with a more pro-phagocytic M1-type polarization (Fig.1J and 2B,C).

      The authors used varying numbers of tumor cells for the in vivo portions of the study; the first half of the manuscript uses 500,000 cells, while the latter half uses 200,000 cells. Why?

      The reasons for the difference in numbers is now clarified in the Methods:

      For assessing immune infiltrates in early stages of tumor engraftment, when tumors are still small, we used a relatively high number of tumor cells (500,000 cells in Fig. 1D and Fig. 2F-G) to achieve sufficient cell numbers after dissociating the tumors, particularly for the slow-growing MPS1i-treated tumors. More specifically, with dissection, collagenase treatment, and passage through a filter to remove clumps, we would lose many cells, and yet needed 100,000 viable cells or more for bulk RNA-seq suspensions and for flow cytometry measurements. For all other studies, 200,000 cancer cells were injected.

      The authors need to report the tumor volumes and the total number of cells isolated from the day five tumors to avoid grossly inflating the effect (i.e. Fig 2G and 4G).

      We have added relevant numbers in the Methods:

      For day 5 post-challenge measurements, 100,000 to 200,000 live cells were collected. For in vivo tumor infiltrate studies in re-challenged mice, 10 million live cells were collected.

      Also, regarding tumor sizes and cell numbers, we have previously published relevant measurements in assessments of tumor growth. Please see:

      Brandon H Hayes, Hui Zhu, Jason C Andrechak, Lawrence J Dooling, Dennis E Discher, Titrating CD47 by mismatch CRISPR-interference reveals incomplete repression can eliminate IgG-opsonized tumors but limits induction of antitumor IgG, PNAS Nexus, Volume 2, Issue 8, August 2023, pgad243, https://doi.org/10.1093/pnasnexus/pgad243

      Dooling, L.J., Andrechak, J.C., Hayes, B.H. et al. Cooperative phagocytosis of solid tumours by macrophages triggers durable anti-tumour responses. Nat. Biomed. Eng 7, 1081–1096 (2023). https://doi.org/10.1038/s41551-023-01031-3

      In the present study, similar tumor growth curves are provided for transparency, but the Kaplan-Meier curves are the key pieces of data in Figs. 3-4. Lastly, regarding reporting total cell number harvested, we based our experiments on previously accepted measurements that also reported numbers out of total harvested cells. See:

      Cerezo-Wallis, D., Contreras-Alcalde, M., … Soengas, M.S., 2020. Midkine rewires the melanoma microenvironment toward a tolerogenic and immune-resistant state. Nat Med 26, 1865–1877. https://doi.org/10.1038/s41591-020-1073-3

      The figure titles need to be revised. For example, the title of Figure 1 claims that "MPS1i-induced chromosomal instability causes proliferation deficits in B16F10 tumors." However, the evidence provided is weak. The authors only present GSEA analysis of proliferation and no functional evidence of impairment. The authors need to characterize this proliferation deficit using in vitro studies and functional studies of macrophage polarization. I would suggest proliferation assays (crystal violet, MTT, Incucyte, etc) to measure the B16 growth over time with MPS1i treatment.

      We thank the Reviewer for pointing this out. In Fig.1 we have minimized information regarding proliferation because it is later quantified in Figs.2D,E, S3, and 3D-i:

      Fig.1F legend: Top downregulated hallmark gene sets in tumors comprised of MPS1i-treated B16F10 cells, showing downregulated DNA repair, cell cycle, and growth-related pathways, consistent with observations of slowed growth in culture and in vivo – as subsequently quantified.

      Then the authors could collect the tumor supernatant to culture with macrophages and determine polarization in vitro. I would also like to see functional studies of macrophage polarization (suppression assays, cytokine production, etc). Currently, the authors provide no functional studies.

      Fig.2B,C provides functional surface marker measurements of in vitro polarization toward anti-cancer M1 macrophages by MPS1i-pretreated tumor cells, consistent with gene expression in Fig.1G-J. Function is further shown as anti-cancer activity in Fig.2D,E, as now stated explicitly in the text:

      “…In our 3D tumoroid in vitro assays, we found that macrophages can suppress the growth of chromosomally unstable tumoroids and clear them, surprisingly both with and without anti-Tyrp1 (Fig. 2D-E), regardless of MPS1i concentration used for treatment. Such a result is consistent with M1-type polarization (Fig.1J and 2B,C), which tends to be more pro-phagocytic.”

      The authors claim that macrophages are the key effector cells, but they need to provide evidence for this claim.

      Other immune cells clearly contribute to the presented results because the IgG must eventually come from B cells. The text has been edited to indicate 'macrophages are key initiating-effector cells', and some evidence for this is the maximal survival of (WT B16 + Rev tumors) in Fig.3E upon treatment with Marrow Macrophages plus Macrophage-relevant SIRPa blockade and Macrophage-relevant IgG (via FcR). T cells do not have SIRPa or FcR.

      They can deplete macrophages and T and B cells to determine whether the effect remains or is ablated. This is the only definitive way to make this claim.

      To determine whether T and B cells might also be key initiating-effector cells, new experiments were done with mice depleted of T and B cells (per Fig.S9, below). We compared the growth of MPS1i vs DMSO treatments in these mice to results in mice with T and B cells (which should replicate our previous results in Fig.3D-i). We found that slower growth with Rev relative to DMSO was similar in mice without T and B cells compared to mice with T and B cells. We have added to the text our conclusion that: T and B cells are not key initiating-effector cells. Whereas B cells are effector cells at least in terms of eventually making anti-tumor IgG, our results show that macrophages are key initiating-effector cells because macrophages certainly affect the growth of (WT B16 + Rev tumors) when more are added (Fig.3E).

      Author response image 4.

      Growth of CIN-afflicted wild-type (WT) tumors in T- and B-cell deficient mice and T- and B-cell replete mice. Similar growth delays for MPS1i-pretreated B16F10 cells in T- and B-cell deficient NSG mice and immunocompetent C57BL/6 mice. Both types of mice have functional macrophages. Parallel studies in vivo were done with WT B16F10 ctrl cells cultured 24 h in 2.5 μM MPS1i (reversine) or DMSO, then washed 3x in growth media for 5 min each and allowed to recover in growth media for 48 h. 200,000 cells in 100 μL PBS were injected subcutaneously into right flanks, and the standard size limit was used to determine survival curves. The C57BL/6 experiments were done independently here (by co-author L.J.D.) from the similar results (by B.H.H.) shown in Fig.3D-i, which provides evidence of reproducibility.

      The Results section final paragraph describes all of this:

      Macrophages seem to be the key initiating-effector cells, based in part on the following findings. First, macrophages with both SIRPα blockade and FcR-engaging, tumor-targeting IgG maximize survival of mice with WT B16 + Rev tumors (Fig. 3E) – noting that macrophages but not T cells express SIRPα and FcR’s. Despite the clear benefits of adding macrophages, to further assess whether T and B cells are key initiating-effector cells, new experiments were done with mice depleted of T and B cells. We compared the growth delay of MPS1i versus DMSO treatments in these mice to the delay in fully immunocompetent mice with T and B cells – with all studies done at the same time. We found that slower growth with Rev relative to DMSO was similar in mice without T and B cells when compared to immunocompetent C57 mice (Fig.S9). We conclude therefore that T and B cells are not key initiating-effector cells. At later times, B cells are likely effector cells at least in terms of making anti-tumor IgG, and T cells in tumor re-challenges are also increased in number (Fig. 4G-ii). We further note that in our earlier collaborative study (Harding et al., 2017) WT B16 cells were pre-treated by genome-damaging irradiation before engraftment in C57 mice, and these cells grew minimally – similar to MPS1i treatment – while untreated WT B16 cells grew normally at a contralateral site in the same mouse. Such results indicate that T and B cells in C57BL/6 mice are not sufficiently stimulated by genome-damaged B16 cells to generically impact the growth of undamaged B16 cells.

      Reviewer #2 (Public Review):

      Harnessing macrophages to attack cancer is an immunotherapy strategy that has been steadily gaining interest. Whether macrophages alone can be powerful enough to permanently eliminate a tumor is a high-priority question. In addition, the factors making different tumors more vulnerable to macrophage attack have not been completely defined. In this paper, the authors find that chromosomal instability (CIN) in cancer cells improves the effect of macrophage-targeted immunotherapies. They demonstrate that CIN tumors secrete factors that polarize macrophages to a more tumoricidal fate through several methods. The most compelling experiment is transferring conditioned media from MPS1-inhibited and control cancer cells, then using RNAseq to demonstrate that the MPS1-inhibited conditioned media causes a shift towards a more tumoricidal macrophage phenotype. In mice with MPS1-inhibited (CIN) B16 melanoma tumors, a combination of CD47 knockdown and anti-Tyrp1 IgG is sufficient for long-term survival in nearly all mice. This combination is a striking improvement from conditions without CIN.

      Like any interesting paper, this study leaves several unanswered questions. First, how do CIN tumors repolarize macrophages? The authors demonstrate that conditioned media is sufficient for this repolarization, implicating secreted factors, but the specific mechanism is unclear. In addition, the connection between the broad, vaccination-like IgG response and CIN is not completely delineated. The authors demonstrate that mice who successfully clear CIN tumors have a broad anti-tumor IgG response. This broad IgG response has previously been demonstrated for tumors that do not have CIN. It is not clear if CIN specifically enhances the anti-tumor IgG response or if the broad IgG response is similar to other tumors. Finally, CIN is always induced with MPS1 inhibition. To specifically attribute this phenotype to CIN it would be most compelling to demonstrate that tumors with CIN unrelated to MPS1 inhibition are also able to repolarize macrophages.

      Overall, this is a thought-provoking study that will be of broad interest to many different fields including cancer biology, immunology and cell biology.

      We thank the Reviewer for their enthusiastic and positive comments toward the manuscript.

      Our main purpose with this study has been discovery science oriented and mechanistic, with implications for improving macrophage immunotherapies. More experimentation needs to be done to further understand how this positive immune response emerges. However, we could address whether or not CIN enhances the anti-tumor IgG response by quantitative comparisons to our two other recent studies, and we conclude that it does not, per new edits in the Abstract and the Results. See attached PPT for full details and comparison.

      Abstract:

      “CIN does not greatly affect the level of the induced response but does significantly increase survival.”

      “…these results demonstrate induction of a generally potent anti-cancer antibody response to CIN-afflicted B16F10 in a CD47 KO context. Importantly, comparing these sera results for CIN-afflicted tumors to our recent studies of the same tumor model without CIN (Dooling et al., 2022; Hayes et al., 2022), we find similar levels of IgG induction (e.g. ~100-fold above naive on average for IgG2a/c), similar increases in phagocytosis by sera opsonization (e.g. equivalent to anti-Tyrp1), and similar levels of suppressed tumoroid growth – including the variability.

      However, median survival increased (21 days) compared to their naïve counterparts (14 days), supporting the initial hypothesis of prolonged survival and consistent not only with past results indicating major benefits of a prime-&-boost approach with anti-Tyrp1 (Dooling et al., 2022) but also with the noted similarities in induced IgG levels.”

      Future studies could certainly focus on trying to identify what secreted factors might be inducing the M1-like polarization (using ELISA assays for cytokine detection, for example). This could be important because a main finding here is that we achieve nearly a 100% success rate in clearing tumors when we combine CD47 ablation and IgG opsonization with cancer cell CIN. Previous studies were only able to achieve about 40% cures in mice when working with CD47 disruption and IgG opsonization alone, suggesting CIN in this experimental context does improve macrophage response.

      Lastly, we agree with the Reviewer that future studies should also address how CIN in general (not MPS1i-induced) affects tumor growth. The final paragraph of our Discussion at least cites support for consistent effects of M1-like polarization:

      “The effects of CIN and aneuploidy in macrophages certainly require further investigation. We did publish recently that M1-like polarization of BMDMs with IFNg priming is sufficient to suppress growth of B16 tumoroids with anti-Tyrp1 opsonization more rapidly than unpolarized/unprimed macrophages and much more rapidly than M2-like polarization of BMDMs with IL4 (Extended Data Fig.5a in Dooling et al., 2023); hence, anti-cancer polarization contributes in this assay.

      While the secretome from MPS1i-treated cancer cells has been found to trigger…”

      Nonetheless, we can only speculate that there is a threshold of CIN reached by a certain timepoint in tumor engraftment and growth. Natural CIN might not be enough, so we pursued a pharmacological approach consistent with ongoing pre-clinical studies (https://doi.org/10.1158/1535-7163.MCT-15-0500). Future studies should consider trying knockdown models to gradually accrue CIN in tumors or using more relevant pharmacological drugs that are known to induce CIN not associated with the spindle. We believe, however, that these are larger questions on their own and are beyond the scope of the foundational discoveries in this manuscript.

      Reviewer #2 (Recommendations For The Authors):

      None

      We again thank the Reviewer for their support and enthusiasm for the manuscript. We made some additional changes and added more data to address questions posed by the other Reviewer, which we hope you will find improve the manuscript further.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Hippocampal place cells display a sequence of firing activities when the animal travels through a spatial trajectory at a behavioral time scale of seconds to tens of seconds. Interestingly, parts of the firing sequence also occur at a much shorter time scale: ~120 ms within individual cycles of theta oscillation. These so-called theta sequences are originally thought to naturally result from the phenomenon of theta phase precession. However, there is evidence that theta sequences do not always occur even when theta phase precession is present, for example, during the early experience of a novel maze. The question is then how they emerge with experience (theta sequence development). This study presents evidence that a special group of place cells, those tuned to fast-gamma oscillations, may play a key role in theta sequence development.

      The authors analyzed place cells, LFPs, and theta sequences as rats traveled a circular maze in repeated laps. They found that a group of place cells were significantly tuned to a particular phase of fast-gamma (FG-cells), in contrast to others that did not show such tuning (NFG-cells). The authors then omitted FG-cells or the same number of NFG-cells, in their algorithm of theta sequence detection and found that the quality of theta sequences, quantified by a weighted correlation, was worse with the FG-cell omission, compared to that with the NFG-cell omission, during later laps, but not during early laps. What made the FG-cells special for theta sequences? The authors found that FG-cells, but not NFG-cells, displayed phase precession to slow-gamma (25 - 45 Hz) oscillations (within theta cycles) during early laps (both FG- and NFG-cells showed slow-gamma phase precession during later laps). Overall, the authors conclude that FG-cells contribute to theta sequence development through slow-gamma phase precession during early laps.

      How theta sequences are formed and developed during experience is an important question, because these sequences have been implicated in several cognitive functions of place cells, including memory-guided spatial navigation. The identification of FG-cells in this study is straightforward. Evidence is also presented for the role of these cells in theta sequence development. However, given several concerns elaborated below, whether the evidence is sufficiently strong for the conclusion needs further clarification, perhaps, in future studies.

      We thank the reviewer for these positive comments.

      (1) The results in Figure 3 and Figure 8 seem contradictory. In Figure 8, all theta sequences displayed a seemingly significant weighted correlation (above 0) even in early laps, which was mostly due to FG-cell sequences but not NFG-cell sequences (correlation for NFG-sequences appeared below 0). However, in Figure 3H, omitting FG-cells and omitting NFG-cells did not produce significant differences in the correlation. Conversely, FG-cell and NFG-cell sequences were similar in later laps in Figure 8 (NFG-cell sequences appeared even better than FG-cell sequences), yet omitting NFG-cells produced a better correlation than omitting FG-cells. This confusion may be related to how "FG-cell-dominant sequences" were defined, which is unclear in the manuscript. Nevertheless, the different results are not easy to understand.

      We thank the reviewer for pointing out this important problem. The apparent contradiction can be explained by the different sets of sequences included in Fig. 3 and Fig. 8, as described below.

      (1) In Fig. 3, all sequences decoded without either FG-cells or NFG-cells were included, defined as exFG-sequences and exNFG-sequences, respectively. Sequence development could therefore not be observed in the early phase, and the weighted correlation was low. (2) In Fig. 8, however, only sequences in which either FG-cells or NFG-cells fired across at least 3 slow gamma cycles were included, defined as FG-cell sequences and NFG-cell sequences. This criterion allows us to investigate the relationship between sequence development and slow gamma phase precession, because these sequences were contributed by cells likely to show slow gamma phase precession. These definitions have been added to the “Theta sequences detection” section of the Methods (Lines 606-619).

      In the early phase, there was still no difference in weighted correlation between FG-cell sequences and NFG-cell sequences (Author response image 1A, Student’s t test, t(65)=0.2, p=0.8, Cohen's D=0.1), but the FG-cell sequences contained a high proportion of slow gamma phase precession (Fig. 8F). In the late phase, both FG-cell sequences and NFG-cell sequences exhibited slow gamma phase precession, so their weighted correlations were both high, with no difference between them (Author response image 1B, Student’s t test, t(62)=-1.1, p=0.3, Cohen's D=0.3). This result further indicates that theta sequence development requires slow gamma phase precession, especially for FG-cells during the early phase.

      Author response image 1.
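
      The weighted correlation used above to score theta sequences can be sketched as follows. This is a minimal illustration under an assumed, commonly used definition (a Pearson correlation between position bin and time bin, weighted by the decoded posterior probability). The function name `weighted_correlation` and the `posterior[position][time]` layout are hypothetical and are not taken from the manuscript.

```python
import math


def weighted_correlation(posterior):
    """Correlation between position bin and time bin, weighted by probability.

    posterior[i][j] is the (assumed) decoded probability of position bin i
    at time bin j within one theta cycle. Bin indices stand in for position
    and time; real units would only rescale both axes.
    """
    total = sum(sum(row) for row in posterior)
    if total <= 0:
        raise ValueError("posterior must contain positive weight")

    n_pos, n_time = len(posterior), len(posterior[0])
    cells = [(i, j, posterior[i][j]) for i in range(n_pos) for j in range(n_time)]

    # Weighted means of position and time.
    mean_x = sum(w * i for i, _, w in cells) / total
    mean_t = sum(w * j for _, j, w in cells) / total

    # Weighted covariance and variances.
    cov_xt = sum(w * (i - mean_x) * (j - mean_t) for i, j, w in cells) / total
    var_x = sum(w * (i - mean_x) ** 2 for i, _, w in cells) / total
    var_t = sum(w * (j - mean_t) ** 2 for _, j, w in cells) / total

    if var_x == 0 or var_t == 0:
        return 0.0  # no spread along one axis: correlation is undefined, report 0
    return cov_xt / math.sqrt(var_x * var_t)


# Toy posteriors: a forward sweep, a reverse sweep, and no structure at all.
forward = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
reverse = [[0, 0, 1], [0, 1, 0], [1, 0, 0]]
uniform = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
```

      Under this definition, values near 1 indicate decoded position advancing with time inside the theta cycle, values near -1 indicate a reverse sweep, and values near 0 indicate no sequential structure.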

      (2) The different contributions between FG-cells and NFG-cells to theta sequences are supposed not to be caused by their different firing properties (Figure 5). However, Figure 5D and E showed a large effect size (Cohen's D = 0.7, 0.8), although not significant (P = 0.09, 0.06). But the seemingly non-significant P values could be simply due to smaller N's (~20). In other parts of the manuscript, the effect sizes were comparable or even smaller (e.g. D = 0.5 in Figure 7B), but interpreted as positive results: P values were significant with large N's (~480 in Fig. 7B). Drawing a conclusion purely based on a P value while N is large often renders the conclusion only statistical, with unclear physical meaning. Although this is common in neuroscience publications, it makes more sense to at least make multiple inferences using similar sample sizes in the same study.

      We thank the reviewer for this kind suggestion. We made multiple inferences using similar sample sizes as much as possible. In Fig. 7B, we repeated the statistical analysis with sessions as samples, and we found that the conclusion remained significant. These results have been added to the revised manuscript (Lines 269-270), and Fig. 7B has been replaced correspondingly.
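
      The reviewer's point about effect size versus sample size can be illustrated with a short sketch. It assumes a two-sample Student's t test with equal group sizes and a pooled standard deviation, which may differ from the tests actually used in the manuscript. The helper names are hypothetical, and the group sizes in the comparison are taken only loosely from the N's quoted in the comment.

```python
import math
from statistics import mean, variance


def pooled_variance(a, b):
    """Pooled sample variance of two independent groups."""
    na, nb = len(a), len(b)
    return ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)


def cohens_d(a, b):
    """Cohen's d: mean difference in units of the pooled standard deviation."""
    return (mean(a) - mean(b)) / math.sqrt(pooled_variance(a, b))


def t_statistic(a, b):
    """Student's two-sample t statistic (equal-variance form)."""
    na, nb = len(a), len(b)
    return (mean(a) - mean(b)) / math.sqrt(pooled_variance(a, b) * (1 / na + 1 / nb))


def t_from_d(d, n_per_group):
    """For equal group sizes n, the two quantities are linked by t = d * sqrt(n / 2)."""
    return d * math.sqrt(n_per_group / 2)


# A larger effect with few samples can give a smaller t (hence a larger p)
# than a smaller effect with many samples.
t_small_n = t_from_d(0.7, 20)    # about 2.2
t_large_n = t_from_d(0.5, 480)   # about 7.7
```

      Because t grows with the square root of N at a fixed effect size, comparing P values across analyses with very different N's mostly compares the N's, which is why matching the sample unit (for example, sessions) makes the inferences comparable.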

      (3) In supplementary Figure 2 - S2, FG-cells displayed stronger theta phase precession than NFG-cells, which could be a major reason why FG-cells impacted theta sequences more than NFG cells. Although factors other than theta phase precession may contribute to or interfere with theta sequences, stronger theta phase precession itself (without the interference of other factors), by definition, can lead to stronger theta sequences.

      This is a very good point. The finding that FG-cells displayed stronger theta phase precession than NFG-cells is consistent with the finding of Guardamagna et al., 2023 Cell Rep, that the theta phase precession pattern emerged with strong fast gamma. Since slow gamma phase precession occurs within theta cycles, it is hard to assess the contribution of these factors to theta sequence development without taking theta phase precession into account. It should be noted, however, that theta sequences may fail to develop even if theta phase precession is present from the very beginning of exploration (Feng et al., 2025 J Neurosci). These findings suggest that theta phase precession, together with other factors, impacts theta sequence development. However, the weight of each factor and their interactions still need to be investigated further. We have discussed this possibility in the Discussion section (Lines 361-373).

      (4) The slow-gamma phase precession of FG-cells during early laps is supposed to mediate or contribute to the emergence of theta sequences during late laps (Figure 1). The logic of this model is unclear. The slow-gamma phase precession was present in both early and late laps for FG-cells, but only present in late laps for NFG-cells. It seems more straightforward to hypothesize that the difference in theta sequences between early and later laps is due to the difference in slow-gamma phase precession of NFG cells between early and late laps. Although this is not necessarily the case, the argument presented in the manuscript is not easy to follow.

      We thank the reviewer for pointing this out. Slow gamma phase precession was first reported in my previous publication (Zheng et al., 2016 Neuron), which suggested that it reflects temporally compressed coding of spatial information related to memory retrieval. We would therefore expect slow gamma phase precession to occur in all cells during late laps, because spatial information is retrieved once rats have become familiar with the environment. During early laps, however, when novel information is just being encoded, there would be a balance between fast gamma and slow gamma modulation of cells in preparation for the upcoming encoding-retrieval transition. One possibility is that FG-cells support this balance by receiving both fast gamma and slow gamma modulation, but with distinct phase-coding modes (fast gamma phase locking and slow gamma phase precession) in a temporally coordinated manner. We have discussed this possibility in the Discussion section (Lines 415-428).

      (5) There are several questions on the description of methods, which could be addressed to clarify or strengthen the conclusions.

      (i) Were the identified fast- and slow-gamma episodes mutually exclusive?

      Yes, the fast- and slow-gamma episodes are mutually exclusive. We have added descriptions in the “Detection of gamma episodes” section in the Methods part (Lines 538-550).

      (ii) Was the task novel when the data were acquired? How many days (from the 1st day of the task) were included in the analysis? When the development of the theta sequence was mentioned, did it mean the development in a novel environment, in a novel task, or purely in a sense of early laps (Lap 1, 2) on each day?

      We thank the reviewer for pointing this out. The task was not novel to the rats in this dataset, because only days with recording quality good enough for sequence decoding were included in this paper, which were about day 2 to day 10 for each rat. However, we still observed the process of sequence formation because of the rats’ interest in exploration during early laps. Thus, when the development of theta sequences is mentioned, it refers to development across the early laps on each day.

      (iii) How were the animals' behavioral parameters equalized between early and later laps? For example, speed or head direction could potentially produce the differences in theta sequences.

      This is a very good point. Regarding the effect of running speed on theta sequences, we quantified the running speeds during theta sequences across trials 1-5. We found that the rats ran at a stable speed, as reported in Fig. 3F. Regarding the effect of head direction on theta sequences, we measured the angle difference between head direction and running direction. We found that the angle difference for each lap was distributed around 0, with no significant difference across laps (Fig. S3, Watson-Williams multi-sample test, F(4,55)=0.2, p=0.9, partial η² = 0.01). These results indicate that the differences in theta sequences across trials cannot be explained by variability in behavioral parameters. We have added these results and the corresponding methods to the revised manuscript (Lines 172-175, Lines 507-511, with a new Fig. S3).
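
      The head-direction measure described above can be sketched as a wrapped angular difference followed by a circular mean per lap. This is only an illustration under assumed conventions (angles in degrees, differences wrapped to [-180, 180)); the function names are hypothetical, and the Watson-Williams test itself is not implemented here.

```python
import math


def angle_difference(head_dir_deg, run_dir_deg):
    """Signed head-minus-running direction, wrapped to [-180, 180) degrees."""
    return (head_dir_deg - run_dir_deg + 180.0) % 360.0 - 180.0


def circular_mean_deg(angles_deg):
    """Circular mean of angles in degrees, returned in (-180, 180]."""
    s = sum(math.sin(math.radians(a)) for a in angles_deg)
    c = sum(math.cos(math.radians(a)) for a in angles_deg)
    return math.degrees(math.atan2(s, c))


# Hypothetical samples from one lap: (head direction, running direction).
samples = [(350.0, 10.0), (10.0, 350.0)]
diffs = [angle_difference(h, r) for h, r in samples]  # wraps across 0/360
lap_mean = circular_mean_deg(diffs)
```

      Wrapping before averaging matters because an ordinary mean of 350 and 10 degrees would suggest the animal faced backwards, whereas the circular treatment correctly places both samples near 0.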

      Reviewer #2 (Public Review):

      This manuscript addresses an important question that has not yet been solved in the field, what is the contribution of different gamma oscillatory inputs to the development of "theta sequences" in the hippocampal CA1 region? Theta sequences have received much attention due to their proposed roles in encoding short-term behavioral predictions, mediating synaptic plasticity, and guiding flexible decision-making. Gamma oscillations in CA1 offer a readout of different inputs to this region and have been proposed to synchronize neuronal assemblies and modulate spike timing and temporal coding. However, the interactions between these two important phenomena have not been sufficiently investigated. The authors conducted place cell and local field potential (LFP) recordings in the CA1 region of rats running on a circular track. They then analyzed the phase locking of place cell spikes to slow and fast gamma rhythms, the evolution of theta sequences during behavior, and the interaction between these two phenomena. They found that place cells with the strongest modulation by fast gamma oscillations were the most important contributors to the early development of theta sequences and that they also displayed a faster form of phase precession within slow gamma cycles nested with theta. The results reported are interesting and support the main conclusions of the authors. However, the manuscript needs significant improvement in several aspects regarding data analysis, description of both experimental and analytical methods, and alternative interpretations, as I detail below.

      • The experimental paradigm and recordings should be explained at the beginning of the Results section. Right now, there is no description whatsoever which makes it harder to understand the design of the study.

      We thank the reviewer for this kind suggestion.  A description of the experimental paradigm and recordings has been added to the beginning of the Results section (Lines 114-119).

      • An important issue that needs to be addressed is the very small fraction of CA1 cells phase-locked to slow gamma rhythms (3.7%). This fraction is much lower than in many previous studies, that typically report it in the range of 20-50%. However, this discrepancy is not discussed by the authors. This needs to be explained and additional analysis considered. One analysis that I would suggest, although there are also other valid approaches, is to, instead of just analyzing the phase locking in two discrete frequency bands, compute the phase locking with all LFP frequencies from 25-100 Hz. This will offer a more comprehensive and unbiased view of the gamma modulation of place cell firing. Alternative metrics to mean vector length that are less sensitive to firing rates, such as the pairwise phase consistency index (Vinck et al., Neuroimage, 2010), could be implemented. This may reveal whether the low fraction of phase-locked cells could be due to a low number of spikes entering the analysis.

      We thank the reviewer for this constructive suggestion.  A previous study, also in Long-Evans rats, showed that the proportion of slow gamma phase-locked cells was ~20% during novelty exploration but dropped to ~10% during familiar exploration (Fig.4E, Kitanishi et al., 2015 Neuron).  This suggests that the proportion of slow gamma phase-locked cells may decrease with familiarity of the environment, which supports our data.  In addition, we calculated the pairwise phase consistency (PPC) index to assess the effect of spike counts on MVL.  We observed that the trends of PPC (Author response image 2A) and MVL (Author response image 2B) across frequency bands were consistent across different subsets of cells, suggesting that the determination of cell subsets by the MVL metric was not biased by low spike counts.  These results further shed light on the contribution of slow gamma phase precession of place cells to theta sequence development.
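
      To illustrate why PPC is a useful check on MVL, the sketch below computes both metrics from spike phases.  It is an illustration under stated assumptions, not the code behind Author response image 2: the function names are hypothetical, phases are in radians, and the simulated cells fire at uniformly random phases (no true phase locking).

```python
import cmath
import math
import random


def mean_vector_length(phases):
    """Length of the mean resultant vector of spike phases (radians)."""
    return abs(sum(cmath.exp(1j * p) for p in phases)) / len(phases)


def pairwise_phase_consistency(phases):
    """Mean cosine of the phase difference over all distinct spike pairs.

    Uses |sum_j exp(i*theta_j)|^2 = N + sum_{j != k} cos(theta_j - theta_k),
    which gives PPC = (N * MVL^2 - 1) / (N - 1).
    """
    n = len(phases)
    if n < 2:
        raise ValueError("PPC needs at least two spikes")
    r = mean_vector_length(phases)
    return (n * r * r - 1.0) / (n - 1.0)


rng = random.Random(0)


def average_over_cells(metric, n_spikes, n_cells=2000):
    """Average a metric over simulated cells with uniformly random phases."""
    total = 0.0
    for _ in range(n_cells):
        phases = [rng.uniform(-math.pi, math.pi) for _ in range(n_spikes)]
        total += metric(phases)
    return total / n_cells


# With no true locking, MVL is inflated when spikes are few; PPC stays near 0.
mvl_few = average_over_cells(mean_vector_length, 10)
mvl_many = average_over_cells(mean_vector_length, 200)
ppc_few = average_over_cells(pairwise_phase_consistency, 10)
ppc_many = average_over_cells(pairwise_phase_consistency, 200)
```

      In this simulation the average MVL of unlocked cells grows as the spike count shrinks, whereas the average PPC stays close to zero at both spike counts, which is why agreement between the two metrics argues against a spike-count bias.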

      Author response image 2.

      • From the methods, it is not clear to me whether the reference LFP channel was consistently selected to be a different one than that from which the analyzed spikes were taken. This is the better practice to reduce the contribution of spike leakage that could substantially inflate the coupling with faster gamma frequencies. These analyses need to be described in more detail.

      We thank the reviewer for pointing this out.  In the main manuscript, we used local LFPs, i.e., the LFP from the same tetrode on which the cells were recorded.  In addition, for each rat we selected an individual tetrode located in stratum pyramidale and at the center of the drive bundle.  We detected a similar proportion of FG-cells using LFPs from this tetrode compared with using local LFPs (Author response image 3A-B, Chi-squared test, χ<sup>2</sup>= 0.9, p=0.4, Cramér's V=0.03).  We further found that the PPC values of FG- and NFG-cells differed in the fast gamma band when using central LFPs (Author response image 3D), consistent with the result using local LFPs (Author response image 3C).  Therefore, these results suggest that the findings related to fast gamma were not due to spike leakage in the local LFPs.  We have updated the description in the manuscript (Lines 553-557, 566-568).

      Author response image 3.

      • The initial framework of the authors of classifying cells into fast gamma and not fast gamma modulated implies a bimodality that may be artificial. The authors should discuss the nuances and limitations of this framework. For example, several previous works have shown that the same place cell can couple to different gamma oscillations (e.g., Lasztóczi et al., Neuron, 2016; Fernandez-Ruiz et al., Neuron, 2017; Sharif et al., Neuron, 2021).

      We thank the reviewer for this kind suggestion.  We have cited these references and discussed the possibility of bimodal phase-locking in the manuscript (Lines 430-433).

      • It would be useful to provide a more thorough characterization of the physiological properties of FG and NFG cells, as this distinction is the basis of the paper. Only very little characterization of some place cell properties is provided in Figure 5. Important characteristics that should be very feasible to compare include average firing rate, burstiness, estimated location within the layer (i.e., deep vs superficial sublayers) and along the transverse axis (i.e., proximal vs distal), theta oscillation frequency, phase precession metrics (given their fundamental relationship with theta sequences), etc.

      We thank the reviewer for this constructive suggestion.  In addition to the characterizations shown in Fig. 5, we also analyzed firing rate, anatomical location and theta modulation to compare the physiological properties of FG- and NFG-cells.

      In terms of the firing properties of both types of cells, we found that the mean firing rate of FG-cells was higher than that of NFG-cells (Fig. 5A, Student's t-test, t(22) = 2.1, p = 0.04, Cohen's d = 0.9), which was consistent with a previous study showing that the firing rate was higher during fast gamma than during slow gamma (Zheng et al., 2015 Hippocampus).  However, the spike counts of the FG- and NFG-cells excluded for decoding were similar (Fig. 5B, Student's t-test, t(22) = 1.2, p = 0.3, Cohen's d = 0.5), suggesting that the differences found in theta sequences cannot be accounted for by different decoding quality related to spike counts.  In addition, we measured burstiness based on the distribution of inter-spike intervals, and we found that the bursting probability of spikes was not significantly different between FG and NFG cells (Author response image 4A, Student's t-test, t(22) = 0.6, p=0.5, Cohen's d=0.3).

      In terms of theta modulation of cells, we first compared the theta frequency related to the firing of FG and NFG cells.  We detected the instantaneous theta frequency at each spike time of FG and NFG cells, and found that it was not significantly different between cell types (Author response image 4B, Student's t-test, t(22) = -0.5, p=0.6, Cohen's d=0.2).  In addition, we found that the proportion of cells with significant theta phase precession was greater in FG-cells than in NFG-cells (Fig. S2E).  However, the slope and starting phase of theta phase precession were not significantly different between FG and NFG cells (Author response image 4C, Student's t-test, t(21) = 0.3, p=0.8, Cohen's d=0.1; Author response image 4D, Watson-Williams test, F(1,21)=0.5, p=0.5, partial η<sup>2</sup>=0.02).

      In terms of the anatomical location of FG and NFG cells, we identified tetrode traces in slices for each cell.  We found that both FG and NFG cells were recorded from the deep layer of dorsal CA1, with no difference in proportions between cell types (Author response image 4E, Chi-squared test, χ<sup>2</sup>=0.5, p=0.5, Cramér's V=0.05).  The distribution of FG-cells and NFG-cells along the transverse axis was also similar between cell types (Author response image 4F, χ<sup>2</sup>=0.08, p=0.8, Cramér's V=0.02).

      Author response image 4.

      • It is not clear to me how the analysis in Figure 6 was performed. In Figure 6B I would think that the grey line should connect with the bottom white dot in the third panel, which would be the interpretation of the results.

      We thank the reviewer for raising this good point.  The grey line was intended only as a visual guide, not as a quantitative analysis.  We have removed the grey lines from all heat maps in Fig.6.

      Reviewer #3 (Public Review):

      [Editors' note: This review contains many criticisms that apply to the whole sub-field of slow/fast gamma oscillations in the hippocampus, as opposed to this particular paper. In the editors' view, these comments are beyond the scope of any single paper. However, they represent a view that, if true, should contextualise the interpretation of this paper and all papers in the sub-field. In doing so, they highlight an ongoing debate within the broader field.]

      Summary:

      The authors aimed to elucidate the role of dynamic gamma modulation in the development of hippocampal theta sequences, utilizing the traditional framework of "two gammas," a slow and a fast rhythm. This framework is currently being challenged, necessitating further analyses to establish and secure the assumed premises before substantiating the claims made in the present article.

      The results are too preliminary and need to integrate contemporary literature. New analyses are required to address these concerns. However, by addressing these issues, it may be possible to produce an impactful manuscript.

      We thank the reviewer for raising these important questions in the hippocampal gamma field.  We have done a lot of new analyses according to the comments to strengthen our manuscript.

      I. Introduction

      Within the introduction, multiple broad assertions are conveyed that serve as the premise for the research. However, equally important citations that are not mentioned potentially contradict the ideas that serve as the foundation. Instances of these are described below:

      (1) Are there multiple gammas? The authors launched the study on the premise that two different gamma bands are communicated from CA3 and the entorhinal cortex. However, recent literature suggests otherwise, offering that the slow gamma component may be related to theta harmonics:

      From a review by Etter, Carmichael and Williams (2023)

      "Gamma-based coherence has been a prominent model for communication across the hippocampal-entorhinal circuit and has classically focused on slow and fast gamma oscillations originating in CA3 and medial entorhinal cortex, respectively. These two distinct gammas are then hypothesized to be integrated into hippocampal CA1 with theta oscillations on a cycle-to-cycle basis (Colgin et al., 2009; Schomburg et al., 2014). This would suggest that theta oscillations in CA1 could serve to partition temporal windows that enable the integration of inputs from these upstream regions using alternating gamma waves (Vinck et al., 2023). However, these models have largely been based on correlations between shifting CA3 and medial entorhinal cortex to CA1 coherence in theta and gamma bands. In vivo, excitatory inputs from the entorhinal cortex to the dentate gyrus are most coherent in the theta band, while gamma oscillations would be generated locally from presumed local inhibitory inputs (Pernía-Andrade and Jonas, 2014). This predominance of theta over gamma coherence has also been reported between hippocampal CA1 and the medial entorhinal cortex (Zhou et al., 2022). Another potential pitfall in the communication-through-coherence hypothesis is that theta oscillations harmonics could overlap with higher frequency bands (Czurkó et al., 1999; Terrazas et al., 2005), including slow gamma (Petersen and Buzsáki, 2020). The asymmetry of theta oscillations (Belluscio et al., 2012) can lead to harmonics that extend into the slow gamma range (Scheffer-Teixeira and Tort, 2016), which may lead to a misattribution as to the origin of slow-gamma coherence and the degree of spike modulation in the gamma range during movement (Zhou et al., 2019)."

      And from Benjamin Griffiths and Ole Jensen (2023)

      "That said, in both rodent and human studies, measurements of 'slow' gamma oscillations may be susceptible to distortion by theta harmonics [53], meaning open questions remain about what can be attributed to 'slow' gamma oscillations and what is attributable to theta."

      This second statement should be heavily considered as it is from one of the original authors who reported the existence of slow gamma.

      Yet another instance from Schomburg, Fernández-Ruiz, Mizuseki, Berényi, Anastassiou, Christof Koch, and Buzsáki (2014):

      "Note that modulation from 20-30 Hz may not be related to gamma activity but, instead, reflect timing relationships with non-sinusoidal features of theta waves (Belluscio et al., 2012) and/or the 3rd theta harmonic."

      One of this manuscript's authors is Fernández-Ruiz, a contemporary proponent of the multiple gamma theory. Thus, the modulation to slow gamma offered in the present manuscript may actually be related to theta harmonics.

      With the above emphasis from proponents of the slow/fast gamma theory on disambiguating harmonics from slow gamma, our first suggestion to the authors is that they A) address these statements (citing the work of these authors in their manuscript) and B) demonstrably quantify theta harmonics in relation to slow gamma prior to making assertions of phase relationships (methodological suggestions below). As the frequency of theta harmonics can extend as high as 56 Hz (PMID: 32297752), overlapping with the slow gamma range defined here (25-45 Hz), it will be important to establish an approach that decouples the two phenomena using an approach other than an arbitrary frequency boundary.

      We agree with the reviewer that theta harmonics could overlap with higher frequency bands, including slow gamma, as discussed in the reviews quoted above.  In order to rule out the possibility of theta harmonic effects in this study, we added new analyses in this letter (see below).

      (2) Can gammas be segregated into different lamina of the hippocampus? This idea appears to be foundational in the premise of the research but is also undergoing revision.

      As discussed by Etter et al. above, the initial theory of gamma routing was launched on coherence values. However, the values reported by Colgin et al. (2009) lean more towards incoherence (a value of 0) rather than coherence (1), suggesting a weak to negligible interaction. Nevertheless, this theory is coupled with the idea that the different gamma frequencies are exclusive to the specific lamina of the hippocampus.

      Recently, Douchamps et al. (2024) suggested a broader, more nuanced understanding of gamma oscillations than previously thought, emphasizing their wide range and variability across hippocampal layers. This perspective challenges the traditional dichotomy of gamma sub-bands (e.g., slow vs. medium gamma) and their associated cognitive functions based on a more rigid classification according to frequency and phase relative to the theta rhythm. Moreover, they observed all frequencies across all layers.

      Similarly, the current source density plots from Belluscio et al. (2012) suggest that SG and FG can be observed in both the radiatum and lacunosum-moleculare.

      Therefore, if the initial coherence values are weak to negligible and both slow and fast gamma are observed in all layers of the hippocampus, can the different gammas be exclusively related to either anatomical inputs or psychological functions (as done in the present manuscript)? Do these observations challenge the authors' premise of their research? At the least, please discuss.

      We thank the reviewer for raising this point, which we believe remains controversial in this field.  We also thank the reviewer for providing detailed evidence on the forms in which gamma rhythms exist.  The reviewer considered two aspects of gamma: 1) the validity of dividing slow and fast gamma by specific frequency bands; 2) the existence of gamma across all hippocampal layers, which challenges the functional significance of different types of gamma rhythms.  Although the results in Douchamps et al., 2024 challenged the idea of rigid gamma sub-bands, separate slow and fast gamma components could still be seen occurring at distinct times, with the central frequency of slow gamma lower than ~60Hz and the central frequency of fast gamma higher than ~60Hz (Fig.1b of Douchamps et al., 2024).  This was also seen in the rat dataset of this reference (Fig. S3).  Since their behavioral test required both memory encoding and retrieval processes, it was hard to distinguish the roles of different gamma components, as they may dynamically coordinate during complex memory processes.  Thus, although the behavioral performance can be decoded from a broad range of gamma, we still cannot deny the existence of different gamma rhythms and their functional significance during different memory phases.

      (3) Do place cells, phase precession, and theta sequences require input from afferent regions? It is offered in the introduction that "Fast gamma (~65-100Hz), associated with the input from the medial entorhinal cortex, is thought to rapidly encode ongoing novel information in the context (Fernandez-Ruiz et al., 2021; Kemere, Carr, Karlsson, & Frank, 2013; Zheng et al., 2016)".

      Studies showing that CA1 place fields remain fairly intact following MEC inactivation include Ipshita Zutshi, Manuel Valero, Antonio Fernández-Ruiz, and György Buzsáki (2022) - "CA1 place cells and assemblies persist despite combined mEC and CA3 silencing" and Hadas E Sloin, Lidor Spivak, Amir Levi, Roni Gattegno, Shirly Someck, Eran Stark (2024) - "These findings are incompatible with precession models based on inheritance, dual-input, spreading activation, inhibition-excitation summation, or somato-dendritic competition. Thus, a precession generator resides locally within CA1."

      These publications, at the least, challenge the inheritance model by which the afferent input controls CA1 place field spike timing. The research premise offered by the authors is couched in the logic of inheritance, when the effect that the authors are observing could be governed by local intrinsic activity (e.g., phase precession and gamma are locally generated, and the attribution to routed input is perhaps erroneous). Certainly, it is worth discussing these manuscripts in the context of the present manuscript.

      We thank the reviewer for this discussion.  The main purpose of our current study is to investigate the mechanism of theta sequence development during learning, which may or may not depend on theta phase precession of single place cells, as this remains controversial in the field.  Also, a limitation of this study is that all gamma components were recorded from stratum pyramidale; thus we cannot draw any conclusion about the origin of the gamma that modulates sequence development.

      II. Results

      (1) Figure 2-

      a. There is a bit of a puzzle here that should be discussed. If slow and fast frequencies modulate 25% of neurons, how can these rhythms serve as mechanisms of communication/support psychological functions? For instance, if fast gamma is engaged in rapid encoding (line 72) and slow gamma is related to the integration processing of learned information (line 84), and these are functions of the hippocampus, then why do these rhythms modulate so few cells? Is this to say 75% of CA1 neurons do not listen to CA3 or MEC input?

      The proportion of ~25% refers to the place cells phase-locked to either slow or fast gamma.  However, one of the main findings in this study was that most cells were modulated by slow gamma, as they fired at precessing slow gamma phases within a theta cycle (Figs 6-8), which would promote information compression for theta sequence development.  Therefore, we did not mean that only a small proportion of cells were modulated by gamma rhythms and contributed to this process.

      b. Figure 2. It is hard to know if the mean vector lengths presented are large or small. Moreover, one can expect to find significance due to chance. For instance, it is challenging to find a frequency in which modulation strength is zero (please see Figure 4 of PMID: 30428340 or Figure 7 of PMID: 31324673).

      i. Please construct the histograms of Mean Vector Length as in the above papers, using 1 Hz filter steps from 1-120Hz and include it as part of Figure 2 (i.e., calculate the mean vector length for the filtered LFP in steps of 1-2 Hz, 2-3 Hz, 3-4 Hz,... etc). This should help the authors portray the amount of modulation these neurons have relative to the theta rhythm and other frequencies. If the theta mean vector length is higher, should it be considered the primary modulatory influence of these neurons (with slow and fast gammas as a minor influence)?

      We thank the reviewer for this suggestion.  We measured the mean vector length at 5Hz step (equivalent to 1Hz step), and we found that the FG-cells were phase-locked to fast gamma rhythms even more strongly than to theta (Author response image 2B, mean MVL of theta=0.126±0.007, mean MVL of fast gamma=0.175±0.006, paired t-test, t(112)=-5.9, p=0.01, Cohen's d=0.7).  In addition, in some previous studies with significant fast gamma phase locking, the MVL values were around 0.15 when using a broad gamma band (Kitanishi et al., 2015 Neuron, Lasztóczi et al., 2016 Neuron, Tomar et al., 2021 Front Behav Neurosci, and Asiminas et al., 2022 Molecular Autism), which was consistent with the value in this study.  Therefore, we do not believe that fast gamma was only a minor influence on these neurons.

      ii. It is possible to infer a neuron's degree of oscillatory modulation without using the LFP. For instance, one can create an ISI histogram as done in Figure 1 here (https://www.biorxiv.org/content/10.1101/2021.09.20.461152v3.full.pdf+html; "Distinct ground state and activated state modes of firing in forebrain neurons"). The reciprocal of the ISI values would be "instantaneous spike frequency". In favor of the Douchamps et al. (2024) results, the figure of the bioRxiv paper implies that there is a single gamma frequency modulation, as there is only a single bump in the ISIs in the 10^-1.5 to 10^-2 range. Therefore, to vet the slow gamma results and the premise of two gammas offered in the introduction, it would be worth including this analysis as part of Figure 2.

      Using the suggested method, we calculated the ISI distribution on a log scale for FG-cells and NFG-cells during behavior (Author response image 5).  We could observe that the ISI distribution of FG-cells had a bump in the 10<sup>-1.5</sup> to 10<sup>-2</sup> range (black bar), in particular in the fast gamma range (10<sup>-2</sup> to 10<sup>-1.8</sup>).
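
      For orientation, the conversion between log-scale ISIs and instantaneous spike frequency can be sketched as below.  This is a minimal sketch with hypothetical function names, assuming ISIs are expressed in seconds (the unit under which the 10^-2 to 10^-1.8 range maps onto fast gamma frequencies); the bin width and range are arbitrary choices for illustration.

```python
import math


def isi_to_frequency_hz(isi_seconds):
    """Instantaneous spike frequency is the reciprocal of the ISI."""
    return 1.0 / isi_seconds


def log10_isi_histogram(spike_times, bin_width=0.1, lo=-3.0, hi=1.0):
    """Count ISIs in fixed-width bins of log10(ISI in seconds)."""
    n_bins = int(round((hi - lo) / bin_width))
    counts = [0] * n_bins
    for t0, t1 in zip(spike_times, spike_times[1:]):
        isi = t1 - t0
        if isi <= 0:
            continue  # skip duplicate or unsorted timestamps
        idx = int(math.floor((math.log10(isi) - lo) / bin_width))
        if 0 <= idx < n_bins:
            counts[idx] += 1
    return counts


# Frequencies at the ISI boundaries quoted above (exponents of 10, in seconds).
bounds_hz = {e: isi_to_frequency_hz(10 ** e) for e in (-1.5, -1.8, -2.0)}

# Hypothetical spike train firing regularly every 12 ms (about 83 Hz).
spikes = [i * 0.012 for i in range(101)]
counts = log10_isi_histogram(spikes)
```

      Under this assumption, 10^-1.5 s, 10^-1.8 s and 10^-2 s correspond to roughly 31.6 Hz, 63.1 Hz and 100 Hz, so the 10^-2 to 10^-1.8 window spans about 63-100 Hz.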

      Author response image 5.

      c. There are some things generally concerning about Figure 2.

      i. First, the raw trace does not seem to have clear theta epochs (it is challenging to ascertain the start and end of a theta cycle). Certainly, it would be worth highlighting the relationship between theta and the gammas and picking a nice theta epoch.

      We thank the reviewer for this suggestion.  We have updated this figure with a clearer theta epoch in the revised manuscript.

      ii. Also, in panel A, there looks to be a declining amplitude relationship between the raw, fast, and slow gamma traces, assuming that the scale bars represent 100uV in all three traces. The raw trace is significantly larger than the fast gamma. However, this relationship does not seem to be the case in panel B (in which both the raw and unfiltered examples of slow and fast gamma appear to be equal; the right panels of B suggest that fast gamma is larger than slow, appearing to contradict the A= 1/f organization of the power spectral density). Please explain as to why this occurs. Including the power spectral density (see below) should resolve some of this.

      We thank the reviewer for pointing this out.  The y-axis scales of the LFP traces in Fig.2B were not consistent, which made the amplitude comparison between slow and fast gamma misleading.  We have unified the y-axis scales across different gamma types in the revised manuscript.  Moreover, we have also replaced these examples with more typical ones (also see the response below).

      iii. Within the example of spiking to phase in the left side of Panel B (fast gamma example)- the neuron appears to fire near the trough twice, near the peak twice, and somewhere in between once. A similar relationship is observed for the slow gamma epoch. One would conclude from these plots that the interaction of the neuron with the two rhythms is the same. However, the mean vector lengths and histograms below these plots suggest a different story in which the neuron is modulated by FG but not SG. Please reconcile this.

      We thank the reviewer for pointing this out.  We found that the fast gamma phase locking was robust across FG-cells, with the fast gamma peak as the preferred phase.  Therefore, we have replaced these examples with more typical ones, so that the examples are consistent with the group effect.

      iv. For calculating the MVL, it seems that the number of spikes that the neuron fires would play a significant role. Working towards our next point, there may be a bias of finding a relationship if there are too few spikes (spurious clustering due to sparse data) and/or higher coupling values for higher firing rate cells (cells with higher firing rates will clearly show a relationship), forming a sort of inverse Yerkes-Dodson curve. Also, without understanding the magnitude of the MVL relative to other frequencies, it may be that these values are indeed larger than zero, but not biologically significant.

      - Please provide a scatter plot of Neuron MVL versus the Neuron's Firing Rate for 1) theta (7-9 Hz), 2) slow gamma, and 3) fast gamma, along with their line of best fit.

      - Please run a shuffle control where the LFP trace is shifted by random values between 125-1000ms and recalculate the MVL for theta, slow, and fast gamma. Often, these shuffle controls are done between 100-1000 times (see cross-correlation analyses of Fujisawa, Buzsaki et al.).

      - To establish that firing rate does not play a role in uncovering modulation, it would be worth conducting a spike number control, reducing the number of spikes per cell so that they are all equal before calculating the phase plots/MVL.

      We thank the reviewer for raising this point.  Besides the MVL value, we also calculated the pairwise phase consistency (PPC) as suggested by Reviewer 2, which is not sensitive to spike counts.  We found that the phase locking strength to either rhythm (theta or gamma) was comparable between MVL and PPC measurements (Author response image 2).  Moreover, we quantified the relationship between MVL and mean firing rate, as suggested.  We found that the MVL values for theta, slow gamma and fast gamma were negatively correlated with mean firing rate (Author response image 6, Pearson correlation, theta: R<sup>2</sup>= 0.06, Pearson’s r=-0.3, p=1.3×10<sup>-8</sup>; slow gamma: R<sup>2</sup>= 0.1, Pearson’s r=-0.4, p=2.4×10<sup>-17</sup>; fast gamma: R<sup>2</sup>= 0.03, Pearson’s r=-0.2, p=4.3×10<sup>-5</sup>).  These results help us rule out the concern that spike counts affected the phase modulation measurement.
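
      For readers unfamiliar with the shift-shuffle control requested by the reviewer, a minimal sketch is given below.  It is not an analysis performed in this letter: the function names and the synthetic data are hypothetical, the 125-1000 ms shift range follows the reviewer's suggestion, and the synthetic oscillation is given cycle-by-cycle frequency jitter because a time shift cannot break locking to a perfectly periodic signal.

```python
import cmath
import math
import random


def mean_vector_length(phases):
    """Length of the mean resultant vector of spike phases (radians)."""
    return abs(sum(cmath.exp(1j * p) for p in phases)) / len(phases)


def shift_shuffle_mvl(phase_series, spike_idx, fs, n_shuffles=200,
                      min_shift_s=0.125, max_shift_s=1.0, rng=None):
    """Null MVLs after circularly shifting the LFP phase relative to spikes."""
    rng = rng or random.Random()
    n = len(phase_series)
    null = []
    for _ in range(n_shuffles):
        shift = int(round(rng.uniform(min_shift_s, max_shift_s) * fs))
        shifted = [phase_series[(i + shift) % n] for i in spike_idx]
        null.append(mean_vector_length(shifted))
    return null


# Synthetic data: an oscillation whose frequency changes on every cycle
# (65-100 Hz), with a spike at phase 0 on about 20% of the cycles.
rng = random.Random(1)
fs = 1000  # samples per second
phase_series, spike_idx = [], []
while len(phase_series) < 20 * fs:
    n_samp = max(1, int(round(fs / rng.uniform(65.0, 100.0))))
    start = len(phase_series)
    phase_series.extend(2.0 * math.pi * k / n_samp for k in range(n_samp))
    if rng.random() < 0.2:
        spike_idx.append(start)

observed_mvl = mean_vector_length([phase_series[i] for i in spike_idx])
null_mvls = shift_shuffle_mvl(phase_series, spike_idx, fs, rng=rng)

# One-sided permutation p-value with the usual +1 correction.
p_value = (1 + sum(m >= observed_mvl for m in null_mvls)) / (1 + len(null_mvls))
```

      A cell would be considered significantly phase-locked when its observed MVL exceeds most of the shifted values; a spike-number control can be added by subsampling every cell to the same number of spikes before computing `mean_vector_length`.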

      Author response image 6.

      (2) Something that I anticipated to see addressed in the manuscript was the study from Grosmark and Buzsaki (2016): "Cell assembly sequences during learning are "replayed" during hippocampal ripples and contribute to the consolidation of episodic memories. However, neuronal sequences may also reflect preexisting dynamics. We report that sequences of place-cell firing in a novel environment are formed from a combination of the contributions of a rigid, predominantly fast-firing subset of pyramidal neurons with low spatial specificity and limited change across sleep-experience-sleep and a slow-firing plastic subset. Slow-firing cells, rather than fast-firing cells, gained high place specificity during exploration, elevated their association with ripples, and showed increased bursting and temporal coactivation during postexperience sleep. Thus, slow- and fast-firing neurons, although forming a continuous distribution, have different coding and plastic properties."

      My concern is that much of the reported results in the present manuscript appear to recapitulate the observations of Grosmark and Buzsaki, but without accounting for differences in firing rate. A parsimonious alternative explanation for what is observed in the present manuscript is that high firing rate neurons, more integrated into the local network and orchestrating local gamma activity (PING), exhibit more coupling to theta and gamma. In this alternative perspective, it's not something special about how the neurons are entrained to the routed fast gamma, but that the higher firing rate neurons are better able to engage and entrain their local interneurons and, thus modulate local gamma. However, this interpretation challenges the discussion around the importance of fast gamma routed from the MEC.

      a. Please integrate the Grosmark & Buzsaki paper into the discussion.

      b. Also, please provide data that refutes or supports the alternative hypothesis in which the high firing rate cells are just more gamma modulated as they orchestrate local gamma activity through monosynaptic connections with local interneurons (e.g., Marshall et al., 2002, Hippocampal pyramidal cell-interneuron spike transmission is frequency dependent and responsible for place modulation of interneuron discharge). Otherwise, the attribution to an MEC-routed fast gamma seems tenuous.

      c. It is mentioned that fast-spiking interneurons were removed from the analysis. It would be worth including these cells, calculating the MVL in 1 Hz increments as well as the reciprocal of their ISIs (described above).

      We thank the reviewer for this suggestion.  Because we found that the mean firing rate of FG-cells was higher than that of NFG-cells, it is possible that the FG-cells largely overlap with the fast-firing cells (rigid cells) in Grosmark et al., 2016 Science.  In this study, however, we aimed to investigate how fast and slow gamma rhythms modulated neurons dynamically during learning, rather than to define new cell types.  Thus, we do not think this work was just a replication of the previous publication.  We have added this description in the Discussion part (Lines 439-441).  In addition, we did not record enough interneurons to support an analysis of interactions between interneurons and place cells.  Therefore, we could not make any statement in this study about where the fast gamma originated (locally in CA1 or routed from MEC).

      (3) Methods - Spectral decomposition and Theta Harmonics.

      a. It is challenging to interpret the exact parameters that the authors used for their multi-taper analysis in the methods (lines 516-526). Tallon-Baudry et al., (1997; Oscillatory γ-Band (30-70 Hz) Activity Induced by a Visual Search Task in Humans) discuss a time-frequency trade-off where frequency resolution changes with different temporal windows of analysis. This trade-off between time and frequency resolution is well known as the uncertainty principle of signal analysis, transcending all decomposition methods. It is not only a function of wavelet or FFT, and multi-tapers do not directly address this. (The multitaper method, by using multiple specially designed tapers -like the Slepian sequences- smooths the spectrum. This smoothing doesn't eliminate leakage but distributes its impact across multiple estimates). Given the brevity of methods and the issues of theta harmonics as offered above, it is worth including some benchmark trace testing for the multi-taper as part of the supplemental figures.

      i. Please spectrally decompose an asymmetric 8 Hz sawtooth wave showing the trace and the related power spectral density using the multiple taper method discussed in the methods.

      ii. Please also do the same for an elliptical oscillation (perfectly symmetrical waves, but also capable of casting harmonics). Matlab code on how to generate this time series is provided below:

      A = 1; % Amplitude
      T = 1/8; % Period corresponding to 8 Hz frequency
      omega = 2*pi/T; % Angular frequency
      C = 1; % Wave speed
      m = 0.9; % Modulus for the elliptic function (0<m<1 for cnoidal waves)
      x = linspace(0, 2*pi, 1000); % temporal domain
      t = 0; % Time instant
      % Calculate B based on frequency and speed
      B = sqrt(omega/C);
      % Cnoidal wave equation using the Jacobi elliptic function
      u = A .* ellipj(B.*(x - C*t), m).^2;
      % Plotting the cnoidal wave
      figure;
      plot(x./max(x), u);
      title('8 Hz Cnoidal Wave');
      xlabel('time (x)');
      ylabel('Wave amplitude (u)');
      grid on;

      The Symbolic Math Toolbox needs to be installed and accessible in your MATLAB environment to use ellipj. Otherwise, I trust that, rather than plotting a periodic orbit around a circle (sin wave) the authors can trace the movement around an ellipse with significant eccentricity (the distance between the two foci should be twice the distance between the co-vertices).

      We thank the reviewer for this suggestion.  In the main text of the manuscript, we applied only Morlet's wavelet method to calculate the time-varying power of the rhythms.  The multitaper method was used only to estimate power spectra across running speeds, a result that was not shown in the manuscript.  Therefore, we removed the description of the multitaper method and updated the description of the Morlet's wavelet power spectral analysis in the Methods (Lines 541-544).

      As suggested, we estimated the power spectral densities of an 8 Hz sawtooth and an 8 Hz elliptical oscillation using these methods, and compared them with the results from the FFT.  We found that both the multitaper and Morlet's wavelet methods captured the 8 Hz oscillatory component well (Author response image 7).  However, harmonic components were visible in the FFT spectrum.
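
      The sawtooth benchmark requested above can be sketched as follows.  This is an illustrative sketch only, not the code used to produce Author response image 7: the sampling rate (256 Hz), the 2 s duration, and the 20% rise fraction that makes the wave asymmetric are all assumptions chosen for the example, and a direct DFT is evaluated at a few frequencies instead of a full FFT.

```python
import cmath
import math

FS = 256       # assumed sampling rate in Hz (hypothetical, not from the manuscript)
F0 = 8         # fundamental frequency of the test wave in Hz
DURATION = 2   # seconds; an integer number of cycles, so DFT bins fall on harmonics
RISE = 0.2     # assumed fraction of each cycle spent rising (sets the asymmetry)


def asymmetric_sawtooth(n_samples, fs=FS, f0=F0, rise=RISE):
    """Return an asymmetric sawtooth: fast linear rise, slow linear fall."""
    wave = []
    for n in range(n_samples):
        phase = (n * f0 / fs) % 1.0               # position within the cycle, 0..1
        if phase < rise:
            value = phase / rise                  # ramp 0 -> 1
        else:
            value = (1.0 - phase) / (1.0 - rise)  # ramp 1 -> 0
        wave.append(value - 0.5)                  # remove most of the DC offset
    return wave


def power_at(signal, freq_hz, fs=FS):
    """Power of a single DFT component (direct DFT, rectangular window)."""
    acc = 0j
    for n, x in enumerate(signal):
        acc += x * cmath.exp(-2j * math.pi * freq_hz * n / fs)
    return abs(acc / len(signal)) ** 2


signal = asymmetric_sawtooth(FS * DURATION)
# 8, 16 and 24 Hz are the fundamental and its harmonics; 12 Hz is a control bin.
spectrum = {f: power_at(signal, f) for f in (8, 12, 16, 24)}
for freq, power in spectrum.items():
    print(f"{freq:>3} Hz: {power:.6f}")
```

      In this sketch the power at 16 Hz and 24 Hz is clearly non-zero although the wave contains a single 8 Hz rhythm, whereas the 12 Hz control bin is essentially empty, which is the harmonic signature the benchmark is designed to expose.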

      Author response image 7.

      iii. Line 522: "The power spectra across running speeds and absolute power spectrum (both results were not shown).". Given the potential complications of multi-taper discussed above, and as each convolution further removes one from the raw data, it would be the most transparent, simple, and straightforward to provide power spectra using the simple fft.m code in Matlab (We imagine that the authors will agree that the results should be robust against different spectral decomposition methods. Otherwise, it is concerning that the results depend on the algorithm implemented and should be discussed. If gamma transience is a concern, the authors should trigger to 2-second epochs in which slow/fast gamma exceeds 3-7 std. dev. above the mean, comparing those resulting power spectra to 2-second epochs with ripples - also a transient event). The time series should be at least 2 seconds in length (to avoid spectral leakage issues and the issues discussed in Tallon-Baudry et al., 1997 above).

      Please show the unmolested power spectra (Y-axis units in mV2/Hz, X-axis units as Hz) as a function of running speed (increments of 5 cm/s) for each animal. I imagine three of these PSDs for 3 of the animals will appear in supplemental methods while one will serve as a nice manuscript figure. With this plot, please highlight the regions that the authors are describing as theta, slow, and fast gamma. Also, any issues should be addressed should there be notable differences in power across animals or tetrodes (issues with locations along proximal-distal CA1 in terms of MEC/LEC input and using a local reference electrode are discussed below).

      As suggested, we first estimated the power spectra as a function of running speed in each running lap, shown separately for each rat, using multitaper spectral analysis (Author response image 8).  In addition, to obtain unmolested power spectra, we repeated this analysis with the short-time Fourier transform (STFT) at the same frequency resolution (Author response image 9).  The power spectra were consistent between the two methods.  Notably, there appears to be no significant theta harmonic component in the slow gamma band.

      The multitaper spectral analysis was performed as follows.  The power spectra were measured across different running speeds as described previously (Ahmed et al., 2012 J Neurosci; Zheng et al., 2015 Hippocampus; Zheng et al., 2016 eNeuro).  Briefly, the absolute power spectrum was calculated with a 0.5 s moving window and a 0.2 s step size on the LFP recordings of each lap, using the multitaper spectral analysis in the Chronux toolbox (Mitra and Bokil, 2008, http://chronux.org/) and the STFT spectral analysis in the Matlab script stft.m.  In the multitaper method, the time-bandwidth product parameter (TW) was set at 3, and the number of tapers (K) was set at 5.  In the STFT method, the FFT length was set at 2048, which was equivalent to the parameters used in the multitaper method.  Running speed was calculated (see “Estimation of running speed and head direction” section in the manuscript) and averaged within each 0.5 s time window corresponding to the LFP segments.  Then, the absolute power at each frequency was smoothed with a Gaussian kernel centered on a given speed bin.  The power spectra as a function of running speed and frequency were plotted on a log scale.  The colormap was also on a log scale, allowing comparisons across frequencies that would otherwise be difficult because of the 1/f decay of power in physiological signals.
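
      The time-frequency trade-off implied by these settings can be worked out directly.  The sketch below uses only the values stated above (0.5 s window, TW = 3, K = 5) and the standard multitaper relations W = TW / T and K ≤ 2TW - 1; the function name is hypothetical and the code does not call Chronux.

```python
def multitaper_resolution(window_s, time_bandwidth):
    """Return (rayleigh_hz, half_bandwidth_hz, max_tapers) for a multitaper estimate.

    rayleigh_hz       : 1 / T, the resolution of an untapered window of length T
    half_bandwidth_hz : W = TW / T, the spectral smoothing half-width
    max_tapers        : 2*TW - 1, the usual upper limit on well-concentrated tapers
    """
    rayleigh_hz = 1.0 / window_s
    half_bandwidth_hz = time_bandwidth / window_s
    max_tapers = int(2 * time_bandwidth - 1)
    return rayleigh_hz, half_bandwidth_hz, max_tapers


rayleigh, half_bw, k_max = multitaper_resolution(window_s=0.5, time_bandwidth=3)
print(rayleigh, half_bw, k_max)  # 2.0 6.0 5
```

      With a 0.5 s window, each spectral estimate is therefore smoothed over roughly ±6 Hz, and K = 5 matches the 2TW - 1 limit.  A smoothing width of this size is comparable to the spacing of 8 Hz theta harmonics, which is worth keeping in mind when reading the speed-resolved spectra.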

      Author response image 8.

      Author response image 9.

      iv. Schomberg and colleagues (2014) suggested that the modulation of neurons in the slow gamma range could be related to theta harmonics (see above). Harmonics can often extend in a near infinite as they regress into the 1/f background (contributing to power, but without a peak above the power spectral density slope), making arbitrary frequency limits inappropriate. Therefore, in order to support the analyses and assertions regarding slow gamma, it seems necessary to calculate a "theta harmonic/slow gamma ratio". Aru et al. (2015; Untangling cross-frequency coupling in neuroscience) offer that: " The presence of harmonics in the signal should be tested by a bicoherence analysis and its contribution to CFC should be discussed." Please test both the synthetic signals above and the raw LFP, using temporal windows of greater than 4 seconds (again, the large window optimizes for frequency resolution in the time-frequency trade-off) to calculate the bicoherence. As harmonics are integers of theta coupled to itself and slow gamma is also coupled to theta, a nice illustration and contribution to the field would be a method that uses the bispectrum to isolate and create a "slow gamma/harmonic" ratio.

      We thank the reviewer for suggesting this method for assessing theta harmonics.  We first measured the theta harmonics in the synthesized signals using the bicoherence method, and we could clearly observe the nonlinear coupling between the theta rhythm and its harmonics (Author response image 10).

      Author response image 10.

      In addition, we measured the bicoherence on raw traces during slow gamma episodes.  We did not see nonlinear coupling between the slow gamma and theta bands in the real data (mean bicoherence=0.1±0.0002), in contrast to the synthesized signals (mean bicoherence=0.7 for elliptical waves and 0.5 for sawtooth waves), suggesting that the slow gamma detected in this study was not purely a theta harmonic (Author response image 11C, F, I, in red boxes).  Therefore, we believe that the contribution of theta harmonics to slow gamma is not significant.
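
      The logic of this comparison can be illustrated with a minimal bicoherence sketch.  It is not the analysis code behind Author response images 10 and 11.  It assumes a 256 Hz sampling rate, 1 s segments, and a toy signal made of an 8 Hz component plus a 16 Hz component that is either phase-locked to the 8 Hz component (as a true harmonic would be) or has an independent phase (as an unrelated rhythm would have).  The normalization used here, the magnitude of the summed triple product divided by the sum of its magnitudes, is one of several in use.

```python
import cmath
import math
import random

FS = 256        # assumed sampling rate in Hz (hypothetical)
SEG_LEN = 256   # 1 s segments, so 8 Hz and 16 Hz fall exactly on DFT bins
N_SEG = 40      # number of segments averaged


def dft_bin(segment, freq_hz, fs=FS):
    """Complex DFT coefficient of one segment at a single frequency."""
    acc = 0j
    for n, x in enumerate(segment):
        acc += x * cmath.exp(-2j * math.pi * freq_hz * n / fs)
    return acc


def bicoherence(segments, f1, f2, fs=FS):
    """|sum of X(f1) X(f2) X*(f1+f2)| / sum of |X(f1) X(f2) X*(f1+f2)|, in [0, 1]."""
    num = 0j
    den = 0.0
    for seg in segments:
        triple = (dft_bin(seg, f1, fs) * dft_bin(seg, f2, fs)
                  * dft_bin(seg, f1 + f2, fs).conjugate())
        num += triple
        den += abs(triple)
    return abs(num) / den if den > 0 else 0.0


def make_segments(coupled, rng):
    """8 Hz + 16 Hz toy signal; the 16 Hz phase is locked to 8 Hz only if coupled."""
    segments = []
    for _ in range(N_SEG):
        phi = rng.uniform(0, 2 * math.pi)
        phi2 = 2 * phi if coupled else rng.uniform(0, 2 * math.pi)
        segments.append([
            math.cos(2 * math.pi * 8 * n / FS + phi)
            + 0.5 * math.cos(2 * math.pi * 16 * n / FS + phi2)
            for n in range(SEG_LEN)
        ])
    return segments


rng = random.Random(0)
b_harmonic = bicoherence(make_segments(True, rng), 8, 8)
b_independent = bicoherence(make_segments(False, rng), 8, 8)
print(f"phase-locked harmonic: {b_harmonic:.2f}")
print(f"independent rhythm:    {b_independent:.2f}")
```

      A phase-locked harmonic gives a bicoherence close to 1, whereas an independent rhythm at the same frequency gives a value that shrinks toward 0 as more segments are averaged.  The same contrast underlies the comparison between the synthesized waves and the raw LFP above.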

      Author response image 11.

      (4) I appreciate the inclusion of the histology for the 4 animals. Knierim and colleagues describe a difference in MEC projection along the proximal-distal axis of the CA1 region (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3866456/): "There are also differences in their direct projections along the transverse axis of CA1, as the LEC innervates the region of CA1 closer to the subiculum (distal CA1), whereas the MEC innervates the region of CA1 closer to CA2 and CA3 (proximal CA1)." From the histology, it looks like some of the electrodes are in the part of CA1 that would be dominated by LEC input while a few are closer to where the MEC would project.

      a. How do the authors control for these differences in projections? Wouldn't this change whether or not fast gamma is observed in CA1?

      b. I am only aware of one manuscript that describes slow gamma in the LEC which appeared in contrast to fast gamma from the MEC (https://www.science.org/doi/10.1126/science.abf3119). One would surmise that the authors in the present manuscript would have varying levels of fast gamma in their CA1 recordings depending on the location of the electrodes in the Proximal-distal axis, to the extent that some of the more medial tetrodes may need to be excluded (as they should not have fast gamma, rather they should be exclusively dominated by slow gamma). Alternatively, the authors may find that there is equal fast gamma power across the entire proximal-distal axis. However, this would pose a significant challenge to the LEC/slow gamma and MEC/fast gamma routing story of Fernandez-Ruiz et al. and require reconciliation/discussion.

      c. Is there a difference in neuron modulation to these frequencies based on electrode location in CA1?

      We thank the reviewer for raising this concern, which was also raised by Reviewer 2.  We aligned the physical locations of the LFP channels along the proximal-distal axis based on histology.  In our dataset, only 2 rats were recorded from both distal and proximal hippocampus, so we calculated the gamma power from both sites in these rats.  We found that slow gamma power was higher on proximal tetrodes than on distal tetrodes (Author response image 12, repeated measures ANOVA, F(1,7)=10.2, p=0.02, partial η<sup>2</sup>=0.8).  However, fast gamma power was similar between recording sites (F(1,7)=0.008, p=0.9, partial η<sup>2</sup>=0.001).  These results are partially consistent with the LEC/slow gamma and MEC/fast gamma routing story of Fernandez-Ruiz’s work.  The main reason for this only partial consistency is probably that all LFPs were recorded from tetrodes in stratum pyramidale, the deep layer in particular (Author response image 4E), so that it was hard to identify precisely their distance to the distal/proximal apical dendrites.

      Author response image 12.

      In terms of the anatomical location of FG and NFG cells, we identified the tetrode tracks in slices for each cell.  We found that both FG and NFG cells were recorded from the deep layer of dorsal CA1, with no difference in proportions between cell types (Author response image 4E, Chi-squared test, χ<sup>2</sup>=0.5, p=0.5, Cramér's V=0.05).  The distributions of FG-cells and NFG-cells along the transverse axis were also similar (Author response image 4F, χ<sup>2</sup>=0.08, p=0.8, Cramér's V=0.02).

      (5) Given a comment in the discussion (see below), it will be worth exploring changes in theta, theta harmonic, slow gamma, and fast gamma power with running speed as no changes were observed with theta sequences or lap number versus. Notably, Czurko et al., report an increase in theta and harmonic power with running speed (1999) while Ahmed and Mehta (2012) report a similar effect for gamma.

      a. Please determine if the oscillations change in power and frequency of the rhythms discussed above change with running speed using the same parameters applied in the present manuscript. The specific concern is that how the authors calculate running speed is not sensitive enough to evaluate changes.

      We thank the reviewer for this suggestion.  The description of the running speed quantification has been updated in the Methods (see “Estimation of running speed and head direction” section, Lines 501-511).  Overall, the sampling frequency of running speed was 25 Hz, which should be sensitive enough to evaluate the behavioral changes.

      By measuring rhythmic power as a function of running speed (Author response image 8 and Author response image 9), we observed that theta power increased with running speed.  Consistent with the results of Ahmed and Mehta (2012) and our previous study (Zheng et al., 2015), fast gamma power increased and slow gamma power decreased as running speed increased.

      In addition, we estimated the rhythmic frequency as a function of running speed in the slow and fast gamma episodes, respectively.  We found that fast gamma frequency increased with running speed (Author response image 13, linear regression, R<sup>2</sup>=0.4, corr=0.6, p=9.9×10<sup>-15</sup>), whereas slow gamma frequency decreased with running speed (R<sup>2</sup>=0.2, corr=-0.4, p=8.8×10<sup>-6</sup>).  Although gamma frequency was significantly correlated with running speed, consistent with previous studies, the frequency change (~70-75 Hz for fast gamma and ~30-28 Hz for slow gamma) was not large enough to affect the sequence findings in this study.  In addition, theta frequency was stable in both slow episodes (R<sup>2</sup>=0.02, corr=-0.1, p=0.1) and fast episodes (R<sup>2</sup>=0.004, corr=0.06, p=0.5), consistent with the results in Fig. 1G of Kropff et al., 2021 Neuron.

      Author response image 13.

      b. It is astounding that animals ran as fast as they did in what appears to be the first lap (Figure 3F), especially as rats' natural proclivity is thigmotaxis and inquisitive exploration in novel environments. Can the authors expand on why they believe their rats ran so quickly on the first lap in a novel environment and how to replicate this? Also, please include the individual values for each animal on the same plot.

      We thank the reviewer for pointing this out.  The task was not entirely novel to the rats in this dataset, because only days with recording quality sufficient for sequence decoding were included in this paper, which were approximately day 2 to day 10 for each rat.  However, we still observed the process of sequence formation, because the rats retained an interest in exploration during early laps.  Thus, in terms of exploratory behavior, the rats ran at relatively high speeds across laps (Author response image 14, each gray line represents the running speed within an individual session).

      Author response image 14.

      c. Can the authors explain how the statistics on line 169 (F(4,44)) work? Specifically, it is challenging to determine how the degrees of freedom were calculated in this case and throughout if there were only 4 animals (reported in methods) over 5 laps (depicted in Figure 3F. Given line 439, it looks like trials and laps are used synonymously). Four animals over 5 laps should have a DOF of 16.

      This statistical test was performed with each session/day as a sample (n=12 sessions/days).  The statistics were generated by a repeated measures ANOVA on 5 trials in 12 sessions, giving an error DOF of (5-1)×(12-1)=44.
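
      The degrees of freedom quoted here follow from the standard formulas for a one-way repeated measures ANOVA.  The sketch below assumes a balanced design with no missing cells and no sphericity correction; the function name is hypothetical.

```python
def rm_anova_dof(n_subjects, n_levels):
    """Degrees of freedom for a balanced one-way repeated measures ANOVA.

    n_subjects : number of independent units (here, sessions/days)
    n_levels   : number of repeated conditions (here, laps/trials)
    Returns (df_effect, df_error).
    """
    df_effect = n_levels - 1
    df_error = (n_levels - 1) * (n_subjects - 1)
    return df_effect, df_error


print(rm_anova_dof(n_subjects=12, n_levels=5))  # (4, 44): sessions as samples
print(rm_anova_dof(n_subjects=4, n_levels=5))   # (4, 12): animals as samples
```

      Treating the 12 sessions as samples reproduces F(4,44).  If the 4 animals were treated as the samples instead, the same formulas would give F(4,12).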

      (6) Throughout the manuscript, I am concerned about an inflation of statistical power. For example on line 162, F(2,4844). The large degrees of freedom indicate that the sample size was theta sequences or a number of cells. Since multiple observations were obtained from the same animal, the statistical assumption of independence is violated. Therefore, the stats need to be conducted using a nested model as described in Aarts et al. (2014; https://pubmed.ncbi.nlm.nih.gov/24671065/). A statistical consult may be warranted.

      We thank the reviewer for this suggestion.  We have replaced this statistical result with one from a generalized linear mixed model that includes ratID as a covariate.  These results have been updated in the revised manuscript (Lines 164-167).
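
      The size of the problem raised by the reviewer can be illustrated with the design-effect approximation for clustered data.  This sketch is not the mixed model described above.  It assumes equal numbers of observations per animal, infers a total of 4847 observations from F(2,4844) under the assumption of a one-way design with three groups, and uses purely hypothetical intraclass correlation (ICC) values.

```python
def effective_sample_size(n_obs, n_clusters, icc):
    """Approximate effective N for clustered data, assuming equal cluster sizes.

    design effect = 1 + (m - 1) * ICC, where m is the number of
    observations per cluster (here, per animal).
    """
    cluster_size = n_obs / n_clusters
    design_effect = 1.0 + (cluster_size - 1.0) * icc
    return n_obs / design_effect


N_OBS = 4847   # assumed: error df of 4844 plus 3 groups in a one-way design
N_RATS = 4     # number of animals reported in the Methods

for icc in (0.0, 0.05, 0.2):   # hypothetical ICC values, for illustration only
    n_eff = effective_sample_size(N_OBS, N_RATS, icc)
    print(f"ICC = {icc:.2f} -> effective N of about {n_eff:.0f}")
```

      Even a modest within-animal correlation reduces thousands of observations to an effective sample size in the tens, which is why a model that accounts for the nesting of observations within animals is needed.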

      (7) It is stated that one tetrode served as a quiet recording reference. The "quiet" part is an assumption when often, theta and gamma can be volume conducted to the cortex (e.g., Sirota et al., 2008; This is often why laboratories that study hippocampal rhythms use the cerebellum for the differential recording electrode and not an electrode in the corpus callosum). Generally, high frequencies propagate as well as low frequencies in the extracellular milieu (https://www.eneuro.org/content/4/1/ENEURO.0291-16.2016). For transparency, the authors should include a limitation paragraph in their discussion that describes how their local tetrode reference may be inadvertently diminishing and/or distorting the signal that they are trying to isolate. Otherwise, it would be worth hearing an explanation as to how the author's approach avoids this issue.

      Regarding the locations of the references, we had 2 screws in the skull above the cerebellum connected to the recording drive ground, and 1 tetrode in a quiet area of the cortex serving as the recording reference.  We agree that theta and gamma can be volume conducted to the cortex, which may affect the power of these rhythms measured in the stratum pyramidale.  However, we did not intend to measure or compare absolute theta or gamma power in this study, as we were concerned only with the gamma phase modulation of place cells.  Therefore, we believe the location of the recording reference would not significantly affect our conclusions.

      Apologetically, this review is already getting long. Moreover, I have substantial concerns that should be resolved prior to delving into the remainder of the analyses. e.g., the analyses related to Figure 3-5 assert that FG cells are important for sequences. However, the relationship to gamma may be secondary to either their relationship to theta or, based on the Grosmark and Buzsaki paper, it may just be a phenomenon coupled to the fast-firing cells (fast-firing cells showing higher gamma modulation due to a local PING dynamic). Moreover, the observation of slow gamma is being challenged as theta harmonics, even by the major proponents of the slow/fast gamma theory. Therefore, the report of slow gamma precession would come as an unsurprising extension should they be revealed to be theta harmonics (however, no control for harmonics was implemented; suggestions were made above). Following these amendments, I would be grateful for the opportunity to provide further feedback.

      III. Discussion.

      a. Line 330- it was offered that fast gamma encodes information while slow gamma integrates in the introduction. However, in a task such as circular track running (from the methods, it appears that there is no new information to be acquired within a trial), one would guess that after the first few laps, slow gamma would be the dominant rhythm. Therefore, one must wonder why there are so few neurons modulated by slow gamma (~3.7%).

      The ~3.7% refers to the proportion of place cells that were phase-locked to slow gamma.  However, our aim was to show that slow gamma phase precession of place cells promoted theta sequence development.  We would not expect cells to be phase-locked to slow gamma if phase precession occurred.

      b. Line 375: The authors contend that: "...slow gamma, related to information compression, was also required to modulate fast gamma phase-locked cells during sequence development. We replicated the results of slow gamma phase precession at the ensemble level (Zheng et al., 2016), and furthermore observed it at late development, but not early development, of theta sequences." In relation to the idea that slow gamma may be coupled to - if not a distorted representation of - theta harmonics, it has been observed that there are changes in theta relative to novelty.

      i. A. Jeewajee, C. Lever, S. Burton, J. O'Keefe, and N. Burgess (2008) report a decrease in theta frequency in novel circumstances that disappears with increasing familiarity.

      ii. One could surmise that this change in frequency is associated with alterations in theta harmonics (observed here as slow gamma), challenging the author's interpretation.

      iii. Therefore, the authors have a compelling opportunity to replicate the results of Jeewajee et al., characterizing changes of theta along with the development of slow gamma precession, as the environment becomes familiar. It will become important to demonstrate, using bicoherence as offered by Aru et al., how slow gamma can be disambiguated from theta harmonics. Specifically, we anticipate that the authors will be able to quantify A) theta harmonics (the number, and their respective frequencies and amplitudes), B) the frequency and amplitude of slow gamma, and C) how they can be quantitatively decoupled. Through this, their discussion of oscillatory changes with novelty-familiarity will garner a significant impact.

      We think we have demonstrated that the slow gamma observed in this study was not purely theta harmonics.  We didn’t focus on the frequency change of slow gamma or theta rhythms in this study.  Further investigation will be carried out on this topic in the future.

      c. Broadly, it is interesting that the authors emphasize the gamma frequency throughout the discussion. Given that the power spectral density of the Local Field Potential (LFP) exhibits a log-log relationship between amplitude and frequency, as described by Buzsáki (2005) in "Rhythms of the Brain," and considering that the LFP is primarily generated through synaptic transmembrane currents (Buzsáki et al., 2012), it seems parsimonious to consider that the bulk of synaptic activity occurs at lower frequencies (e.g., theta). Since synaptic transmission represents the most direct form of inter-regional communication, one might wonder why gamma (characterized by lower amplitude rhythms) is esteemed so highly compared to the higher amplitude theta rhythm. Why isn't the theta rhythm, instead, regarded as the primary mode of communication across brain regions? A discussion exploring this question would be beneficial.

      We thank the reviewer for this thoughtful comment.  In stating our conclusions on gamma rhythms, we did not mean to diminish the role of the theta rhythm.  On the contrary, the fast and slow gamma episodes were detected riding on theta rhythms, and we believe that information compression should occur at a finer timescale within a theta cycle.  More investigation will be carried out on this topic in the future.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) It is helpful to clearly define "FG-cell sequences" before the relevant results are described in the Results section. More importantly, the seemingly conflicting results between Figure 3 and Figure 8 may need to be clarified.

      The terms “exFG-sequences and exNFG-sequences” and “FG-cell sequences and NFG-cell sequences” have been clearly defined in the revised manuscript.  Moreover, the seemingly conflicting results between Figure 3 and Figure 8 have been clarified.

      (2) It is helpful to clearly state the N and what defines a sample whenever a result is described.

      For each statistical result, the N and what defines a sample have been clarified in the revised manuscript.

      (3) Addressing the questions regarding the methods (#5) would clarify some of the results.

      The questions regarding the Methods have been addressed in the revised manuscript.

      (4) Line #244: "successful" should be "successive"?

      Fixed.

      Reviewer #2 (Recommendations For The Authors):

      - The writing of the manuscript can be substantially improved.

      The manuscript has been substantially revised and updated.

      - I noticed that the last author of the manuscript is not the lead or corresponding and has only provided a limited contribution to this work (according to the detailed author contributions). The second to last author seems to be the main senior intellectual contributor and supervisor, together with the third to last author. This speaks of potential bad academic practices where a senior person whose intellectual contribution to the study is relatively minor takes the last author position, against the standard conventions on authorship worldwide. I strongly suggest that this is corrected.

      We thank the reviewer for raising this issue.  The last author, Dr. Ming, was also a senior author who supervised this project and made a substantial contribution.  We have corrected his role to co-corresponding author in the revised manuscript.